
Subjects
All index cases had a normal karyotype and were negative for FMR1 repeat expansion, and in most of them large indels had been excluded using array CGH. The study was approved by the institutional review boards of all participating institutions, and written informed consent was obtained from all participants or their legal guardians.
Methods
For each family, DNA from one affected male was used for constructing a sequencing library using the Illumina Genomic DNA Single End Sample Prep kit (Illumina, San Diego, CA, USA). Enrichment of the X-chromosomal exome was then performed for each library using the Agilent SureSelect Human X Chromosome Kit (Agilent, Santa Clara, CA, USA), which contains 47 657 RNA baits for 7591 exons of 745 genes of the human X chromosome. Single-end deep sequencing was performed on the Illumina Genome Analyzer GAIIx (Illumina, San Diego, CA, USA). Read length was 76 nucleotides. For a subset of families of the second cohort, we performed droplet-based multiplex PCR (7367 amplicons, 757 genes, 1.54 Mb) similarly to the previously described study.23 Paired-end deep sequencing was performed on the HiSeq2000 platform (ATLAS, Berlin, Germany). A scheme outlining the variant discovery workflow is presented in Supplementary Figure 1.
Reads were extracted from qseq-files provided by the Illumina GAII system (Illumina). Reads containing ambiguous base calls were not considered for further analysis. The remaining reads were subsequently mapped to the human reference genome (hg18 without random fragments) with RazerS24 (parameters: -mcl 25 -pa -m 1 -dr 0 -i 93 -s 110101111001100010111 -t 4 -lm), tolerating up to 5 bp differences to the reference sequence per read. Only unique best matches were kept, whereas all remaining reads and those containing indels were subjected to a split mapping procedure of single end reads (SplazerS version 1.0,25 parameters: -m 1 -pa -i 95 -sm 23 -s 111001110011100111 -t 2 -maxG 50000) to detect short insertions (≤30 bp) and larger deletions (<50 kb). For detecting large insertions/deletions by analyzing changes in depth of coverage along the targeted regions, we used ExomeCopy.26 We performed a quality-based clipping of reads after mapping but before calling variants to minimize the number of false-positive calls. Starting from each end of a read with a sliding window of 10 bp, we trimmed the read until we observed a window with all 10 phred base quality values >10. If there was a variant within 3 bp distance to the clipped region, the trimming was expanded up to this potential sequencing error. For both mapping procedures (RazerS+SplazerS), the calling of a variant required at least three reads with different mapping coordinates to exclude potential amplification artifacts. Single-nucleotide polymorphisms (SNPs) and short indels (≤5 bp) were called with snpStore (parameters: -reb 0 -fc 10 -m 1 -mmp -mc 3 -oa -mp 1 -th 0.85 -mmq 10 -hr 0.001 -re -pws 1000), performing a realignment of the clipped mapped reads whenever at least three indel-containing reads were observed within close proximity. For an indel to be called, no more than 75% of the spanning reads were allowed to contradict it.
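The sliding-window clipping described above can be illustrated with a minimal Python sketch. It is not the code used in the study: the function name `clip_bounds`, its parameters, and the half-open interval convention are assumptions made for illustration, and the additional step that extends the trimming to a variant within 3 bp of the clipped region is not modeled.

```python
from typing import List, Tuple


def clip_bounds(quals: List[int], window: int = 10, min_q: int = 10) -> Tuple[int, int]:
    """Return the half-open interval (start, end) of bases kept after clipping.

    From each end of the read, bases are trimmed until a window of `window`
    consecutive bases is found in which every Phred quality is > min_q.
    If no such window exists, (0, 0) is returned (the whole read is clipped).
    """
    n = len(quals)

    def window_ok(i: int) -> bool:
        # True if all qualities in the window starting at index i exceed min_q.
        return all(q > min_q for q in quals[i:i + window])

    # Scan from the 5' end for the first acceptable window.
    start = next((i for i in range(0, n - window + 1) if window_ok(i)), None)
    if start is None:
        return (0, 0)

    # Scan from the 3' end for the last acceptable window.
    end = start + window
    for j in range(n - window, start - 1, -1):
        if window_ok(j):
            end = j + window
            break
    return (start, end)


# Hypothetical 25 bp read with low-quality bases at both ends.
example = [2, 8] + [30] * 20 + [9, 30, 3]
kept = clip_bounds(example)
```

For the hypothetical read above, `kept` is `(2, 22)`: the two leading bases and the three trailing bases are removed, because the isolated high-quality base near the 3' end is not part of any acceptable 10 bp window.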
For single base variants we used the Maq consensus statistics27 integrated into the snpStore code. Larger deletions and small insertions were identified by examining the split mapping results for potential breakpoint positions. In case of multiple such positions implying varying indel lengths within a 20-bp range such candidate calls were assumed to be unreliable and were therefore discarded. To detect potential retrocopies, the boundaries of split read mappings were compared with known exon boundaries allowing a tolerance of ±5 bp. When both split ends coincided with exon boundaries these exons were defined as being part of a retrocopy event. Completeness of the retrocopy was defined by the highest fraction of exons per transcript for which exon-spanning reads were detected. One example is shown in Supplementary Figure 2. In a parallel approach, we processed the sequencing reads using an alternative software, Medical Resequencing Analysis Pipeline (MERAP), for mapping, variant calling, and annotation.28 Here, the mapping was performed using SOAP2.2029 allowing at most two mismatches. For the calling of single-nucleotide variants (SNVs) and indels a minimum of four reads and a more stringent Phred-like quality score of 20 were required. Finally, only those variants called by both approaches were kept to yield high-confidence candidate variants.
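As an illustration of the retrocopy criterion described above, the following Python sketch compares the two ends of a split-read gap with exon boundaries using a ±5 bp tolerance and computes completeness as the highest fraction of exons per transcript supported by exon-spanning reads. The data layout (exons as `(start, end)` tuples, split reads as `(left_end, right_start)` tuples) and the function names are assumptions for illustration, not the implementation used in the study.

```python
from typing import Dict, List, Set, Tuple

Exon = Tuple[int, int]        # (start, end) of an exon, hypothetical layout
SplitRead = Tuple[int, int]   # (end of left part, start of right part)


def exons_joined(split: SplitRead, exons: List[Exon], tol: int = 5) -> Set[int]:
    """Return indices of exons whose boundaries coincide with both split ends.

    The left breakpoint must lie within `tol` bp of an exon end and the right
    breakpoint within `tol` bp of an exon start; otherwise the set is empty.
    """
    left_end, right_start = split
    left_hits = {i for i, (_, end) in enumerate(exons) if abs(left_end - end) <= tol}
    right_hits = {i for i, (start, _) in enumerate(exons) if abs(right_start - start) <= tol}
    if left_hits and right_hits:
        return left_hits | right_hits
    return set()


def completeness(splits: List[SplitRead], transcripts: Dict[str, List[Exon]], tol: int = 5) -> float:
    """Highest fraction of exons per transcript supported by exon-spanning reads."""
    best = 0.0
    for exons in transcripts.values():
        if not exons:
            continue
        supported: Set[int] = set()
        for split in splits:
            supported |= exons_joined(split, exons, tol)
        best = max(best, len(supported) / len(exons))
    return best


# Hypothetical transcript with three exons and two exon-spanning split reads.
transcripts = {"tx1": [(100, 200), (500, 600), (900, 1000)]}
splits = [(202, 497), (600, 905)]
fraction = completeness(splits, transcripts)
```

In this hypothetical case both split reads fall within the tolerance, so all three exons are supported and `fraction` is 1.0. A split read such as `(206, 500)` would not count, because its left breakpoint is 6 bp from the nearest exon end.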

For in silico prioritization of variants, we integrated the following features: (a) gene/transcript annotations (downloaded from UCSC Genome Browser, hg19); (b) known sequence variants from the following data sources: dbSNP, 1000 Genomes project, 200 Danish exomes,30 NHLBI Exome Sequencing Project (ESP6500, version without indels). Base exchanges were considered as 'known' (with exception of SNVs observed as only heterozygous in ESP6500 and 1000 Genomes project) if position and type of the nucleotide were identical to entries in the reference databases. We did not use a cutoff based on minor allele frequency. In case of short indels, a tolerance in positional matching was applied based on repetitiveness of the deleted/inserted sequence in the SNV flanking sequence; (c) variants detected in the screen performed by Tarpey et al.3 were located in transcripts derived from ENSEMBL version 54. We defined the amino-acid coordinate shared by most transcripts of a gene as reference, which is sometimes different from the one annotated by Tarpey et al.3 Conversion of coordinates was successful for 1647 variants; (d) evolutionary conservation across 44 vertebrate species;31 (e) splice site detection for defining potential cryptic splice sites (software NNSplice; cutoff 0.9 (ref. 32)); (f) potential functional impact: PolyPhen2,33 SIFT;34 and (g) Human Gene Mutation Database (HGMD): known variants with PubMed entries were treated as potentially disease causing if they were listed in HGMD Professional and annotated in at most one reference SNV database.

We thus defined a prioritization score (PS) based on basic, computationally tractable criteria such as type of variant or evolutionary conservation. PolyPhen2 and SIFT each produce a categorical output (benign/tolerated, possibly damaging/low confidence, probably damaging/damaging), which was assigned to the ordinal values 1, 2 or 3, respectively. Values decrease with decreasing functional impact; missing values are scored 0. Whenever only one of the methods scored >0, the zero score was set to 1 to avoid underestimation of the functional impact. PhyloP values were rounded down to decimal numbers; values >5 were set to PHY=5, values <2 to PHY=1 and values <0 to PHY=0. Since deletions/insertions are usually not scored by PolyPhen2/SIFT, we defined the following ad hoc weighting scheme: nonsense/frameshift: TYPE=20 (maximal PS); deletions (>50 bp): TYPE=9 (similar to maximal impact prediction by PolyPhen2 and SIFT); duplications, in-frame deletions, potential splice site variants: TYPE=3. The score for a change identified in a gene known to have a role in XLID before this study was set to XLID=3. PS = PP2 × SIFT + PHY + TYPE + XLID; if PS>20, PS=20.
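A minimal Python sketch of the prioritization score as described in the text follows. It is an illustration, not the study's code. The category labels, the variant-type keys and the function names are hypothetical. The sketch also assumes that "rounded down" means rounding down to the nearest whole number, that a missing PhyloP value scores 0, and that variant types not listed in the weighting scheme (for example missense) receive TYPE=0.

```python
import math
from typing import Optional

# Ordinal values for the categorical predictions (labels are hypothetical).
PP2_SCORE = {"benign": 1, "possibly damaging": 2, "probably damaging": 3}
SIFT_SCORE = {"tolerated": 1, "low confidence": 2, "damaging": 3}

# Ad hoc weights per variant type; unlisted types are assumed to score 0.
TYPE_SCORE = {
    "nonsense": 20,
    "frameshift": 20,
    "large_deletion": 9,      # deletions > 50 bp
    "duplication": 3,
    "inframe_deletion": 3,
    "splice_site": 3,
}


def phy_score(phylop: Optional[float]) -> int:
    """Map a PhyloP value to PHY (assumption: missing values score 0)."""
    if phylop is None or phylop < 0:
        return 0
    if phylop < 2:
        return 1
    return min(math.floor(phylop), 5)


def prioritization_score(pp2: Optional[str], sift: Optional[str],
                         phylop: Optional[float], variant_type: str,
                         known_xlid_gene: bool) -> int:
    """PS = PP2 * SIFT + PHY + TYPE + XLID, capped at 20."""
    p = PP2_SCORE.get(pp2, 0)
    s = SIFT_SCORE.get(sift, 0)
    # If only one predictor produced a score, set the missing one to 1.
    if (p > 0) != (s > 0):
        p, s = max(p, 1), max(s, 1)
    xlid = 3 if known_xlid_gene else 0
    ps = p * s + phy_score(phylop) + TYPE_SCORE.get(variant_type, 0) + xlid
    return min(ps, 20)


# Hypothetical missense variant in a known XLID gene: 3 * 3 + 4 + 0 + 3 = 16.
example_ps = prioritization_score("probably damaging", "damaging", 4.7, "missense", True)
```

Under these assumptions, a missense variant predicted as probably damaging/damaging with a PhyloP value of 4.7 in a known XLID gene scores 16, whereas any nonsense or frameshift variant reaches the cap of 20 through TYPE alone.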

We also used CADD (Combined Annotation-Dependent Depletion)35 as an additional tool for annotating and interpreting SNVs as well as small indels (see Supplementary Figure 3 for comparison of the scores).
