H-DBAS RELEASE 4.0 (Jan. 30, 2009)
Transcripts from H-InvDB 6.0 (DDBJ 73 in origin)
Genomes from UCSC hg18 (human) and mm9 (mouse)

Statistics

Representative AS variant (RASV) statistics are presented here for three datasets. For the H-Inv full-length cDNA dataset, the tables cover the number, position category, AS pattern, influence on protein function, and conservation with mouse. For the H-Inv all-transcript dataset, they cover the number, position category, AS pattern, and influence on protein function. For the mouse full-length cDNA dataset, they cover the number, position category, and AS pattern. Splice-site sequences and NAGNAG motifs are given as supplementary tables for all three datasets.

Basic summary of RASVs in three datasets
Statistic | H-Inv full-length cDNA | H-Inv all transcript | Mouse full-length cDNA
Average number of RASVs per AS locus | 2.7 | 3.6 | 2.9
Average number of AS exons per RASV | 2.4 | 3.2 | 2.2
Percentage of GT-AG in all splice-site sequences | 97.6% | 98.4% | 99%
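
The summary values can be cross-checked against the detailed tables further down this page. The sketch below is only an illustration: the dictionary layout and names are hypothetical and not part of H-DBAS. It assumes that the two averages are simple ratios of the first row of Table 1 (RASVs per locus, alternative exons per RASV) and that the GT-AG percentage is the GT-AG count divided by the Total in Supplementary Table 1.

```python
# Counts copied by hand from Table 1 (first row) and Supplementary Table 1
# of each dataset. The key names are illustrative, not H-DBAS field names.
datasets = {
    "H-Inv full-length cDNA": dict(loci=7801, rasvs=20803, as_exons=50399,
                                   splice_total=186596, gt_ag=182163),
    "H-Inv all transcript": dict(loci=13704, rasvs=49308, as_exons=156581,
                                 splice_total=1977773, gt_ag=1946428),
    "Mouse full-length cDNA": dict(loci=10237, rasvs=29955, as_exons=65664,
                                   splice_total=242486, gt_ag=241133),
}


def summarize(d):
    """Return (RASVs per AS locus, AS exons per RASV, GT-AG percentage)."""
    return (
        round(d["rasvs"] / d["loci"], 1),
        round(d["as_exons"] / d["rasvs"], 1),
        round(100 * d["gt_ag"] / d["splice_total"], 1),
    )


summary = {name: summarize(d) for name, d in datasets.items()}
for name, values in summary.items():
    print(name, values)
```

Under these assumptions the ratios reproduce the averages listed above. The mouse GT-AG share works out to 99.4%, which the summary reports as 99%.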

i) H-Inv full-length cDNAs

Table 1. Number and position category of RASVs
  #Locus #cDNA #Total exon #Alternative exon #Constitutive exon
Representative AS Variants (RASVs) 7801 20803 207399 50399 157000
5'-end 5141 14252 20803 8889 11914
Internal 6868 18400 165793 35162 130631
3'-end 3212 8945 20803 6348 14455
5'UTR* 4021 7259 11867 4572 7295
CDS* 7525 15752 144984 31549 113435
3'UTR* 1815 2385 5346 1768 3578
*These categories were analyzed using only RASVs whose ORFs were identified as full-length.
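
Two internal consistency checks on Table 1 can be sketched as follows. This is an illustrative check written for this page, not an H-DBAS tool. It assumes that every exon is counted as either alternative or constitutive, and that the 5'-end, internal and 3'-end categories partition the exons of all RASVs.

```python
# Rows of Table 1 (H-Inv full-length cDNAs), transcribed by hand as
# (total exons, alternative exons, constitutive exons).
table1 = {
    "RASVs": (207399, 50399, 157000),
    "5'-end": (20803, 8889, 11914),
    "Internal": (165793, 35162, 130631),
    "3'-end": (20803, 6348, 14455),
    "5'UTR": (11867, 4572, 7295),
    "CDS": (144984, 31549, 113435),
    "3'UTR": (5346, 1768, 3578),
}

# Check 1: alternative + constitutive exons equal the total in every row.
row_ok = {name: alt + const == total
          for name, (total, alt, const) in table1.items()}

# Check 2: the three position categories add up to the overall exon total.
position_sum = sum(table1[k][0] for k in ("5'-end", "Internal", "3'-end"))
positions_ok = position_sum == table1["RASVs"][0]

print(row_ok)
print(positions_ok)
```

Both checks pass for the figures above. The 5'-end and 3'-end exon totals (20803 each) also equal the number of RASVs, which is what one would expect if each variant contributes one first and one last exon. The UTR and CDS rows are not expected to sum to the overall total, because they are restricted to RASVs with full-length ORFs.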

Table 2. AS pattern of RASVs
    #Locus #cDNA
Cassette (Skipped exon) [including multiple cassette] 3677 9821
Internal acceptor (Alternative 3' splice) 2269 6170
Internal donor (Alternative 5' splice) 2274 5992
Mutually exclusive 251 608
Retained intron 2287 6180
Alternative first 5'-end 2495 6700
Alternative first acceptor 5'-end 2524 7122
Alternative last 3'-end 952 2512
Alternative last donor 3'-end 1303 3470

Table 3. Number of RASVs that may affect protein function
  #Locus #cDNA
AS affecting protein function total 3256 8237
Protein motif 2501 6303
GO term 896 2444
Subcellular localization signal 2054 5435
Transmembrane domain 443 1101
Complex AS pattern total 844 2262
Bridged* 71 298
Nested** 771 1972
Multiple CDS*** 37 79
*Two AS variants were arrayed tandemly without sharing any exons, and another transcript 'bridged' them, sharing at least some of its exons with both of them and sharing the same reading frame as their ORFs.
**CDS region of one AS variant was not shared with another variant.
***Different ORFs >200 bp in length were annotated independently for different AS variants sharing at least some of the exons but not sharing any reading frame.

Table 4. Genomic conservation of RASVs in humans and mice
Category | Total | Non-conserved* | Genome-conserved* | Transcript-conserved* | Equally spliced variant (ESV)* | Conserved AS*
All exons | 207399 | 27567 | 22396 | 157436 | - | -
AS exons | 50399 | 11994 | 8757 | 29648 | - | -
AS variants | 20803 | 8459 | 4469 | 2381 | 4995 | 499
AS loci | 7801 | 1686 | 1258 | 815 | 3817 | 225
*See Genomic comparison between RASVs and mouse cDNAs.

Table 5. Relationship between conservation and splicing in RASV exons
Exon class | Total | CDS | Protein motif | Retrotransposon | ESE
C/CS exons* | 141427 | 94751 | 25504 | 2686 | 137390
C/AS exons* | 38405 | 19383 | 5055 | 2268 | 36486
NC/CS exons* | 15573 | 6713 | 1737 | 2538 | 14894
NC/AS exons* | 11994 | 3184 | 669 | 4540 | 11555
All exons | 207399 | 124031 | 32965 | 12032 | 200325
*C: Conserved, NC: Non-conserved, CS: Constitutively spliced, AS: Alternatively spliced

Supplementary Table 1. Splice-site sequence of RASVs
Total GT - AG GC - AG AT - AC Others
186596 182163 1164 173 3096

Supplementary Table 2. NAGNAG motif of RASVs
Motif Observed E type I type E+I type
AAGAAG 64 50 16 2
AAGCAG 216 56 160 0
AAGGAG 274 266 8 0
AAGTAG 28 7 21 0
CAGAAG 1070 1039 31 0
CAGCAG 1003 645 366 8
CAGGAG 3497 3454 43 0
CAGTAG 106 77 29 0
GAGAAG 32 10 22 0
GAGCAG 284 8 276 0
GAGGAG 35 18 19 2
GAGTAG 68 5 63 0
TAGAAG 445 424 21 0
TAGCAG 291 202 89 0
TAGGAG 1532 1518 14 0
TAGTAG 88 65 23 0
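
The columns of this table appear to be related by inclusion-exclusion: in every row, Observed equals E type plus I type minus E+I type, which suggests that motifs of the E+I type are also counted in both the E and I columns. That reading is an inference from the numbers, not a definition given on this page. A short check, with the rows transcribed by hand and all names chosen for illustration:

```python
# (motif, observed, e_type, i_type, e_plus_i) from Supplementary Table 2
# of the H-Inv full-length cDNA dataset.
rows = [
    ("AAGAAG", 64, 50, 16, 2),
    ("AAGCAG", 216, 56, 160, 0),
    ("AAGGAG", 274, 266, 8, 0),
    ("AAGTAG", 28, 7, 21, 0),
    ("CAGAAG", 1070, 1039, 31, 0),
    ("CAGCAG", 1003, 645, 366, 8),
    ("CAGGAG", 3497, 3454, 43, 0),
    ("CAGTAG", 106, 77, 29, 0),
    ("GAGAAG", 32, 10, 22, 0),
    ("GAGCAG", 284, 8, 276, 0),
    ("GAGGAG", 35, 18, 19, 2),
    ("GAGTAG", 68, 5, 63, 0),
    ("TAGAAG", 445, 424, 21, 0),
    ("TAGCAG", 291, 202, 89, 0),
    ("TAGGAG", 1532, 1518, 14, 0),
    ("TAGTAG", 88, 65, 23, 0),
]


def consistent(observed, e_type, i_type, both):
    """Inclusion-exclusion: motifs typed both E and I are counted once."""
    return observed == e_type + i_type - both


# Collect motifs whose counts do not satisfy the assumed relation.
mismatches = [motif for motif, *counts in rows if not consistent(*counts)]
print(mismatches)  # an empty list means every row is consistent
```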

ii) H-Inv all transcripts

Table 1. Number and position category of RASVs
  #Locus #Transcript #Total exon #Alternative exon #Constitutive exon
Representative AS Variants (RASVs) 13704 49308 478479 156581 321898
5'-end 8997 34416 49308 22560 26748
Internal 12459 45468 379863 114181 265682
3'-end 7499 29050 49308 19840 29468
5'UTR* 7346 17010 28410 13108 15302
CDS* 13286 37275 334147 100583 233564
3'UTR* 4139 6303 14532 5846 8686
*These categories were analyzed using only RASVs whose ORFs were identified as full-length.

Table 2. AS patterns of RASVs
    #Locus #Transcript
Cassette (Skipped exon) [including multiple cassette] 7970 30048
Internal acceptor (Alternative 3' splice) 4944 18376
Internal donor (Alternative 5' splice) 5150 18616
Mutually exclusive 915 2968
Retained intron 4303 15033
Alternative first 5'-end 5046 18581
Alternative first acceptor 5'-end 4487 16798
Alternative last 3'-end 3009 10920
Alternative last donor 3'-end 3844 13926

Table 3. Number of RASVs that may affect protein function
  #Locus #Transcript
AS affecting protein function total 7215 22891
Protein motif 5858 19286
GO term 2432 7451
Subcellular localization signal 4669 13550
Transmembrane domain 1179 4124
Complex AS pattern total 2572 9427
Bridged* 509 2678
Nested** 2280 7484
Multiple CDS*** 127 332
*Two AS variants were arrayed tandemly without sharing any exons, and another transcript 'bridged' them, sharing at least some of its exons with both of them and sharing the same reading frame as their ORFs.
**CDS region of one AS variant was not shared with another variant.
***Different ORFs >200 bp in length were annotated independently for different AS variants sharing at least some of the exons but not sharing any reading frame.

Supplementary Table 1. Splice-site sequence of RASVs
Total GT - AG GC - AG AT - AC Others
1977773 1946428 8080 1525 21740

Supplementary Table 2. NAGNAG motif of RASVs
Motif Observed E type I type E+I type
AAGAAG 463 399 78 14
AAGCAG 1611 450 1164 0
AAGGAG 2020 1985 35 0
AAGTAG 310 85 225 0
CAGAAG 8871 8615 256 0
CAGCAG 7413 4905 2591 83
CAGGAG 27188 26912 276 0
CAGTAG 993 730 263 0
GAGAAG 153 34 119 0
GAGCAG 2131 60 2071 0
GAGGAG 262 160 111 9
GAGTAG 394 24 370 0
TAGAAG 3478 3362 116 0
TAGCAG 2406 1699 707 0
TAGGAG 12317 12239 78 0
TAGTAG 577 434 143 0

iii) Mouse full-length cDNAs

Table 1. Number and position category of RASVs
  #Locus #cDNA #Total exon #Alternative exon #Constitutive exon
Representative AS Variants (RASVs) 10237 29955 272441 65664 206777
5'-end 5306 16137 29955 10228 19727
Internal 8728 25661 212531 43291 169240
3'-end 6436 19997 29955 12145 17810

Table 2. AS pattern of RASVs
    #Locus #cDNA
Cassette (Skipped exon) [including multiple cassette] 3707 11183
Internal acceptor (Alternative 3' splice) 2041 5797
Internal donor (Alternative 5' splice) 2225 6357
Mutually exclusive 193 561
Retained intron 2716 7615
Alternative first 5'-end 2861 8757
Alternative first acceptor 5'-end 1934 5529
Alternative last 3'-end 2265 6847
Alternative last donor 3'-end 4420 13765

Supplementary Table 1. Splice-site sequence of RASVs
Total GT - AG GC - AG AT - AC Others
242486 241133 88 126 1139

Supplementary Table 2. NAGNAG motif of RASVs
Motif Observed E type I type E+I type
AAGAAG 131 108 26 3
AAGCAG 300 91 209 0
AAGGAG 319 309 10 0
AAGTAG 77 24 53 0
CAGAAG 1879 1801 78 0
CAGCAG 1605 1059 568 22
CAGGAG 5160 5113 47 0
CAGTAG 237 181 56 0
GAGAAG 40 12 28 0
GAGCAG 424 17 407 0
GAGGAG 35 24 12 1
GAGTAG 68 1 67 0
TAGAAG 708 671 37 0
TAGCAG 580 382 198 0
TAGGAG 2402 2378 24 0
TAGTAG 108 86 22 0