GPCR datasets

— Datasets used in the study —

Each dataset is available in up to three formats: a list of accession numbers (AC#), sequences in FASTA format (FASTA), and sequences in Swiss-Prot format (SP). The Non-GPCR datasets are provided only as accession numbers and FASTA.
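As a sketch of how the FASTA and Swiss-Prot files might be loaded, the Python below parses either format into a dictionary keyed by accession, using only the standard library. The function names are hypothetical, and the FASTA header convention (first token after `>`, or the middle field of a UniProt-style `sp|AC|ID` token) is an assumption, since the exact headers in these files are not described here. The toy records are illustrative, not taken from the datasets.

```python
from typing import Dict, Iterable, List


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into {identifier: sequence}.

    Assumption: the identifier is the first whitespace-delimited token after
    '>', except for UniProt-style 'sp|AC|ID' or 'tr|AC|ID' tokens, where the
    accession (middle field) is used instead.
    """
    records: Dict[str, str] = {}
    key = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if key is not None:
                records[key] = "".join(chunks)
            parts = line[1:].split()
            token = parts[0] if parts else ""
            fields = token.split("|")
            if len(fields) >= 3 and fields[0] in ("sp", "tr"):
                token = fields[1]
            key, chunks = token, []
        elif key is not None:
            chunks.append(line)  # sequence lines before any header are ignored
    if key is not None:
        records[key] = "".join(chunks)
    return records


def read_swissprot(lines: Iterable[str]) -> Dict[str, str]:
    """Parse Swiss-Prot flat-file entries into {primary accession: sequence}.

    Takes the first accession on the first AC line, collects residues from
    the lines after SQ, and closes each entry at '//'.
    """
    records: Dict[str, str] = {}
    accession = None
    seq_parts: List[str] = []
    in_seq = False
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("//"):
            if accession is not None:
                records[accession] = "".join(seq_parts)
            accession, seq_parts, in_seq = None, [], False
        elif in_seq:
            # Residues are written in space-separated blocks; drop the spaces.
            seq_parts.append("".join(line.split()))
        elif line.startswith("AC   ") and accession is None:
            accession = line[5:].split(";")[0].strip()
        elif line.startswith("SQ   "):
            in_seq = True
    return records


if __name__ == "__main__":
    # Toy record, not real data.
    toy = [">sp|X00001|TOY_EXAMPLE Toy entry", "MKTAYIAKQR", "QISFVKSHFS"]
    print(read_fasta(toy))
    # {'X00001': 'MKTAYIAKQRQISFVKSHFS'}
```

Both parsers accept any iterable of lines, so an open file handle can be passed directly, e.g. `read_fasta(open(path))`.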

[Datasets used for within- & between-class tests]

See Table 2 for the dataset description.

Class A training dataset: 200 entries (AC#, FASTA, SP)
Class A test dataset: 200 entries (AC#, FASTA, SP)
Non-Class A training dataset: 81 entries (AC#, FASTA, SP)
Non-Class A test dataset: 81 entries (AC#, FASTA, SP)
Non-GPCR training dataset: 210 entries (AC#, FASTA)
Non-GPCR test dataset: 210 entries (AC#, FASTA)

(For the Non-GPCR datasets, both the current IDs and the original ones are listed. See the note below.)
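A quick sanity check after downloading is to compare the number of FASTA records in each file against the entry counts listed above. The sketch below counts `>` header lines rather than lines in the AC# lists, on the assumption that the Non-GPCR AC# lists, which carry both current and original IDs, may not have exactly one line per entry. The function names and the mapping from dataset names to local file paths are hypothetical.

```python
from pathlib import Path
from typing import Dict, Iterable, Tuple

# Entry counts as listed for the within- and between-class test datasets.
EXPECTED_COUNTS: Dict[str, int] = {
    "Class A training": 200,
    "Class A test": 200,
    "Non-Class A training": 81,
    "Non-Class A test": 81,
    "Non-GPCR training": 210,
    "Non-GPCR test": 210,
}


def count_fasta_records(lines: Iterable[str]) -> int:
    """Count FASTA records by their '>' header lines."""
    return sum(1 for line in lines if line.startswith(">"))


def find_count_mismatches(fasta_paths: Dict[str, Path]) -> Dict[str, Tuple[int, int]]:
    """Return {dataset: (found, expected)} for every dataset whose count differs.

    fasta_paths maps dataset names (keys of EXPECTED_COUNTS) to local FASTA
    files. A dataset with no path is reported with found = 0.
    """
    mismatches: Dict[str, Tuple[int, int]] = {}
    for name, expected in EXPECTED_COUNTS.items():
        path = fasta_paths.get(name)
        found = 0
        if path is not None:
            with path.open() as handle:
                found = count_fasta_records(handle)
        if found != expected:
            mismatches[name] = (found, expected)
    return mismatches
```

For example, `find_count_mismatches({"Class A training": Path("classA_train.fasta")})` (a hypothetical file name) returns an empty dictionary only if every dataset is present and matches its listed count; any missing or short file shows up with its found and expected values.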

[Datasets used for Class A analysis]

See Table 4 for the dataset description.

NOTE: The sequence data used in this study were originally obtained in 2004 from GPCRDB (positive, GPCR sequences) and Swiss-Prot (negative, non-GPCR sequences). Some of these entries may have been revised in the databases since then. For the most recent versions, consult each database:
- GPCRDB: Information system for G protein-coupled receptors (GPCRs)
- Swiss-Prot: Protein knowledgebase
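To see which entries have changed since 2004, one option is to load the distributed sequences and a freshly downloaded copy into `{accession: sequence}` maps and compare them. The sketch below assumes both maps are keyed by the same accession numbers; the function name is hypothetical and the toy maps are not real data. Any FASTA or Swiss-Prot parser that yields such a map can feed it.

```python
from typing import Dict, List, Tuple


def diff_sequence_sets(
    original: Dict[str, str], current: Dict[str, str]
) -> Tuple[List[str], List[str]]:
    """Compare {accession: sequence} maps from the 2004 files and a new download.

    Returns (changed, missing): accessions whose sequence differs, and
    accessions absent from the current data. Both lists are sorted.
    """
    changed = sorted(
        ac for ac, seq in original.items() if ac in current and current[ac] != seq
    )
    missing = sorted(ac for ac in original if ac not in current)
    return changed, missing


if __name__ == "__main__":
    # Toy maps, not real data.
    original = {"X00001": "MKTAYIAK", "X00002": "ACDEFGHI", "X00003": "LMNPQRST"}
    current = {"X00001": "MKTAYIAK", "X00002": "ACDEFGHK"}
    print(diff_sequence_sets(original, current))
    # (['X00002'], ['X00003'])
```

An accession reported as missing has not necessarily been deleted: when UniProtKB entries are merged, the absorbed accession is kept as a secondary accession of the surviving entry, so searching for it in the current database is a reasonable next step.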