Download Files
A tab-separated table of all the point mutations in the COSMIC Cell Lines Project from the current release.
File Description
[column number:label] Heading
[1:A] Gene name
– The gene name for which the data has been curated in COSMIC. In most cases this is the accepted HGNC identifier.
[2:B] Accession Number
– The transcript identifier of the gene.
[3:C] Gene CDS length
– Length of the gene's coding sequence (CDS) in base pairs.
[4:D] HGNC id
– If the gene is in HGNC, this id links it to its HGNC record.
[5:E] Sample name,Sample id,Id tumour
– A sample is an instance of a portion of a tumour being examined for mutations. The sample name can be derived from a number of sources. In many cases it originates from the cell line name. Other sources include names assigned by the annotators, or an incremented number assigned during an anonymisation process. A number of samples can be taken from a single tumour and a number of tumours can be obtained from one individual. A sample id is used to identify a sample within the COSMIC database. There can be multiple ids, if the same sample has been entered into the database multiple times from different papers.
[8:H] Primary Site
– The primary tissue/cancer from which the sample originated. COSMIC uses a standard classification system for tissue types and subtypes because they vary a lot between different papers. More details on the tissue classification are available on the COSMIC website.
[9:I] Site Subtype 1
– Further subclassification (level 1) of the sample's tissue of origin.
[10:J] Site Subtype 2
– Further subclassification (level 2) of the sample's tissue of origin.
[11:K] Site Subtype 3
– Further subclassification (level 3) of the sample's tissue of origin.
[12:L] Primary Histology
– The histological classification of the sample.
[13:M] Histology Subtype 1
– Further histological classification (level 1) of the sample.
[14:N] Histology Subtype 2
– Further histological classification (level 2) of the sample.
[15:O] Histology Subtype 3
– Further histological classification (level 3) of the sample.
[16:P] Genome-wide screen
– Whether the entire genome/exome has been sequenced.
[17:Q] GENOMIC_MUTATION_ID
– Genomic mutation identifier (COSV) to indicate the definitive position of the variant on the genome. This identifier is trackable and stable between different versions of the release.
[18:R] LEGACY_MUTATION_ID
– Legacy mutation identifier (COSM) that will represent existing COSM mutation identifiers.
[19:S] MUTATION_ID
– An internal mutation identifier to uniquely represent each mutation on a specific transcript on a given assembly build.
[20:T] Mutation CDS
– The change that has occurred in the nucleotide sequence. Formatting is identical to the method used for the peptide sequence.
[21:U] Mutation AA
– The change that has occurred in the peptide sequence. Formatting is based on the recommendations made by the Human Genome Variation Society. The description of each type can be found by following the link to the Mutation Overview page.
[22:V] Mutation Description
– Type of mutation at the amino acid level (substitution, deletion, insertion, complex, fusion etc.)
[23:W] Mutation zygosity
– Information on whether the mutation was reported to be homozygous, heterozygous or unknown within the sample.
[24:X] LOH
– Information on whether the gene was reported to have loss of heterozygosity (LOH) in the sample: yes, no or unknown.
[25:Y] GRCh
– The coordinate system used –
37 = GRCh37/Hg19
38 = GRCh38/Hg38
[26:Z] Mutation genome position
– The genomic coordinates of the mutation.
[27:AA] Mutation strand
– Positive or negative.
[28:AB] Mutation somatic status
– Information on whether the mutation was reported to be Confirmed Somatic, Previously Reported or Variant of unknown origin –
Variant of unknown origin = known to be somatic but the tumour was sequenced without a matched normal.
Confirmed Somatic = confirmed to be somatic in the experiment by sequencing both the tumour and a matched normal from the same patient.
Previously observed = mutation reported as somatic previously but not in the current paper.
[29:AC] Mutation Verification Status
– Information on whether the mutation has been validated –
Unverified = has not been reported in other datasets.
Verified = reported in other datasets, including by capillary sequencing of the sample.
[30:AD] Pubmed_PMID
– The PubMed ID of the paper in which the sample was reported, linking to PubMed for more details of the publication.
[31:AE] Study ID
– The Study ID for the sample.
[32:AF] Institute,Institute Address,Catalogue Number
– Availability details (cell line supplier).
[35:AI] Sample Type,Tumour origin
– Where the sample originated, including the tumour type.
[37:AK] Age
– Age of the sample (if this information is provided with the publications).
[38:AL] HGVSP
– Human Genome Variation Society peptide syntax.
[39:AM] HGVSC
– Human Genome Variation Society coding DNA sequence (CDS) syntax.
[40:AN] HGVSG
– Human Genome Variation Society genomic syntax (3′ shifted).
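
The [column number:label] pairs above can be used to pick fields out of the file by position. The following is a minimal Python sketch, not an official COSMIC tool. It assumes the file is plain tab-separated text with a single header row, which may not hold for every release. The names `column_index`, `read_mutations` and `ASSEMBLY` are hypothetical helpers introduced only for this illustration, and the sample row is made-up placeholder data.

```python
import csv
import io


def column_index(label: str) -> int:
    """Convert a spreadsheet-style column label (A, B, ..., Z, AA, ...)
    to a 0-based list index, e.g. "A" -> 0 and "AN" -> 39."""
    n = 0
    for ch in label.upper():
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n - 1


# Column labels taken from the file description above.
GENE_NAME = column_index("A")     # [1:A] Gene name
MUTATION_AA = column_index("U")   # [21:U] Mutation AA
GRCH = column_index("Y")          # [25:Y] GRCh

# Mapping of the GRCh column values, as listed in the description.
ASSEMBLY = {"37": "GRCh37/Hg19", "38": "GRCh38/Hg38"}


def read_mutations(handle, has_header=True):
    """Yield one small dict per data row of a tab-separated file.

    Assumption: the first line is a header row; pass has_header=False
    if the file has none. Rows too short to hold the GRCh column are skipped.
    """
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    if has_header:
        next(reader, None)
    for row in reader:
        if len(row) <= GRCH:
            continue
        yield {
            "gene": row[GENE_NAME],
            "mutation_aa": row[MUTATION_AA],
            "assembly": ASSEMBLY.get(row[GRCH], row[GRCH]),
        }


# Usage with an in-memory placeholder row (not real COSMIC data).
fields = [""] * 40
fields[GENE_NAME] = "GENE1"
fields[MUTATION_AA] = "p.?"
fields[GRCH] = "38"
demo = "\t".join(["header"] * 40) + "\n" + "\t".join(fields) + "\n"
demo_rows = list(read_mutations(io.StringIO(demo)))
print(demo_rows)
```

To read a downloaded file instead, pass an open text handle to `read_mutations`, for example `open(path, newline="")`, where `path` is wherever the file was saved.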