file fields

The file endpoint uses a file-attributes-first schema to enable faster search, but it contains all the same data as the subject endpoint.

Column names that contain a . between words denote nesting: the term after the . is a field nested within the term before it. The nesting structure can be browsed more easily in the file JSON schema.
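
As a rough illustration of the dot notation, the sketch below follows a dotted column name through a nested record. The helper name `resolve_column` and the sample record are hypothetical and not part of any CDA library; only the field names come from the table in this section. Because nested entities such as identifier and Subject are arrays, one dotted name can resolve to several values.

```python
from typing import Any, List


def resolve_column(record: Any, dotted_name: str) -> List[Any]:
    """Collect every value reachable by following a dotted column name."""
    current = [record]
    for part in dotted_name.split("."):
        next_level = []
        for item in current:
            # ARRAY fields are lists: descend into each element.
            children = item if isinstance(item, list) else [item]
            for child in children:
                if isinstance(child, dict) and part in child:
                    next_level.append(child[part])
        current = next_level
    # Flatten one last level so a trailing array yields its elements.
    values: List[Any] = []
    for item in current:
        values.extend(item if isinstance(item, list) else [item])
    return values


# Made-up record shaped like the table in this section (values are invented).
example_file = {
    "id": "file-1",
    "identifier": [{"system": "SYSTEM_A", "value": "file-1"}],
    "Subject": [
        {
            "id": "subject-1",
            "identifier": [
                {"system": "SYSTEM_A", "value": "subject-1"},
                {"system": "SYSTEM_B", "value": "subject-1b"},
            ],
        }
    ],
}

subject_systems = resolve_column(example_file, "Subject.identifier.system")
file_ids = resolve_column(example_file, "id")
```

Here `subject_systems` holds one entry per identifier attached to the subject, while `file_ids` holds the single top-level id.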

column_name description data_type
id The 'logical' identifier of the entity in the repository, e.g. a UUID. This 'id' is unique within a given system. The identified entity may have a different 'id' in a different system. STRING
identifier A 'business' identifier or accession number for the entity, typically as provided by an external system or authority, that persists across implementing systems (i.e. a 'logical' identifier). ARRAY<STRUCT<system STRING, value STRING>>
identifier.system The system or namespace that defines the identifier. STRING
identifier.value The value of the identifier, as defined by the system. STRING
label Short name or abbreviation for dataset. Maps to rdfs:label. STRING
data_category STRING
data_type STRING
file_format String to identify the full file extension including compression extensions. STRING
associated_project A reference to the Project(s) of which this ResearchSubject is a member. The associated_project may be embedded using the $ref definition or may be a reference to the id for the Project - or a URI expressed as a string to an existing entity. STRING
drs_uri STRING
byte_size Size of the file in bytes. Maps to dcat:byteSize. INT64
checksum STRING
data_modality STRING
imaging_modality STRING
dbgap_accession_number STRING
crdc_series_uuid STRING
Subject A patient entity captures the study-independent metadata for research subjects. Human research subjects are usually not traceable to a particular person to protect the subject’s privacy. ARRAY<STRUCT<id STRING, identifier ARRAY<STRUCT<system STRING, value STRING>>, species STRING, sex STRING, race STRING, ethnicity STRING, days_to_birth INT64, subject_associated_project ARRAY, vital_status STRING, age_at_death INT64, cause_of_death STRING>>
Subject.id The 'logical' identifier of the entity in the system of record, e.g. a UUID. This 'id' is unique within a given system. The identified entity may have a different 'id' in a different system. STRING
Subject.identifier A 'business' identifier for the entity, typically as provided by an external system or authority, that persists across implementing systems (i.e. a 'logical' identifier). Uses a specialized, complex 'Identifier' data type to capture information about the source of the business identifier - or a URI expressed as a string to an existing entity. ARRAY<STRUCT<system STRING, value STRING>>
Subject.identifier.system The system or namespace that defines the identifier. STRING
Subject.identifier.value The value of the identifier, as defined by the system. STRING
Subject.species STRING
Subject.sex STRING
Subject.race STRING
Subject.ethnicity STRING
Subject.days_to_birth Per GDC Dictionary, the number of days between the date used for index and the person's date of birth, represented as a calculated negative number of days. INT64
Subject.subject_associated_project ARRAY
Subject.vital_status STRING
Subject.age_at_death INT64
Subject.cause_of_death STRING
ResearchSubject A research subject is the entity of interest in a specific research study or project, typically a human being or an animal, but can also be a device, group of humans or animals, or a tissue sample. Human research subjects are usually not traceable to a particular person to protect the subject’s privacy. This entity plays the role of the case_id in existing data. ARRAY<STRUCT<id STRING, identifier ARRAY<STRUCT<system STRING, value STRING>>, member_of_research_project STRING, primary_diagnosis_condition STRING, primary_diagnosis_site STRING, Diagnosis ARRAY<STRUCT<id STRING, identifier ARRAY<STRUCT<system STRING, value STRING>>, primary_diagnosis STRING, age_at_diagnosis INT64, morphology STRING, stage STRING, grade STRING, method_of_diagnosis STRING, Treatment ARRAY<STRUCT<id STRING, identifier ARRAY<STRUCT<system STRING, value STRING>>, treatment_type STRING, treatment_outcome STRING, days_to_treatment_start INT64, days_to_treatment_end INT64, therapeutic_agent STRING, treatment_anatomic_site STRING, treatment_effect STRING, treatment_end_reason STRING, number_of_cycles INT64>>>>>>
ResearchSubject.id The 'logical' identifier of the entity in the system of record, e.g. a UUID. This 'id' is unique within a given system. The identified entity may have a different 'id' in a different system. For CDA, this is case_id. STRING
ResearchSubject.identifier A 'business' identifier for the entity, typically as provided by an external system or authority, that persists across implementing systems (i.e. a 'logical' identifier). Uses a specialized, complex 'Identifier' data type to capture information about the source of the business identifier - or a URI expressed as a string to an existing entity. ARRAY<STRUCT<system STRING, value STRING>>
ResearchSubject.identifier.system The system or namespace that defines the identifier. STRING
ResearchSubject.identifier.value The value of the identifier, as defined by the system. STRING
ResearchSubject.member_of_research_project STRING
ResearchSubject.primary_diagnosis_condition STRING
ResearchSubject.primary_diagnosis_site STRING
ResearchSubject.Diagnosis ARRAY<STRUCT<id STRING, identifier ARRAY<STRUCT<system STRING, value STRING>>, primary_diagnosis STRING, age_at_diagnosis INT64, morphology STRING, stage STRING, grade STRING, method_of_diagnosis STRING, Treatment ARRAY<STRUCT<id STRING, identifier ARRAY<STRUCT<system STRING, value STRING>>, treatment_type STRING, treatment_outcome STRING, days_to_treatment_start INT64, days_to_treatment_end INT64, therapeutic_agent STRING, treatment_anatomic_site STRING, treatment_effect STRING, treatment_end_reason STRING, number_of_cycles INT64>>>>
ResearchSubject.Diagnosis.id The 'logical' identifier of the entity in the repository, e.g. a UUID. This 'id' is unique within a given system. The identified entity may have a different 'id' in a different system. STRING
ResearchSubject.Diagnosis.identifier A 'business' identifier or accession number for the entity, typically as provided by an external system or authority, that persists across implementing systems (i.e. a 'logical' identifier). ARRAY<STRUCT<system STRING, value STRING>>
ResearchSubject.Diagnosis.identifier.system The system or namespace that defines the identifier. STRING
ResearchSubject.Diagnosis.identifier.value The value of the identifier, as defined by the system. STRING
ResearchSubject.Diagnosis.primary_diagnosis STRING
ResearchSubject.Diagnosis.age_at_diagnosis INT64
ResearchSubject.Diagnosis.morphology STRING
ResearchSubject.Diagnosis.stage STRING
ResearchSubject.Diagnosis.grade STRING
ResearchSubject.Diagnosis.method_of_diagnosis STRING
ResearchSubject.Diagnosis.Treatment ARRAY<STRUCT<id STRING, identifier ARRAY<STRUCT<system STRING, value STRING>>, treatment_type STRING, treatment_outcome STRING, days_to_treatment_start INT64, days_to_treatment_end INT64, therapeutic_agent STRING, treatment_anatomic_site STRING, treatment_effect STRING, treatment_end_reason STRING, number_of_cycles INT64>>
ResearchSubject.Diagnosis.Treatment.id The 'logical' identifier of the entity in the repository, e.g. a UUID. This 'id' is unique within a given system. The identified entity may have a different 'id' in a different system. STRING
ResearchSubject.Diagnosis.Treatment.identifier A 'business' identifier or accession number for the entity, typically as provided by an external system or authority, that persists across implementing systems (i.e. a 'logical' identifier). ARRAY<STRUCT<system STRING, value STRING>>
ResearchSubject.Diagnosis.Treatment.identifier.system The system or namespace that defines the identifier. STRING
ResearchSubject.Diagnosis.Treatment.identifier.value The value of the identifier, as defined by the system. STRING
ResearchSubject.Diagnosis.Treatment.treatment_type Text name for treatment type; this will ultimately be defined by a common vocabulary STRING
ResearchSubject.Diagnosis.Treatment.treatment_outcome Text name for treatment outcome; this will ultimately be defined by a common vocabulary STRING
ResearchSubject.Diagnosis.Treatment.days_to_treatment_start The date (and optionally time) that the treatment was started, expressed as an integer. INT64
ResearchSubject.Diagnosis.Treatment.days_to_treatment_end INT64
ResearchSubject.Diagnosis.Treatment.therapeutic_agent STRING
ResearchSubject.Diagnosis.Treatment.treatment_anatomic_site STRING
ResearchSubject.Diagnosis.Treatment.treatment_effect STRING
ResearchSubject.Diagnosis.Treatment.treatment_end_reason STRING
ResearchSubject.Diagnosis.Treatment.number_of_cycles INT64
Specimen Any material taken as a sample from a biological entity (living or dead), or from a physical object or the environment. Specimens are usually collected as an example of their kind, often for use in some investigation. ARRAY<STRUCT<id STRING, identifier ARRAY<STRUCT<system STRING, value STRING>>, associated_project STRING, age_at_collection INT64, primary_disease_type STRING, anatomical_site STRING, source_material_type STRING, specimen_type STRING, derived_from_specimen STRING, derived_from_subject STRING>>
Specimen.id The 'logical' identifier of the entity in the system of record, e.g. a UUID. This 'id' is unique within a given system. The identified entity may have a different 'id' in a different system. STRING
Specimen.identifier A 'business' identifier or accession number for the entity, typically as provided by an external system or authority, that persists across implementing systems (i.e. a 'logical' identifier). ARRAY<STRUCT<system STRING, value STRING>>
Specimen.identifier.system The system or namespace that defines the identifier. STRING
Specimen.identifier.value The value of the identifier, as defined by the system. STRING
Specimen.associated_project STRING
Specimen.age_at_collection The age of the Patient when this sample was taken. INT64
Specimen.primary_disease_type STRING
Specimen.anatomical_site Per GDC Dictionary, the text term that represents the name of the primary disease site of the submitted tumor sample; recommend dropping tumor; biospecimen_anatomic_site. STRING
Specimen.source_material_type The general kind of material from which the specimen was derived, indicating the physical nature of the source material. STRING
Specimen.specimen_type The high-level type of the specimen, based on how it has been derived from the original extracted sample. STRING
Specimen.derived_from_specimen A source/parent specimen from which this one was directly derived. STRING
Specimen.derived_from_subject The Patient/ResearchSubject, or Biologically Derived Material (e.g. a cell line, tissue culture, organoid) from which the specimen was directly or indirectly derived. STRING
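
The ResearchSubject, Diagnosis and Treatment rows above nest three arrays deep, so one file record can carry many treatments. The sketch below shows one way to flatten such a record into one row per treatment, assuming the record has already been loaded as plain Python dicts and lists. The function name `iter_treatments`, the output keys, and the sample values are hypothetical; only the field names come from the table.

```python
from typing import Any, Dict, Iterator


def iter_treatments(file_record: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield one flat row per Treatment nested inside a file record."""
    # "or []" treats a missing or null array the same as an empty one.
    for research_subject in file_record.get("ResearchSubject") or []:
        for diagnosis in research_subject.get("Diagnosis") or []:
            for treatment in diagnosis.get("Treatment") or []:
                yield {
                    "file_id": file_record.get("id"),
                    "research_subject_id": research_subject.get("id"),
                    "diagnosis_id": diagnosis.get("id"),
                    "treatment_type": treatment.get("treatment_type"),
                    "number_of_cycles": treatment.get("number_of_cycles"),
                }


# Made-up record; all values are invented for illustration.
sample_file = {
    "id": "file-1",
    "ResearchSubject": [
        {
            "id": "case-1",
            "Diagnosis": [
                {
                    "id": "diag-1",
                    "Treatment": [
                        {"id": "t-1", "treatment_type": "type A", "number_of_cycles": 4},
                        {"id": "t-2", "treatment_type": "type B", "number_of_cycles": None},
                    ],
                },
                {"id": "diag-2", "Treatment": None},
            ],
        }
    ],
}

treatment_rows = list(iter_treatments(sample_file))
```

A diagnosis with no treatments contributes no rows here; a caller who needs to keep such diagnoses would have to emit a placeholder row instead.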

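Subject.days_to_birth is described above as a negative number of days relative to the index date. A minimal sketch of turning that value into an approximate age in years at the index date follows. The function name `approximate_age_years` is hypothetical, and the 365.25 days-per-year divisor is an assumption made here, not something the schema specifies.

```python
from typing import Optional

# Assumed average year length; the schema does not define a conversion.
DAYS_PER_YEAR = 365.25


def approximate_age_years(days_to_birth: Optional[int]) -> Optional[float]:
    """Convert a negative days_to_birth value to an approximate age in years."""
    if days_to_birth is None:
        # Missing values stay missing rather than becoming zero.
        return None
    # days_to_birth is negative, so negate it before dividing.
    return -days_to_birth / DAYS_PER_YEAR


age_example = approximate_age_years(-7305)
```
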
Last update: 2022-06-17