{ "cells": [ { "cell_type": "markdown", "id": "614a478f", "metadata": {}, "source": [ "# Build a Cohort" ] }, { "cell_type": "markdown", "id": "a8004e87", "metadata": {}, "source": [ "**Example use case:** \n", "\n", "Julia is an oncologist who specializes in female reproductive health. As part of her research, she is interested in using existing data on uterine cancers. If possible, she would like to see multiple data types (gross imaging, genomic data, proteomic data, histology) that come from the same patient, so she can look for shared phenotypes to test for their potential as early diagnostics. Julia heard that the Cancer Data Aggregator has made it easy to search across multiple datasets created by NCI, and so has decided to start her search there.\n", "\n" ] }, { "cell_type": "markdown", "id": "e0cadd7d", "metadata": {}, "source": [ "Before Julia does any work, she needs to import several functions from cdapython:\n", "\n", "- `Q` and `query`, which power the search\n", "- `columns`, which lets us view entity field names\n", "- `unique_terms`, which lets us view entity field contents\n", "\n", "She also asks cdapython to report its version so she can be sure she's using the one she means to." ] }, { "cell_type": "code", "execution_count": 1, "id": "a5265d4d", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2022.6.21\n" ] } ], "source": [ "from cdapython import Q, columns, unique_terms, query\n", "import cdapython\n", "import pandas as pd \n", "print(cdapython.__version__)\n", "Q.set_host_url(\"http://35.192.60.10:8080/\")" ] }, { "cell_type": "markdown", "id": "75eef23e", "metadata": {}, "source": [ "
\n", " \n", "CDA data comes from three sources:\n", " \n", " \n", "The CDA makes this data searchable in four main endpoints:\n", "\n", "\n", "and two endpoints that offer deeper information about data in the researchsubject endpoint:\n", "\n", "Any metadata field can be searched from any endpoint; the only difference between search types is what type of data is returned by default. This means that you can think of the CDA as a really, really enormous spreadsheet full of data. To search this enormous spreadsheet, you'd want to select columns and then filter rows.\n", "
\n" ] }, { "cell_type": "markdown", "id": "391bc9a7", "metadata": {}, "source": [ "\n", " \n", " \n", " Accordingly, to see what search fields are available, Julia starts by using the command `columns`:" ] }, { "cell_type": "code", "execution_count": 2, "id": "ef0dd8e5", "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "['File.id',\n", " 'File.identifier.system',\n", " 'File.identifier.value',\n", " 'File.label',\n", " 'File.data_category',\n", " 'File.data_type',\n", " 'File.file_format',\n", " 'File.associated_project',\n", " 'File.drs_uri',\n", " 'File.byte_size',\n", " 'File.checksum',\n", " 'File.data_modality',\n", " 'File.imaging_modality',\n", " 'File.dbgap_accession_number',\n", " 'id',\n", " 'identifier.system',\n", " 'identifier.value',\n", " 'species',\n", " 'sex',\n", " 'race',\n", " 'ethnicity',\n", " 'days_to_birth',\n", " 'subject_associated_project',\n", " 'vital_status',\n", " 'age_at_death',\n", " 'cause_of_death',\n", " 'ResearchSubject.id',\n", " 'ResearchSubject.identifier.system',\n", " 'ResearchSubject.identifier.value',\n", " 'ResearchSubject.member_of_research_project',\n", " 'ResearchSubject.primary_diagnosis_condition',\n", " 'ResearchSubject.primary_diagnosis_site',\n", " 'ResearchSubject.Diagnosis.id',\n", " 'ResearchSubject.Diagnosis.identifier.system',\n", " 'ResearchSubject.Diagnosis.identifier.value',\n", " 'ResearchSubject.Diagnosis.primary_diagnosis',\n", " 'ResearchSubject.Diagnosis.age_at_diagnosis',\n", " 'ResearchSubject.Diagnosis.morphology',\n", " 'ResearchSubject.Diagnosis.stage',\n", " 'ResearchSubject.Diagnosis.grade',\n", " 'ResearchSubject.Diagnosis.method_of_diagnosis',\n", " 'ResearchSubject.Diagnosis.Treatment.id',\n", " 'ResearchSubject.Diagnosis.Treatment.identifier.system',\n", " 'ResearchSubject.Diagnosis.Treatment.identifier.value',\n", " 'ResearchSubject.Diagnosis.Treatment.treatment_type',\n", " 'ResearchSubject.Diagnosis.Treatment.treatment_outcome',\n", " 
'ResearchSubject.Diagnosis.Treatment.days_to_treatment_start',\n", " 'ResearchSubject.Diagnosis.Treatment.days_to_treatment_end',\n", " 'ResearchSubject.Diagnosis.Treatment.therapeutic_agent',\n", " 'ResearchSubject.Diagnosis.Treatment.treatment_anatomic_site',\n", " 'ResearchSubject.Diagnosis.Treatment.treatment_effect',\n", " 'ResearchSubject.Diagnosis.Treatment.treatment_end_reason',\n", " 'ResearchSubject.Diagnosis.Treatment.number_of_cycles',\n", " 'ResearchSubject.Specimen.id',\n", " 'ResearchSubject.Specimen.identifier.system',\n", " 'ResearchSubject.Specimen.identifier.value',\n", " 'ResearchSubject.Specimen.associated_project',\n", " 'ResearchSubject.Specimen.age_at_collection',\n", " 'ResearchSubject.Specimen.primary_disease_type',\n", " 'ResearchSubject.Specimen.anatomical_site',\n", " 'ResearchSubject.Specimen.source_material_type',\n", " 'ResearchSubject.Specimen.specimen_type',\n", " 'ResearchSubject.Specimen.derived_from_specimen',\n", " 'ResearchSubject.Specimen.derived_from_subject']" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "columns().to_list()" ] }, { "cell_type": "markdown", "id": "bd05eba2", "metadata": {}, "source": [ "\n", " \n", " \n", "There are a lot of columns in the CDA data, but Julia is most interested in diagnosis data, so she filters the list to only those:" ] }, { "cell_type": "code", "execution_count": 3, "id": "536970c4", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['ResearchSubject.primary_diagnosis_condition',\n", " 'ResearchSubject.primary_diagnosis_site',\n", " 'ResearchSubject.Diagnosis.id',\n", " 'ResearchSubject.Diagnosis.identifier.system',\n", " 'ResearchSubject.Diagnosis.identifier.value',\n", " 'ResearchSubject.Diagnosis.primary_diagnosis',\n", " 'ResearchSubject.Diagnosis.age_at_diagnosis',\n", " 'ResearchSubject.Diagnosis.morphology',\n", " 'ResearchSubject.Diagnosis.stage',\n", " 'ResearchSubject.Diagnosis.grade',\n", " 
'ResearchSubject.Diagnosis.method_of_diagnosis',\n", " 'ResearchSubject.Diagnosis.Treatment.id',\n", " 'ResearchSubject.Diagnosis.Treatment.identifier.system',\n", " 'ResearchSubject.Diagnosis.Treatment.identifier.value',\n", " 'ResearchSubject.Diagnosis.Treatment.treatment_type',\n", " 'ResearchSubject.Diagnosis.Treatment.treatment_outcome',\n", " 'ResearchSubject.Diagnosis.Treatment.days_to_treatment_start',\n", " 'ResearchSubject.Diagnosis.Treatment.days_to_treatment_end',\n", " 'ResearchSubject.Diagnosis.Treatment.therapeutic_agent',\n", " 'ResearchSubject.Diagnosis.Treatment.treatment_anatomic_site',\n", " 'ResearchSubject.Diagnosis.Treatment.treatment_effect',\n", " 'ResearchSubject.Diagnosis.Treatment.treatment_end_reason',\n", " 'ResearchSubject.Diagnosis.Treatment.number_of_cycles']" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "columns().to_list(filters=\"diagnosis\")" ] }, { "cell_type": "markdown", "id": "a63b4cf0", "metadata": {}, "source": [ "
\n", "\n", "To search the CDA, a user also needs to know what search terms are available. Each column will contain a huge amount of data, so retrieving all of the rows would be overwhelming. Instead, the CDA has a `unique_terms()` function that returns all of the unique values that populate the requested column. Like `columns`, `unique_terms` defaults to giving us an overview of the results, and can be filtered.\n", " \n", "
\n", "\n", "\n", " \n", "Since Julia is specifically interested in uterine cancers, she uses the `unique_terms` function to check whether 'uterine' appears among the values of 'ResearchSubject.Diagnosis.Treatment.treatment_anatomic_site' and 'ResearchSubject.primary_diagnosis_site':" ] }, { "cell_type": "code", "execution_count": 4, "id": "4527dde5", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['Brain',\n", " 'Cervix',\n", " 'Head - Face Or Neck, Nos',\n", " 'Lymph Node(s) Paraaortic',\n", " 'Other',\n", " 'Pelvis',\n", " 'Spine',\n", " 'Unknown']" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "unique_terms(\"ResearchSubject.Diagnosis.Treatment.treatment_anatomic_site\").to_list()" ] }, { "cell_type": "code", "execution_count": 5, "id": "740e5955", "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "['Abdomen',\n", " 'Abdomen, Mediastinum',\n", " 'Adrenal Glands',\n", " 'Adrenal gland',\n", " 'Anus and anal canal',\n", " 'Base of tongue',\n", " 'Bile Duct',\n", " 'Bladder',\n", " 'Bones, joints and articular cartilage of limbs',\n", " 'Bones, joints and articular cartilage of other and unspecified sites',\n", " 'Brain',\n", " 'Breast',\n", " 'Bronchus and lung',\n", " 'Cervix',\n", " 'Cervix uteri',\n", " 'Chest',\n", " 'Chest-Abdomen-Pelvis, Leg, TSpine',\n", " 'Colon',\n", " 'Connective, subcutaneous and other soft tissues',\n", " 'Corpus uteri',\n", " 'Ear',\n", " 'Esophagus',\n", " 'Extremities',\n", " 'Eye and adnexa',\n", " 'Floor of mouth',\n", " 'Gallbladder',\n", " 'Gum',\n", " 'Head',\n", " 'Head and Neck',\n", " 'Head-Neck',\n", " 'Heart, mediastinum, and pleura',\n", " 'Hematopoietic and reticuloendothelial systems',\n", " 'Hypopharynx',\n", " 'Intraocular',\n", " 'Kidney',\n", " 'Larynx',\n", " 'Lip',\n", " 'Liver',\n", " 'Liver and intrahepatic bile ducts',\n", " 'Lung',\n", " 'Lung Phantom',\n", " 'Lymph nodes',\n", " 'Marrow, Blood',\n", " 'Meninges',\n", " 
'Mesothelium',\n", " 'Nasal cavity and middle ear',\n", " 'Nasopharynx',\n", " 'Not Reported',\n", " 'Oropharynx',\n", " 'Other and ill-defined digestive organs',\n", " 'Other and ill-defined sites',\n", " 'Other and ill-defined sites in lip, oral cavity and pharynx',\n", " 'Other and ill-defined sites within respiratory system and intrathoracic organs',\n", " 'Other and unspecified female genital organs',\n", " 'Other and unspecified major salivary glands',\n", " 'Other and unspecified male genital organs',\n", " 'Other and unspecified parts of biliary tract',\n", " 'Other and unspecified parts of mouth',\n", " 'Other and unspecified parts of tongue',\n", " 'Other and unspecified urinary organs',\n", " 'Other endocrine glands and related structures',\n", " 'Ovary',\n", " 'Palate',\n", " 'Pancreas',\n", " 'Pancreas ',\n", " 'Pelvis, Prostate, Anus',\n", " 'Penis',\n", " 'Peripheral nerves and autonomic nervous system',\n", " 'Phantom',\n", " 'Prostate',\n", " 'Prostate gland',\n", " 'Rectosigmoid junction',\n", " 'Rectum',\n", " 'Renal pelvis',\n", " 'Retroperitoneum and peritoneum',\n", " 'Skin',\n", " 'Small intestine',\n", " 'Spinal cord, cranial nerves, and other parts of central nervous system',\n", " 'Stomach',\n", " 'Testicles',\n", " 'Testis',\n", " 'Thymus',\n", " 'Thyroid',\n", " 'Thyroid gland',\n", " 'Tonsil',\n", " 'Trachea',\n", " 'Unknown',\n", " 'Ureter',\n", " 'Uterus',\n", " 'Uterus, NOS',\n", " 'Vagina',\n", " 'Various',\n", " 'Various (11 locations)',\n", " 'Vulva']" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "unique_terms(\"ResearchSubject.primary_diagnosis_site\").to_list()" ] }, { "cell_type": "markdown", "id": "b005036b", "metadata": {}, "source": [ "
\n", " \n", "CDA makes multiple datasets searchable from a common interface, but does not harmonize the data. This means that researchers should review all the terms in a column, and not just choose the first one that fits, as there may be other similar terms available as well.\n", " \n", "
" ] }, { "cell_type": "markdown", "id": "73e6b8dc", "metadata": {}, "source": [ "\n", " \n", "Julia sees that \"treatment_anatomic_site\" does not have 'Uterine', but does have 'Cervix'. She also notes that both 'Uterus' and 'Uterus, NOS' are listed in the \"primary_diagnosis_site\" results. As she was initially looking for \"uterine\", Julia decides to expand her search a bit to account for variable naming schemes. So, she runs a fuzzy match filter on the \"ResearchSubject.primary_diagnosis_site\" for 'uter' as that should cover all variants:" ] }, { "cell_type": "code", "execution_count": 6, "id": "31064125", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['Cervix uteri', 'Corpus uteri', 'Uterus', 'Uterus, NOS']" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "unique_terms(\"ResearchSubject.primary_diagnosis_site\").to_list(filters=\"uter\")" ] }, { "cell_type": "markdown", "id": "9311a49e", "metadata": {}, "source": [ "\n", " \n", "Just to be sure, Julia also searches for any other instances of \"cervix\":" ] }, { "cell_type": "code", "execution_count": 7, "id": "2038a8cf", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['Cervix', 'Cervix uteri']" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "unique_terms(\"ResearchSubject.primary_diagnosis_site\").to_list(filters=\"cerv\")" ] }, { "cell_type": "markdown", "id": "29c4de58", "metadata": {}, "source": [ "\n", " \n", "With all her likely terms found, Julia begins to create a search that will get data for all of her terms. She does this by writing a series of `Q` statements that define what rows should be returned from each column. 
For the \"treatment_anatomic_site\", only one term is of interest, so she uses the `=` operator to get only exact matches:" ] }, { "cell_type": "code", "execution_count": 8, "id": "951fcc8f", "metadata": {}, "outputs": [], "source": [ "Tsite = Q('ResearchSubject.Diagnosis.Treatment.treatment_anatomic_site = \"Cervix\"')" ] }, { "cell_type": "markdown", "id": "12cb5f72", "metadata": {}, "source": [ "\n", " \n", "However, for \"primary_diagnosis_site\", Julia has several terms she wants to search with. Luckily, `Q` also can run fuzzy searches. It can also search more than one term at a time, so Julia writes one big `Q` statement to grab everything that is either 'uter' or 'cerv':" ] }, { "cell_type": "code", "execution_count": 9, "id": "36cfd8a4", "metadata": {}, "outputs": [], "source": [ "Dsite = Q('ResearchSubject.primary_diagnosis_site = \"%uter%\" OR ResearchSubject.primary_diagnosis_site = \"%cerv%\"')" ] }, { "cell_type": "markdown", "id": "349af6f2", "metadata": {}, "source": [ "\n", " \n", "Finally, Julia adds her two queries together into one large one:" ] }, { "cell_type": "code", "execution_count": 10, "id": "9f5f9e4f", "metadata": {}, "outputs": [], "source": [ "ALLDATA = Tsite.OR(Dsite)" ] }, { "cell_type": "markdown", "id": "c1f5cb55", "metadata": {}, "source": [ "\n", " \n", "Now that Julia has a query, she can use it to look for data in any of the CDA endpoints. She starts by getting an overall summary of what data is available using `count`:" ] }, { "cell_type": "code", "execution_count": 11, "id": "355b1706", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
Getting results from database\n",
       "\n",
       "
\n" ], "text/plain": [ "Getting results from database\n", "\n" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Total execution time: 3346 ms\n" ] }, { "data": { "text/plain": [ "\n", " QueryID: cd87701d-7844-4410-a04c-3a363eab6ae5\n", " \n", " Offset: 0\n", " Count: 1\n", " Total Row Count: 1\n", " More pages: False\n", " " ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ALLDATA.count.run()" ] }, { "cell_type": "markdown", "id": "b7ce25fc", "metadata": {}, "source": [ "\n", " \n", "It seems there's a lot of data that might work for Julia's study! Since she is interested in the beginnings of cancer, she decides to start by looking at the researchsubject information, since that is where most of the diagnosis information is. She again gets a summary using `count`:" ] }, { "cell_type": "code", "execution_count": 12, "id": "55b0cdeb", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
Getting results from database\n",
       "\n",
       "
\n" ], "text/plain": [ "Getting results from database\n", "\n" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Total execution time: 3611 ms\n" ] }, { "data": { "text/plain": [ "\n", " QueryID: 7db53011-5f7a-4dad-80a5-5cc2cb332e69\n", " \n", " Offset: 0\n", " Count: 100\n", " Total Row Count: 4867\n", " More pages: True\n", " " ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ALLDATA.researchsubject.run()" ] }, { "cell_type": "markdown", "id": "86a323e2", "metadata": {}, "source": [ "\n", " \n", "Browsing the primary_diagnosis_condition data, Julia notices that a large number of research subjects are diagnosed with Adenomas and Adenocarcinomas. Since Julia wants to look for common phenotypes in early cancers, she decides it might be easier to exclude the endocrine-related data, as it might involve different mechanisms. So she adds a new filter to her query:" ] }, { "cell_type": "code", "execution_count": 13, "id": "0d526198", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
Getting results from database\n",
       "\n",
       "
\n" ], "text/plain": [ "Getting results from database\n", "\n" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Total execution time: 3449 ms\n" ] }, { "data": { "text/html": [ "
    total : 3196    \n",
       "
\n" ], "text/plain": [ " total : \u001b[1;36m3196\u001b[0m \n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
   files : 297923   \n",
       "
\n" ], "text/plain": [ " files : \u001b[1;36m297923\u001b[0m \n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
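<table><tr><th>system</th><th>count</th></tr>
<tr><td>PDC</td><td>104</td></tr>
<tr><td>GDC</td><td>1918</td></tr>
<tr><td>IDC</td><td>1174</td></tr></table>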
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
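<table><tr><th>primary_diagnosis_condition</th><th>count</th></tr>
<tr><td>Uterine Corpus Endometrial Carcinoma</td><td>104</td></tr>
<tr><td>Cystic, Mucinous and Serous Neoplasms</td><td>487</td></tr>
<tr><td>Squamous Cell Neoplasms</td><td>609</td></tr>
<tr><td>Complex Mixed and Stromal Neoplasms</td><td>320</td></tr>
<tr><td>None</td><td>1175</td></tr>
<tr><td>Myomatous Neoplasms</td><td>187</td></tr>
<tr><td>Not Reported</td><td>12</td></tr>
<tr><td>Epithelial Neoplasms, NOS</td><td>230</td></tr>
<tr><td>Complex Epithelial Neoplasms</td><td>27</td></tr>
<tr><td>Soft Tissue Tumors and Sarcomas, NOS</td><td>14</td></tr>
<tr><td>Neoplasms, NOS</td><td>12</td></tr>
<tr><td>Trophoblastic neoplasms</td><td>13</td></tr>
<tr><td>Mesonephromas</td><td>5</td></tr>
<tr><td>Neuroepitheliomatous Neoplasms</td><td>1</td></tr></table>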
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
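<table><tr><th>primary_diagnosis_site</th><th>count</th></tr>
<tr><td>Uterus, NOS</td><td>961</td></tr>
<tr><td>Corpus uteri</td><td>373</td></tr>
<tr><td>Cervix uteri</td><td>688</td></tr>
<tr><td>Uterus</td><td>867</td></tr>
<tr><td>Cervix</td><td>307</td></tr></table>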
\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Noadeno = Q('ResearchSubject.primary_diagnosis_condition != \"Adenomas and Adenocarcinomas\"')\n", "\n", "NoAdenoData = ALLDATA.AND(Noadeno)\n", "\n", "NoAdenoData.researchsubject.count.run()" ] }, { "cell_type": "markdown", "id": "40a0191d", "metadata": {}, "source": [ "\n", " \n", "She then previews the actual metadata for researchsubject, subject, and file, to make sure that they have all the information she will need for her work:" ] }, { "cell_type": "code", "execution_count": 14, "id": "d186b837", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
Getting results from database\n",
       "\n",
       "
\n" ], "text/plain": [ "Getting results from database\n", "\n" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Total execution time: 3379 ms\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
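<table><tr><th></th><th>id</th><th>identifier</th><th>member_of_research_project</th><th>primary_diagnosis_condition</th><th>primary_diagnosis_site</th><th>subject_id</th></tr>
<tr><th>0</th><td>146bd9db-1645-4950-bd18-de30d0db2487</td><td>[{'system': 'GDC', 'value': '146bd9db-1645-495...</td><td>CGCI-HTMCP-CC</td><td>Squamous Cell Neoplasms</td><td>Cervix uteri</td><td>HTMCP-03-06-02138</td></tr>
<tr><th>1</th><td>32e83039-7663-422b-a541-6d9149851560</td><td>[{'system': 'GDC', 'value': '32e83039-7663-422...</td><td>GENIE-GRCC</td><td>Complex Mixed and Stromal Neoplasms</td><td>Uterus, NOS</td><td>GENIE-GRCC-4f168dad</td></tr>
<tr><th>2</th><td>37063f74-ccc7-426e-ac1c-ad733f2f7e95</td><td>[{'system': 'GDC', 'value': '37063f74-ccc7-426...</td><td>GENIE-UHN</td><td>Epithelial Neoplasms, NOS</td><td>Corpus uteri</td><td>GENIE-UHN-247706</td></tr>
<tr><th>3</th><td>3878f58e-76ba-4480-a784-88505bd464d0</td><td>[{'system': 'GDC', 'value': '3878f58e-76ba-448...</td><td>TCGA-UCEC</td><td>Cystic, Mucinous and Serous Neoplasms</td><td>Corpus uteri</td><td>TCGA-FI-A2EX</td></tr>
<tr><th>4</th><td>3df6abe2-2123-4bfa-a4e4-88df5f940c04</td><td>[{'system': 'GDC', 'value': '3df6abe2-2123-4bf...</td><td>TCGA-CESC</td><td>Squamous Cell Neoplasms</td><td>Cervix uteri</td><td>TCGA-JX-A3PZ</td></tr>
<tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>
<tr><th>95</th><td>fa219ae6-def1-4200-972a-3fd17d688d34</td><td>[{'system': 'GDC', 'value': 'fa219ae6-def1-420...</td><td>FM-AD</td><td>Squamous Cell Neoplasms</td><td>Cervix uteri</td><td>AD7747</td></tr>
<tr><th>96</th><td>fb6f2e38-9281-4085-923c-ef99955fd5ea</td><td>[{'system': 'GDC', 'value': 'fb6f2e38-9281-408...</td><td>CGCI-HTMCP-CC</td><td>Squamous Cell Neoplasms</td><td>Cervix uteri</td><td>HTMCP-03-06-02062</td></tr>
<tr><th>97</th><td>13d72130-604c-4d79-95cc-53c2e25d91b0</td><td>[{'system': 'GDC', 'value': '13d72130-604c-4d7...</td><td>TCGA-CESC</td><td>Squamous Cell Neoplasms</td><td>Cervix uteri</td><td>TCGA-ZJ-AAX4</td></tr>
<tr><th>98</th><td>15d1d0ad-4196-49d1-8eb3-38c75b7db58c</td><td>[{'system': 'GDC', 'value': '15d1d0ad-4196-49d...</td><td>GENIE-MSK</td><td>Myomatous Neoplasms</td><td>Uterus, NOS</td><td>GENIE-MSK-P-0005582</td></tr>
<tr><th>99</th><td>1d6f367d-a00d-4bd0-9a8b-0a25e37fc1cd</td><td>[{'system': 'GDC', 'value': '1d6f367d-a00d-4bd...</td><td>GENIE-DFCI</td><td>Cystic, Mucinous and Serous Neoplasms</td><td>Uterus, NOS</td><td>GENIE-DFCI-001660</td></tr></table>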
\n", "

100 rows × 6 columns

\n", "
" ], "text/plain": [ " id \\\n", "0 146bd9db-1645-4950-bd18-de30d0db2487 \n", "1 32e83039-7663-422b-a541-6d9149851560 \n", "2 37063f74-ccc7-426e-ac1c-ad733f2f7e95 \n", "3 3878f58e-76ba-4480-a784-88505bd464d0 \n", "4 3df6abe2-2123-4bfa-a4e4-88df5f940c04 \n", ".. ... \n", "95 fa219ae6-def1-4200-972a-3fd17d688d34 \n", "96 fb6f2e38-9281-4085-923c-ef99955fd5ea \n", "97 13d72130-604c-4d79-95cc-53c2e25d91b0 \n", "98 15d1d0ad-4196-49d1-8eb3-38c75b7db58c \n", "99 1d6f367d-a00d-4bd0-9a8b-0a25e37fc1cd \n", "\n", " identifier \\\n", "0 [{'system': 'GDC', 'value': '146bd9db-1645-495... \n", "1 [{'system': 'GDC', 'value': '32e83039-7663-422... \n", "2 [{'system': 'GDC', 'value': '37063f74-ccc7-426... \n", "3 [{'system': 'GDC', 'value': '3878f58e-76ba-448... \n", "4 [{'system': 'GDC', 'value': '3df6abe2-2123-4bf... \n", ".. ... \n", "95 [{'system': 'GDC', 'value': 'fa219ae6-def1-420... \n", "96 [{'system': 'GDC', 'value': 'fb6f2e38-9281-408... \n", "97 [{'system': 'GDC', 'value': '13d72130-604c-4d7... \n", "98 [{'system': 'GDC', 'value': '15d1d0ad-4196-49d... \n", "99 [{'system': 'GDC', 'value': '1d6f367d-a00d-4bd... \n", "\n", " member_of_research_project primary_diagnosis_condition \\\n", "0 CGCI-HTMCP-CC Squamous Cell Neoplasms \n", "1 GENIE-GRCC Complex Mixed and Stromal Neoplasms \n", "2 GENIE-UHN Epithelial Neoplasms, NOS \n", "3 TCGA-UCEC Cystic, Mucinous and Serous Neoplasms \n", "4 TCGA-CESC Squamous Cell Neoplasms \n", ".. ... ... \n", "95 FM-AD Squamous Cell Neoplasms \n", "96 CGCI-HTMCP-CC Squamous Cell Neoplasms \n", "97 TCGA-CESC Squamous Cell Neoplasms \n", "98 GENIE-MSK Myomatous Neoplasms \n", "99 GENIE-DFCI Cystic, Mucinous and Serous Neoplasms \n", "\n", " primary_diagnosis_site subject_id \n", "0 Cervix uteri HTMCP-03-06-02138 \n", "1 Uterus, NOS GENIE-GRCC-4f168dad \n", "2 Corpus uteri GENIE-UHN-247706 \n", "3 Corpus uteri TCGA-FI-A2EX \n", "4 Cervix uteri TCGA-JX-A3PZ \n", ".. ... ... 
\n", "95 Cervix uteri AD7747 \n", "96 Cervix uteri HTMCP-03-06-02062 \n", "97 Cervix uteri TCGA-ZJ-AAX4 \n", "98 Uterus, NOS GENIE-MSK-P-0005582 \n", "99 Uterus, NOS GENIE-DFCI-001660 \n", "\n", "[100 rows x 6 columns]" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "NoAdenoData.researchsubject.run().to_dataframe()" ] }, { "cell_type": "markdown", "id": "086697b3", "metadata": {}, "source": [ "---\n", "\n", "
\n", "\n", "

ResearchSubject Field Definitions

\n", "\n", "A research subject is the entity of interest in a research study, typically a human being or an animal, but can also be a device, group of humans or animals, or a tissue sample. Human research subjects are usually not traceable to a particular person to protect the subject’s privacy. An individual who participates in 3 studies will have 3 researchsubject IDs.\n", " \n", " \n", "\n", "
\n", " \n", "---" ] }, { "cell_type": "code", "execution_count": 15, "id": "8d0f5e2f", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
Getting results from database\n",
       "\n",
       "
\n" ], "text/plain": [ "Getting results from database\n", "\n" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Total execution time: 3404 ms\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
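<table><tr><th></th><th>id</th><th>identifier</th><th>species</th><th>sex</th><th>race</th><th>ethnicity</th><th>days_to_birth</th><th>subject_associated_project</th><th>vital_status</th><th>age_at_death</th><th>cause_of_death</th></tr>
<tr><th>0</th><td>AD2728</td><td>[{'system': 'GDC', 'value': 'AD2728'}]</td><td>Homo sapiens</td><td>female</td><td>not reported</td><td>not reported</td><td>NaN</td><td>[FM-AD]</td><td>Not Reported</td><td>NaN</td><td>None</td></tr>
<tr><th>1</th><td>C3N-01876</td><td>[{'system': 'IDC', 'value': 'C3N-01876'}]</td><td>Homo sapiens</td><td>None</td><td>None</td><td>None</td><td>NaN</td><td>[cptac_ucec]</td><td>None</td><td>NaN</td><td>None</td></tr>
<tr><th>2</th><td>GENIE-DFCI-007276</td><td>[{'system': 'GDC', 'value': 'GENIE-DFCI-007276'}]</td><td>Homo sapiens</td><td>female</td><td>white</td><td>not hispanic or latino</td><td>-18627.0</td><td>[GENIE-DFCI]</td><td>Not Reported</td><td>NaN</td><td>None</td></tr>
<tr><th>3</th><td>GENIE-DFCI-009140</td><td>[{'system': 'GDC', 'value': 'GENIE-DFCI-009140'}]</td><td>Homo sapiens</td><td>female</td><td>white</td><td>not hispanic or latino</td><td>-24837.0</td><td>[GENIE-DFCI]</td><td>Not Reported</td><td>NaN</td><td>None</td></tr>
<tr><th>4</th><td>GENIE-DFCI-009144</td><td>[{'system': 'GDC', 'value': 'GENIE-DFCI-009144'}]</td><td>Homo sapiens</td><td>female</td><td>white</td><td>not hispanic or latino</td><td>-19723.0</td><td>[GENIE-DFCI]</td><td>Not Reported</td><td>NaN</td><td>None</td></tr>
<tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>
<tr><th>95</th><td>AD14317</td><td>[{'system': 'GDC', 'value': 'AD14317'}]</td><td>Homo sapiens</td><td>female</td><td>not reported</td><td>not reported</td><td>NaN</td><td>[FM-AD]</td><td>Not Reported</td><td>NaN</td><td>None</td></tr>
<tr><th>96</th><td>AD3008</td><td>[{'system': 'GDC', 'value': 'AD3008'}]</td><td>Homo sapiens</td><td>female</td><td>not reported</td><td>not reported</td><td>NaN</td><td>[FM-AD]</td><td>Not Reported</td><td>NaN</td><td>None</td></tr>
<tr><th>97</th><td>AD6414</td><td>[{'system': 'GDC', 'value': 'AD6414'}]</td><td>Homo sapiens</td><td>female</td><td>not reported</td><td>not reported</td><td>NaN</td><td>[FM-AD]</td><td>Not Reported</td><td>NaN</td><td>None</td></tr>
<tr><th>98</th><td>AD7975</td><td>[{'system': 'GDC', 'value': 'AD7975'}]</td><td>Homo sapiens</td><td>female</td><td>not reported</td><td>not reported</td><td>NaN</td><td>[FM-AD]</td><td>Not Reported</td><td>NaN</td><td>None</td></tr>
<tr><th>99</th><td>C3L-00157</td><td>[{'system': 'GDC', 'value': 'C3L-00157'}, {'sy...</td><td>Homo sapiens</td><td>female</td><td>white</td><td>hispanic or latino</td><td>-22118.0</td><td>[CPTAC3-Discovery, CPTAC-3, cptac_ucec]</td><td>Dead</td><td>1396.0</td><td>Cancer Related</td></tr></table>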
\n", "

100 rows × 11 columns

\n", "
" ], "text/plain": [ " id identifier \\\n", "0 AD2728 [{'system': 'GDC', 'value': 'AD2728'}] \n", "1 C3N-01876 [{'system': 'IDC', 'value': 'C3N-01876'}] \n", "2 GENIE-DFCI-007276 [{'system': 'GDC', 'value': 'GENIE-DFCI-007276'}] \n", "3 GENIE-DFCI-009140 [{'system': 'GDC', 'value': 'GENIE-DFCI-009140'}] \n", "4 GENIE-DFCI-009144 [{'system': 'GDC', 'value': 'GENIE-DFCI-009144'}] \n", ".. ... ... \n", "95 AD14317 [{'system': 'GDC', 'value': 'AD14317'}] \n", "96 AD3008 [{'system': 'GDC', 'value': 'AD3008'}] \n", "97 AD6414 [{'system': 'GDC', 'value': 'AD6414'}] \n", "98 AD7975 [{'system': 'GDC', 'value': 'AD7975'}] \n", "99 C3L-00157 [{'system': 'GDC', 'value': 'C3L-00157'}, {'sy... \n", "\n", " species sex race ethnicity days_to_birth \\\n", "0 Homo sapiens female not reported not reported NaN \n", "1 Homo sapiens None None None NaN \n", "2 Homo sapiens female white not hispanic or latino -18627.0 \n", "3 Homo sapiens female white not hispanic or latino -24837.0 \n", "4 Homo sapiens female white not hispanic or latino -19723.0 \n", ".. ... ... ... ... ... \n", "95 Homo sapiens female not reported not reported NaN \n", "96 Homo sapiens female not reported not reported NaN \n", "97 Homo sapiens female not reported not reported NaN \n", "98 Homo sapiens female not reported not reported NaN \n", "99 Homo sapiens female white hispanic or latino -22118.0 \n", "\n", " subject_associated_project vital_status age_at_death \\\n", "0 [FM-AD] Not Reported NaN \n", "1 [cptac_ucec] None NaN \n", "2 [GENIE-DFCI] Not Reported NaN \n", "3 [GENIE-DFCI] Not Reported NaN \n", "4 [GENIE-DFCI] Not Reported NaN \n", ".. ... ... ... \n", "95 [FM-AD] Not Reported NaN \n", "96 [FM-AD] Not Reported NaN \n", "97 [FM-AD] Not Reported NaN \n", "98 [FM-AD] Not Reported NaN \n", "99 [CPTAC3-Discovery, CPTAC-3, cptac_ucec] Dead 1396.0 \n", "\n", " cause_of_death \n", "0 None \n", "1 None \n", "2 None \n", "3 None \n", "4 None \n", ".. ... 
\n", "95 None \n", "96 None \n", "97 None \n", "98 None \n", "99 Cancer Related \n", "\n", "[100 rows x 11 columns]" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "NoAdenoData.subject.run().to_dataframe()" ] }, { "cell_type": "markdown", "id": "dec76132", "metadata": {}, "source": [ "---\n", "\n", "
\n", "\n", "

Subject Field Definitions

\n", "\n", "A subject is a specific, unique individual: for example, a single human. When consent allows, a given entity will have a single subject ID that can be connected to all their studies and data across all datasets.\n", "\n", " \n", " \n", "\n", "
\n", " \n", "---" ] }, { "cell_type": "code", "execution_count": 16, "id": "04e04136", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
Getting results from database\n",
       "\n",
       "
\n" ], "text/plain": [ "Getting results from database\n", "\n" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Total execution time: 3746 ms\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ididentifierlabeldata_categorydata_typefile_formatassociated_projectdrs_uribyte_sizechecksumdata_modalityimaging_modalitydbgap_accession_numberresearchsubject_specimen_idresearchsubject_idsubject_id
0d3151fb9-9dd5-470e-b181-4d920f686068[{'system': 'GDC', 'value': 'd3151fb9-9dd5-470...TCGA-B5-A11E-01A-21-A163-20_RPPA_data.tsvProteome ProfilingProtein Expression QuantificationTSVTCGA-UCECdrs://dg.4DFC:d3151fb9-9dd5-470e-b181-4d920f68...22341f44fc349969dda464ddf37f5e1f149f1GenomicNoneNoneTCGA-B5-A11E
12200d48f-d10d-4e0c-aff6-a71958fc2b1b[{'system': 'GDC', 'value': '2200d48f-d10d-4e0...TCGA-A5-A0G9-01A-21-A162-20_RPPA_data.tsvProteome ProfilingProtein Expression QuantificationTSVTCGA-UCECdrs://dg.4DFC:2200d48f-d10d-4e0c-aff6-a71958fc...242858edb8c63f398d0d6dab0655d62b1cd93GenomicNoneNoneTCGA-A5-A0G9
2e6ee1e9e-9c28-4db8-9f7f-3916f5351717[{'system': 'GDC', 'value': 'e6ee1e9e-9c28-4db...TCGA-N7-A4Y5-01A-21-A41P-20_RPPA_data.tsvProteome ProfilingProtein Expression QuantificationTSVTCGA-UCSdrs://dg.4DFC:e6ee1e9e-9c28-4db8-9f7f-3916f535...2202673159e8898216b617ac3e135af51d87eGenomicNoneNoneTCGA-N7-A4Y5
381674772-fd6d-48b6-93b1-fa585d1ed568[{'system': 'GDC', 'value': '81674772-fd6d-48b...49b02eb4-8e31-42cd-a3e7-065611836434.wgs.BRASS...Somatic Structural VariationStructural RearrangementBEDPECPTAC-3drs://dg.4DFC:81674772-fd6d-48b6-93b1-fa585d1e...997764560c17caa67fa25411218ef57101a6GenomicNonephs001287C3L-01307
4c3392a1e-1241-4068-9bca-31fd836148de[{'system': 'GDC', 'value': 'c3392a1e-1241-406...TCGA-BG-A0MA-01A-21-A18Q-20_RPPA_data.tsvProteome ProfilingProtein Expression QuantificationTSVTCGA-UCECdrs://dg.4DFC:c3392a1e-1241-4068-9bca-31fd8361...223243da3113805454ac4fca6482fbaf4b4b1GenomicNoneNoneTCGA-BG-A0MA
...................................................
95b42cdaba-46c8-4a02-b7cf-86ceb5d1f712[{'system': 'GDC', 'value': 'b42cdaba-46c8-4a0...TCGA-HG-A2PA-01A-21-A40H-20_RPPA_data.tsvProteome ProfilingProtein Expression QuantificationTSVTCGA-CESCdrs://dg.4DFC:b42cdaba-46c8-4a02-b7cf-86ceb5d1...22070398e6cca19ff30d932a1a78669254710GenomicNoneNoneTCGA-HG-A2PA
961731c20f-1f10-4f80-8793-99593f81515f[{'system': 'GDC', 'value': '1731c20f-1f10-4f8...TCGA-B5-A11P-01B-21-A18Q-20_RPPA_data.tsvProteome ProfilingProtein Expression QuantificationTSVTCGA-UCECdrs://dg.4DFC:1731c20f-1f10-4f80-8793-99593f81...223448bcbded9fbd5a48a58f77ae1e3ea829fGenomicNoneNoneTCGA-B5-A11P
9736f66d66-f71f-49be-9e51-ac640b826d3f[{'system': 'GDC', 'value': '36f66d66-f71f-49b...TCGA-EY-A1GH-01A-21-A18Q-20_RPPA_data.tsvProteome ProfilingProtein Expression QuantificationTSVTCGA-UCECdrs://dg.4DFC:36f66d66-f71f-49be-9e51-ac640b82...2233852a07dbf0ebfcafeb41958d4a1e2b489GenomicNoneNoneTCGA-EY-A1GH
9859a4c826-87d1-43ab-9b2e-3c6088275fd7[{'system': 'GDC', 'value': '59a4c826-87d1-43a...de3cbd77-822b-4c86-80e8-9be54ca8b324.wgs.BRASS...Somatic Structural VariationStructural RearrangementBEDPECGCI-HTMCP-CCdrs://dg.4DFC:59a4c826-87d1-43ab-9b2e-3c608827...121947c9bcda0d917caf81773efd8e2f827ebbGenomicNonephs000528HTMCP-03-06-02040
99a7c7ba3e-7d9d-4735-afff-22442a0e9a84[{'system': 'GDC', 'value': 'a7c7ba3e-7d9d-473...44c6a116-146f-43fd-aa1e-7e8b1636a722.wgs.BRASS...Somatic Structural VariationStructural RearrangementVCFCGCI-HTMCP-CCdrs://dg.4DFC:a7c7ba3e-7d9d-4735-afff-22442a0e...763415e513d0fae6e32b5425c647c9d8d3ba3GenomicNonephs000528HTMCP-03-06-02144
\n", "

100 rows × 16 columns

\n", "
" ], "text/plain": [ " id \\\n", "0 d3151fb9-9dd5-470e-b181-4d920f686068 \n", "1 2200d48f-d10d-4e0c-aff6-a71958fc2b1b \n", "2 e6ee1e9e-9c28-4db8-9f7f-3916f5351717 \n", "3 81674772-fd6d-48b6-93b1-fa585d1ed568 \n", "4 c3392a1e-1241-4068-9bca-31fd836148de \n", ".. ... \n", "95 b42cdaba-46c8-4a02-b7cf-86ceb5d1f712 \n", "96 1731c20f-1f10-4f80-8793-99593f81515f \n", "97 36f66d66-f71f-49be-9e51-ac640b826d3f \n", "98 59a4c826-87d1-43ab-9b2e-3c6088275fd7 \n", "99 a7c7ba3e-7d9d-4735-afff-22442a0e9a84 \n", "\n", " identifier \\\n", "0 [{'system': 'GDC', 'value': 'd3151fb9-9dd5-470... \n", "1 [{'system': 'GDC', 'value': '2200d48f-d10d-4e0... \n", "2 [{'system': 'GDC', 'value': 'e6ee1e9e-9c28-4db... \n", "3 [{'system': 'GDC', 'value': '81674772-fd6d-48b... \n", "4 [{'system': 'GDC', 'value': 'c3392a1e-1241-406... \n", ".. ... \n", "95 [{'system': 'GDC', 'value': 'b42cdaba-46c8-4a0... \n", "96 [{'system': 'GDC', 'value': '1731c20f-1f10-4f8... \n", "97 [{'system': 'GDC', 'value': '36f66d66-f71f-49b... \n", "98 [{'system': 'GDC', 'value': '59a4c826-87d1-43a... \n", "99 [{'system': 'GDC', 'value': 'a7c7ba3e-7d9d-473... \n", "\n", " label \\\n", "0 TCGA-B5-A11E-01A-21-A163-20_RPPA_data.tsv \n", "1 TCGA-A5-A0G9-01A-21-A162-20_RPPA_data.tsv \n", "2 TCGA-N7-A4Y5-01A-21-A41P-20_RPPA_data.tsv \n", "3 49b02eb4-8e31-42cd-a3e7-065611836434.wgs.BRASS... \n", "4 TCGA-BG-A0MA-01A-21-A18Q-20_RPPA_data.tsv \n", ".. ... \n", "95 TCGA-HG-A2PA-01A-21-A40H-20_RPPA_data.tsv \n", "96 TCGA-B5-A11P-01B-21-A18Q-20_RPPA_data.tsv \n", "97 TCGA-EY-A1GH-01A-21-A18Q-20_RPPA_data.tsv \n", "98 de3cbd77-822b-4c86-80e8-9be54ca8b324.wgs.BRASS... \n", "99 44c6a116-146f-43fd-aa1e-7e8b1636a722.wgs.BRASS... 
\n", "\n", " data_category data_type \\\n", "0 Proteome Profiling Protein Expression Quantification \n", "1 Proteome Profiling Protein Expression Quantification \n", "2 Proteome Profiling Protein Expression Quantification \n", "3 Somatic Structural Variation Structural Rearrangement \n", "4 Proteome Profiling Protein Expression Quantification \n", ".. ... ... \n", "95 Proteome Profiling Protein Expression Quantification \n", "96 Proteome Profiling Protein Expression Quantification \n", "97 Proteome Profiling Protein Expression Quantification \n", "98 Somatic Structural Variation Structural Rearrangement \n", "99 Somatic Structural Variation Structural Rearrangement \n", "\n", " file_format associated_project \\\n", "0 TSV TCGA-UCEC \n", "1 TSV TCGA-UCEC \n", "2 TSV TCGA-UCS \n", "3 BEDPE CPTAC-3 \n", "4 TSV TCGA-UCEC \n", ".. ... ... \n", "95 TSV TCGA-CESC \n", "96 TSV TCGA-UCEC \n", "97 TSV TCGA-UCEC \n", "98 BEDPE CGCI-HTMCP-CC \n", "99 VCF CGCI-HTMCP-CC \n", "\n", " drs_uri byte_size \\\n", "0 drs://dg.4DFC:d3151fb9-9dd5-470e-b181-4d920f68... 22341 \n", "1 drs://dg.4DFC:2200d48f-d10d-4e0c-aff6-a71958fc... 24285 \n", "2 drs://dg.4DFC:e6ee1e9e-9c28-4db8-9f7f-3916f535... 22026 \n", "3 drs://dg.4DFC:81674772-fd6d-48b6-93b1-fa585d1e... 9977 \n", "4 drs://dg.4DFC:c3392a1e-1241-4068-9bca-31fd8361... 22324 \n", ".. ... ... \n", "95 drs://dg.4DFC:b42cdaba-46c8-4a02-b7cf-86ceb5d1... 22070 \n", "96 drs://dg.4DFC:1731c20f-1f10-4f80-8793-99593f81... 22344 \n", "97 drs://dg.4DFC:36f66d66-f71f-49be-9e51-ac640b82... 22338 \n", "98 drs://dg.4DFC:59a4c826-87d1-43ab-9b2e-3c608827... 121947 \n", "99 drs://dg.4DFC:a7c7ba3e-7d9d-4735-afff-22442a0e... 76341 \n", "\n", " checksum data_modality imaging_modality \\\n", "0 f44fc349969dda464ddf37f5e1f149f1 Genomic None \n", "1 8edb8c63f398d0d6dab0655d62b1cd93 Genomic None \n", "2 73159e8898216b617ac3e135af51d87e Genomic None \n", "3 64560c17caa67fa25411218ef57101a6 Genomic None \n", "4 3da3113805454ac4fca6482fbaf4b4b1 Genomic None \n", ".. 
... ... ... \n", "95 398e6cca19ff30d932a1a78669254710 Genomic None \n", "96 8bcbded9fbd5a48a58f77ae1e3ea829f Genomic None \n", "97 52a07dbf0ebfcafeb41958d4a1e2b489 Genomic None \n", "98 c9bcda0d917caf81773efd8e2f827ebb Genomic None \n", "99 5e513d0fae6e32b5425c647c9d8d3ba3 Genomic None \n", "\n", " dbgap_accession_number researchsubject_specimen_id researchsubject_id \\\n", "0 None \n", "1 None \n", "2 None \n", "3 phs001287 \n", "4 None \n", ".. ... ... ... \n", "95 None \n", "96 None \n", "97 None \n", "98 phs000528 \n", "99 phs000528 \n", "\n", " subject_id \n", "0 TCGA-B5-A11E \n", "1 TCGA-A5-A0G9 \n", "2 TCGA-N7-A4Y5 \n", "3 C3L-01307 \n", "4 TCGA-BG-A0MA \n", ".. ... \n", "95 TCGA-HG-A2PA \n", "96 TCGA-B5-A11P \n", "97 TCGA-EY-A1GH \n", "98 HTMCP-03-06-02040 \n", "99 HTMCP-03-06-02144 \n", "\n", "[100 rows x 16 columns]" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "NoAdenoData.file.run().to_dataframe()" ] }, { "cell_type": "markdown", "id": "8cf9f2d3", "metadata": {}, "source": [ "\n", "---\n", "\n", "
\n", "\n", "

File Field Definitions

\n", "\n", "A file is an information-bearing electronic object that contains a physical embodiment of some information using a particular character encoding.\n", "\n", " \n", " \n", "\n", "
\n", " \n", "---\n" ] }, { "cell_type": "markdown", "id": "ba6aadbe", "metadata": {}, "source": [ "\n", " \n", "Finally, Julia wants to save these results for future use. Since the preview dataframes show only the first 100 results of each search, she uses the `paginator` function to collect all the data from the subject and researchsubject endpoints into their own dataframes:" ] }, { "cell_type": "code", "execution_count": 17, "id": "c2cec2bc", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
Getting results from database\n",
       "\n",
       "
\n" ], "text/plain": [ "Getting results from database\n", "\n" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Total execution time: 3286 ms\n" ] } ], "source": [ "researchsubs = NoAdenoData.researchsubject.run()\n", "rsdf = pd.DataFrame()\n", "for i in researchsubs.paginator(to_df=True):\n", " rsdf = pd.concat([rsdf, i])" ] }, { "cell_type": "code", "execution_count": 18, "id": "a1258057", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
Getting results from database\n",
       "\n",
       "
\n" ], "text/plain": [ "Getting results from database\n", "\n" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Total execution time: 3374 ms\n" ] } ], "source": [ "subs = NoAdenoData.subject.run()\n", "subsdf = pd.DataFrame()\n", "for i in subs.paginator(to_df=True):\n", " subsdf = pd.concat([subsdf, i])" ] }, { "cell_type": "code", "execution_count": 19, "id": "04cd73df", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ididentifiermember_of_research_projectprimary_diagnosis_conditionprimary_diagnosis_sitesubject_id
0146bd9db-1645-4950-bd18-de30d0db2487[{'system': 'GDC', 'value': '146bd9db-1645-495...CGCI-HTMCP-CCSquamous Cell NeoplasmsCervix uteriHTMCP-03-06-02138
132e83039-7663-422b-a541-6d9149851560[{'system': 'GDC', 'value': '32e83039-7663-422...GENIE-GRCCComplex Mixed and Stromal NeoplasmsUterus, NOSGENIE-GRCC-4f168dad
237063f74-ccc7-426e-ac1c-ad733f2f7e95[{'system': 'GDC', 'value': '37063f74-ccc7-426...GENIE-UHNEpithelial Neoplasms, NOSCorpus uteriGENIE-UHN-247706
33878f58e-76ba-4480-a784-88505bd464d0[{'system': 'GDC', 'value': '3878f58e-76ba-448...TCGA-UCECCystic, Mucinous and Serous NeoplasmsCorpus uteriTCGA-FI-A2EX
43df6abe2-2123-4bfa-a4e4-88df5f940c04[{'system': 'GDC', 'value': '3df6abe2-2123-4bf...TCGA-CESCSquamous Cell NeoplasmsCervix uteriTCGA-JX-A3PZ
.....................
91TCGA-N9-A4Q7__tcga_ucs[{'system': 'IDC', 'value': 'TCGA-N9-A4Q7'}]tcga_ucsNoneUterusTCGA-N9-A4Q7
92TCGA-QS-A744__tcga_ucec[{'system': 'IDC', 'value': 'TCGA-QS-A744'}]tcga_ucecNoneUterusTCGA-QS-A744
93c64d5576-df00-4772-a3d1-1f8863000750[{'system': 'GDC', 'value': 'c64d5576-df00-477...CGCI-HTMCP-CCSquamous Cell NeoplasmsCervix uteriHTMCP-03-06-02099
94cc500ada-7440-412f-b54c-4966c8098dcb[{'system': 'GDC', 'value': 'cc500ada-7440-412...GENIE-DFCICystic, Mucinous and Serous NeoplasmsUterus, NOSGENIE-DFCI-000331
95d7a75bf5-5189-4978-99d9-fcef91c9fbd2[{'system': 'GDC', 'value': 'd7a75bf5-5189-497...TCGA-CESCSquamous Cell NeoplasmsCervix uteriTCGA-EK-A2R7
\n", "

3196 rows × 6 columns

\n", "
" ], "text/plain": [ " id \\\n", "0 146bd9db-1645-4950-bd18-de30d0db2487 \n", "1 32e83039-7663-422b-a541-6d9149851560 \n", "2 37063f74-ccc7-426e-ac1c-ad733f2f7e95 \n", "3 3878f58e-76ba-4480-a784-88505bd464d0 \n", "4 3df6abe2-2123-4bfa-a4e4-88df5f940c04 \n", ".. ... \n", "91 TCGA-N9-A4Q7__tcga_ucs \n", "92 TCGA-QS-A744__tcga_ucec \n", "93 c64d5576-df00-4772-a3d1-1f8863000750 \n", "94 cc500ada-7440-412f-b54c-4966c8098dcb \n", "95 d7a75bf5-5189-4978-99d9-fcef91c9fbd2 \n", "\n", " identifier \\\n", "0 [{'system': 'GDC', 'value': '146bd9db-1645-495... \n", "1 [{'system': 'GDC', 'value': '32e83039-7663-422... \n", "2 [{'system': 'GDC', 'value': '37063f74-ccc7-426... \n", "3 [{'system': 'GDC', 'value': '3878f58e-76ba-448... \n", "4 [{'system': 'GDC', 'value': '3df6abe2-2123-4bf... \n", ".. ... \n", "91 [{'system': 'IDC', 'value': 'TCGA-N9-A4Q7'}] \n", "92 [{'system': 'IDC', 'value': 'TCGA-QS-A744'}] \n", "93 [{'system': 'GDC', 'value': 'c64d5576-df00-477... \n", "94 [{'system': 'GDC', 'value': 'cc500ada-7440-412... \n", "95 [{'system': 'GDC', 'value': 'd7a75bf5-5189-497... \n", "\n", " member_of_research_project primary_diagnosis_condition \\\n", "0 CGCI-HTMCP-CC Squamous Cell Neoplasms \n", "1 GENIE-GRCC Complex Mixed and Stromal Neoplasms \n", "2 GENIE-UHN Epithelial Neoplasms, NOS \n", "3 TCGA-UCEC Cystic, Mucinous and Serous Neoplasms \n", "4 TCGA-CESC Squamous Cell Neoplasms \n", ".. ... ... \n", "91 tcga_ucs None \n", "92 tcga_ucec None \n", "93 CGCI-HTMCP-CC Squamous Cell Neoplasms \n", "94 GENIE-DFCI Cystic, Mucinous and Serous Neoplasms \n", "95 TCGA-CESC Squamous Cell Neoplasms \n", "\n", " primary_diagnosis_site subject_id \n", "0 Cervix uteri HTMCP-03-06-02138 \n", "1 Uterus, NOS GENIE-GRCC-4f168dad \n", "2 Corpus uteri GENIE-UHN-247706 \n", "3 Corpus uteri TCGA-FI-A2EX \n", "4 Cervix uteri TCGA-JX-A3PZ \n", ".. ... ... 
\n", "91 Uterus TCGA-N9-A4Q7 \n", "92 Uterus TCGA-QS-A744 \n", "93 Cervix uteri HTMCP-03-06-02099 \n", "94 Uterus, NOS GENIE-DFCI-000331 \n", "95 Cervix uteri TCGA-EK-A2R7 \n", "\n", "[3196 rows x 6 columns]" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "rsdf # view the researchsubject dataframe" ] }, { "cell_type": "code", "execution_count": 20, "id": "92a6f811", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ididentifierspeciessexraceethnicitydays_to_birthsubject_associated_projectvital_statusage_at_deathcause_of_death
0AD2728[{'system': 'GDC', 'value': 'AD2728'}]Homo sapiensfemalenot reportednot reportedNaN[FM-AD]Not ReportedNaNNone
1C3N-01876[{'system': 'IDC', 'value': 'C3N-01876'}]Homo sapiensNoneNoneNoneNaN[cptac_ucec]NoneNaNNone
2GENIE-DFCI-007276[{'system': 'GDC', 'value': 'GENIE-DFCI-007276'}]Homo sapiensfemalewhitenot hispanic or latino-18627.0[GENIE-DFCI]Not ReportedNaNNone
3GENIE-DFCI-009140[{'system': 'GDC', 'value': 'GENIE-DFCI-009140'}]Homo sapiensfemalewhitenot hispanic or latino-24837.0[GENIE-DFCI]Not ReportedNaNNone
4GENIE-DFCI-009144[{'system': 'GDC', 'value': 'GENIE-DFCI-009144'}]Homo sapiensfemalewhitenot hispanic or latino-19723.0[GENIE-DFCI]Not ReportedNaNNone
....................................
3TCGA-EY-A72D[{'system': 'GDC', 'value': 'TCGA-EY-A72D'}, {...Homo sapiensfemaleblack or african americannot hispanic or latino-31818.0[TCGA-UCEC, tcga_ucec]AliveNaNNone
4TCGA-IE-A4EH[{'system': 'GDC', 'value': 'TCGA-IE-A4EH'}, {...Homo sapiensfemalewhitenot hispanic or latino-12871.0[tcga_sarc, TCGA-SARC]AliveNaNNone
5TCGA-IS-A3KA[{'system': 'GDC', 'value': 'TCGA-IS-A3KA'}, {...Homo sapiensfemalewhitenot hispanic or latino-26775.0[tcga_sarc, TCGA-SARC]Dead413.0None
6TCGA-NA-A4QY[{'system': 'GDC', 'value': 'TCGA-NA-A4QY'}, {...Homo sapiensfemalewhitenot hispanic or latino-22756.0[tcga_ucs, TCGA-UCS]Dead114.0None
7TCGA-VS-A9V3[{'system': 'GDC', 'value': 'TCGA-VS-A9V3'}, {...Homo sapiensfemalewhitenot reported-22990.0[TCGA-CESC, tcga_cesc]AliveNaNNone
\n", "

2608 rows × 11 columns

\n", "
" ], "text/plain": [ " id identifier \\\n", "0 AD2728 [{'system': 'GDC', 'value': 'AD2728'}] \n", "1 C3N-01876 [{'system': 'IDC', 'value': 'C3N-01876'}] \n", "2 GENIE-DFCI-007276 [{'system': 'GDC', 'value': 'GENIE-DFCI-007276'}] \n", "3 GENIE-DFCI-009140 [{'system': 'GDC', 'value': 'GENIE-DFCI-009140'}] \n", "4 GENIE-DFCI-009144 [{'system': 'GDC', 'value': 'GENIE-DFCI-009144'}] \n", ".. ... ... \n", "3 TCGA-EY-A72D [{'system': 'GDC', 'value': 'TCGA-EY-A72D'}, {... \n", "4 TCGA-IE-A4EH [{'system': 'GDC', 'value': 'TCGA-IE-A4EH'}, {... \n", "5 TCGA-IS-A3KA [{'system': 'GDC', 'value': 'TCGA-IS-A3KA'}, {... \n", "6 TCGA-NA-A4QY [{'system': 'GDC', 'value': 'TCGA-NA-A4QY'}, {... \n", "7 TCGA-VS-A9V3 [{'system': 'GDC', 'value': 'TCGA-VS-A9V3'}, {... \n", "\n", " species sex race ethnicity \\\n", "0 Homo sapiens female not reported not reported \n", "1 Homo sapiens None None None \n", "2 Homo sapiens female white not hispanic or latino \n", "3 Homo sapiens female white not hispanic or latino \n", "4 Homo sapiens female white not hispanic or latino \n", ".. ... ... ... ... \n", "3 Homo sapiens female black or african american not hispanic or latino \n", "4 Homo sapiens female white not hispanic or latino \n", "5 Homo sapiens female white not hispanic or latino \n", "6 Homo sapiens female white not hispanic or latino \n", "7 Homo sapiens female white not reported \n", "\n", " days_to_birth subject_associated_project vital_status age_at_death \\\n", "0 NaN [FM-AD] Not Reported NaN \n", "1 NaN [cptac_ucec] None NaN \n", "2 -18627.0 [GENIE-DFCI] Not Reported NaN \n", "3 -24837.0 [GENIE-DFCI] Not Reported NaN \n", "4 -19723.0 [GENIE-DFCI] Not Reported NaN \n", ".. ... ... ... ... 
\n", "3 -31818.0 [TCGA-UCEC, tcga_ucec] Alive NaN \n", "4 -12871.0 [tcga_sarc, TCGA-SARC] Alive NaN \n", "5 -26775.0 [tcga_sarc, TCGA-SARC] Dead 413.0 \n", "6 -22756.0 [tcga_ucs, TCGA-UCS] Dead 114.0 \n", "7 -22990.0 [TCGA-CESC, tcga_cesc] Alive NaN \n", "\n", " cause_of_death \n", "0 None \n", "1 None \n", "2 None \n", "3 None \n", "4 None \n", ".. ... \n", "3 None \n", "4 None \n", "5 None \n", "6 None \n", "7 None \n", "\n", "[2608 rows x 11 columns]" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "subsdf # view the subject dataframe" ] }, { "cell_type": "markdown", "id": "75bcbe86", "metadata": {}, "source": [ "\n", " \n", "Then Julia joins the two results into one big dataset, matching the `subject_id` field of the researchsubject dataframe against the `id` field of the subject dataframe:" ] }, { "cell_type": "code", "execution_count": 21, "id": "9b7a3383", "metadata": {}, "outputs": [], "source": [ "allmetadata = rsdf.set_index(\"subject_id\").join(subsdf.set_index(\"id\"), lsuffix='resub', rsuffix=\"subject\")\n" ] }, { "cell_type": "code", "execution_count": 22, "id": "a01f8c5c", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ididentifierresubmember_of_research_projectprimary_diagnosis_conditionprimary_diagnosis_siteidentifiersubjectspeciessexraceethnicitydays_to_birthsubject_associated_projectvital_statusage_at_deathcause_of_death
AD1000f08e2e9-9983-4204-972f-a630b7ab2c25[{'system': 'GDC', 'value': '0f08e2e9-9983-420...FM-ADSquamous Cell NeoplasmsCervix uteri[{'system': 'GDC', 'value': 'AD100'}]Homo sapiensfemalenot reportednot reportedNaN[FM-AD]Not ReportedNaNNone
AD10266d9d6cb9-652f-4749-b4c5-aa9e6b80de69[{'system': 'GDC', 'value': '6d9d6cb9-652f-474...FM-ADComplex Mixed and Stromal NeoplasmsUterus, NOS[{'system': 'GDC', 'value': 'AD1026'}]Homo sapiensfemalenot reportednot reportedNaN[FM-AD]Not ReportedNaNNone
AD10328514fc104-1ee5-4701-8f45-9a011143f1e2[{'system': 'GDC', 'value': '514fc104-1ee5-470...FM-ADSquamous Cell NeoplasmsCervix uteri[{'system': 'GDC', 'value': 'AD10328'}]Homo sapiensfemalenot reportednot reportedNaN[FM-AD]Not ReportedNaNNone
AD104608c36611d-be2f-432a-afde-e684ab4333ea[{'system': 'GDC', 'value': '8c36611d-be2f-432...FM-ADCystic, Mucinous and Serous NeoplasmsUterus, NOS[{'system': 'GDC', 'value': 'AD10460'}]Homo sapiensfemalenot reportednot reportedNaN[FM-AD]Not ReportedNaNNone
AD104850ad0fdda-dd96-48df-8edd-e5e471e9f680[{'system': 'GDC', 'value': '0ad0fdda-dd96-48d...FM-ADCystic, Mucinous and Serous NeoplasmsUterus, NOS[{'system': 'GDC', 'value': 'AD10485'}]Homo sapiensfemalenot reportednot reportedNaN[FM-AD]Not ReportedNaNNone
................................................
TCGA-ZJ-AB0HTCGA-ZJ-AB0H__tcga_cesc[{'system': 'IDC', 'value': 'TCGA-ZJ-AB0H'}]tcga_cescNoneCervix[{'system': 'GDC', 'value': 'TCGA-ZJ-AB0H'}, {...Homo sapiensfemalenot reportednot reported-17869.0[TCGA-CESC, tcga_cesc]AliveNaNNone
TCGA-ZJ-AB0Ia4f13656-a941-498a-9ac9-f020ed559b35[{'system': 'GDC', 'value': 'a4f13656-a941-498...TCGA-CESCSquamous Cell NeoplasmsCervix uteri[{'system': 'GDC', 'value': 'TCGA-ZJ-AB0I'}, {...Homo sapiensfemalewhitenot hispanic or latino-9486.0[TCGA-CESC, tcga_cesc]AliveNaNNone
TCGA-ZJ-AB0ITCGA-ZJ-AB0I__tcga_cesc[{'system': 'IDC', 'value': 'TCGA-ZJ-AB0I'}]tcga_cescNoneCervix[{'system': 'GDC', 'value': 'TCGA-ZJ-AB0I'}, {...Homo sapiensfemalewhitenot hispanic or latino-9486.0[TCGA-CESC, tcga_cesc]AliveNaNNone
TCGA-ZX-AA5X4756acc0-4e96-44d4-b359-04d64dc7eb84[{'system': 'GDC', 'value': '4756acc0-4e96-44d...TCGA-CESCSquamous Cell NeoplasmsCervix uteri[{'system': 'GDC', 'value': 'TCGA-ZX-AA5X'}, {...Homo sapiensfemalewhitenot hispanic or latino-23440.0[TCGA-CESC, tcga_cesc]AliveNaNNone
TCGA-ZX-AA5XTCGA-ZX-AA5X__tcga_cesc[{'system': 'IDC', 'value': 'TCGA-ZX-AA5X'}]tcga_cescNoneCervix[{'system': 'GDC', 'value': 'TCGA-ZX-AA5X'}, {...Homo sapiensfemalewhitenot hispanic or latino-23440.0[TCGA-CESC, tcga_cesc]AliveNaNNone
\n", "

3196 rows × 15 columns

\n", "
" ], "text/plain": [ " id \\\n", "AD100 0f08e2e9-9983-4204-972f-a630b7ab2c25 \n", "AD1026 6d9d6cb9-652f-4749-b4c5-aa9e6b80de69 \n", "AD10328 514fc104-1ee5-4701-8f45-9a011143f1e2 \n", "AD10460 8c36611d-be2f-432a-afde-e684ab4333ea \n", "AD10485 0ad0fdda-dd96-48df-8edd-e5e471e9f680 \n", "... ... \n", "TCGA-ZJ-AB0H TCGA-ZJ-AB0H__tcga_cesc \n", "TCGA-ZJ-AB0I a4f13656-a941-498a-9ac9-f020ed559b35 \n", "TCGA-ZJ-AB0I TCGA-ZJ-AB0I__tcga_cesc \n", "TCGA-ZX-AA5X 4756acc0-4e96-44d4-b359-04d64dc7eb84 \n", "TCGA-ZX-AA5X TCGA-ZX-AA5X__tcga_cesc \n", "\n", " identifierresub \\\n", "AD100 [{'system': 'GDC', 'value': '0f08e2e9-9983-420... \n", "AD1026 [{'system': 'GDC', 'value': '6d9d6cb9-652f-474... \n", "AD10328 [{'system': 'GDC', 'value': '514fc104-1ee5-470... \n", "AD10460 [{'system': 'GDC', 'value': '8c36611d-be2f-432... \n", "AD10485 [{'system': 'GDC', 'value': '0ad0fdda-dd96-48d... \n", "... ... \n", "TCGA-ZJ-AB0H [{'system': 'IDC', 'value': 'TCGA-ZJ-AB0H'}] \n", "TCGA-ZJ-AB0I [{'system': 'GDC', 'value': 'a4f13656-a941-498... \n", "TCGA-ZJ-AB0I [{'system': 'IDC', 'value': 'TCGA-ZJ-AB0I'}] \n", "TCGA-ZX-AA5X [{'system': 'GDC', 'value': '4756acc0-4e96-44d... \n", "TCGA-ZX-AA5X [{'system': 'IDC', 'value': 'TCGA-ZX-AA5X'}] \n", "\n", " member_of_research_project \\\n", "AD100 FM-AD \n", "AD1026 FM-AD \n", "AD10328 FM-AD \n", "AD10460 FM-AD \n", "AD10485 FM-AD \n", "... ... \n", "TCGA-ZJ-AB0H tcga_cesc \n", "TCGA-ZJ-AB0I TCGA-CESC \n", "TCGA-ZJ-AB0I tcga_cesc \n", "TCGA-ZX-AA5X TCGA-CESC \n", "TCGA-ZX-AA5X tcga_cesc \n", "\n", " primary_diagnosis_condition primary_diagnosis_site \\\n", "AD100 Squamous Cell Neoplasms Cervix uteri \n", "AD1026 Complex Mixed and Stromal Neoplasms Uterus, NOS \n", "AD10328 Squamous Cell Neoplasms Cervix uteri \n", "AD10460 Cystic, Mucinous and Serous Neoplasms Uterus, NOS \n", "AD10485 Cystic, Mucinous and Serous Neoplasms Uterus, NOS \n", "... ... ... 
\n", "TCGA-ZJ-AB0H None Cervix \n", "TCGA-ZJ-AB0I Squamous Cell Neoplasms Cervix uteri \n", "TCGA-ZJ-AB0I None Cervix \n", "TCGA-ZX-AA5X Squamous Cell Neoplasms Cervix uteri \n", "TCGA-ZX-AA5X None Cervix \n", "\n", " identifiersubject species \\\n", "AD100 [{'system': 'GDC', 'value': 'AD100'}] Homo sapiens \n", "AD1026 [{'system': 'GDC', 'value': 'AD1026'}] Homo sapiens \n", "AD10328 [{'system': 'GDC', 'value': 'AD10328'}] Homo sapiens \n", "AD10460 [{'system': 'GDC', 'value': 'AD10460'}] Homo sapiens \n", "AD10485 [{'system': 'GDC', 'value': 'AD10485'}] Homo sapiens \n", "... ... ... \n", "TCGA-ZJ-AB0H [{'system': 'GDC', 'value': 'TCGA-ZJ-AB0H'}, {... Homo sapiens \n", "TCGA-ZJ-AB0I [{'system': 'GDC', 'value': 'TCGA-ZJ-AB0I'}, {... Homo sapiens \n", "TCGA-ZJ-AB0I [{'system': 'GDC', 'value': 'TCGA-ZJ-AB0I'}, {... Homo sapiens \n", "TCGA-ZX-AA5X [{'system': 'GDC', 'value': 'TCGA-ZX-AA5X'}, {... Homo sapiens \n", "TCGA-ZX-AA5X [{'system': 'GDC', 'value': 'TCGA-ZX-AA5X'}, {... Homo sapiens \n", "\n", " sex race ethnicity days_to_birth \\\n", "AD100 female not reported not reported NaN \n", "AD1026 female not reported not reported NaN \n", "AD10328 female not reported not reported NaN \n", "AD10460 female not reported not reported NaN \n", "AD10485 female not reported not reported NaN \n", "... ... ... ... ... \n", "TCGA-ZJ-AB0H female not reported not reported -17869.0 \n", "TCGA-ZJ-AB0I female white not hispanic or latino -9486.0 \n", "TCGA-ZJ-AB0I female white not hispanic or latino -9486.0 \n", "TCGA-ZX-AA5X female white not hispanic or latino -23440.0 \n", "TCGA-ZX-AA5X female white not hispanic or latino -23440.0 \n", "\n", " subject_associated_project vital_status age_at_death \\\n", "AD100 [FM-AD] Not Reported NaN \n", "AD1026 [FM-AD] Not Reported NaN \n", "AD10328 [FM-AD] Not Reported NaN \n", "AD10460 [FM-AD] Not Reported NaN \n", "AD10485 [FM-AD] Not Reported NaN \n", "... ... ... ... 
\n", "TCGA-ZJ-AB0H [TCGA-CESC, tcga_cesc] Alive NaN \n", "TCGA-ZJ-AB0I [TCGA-CESC, tcga_cesc] Alive NaN \n", "TCGA-ZJ-AB0I [TCGA-CESC, tcga_cesc] Alive NaN \n", "TCGA-ZX-AA5X [TCGA-CESC, tcga_cesc] Alive NaN \n", "TCGA-ZX-AA5X [TCGA-CESC, tcga_cesc] Alive NaN \n", "\n", " cause_of_death \n", "AD100 None \n", "AD1026 None \n", "AD10328 None \n", "AD10460 None \n", "AD10485 None \n", "... ... \n", "TCGA-ZJ-AB0H None \n", "TCGA-ZJ-AB0I None \n", "TCGA-ZJ-AB0I None \n", "TCGA-ZX-AA5X None \n", "TCGA-ZX-AA5X None \n", "\n", "[3196 rows x 15 columns]" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "allmetadata" ] }, { "cell_type": "markdown", "id": "024da831", "metadata": {}, "source": [ "\n", " \n", "She then saves the result to a CSV file so she can browse it in Excel:" ] }, { "cell_type": "code", "execution_count": 23, "id": "b6628de4", "metadata": {}, "outputs": [], "source": [ "allmetadata.to_csv(\"allmetadata.csv\")" ] }, { "cell_type": "markdown", "id": "246644d3", "metadata": {}, "source": [ "\n", " \n", "Julia knows from her subject count summary that there are 33480 files associated with her subjects, which is likely far more than she needs. To help her decide which files she wants, Julia uses endpoint chaining to get summary information about the files that are assigned to researchsubjects matching her search criteria.\n" ] }, { "cell_type": "code", "execution_count": 26, "id": "ae1ae079", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
Getting results from database\n",
       "\n",
       "
\n" ], "text/plain": [ "Getting results from database\n", "\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
{\"timestamp\":1656003440040,\"status\":404,\"error\":\"Not \n",
       "Found\",\"path\":\"//api/v1/researchsubjects/files/counts/all_Subjects_v3_0_w_RS\"}\n",
       "
\n" ], "text/plain": [ "\u001b[1m{\u001b[0m\u001b[32m\"timestamp\"\u001b[0m:\u001b[1;36m1656003440040\u001b[0m,\u001b[32m\"status\"\u001b[0m:\u001b[1;36m404\u001b[0m,\u001b[32m\"error\"\u001b[0m:\u001b[32m\"Not \u001b[0m\n", "\u001b[32mFound\"\u001b[0m,\u001b[32m\"path\"\u001b[0m:\u001b[32m\"//api/v1/researchsubjects/files/counts/all_Subjects_v3_0_w_RS\"\u001b[0m\u001b[1m}\u001b[0m\n" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Total execution time: 57 ms\n" ] } ], "source": [ "NoAdenoData.researchsubject.file.count.run()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.13" } }, "nbformat": 4, "nbformat_minor": 5 }