MyDisease.info documentation

Introduction

mydisease.info

MyDisease.info provides simple-to-use REST web services to query/retrieve all disease annotation data. It’s designed with an emphasis on simplicity and performance.

Quick start

MyDisease.info provides two simple web services: one for querying disease objects and the other for disease annotation retrieval by common IDs (e.g. mondo, doid, etc.). Both return results in JSON format.

Disease query service

URL

http://mydisease.info/v1/query

Examples

http://mydisease.info/v1/query?q=GIST
http://mydisease.info/v1/query?q=_exists_:ctd
http://mydisease.info/v1/query?q=q=disgenet.genes_related_to_disease.gene_name:OFD1&fields=disgenet

Hint

View nicely formatted JSON result in your browser with this handy add-on: JSON formatter for Chrome or JSONView for Firefox.

To learn more

Disease annotation service

URL

http://mydisease.info/v1/disease/<disease_id>

<disease_id> can be any one of the following common disease identifiers:

  • MONDO,
  • Disease Ontology ID <https://disease-ontology.org/>.

Examples

http://mydisease.info/v1/disease/MONDO:0016575
http://mydisease.info/v1/disease/MONDO:0020753?fields=mondo
http://mydisease.info/v1/disease/MONDO:0011996?fields=disease_ontology

To learn more

Documentation

Disease annotation data

Data sources

We currently obtain disease annotation data from several data resources and keep them up-to-date, so that you don’t have to do it:

* key name: this is the key for the specific annotation data in a disease object.

The most updated information can be accessed here.

Note

Each data source may have its own usage restrictions. Please refer to the data source pages above for their specific restrictions.

Disease object

Disease annotation data are both stored and returned as a disease object, which is essentially a collection of fields (attributes) and their values:

{
    "_id": "MONDO:0020753",
    "_version": 1,
    "disease_ontology": {
        "_license": "https://github.com/DiseaseOntology/HumanDiseaseOntology/blob/master/DO_LICENSE.txt",
        "ancestors": [
            "DOID:0050117",
            "DOID:4",
            "DOID:934"
        ],
        "children": [
            "DOID:0080600",
            "DOID:0080642",
            "DOID:2945"
        ],
        "def": "\"A viral infectious disease that has_material_basis_in Coronavirus.\" [url:https\\://www.cdc.gov/coronavirus/, url:https\\://www.ncbi.nlm.nih.gov/books/NBK7782/, url:https\\://www.who.int/health-topics/coronavirus]",
        "descendants": [
            "DOID:0080600",
            "DOID:0080642",
            "DOID:2945"
        ],
        "doid": "DOID:0080599",
        "name": "Coronavirus infectious disease",
        "parents": [
            "DOID:934"
        ],
        "synonyms": {},
        "xrefs": {}
    },
    "mondo": {
        "children": [
            "MONDO:0005091",
            "MONDO:0025404",
            "MONDO:0025420",
            "MONDO:0025491",
            "MONDO:0100096",
            "MONDO:0100116"
        ],
        "definition": "Infectious disease causes by viruses in the subfamily Orthocoronavirinae (coronaviruses). In humans, coronaviruses cause respiratory tract infections that can be mild, such as some cases of the common cold (among other possible causes, predominantly rhinoviruses), and others that can be lethal, such as SARS, MERS, and COVID-19.",
        "label": "Orthocoronavirinae infectious disease",
        "mondo": "MONDO:0020753",
        "parents": "MONDO:0005718",
        "synonyms": "coronavirus infectious disease",
        "xrefs": {
            "doid": "DOID:0080599"
        }
    }
}

The example above omits many of the available fields. For a full example, check out this example disease, or try the interactive API page.

_id field

Each individual disease object contains an “_id” field as the primary key. Where possible, MyDisease.info disease objects use MONDO as their “_id”. If a MONDO isn’t available, any one of the following datasource IDs may be used:

_score field

You will often see a “_score” field in the returned disease object, which is the internal score representing how well the query matches the returned disease object. It probably does not mean much in disease annotation service when only one disease object is returned. In disease query service, by default, the returned disease hits are sorted by the scores in descending order.

Available fields

The table below lists all of the possible fields that could be in a disease object, as well as all of their parents (for nested fields). If the field is indexed, it may also be directly queried.

Field Indexed Type Notes

Data release notes

This page contains metadata about each MyDisease.info data release. Click a link to see more.

MyDisease Releases

Loading release data . . .

Disease query service

This page describes the reference for MyDisease.info disease query web service. It’s also recommended to try it live on our interactive API page.

Service endpoint

http://mydisease.info/v1/query

GET request

Query parameters
q
Required, passing user query. The detailed query syntax for parameter “q” we explained below.
fields
Optional, a comma-separated string to limit the fields returned from the matching disease hits. The supported field names can be found from any disease object (e.g. here). Note that it supports dot notation, and wildcards as well, e.g., you can pass “mondo”, “mondo.xrefs”, or “ctd.chemical_related_to_disease.*”. If “fields=all”, all available fields will be returned. Default: “all”.
size
Optional, the maximum number of matching disease hits to return (with a cap of 1000 at the moment). Default: 10.
from
Optional, the number of matching disease hits to skip, starting from 0. Default: 0

Hint

The combination of “size” and “from” parameters can be used to get paging for large query:

q=ctd.chemical_related_to_disease.chemical_name:zidovudine*&size=50                     first 50 hits
q=ctd.chemical_related_to_disease.chemical_name:zidovudine*&size=50&from=50             the next 50 hits
fetch_all
Optional, a boolean, which when TRUE, allows fast retrieval of all unsorted query hits. The return object contains a _scroll_id field, which when passed as a parameter to the query endpoint, returns the next 1000 query results. Setting fetch_all = TRUE causes the results to be inherently unsorted, therefore the sort parameter is ignored. For more information see examples using fetch_all here. Default: FALSE.
scroll_id
Optional, a string containing the _scroll_id returned from a query request with fetch_all = TRUE. Supplying a valid scroll_id will return the next 1000 unordered results. If the next results are not obtained within 1 minute of the previous set of results, the scroll_id becomes stale, and a new one must be obtained with another query request with fetch_all = TRUE. All other parameters are ignored when the scroll_id parameter is supplied. For more information see examples using scroll_id here.
sort
Optional, the comma-separated fields to sort on. Prefix with “-” for descending order, otherwise in ascending order. Default: sort by matching scores in descending order.
facets
Optional, a single field or comma-separated fields to return facets, can only be used on non-free text fields. E.g. “facets=disgenet.genes_related_to_disease.YearFinal”. See examples of faceted queries here.
facet_size
Optional, an integer (1 <= facet_size <= 1000) that specifies how many buckets to return in a faceted query.
callback
Optional, you can pass a “callback” parameter to make a JSONP call.
dotfield
Optional, can be used to control the format of the returned disease object. If “dotfield” is true, the returned data object is returned flattened (no nested objects) using dotfield notation for key names. Default: false.
email
Optional, if you are regular users of our services, we encourage you to provide us an email, so that we can better track the usage or follow up with you.
Query syntax

Examples of query parameter “q”:

Simple queries

search for everything:

q=coronavirus                        # search all default fields for term
Fielded queries
q=mondo.synonyms:PCD               # for matching value on a specific field

q=mondo.synonyms:(PCD)            # multiple values for a field
q=mondo.synonyms:(PCD OR PIST)         # multiple values for a field using OR

q=_exists_:ctd                        # having ctd field
q=NOT _exists_:ctd                   # missing ctd field

Hint

For a list of available fields, see here.

Range queries
q=disgenet.genes_related_to_disease.DSI:<0.5
q=disgenet.genes_related_to_disease.DSI:>0.5

q=disgenet.genes_related_to_disease.YearFinal:[1998 TO 2008]         # bounded (including 200 and 500)
Wildcard queries

Wildcard character “*” or “?” is supported in either simple queries or fielded queries:

q=mondo.label:primary*

Note

Wildcard character can not be the first character. It will be ignored.

Scrolling queries

If you want to return ALL results of a very large query, sometimes the paging method described above can take too long. In these cases, you can use a scrolling query. This is a two-step process that turns off database sorting to allow very fast retrieval of all query results. To begin a scrolling query, you first call the query endpoint as you normally would, but with an extra parameter fetch_all = TRUE. For example, a GET request to:

http://mydisease.info/v1/query?q=_exists_:disgenet&fields=disgenet.genes_related_to_disease.gene_name&fetch_all=TRUE

Returns the following object:

{
  "_scroll_id": "cXVlcnlUaGVuRmV0Y2g7MTA7Njg4ODAwOTI6SmU0ck9oMTZUUHFyRXlYSTNPS2pMZzs2ODg4MDA5MTpKZTRyT2gxNlRQcXJFeVhJM09LakxnOzY4ODgwMDkzOkplNHJPaDE2VFBxckV5WEkzT0tqTGc7Njg4ODAwOTQ6SmU0ck9oMTZUUHFyRXlYSTNPS2pMZzs2ODg4MDEwMDpKZTRyT2gxNlRQcXJFeVhJM09LakxnOzY4ODgwMDk2OkplNHJPaDE2VFBxckV5WEkzT0tqTGc7Njg4ODAwOTg6SmU0ck9oMTZUUHFyRXlYSTNPS2pMZzs2ODg4MDA5NzpKZTRyT2gxNlRQcXJFeVhJM09LakxnOzY4ODgwMDk5OkplNHJPaDE2VFBxckV5WEkzT0tqTGc7Njg4ODAwOTU6SmU0ck9oMTZUUHFyRXlYSTNPS2pMZzswOw==",
  "max_score": 1.0,
  "took": 2042,
  "total": 10869,
  "hits": [
    {
        '_id': 'C4017543',
        '_score': 1.0,
        'disgenet': {
            'genes_related_to_disease': {
                'gene_name': 'BCHE'
            }
        }
    },
    {
        '_id': 'C4017544',
        '_score': 1.0,
        'disgenet': {
            'genes_related_to_disease': {
                'gene_name': 'BCHE'
            }
        }
    },
    .
    .
    .
  ],
}

At this point, the first 1000 hits have been returned (of ~11,000 total), and a scroll has been set up for your query. To get the next batch of 1000 unordered results, simply execute a GET request to the following address, supplying the _scroll_id from the first step into the scroll_id parameter in the second step:

http://mydisease.info/v1/query?scroll_id=cXVlcnlUaGVuRmV0Y2g7MTA7Njg4ODAwOTI6SmU0ck9oMTZUUHFyRXlYSTNPS2pMZzs2ODg4MDA5MTpKZTRyT2gxNlRQcXJFeVhJM09LakxnOzY4ODgwMDkzOkplNHJPaDE2VFBxckV5WEkzT0tqTGc7Njg4ODAwOTQ6SmU0ck9oMTZUUHFyRXlYSTNPS2pMZzs2ODg4MDEwMDpKZTRyT2gxNlRQcXJFeVhJM09LakxnOzY4ODgwMDk2OkplNHJPaDE2VFBxckV5WEkzT0tqTGc7Njg4ODAwOTg6SmU0ck9oMTZUUHFyRXlYSTNPS2pMZzs2ODg4MDA5NzpKZTRyT2gxNlRQcXJFeVhJM09LakxnOzY4ODgwMDk5OkplNHJPaDE2VFBxckV5WEkzT0tqTGc7Njg4ODAwOTU6SmU0ck9oMTZUUHFyRXlYSTNPS2pMZzswOw==

Hint

Your scroll will remain active for 1 minute from the last time you requested results from it. If your scroll expires before you get the last batch of results, you must re-request the scroll_id by setting fetch_all = TRUE as in step 1.

Hint

When you need to use this “scrolling query” feature via “fetch_all” parameter, we recommend you to use our Python client “biothings_client”.

Boolean operators and grouping

You can use AND/OR/NOT boolean operators and grouping to form complicated queries:

q=_exists_:ctd AND _exists_:disgenet                              AND operator
q=_exists_:ctd AND NOT _exists_:disgenet                         NOT operator
q=_exists_:ctd OR (_exists_:disgenet AND _exists_:hpo)           grouping with ()
Escaping reserved characters

If you need to use these reserved characters in your query, make sure to escape them using a back slash (”"):

+ - = && || > < ! ( ) { } [ ] ^ " ~ * ? : \ /
Returned object

A GET request like this:

http://mydisease.info/v1/query?q=disgenet.genes_related_to_disease.gene_name:OFD1&fields=mondo.label

should return hits as:

{
    'took': 4,
    'total': 14,
    'max_score': 6.271054,
    'hits': [
        {
            '_id': 'C4277690',
            '_score': 6.271054
        },
        {
            '_id': 'MONDO:0017892',
            '_score': 6.271054,
            'mondo': {
                'label': 'autosomal recessive myogenic arthrogryposis multiplex congenita'
            }
        },
        {
            '_id': 'MONDO:0009360',
            '_score': 6.271054,
            'mondo': {
                'label': 'hydrocephalus, nonsyndromic, autosomal recessive 1'
            }
        },
        {
            '_id': 'MONDO:0010702',
            '_score': 6.271054,
            'mondo': {
                'label': 'orofaciodigital syndrome I'
            }
        },
        {
            '_id': 'MONDO:0010704',
            '_score': 6.271054,
            'mondo': {
                'label': 'otopalatodigital syndrome type 1'
            }
        },
        {
            '_id': 'MONDO:0016023',
            '_score': 6.271054,
            'mondo': {
                'label': 'ocular coloboma'
            }
        },
        {
            '_id': 'MONDO:0010176',
            '_score': 6.271054,
            'mondo': {
                'label': 'orofaciodigital syndrome type 6'
            }
        },
        {
            '_id': 'MONDO:0010265',
            '_score': 6.271054,
            'mondo': {
                'label': 'Simpson-Golabi-Behmel syndrome type 2'
            }
        },
        {
            '_id': 'MONDO:0010320',
            '_score': 6.271054,
            'mondo': {
                'label': 'retinitis pigmentosa 23'
            }
        },
        {
            '_id': 'MONDO:0010431',
            '_score': 6.271054,
            'mondo': {
                'label': 'Joubert syndrome 10'
            }
        }
    ]
}

total” in the output gives the total number of matching hits, while the actual hits are returned under “hits” field. “size” parameter controls how many hits will be returned in one request (default is 10). Adjust “size” parameter and “from” parameter to retrieve the additional hits.

Faceted queries

If you need to perform a faceted query, you can pass an optional “facets” parameter.

A GET request like this:

http://mydisease.info/v1/query?q=disgenet.genes_related_to_disease.gene_name:OFD1&facets=disgenet.genes_related_to_disease.YearFinal&size=0

should return hits as:

{
    'took': 130,
    'total': 14,
    'max_score': 0.0,
    'facets': {
        'disgenet.genes_related_to_disease.YearFinal': {
            '_type': 'terms',
            'terms': [
                {
                    'count': 9,
                    'term': 2013.0
                },
                {
                    'count': 9,
                    'term': 2017.0
                },
                {
                    'count': 7,
                    'term': 2007.0
                },
                {
                    'count': 7,
                    'term': 2015.0
                },
                {
                    'count': 7,
                    'term': 2016.0
                },
                {
                    'count': 7,
                    'term': 2018.0
                },
                {
                    'count': 6,
                    'term': 2010.0
                },
                {
                    'count': 6,
                    'term': 2012.0
                },
                {
                    'count': 6,
                    'term': 2014.0
                },
                {
                    'count': 5,
                    'term': 2011.0
                }
            ],
            'other': 38,
            'missing': 0,
            'total': 69
        }
    },
    'hits': []
}

Batch queries via POST

Although making simple GET requests above to our disease query service is sufficient for most use cases, there are times you might find it more efficient to make batch queries (e.g., retrieving disease annotation for multiple diseases). Fortunately, you can also make batch queries via POST requests when you need:

URL: http://mydisease.info/v1/query
HTTP method:  POST
Query parameters
q
Required, multiple query terms seperated by comma (also support “+” or white space), but no wildcard, e.g., ‘q=SDUQYLNIPVEERB-QPPQHZFASA-N,SESFRYSPDFLNCH-UHFFFAOYSA-N’
scopes
Optional, specify one or more fields (separated by comma) as the search “scopes”, e.g., “scopes=ctd”. The available “fields” can be passed to “scopes” parameter are listed here. Default:
fields
Optional, a comma-separated string to limit the fields returned from the matching disease hits. The supported field names can be found from any disease object. Note that it supports dot notation, and wildcards as well, e.g., you can pass “ctd”, “mondo.label”, or “mondo.xrefs.*”. If “fields=all”, all available fields will be returned. Default: “all”.
email
Optional, if you are regular users of our services, we encourage you to provide us an email, so that we can better track the usage or follow up with you.
Example code

Unlike GET requests, you can easily test them from browser, make a POST request is often done via a piece of code. Here is a sample python snippet using httplib2 module:

import httplib2
h = httplib2.Http()
headers = {'content-type': 'application/x-www-form-urlencoded'}
params = 'q=DOID:14566,DOID:1240&scopes=disease_ontology.doid&fields=disease_ontology.name'
res, con = h.request('http://mydisease.info/v1/query', 'POST', params, headers=headers)

or this example using requests module:

import requests
params = {'q': 'DOID:14566,DOID:1240', 'scopes': 'disease_ontology.doid', 'fields': 'disease_ontology.name'}
res = requests.post('http://mydisease.info/v1/query', params)
con = res.json()
Returned object

Returned result (the value of “con” variable above) from above example code should look like this:

[
    {
        'query': 'DOID:14566',
        '_id': 'MONDO:0005070',
        '_score': 8.824187,
        'disease_ontology': {
            '_license': 'https://github.com/DiseaseOntology/HumanDiseaseOntology/blob/master/DO_LICENSE.txt',
            'name': 'disease of cellular proliferation'
        }
    },
    {
        'query': 'DOID:1240',
        '_id': 'MONDO:0005059',
        '_score': 8.824187,
        'disease_ontology': {
            '_license': 'https://github.com/DiseaseOntology/HumanDiseaseOntology/blob/master/DO_LICENSE.txt',
            'name': 'leukemia'
        }
    }
]

Tip

“query” field in returned object indicates the matching query term.

If a query term has no match, it will return with “notfound” field as “true”:

[
  ...,
  {'query': '...',
   'notfound': true},
  ...
]

Disease annotation service

This page describes the reference for the MyDisease.info disease annotation web service. It’s also recommended to try it live on our interactive API page.

Service endpoint

http://mydisease.info/v1/disease

GET request

Obtaining the disease annotation via our web service is as simple as calling this URL:

http://mydisease.info/v1/disease/<diseaseid>

diseaseid above is any one of several common disease identifiers: MONDO.

By default, this will return the complete disease annotation object in JSON format. See here for an example and here for more details. If the input diseaseid is not valid, 404 (NOT FOUND) will be returned.

Optionally, you can pass a “fields” parameter to return only the annotation you want (by filtering returned object fields):

http://mydisease.info/v1/disease/MONDO:0016575?fields=mondo

fields” accepts any attributes (a.k.a fields) available from the disease object. Multiple attributes should be separated by commas. If an attribute is not available for a specific disease object, it will be ignored. Note that the attribute names are case-sensitive.

Just like the disease query service, you can also pass a “callback” parameter to make a JSONP call.

Query parameters
fields
Optional, can be a comma-separated fields to limit the fields returned from the disease object. If “fields=all”, all available fields will be returned. Note that it supports dot notation as well, e.g., you can pass “mondo.label”. Default: “fields=all”.
callback
Optional, you can pass a “callback” parameter to make a JSONP call.
filter
Alias for “fields” parameter.
email
Optional, if you are regular users of our services, we encourage you to provide us an email, so that we can better track the usage or follow up with you.

Returned object

A GET request like this:

http://mydisease.info/v1/disease/MONDO:0016575?fields=mondo

should return a disease object below:

Batch queries via POST

Although making simple GET requests above to our disease query service is sufficient in most use cases, there are some times you might find it’s easier to batch query (e.g., retrieving disease annotations for multiple diseases). Fortunately, you can also make batch queries via POST requests when you need:

URL: http://mydisease.info/v1/disease
HTTP method:  POST
Query parameters
ids
Required. Accept multiple disease ids separated by comma, e.g., “ids=SDUQYLNIPVEERB-QPPQHZFASA-N,SESFRYSPDFLNCH-UHFFFAOYSA-N,SHGAZHPCJJPHSC-ZVCIMWCZSA-N”. Note that currently we only take the input ids up to 1000 maximum, the rest will be omitted.
fields
Optional, can be a comma-separated fields to limit the fields returned from the matching hits. If “fields=all”, all available fields will be returned. Note that it supports dot notation as well, e.g., you can pass “ctd” or “mondo.label”. Default: “all”.
email
Optional, if you are regular users of our services, we encourage you to provide us an email, so that we can better track the usage or follow up with you.
Example code

Unlike GET requests, you can easily test them from browser, make a POST request is often done via a piece of code, still trivial of course. Here is a sample python snippe using httplib2 modulet:

import httplib2
h = httplib2.Http()
headers = {'content-type': 'application/x-www-form-urlencoded'}
params = 'ids=MONDO:0016575,MONDO:0020753,MONDO:0011996&fields=mondo.xrefs.doid'
res, con = h.request('http://mydisease.info/v1/disease', 'POST', params, headers=headers)

or this example using requests module:

import requests
params = {'ids': 'MONDO:0016575,MONDO:0020753,MONDO:0011996', 'fields': 'mondo.xrefs.doid'}
res = requests.post('http://mydisease.info/v1/disease', params)
con = res.json()
Returned object

Returned result (the value of “con” variable above) from above example code should look like this:

[
    {
        'query': 'MONDO:0016575',
        '_id': 'MONDO:0016575',
        '_version': 1,
        'mondo': {
            'xrefs': {
                'doid': 'DOID:0050144'
            }
        }
    },
    {
        'query': 'MONDO:0020753',
        '_id': 'MONDO:0020753',
        '_version': 1,
        'mondo': {
            'xrefs': {
                'doid': 'DOID:0080599'
            }
        }
    },
    {
        'query': 'MONDO:0011996',
        '_id': 'MONDO:0011996',
        '_version': 1,
        'mondo': {
            'xrefs': {
                'doid': 'DOID:8552'
            }
        }
    }
]

Server response

The MyDisease.info server returns a variety of query responses, and response status codes. They are listed here.

Note

These examples show query responses using the python requests package.

Status code 200

A 200 status code indicates a successful query, and is accompanied by the query response payload.

In [1]: import requests

In [2]: r = requests.get('http://mydisease.info/v1/query?q=_exists_:ctd')

In [3]: r.status_code
Out[3]: 200

In [4]: data = r.json()

In [5]: data.keys()
Out[5]: dict_keys(['total', 'max_score', 'took', 'hits'])

Status code 400

A 400 status code indicates an improperly formed query, and is accompanied by a response payload describing the source of the error.

In [6]: r = requests.get('http://mydisease.info/v1/query?q=_exists_:ctd&size=u')

In [7]: r.status_code
Out[7]: 400

In [8]: data = r.json()

In [9]: data
Out[9]:
{'error': "Expected 'size' parameter to have integer type.  Couldn't convert 'u' to integer",
 'success': False}

Status code 404

A 404 status code indicates either an unrecognized URL, as in (/query is misspelled /quer resulting in an unrecognized URL):

In [10]: r = requests.get('http://mydisease.info/v1/quer?q=_exists_:ctd')

In [11]: r.status_code
Out[11]: 404

or, for the /disease endpoint, a 404 status code could be from querying for a nonexistent disease ID, as in:

In [12]: r = requests.get('http://mydisease.info/v1/disease/5')

In [13]: r.status_code
Out[13]: 404

In [14]: data = r.json()

In [15]: data
Out[15]:
{'error': "ID '5' not found",
 'success': False}

Status code 5xx

Any 5xx status codes are the result of uncaught query errors. Ideally, these should never occur. We routinely check our logs for these types of errors and add code to catch them, but if you see any status 5xx responses, please submit a bug report to biothings@googlegroups.com.

Biothings_client python module

You can access the MyDisease.info services programmatically with our biothings_client unified python client.