How does the HPO project create the annotations?
The HPO project is indebted to OMIM at the very beginning of our project, we used text-mining to derive the first set of annotations for each disease from the Clinical Synopsis section of the corresponding OMIM entry (See PMID:18950739). Our project used these text-mining scripts until about 2018. Since then, we have used manual biocuration to add phenotype annotations directly from the primary literature. We rely on OMIM to confirm and name newly discovered diseases, and our annotations are made to MIM identifiers, such as OMIM:120100 for Familial cold inflammatory syndrome 1.
All HPO annotations are made available in the phenotype.hpoa file. See that page for more information on file format. The evidence code (IEA, TAS, or PCS) in field 6 indicates what type of biocuration was used for each line of the file.
IEA
These are lines that were "inferred by electronic annotation" (IEA), meaning that these annotations were created by text mining from the OMIM Clinical Synopsis. The following example shows such a line in tabular form (Click to show table).
Example IEA row (12 fields)
| Field | Value |
|---|---|
| database_id | OMIM:611705 |
| disease_name | Congenital myopathy 5 with cardiomyopathy |
| qualifier | — |
| hpo_id | HP:0030059 |
| reference | OMIM:611705 |
| evidence | IEA |
| onset | — |
| frequency | — |
| sex | — |
| modifier | — |
| aspect | P |
| biocuration | HPO:skoehler[2018-10-08] |
This means that text-mining was used (IEA) using code by Sebastian Koehler, performed in 2018-Oct-08.
TAS
These lines have a "traceable author statement" (TAS). If the reference is indicated as OMIM, this means that the biocurator confirmed the annotation by consulting the OMIM page.
Example TAS row (12 fields)
| Field | Value |
|---|---|
| database_id | OMIM:183400 |
| disease_name | Split lower lip |
| qualifier | — |
| hpo_id | HP:0000178 |
| reference | OMIM:183400 |
| evidence | TASP |
| onset | — |
| frequency | — |
| sex | — |
| modifier | — |
| aspect | — |
| biocuration | HPO:lccarmody[2018-10-03] |
PCS
These lines are derived from a published clinical study (PCS).
They were created by manual biocuration of the indicated article (PMID).
Example PCS row (12 fields)
| Field | Value |
|---|---|
| database_id | OMIM:619371 |
| disease_name | Cardiomyopathy, dilated, 2D |
| qualifier | — |
| hpo_id | HP:0003593 |
| reference | PMID:32514796;PMID:32870709 |
| evidence | PCS |
| onset | — |
| frequency | 9/12 |
| sex | — |
| modifier | — |
| aspect | C |
| biocuration | HPO:probinson[2022-07-04] |
Orphanet annotations
The annotations of the HPO project are to OMIM disease identifiers. We additionally offer annotations created by expert panels by the Orphanet consortium. These annotations represent a complementary resource. Orphanet annotations use ORPHA identifiers, e.g., ORPHA:221139.
In general, we recommend using either the HPO project or the ORPHA annotations unless you have a specific reason for combining the two sources.