
GA4GH Phenopackets

The Global Alliance for Genomics and Health (GA4GH) is a standards-setting organization that is developing a suite of coordinated standards for genomics. The GA4GH Phenopacket Schema is a standard for sharing disease and phenotype information that characterizes an individual person or biosample (Jacobsen et al., 2022).

The Phenopacket Schema is flexible and can represent clinical data for any kind of human disease including rare disease, complex disease, and cancer. It also allows consortia or databases to apply additional constraints to ensure uniform data collection for specific goals.
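As an illustration of what such an additional constraint might look like, the sketch below checks a phenopacket (held as a Python dict, assuming the camelCase field names of the v2 JSON serialization) against a hypothetical consortium profile. The rules themselves (a recorded sex, at least one phenotypic feature, an onset for every observed feature) and the function name `check_consortium_rules` are invented for this example and are not part of the schema.

```python
def check_consortium_rules(packet: dict) -> list:
    """Hypothetical consortium profile: return a list of rule violations."""
    problems = []
    subject = packet.get("subject")
    if subject is None:
        problems.append("phenopacket must have a subject")
    elif subject.get("sex", "UNKNOWN_SEX") == "UNKNOWN_SEX":
        problems.append("subject.sex must be recorded")
    features = packet.get("phenotypicFeatures", [])
    if not features:
        problems.append("at least one phenotypic feature is required")
    for i, feature in enumerate(features):
        # Observed features must state an onset; excluded features need not.
        if not feature.get("excluded", False) and "onset" not in feature:
            problems.append(f"phenotypicFeatures[{i}] is missing onset")
    return problems


# Usage: a minimal phenopacket that satisfies every rule except the onset rule.
packet = {
    "id": "example-1",
    "subject": {"id": "patient-1", "sex": "FEMALE"},
    "phenotypicFeatures": [{"type": {"id": "HP:0000555", "label": "Leukocoria"}}],
}
print(check_consortium_rules(packet))  # ['phenotypicFeatures[0] is missing onset']
```

Keeping such rules outside the schema means the same phenopacket remains valid against the general standard while a consortium can still reject records that lack the fields it needs.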

Phenopacket Schema

Phenopacket schema overview. The GA4GH Phenopacket schema consists of several optional elements, each of which contains information about a certain topic, such as phenotype, variant, or pedigree. An element can contain other elements, which allows a hierarchical representation of data. For instance, a Phenopacket contains elements of type Individual, PhenotypicFeature, Biosample, and so on. These elements can therefore be regarded as building blocks that are combined to create larger structures.
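To make the nesting concrete, the following sketch assembles a small phenopacket from its building blocks using plain Python dicts. It assumes the camelCase field names of the schema's v2 JSON form (`subject`, `phenotypicFeatures`, `biosamples`, `individualId`, `sampledTissue`); the identifiers such as `patient-1` are hypothetical, and the ontology term IDs are illustrative and should be checked against current HPO and UBERON releases. A complete phenopacket also requires a `metaData` element, which is omitted here for brevity.

```python
import json


def ontology_class(term_id: str, label: str) -> dict:
    """OntologyClass: an ontology term given as a CURIE plus a label."""
    return {"id": term_id, "label": label}


# Small building blocks (IDs are hypothetical; term IDs are illustrative).
individual = {"id": "patient-1", "sex": "FEMALE"}
leukocoria = {"type": ontology_class("HP:0000555", "Leukocoria")}
tumor_sample = {
    "id": "biosample-1",
    "individualId": individual["id"],  # links the sample back to the subject
    "sampledTissue": ontology_class("UBERON:0000966", "retina"),
}

# The Phenopacket element nests the blocks: one Individual as the subject,
# plus lists of PhenotypicFeature and Biosample elements.
phenopacket = {
    "id": "example-phenopacket-1",
    "subject": individual,
    "phenotypicFeatures": [leukocoria],
    "biosamples": [tumor_sample],
    # "metaData" is required in a complete phenopacket; omitted here.
}

print(json.dumps(phenopacket, indent=2))
```

Because each block is self-contained, the same OntologyClass shape is reused for a phenotype and for a tissue, which is what allows the schema to build larger structures from a few small element types.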

Tutorial

We have published a detailed example and tutorial showing how to encode the clinical data of an individual with a Mendelian rare disease (retinoblastoma) (Ladewig et al., 2022).

The schema and its detailed documentation are available from its GitHub repository.

Phenopackets and HPO

The GA4GH Phenopacket Schema allows more context to be provided for phenotypic abnormalities than a bare list of HPO terms. For instance, it can record the age of onset, the severity, and the resolution (abatement, or "offset") of a feature, as well as other modifiers from the HPO's Clinical Modifier subontology. It also provides a standard syntax for reporting that a particular feature was explicitly excluded by clinical examination.
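The sketch below shows how the extra context described above might be attached to individual features. It builds PhenotypicFeature elements as Python dicts, assuming the v2 JSON field names (`type`, `excluded`, `onset`, `resolution`, `severity`, `modifiers`) and using the age option of the TimeElement for onset and resolution. The helper `phenotypic_feature` is hypothetical, and the HPO IDs are shown for illustration only and should be verified against the current HPO release.

```python
from typing import Optional, Sequence


def ontology_class(term_id: str, label: str) -> dict:
    return {"id": term_id, "label": label}


def phenotypic_feature(term: dict, *, excluded: bool = False,
                       onset_iso8601: Optional[str] = None,
                       resolution_iso8601: Optional[str] = None,
                       severity: Optional[dict] = None,
                       modifiers: Sequence[dict] = ()) -> dict:
    """Hypothetical helper: build a PhenotypicFeature, omitting unset fields."""
    feature = {"type": term}
    if excluded:
        feature["excluded"] = True
    if onset_iso8601:
        # onset and resolution are TimeElements; this sketch uses the age option.
        feature["onset"] = {"age": {"iso8601duration": onset_iso8601}}
    if resolution_iso8601:
        feature["resolution"] = {"age": {"iso8601duration": resolution_iso8601}}
    if severity:
        feature["severity"] = severity
    if modifiers:
        feature["modifiers"] = list(modifiers)
    return feature


features = [
    # Observed: onset at 6 months of age, severe, affecting one side.
    phenotypic_feature(
        ontology_class("HP:0000555", "Leukocoria"),
        onset_iso8601="P6M",
        severity=ontology_class("HP:0012828", "Severe"),
        modifiers=[ontology_class("HP:0012833", "Unilateral")],
    ),
    # Explicitly excluded on examination, which differs from "not recorded".
    phenotypic_feature(ontology_class("HP:0000486", "Strabismus"), excluded=True),
]

observed = [f["type"]["label"] for f in features if not f.get("excluded", False)]
excluded = [f["type"]["label"] for f in features if f.get("excluded", False)]
print(observed, excluded)  # ['Leukocoria'] ['Strabismus']
```

The `excluded` flag is what distinguishes "examined and found absent" from a feature that simply was not mentioned, a distinction a flat list of HPO terms cannot express.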


Overview of the PhenotypicFeature element of the GA4GH Phenopacket Schema.

We have provided recommendations for how to encode clinical data with HPO terms, which can serve as a guide to creating phenopackets for individuals with rare disease (Oien et al., 2019).