
Database Configuration

Both LAPIS and SILO require a database_config.yaml file. Its main purpose is to define the database schema for the sequence metadata. See the tutorial for an example, or use our config generator to create your own config. More examples can be found in our tests.

The database config is considered static configuration that doesn’t change with data updates. This page contains the technical specification of the database config.

The database_config.yaml permits the following top-level keys:

| Key | Type | Required | Description |
| --- | --- | --- | --- |
| schema | object | true | The schema object. |
| defaultNucleotideSequence | string | false | Name of the default nucleotide sequence segment. Only meaningful when there is more than one segment. |
| defaultAminoAcidSequence | string | false | Name of the default amino acid gene. |
| siloClientThreadCount | int | false | How many threads (connections) LAPIS uses to talk to SILO. |
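
As a rough illustration of how the top-level keys fit together, the sketch below shows a config for a hypothetical organism with more than one segment. The instance name, the field names, the segment name segmentA, the gene name geneX, and the thread count are all made-up placeholder values, not defaults or recommendations.

```yaml
# Illustrative database_config.yaml; every value below is a placeholder.
schema:
  instanceName: exampleInstance       # hypothetical display name
  metadata:
    - name: accession                 # hypothetical field name
      type: string
    - name: collectionDate            # hypothetical field name
      type: date
  primaryKey: accession               # must match a name listed in metadata
defaultNucleotideSequence: segmentA   # hypothetical segment name
defaultAminoAcidSequence: geneX       # hypothetical gene name
siloClientThreadCount: 4              # arbitrary example value
```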

The schema object permits the following fields:

| Key | Type | Required | Description |
| --- | --- | --- | --- |
| instanceName | string | true | The name assigned to the instance. Used for display purposes. |
| metadata | array | true | A list of metadata objects describing the metadata fields available on the underlying sequence data. |
| primaryKey | string | true | The name of the metadata field that serves as the primary key. The value must match one of the entries in metadata. |
| features | array | false | A list of feature objects that enable additional query capabilities. Defaults to no features. |

Each entry in schema.metadata describes a single metadata field. The following keys are permitted:

| Key | Type | Required | Description |
| --- | --- | --- | --- |
| name | string | true | The name of the metadata field. Must be unique within metadata. |
| type | enum | true | The type of the metadata. |
| generateIndex | boolean | false | If true, SILO builds an index for this field so that filter queries become a trivial lookup. See Generating an index. Only valid for fields of type string. |
| generateLineageIndex | string | false | If set, SILO treats the field as a lineage-indexed field belonging to the named lineage system. See Lineage-indexed fields. Only valid for fields of type string. |
| isPhyloTreeField | boolean | false | If true, marks the field as a phylogenetic tree field. Sequences can then be queried by their position in a tree (e.g. via mostRecentCommonAncestor). Only valid for fields of type string. |

LAPIS supports the following metadata types:

  • string: Arbitrary text values.

  • int: Integer values.

  • float: Floating-point values.

  • boolean: true or false.

  • date: Values must be valid dates in the form YYYY-MM-DD.
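
To make the types concrete, here is a sketch of a metadata list with one field per type. All field names are hypothetical, and the comments show the kind of value each field would hold. Only the metadata part of the schema is shown; the other required schema keys are omitted.

```yaml
schema:
  metadata:
    - name: accession        # hypothetical; e.g. "ABC123"
      type: string
    - name: age              # hypothetical; e.g. 42
      type: int
    - name: qcScore          # hypothetical; e.g. 0.97
      type: float
    - name: isComplete       # hypothetical; true or false
      type: boolean
    - name: collectionDate   # hypothetical; e.g. 2021-03-18
      type: date
```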

Generating an index

For string fields, setting generateIndex: true makes SILO precompute bitmaps for the field’s distinct values, turning queries against the field into very fast lookups.
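
A minimal sketch of an indexed field, assuming a hypothetical string field called country (the other required schema keys are omitted):

```yaml
schema:
  metadata:
    - name: country         # hypothetical field name
      type: string          # generateIndex is only valid on string fields
      generateIndex: true   # SILO precomputes bitmaps for the distinct values
```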

Lineage-indexed fields

Setting generateLineageIndex: <systemName> on a string field tells SILO that the values form a hierarchy (e.g. Pango lineages). The value of generateLineageIndex is the name of the lineage system, a SILO-side definition that lists how the lineages relate to each other (parent/child relationships, aliases). Multiple metadata fields can share the same lineage system.

The lineage definitions themselves are provided to SILO at preprocessing time and are not part of the LAPIS database config. See SILO’s documentation for how to supply lineage definitions.
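
The following sketch shows two fields that share one lineage system. The field names and the system name pangoLineageSystem are hypothetical. The name would have to match a lineage system that is defined on the SILO side, which is not shown here.

```yaml
schema:
  metadata:
    - name: pangoLineage                         # hypothetical field name
      type: string
      generateLineageIndex: pangoLineageSystem   # hypothetical lineage system name
    - name: nextcladePangoLineage                # hypothetical second field
      type: string
      generateLineageIndex: pangoLineageSystem   # shares the same lineage system
```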

Setting isPhyloTreeField: true on a string field declares that the field stores identifiers in a phylogenetic tree (for example node labels of an UShER tree). The tree itself is supplied to SILO at preprocessing time.
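
As a sketch, a phylogenetic tree field could be declared like this. The field name usherTreeNode is a hypothetical placeholder, and the tree file that SILO needs is not part of this config.

```yaml
schema:
  metadata:
    - name: usherTreeNode      # hypothetical field holding tree node labels
      type: string             # isPhyloTreeField is only valid on string fields
      isPhyloTreeField: true
```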

Each entry in schema.features enables a feature in LAPIS:

| Key | Type | Required | Description |
| --- | --- | --- | --- |
| name | string | true | The name of the feature. |

The following feature names are recognized. Any other value will cause LAPIS to fail on startup.

| Feature name | Description |
| --- | --- |
| sarsCoV2VariantQuery | Enables the SARS-CoV-2-specific variant query language, exposed via the variantQuery request parameter. The feature is used for CoV-Spectrum and it is not recommended to use it otherwise. |
| generalizedAdvancedQuery | Enables the generic advanced query language, exposed via the advancedQuery request parameter. Recommended for non-SARS-CoV-2 instances. |
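
For example, a non-SARS-CoV-2 instance would enable the generic query language roughly as follows. Only the features part of the schema is shown; the other required schema keys are omitted.

```yaml
schema:
  features:
    - name: generalizedAdvancedQuery   # exposes the advancedQuery request parameter
```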