lamindb.core.CanValidate

class lamindb.core.CanValidate

Bases: object

Base class providing Registry-based validation.

Methods

add_synonym(synonym, force=False, save=None)

Add synonyms to a record.

Parameters:
  • synonym (str | List[str] | Series | array)

  • force (bool, default: False)

  • save (bool | None, default: None)

See also

remove_synonym()

Remove synonyms.

Examples

>>> import bionty as bt
>>> bt.CellType.from_public(name="T cell").save()
>>> lookup = bt.CellType.lookup()
>>> record = lookup.t_cell
>>> record.synonyms
'T-cell|T lymphocyte|T-lymphocyte'
>>> record.add_synonym("T cells")
>>> record.synonyms
'T cells|T-cell|T-lymphocyte|T lymphocyte'
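
The examples above store a record's synonyms as one '|'-delimited string. The sketch below is a rough pure-Python model of what adding a synonym does to that string. It is not lamindb's implementation, and the helper `add_synonyms` is hypothetical. It appends new terms and skips duplicates while keeping the existing order. The lamindb output above shows that the library may order terms differently.

```python
from typing import List, Optional, Union


def add_synonyms(synonyms: Optional[str], new: Union[str, List[str]]) -> str:
    """Hypothetical helper: merge new terms into a '|'-delimited synonyms string."""
    existing = synonyms.split("|") if synonyms else []
    additions = [new] if isinstance(new, str) else list(new)
    for term in additions:
        # Assumption of this sketch: a term containing the delimiter could not be
        # split back out unambiguously, so it is rejected.
        if "|" in term:
            raise ValueError(f"synonym must not contain '|': {term!r}")
        if term not in existing:  # skip terms that are already present
            existing.append(term)
    return "|".join(existing)


print(add_synonyms("T-cell|T lymphocyte|T-lymphocyte", "T cells"))
# T-cell|T lymphocyte|T-lymphocyte|T cells
```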

classmethod inspect(values, field=None, *, mute=False, organism=None, public_source=None)

Inspect whether values are mappable to a field.

Being mappable means that an exact match exists.

Parameters:
  • values (List[str] | Series | array) – Values that will be checked against the field.

  • field (str | DeferredAttribute | None, default: None) – The field to check values against, for example 'ontology_id' to map against the ontology source IDs or 'name' to map against the ontology term names.

  • mute (bool, default: False) – Mute logging.

  • organism (str | Registry | None, default: None) – An Organism name or record.

  • public_source (Registry | None, default: None) – A PublicSource record.

Return type:

InspectResult

See also

validate()

Examples

>>> import lamindb as ln
>>> import bionty as bt
>>> bt.settings.organism = "human"
>>> ln.save(bt.Gene.from_values(["A1CF", "A1BG", "BRCA2"], field="symbol"))
>>> gene_symbols = ["A1CF", "A1BG", "FANCD1", "FANCD20"]
>>> result = bt.Gene.inspect(gene_symbols, field=bt.Gene.symbol)
✅ 2 terms (50.00%) are validated
🔶 2 terms (50.00%) are not validated
    🟠 detected synonyms
    to increase validated terms, standardize them via .standardize()
>>> result.validated
['A1CF', 'A1BG']
>>> result.non_validated
['FANCD1', 'FANCD20']
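
Being mappable means that an exact match exists. Under that definition, the split into validated and non-validated terms can be modelled as a set lookup. This sketch makes two substitutions: a hypothetical `InspectSketch` class stands in for InspectResult, and a plain list of symbols stands in for the registry. Unlike lamindb's inspect(), it does not search synonyms, so it cannot produce the "detected synonyms" hint.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class InspectSketch:
    """Hypothetical stand-in for InspectResult: two lists of input values."""
    validated: List[str] = field(default_factory=list)
    non_validated: List[str] = field(default_factory=list)


def inspect_values(values: Iterable[str], registry_values: Iterable[str]) -> InspectSketch:
    known = set(registry_values)  # only exact, case-sensitive matches count
    result = InspectSketch()
    for value in values:
        bucket = result.validated if value in known else result.non_validated
        bucket.append(value)
    return result


result = inspect_values(["A1CF", "A1BG", "FANCD1", "FANCD20"], ["A1CF", "A1BG", "BRCA2"])
print(result.validated)      # ['A1CF', 'A1BG']
print(result.non_validated)  # ['FANCD1', 'FANCD20']
```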

classmethod map_synonyms(synonyms, *, return_mapper=False, case_sensitive=False, keep='first', synonyms_field='synonyms', field=None, **kwargs)

Maps input synonyms to standardized names.

Return type:

list[str] | dict[str, str]

remove_synonym(synonym)

Remove synonyms from a record.

Parameters:

synonym (str | List[str] | Series | array) – The synonym value.

See also

add_synonym()

Add synonyms.

Examples

>>> import bionty as bt
>>> bt.CellType.from_public(name="T cell").save()
>>> lookup = bt.CellType.lookup()
>>> record = lookup.t_cell
>>> record.synonyms
'T-cell|T lymphocyte|T-lymphocyte'
>>> record.remove_synonym("T-cell")
>>> record.synonyms
'T lymphocyte|T-lymphocyte'
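
Removal is the inverse operation on the same '|'-delimited string. The sketch below is a minimal pure-Python model using a hypothetical `remove_synonyms` helper, not lamindb's code. It matches terms exactly and case-sensitively; lamindb may treat case or absent terms differently.

```python
from typing import List, Optional, Union


def remove_synonyms(synonyms: Optional[str], to_remove: Union[str, List[str]]) -> Optional[str]:
    """Hypothetical helper: drop terms from a '|'-delimited synonyms string."""
    if not synonyms:
        return synonyms
    drop = {to_remove} if isinstance(to_remove, str) else set(to_remove)
    kept = [term for term in synonyms.split("|") if term not in drop]
    # Assumption of this sketch: an empty result is stored as None, not "".
    return "|".join(kept) if kept else None


print(remove_synonyms("T-cell|T lymphocyte|T-lymphocyte", "T-cell"))
# T lymphocyte|T-lymphocyte
```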

set_abbr(value)

Set the abbr field to value and add value to the record's synonyms.

Parameters:

value (str) – A value for an abbreviation.

See also

add_synonym()

Add synonyms.

Examples

>>> import bionty as bt
>>> bt.ExperimentalFactor.from_public(name="single-cell RNA sequencing").save()
>>> scrna = bt.ExperimentalFactor.filter(name="single-cell RNA sequencing").one()
>>> scrna.abbr
None
>>> scrna.synonyms
'single-cell RNA-seq|single-cell transcriptome sequencing|scRNA-seq|single cell RNA sequencing'
>>> scrna.set_abbr("scRNA")
>>> scrna.abbr
'scRNA'
>>> scrna.synonyms
'scRNA|single-cell RNA-seq|single cell RNA sequencing|single-cell transcriptome sequencing|scRNA-seq'
>>> scrna.save()
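
The example shows two effects of set_abbr(): the abbr field is set, and the abbreviation appears at the front of the synonyms string. The sketch below models both on a hypothetical in-memory `RecordSketch` dataclass. It has no database, so nothing like save() is involved, and it does not reproduce the reordering of the remaining synonyms seen in the output above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RecordSketch:
    """Hypothetical in-memory record with the abbr and synonyms fields used above."""
    name: str
    abbr: Optional[str] = None
    synonyms: Optional[str] = None

    def set_abbr(self, value: str) -> None:
        self.abbr = value
        terms = self.synonyms.split("|") if self.synonyms else []
        if value not in terms:
            terms.insert(0, value)  # prepend, as in the example output above
        self.synonyms = "|".join(terms)


scrna = RecordSketch(
    name="single-cell RNA sequencing",
    synonyms="single-cell RNA-seq|scRNA-seq",
)
scrna.set_abbr("scRNA")
print(scrna.abbr)      # scRNA
print(scrna.synonyms)  # scRNA|single-cell RNA-seq|scRNA-seq
```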

classmethod standardize(values, field=None, *, return_field=None, return_mapper=False, case_sensitive=False, mute=False, public_aware=True, keep='first', synonyms_field='synonyms', organism=None)

Maps input synonyms to standardized names.

Parameters:
  • values (Iterable) – Identifiers that will be standardized.

  • field (str | DeferredAttribute | None, default: None) – The field representing the standardized names.

  • return_field (str | None, default: None) – The field to return. Defaults to field.

  • return_mapper (bool, default: False) – If True, returns {input_value: standardized_name}.

  • case_sensitive (bool, default: False) – Whether the mapping is case sensitive.

  • mute (bool, default: False) – Mute logging.

  • public_aware (bool, default: True) – Whether to standardize from Bionty reference. Defaults to True for Bionty registries.

  • keep (Literal['first', 'last', False], default: 'first') –

    When a synonym maps to multiple names, determines which duplicates to mark, using the same options as pd.DataFrame.duplicated:
    • "first": returns the first mapped standardized name

    • "last": returns the last mapped standardized name

    • False: returns all mapped standardized names.

    When keep is False, the returned list of standardized names will contain nested lists in case of duplicates.

    When a field is converted into return_field, keep marks which matches to keep when multiple return_field values map to the same field value.

  • synonyms_field (str, default: 'synonyms') – A field containing the concatenated synonyms.

  • organism (str | Registry | None, default: None) – An Organism name or record.

Return type:

list[str] | dict[str, str]

Returns:

If return_mapper is False – a list of standardized names. Otherwise, a dictionary of mapped values with mappable synonyms as keys and standardized names as values.
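
The sketch below makes the three keep options concrete by resolving one ambiguous synonym in each of the three ways. It is a pure-Python model, not lamindb's implementation. The functions `resolve` and `standardize_with_keep` and the gene names in `table` are hypothetical. Matching is lower-cased to mirror case_sensitive=False, and unmapped values pass through unchanged.

```python
from typing import Dict, List, Literal, Union

Keep = Literal["first", "last", False]


def resolve(candidates: List[str], keep: Keep = "first") -> Union[str, List[str]]:
    """Pick among the names a single synonym maps to, following the keep options."""
    if keep == "first":
        return candidates[0]
    if keep == "last":
        return candidates[-1]
    # keep=False: return every candidate, nested as a list only when ambiguous
    return candidates if len(candidates) > 1 else candidates[0]


def standardize_with_keep(values: List[str], synonym_table: Dict[str, List[str]], keep: Keep = "first"):
    # Case-insensitive lookup; values missing from the table are returned as-is.
    return [resolve(synonym_table[v.lower()], keep) if v.lower() in synonym_table else v for v in values]


# Hypothetical table: "abc" is an ambiguous synonym shared by two genes.
table = {"abc": ["GENE1", "GENE2"], "x1": ["GENEX"]}
print(standardize_with_keep(["ABC", "x1", "zzz"], table, keep="first"))  # ['GENE1', 'GENEX', 'zzz']
print(standardize_with_keep(["ABC", "x1", "zzz"], table, keep="last"))   # ['GENE2', 'GENEX', 'zzz']
print(standardize_with_keep(["ABC", "x1", "zzz"], table, keep=False))    # [['GENE1', 'GENE2'], 'GENEX', 'zzz']
```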

See also

add_synonym()

Add synonyms.

remove_synonym()

Remove synonyms.

Examples

>>> import lamindb as ln
>>> import bionty as bt
>>> bt.settings.organism = "human"
>>> ln.save(bt.Gene.from_values(["A1CF", "A1BG", "BRCA2"], field="symbol"))
>>> gene_synonyms = ["A1CF", "A1BG", "FANCD1", "FANCD20"]
>>> standardized_names = bt.Gene.standardize(gene_synonyms)
>>> standardized_names
['A1CF', 'A1BG', 'BRCA2', 'FANCD20']
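
In this example, FANCD1 resolves to BRCA2 through the synonyms field, while FANCD20 passes through unchanged. The sketch below models that behaviour with an in-memory lookup table. It is an illustration under stated assumptions, not lamindb's implementation. The names `build_lookup` and `standardize_sketch` are hypothetical, as is the single synonym listed for BRCA2. Returning only changed inputs from the mapper is this sketch's reading of "mappable synonyms as keys".

```python
from typing import Dict, List, Optional, Tuple


def build_lookup(records: List[Tuple[str, Optional[str]]], case_sensitive: bool = False) -> Dict[str, str]:
    """Map every name and every '|'-delimited synonym to its standardized name."""
    norm = (lambda s: s) if case_sensitive else str.lower
    lookup: Dict[str, str] = {}
    for name, synonyms in records:
        for term in (synonyms.split("|") if synonyms else []):
            lookup.setdefault(norm(term), name)  # first record wins on shared synonyms
    for name, _ in records:
        lookup[norm(name)] = name  # exact names take priority over synonyms
    return lookup


def standardize_sketch(values, records, return_mapper=False, case_sensitive=False):
    norm = (lambda s: s) if case_sensitive else str.lower
    lookup = build_lookup(records, case_sensitive)
    standardized = [lookup.get(norm(v), v) for v in values]  # unmapped values pass through
    if return_mapper:
        # Only inputs that were actually changed become mapper entries.
        return {v: s for v, s in zip(values, standardized) if v != s}
    return standardized


records = [("A1CF", None), ("A1BG", None), ("BRCA2", "FANCD1")]  # reduced, illustrative synonyms
symbols = ["A1CF", "A1BG", "FANCD1", "FANCD20"]
print(standardize_sketch(symbols, records))                      # ['A1CF', 'A1BG', 'BRCA2', 'FANCD20']
print(standardize_sketch(symbols, records, return_mapper=True))  # {'FANCD1': 'BRCA2'}
```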

classmethod validate(values, field=None, *, mute=False, organism=None)

Validate values against existing values of a string field.

Note that this is strict validation: only exact matches count as validated.

Parameters:
  • values (List[str] | Series | array) – Values that will be validated against the field.

  • field (str | DeferredAttribute | None, default: None) – The field to check values against, for example 'ontology_id' to map against the ontology source IDs or 'name' to map against the ontology term names.

  • mute (bool, default: False) – Mute logging.

  • organism (str | Registry | None, default: None) – An Organism name or record.

Return type:

ndarray

Returns:

A vector of booleans indicating if an element is validated.

See also

inspect()

Examples

>>> import lamindb as ln
>>> import bionty as bt
>>> bt.settings.organism = "human"
>>> ln.save(bt.Gene.from_values(["A1CF", "A1BG", "BRCA2"], field="symbol"))
>>> gene_symbols = ["A1CF", "A1BG", "FANCD1", "FANCD20"]
>>> bt.Gene.validate(gene_symbols, field=bt.Gene.symbol)
✅ 2 terms (50.00%) are validated
🔶 2 terms (50.00%) are not validated
array([ True,  True, False, False])
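
validate() returns one boolean per input and counts only exact matches. As a result, FANCD1 is reported as not validated even though standardize() can map it to BRCA2. The sketch below is a pure-Python model built on a hypothetical `validate_sketch` function. Its log lines only imitate the format shown above. It returns a plain list, whereas lamindb returns a NumPy ndarray; wrap the result in numpy.asarray if an array is needed.

```python
from typing import Iterable, List


def validate_sketch(values: List[str], registry_values: Iterable[str], mute: bool = False) -> List[bool]:
    """Strict validation: True only where a value exactly matches a registry value."""
    known = set(registry_values)
    flags = [value in known for value in values]
    if not mute and flags:
        n_ok = sum(flags)
        pct = 100 * n_ok / len(flags)
        print(f"{n_ok} terms ({pct:.2f}%) are validated")
        print(f"{len(flags) - n_ok} terms ({100 - pct:.2f}%) are not validated")
    return flags


flags = validate_sketch(["A1CF", "A1BG", "FANCD1", "FANCD20"], ["A1CF", "A1BG", "BRCA2"])
print(flags)  # [True, True, False, False]
```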
