mt_metadata.utils.summarize

Created on Tue Feb 23 11:52:35 2021

copyright:

Jared Peacock (jpeacock@usgs.gov)

license:

MIT

This module provides functionality to summarize metadata standards from both legacy BaseDict-based objects and modern Pydantic v2 MetadataBase objects.

The main functions are:

- summarize_timeseries_standards(): Legacy function for BaseDict objects
- summarize_pydantic_standards(): New function for Pydantic v2 MetadataBase objects
- extract_metadata_fields_from_pydantic(): Extract fields from individual Pydantic classes
- summarize_standards(): Unified interface supporting both legacy and Pydantic systems

Example usage:

# For Pydantic v2 objects (recommended)
>>> df = summarize_standards(module="timeseries")

# Extract fields from an individual class
>>> from mt_metadata.timeseries import Survey
>>> fields = extract_metadata_fields_from_pydantic(Survey)

# Get a BaseDict-compatible summary
>>> summary = summarize_pydantic_standards()

Attributes

SUMMARIZE_DTYPE

Functions

extract_metadata_fields_from_pydantic(metadata_class)

Extract field information from a Pydantic v2 MetadataBase class definition

collect_basemodel_objects(module)

Collect all MetadataBase subclasses from a given module.

summarize_pydantic_standards([module])

Summarize the standards for metadata using Pydantic v2 MetadataBase classes.

summary_to_array(summary_dict[, dtype])

Convert a summary dictionary of standards to a numpy structured array.

summary_to_dataframe(summary_dict)

Convert a summary dictionary to a pandas DataFrame.

summarize_standards([module, csv_fn, output_type, dtype])

Summarize standards into a pandas DataFrame or numpy structured array and write a CSV file if csv_fn is given.

Module Contents

mt_metadata.utils.summarize.SUMMARIZE_DTYPE
mt_metadata.utils.summarize.extract_metadata_fields_from_pydantic(metadata_class)

Extract field information from a Pydantic v2 MetadataBase class definition and convert it to a format compatible with BaseDict.

Parameters:

metadata_class (type) – A MetadataBase class (not instance)

Returns:

Dictionary with field information compatible with BaseDict

Return type:

dict
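As an illustration of the kind of introspection involved, the following minimal sketch reads field information from a Pydantic v2 model through its `model_fields` mapping. `HypotheticalSurvey`, `fields_to_dict`, and the output keys are assumptions made for this example; the actual function operates on MetadataBase classes and may produce a different layout.

```python
from typing import Optional

from pydantic import BaseModel, Field


class HypotheticalSurvey(BaseModel):
    """Stand-in model (assumption); real classes derive from MetadataBase."""

    id: str = Field(default="", description="Survey identifier")
    acquired_by: Optional[str] = Field(default=None, description="Operator name")


def fields_to_dict(model_class):
    """Return {field_name: attribute dict} for a Pydantic v2 model class."""
    summary = {}
    # model_fields maps each field name to a FieldInfo object.
    for name, info in model_class.model_fields.items():
        summary[name] = {
            "type": str(info.annotation),
            "required": info.is_required(),
            "default": info.default,
            "description": info.description,
        }
    return summary


fields = fields_to_dict(HypotheticalSurvey)
print(fields["id"]["description"])
```

Note that the class itself is passed, not an instance, which matches the parameter description above.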

mt_metadata.utils.summarize.collect_basemodel_objects(module)

Collect all MetadataBase subclasses from a given module.

Parameters:

module (str) – The module to inspect (e.g., ‘mt_metadata.timeseries’)

Returns:

Dictionary mapping class objects to their names

Return type:

dict[type[MetadataBase], str]
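A rough sketch of how such a collection could be done with the standard library alone. `HypotheticalBase`, `collect_subclasses`, and the throwaway `demo` module are illustrative stand-ins, not the library's implementation; the documented function takes a module name string and looks for MetadataBase subclasses.

```python
import inspect
import types


class HypotheticalBase:
    """Stand-in for MetadataBase (assumption, for illustration only)."""


def collect_subclasses(module, base):
    """Map each strict subclass of `base` found in `module` to its name."""
    found = {}
    for name, obj in inspect.getmembers(module, inspect.isclass):
        # Skip the base class itself; keep only genuine subclasses.
        if issubclass(obj, base) and obj is not base:
            found[obj] = name
    return found


# Build a throwaway module so the sketch runs without mt_metadata installed.
demo = types.ModuleType("demo")
demo.HypotheticalBase = HypotheticalBase
demo.Survey = type("Survey", (HypotheticalBase,), {})
demo.Station = type("Station", (HypotheticalBase,), {})
demo.Unrelated = type("Unrelated", (), {})

classes = collect_subclasses(demo, HypotheticalBase)
print(sorted(classes.values()))
```

The result is keyed by class object with the class name as the value, mirroring the return type documented above.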

mt_metadata.utils.summarize.summarize_pydantic_standards(module='timeseries')

Summarize the standards for metadata using Pydantic v2 MetadataBase classes. Similar to summarize_timeseries_standards but works with the new Pydantic structure.

Parameters:

module (str, optional) – The module to inspect, by default “timeseries”

Returns:

BaseDict object containing summarized field information

Return type:

BaseDict

mt_metadata.utils.summarize.summary_to_array(summary_dict, dtype=SUMMARIZE_DTYPE)

Convert a summary dictionary of standards to a numpy structured array.

Parameters:
  • summary_dict (dict) – Dictionary of summarized standards

  • dtype (optional) – Data type of the returned structured array, by default SUMMARIZE_DTYPE

Returns:

numpy structured array

Return type:

numpy.ndarray
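To show what a structured array built from a summary dictionary might look like, here is a small sketch. The dtype, the attribute names, and the dictionary layout are assumptions for illustration; SUMMARIZE_DTYPE defines the real columns and is not reproduced here.

```python
import numpy as np

# Hypothetical dtype (assumption); the real columns come from SUMMARIZE_DTYPE.
demo_dtype = np.dtype(
    [("attribute", "U50"), ("type", "U15"), ("required", np.bool_)]
)

# Hypothetical summary entries (assumption).
summary = {
    "survey.id": {"type": "string", "required": True},
    "station.location.latitude": {"type": "float", "required": True},
    "run.comments": {"type": "string", "required": False},
}

# One record per summary entry; tuples must follow the dtype field order.
records = [
    (key, value["type"], value["required"]) for key, value in summary.items()
]
array = np.array(records, dtype=demo_dtype)

# Columns are accessed by field name, and boolean fields can be used as masks.
print(array["attribute"])
print(array[array["required"]]["attribute"])
```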

mt_metadata.utils.summarize.summary_to_dataframe(summary_dict)

Convert a summary dictionary to a pandas DataFrame.

Parameters:

summary_dict (dict) – Dictionary of summarized standards

Returns:

DataFrame containing the summarized standards

Return type:

pd.DataFrame
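The conversion can be pictured with plain pandas calls, as in the sketch below. The dictionary layout and the `attribute` column name are assumptions for illustration and may not match what the function actually returns.

```python
import pandas as pd

# Hypothetical summary entries (assumption).
summary = {
    "survey.id": {"type": "string", "required": True},
    "run.comments": {"type": "string", "required": False},
}

# orient="index" turns each top-level key into a row label.
df = pd.DataFrame.from_dict(summary, orient="index")
# Move the attribute names from the index into a regular column.
df = df.rename_axis("attribute").reset_index()

print(df)
```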

mt_metadata.utils.summarize.summarize_standards(module='timeseries', csv_fn=None, output_type='dataframe', dtype=SUMMARIZE_DTYPE)

Summarize standards into a pandas DataFrame or numpy structured array and write a CSV file if csv_fn is given.

Parameters:
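  • module (str, optional) – Module to summarize, by default “timeseries”

  • csv_fn (str or Path, optional) – Full path to write a csv file, by default None

  • output_type (str, optional) – Either “array” or “dataframe”, by default “dataframe”

  • dtype (optional) – Data type used for the structured array, by default SUMMARIZE_DTYPE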

Returns:

If output_type is “array”, returns a numpy structured array. If output_type is “dataframe”, returns a pandas DataFrame.

Return type:

numpy.ndarray | pd.DataFrame