API Reference¶
Complete API documentation for sunstone-py.
pandas Module¶
Drop-in replacement for pandas with lineage tracking.
Functions¶
read_dataset(slug, project_path=None, strict=None, fetch_from_url=True, format=None, **kwargs)¶
Read a dataset by slug with automatic format detection.
Parameters:
- slug (str): Dataset slug to look up in datasets.yaml
- project_path (str | Path | None): Path to project directory. Defaults to Path.cwd()
- strict (bool | None): Enable strict mode. If None, reads from SUNSTONE_DATAFRAME_STRICT env var
- fetch_from_url (bool): If True and dataset has a source URL but no local file, fetch automatically
- format (str | None): Format override ('csv', 'json', 'excel', 'parquet', 'tsv'). Auto-detected from extension if not provided
- **kwargs: Additional arguments passed to the underlying pandas reader
Returns: DataFrame with lineage tracking
Example:
df = pd.read_dataset('official-un-member-states')
df = pd.read_dataset('my-data', format='json', project_path='/path/to/project')
read_csv(filepath, project_path=None, strict=None, **kwargs)¶
Read CSV file with lineage tracking.
Parameters:
- filepath (str | Path): Path to CSV file, URL, or dataset slug
- project_path (str | Path | None): Path to project directory containing datasets.yaml. Defaults to Path.cwd()
- strict (bool | None): If True, dataset must be pre-registered in datasets.yaml. If None, reads from SUNSTONE_DATAFRAME_STRICT env var
- **kwargs: Additional arguments passed to pandas.read_csv()
Returns: DataFrame with lineage tracking
Raises:
- DatasetNotFoundError: If dataset not found in datasets.yaml
- StrictModeError: If strict=True and dataset not pre-registered
Example:
df = pd.read_csv(
'data/schools.csv',
project_path='/path/to/project',
strict=True,
encoding='utf-8'
)
read_excel(filepath, project_path=None, strict=None, fetch_from_url=True, **kwargs)¶
Read Excel file (.xlsx/.xls) with lineage tracking.
Parameters:
- filepath (str | Path): Path to Excel file or dataset slug
- project_path (str | Path | None): Path to project directory containing datasets.yaml. Defaults to Path.cwd()
- strict (bool | None): If True, dataset must be pre-registered. If None, reads from SUNSTONE_DATAFRAME_STRICT env var
- fetch_from_url (bool): If True and dataset has a source URL but no local file, automatically fetch from URL
- **kwargs: Additional arguments passed to pandas.read_excel()
Returns: DataFrame with lineage tracking
Raises:
- DatasetNotFoundError: If dataset not found in datasets.yaml
- FileNotFoundError: If datasets.yaml doesn't exist
Example:
# Load by slug (recommended)
df = pd.read_excel('my-excel-data', project_path='/path/to/project')
# Load by file path
df = pd.read_excel('data/schools.xlsx', project_path='/path/to/project', sheet_name='Sheet1')
read_json(filepath, project_path=None, strict=None, **kwargs)¶
Read JSON file with lineage tracking.
Parameters:
- filepath (str | Path): Path to JSON file or dataset slug
- project_path (str | Path | None): Path to project directory. Defaults to Path.cwd()
- strict (bool | None): Enable strict mode. If None, reads from SUNSTONE_DATAFRAME_STRICT env var
- **kwargs: Additional arguments passed to pandas.read_json()
Returns: DataFrame with lineage tracking
Example:
# Read a JSON file
df = pd.read_json('data/records.json', project_path=PROJECT_PATH)
# With pandas options
df = pd.read_json('data/records.json', orient='records', lines=True)
merge(left, right, **kwargs)¶
Merge DataFrames with combined lineage.
Parameters:
- left (DataFrame): Left DataFrame
- right (DataFrame): Right DataFrame
- **kwargs: Arguments passed to pandas.merge()
Returns: DataFrame with lineage from both sources
Example:
result = pd.merge(schools, teachers, on='school_id', how='inner')
print(len(result.lineage.sources)) # 2
concat(dfs, **kwargs)¶
Concatenate DataFrames with combined lineage.
Parameters:
- dfs (list[DataFrame]): List of DataFrames to concatenate
- **kwargs: Arguments passed to pandas.concat()
Returns: DataFrame with lineage from all sources
Example:
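A minimal sketch; df_2023 and df_2024 are placeholder DataFrames read with pd.read_csv:
combined = pd.concat([df_2023, df_2024], ignore_index=True)
print(len(combined.metadata.lineage.sources))  # lineage combines all inputs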
DataFrame Class¶
Main class for working with data and lineage.
Class Methods¶
read_csv(filepath, project_path, strict=False, **kwargs)¶
Read CSV file and return DataFrame.
Parameters: Same as read_csv() in the pandas module above
Returns: DataFrame instance
read_excel(filepath, project_path, strict=False, fetch_from_url=True, **kwargs)¶
Read Excel file and return DataFrame.
Parameters: Same as read_excel() in the pandas module above
Returns: DataFrame instance
Instance Methods¶
to_csv(path, slug, name, **kwargs)¶
Write DataFrame to CSV and register in datasets.yaml.
Parameters:
- path (str | Path): Output file path
- slug (str): Machine-readable identifier
- name (str): Human-readable name
- **kwargs: Arguments passed to pandas.DataFrame.to_csv()
Returns: None
Example:
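A minimal sketch; the path, slug, and name are placeholders:
df.to_csv(
    'outputs/results.csv',
    slug='analysis-results',
    name='Analysis Results',
    index=False  # extra kwargs pass through to pandas
)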
Note: Publishing is controlled by the top-level publish configuration in datasets.yaml, not per-dataset.
to_parquet(path, slug=None, name=None, track=True, **kwargs)¶
Write DataFrame to Parquet file and register in datasets.yaml.
Parameters:
- path (str | Path): Output file path
- slug (str | None): Machine-readable identifier (required in relaxed mode if not registered)
- name (str | None): Human-readable name (required in relaxed mode if not registered)
- track (bool): If False, write without lineage tracking or dataset registration
- **kwargs: Arguments passed to pandas.DataFrame.to_parquet()
Returns: None
Example:
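A minimal sketch; paths and identifiers are placeholders:
df.to_parquet('outputs/results.parquet', slug='analysis-results', name='Analysis Results')
# Scratch write without lineage tracking or registration
df.to_parquet('scratch/tmp.parquet', track=False)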
set_field_metadata(column, *, description, unit, source, type, constraints)¶
Set metadata for a column. Returns self for method chaining.
Parameters:
- column (str): Column name to annotate
- description (str, optional): Human-readable description of the field
- unit (str, optional): Unit of measure (e.g., 'kg', 'students', '%')
- source (str, optional): Slug of the input dataset this field comes from
- type (str, optional): Data type override. If None, inferred from dtype at write time
- constraints (dict, optional): Validation constraints (e.g., enum values)
Returns: DataFrame (self, for chaining)
Example:
df.set_field_metadata('population', description='Total population', unit='people')
df.set_field_metadata('gdp', description='Gross domestic product', unit='USD')
# Method chaining
df = (df
.set_field_metadata('area', unit='km^2')
.set_field_metadata('density', unit='people / km^2')
)
merge(right, **kwargs)¶
Merge with another DataFrame.
Parameters:
- right (DataFrame): DataFrame to merge with
- **kwargs: Arguments passed to pandas.merge()
Returns: New DataFrame with combined lineage
join(other, **kwargs)¶
Join with another DataFrame.
Parameters:
- other (DataFrame): DataFrame to join with
- **kwargs: Arguments passed to pandas.DataFrame.join()
Returns: New DataFrame with combined lineage
concat(others, **kwargs)¶
Concatenate with other DataFrames.
Parameters:
- others (list[DataFrame]): DataFrames to concatenate
- **kwargs: Arguments passed to pandas.concat()
Returns: New DataFrame with combined lineage
apply_operation(operation, description)¶
Apply transformation with lineage tracking.
Parameters:
- operation (callable): Function that takes a pandas DataFrame and returns a pandas DataFrame
- description (str): Human-readable description of the operation
Returns: New DataFrame with operation recorded in lineage
Example:
def adjust_enrollment(df):
return df.assign(adjusted=df['enrollment'] * 1.1)
result = df.apply_operation(
adjust_enrollment,
description="Apply 10% enrollment adjustment factor"
)
Instance Attributes¶
data¶
Access the underlying pandas DataFrame.
Type: pandas.DataFrame
Example:
# Get numpy array
values = df.data.values
# Use pandas methods not wrapped
styled = df.data.style.highlight_max()
metadata¶
Access the unified metadata container.
Type: Metadata
Example:
# Lineage is accessed through metadata
print(df.metadata.lineage.sources)
print(df.metadata.lineage.get_licenses())
# Dataset identity
df.metadata.slug = 'my-dataset'
df.metadata.name = 'My Dataset'
df.metadata.description = 'A description of this dataset'
# RDF prefixes and custom properties
df.metadata.rdf_prefixes = {'schema': 'http://schema.org/'}
df.metadata.custom_properties = {'schema:about': 'Education'}
# Per-field metadata (see set_field_metadata)
print(df.metadata.field_metadata)
lineage (deprecated)¶
Access lineage metadata directly. Use df.metadata.lineage instead.
Type: LineageMetadata
Example:
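A minimal sketch showing the deprecated and preferred access paths:
print(df.lineage.sources)           # deprecated
print(df.metadata.lineage.sources)  # preferred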
DatasetsManager Class¶
Manage datasets.yaml files programmatically.
Constructor¶
DatasetsManager(project_path, datasets_file=None)¶
Create a datasets manager.
Parameters:
- project_path (str | Path): Path to project directory containing datasets.yaml
- datasets_file (str | Path | None): Path to a specific datasets YAML file (relative to project_path or absolute). Defaults to "datasets.yaml"
Example:
manager = DatasetsManager('/path/to/project')
# Use a custom datasets file
manager = DatasetsManager('/path/to/project', datasets_file='config/my-datasets.yaml')
Methods¶
find_dataset_by_location(location, dataset_type=None)¶
Find dataset by file path.
Parameters:
- location (str): File path to search for
- dataset_type (str, optional): Filter by 'input' or 'output'
Returns: DatasetMetadata | None
Example:
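A minimal sketch; the location is a placeholder:
dataset = manager.find_dataset_by_location('data/schools.csv', dataset_type='input')
if dataset:
    print(dataset.slug)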
find_dataset_by_slug(slug, dataset_type=None)¶
Find dataset by slug identifier.
Parameters:
- slug (str): Slug to search for
- dataset_type (str, optional): Filter by 'input' or 'output'
Returns: DatasetMetadata | None
Example:
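A minimal sketch; the slug is a placeholder:
dataset = manager.find_dataset_by_slug('school-data')
if dataset:
    print(dataset.location)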
get_all_inputs()¶
Get all input datasets.
Returns: list[DatasetMetadata]
get_all_outputs()¶
Get all output datasets.
Returns: list[DatasetMetadata]
get_publish_config()¶
Get the top-level publish configuration.
Returns: PublishConfig | None
Example:
publish_config = manager.get_publish_config()
if publish_config and publish_config.enabled:
print(f"Publishing to: {publish_config.to}")
print(f"Flatten: {publish_config.flatten}")
add_output_dataset(name, slug, location, fields)¶
Register new output dataset.
Parameters:
- name (str): Human-readable name
- slug (str): Machine-readable identifier
- location (str): File path
- fields (list[FieldSchema]): Field definitions
Returns: None
Example:
from sunstone import FieldSchema
manager.add_output_dataset(
name='Analysis Results',
slug='analysis-results',
location='outputs/results.csv',
fields=[
FieldSchema(name='category', type='string'),
FieldSchema(name='count', type='integer'),
FieldSchema(name='avg_value', type='number')
]
)
Note: Use the top-level publish configuration in datasets.yaml to enable publishing for all outputs.
update_output_dataset(slug, **kwargs)¶
Update existing output dataset.
Parameters:
- slug (str): Dataset slug to update
- **kwargs: Fields to update (name, location, fields, etc.)
Returns: None
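Example (a minimal sketch; the slug and new values are placeholders):
manager.update_output_dataset(
    'analysis-results',
    name='Analysis Results (v2)',
    location='outputs/results_v2.csv'
)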
set_dataset_strict(slug, strict, dataset_type=None)¶
Enable or disable strict mode for a dataset.
Parameters:
- slug (str): Dataset slug
- strict (bool): True to enable strict mode, False to disable
- dataset_type (str, optional): Filter by 'input' or 'output'
Returns: None
Raises: DatasetNotFoundError if dataset not found
Example:
# Enable strict mode
manager.set_dataset_strict('school-data', True)
# Disable strict mode
manager.set_dataset_strict('school-data', False)
update_output_lineage(slug, lineage, content_hash, strict=False)¶
Update lineage metadata for an output dataset.
Parameters:
- slug (str): Output dataset slug
- lineage (LineageMetadata): Lineage metadata to write
- content_hash (str): Hash of the file content
- strict (bool): If True, validates without modifying
Returns: None
Raises:
- DatasetNotFoundError: If dataset not found
- DatasetValidationError: In strict mode, if lineage differs
Note: Timestamp only updates when content_hash changes.
get_absolute_path(location)¶
Convert relative path to absolute project path.
Parameters:
- location (str): Relative or absolute path
Returns: Path
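Example (a minimal sketch; assumes the manager created above):
abs_path = manager.get_absolute_path('data/schools.csv')
print(abs_path)  # e.g. /path/to/project/data/schools.csv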
Validation Functions¶
check_notebook_imports(notebook_path)¶
Validate a single notebook's imports.
Parameters:
- notebook_path (str | Path): Path to notebook file
Returns: ValidationResult
Example:
result = check_notebook_imports('analysis.ipynb')
if result.is_valid:
print("✓ Notebook uses sunstone imports")
else:
print(result.summary())
validate_project_notebooks(project_path)¶
Validate all notebooks in a project.
Parameters:
- project_path (str | Path): Path to project directory
Returns: dict[Path, ValidationResult]
Example:
results = validate_project_notebooks('/path/to/project')
for path, result in results.items():
if not result.is_valid:
print(f"\n{path}:")
print(result.summary())
Data Classes¶
FieldSchema¶
Field definition for datasets.
Attributes:
- name (str): Field name
- type (str | None): Field type (string, number, integer, boolean, date, datetime). If None, inferred from dtype at write time
- description (str, optional): Field description
- unit (str, optional): Unit of measure (e.g., 'kg', '%', 'people')
- source (str, optional): Slug of the input dataset this field's data comes from
- constraints (dict, optional): Validation constraints
Example:
from sunstone import FieldSchema
field = FieldSchema(
name='enrollment',
type='integer',
description='Number of enrolled students',
unit='students',
constraints={'minimum': 0}
)
# type can be omitted — it's inferred at write time
field = FieldSchema(name='ratio', description='Student-teacher ratio')
DatasetMetadata¶
Dataset metadata from datasets.yaml.
Attributes:
- name (str): Human-readable name
- slug (str): Machine-readable identifier
- location (str): File path
- fields (list[FieldSchema]): Field definitions
- source (SourceMetadata | None): Source attribution (inputs only)
- strict (bool): Strict mode enabled
- dataset_type (str): 'input' or 'output'
PublishConfig¶
Top-level publishing configuration.
Attributes:
- enabled (bool): Whether publishing is enabled
- to (str | None): Destination URL or path
- flatten (bool): Whether to flatten directory structure (default: False)
Path Resolution:
- If to ends with .json, it is used as the datapackage filename: gs://bucket/countries.json → datapackage at that exact path
- If to doesn't end with .json, it is treated as a directory: gs://bucket/datasets/project/ → /datapackage.json is appended
Example:
from sunstone import PublishConfig
config = PublishConfig(
enabled=True,
to='gs://my-bucket/datasets/project/',
flatten=False
)
LineageMetadata¶
Lineage tracking information. Aligned with W3C PROV-O.
Attributes:
- sources (list[DatasetMetadata]): Source datasets that contributed to this data
- created_at (datetime | None): Timestamp when lineage was last updated (content changed)
- content_hash (str | None): SHA256 hash of the DataFrame content
- activity (Activity | None): The PROV-O Activity that generated this data
- field_derivations (list[FieldDerivation] | None): Field-level derivation detail (prov:qualifiedDerivation)
Methods:
- get_licenses(): Return list of all source licenses
- add_source(source): Add source dataset
- populate_field_derivations(columns, slug): Auto-populate field derivations for columns from a source
- merge(other): Merge lineage from another DataFrame, combining sources and field derivations
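Example (a minimal sketch; df is a tracked DataFrame):
lineage = df.metadata.lineage
print(lineage.get_licenses())              # licenses from all sources
print([s.slug for s in lineage.sources])   # contributing dataset slugs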
Activity¶
A W3C PROV-O Activity representing a script or notebook execution.
Attributes:
- id (str): Unique identifier (e.g., 'exec-{timestamp}-{hash}')
- used (list[UsageRecord]): Input entities consumed by this activity
- generated (list[EntityRef]): Output entities produced
- was_associated_with (list[Agent]): Agents involved in this activity
- started_at (datetime | None): When the activity started
- ended_at (datetime | None): When the activity ended
- script_path (str | None): Path to the executed Python script
- git_commit (str | None): Git commit hash at time of execution
Agent¶
A W3C PROV-O Agent: something that bears responsibility for an activity.
Attributes:
- id (str): Unique identifier (username, org name, software name)
- type (AgentType): One of PERSON, SOFTWARE, ORGANIZATION
- label (str | None): Human-readable label
- version (str | None): Version string (for SoftwareAgent)
FieldDerivation¶
Records that an output field was derived from a source entity. Maps to prov:qualifiedDerivation at the field level.
Attributes:
- output_field (str): Name of the output column
- source_entity (str): Slug of the source dataset
- source_field (str | None): Name of the source field, if known
EntityRef¶
Lightweight reference to a PROV Entity (dataset).
Attributes:
- slug (str): Dataset slug identifier
- namespace (str | None): Optional namespace URI for external entities
UsageRecord¶
Records how an Activity used an Entity. Maps to prov:qualifiedUsage.
Attributes:
- entity (EntityRef): Which entity was used
- columns (list[str] | None): Which columns were selected (None means all)
- filters (dict | None): Filters applied during read
Metadata Class¶
Unified metadata container for DataFrames.
Attributes:
- lineage (LineageMetadata): Lineage metadata tracking data provenance
- description (str | None): Human-readable description of the dataset
- rdf_prefixes (dict | None): RDF namespace prefixes for custom properties
- custom_properties (dict | None): Custom properties including RDF triples
- field_metadata (dict[str, FieldSchema]): Per-column metadata, keyed by column name
- slug (str | None): Dataset slug, used at write time
- name (str | None): Human-readable dataset name, used at write time
Plugin System¶
The plugin system handles reading, writing, and URL resolution through a registry of handlers.
PluginRegistry¶
Central registry for auth providers, URL handlers, and format handlers.
PluginRegistry.get(project_path=None)¶
Return a cached registry instance. If project_path is provided, the registry is scoped to that project and loads project-specific plugin configuration.
Example:
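from sunstone.plugins import PluginRegistry
# Global registry
registry = PluginRegistry.get()
# Project-scoped registry (loads that project's plugin configuration)
registry = PluginRegistry.get(project_path='/path/to/project')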
registry.fetch(url, dest)¶
Download a URL to a local file using the appropriate URL handler.
Parameters:
- url (str): URL to download (supports http://, https://, gs://, s3://, r2://, local paths)
- dest (Path): Local destination file path
Returns: Path to the downloaded file
Example:
from pathlib import Path
from sunstone.plugins import PluginRegistry
registry = PluginRegistry.get()
registry.fetch('gs://my-bucket/data.csv', Path('data/local.csv'))
Note: DatasetsManager.fetch_from_url() is deprecated. Use PluginRegistry.get().fetch() instead.
Plugin Protocols¶
Plugins implement one or more of these protocols:
- AuthProvider: Provides authentication headers for HTTP requests
- URLHandler: Resolves URLs to readable/writable streams via open(url, mode)
- FormatHandler: Reads and writes data formats (CSV, JSON, Excel, Parquet, TSV)
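A hypothetical URLHandler sketch; only open(url, mode) is documented above, so the class name and scheme check are assumptions:
import io

class MyURLHandler:
    # Hypothetical handler for a made-up myproto:// scheme
    def open(self, url: str, mode: str = 'rb'):
        assert url.startswith('myproto://')
        return io.BytesIO(b'example bytes')  # return a readable stream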
Plugin Discovery¶
External plugins are discovered via the sunstone.plugins entry point group:
# In your plugin's pyproject.toml
[project.entry-points."sunstone.plugins"]
my-plugin = "my_package:MyPlugin"
Plugin Configuration¶
Configuration is loaded with cascading precedence:
- datasets.yaml → plugins.<name> section (highest priority)
- pyproject.toml → [tool.sunstone.plugins.<name>] section
- Environment variables → SUNSTONE_PLUGIN_<NAME>_<KEY>
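A hypothetical configuration sketch; the plugin name and keys are placeholders:
# datasets.yaml
plugins:
  my-plugin:
    endpoint: https://example.com/api

# Equivalent environment variable:
# SUNSTONE_PLUGIN_MY_PLUGIN_ENDPOINT=https://example.com/api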
Built-in URL Handlers¶
| Scheme | Handler | Extra |
|---|---|---|
| Local files | LocalFileHandler | Built-in |
| http://, https:// | HttpURLHandler | Built-in (with SSRF protection) |
| gs:// | GcsURLHandler | Requires sunstone-py[gcs] |
| s3://, r2:// | S3URLHandler | Requires sunstone-py[s3] |
Unit-Aware Arithmetic¶
sunstone-py integrates with Pint for unit-aware column arithmetic.
Unit Modes¶
Set via SUNSTONE_UNIT_MODE environment variable or programmatically:
from sunstone.units import set_unit_mode
set_unit_mode('strict') # Raise on unit mismatch
set_unit_mode('auto') # Auto-convert compatible units
set_unit_mode('relaxed') # No unit validation (default)
| Mode | Add/Sub mismatch | Mul/Div | Unknown units |
|---|---|---|---|
| relaxed | Allowed | Allowed | Allowed |
| strict | Error | Computes result unit | Error |
| auto | Auto-converts if compatible | Computes result unit | Warning |
Setting Units on Columns¶
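Units are attached per column with set_field_metadata (documented above); a minimal sketch:
df = (df
    .set_field_metadata('mass', unit='kg')
    .set_field_metadata('distance', unit='km')
)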
Unit Tracking Through Operations¶
When columns with units are used in merge, join, or concat operations, sunstone validates unit compatibility and (in auto mode) applies conversions automatically.
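A minimal sketch; the column and key names are placeholders, and behavior follows the mode table above:
from sunstone.units import set_unit_mode
set_unit_mode('auto')
# If left['distance'] is in km and right['distance'] is in m,
# auto mode converts the compatible units instead of raising
merged = left.merge(right, on='site_id')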
QUDT Round-Tripping¶
Units stored as QUDT URIs in datasets.yaml are preserved through read/write cycles via the unit_source field on FieldSchema.
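A hypothetical datasets.yaml field entry; the exact key layout is an assumption, with unit_source holding the QUDT URI:
fields:
  - name: area
    type: number
    unit: km^2
    unit_source: http://qudt.org/vocab/unit/KiloM2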
Exceptions¶
from sunstone.exceptions import (
SunstoneError,
DatasetNotFoundError,
StrictModeError,
DatasetValidationError,
LineageError
)
SunstoneError¶
Base exception for all sunstone-py errors.
DatasetNotFoundError¶
Raised when dataset not found in datasets.yaml.
Example:
try:
df = pd.read_csv('missing.csv', project_path=PROJECT_PATH)
except DatasetNotFoundError as e:
print(f"Dataset not registered: {e}")
StrictModeError¶
Raised when operation blocked in strict mode.
Example:
try:
df.to_csv('new.csv', slug='new', name='New', strict=True)
except StrictModeError as e:
print(f"Strict mode violation: {e}")
DatasetValidationError¶
Raised when dataset validation fails.
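Example (a minimal sketch using update_output_lineage in strict mode; arguments are placeholders):
try:
    manager.update_output_lineage('analysis-results', lineage, content_hash, strict=True)
except DatasetValidationError as e:
    print(f"Validation failed: {e}")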
LineageError¶
Raised when lineage tracking encounters an error.
Type Hints¶
sunstone-py includes complete type hints for IDE support:
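A minimal sketch; the annotations use classes documented on this page:
from sunstone import DatasetsManager, FieldSchema

manager: DatasetsManager = DatasetsManager('/path/to/project')
field: FieldSchema = FieldSchema(name='count', type='integer')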