A project is defined by four YAML files: a main file plus one file each for the bronze, silver and gold layers. load_project() deep-merges the layer files into the main file in order (bronze → silver → gold) and validates the merged result before the pipeline runs.
```yaml
pipeline:
  name: my_project          # (required) pipeline name, used in dlt and gold output dirs

includes:
  bronze: bronze.yaml       # (required) filename of the bronze config in this folder
  silver: silver.yaml       # (required) filename of the silver config
  gold: gold.yaml           # (required) filename of the gold config

paths:
  bronze: ./data/bronze     # (required) bronze output directory, local or s3://
  silver: ./data/silver     # (required) silver output directory
  gold: ./data/gold         # (required) gold output directory
  export: ./data/export     # (required) export output directory

bi_export:                  # (optional) BI export config
  enabled: true
  projects:
    - name: default
      tables:
        - summary.parquet
```
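The merge-then-validate step can be pictured with a short Python sketch. It assumes that nested mappings merge key by key, that any other value in a later file (lists included) replaces the earlier one, and that later layers win on conflicts. `deep_merge`, `merge_layers`, `check_required` and `REQUIRED_KEYS` are hypothetical names, and the required-key list is copied from the "(required)" comments above rather than from load_project() itself.

```python
from copy import deepcopy
from typing import Any, Dict, List, Tuple

# Hypothetical: required keys copied from the "(required)" comments in the
# main file. The real validator may check more, or different, keys.
REQUIRED_KEYS: List[Tuple[str, ...]] = [
    ("pipeline", "name"),
    ("includes", "bronze"),
    ("includes", "silver"),
    ("includes", "gold"),
    ("paths", "bronze"),
    ("paths", "silver"),
    ("paths", "gold"),
    ("paths", "export"),
]


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict with `override` merged into `base` (inputs untouched).

    Nested dicts merge key by key; any other value, lists included, from
    `override` replaces the one in `base`. This rule is an assumption.
    """
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def merge_layers(main, bronze, silver, gold):
    """Merge the layer configs into the main config: bronze, then silver, then gold."""
    result = main
    for layer in (bronze, silver, gold):
        result = deep_merge(result, layer)
    return result


def check_required(config: Dict[str, Any]) -> List[str]:
    """Return the dotted paths of required keys that are missing or empty."""
    missing = []
    for path in REQUIRED_KEYS:
        node: Any = config
        for part in path:
            node = node.get(part) if isinstance(node, dict) else None
        if node is None or node == "":
            missing.append(".".join(path))
    return missing


if __name__ == "__main__":
    main = {
        "pipeline": {"name": "my_project"},
        "includes": {"bronze": "bronze.yaml", "silver": "silver.yaml", "gold": "gold.yaml"},
        "paths": {"bronze": "./data/bronze", "silver": "./data/silver", "gold": "./data/gold"},
    }
    merged = merge_layers(main, {}, {"bronze_to_silver": {"tables": []}}, {})
    print(check_required(merged))  # ['paths.export']: the export path is missing
```

Returning a list of missing paths, rather than raising on the first one, lets a validator report every problem in a single run.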
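Any string value in any of the YAML files may contain ${VAR} or ${VAR:-default} placeholders. A bare ${VAR} requires VAR to be set in the environment and raises EnvironmentError otherwise; ${VAR:-default} falls back to default when VAR is not set. Expansion happens after all four files have been merged.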
```yaml
# Raises EnvironmentError if ORACLE_USER (or any other placeholder without a default) is not set
connection_string: "oracle+oracledb://${ORACLE_USER}:${ORACLE_PASSWORD}@${ORACLE_DSN}"

# Falls back to "my-default-bucket" if BUCKET is not set
bucket_url: "s3://${BUCKET:-my-default-bucket}/data"
```
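The expansion rules can be sketched in Python, assuming a regex-based substitution that walks the merged config. `expand_string` and `expand_all` are hypothetical helpers. This sketch also makes two assumptions of its own: an empty variable counts as set (unlike the shell's `${VAR:-default}`, which also falls back on empty values), and a default may not contain `}`.

```python
import os
import re
from typing import Any, Mapping, Optional

# Matches ${VAR} and ${VAR:-default}. Group 1 is the variable name, group 2
# the default (None when no ":-" part is present). Defaults cannot contain "}".
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def expand_string(value: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Replace every placeholder in `value`, reading from `env` (default: os.environ)."""
    source = os.environ if env is None else env

    def _sub(match: "re.Match[str]") -> str:
        name, default = match.group(1), match.group(2)
        if name in source:
            return source[name]  # assumption: an empty value still counts as set
        if default is not None:
            return default
        raise EnvironmentError(f"environment variable {name!r} is not set")

    return _PLACEHOLDER.sub(_sub, value)


def expand_all(node: Any, env: Optional[Mapping[str, str]] = None) -> Any:
    """Recursively expand placeholders in every string of a merged config."""
    if isinstance(node, str):
        return expand_string(node, env)
    if isinstance(node, dict):
        return {key: expand_all(val, env) for key, val in node.items()}
    if isinstance(node, list):
        return [expand_all(item, env) for item in node]
    return node  # numbers, booleans and None pass through unchanged


if __name__ == "__main__":
    env = {"ORACLE_USER": "scott", "ORACLE_PASSWORD": "tiger", "ORACLE_DSN": "db:1521/XE"}
    print(expand_string("oracle+oracledb://${ORACLE_USER}:${ORACLE_PASSWORD}@${ORACLE_DSN}", env))
    print(expand_string("s3://${BUCKET:-my-default-bucket}/data", {}))
```

With the made-up credentials above, the first call prints oracle+oracledb://scott:tiger@db:1521/XE and the second prints s3://my-default-bucket/data. Running the expansion once over the merged tree, rather than per file, matches the rule that expansion happens after merging.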
```yaml
bronze_to_silver:
  tables:
    - source_file: ORDERS.parquet     # filename in bronze directory
      output_file: orders.parquet     # filename to write in silver directory
      transforms:
        - type: rename
          columns:
            ORDER_ID: order_id
            UPDATED_AT: updated_at
        - type: cast
          columns:
            order_id: Int64
            amount: Float64
        - type: drop
          columns: [INTERNAL_COL, DEBUG_COL]
        - type: udf
          file: projects/my_project/udf/silver/base.py
          function: enrich_orders
          args:
            threshold: 500.0
  derived_tables:
    - output_file: order_lines_enriched.parquet
      udf:
        file: projects/my_project/udf/silver/derived.py
        function: build_enriched
      select: [order_id, product_id, line_revenue]   # optional column projection
```
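The transforms run in the order listed, so the cast above already sees the renamed order_id column. The sketch below applies the four step types to rows held as plain Python dicts. This is an assumption for readability: the real pipeline most likely works on dataframes (the Int64/Float64 names suggest a dataframe library, but which one is not stated here). `apply_transforms`, `load_udf`, `project`, the body of `enrich_orders`, and the `fn(rows, **args)` calling convention for UDFs are all hypothetical.

```python
import importlib.util
from pathlib import Path
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]

# Hypothetical mapping from the dtype names used in the YAML to Python types.
CASTS: Dict[str, Callable[[Any], Any]] = {"Int64": int, "Float64": float}


def load_udf(file: str, function: str) -> Callable[..., List[Row]]:
    """Import `function` from the Python file at `file` (hypothetical loader)."""
    spec = importlib.util.spec_from_file_location(Path(file).stem, file)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load UDF file {file!r}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return getattr(module, function)


def apply_transforms(rows: List[Row], transforms: List[Dict[str, Any]],
                     resolve: Callable[[str, str], Callable] = load_udf) -> List[Row]:
    """Apply rename, cast, drop and udf steps in the order they are listed."""
    for step in transforms:
        kind = step["type"]
        if kind == "rename":
            mapping = step["columns"]
            rows = [{mapping.get(k, k): v for k, v in row.items()} for row in rows]
        elif kind == "cast":
            casts = {col: CASTS[dtype] for col, dtype in step["columns"].items()}
            rows = [{k: (casts[k](v) if k in casts and v is not None else v)
                     for k, v in row.items()} for row in rows]
        elif kind == "drop":
            dropped = set(step["columns"])
            rows = [{k: v for k, v in row.items() if k not in dropped} for row in rows]
        elif kind == "udf":
            # Assumed calling convention: fn(rows, **args) returns new rows.
            fn = resolve(step["file"], step["function"])
            rows = fn(rows, **step.get("args", {}))
        else:
            raise ValueError(f"unknown transform type: {kind!r}")
    return rows


def project(rows: List[Row], columns: List[str]) -> List[Row]:
    """Keep only `columns`, in order, like the optional `select:` key."""
    return [{col: row[col] for col in columns} for row in rows]


def enrich_orders(rows: List[Row], threshold: float) -> List[Row]:
    """Hypothetical UDF body: flag orders whose amount exceeds `threshold`."""
    return [{**row, "is_large": row["amount"] > threshold} for row in rows]


if __name__ == "__main__":
    transforms = [
        {"type": "rename", "columns": {"ORDER_ID": "order_id", "UPDATED_AT": "updated_at"}},
        {"type": "cast", "columns": {"order_id": "Int64", "amount": "Float64"}},
        {"type": "drop", "columns": ["INTERNAL_COL", "DEBUG_COL"]},
        {"type": "udf", "file": "projects/my_project/udf/silver/base.py",
         "function": "enrich_orders", "args": {"threshold": 500.0}},
    ]
    raw = [{"ORDER_ID": "7", "UPDATED_AT": "2024-01-01", "amount": "750",
            "INTERNAL_COL": 1, "DEBUG_COL": 0}]
    # Resolve the UDF in-process instead of importing base.py from disk.
    print(apply_transforms(raw, transforms, resolve=lambda file, fn: enrich_orders))
```

For the sample row this prints [{'order_id': 7, 'updated_at': '2024-01-01', 'amount': 750.0, 'is_large': True}]. Passing `resolve` as a parameter keeps the file-loading step swappable, which is also what lets the example run without the project's UDF files on disk.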
```yaml
silver_to_gold:
  projects:
    - name: analytics                       # subfolder under paths.gold
      aggregations:
        - source_file: orders.parquet       # silver filename to read
          pre_agg_udf:                      # (optional) runs before group_by
            file: projects/my_project/udf/gold/transforms.py
            function: prepare_orders
            args:
              include_region: true
          group_by: [customer_id, region]   # columns to group on
          metrics:
            - {column: order_id, agg: count, alias: total_orders}
            - {column: amount, agg: sum, alias: total_revenue}
            - {column: amount, agg: mean, alias: avg_order_value}
          output_file: customer_summary.parquet
        - source_file: orders.parquet       # pass-through (no group_by)
          select: [order_id, amount, status]
          output_file: orders_flat.parquet
```
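Each aggregation entry reads one silver file and optionally runs pre_agg_udf. It then either groups and computes the listed metrics or, when group_by is absent, passes rows through with an optional select. A rough Python sketch over lists of dicts follows. Reading source_file and writing output_file are left out, and `run_aggregation`, `AGGS` and the `fn(rows, **args)` UDF convention are hypothetical. Null handling, such as whether `count` skips missing values, is not modeled.

```python
from collections import defaultdict
from statistics import mean
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]

# Hypothetical reducers for the `agg:` names used above; nulls are not handled.
AGGS: Dict[str, Callable[[List[Any]], Any]] = {"count": len, "sum": sum, "mean": mean}


def run_aggregation(rows: List[Row], spec: Dict[str, Any],
                    resolve: Optional[Callable[[str, str], Callable]] = None) -> List[Row]:
    """Run one `aggregations:` entry against rows already read from `source_file`."""
    udf = spec.get("pre_agg_udf")
    if udf is not None:
        if resolve is None:
            raise ValueError("pre_agg_udf is set but no UDF resolver was given")
        fn = resolve(udf["file"], udf["function"])
        rows = fn(rows, **udf.get("args", {}))  # assumed convention: fn(rows, **args)

    if "group_by" not in spec:
        # Pass-through entry: copy rows, applying the optional `select` projection.
        columns = spec.get("select")
        return [{c: row[c] for c in columns} if columns else dict(row) for row in rows]

    keys = spec["group_by"]
    groups: Dict[tuple, List[Row]] = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row)

    result = []
    for group_key, members in groups.items():  # groups keep first-seen order
        out = dict(zip(keys, group_key))
        for metric in spec["metrics"]:
            values = [m[metric["column"]] for m in members]
            out[metric["alias"]] = AGGS[metric["agg"]](values)
        result.append(out)
    return result


if __name__ == "__main__":
    orders = [
        {"order_id": 1, "customer_id": "c1", "region": "EU", "amount": 100.0, "status": "paid"},
        {"order_id": 2, "customer_id": "c1", "region": "EU", "amount": 50.0, "status": "paid"},
        {"order_id": 3, "customer_id": "c2", "region": "US", "amount": 20.0, "status": "open"},
    ]
    summary = {
        "group_by": ["customer_id", "region"],
        "metrics": [
            {"column": "order_id", "agg": "count", "alias": "total_orders"},
            {"column": "amount", "agg": "sum", "alias": "total_revenue"},
            {"column": "amount", "agg": "mean", "alias": "avg_order_value"},
        ],
    }
    for row in run_aggregation(orders, summary):
        print(row)
```

For the made-up sample orders this prints one row per (customer_id, region) pair. The c1/EU row has total_orders 2, total_revenue 150.0 and avg_order_value 75.0, and the c2/US row has 1, 20.0 and 20.0.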