Buster is an AI data engineer that keeps your data stack running

Buster saves data engineers hundreds of hours every month. It monitors your data stack, catches breaking changes, and fixes issues for you.

IN PROGRESS (4)

  • Reviewing PR from Nate Kelley · New PR: feature/add_customer_segments
  • Renaming inconsistent fields across custome… · Request from Dallin Bentley
  • Updating column descriptions · Schedule: Weekly, 9:30 AM
  • Checking for downstream impacts · CI/CD build on main branch detected

READY FOR REVIEW (2)

  • Detected schema drift in production · Schedule: Daily, 1:00 AM · 6hr
  • Updated customers.yml documentation · Merge to main detected · 4hr

Detected schema drift in production

1:00 AM · Weekly Schema Drift Check initiated
1:00 AM · Pulled latest manifest.json and run_results.json from the main branch to compare model definitions.
1:00 AM · Identified 428 active models in the project. Preparing warehouse comparison.
1:01 AM · Queried warehouse metadata for all production schemas and cached the current state.
1:01 AM · Cross-referenced dbt model columns, data types, and constraints against live warehouse tables.
1:01 AM · Drift detected:
  • Data type mismatch in model fct_order_summary
  • order_total changed from FLOAT (in dbt) to DECIMAL(12,2) (in the warehouse)
1:02 AM · Validating impact:
  • Found 3 downstream models directly impacted by the change
  • Need to create a new dbt schema test and update documentation in fct_order_summary.yml
1:04 AM · Generated a new schema test and updated docs:
  • Generated the recommended dbt schema test
  • Updated the snippet for fct_order_summary.yml
1:04 AM · Ran tests locally; all tests passed
1:05 AM · Opened a GitHub issue & sent a Slack alert:
  • Created GitHub issue #421 with drift details, impact summary, and code suggestions
  • Posted an alert to Slack channel #buster-data-quality
1:05 AM · Checked other models for drift; no additional drift found
1:11 AM · Finished
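For reference, the simplest shape such a dbt schema test can take is a singular test: a SELECT that fails the run if it returns any rows. The sketch below is illustrative only (it is not the test Buster generated) and assumes a warehouse that exposes information_schema.columns; exact type names vary by platform.

tests/assert_order_total_is_decimal.sql

-- Hypothetical drift guard: fails if order_total's live warehouse type
-- no longer matches the documented DECIMAL(12,2).
select column_name, data_type
from information_schema.columns
where lower(table_name) = 'fct_order_summary'
  and lower(column_name) = 'order_total'
  and lower(data_type) not like 'decimal%'
  and lower(data_type) not like 'numeric%'  -- DECIMAL often surfaces as NUMERIC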

buster




██████╗ ██╗ ██╗ ███████╗ ████████╗ ███████╗ ██████╗

██╔══██╗ ██║ ██║ ██╔════╝ ╚══██╔══╝ ██╔════╝ ██╔══██╗

██████╔╝ ██║ ██║ ███████╗ ██║ █████╗ ██████╔╝

██╔══██╗ ██║ ██║ ╚════██║ ██║ ██╔══╝ ██╔══██╗

██████╔╝ ╚██████╔╝ ███████║ ██║ ███████╗ ██║ ██║

╚═════╝ ╚═════╝ ╚══════╝ ╚═╝ ╚══════╝ ╚═╝ ╚═╝



BUSTER v0.3.1 — Your AI Data Worker.


You are standing in an open terminal. An AI awaits your commands.


ENTER send • \n newline • @ files • / commands

Loved by data & analytics engineers at top companies

AI code review

Buster reviews your PRs, runs data diffs, and checks for downstream impact

Buster reviews every PR, runs data diffs to validate changes, and investigates downstream impact. It consistently catches data-specific bugs that other AI code reviewers miss.

PR opened by Nate Sanders

GitHub Pull Request · invoice-project-join-fix

buster (bot) · reviewed 1 minute ago

Needs Review: Invoices may “disappear” from project totals

This PR changes how intermediate_finance_materials_invoice assigns each invoice line to a project. It now matches using invoice.po_number (trimmed) instead of the existing project_id link.

What I Found (Data Diff + Lineage Check)

  • Overall: more invoice lines get a project (+6,461 rows, ~+44%)

  • Wins: 7,071 rows that previously had no project now get one via po_number

  • Risk: 610 rows lose a project because some po_number values aren’t usable IDs (they’re free text, like names and addresses), so the new match fails

Downstream Impact

Those 610 lines will stop counting toward project spend in downstream models like:

  • fct_project_spend (facts: invoice lines rolled up to project)

  • mart_project_costs_daily (daily costs by project)

  • mart_budget_vs_actual (budget vs actual by project)

This will result in some projects looking artificially cheaper (or “missing invoices”) even though the underlying invoices still exist.

Suggested Fix

Use a COALESCE to fall back to materials_order.project_id when po_number doesn’t match. This yields 100% coverage (0 missing values) while preserving the new matches.
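In SQL, the suggested fix boils down to something like this sketch (relation and key names here are assumptions for illustration, not code from the PR):

-- Prefer the new po_number match, but fall back to the existing
-- project_id link whenever the po_number lookup fails.
select
    invoice.invoice_line_id,                  -- assumed line-item key
    coalesce(
        project_by_po.project_id,             -- new match via trimmed po_number
        materials_order.project_id            -- existing link as the fallback
    ) as project_id
from invoice
left join project as project_by_po
    on trim(invoice.po_number) = project_by_po.po_number
left join materials_order
    on invoice.materials_order_id = materials_order.id   -- assumed join key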

Alert & issue triage

Buster triages alerts from failed Airflow jobs, dbt jobs, and quality monitors

Buster investigates job failures, identifies the root cause, and responds accordingly - opening PRs with fixes, sending Slack alerts, or intelligently filtering out false positives.

Airflow DAG failed

Slack · #data-alerts

Airflow Alert (app) · 3 min ago

🚨 DAG Failed: analytics_daily

Execution Time: 2025-12-14T04:23:17+00:00

Log URL: http://airflow-qos-k8s-r1.example.com/dags/01831

Error Reason: None

1 reply

Buster (app) · 1 min ago

What happened:

Test `unique_fct_orders_order_id` failed. 342 duplicate `order_id` values were found in `fct_orders`.


What I found:

  • Duplicates started 2025-12-14 at 6:00 PM

  • Root cause: Refund adjustments from Stripe (Fivetran) are now syncing as separate rows instead of updating the original transaction

  • Each refunded order now has 2 rows: original + refund adjustment


Downstream impact:

  • `fct_orders`: Order count inflated ~2.3%

  • `mart_daily_sales`: Revenue double-counted for refunded orders

  • “Quarterly Executive Report” (Hex): GMV is now overstated


I opened a PR with a quick fix:

Updated `stg_orders` to deduplicate by `order_id`, keeping the most recent record and aggregating refund amounts.


PR #612: Handle refund adjustment rows in stg_orders

https://github.com/acme-analytics/dbt/pull/612
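A rough sketch of what that dedupe can look like (the sync timestamp, refund flag, and source relation names are assumptions, not code from PR #612):

-- Keep the most recent row per order_id and roll refund adjustments
-- into a single aggregated amount, so refunds no longer duplicate orders.
with ranked as (
    select
        *,
        row_number() over (
            partition by order_id
            order by synced_at desc                                -- assumed sync timestamp
        ) as row_num,
        sum(case when is_refund_adjustment then amount else 0 end)
            over (partition by order_id) as total_refund_amount    -- assumed columns
    from raw_stripe_orders                                         -- assumed source relation
)
select *
from ranked
where row_num = 1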

Upstream change detection

Buster catches upstream changes before they break downstream models

Buster monitors source table schemas, JSON columns, and upstream repositories for changes. When something shifts, it investigates downstream impact and opens a PR to integrate or fix impacted models.

Upstream schema change detected

Slack · #data-alerts

Buster (app) · 1 min ago

🚨 Upstream Schema Change Detected


What changed:

Column `user_type` was renamed to `account_type` in the source database. Detected in Fivetran sync at 2025-03-14T06:15:22+00:00.


What I found:

  • Traced the change back to PR #1847 in the `core-api` repo: "Refactor user model naming conventions"

  • `user_type` is referenced in 4 downstream models:

    • `stg_users` — direct reference, will break

    • `int_users_enriched` — joins on `user_type`

    • `fct_user_signups` — filters by `user_type`

    • `mart_user_segments` — aggregates by `user_type`


Impact if not addressed:

  • Next dbt run will fail with “column `user_type` does not exist” in `stg_users`

  • Downstream models (`int_users_enriched`, `fct_user_signups`, and `mart_user_segments`) will be blocked from building


I opened a PR with a quick fix:

Updated `stg_users` to select `account_type as user_type`, maintaining backward compatibility. No changes needed downstream.


PR #681: Hot fix: user_type column rename

https://github.com/acme-analytics/dbt/pull/681
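The fix itself is essentially a one-line alias in the staging model, along these lines (the surrounding columns and source relation are assumed):

-- stg_users: alias the renamed upstream column back to its old name
-- so downstream models keep working unchanged.
select
    id as user_id,                   -- assumed key column
    account_type as user_type,       -- upstream renamed user_type to account_type
    created_at
from raw_core_api.users              -- assumed source relation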

Build with Buster

Use Buster on-demand or set up autonomous workflows that run in the background

Use Buster in your CLI or IDE

Give Buster any task (refactoring models, running migrations, building pipelines, or answering questions about your stack). It works like Claude Code or Cursor, but with full data stack context and purpose-built tooling.

models/marts/sales_order_detail.yml

version: 2

models:
  - name: sales_order_detail
    description: |
      Individual line items representing products sold within each sales order.

      Purpose: Line-item transaction table enabling revenue analysis, product performance tracking, discount effectiveness measurement, and basket composition analysis. Foundation for calculating revenue metrics, product-level profitability, and customer purchasing patterns. Used extensively by metrics models for calculating CLV, average order value, gross profit, and product-specific KPIs.

      Contents: One row per product line item on a sales order. Composite key: (salesOrderID, salesOrderDetailID). Scale: ~121K line items across ~31K orders spanning Sept 2022 to July 2025 (date-shifted to align with current date).

      Lineage: Direct pass-through from stg_sales_order_detail, which sources from sales.salesorderdetail. Staging layer calculates lineTotal field and applies date shifting to modifiedDate.

      Usage Guidance:
      Foundational fact table for sales analytics. Essential for calculating revenue totals, analyzing product performance, measuring discount impact, and understanding purchasing behavior. Most revenue metrics aggregate lineTotal; product analysis groups by productID; discount analysis filters or segments by unitPriceDiscount or specialOfferID. For customer behavior analysis, aggregate to order level first via salesOrderID to avoid over-counting multi-item orders. For product profitability, join to product table for cost data then calculate margin (lineTotal - cost). When analyzing average order value, aggregate line items by order first to get order-level totals.

      Critical Context:
      - lineTotal is calculated in staging as (unitPrice * orderQty * (1 - unitPriceDiscount)) and represents net revenue after discounts but before taxes/freight. This is the primary revenue metric field.
      - All dates shifted forward using shift_date() macro to make dataset feel current (max date aligns with March 28, 2025). Historical patterns span ~3 years.
      - Null carrierTrackingNumber doesn't indicate data quality issue - reflects legitimate business states (orders not shipped yet, certain ship methods, or in-store pickup).
      - salesOrderDetailID is unique within entire table (not just within order) - serves as primary key alone, though conceptually represents line item number within order.
      - unitPrice reflects actual selling price at time of sale (may differ from product.listPrice due to negotiated pricing, promotions, or price changes over time).
      - High orderQty outliers (>20) typically involve accessories or components sold in bulk, not bikes.
      - No line items exist without corresponding order in sales_order_header - referential integrity is clean.

    relationships:
      - name: sales_order_header
        description: >
          Business relationship: Every line item belongs to exactly one sales order. Order header provides order-level context (customer, dates, shipping, totals, status) that applies to all line items within that order. Join to get customer attribution, order timing, territory assignment, shipping details, and order-level calculated fields (purchase context filters, consultation level, etc.).

          Join considerations: Many-to-one from detail to header. Each salesOrderID in details appears in header exactly once. Each order in header typically has multiple detail rows (avg 3.9 line items per order, but distribution is right-skewed with many single-item orders).

          Coverage: 100% of line items match to header. Clean referential integrity - no orphaned details.

          Cardinality notes: Standard fact-to-dimension pattern. When joining, expect row count to remain same (detail-level grain preserved). When aggregating metrics from details, group by salesOrderID first to get order-level aggregates before further analysis to avoid over-representing multi-item orders.
        source_col: salesOrderID
        ref_col: salesOrderID
        cardinality: many-to-one


Set up triggers for autonomous workflows

Pick from Buster’s prebuilt workflows or define your own. Create triggers for alerts, schedules, or webhooks - each with custom instructions defining Buster's workflow.

Runs

Run ID                           Workflow                 Status              Config          Started                  Duration
run_jzdeqacokljk4ioyuxqjvmkzrf   daily-dbt-audit          Flagged for review  pr_checks.yml   Oct 21, 2025, 4:00 PM    4m, 5s
run_mnaxvqzjkbhs8fmobgxlqhjzrt   upstream-change-review   No issues detected  pr_checks.yml   Oct 21, 2025, 4:00 PM    4m, 5s
run_ynhwertghjkf67asdlkfjhqw     pr-review                No issues detected  pr_checks.yml   Oct 21, 2025, 4:00 PM    4m, 5s
run_cmgvazqbgrh443aoiuoqxjkjh    pr-review                No issues detected  pr_checks.yml   Oct 21, 2025, 4:00 PM    4m, 5s
run_bjwnxfqhlpdt2focvwefklkqz    pr-review                No issues detected  pr_checks.yml   Oct 21, 2025, 4:00 PM    4m, 5s
run_jfakdbaqhlpdt2focvwefklkqz   dbt-test-update          No issues detected  pr_checks.yml   Oct 21, 2025, 4:00 PM    4m, 5s

daily-dbt-audit · 5:36 AM

18 monitors passed, 2 anomalies investigated

Ran 20 monitors across staging and mart layers and flagged 2 anomalies:

  • stg_orders.shipping_address null rate up 34% → traced to new digital_only order type, expected
  • stg_payments.payment_method new value “klarna” detected → needs mapping

Opened a PR (add-klarna-payment-method → staging) to add “klarna” to the payment method mapping. I also adjusted null rate thresholds for the digital orders monitor so we don't get false positives going forward.

“Buster saves our data team hundreds of hours of work every month.”

Jonathon Northrup, Senior Analytics Engineer

"Buster's understanding of our models has blown me away. It really gets how our stack fits together.”

Cale Anderson, Data Engineer

"Buster's understanding of our models has blown me away. It really gets how our stack fits together.”

Cale Anderson, Data Engineer

“Buster frees me up from the ad-hoc tasks I always had to do, so I can focus on longer-term goals.”

Landen Bailey, Senior Data Engineer

"Buster helps us keep our dbt project clean, documented, and up-to-date."

Jen Eutsler, Data Engineer

"Buster helps us keep our dbt project clean, documented, and up-to-date."

Jen Eutsler, Data Engineer

“Buster frees us up to focus on impactful data modeling and engineering work that we didn’t have bandwidth for.”

Alex Ahlstrom, Director of Data

Purpose-built for data engineering

Buster deeply understands your stack and has native tooling for data engineering tasks

Full data stack context

Buster generates a detailed context graph of your data stack (documentation and lineage across every model, column, DAG, ETL job, etc.) and deeply understands how everything fits together.

Purpose-built for data engineering

Buster has direct warehouse access, optimized data diff tooling, and its own DuckDB instance for statistical analysis. It has a deep understanding of dbt, Airflow, ETL pipelines, lineage, and more.
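As a rough illustration of what a data diff reduces to (the schema names are hypothetical and this is not Buster's internal implementation; some warehouses spell the operator EXCEPT DISTINCT):

-- Two-way diff: rows that appear in exactly one build of the same model.
(select 'only_in_prod' as side, * from analytics_prod.fct_orders
 except
 select 'only_in_prod' as side, * from analytics_dev.fct_orders)
union all
(select 'only_in_dev' as side, * from analytics_dev.fct_orders
 except
 select 'only_in_dev' as side, * from analytics_prod.fct_orders)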

Enterprise-grade security

Buster is built with enterprise-grade security practices

This includes state-of-the-art encryption, safe and reliable infrastructure partners, and independently verified security controls.

SOC 2 Type II compliant

Buster has undergone a Service Organization Controls audit (SOC 2 Type II) and penetration testing.

HIPAA compliant

Privacy & security measures to ensure that PHI is appropriately safeguarded.

Permissions & governance

Provision users, enforce permissions, & implement robust governance.

IP protection policy

Neither Buster nor our model partners train models on customer data.

Self-hosted deployment

Deploy in your own air-gapped environment.

Secure connections

SSL and pass-through OAuth available.

FAQs

Frequently asked questions

How do I get started with Buster?

Getting started takes about 10 minutes. Check out the Quickstart guide in our docs to see how.

What kinds of tasks can Buster handle?

Buster especially excels at repetitive data engineering workflows. Any reactive task you might instruct a teammate to do, Buster can automate. You can see a few examples here.

How does Buster work with my existing tools?

Buster integrates directly with your tools across your stack. It has native integrations with tools like dbt Cloud, dbt Core, Airflow, Prefect, GitHub, Slack, major data warehouses, and more. You can see all of our integrations here.

Is Buster secure?

Yes. Buster is SOC 2 Type II compliant and is built with enterprise-grade security practices. You can read more about our security policies here.

How does Buster use my data?

We never train models on your data, and we do not permit third parties to train on your data. You can read more about our data use & privacy policies here.

Can I use my own keys?

Yes. You can bring your own OpenAI or Anthropic API keys.