
Conversation

Lewis-Lyons commented Sep 11, 2025

Add AUDIT_ONLY Model Kind for Multi-Table Validation

Summary

This PR introduces a new AUDIT_ONLY model kind to SQLMesh, filling a gap: validating relationships across multiple tables without materializing unnecessary tables. The feature combines the benefits of models (DAG participation, dependencies, scheduling) with the behavior of audits (validation without materialization).

Problem Statement

Previously, SQLMesh users had to choose between:

  • Creating wasteful materialized models just to run cross-table validations
  • Using standalone audits that don't integrate well with model dependencies
  • Building external validation systems outside SQLMesh

Solution

The AUDIT_ONLY model kind enables users to:

  • Validate relationships across multiple tables (e.g., referential integrity)
  • Run complex validation queries that don't belong to a single model
  • Participate in the model DAG with proper dependencies
  • Avoid creating unnecessary materialized tables

Implementation Details

Core Changes

1. Model Kind Definition (sqlmesh/core/model/kind.py)

  • Added AUDIT_ONLY to ModelKindName enum
  • Created AuditOnlyKind class with configuration:
    • blocking (default: True): Whether failures stop the pipeline
    • max_failing_rows (default: 10): Number of sample rows in error messages
  • Marked as is_symbolic=True (no materialization)
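
A minimal, self-contained sketch of the configuration shape described above (assumption: the real class in sqlmesh/core/model/kind.py subclasses SQLMesh's internal kind base and enum rather than the simplified stand-ins used here):

    from dataclasses import dataclass
    from enum import Enum

    class ModelKindName(str, Enum):
        # Only the new member is shown; the real enum lists every model kind.
        AUDIT_ONLY = "AUDIT_ONLY"

    @dataclass
    class AuditOnlyKind:
        name: ModelKindName = ModelKindName.AUDIT_ONLY
        blocking: bool = True          # a failed validation stops the pipeline
        max_failing_rows: int = 10     # sample rows included in the error message

        @property
        def is_symbolic(self) -> bool:
            # Symbolic kinds never create a physical table.
            return True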

2. Execution Strategy (sqlmesh/core/snapshot/evaluator.py)

  • Created AuditOnlyStrategy extending SymbolicStrategy
  • Executes validation query and checks for returned rows
  • Raises AuditError with sample data if validation fails
  • Properly integrated with the evaluation strategy routing
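
The strategy's behavior can be pictured with the helper below; the function name, adapter interface, and AuditError class are illustrative stand-ins for the actual SymbolicStrategy subclass and engine-adapter APIs, not code from this PR:

    class AuditError(Exception):
        """Raised when an audit-only model's validation query returns rows."""

    def run_audit_only(adapter, validation_sql: str, blocking: bool = True, max_failing_rows: int = 10) -> None:
        # Any rows returned by the model's query are treated as violations;
        # only a bounded sample is fetched for the error message.
        sample = adapter.fetchall(
            f"SELECT * FROM ({validation_sql}) AS violations LIMIT {max_failing_rows}"
        )
        if not sample:
            return  # zero rows: the validation passed
        message = f"Audit-only validation failed; sample of failing rows: {sample}"
        if blocking:
            raise AuditError(message)  # blocking failures halt the pipeline
        print(f"WARNING: {message}")   # non-blocking failures are only reported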

3. Parser Support (sqlmesh/core/dialect.py)

  • Added AUDIT_ONLY to list of model kinds that accept properties

4. Snapshot Definition (sqlmesh/core/snapshot/definition.py)

  • Fixed evaluatable property to include audit-only models
  • Ensures proper interval tracking for validation execution
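
Illustratively, the adjusted check behaves like the predicate below (a guess at the intent, not the actual property in sqlmesh/core/snapshot/definition.py):

    def is_evaluatable(is_symbolic: bool, is_audit_only: bool) -> bool:
        # Symbolic snapshots are normally skipped during evaluation, but
        # audit-only models must still run so that their validation queries
        # execute and their intervals are recorded.
        return (not is_symbolic) or is_audit_only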

Testing

Unit Tests (tests/core/test_model.py)

  • 6 unit tests covering:
    • Basic parsing and properties
    • Blocking/non-blocking configuration
    • Max failing rows configuration
    • Python model support
    • Full configuration scenarios
    • Serialization/deserialization

Integration Tests (tests/core/test_integration.py)

  • 6 integration tests validating:
    • Validation success/failure scenarios
    • Blocking vs non-blocking behavior
    • Dependency tracking
    • Scheduling with cron
    • Metadata changes

Documentation

User Documentation Updates

  • docs/concepts/audits.md: Added comprehensive AUDIT_ONLY section under Advanced Usage
  • docs/concepts/models/model_kinds.md: Added detailed AUDIT_ONLY section with examples
  • docs/reference/model_configuration.md: Added AUDIT_ONLY configuration reference

Example Models (examples/sushi/models/)

Added three demonstration models (all non-blocking so they don't disrupt the example pipeline):

  • audit_order_integrity.sql: Validates referential integrity
  • audit_waiter_revenue_anomalies.sql: Detects revenue anomalies
  • audit_duplicate_orders.sql: Identifies duplicate orders

Usage Example

MODEL (
    name data_quality.order_validation,
    kind AUDIT_ONLY (
        blocking TRUE,
        max_failing_rows 20
    ),
    depends_on [orders, customers],
    cron '@daily'
);

-- Query returns 0 rows for success
SELECT 
    o.order_id,
    o.customer_id,
    'Missing customer record' as issue
FROM orders o
LEFT JOIN customers c ON o.customer_id = c.customer_id
WHERE c.customer_id IS NULL;

Key Differences from Traditional Audits

| Feature | Traditional Audits | AUDIT_ONLY Models |
| --- | --- | --- |
| Scope | Single model | Multiple models |
| Dependencies | Implicit | Explicit via depends_on |
| Materialization | N/A | Never materializes |
| Location | audits/ directory | models/ directory |
| Scheduling | With parent model | Independent cron |
| DAG Participation | Attached to model | Full model in DAG |

Migration Path

  • No breaking changes to existing models or audits
  • Optional feature - only use when needed
  • Can gradually migrate complex audits to audit-only models

Testing Instructions

  1. Run unit tests:

    pytest tests/core/test_model.py -k audit_only -xvs
  2. Run integration tests:

    pytest tests/core/test_integration.py -k audit_only -xvs
  3. Try the sushi examples:

    cd examples/sushi
    sqlmesh plan
    # Note: Example models are non-blocking so they won't fail the pipeline
  4. Create a test AUDIT_ONLY model:

    -- Save as models/test_audit.sql
    MODEL (
        name test.audit_validation,
        kind AUDIT_ONLY,
        depends_on [your_table1, your_table2]
    );
    
    -- This should return 0 rows for success
    SELECT * FROM your_table1 
    WHERE some_condition_that_indicates_invalid_data;

Related Issues

Addresses the need for multi-table validation without materialization.

Notes for Reviewers

  • The feature is designed to be non-intrusive and backward compatible
  • Example models in sushi are set to non-blocking to avoid disrupting tests
  • Documentation emphasizes when to use AUDIT_ONLY vs traditional audits
  • The implementation follows existing SQLMesh patterns for symbolic models

Future Enhancements (Not in this PR)

  • Support for incremental validation by time range
  • Configurable number of failing rows to capture
  • Different visualization in UI/lineage graph

Introduces a new model kind that validates data relationships across
multiple tables without materializing results. Combines model benefits
(DAG participation, dependencies) with audit behavior (validation only).

- Add AUDIT_ONLY to ModelKindName enum and create AuditOnlyKind class
- Implement AuditOnlyStrategy for execution without materialization
- Add comprehensive unit and integration tests
- Update documentation with usage examples and best practices
- Add three example models to sushi project demonstrating use cases
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

Lewis-Lyons and others added 5 commits September 11, 2025 18:49
- Handle potential None return from fetchone() properly
- Apply ruff formatting
- Update model counts in analytics and integration tests
- Account for 3 new AUDIT_ONLY models in sushi example
- Fix snapshot count assertions
- Fix test_forward_only_plan_with_effective_date to handle audit_waiter_revenue_anomalies
- Update assertions to check snapshot IDs in a set rather than exact order
- Revert incorrect change to test_migrate_rows (uses fixtures, not live models)
This file should not have been committed to the repository.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
Lewis-Lyons marked this pull request as draft September 12, 2025 00:46
AUDIT_ONLY models are symbolic and don't create physical tables, but they
still need to be included in plans so their validation queries can run.

Changes:
- Exclude symbolic models from missing_intervals in Plan to prevent them
  from being scheduled for backfill
- Update integration tests to filter out AUDIT_ONLY models when counting
  new snapshots and checking intervals
- Fix test validation to skip table existence checks for symbolic models
- Distinguish between AUDIT_ONLY and EXTERNAL models (both symbolic but
  EXTERNAL models still track intervals)

This ensures AUDIT_ONLY models serve their validation purpose without
participating in the physical deployment lifecycle.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>