
Conversation

cetagostini
Contributor

@cetagostini cetagostini commented Sep 19, 2025

Description

This PR introduces two experimental causal discovery algorithms designed for fast, interpretable exploration of causal structures in tabular and time-series data.

Both methods are inspired by the constraint-based PC/FCI family (Spirtes, Glymour, Scheines, 2000) but replace the traditional partial correlation tests with a Bayes factor criterion based on ΔBIC (Kass & Raftery, 1995; Schwarz, 1978). This makes them simple and computationally efficient while still grounded in probability theory.
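To make the ΔBIC criterion concrete, here is a minimal sketch (illustrative function names and a Gaussian-OLS setup, not the PR's implementation): the test compares the BIC of regressing x on a conditioning set Z against regressing x on Z plus y, and BIC_null − BIC_alt approximates 2·log of the Bayes factor in favor of dependence.

```python
import numpy as np

def bic_ols(X, y):
    """BIC of a Gaussian OLS regression of y on X (intercept added)."""
    n = len(y)
    X1 = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    sigma2 = np.mean((y - X1 @ beta) ** 2)
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    k = X1.shape[1] + 1  # regression coefficients plus the noise variance
    return -2 * loglik + k * np.log(n)

def ci_by_delta_bic(x, y, Z, bf_thresh=1.0):
    """Declare x and y conditionally independent given Z via delta-BIC.

    BIC_null - BIC_alt approximates 2*log(Bayes factor) for the model in
    which y helps predict x (Kass & Raftery, 1995).
    """
    Z = np.asarray(Z, dtype=float).reshape(len(x), -1)
    bic_null = bic_ols(Z, x)                       # x ~ Z
    bic_alt = bic_ols(np.column_stack([Z, y]), x)  # x ~ Z + y
    two_log_bf = bic_null - bic_alt
    return two_log_bf < bf_thresh                  # weak evidence -> independent
```

With a common cause z of x and y, the test should report independence once z is conditioned on, while a direct link keeps the pair dependent even marginally.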

Includes comprehensive docstrings, public APIs, and extensive tests for both algorithms, covering edge rules, forbidden edges, time series support, and error handling.

Warning

  1. Both algorithms are experimental and should be seen as initial structure-learning tools rather than definitive causal discovery.
  2. They perform well for linear Gaussian/Gamma-like settings and provide interpretable graphs suitable for guiding further analysis.
  3. For more complex or nonlinear systems, or where latent confounding is expected, more sophisticated methods (e.g. full FCI, PCMCI+, or Bayesian graph models) may be necessary.

Why this approach?

Target-first: aligns with many applied settings (e.g. economics, epidemiology, marketing) where the focus is on causes of a particular outcome. Rather than exploring a full grid of possible node combinations, we know for a fact that edges must ultimately point toward a specific outcome node, and that certain node relations are not allowed or do not make sense in the ecosystem.

Draft example:
Code here - Only available to PyMC team

To-do's:

Note

These methods return a skeleton graph with partial orientations (e.g. into the target variable and lag-based constraints). They should be considered as initial structure learners rather than complete CDAG/CPDAG estimators. Full orientation (e.g. Meek rules, v-structure identification) is possible as a later extension.

Related Issue

Checklist


📚 Documentation preview 📚: https://pymc-marketing--1947.org.readthedocs.build/en/1947/

@cetagostini cetagostini self-assigned this Sep 19, 2025
@cetagostini cetagostini marked this pull request as draft September 19, 2025 21:22

codecov bot commented Sep 19, 2025

Codecov Report

❌ Patch coverage is 97.90698% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 93.54%. Comparing base (54de250) to head (be58888).
⚠️ Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
pymc_marketing/mmm/causal.py 97.90% 9 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1947      +/-   ##
==========================================
+ Coverage   93.41%   93.54%   +0.12%     
==========================================
  Files          67       67              
  Lines        8521     8949     +428     
==========================================
+ Hits         7960     8371     +411     
- Misses        561      578      +17     

☔ View full report in Codecov by Sentry.

@cetagostini cetagostini added this to the 0.16.0 milestone Sep 19, 2025
@cetagostini
Contributor Author

Failing test is related to #1949, not this PR.

@juanitorduz juanitorduz modified the milestones: 0.16.0, 0.17.0 Sep 21, 2025
@cetagostini cetagostini marked this pull request as ready for review September 22, 2025 20:46
@juanitorduz
Collaborator

@carlosagostini I'm not familiar with these methods, so I'm unsure how to provide feedback. I can give you a code PR review, but is there anyone else who could potentially double-check the logic?

@cetagostini
Contributor Author

cetagostini commented Sep 24, 2025

> @carlosagostini I'm not familiar with these methods, so I'm unsure how to provide feedback. I can give you a code PR review, but is there anyone else who could potentially double-check the logic?

Sure, both are inspired by more formal methods, but they are still experimental. I'll add a notebook soon showing all the recent PRs, this one included. Let's kick off with the code review.

@daniel-saunders-phil
Contributor

Is the use of Bayes factors in the Spirtes, Glymour, Scheines algorithm something you have developed, or is it an approach based on some published paper?

I'm also wondering whether there are existing causal discovery libraries that perform similar roles (like implementations of causal discovery algorithms) or whether those libraries would appreciate this algorithm? Do you know what the landscape is like?

@cetagostini
Contributor Author

cetagostini commented Sep 26, 2025

> Is the use of Bayes factors in the Spirtes, Glymour, Scheines algorithm something you have developed, or is it an approach based on some published paper?
>
> I'm also wondering whether there are existing causal discovery libraries that perform similar roles (like implementations of causal discovery algorithms) or whether those libraries would appreciate this algorithm? Do you know what the landscape is like?

@daniel-saunders-phil good point. Short answer: the algorithms in place are not grounded in a published paper.

Why? I have spent quite a while playing with causal discovery, mostly for marketing. Sadly, all these algorithms are great for biology, environmental science, medicine, or engineering, but not for marketing.

On raw marketing data they produce nonsensical answers, because in marketing we violate many of the assumptions those algorithms require. For example, some are not even suitable for time series, others cannot handle dependent processes, and so on.

Does this mean causal discovery is impossible for marketing? My opinion is no: marketing problems are quite structural. We know for a fact that certain relationships can't exist, that all edges must ultimately point toward a specific target or KPI node, and we are certain about which latent processes may be influencing certain nodes.

All this knowledge can reduce the search space, letting the algorithms explore it and return the most plausible skeleton behind the data.

That's what I am doing here, because I don't know of any other library doing the same. DoWhy and causal-learn are quite famous (they have traditional PC/GES/LiNGAM), but at least in DoWhy there is nothing connected to or made for marketing; I'm not sure about causal-learn, but if they had it I would probably have found it already.

In conclusion, the class is a target-oriented skeleton-discovery algorithm inspired by the PC/FCI family, not a regular PC or FCI. Like PC, it discovers an undirected skeleton by removing edges when conditional independences are found. Like FCI, it allows forbidden edges as a way of encoding prior knowledge.

What are the differences? It does not perform full orientation (no Meek rules, no PAG). It adds a target-edge rule mechanism (any, conservative, full conditioning set), which biases discovery toward identifying direct drivers of a chosen target. It uses a Bayes factor (ΔBIC) conditional independence test instead of frequentist tests.
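As a toy illustration of that target-first, constraint-based recipe (hypothetical names, not the PR's code): start from a complete graph minus the forbidden edges, delete an edge whenever some conditioning set renders its endpoints independent, and finally orient every surviving edge touching the target into the target.

```python
import numpy as np
from itertools import combinations

def discover_skeleton(data, target, ci_test, forbidden_edges=(), max_cond=2):
    """Toy target-first skeleton search in the PC style.

    data: dict mapping variable name -> 1-D array.
    ci_test(x, y, Z) -> True when x and y are independent given columns Z.
    Returns (undirected_edges, edges_oriented_into_target).
    """
    nodes = list(data)
    n_obs = len(data[nodes[0]])
    forbidden = {frozenset(e) for e in forbidden_edges}
    # Start from the complete graph minus forbidden edges (prior knowledge).
    edges = {frozenset(e) for e in combinations(nodes, 2)} - forbidden
    for size in range(max_cond + 1):
        for edge in list(edges):
            a, b = sorted(edge)
            others = [v for v in nodes if v not in edge]
            for Z in combinations(others, size):
                Zmat = (np.column_stack([data[v] for v in Z])
                        if Z else np.empty((n_obs, 0)))
                if ci_test(data[a], data[b], Zmat):
                    edges.discard(edge)  # a separating set was found
                    break
    # Partial orientation: surviving edges touching the target point into it;
    # everything else stays undirected (no Meek rules, no v-structures).
    directed, undirected = set(), set()
    for e in edges:
        a, b = sorted(e)
        if target in e:
            directed.add((a, b) if b == target else (b, a))
        else:
            undirected.add((a, b))
    return undirected, directed
```

Any conditional-independence callable works here, including the ΔBIC criterion the PR describes.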

The time-series one, in addition, considers time as a dimension, picking lags from each series and checking whether they break conditional independence.
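A minimal sketch of the lag handling (hypothetical helper, not the PR's code): build lagged copies of each series and run the same conditional-independence checks on the widened frame, with the convention that an edge can only go from a lagged column toward a contemporaneous one.

```python
import pandas as pd

def add_lags(df, lags):
    """Append shifted copies c_lag1..c_lagK of every column in df.

    Rows without full history are dropped. Edges leaving a lagged column
    can only point forward in time, which is the lag-based orientation
    constraint mentioned above.
    """
    out = {c: df[c] for c in df.columns}
    for c in df.columns:
        for k in range(1, lags + 1):
            out[f"{c}_lag{k}"] = df[c].shift(k)
    return pd.DataFrame(out).dropna().reset_index(drop=True)
```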

I have been experimenting with this for a few months with clients and internally, and I have a few examples showing the limitations and capabilities (my plan is to add them in another PR). Ultimately they return skeletons, and users will need to decide the directionality of most of the DAG. Coming next will be a PR adding falsification and sensitivity tests to check whether the DAG is reliable.

Collaborator

@juanitorduz juanitorduz left a comment


Thanks @cetagostini . Here is a first review on the code (but not really the logic as this goes beyond my area of expertise)

import numpy as np
import pandas as pd
import pytensor
import pytensor.tensor as tt
Collaborator


we usually use pt.

Collaborator


@cetagostini would you mind changing this to pt for harmony in the code base? Usually tt reminds us of the old Theano times :)

target_edge_rule: str = "any",
bf_thresh: float = 1.0,
forbidden_edges: Sequence[tuple[str, str]] | None = None,
):
Collaborator


Can we add docstrings for this init method? Also, can we use Pydantic to validate the call and use Field, ensuring target is within a list of possible values (say with a Literal type)?
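A hypothetical sketch of the requested validation (parameter names taken from the diff above; the exact Literal values are my assumption based on the "any, conservative, full conditioning set" rules described in the thread):

```python
from typing import Annotated, Literal
from pydantic import Field, validate_call

@validate_call
def tbfpc_options(
    target_edge_rule: Literal["any", "conservative", "full"] = "any",
    bf_thresh: Annotated[float, Field(gt=0)] = 1.0,
) -> dict:
    """Validated option bundle for the (hypothetical) TBFPC init.

    Pydantic rejects unknown edge rules and non-positive thresholds
    before any discovery code runs.
    """
    return {"target_edge_rule": target_edge_rule, "bf_thresh": bf_thresh}
```

The same Annotated/Field pattern can be applied directly to `__init__` arguments when the class itself is decorated.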

@juanitorduz juanitorduz requested a review from Copilot October 4, 2025 08:49

@Copilot Copilot AI left a comment


Pull Request Overview

This PR introduces two experimental causal discovery algorithms (TBFPC and TBF_FCI) designed for fast, interpretable exploration of causal structures in tabular and time-series data. Both methods are inspired by the PC/FCI family but use a Bayes factor criterion based on ΔBIC instead of traditional partial correlation tests.

Key changes:

  • Implementation of TBFPC algorithm for cross-sectional causal discovery with target-focused edge rules
  • Implementation of TBF_FCI algorithm for time-series causal discovery with lag support and contemporaneous relationships
  • Comprehensive test coverage for both algorithms including API validation, error handling, and edge case testing

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File Description
pymc_marketing/mmm/causal.py Implements the core TBFPC and TBF_FCI causal discovery algorithms with comprehensive docstrings and public APIs
tests/mmm/test_causal.py Adds extensive test coverage for both algorithms including parameterized tests for different configurations and error conditions


@juanitorduz
Collaborator

I asked Copilot for a review as well 🤗

Introduces the TBFPC (Target-first Bayes Factor PC) and TBF_FCI (Target-first Bayes Factor Temporal PC) classes for causal discovery using Bayes factor conditional independence tests. Includes comprehensive docstrings, public APIs, and extensive tests for both algorithms, covering edge rules, forbidden edges, time series support, and error handling.
@cetagostini cetagostini force-pushed the cetagostini/causal_discovery_utilities branch from b7e097c to 2db43b0 Compare October 5, 2025 15:30
Collaborator

@juanitorduz juanitorduz left a comment


Little ask 🙏

import numpy as np
import pandas as pd
import pytensor
import pytensor.tensor as tt
Collaborator


@cetagostini would you mind changing this to pt for harmony in the code base? Usually tt reminds us of the old Theano times :)

@cetagostini
Contributor Author

I did 😄 but the LLM reverted it; looks like they like Theano more @juanitorduz 🫠
