## Installation Instructions \[Standard\]
The basic environmental setup is shown below. A virtual / conda environment may be constructed; however, the requirements are quite lightweight and this is probably not needed.
We assume that any specified virtual / conda environment has been activated for all subsequent code snippets.
# Quick Start Guides
## Convenience: Automatic Test Instantiation
For convenience, you can automatically select between STEP and Lai (a baseline method) depending on the value of `n_max` using the factory function in `auto.py`:
```python
from sequentialized_barnard_tests import get_mirrored_test

test = get_mirrored_test(n_max, alternative, alpha, verbose=True, ...)
```
If `n_max > 500`, this will instantiate a `MirroredLaiTest`, a computationally efficient baseline whose performance is comparable to `MirroredStepTest` at sufficiently large sample sizes; otherwise, it will use the more powerful `MirroredStepTest`, which can take longer to synthesize the decision rule. All shared and class-specific arguments can be passed as keyword arguments.
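As a conceptual sketch of the selection rule just described (the classes below are placeholder stand-ins for illustration, not the package's real implementations, and the function name `select_mirrored_test` is hypothetical):

```python
# Placeholder stand-ins for the real test classes, for illustration only.
class MirroredStepTest:
    def __init__(self, **kwargs):
        self.kwargs = kwargs

class MirroredLaiTest:
    def __init__(self, **kwargs):
        self.kwargs = kwargs

def select_mirrored_test(n_max, alternative, alpha, **kwargs):
    """Sketch of the documented rule: use the Lai baseline beyond
    n_max = 500, and the more powerful STEP test otherwise."""
    cls = MirroredLaiTest if n_max > 500 else MirroredStepTest
    return cls(n_max=n_max, alternative=alternative, alpha=alpha, **kwargs)
```

All keyword arguments are simply forwarded to whichever class is selected, which matches the factory's documented behavior.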
## Example Usage
Below are minimal code examples with different policy evaluation data, leading to three distinct evaluation results.
### Case 1: Test yields `AcceptAlternative`
```python
67
+
from sequentialized_barnard_tests import get_mirrored_test, Hypothesis
68
+
69
+
n_max =100# maximum sample size is 100 (per policy)
70
+
alternative = Hypothesis.P0LessThanP1 # we want to test if "success rate of the first policy < success rate of the second policy"
71
+
alpha =0.05# false positive rate is 5%
72
+
73
+
test = get_mirrored_test(n_max=n_max, alternative=alternative, alpha=alpha)
74
+
75
+
success_array_policy_0 = [False] *10# the first policy failed 10 times
76
+
success_array_policy_1 = [True] *10# the second policy succeeded 10 times
77
+
78
+
result = test.run_on_sequence(success_array_policy_0, success_array_policy_1)
79
+
decision = result.decision
80
+
print(decision) # AcceptAlternative: success rate of the first policy < success rate of the second policy with 95% confidence
81
+
```
### Case 2: Test yields `AcceptNull`
```python
from sequentialized_barnard_tests import get_mirrored_test, Hypothesis

n_max = 100  # maximum sample size is 100 (per policy)
alternative = Hypothesis.P0LessThanP1  # test whether "success rate of the first policy < success rate of the second policy"
alpha = 0.05  # false positive rate is 5%

test = get_mirrored_test(n_max=n_max, alternative=alternative, alpha=alpha)

success_array_policy_0 = [True] * 10  # the first policy succeeded 10 times
success_array_policy_1 = [False] * 10  # the second policy failed 10 times

result = test.run_on_sequence(success_array_policy_0, success_array_policy_1)
decision = result.decision
print(decision)  # AcceptNull: success rate of the first policy > success rate of the second policy with 95% confidence
```
Note: `AcceptNull` is a valid decision only for "mirrored" tests. In our terminology, a mirrored test is one that runs two one-sided tests simultaneously, with the null and the alternative flipped from each other. (Because of the monotonicity of the test statistic, mirrored tests suffer no penalty for running two tests simultaneously, and therefore essentially dominate one-sided tests.) In the example above, the alternative is `Hypothesis.P0LessThanP1` and the decision is `Decision.AcceptNull`, which should be interpreted as accepting `Hypothesis.P0MoreThanP1`. If you would rather have a more conventional one-sided test, you can instantiate one by calling `get_test` instead of `get_mirrored_test`.
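To make the mirroring idea concrete, here is a minimal, self-contained sketch. This is not the package's implementation: the `toy_one_sided` rule below is a hypothetical placeholder for a real sequential test, and only the two-directional combination logic reflects the text above.

```python
from enum import Enum, auto

class Decision(Enum):
    AcceptAlternative = auto()  # accept H1: p0 < p1
    AcceptNull = auto()         # accept the flipped alternative: p0 > p1
    FailToDecide = auto()

def toy_one_sided(successes_a, successes_b, margin=5):
    """Hypothetical placeholder for a one-sided test of 'rate(a) < rate(b)':
    reject the null only if b leads a by at least `margin` successes."""
    return sum(successes_b) - sum(successes_a) >= margin

def mirrored(successes_0, successes_1, one_sided=toy_one_sided):
    """Run the one-sided test in both directions, as a mirrored test does."""
    if one_sided(successes_0, successes_1):
        return Decision.AcceptAlternative
    if one_sided(successes_1, successes_0):
        return Decision.AcceptNull
    return Decision.FailToDecide
```

With `[False] * 10` versus `[True] * 10` this sketch returns `Decision.AcceptAlternative`; with the arrays swapped it returns `Decision.AcceptNull`, illustrating how one mirrored run covers both directions.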
### Case 3: Test yields `FailToDecide`
```python
from sequentialized_barnard_tests import get_mirrored_test, Hypothesis

n_max = 100  # maximum sample size is 100 (per policy)
alternative = Hypothesis.P0LessThanP1  # test whether "success rate of the first policy < success rate of the second policy"
alpha = 0.05  # false positive rate is 5%

test = get_mirrored_test(n_max=n_max, alternative=alternative, alpha=alpha)

success_array_policy_0 = [True, False, False, True]  # the first policy succeeded 2 out of 4 times
success_array_policy_1 = [False, True, True, True]  # the second policy succeeded 3 out of 4 times

result = test.run_on_sequence(success_array_policy_0, success_array_policy_1)
decision = result.decision
print(decision)  # FailToDecide: difference was not statistically separable; the user can collect 100 - 4 = 96 more rollouts per policy and re-run the test
```
## Key Notes for Understanding the Core Ideas of STEP Code
We include key notes for understanding the core ideas of the STEP code. Quick-start resources are included in both shell script and notebook form.
### (1A) Understanding the Accepted Shape Parameters
In order to synthesize a STEP policy for specific values of `n_max` and `alpha`, one additional set of parametric decisions is required: the user must set the risk budget shape, which is specified by a choice of function family (p-norm vs. zeta-function) and a particular shape parameter. The shape parameter is real-valued; it is used directly for zeta functions and is exponentiated for p-norms.
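As an illustrative, hypothetical sketch of how such a shape parameter could generate a monotone risk budget (the formulas below are our own guesses for exposition, not the package's actual families; the only properties taken from the text are that the parameter is exponentiated for p-norms, used directly for zeta functions, and that $`0.0`$ recovers a linear budget):

```python
import math
from itertools import accumulate

def risk_budget(n_max, alpha, family, shape=0.0):
    """Illustrative monotone risk budgets ending at alpha (hypothetical formulas)."""
    if family == "p_norm":
        p = math.exp(shape)  # shape parameter is exponentiated for p-norms
        return [alpha * (n / n_max) ** p for n in range(1, n_max + 1)]
    if family == "zeta":
        terms = [k ** (-shape) for k in range(1, n_max + 1)]  # shape used directly
        cum = list(accumulate(terms))
        return [alpha * c / cum[-1] for c in cum]
    raise ValueError(family)
```

At `shape=0.0` both hypothetical families reduce to the same linear budget `alpha * n / n_max`, consistent with the default described in section (2B) below.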
Having decided an appropriate form for the risk budget shape, policy synthesis is straightforward to run. From the base directory, the general command would be:
Note: This script will be called automatically upon instantiation of a test object if the corresponding policy file is missing from `sequentialized_barnard_tests/policies/`.
### (2B) What If I Don't Know the Right Risk Budget?
We recommend using the default linear risk budget, which is the shape *used in the paper*. This corresponds to \{shape_parameter\}$`= 0.0`$ for each shape family, so *either family with shape parameter 0.0 constructs the same policy*. For example:
```bash
$ python sequentialized_barnard_tests/scripts/synthesize_general_step_policy.py -n {n_max} -a {alpha}
```
- At present, we have not tested extensively beyond \{n_max\}$`=500`$. Going beyond this limit may lead to issues, and such issues become more likely as \{n_max\} grows. The code also requires increasing amounts of RAM as \{n_max\} is increased.
## Script-Based Evaluation on Real Data
We now assume that a STEP policy has been constructed for the target problem. This can either be one of the default policies, or a newly constructed one following the recipe in the preceding section.