CImpact is a modular causal impact analysis library for Python that supports multiple time series models, including TensorFlow, Prophet, and Pyro. It provides a flexible framework for estimating the causal effect of an intervention on time series data.
- Introduction
- Features
- Why CImpact?
- Code Structure
- Installation
- Getting Started
- Evaluation Methods
- Performance Comparison
- Future Plans
- Contributing
- License
- Acknowledgements
CImpact is designed to help analysts and data scientists assess the impact of an intervention on time series data. By leveraging different statistical models, CImpact aims to provide robust causal inference results, accommodating various use cases and preferences in model selection.
CImpact extends the functionality of the tfcausalimpact library by adding support for multiple modeling approaches. This modular design lets users choose the best model for their specific needs and compare performance and results across models. We highly recommend reading this detailed blog post, which explains causal inference in depth.
- Support for Multiple Models: Utilize TensorFlow, Prophet, or Pyro models according to your needs.
- Modular Design: Easily extend the library with new models due to its adapter-based architecture.
- Flexible Configuration: Customize model settings and hyperparameters to suit specific analysis requirements.
- Comprehensive Evaluation: Integrated methods for assessing model performance and the causal impact of interventions.
- Enhanced Visualization: Generate insightful plots for better interpretation of results.
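To make the "adapter-based architecture" idea concrete, here is a minimal sketch of how such a design can work in general. Every name in it (`ModelAdapter`, `ADAPTERS`, `register`, `make_model`, `MeanBaselineAdapter`) is hypothetical and chosen for illustration; it is not CImpact's actual internal API.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Type


class ModelAdapter(ABC):
    """Hypothetical common interface that every model backend implements."""

    @abstractmethod
    def fit(self, series: List[float]) -> None:
        """Learn from the pre-intervention observations."""

    @abstractmethod
    def predict(self, steps: int) -> List[float]:
        """Forecast `steps` future values (the counterfactual)."""


# Hypothetical registry mapping a 'model_type' string to an adapter class.
ADAPTERS: Dict[str, Type[ModelAdapter]] = {}


def register(name: str):
    """Class decorator that stores an adapter in the registry under `name`."""
    def decorator(cls: Type[ModelAdapter]) -> Type[ModelAdapter]:
        ADAPTERS[name] = cls
        return cls
    return decorator


@register("mean_baseline")
class MeanBaselineAdapter(ModelAdapter):
    """Toy backend: forecasts the pre-period mean for every future step."""

    def __init__(self) -> None:
        self.mean = 0.0

    def fit(self, series: List[float]) -> None:
        self.mean = sum(series) / len(series)

    def predict(self, steps: int) -> List[float]:
        return [self.mean] * steps


def make_model(model_type: str) -> ModelAdapter:
    """Look up the adapter for `model_type` and instantiate it."""
    if model_type not in ADAPTERS:
        raise ValueError(f"Unknown model_type: {model_type!r}")
    return ADAPTERS[model_type]()
```

In a design like this, adding a new backend only requires writing one class and registering it; the calling code that selects a model by name stays unchanged.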
CImpact/
├── .github/ # GitHub configuration files for workflows and actions
├── assets/ # Stores media assets, such as the project logo, used in the README or documentation
├── examples/ # Example scripts showcasing usage of the library and sample data for testing
├── scripts/ # Utility scripts for code cleaning, formatting, and other maintenance tasks
├── src/ # Core library source code, including main modules and adapters for different models
├── tests/ # Test cases for ensuring code functionality and correctness across modules
├── .coveragerc # Configuration file for coverage reporting, specifying which files to include/exclude
├── .gitignore # Specifies files and directories for Git to ignore
├── .pylintrc # Configuration for Python linter (Pylint) to enforce code style and quality standards
├── CONTRIBUTING.md # Guidelines for contributing to the project
├── LICENSE.txt # License information for the project, detailing usage rights and limitations
├── Makefile # Commands for building, testing, and packaging the project in a standard way
├── README.md # Project introduction, usage instructions, and documentation (this file)
├── __init__.py # Marks the directory as a Python package
├── pyproject.toml # Python packaging configuration file for managing dependencies and metadata
└── requirements.txt # List of Python dependencies required to run the project
CImpact can be installed using one of the following methods:
The stable release of CImpact will soon be available on PyPI. Once published, you can install it with:
pip install cimpact
Stay tuned for updates on the stable release!
To access the latest features or contribute to development, you can manually install CImpact by building it from source. Follow the steps below:
Step 1: Clone the Repository
Clone the CImpact repository to your local machine:
git clone https://github.com/Sanofi-Public/CImpact.git
cd CImpact
Step 2: Install Dependencies
Install the required dependencies listed in the requirements.txt file:
pip install -r requirements.txt
Step 3: Build the Wheel File
Build the library into a Python Wheel file:
python -m build
The generated .whl file will be located in the dist/ directory.
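If you rebuild often, looking up the exact wheel filename by hand gets tedious. The sketch below, which assumes the default `dist/` output directory, finds the most recently built wheel and assembles the matching pip command. The helper names `newest_wheel` and `pip_install_command` are illustrative and not part of CImpact.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def newest_wheel(dist_dir: str = "dist") -> Path:
    """Return the most recently modified .whl file in `dist_dir`."""
    wheels = sorted(Path(dist_dir).glob("*.whl"), key=lambda p: p.stat().st_mtime)
    if not wheels:
        raise FileNotFoundError(
            f"No .whl file found in {dist_dir!r}; run `python -m build` first."
        )
    return wheels[-1]


def pip_install_command(wheel: Path) -> List[str]:
    """Build a pip command that targets the currently running interpreter."""
    return [sys.executable, "-m", "pip", "install", str(wheel)]


# Example usage (uncomment to actually install the wheel):
# subprocess.run(pip_install_command(newest_wheel()), check=True)
```

Using `sys.executable -m pip` rather than a bare `pip` ensures the wheel is installed into the same environment that runs the script.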
Step 4: Install the Wheel File
Use pip to install the wheel file:
pip install dist/cimpact-<version>.whl
Replace <version> with the version number of the generated .whl file. This installs the cimpact library in your environment; you can then use it as follows.
import pandas as pd
from cimpact import CausalImpactAnalysis
# Load your data
data = pd.read_csv('https://raw.githubusercontent.com/Sanofi-Public/CImpact/master/examples/google_data.csv')
# Define the configuration for the model
model_config = {
    'model_type': 'tensorflow',  # Options: 'tensorflow', 'prophet', 'pyro'
    'model_args': {
        'standardize': True,
        'learning_rate': 0.01,
        'num_variational_steps': 1000,
        'fit_method': 'vi'
    }
}
# Define the pre and post-intervention periods
pre_period = ['2020-01-01', '2020-03-13']
post_period = ['2020-03-14', '2020-03-31']
# Define index column and target column
index_col = 'date'
target_col = 'y'
# Define color variables (optional arguments)
observed_color = "#000000" # Black for observed
predicted_color = "#7A00E6" # Sanofi purple for predicted
ci_color = "#D9B3FF66" # Light lavender with transparency for CI
intervention_color = "#444444" # Dark gray for intervention
figsize = (10,7)
# Run the analysis
analysis = CausalImpactAnalysis(data, pre_period, post_period, model_config, index_col, target_col, observed_color, predicted_color, ci_color, intervention_color)
result = analysis.run_analysis()
print(result)
Posterior inference {CIMpact}
| | Average | Cumulative |
|---|---|---|
| Actual | 145 | 2,614 |
| Prediction (s.d.) | 180 (10) | 3,237 (10) |
| 95% CI | [144, 218] | [2,880, 3,594] |
| Absolute effect (s.d.) | -35 (15) | -623 (15) |
| 95% CI | [-61, -11] | [-980, -266] |
| Relative effect (s.d.) | -19.08% (7.58%) | -19.08% (7.58%) |
| 95% CI | [-32.42%, -6.66%] | [-32.42%, -6.66%] |
| Posterior tail-area probability p: | 0.15842 | |
| Posterior probability of a causal effect: | 84.16% | |
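The rows of the summary are related by simple arithmetic, which the sketch below reproduces from the rounded cumulative values shown above. The absolute effect is the actual value minus the prediction, the relative effect is the absolute effect divided by the prediction, and the probability of a causal effect is one minus the tail-area probability. The ratio of the rounded point values comes out near -19.2% rather than the reported -19.08%; presumably the library derives the relative effect from posterior samples rather than from rounded point estimates, but that is an assumption and not something the output confirms.

```python
# Rounded values copied from the cumulative column of the summary table.
actual_cumulative = 2614
predicted_cumulative = 3237
tail_area_p = 0.15842

# Absolute effect: observed outcome minus the counterfactual prediction.
absolute_effect = actual_cumulative - predicted_cumulative

# Relative effect: absolute effect as a fraction of the prediction.
relative_effect = absolute_effect / predicted_cumulative

# Probability of a causal effect: complement of the tail-area probability.
prob_causal_effect = 1 - tail_area_p

print(f"Absolute effect: {absolute_effect}")
print(f"Relative effect: {relative_effect:.2%}")
print(f"Posterior probability of a causal effect: {prob_causal_effect:.2%}")
```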
Note: Please refer to examples/how-to-use.md for detailed model configuration instructions and additional usage examples of the library.
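A common source of confusing results is a pre-period and post-period that overlap or are given in the wrong order. The sketch below is a small standalone check for the two-element `[start, end]` lists used in the example above, assuming ISO-formatted date strings. The function `check_periods` is hypothetical and not part of CImpact, which may perform its own validation.

```python
from datetime import date
from typing import List, Sequence


def check_periods(pre_period: Sequence[str], post_period: Sequence[str]) -> List[str]:
    """Return a list of problems; an empty list means the periods look consistent."""
    pre_start, pre_end = (date.fromisoformat(d) for d in pre_period)
    post_start, post_end = (date.fromisoformat(d) for d in post_period)

    problems = []
    if pre_start > pre_end:
        problems.append("pre_period start is after its end")
    if post_start > post_end:
        problems.append("post_period start is after its end")
    if post_start <= pre_end:
        problems.append("post_period must start after pre_period ends")
    return problems


# The periods from the example above pass the check.
print(check_periods(['2020-01-01', '2020-03-13'], ['2020-03-14', '2020-03-31']))
```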
CImpact includes comprehensive evaluation methods to assess model performance and the causal impact of interventions:
- Summary Statistics: Provides point estimates and confidence intervals for the estimated impact.
- Visualization: Plots observed data, counterfactual predictions, and estimated impact over time.
- Diagnostics: Offers residual analysis and model diagnostics to assess fit.
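As an illustration of what residual-based diagnostics typically involve, the sketch below computes a few standard fit metrics from observed and predicted values over the pre-intervention period. It is a generic example and not CImpact's own diagnostics code; the function name `fit_diagnostics` and the choice of metrics are assumptions.

```python
import math
from typing import Dict, Sequence


def fit_diagnostics(observed: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """Summarize residuals (observed minus predicted) with common error metrics."""
    if len(observed) == 0 or len(observed) != len(predicted):
        raise ValueError("observed and predicted must be non-empty and equally long")

    residuals = [o - p for o, p in zip(observed, predicted)]
    n = len(residuals)
    return {
        # A mean residual far from zero suggests systematic over- or under-prediction.
        "mean_residual": sum(residuals) / n,
        # Mean absolute error: typical size of a miss, in the units of the data.
        "mae": sum(abs(r) for r in residuals) / n,
        # Root mean squared error: penalizes large misses more heavily than MAE.
        "rmse": math.sqrt(sum(r * r for r in residuals) / n),
    }


print(fit_diagnostics([10.0, 12.0, 14.0], [11.0, 12.0, 12.0]))
```

A model that fits the pre-period poorly gives little reason to trust its counterfactual for the post-period, so metrics like these are worth checking before interpreting the estimated effect.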
Our performance comparisons highlight:
- TensorFlow: Robust performance with flexibility for advanced inference methods like variational inference and HMC.
- Prophet: User-friendly with built-in seasonality and holiday effects; may be slower with large datasets.
- Pyro: Strong Bayesian inference capabilities; may require more computational resources.
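To check how these trade-offs play out on your own data, you can time each backend yourself. The sketch below is a generic timing harness; `time_backends` is a hypothetical helper, and the dummy workloads stand in for real analysis runs.

```python
import time
from typing import Callable, Dict


def time_backends(runners: Dict[str, Callable[[], object]], repeats: int = 3) -> Dict[str, float]:
    """Return the best wall-clock time in seconds observed for each named callable."""
    timings = {}
    for name, run in runners.items():
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            run()
            best = min(best, time.perf_counter() - start)
        timings[name] = best
    return timings


# Dummy workloads; replace each lambda with a function that runs one analysis.
results = time_backends({
    "small": lambda: sum(range(1_000)),
    "large": lambda: sum(range(100_000)),
})
for name, seconds in results.items():
    print(f"{name}: {seconds:.6f} s")
```

In practice each runner could wrap a `CausalImpactAnalysis(...).run_analysis()` call configured with a different `model_type`, so the same data is timed across TensorFlow, Prophet, and Pyro.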
We welcome contributions to enhance and refine the library. While we are particularly interested in contributions in the following areas, we are open to other suggestions as well. If you have any ideas, please create an issue to discuss potential contributions.
- Add new, qualified models to broaden analytical options. We are currently exploring zero-shot forecasting models such as Google TimesFM and Amazon Chronos.
- Enhanced Visualization: Develop advanced plotting functions for deeper insights and a better understanding of results.
- Publish detailed tutorials to help users in effectively utilizing the library.
Contributions are welcome! Please see our Contributing Guidelines for details on how to participate.
This work is available for academic research and non-commercial use only. See the LICENSE.txt file for details.
We are thankful to the Google Research team for publishing the research paper "Inferring causal impact using Bayesian structural time-series models" [1] and for sharing the original R package with the open-source community. We also extend our gratitude to the authors of tfcausalimpact for their foundational work, which inspired this library.
Footnotes
1. Brodersen, K. H., Gallusser, F., Koehler, J., Remy, N., & Scott, S. L. (2015). Inferring causal impact using Bayesian structural time-series models.