Bayesian reasoning where your data lives
postgres-bayes is our open-source research project that brings probabilistic inference directly into PostgreSQL. Instead of pulling data out of your database, running models in a separate Python process, and pushing results back, the inference runs where the data already lives.
This research earned us JEI (Jeune Entreprise Innovante) status from the French government.
What’s in the repo
Core algorithms, each adapted for database integration:
- Markov Chain Monte Carlo (MCMC) using the Metropolis-Hastings algorithm for sampling posterior distributions. We built this to estimate the parameters of Gaussian distributions from observed data, a task that underlies most practical applications.
- Gibbs Sampling for joint distributions with correlated variables. Works well when you have multivariate data where variables depend on each other, like customer behavior influenced by multiple factors.
- Variational Inference (ADVI) via PyMC3 for cases where you need speed over exactness. Approximates the posterior with a simpler distribution.
- Approximate Bayesian Computation with rejection sampling for models where you can’t write down a likelihood function at all.
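To make the first item concrete, here is a minimal sketch of random-walk Metropolis-Hastings estimating the mean of a Gaussian from observed data. It is an illustration only, not the code in the repo: the function names (`simulate_observations`, `log_posterior`, `metropolis_hastings`), the known noise level `sigma`, the wide Normal prior, and the step size are all assumptions chosen for the example, and it runs in plain Python rather than inside PostgreSQL.

```python
import math
import random


def simulate_observations(n, true_mu=3.0, sigma=1.0, seed=0):
    """Hypothetical stand-in for rows read from a table: n draws from Normal(true_mu, sigma)."""
    rng = random.Random(seed)
    return [rng.gauss(true_mu, sigma) for _ in range(n)]


def log_posterior(mu, data, sigma=1.0, prior_sd=10.0):
    """Unnormalised log posterior of mu.

    Assumes a Normal(0, prior_sd) prior on mu and a Normal(mu, sigma)
    likelihood with sigma treated as known.
    """
    log_prior = -0.5 * (mu / prior_sd) ** 2
    log_likelihood = -0.5 * sum(((x - mu) / sigma) ** 2 for x in data)
    return log_prior + log_likelihood


def metropolis_hastings(data, n_samples=4000, step=0.3, seed=0):
    """Random-walk Metropolis-Hastings over mu; returns the full chain."""
    rng = random.Random(seed)
    mu = 0.0
    current_lp = log_posterior(mu, data)
    chain = []
    for _ in range(n_samples):
        # Symmetric Gaussian proposal, so the Hastings correction cancels.
        proposal = mu + rng.gauss(0.0, step)
        proposal_lp = log_posterior(proposal, data)
        # Accept with probability min(1, posterior ratio).
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_prob:
            mu, current_lp = proposal, proposal_lp
        chain.append(mu)
    return chain


observations = simulate_observations(50, seed=1)
chain = metropolis_hastings(observations, seed=2)
kept = chain[1000:]  # discard the first 1000 draws as burn-in
print("posterior mean of mu:", sum(kept) / len(kept))
```

With a prior this wide, the posterior mean should land very close to the sample mean of the observations. The spread of the retained draws is what a point estimate alone would not give you.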
Simulations that exercise the algorithms on realistic problems:
- E-commerce recommendations that update product recommendation probabilities based on user behavior. Every click, view, or purchase refines the posterior for that user-product pair.
- Stock market regime detection using Hidden Markov Models to identify whether the market is in a stable or volatile state.
- Trading analysis with Bayesian linear regression for predicting returns based on market indices and interest rates.
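As an illustration of the per-event updating described in the e-commerce item, the sketch below keeps a Beta posterior over the probability that a user engages with a product and updates it one interaction at a time. The source does not say which model the simulation uses, so the Beta-Bernoulli choice, the uniform Beta(1, 1) prior, and the `BetaPosterior` class are assumptions made for this example.

```python
import random
from dataclasses import dataclass


@dataclass
class BetaPosterior:
    """Beta(alpha, beta) belief about an engagement probability for one user-product pair."""

    alpha: float = 1.0  # assumed uniform prior: Beta(1, 1)
    beta: float = 1.0

    def update(self, engaged: bool) -> None:
        """Conjugate update: a success increments alpha, a failure increments beta."""
        if engaged:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    @property
    def mean(self) -> float:
        """Posterior mean of the engagement probability."""
        return self.alpha / (self.alpha + self.beta)

    def credible_interval(self, mass=0.95, draws=20000, seed=0):
        """Equal-tailed credible interval estimated from Monte Carlo draws."""
        rng = random.Random(seed)
        samples = sorted(rng.betavariate(self.alpha, self.beta) for _ in range(draws))
        lower = samples[int((1 - mass) / 2 * draws)]
        upper = samples[int((1 + mass) / 2 * draws) - 1]
        return lower, upper


# Hypothetical event stream: True = click or purchase, False = impression ignored.
posterior = BetaPosterior()
for engaged in [True, False, True, True]:
    posterior.update(engaged)

low, high = posterior.credible_interval()
print("posterior mean:", posterior.mean)
print("95% credible interval:", (low, high))
```

After only four events the interval is wide, which is the point: the recommendation carries its own uncertainty, and the interval narrows as more interactions arrive.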
Why this matters
Standard ML models give you a prediction. Bayesian models give you a prediction and tell you how confident you should be in it. That distinction matters when the cost of a wrong decision is high: fraud detection, medical recommendations, financial risk assessment, inventory planning under uncertainty.
Most tools for this kind of work run in Python notebooks, disconnected from the production database. We wanted to change that.
Get started
The repo includes everything you need to run the algorithms and simulations:
- /algorithms for the core Bayesian implementations
- /simulations for the e-commerce, trading, and market examples
- /data_generation for creating test datasets
- /docs for additional documentation