Reverse-engineering 1,500 lines of R into a real API
I inherited 1,500 lines of R code that calculated economic impacts for major energy projects. The model was mathematically sound - full Leontief inverse calculations, interprovincial trade flows, the works - but practically unusable. Hardcoded file paths. Manual data entry. CSV-only outputs. No API, no validation, no way to integrate it into modern workflows. Analysts had to edit the script by hand for every project, rerun it, and manually collect outputs.
The intellectual capital was strong. The delivery mechanism was fragile. I spent six months fixing that.
What Input-Output models do
Input-Output analysis maps how industries interact across economies. The models solve large matrix systems representing inter-industry flows - how a dollar spent in one province ripples into jobs, supply chains, and tax receipts elsewhere. What happens to Ontario’s economy when Alberta builds wind farms? How much carbon tax revenue flows from an oil sands expansion? Answering these questions means working with sparse matrices of thousands of coefficients - efficient to compute with, but hard to reason about by hand. Automating them requires both economic literacy and technical diligence.
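To make the mechanics concrete, here is a minimal two-sector sketch of the core Leontief relationship - total output x = (I - A)^-1 f, where A holds the technical coefficients and f is final demand. The numbers are toy values for illustration, not taken from the model:

```python
import numpy as np

# Toy technical coefficients matrix A: A[i, j] is the dollar value of
# industry i's output needed to produce one dollar of industry j's output.
A = np.array([
    [0.15, 0.25],   # e.g. energy inputs into (energy, manufacturing)
    [0.20, 0.05],   # e.g. manufacturing inputs into (energy, manufacturing)
])

# Final demand shock f: a project buys $100M from sector 0 and $40M from sector 1.
f = np.array([100.0, 40.0])

# Leontief inverse: total (direct + indirect) output required to satisfy f.
leontief_inverse = np.linalg.inv(np.eye(2) - A)
total_output = leontief_inverse @ f

print(total_output)  # gross output by sector, including all supply-chain ripples
```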
Nobel laureate Wassily Leontief formalized the methodology. The Canadian model I worked with implemented it correctly, but the code was stuck in the past.
The conversion
I built a Python API that preserved the theoretical rigor while creating a modern platform. Clean architecture with dependency injection, typed configurations, and modular components. FastAPI replaced script editing - analysts could now call endpoints with JSON payloads instead of manipulating R code. Technology Economic Assessment (TEA) models could run with inputs validated automatically. Outputs included GDP, jobs, emissions, and tax impacts, all delivered in seconds through a scalable interface.
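To show the shape of that workflow, here is a minimal sketch of a typed endpoint - the path, field names, and stubbed engine call are illustrative, not the production schema:

```python
from fastapi import FastAPI
from pydantic import BaseModel, Field

app = FastAPI()

# Hypothetical request/response schemas - field names are illustrative.
class ProjectRequest(BaseModel):
    province: str = Field(..., description="Province code, e.g. 'AB'")
    sector: str = Field(..., description="Target industry, e.g. 'wind_power'")
    capital_spend_musd: float = Field(..., gt=0, description="Capital expenditure in $M")

class ImpactResponse(BaseModel):
    gdp_musd: float
    jobs: int
    provincial_tax_musd: float

def run_io_model(province: str, sector: str, spend: float) -> dict:
    # Placeholder for the IO engine; the real calculation lives in its own modules.
    return {"gdp_musd": 0.0, "jobs": 0, "provincial_tax_musd": 0.0}

@app.post("/impacts", response_model=ImpactResponse)
def estimate_impacts(req: ProjectRequest) -> ImpactResponse:
    # Pydantic has already validated types and ranges before this runs.
    result = run_io_model(req.province, req.sector, req.capital_spend_musd)
    return ImpactResponse(**result)
```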
The reverse-engineering followed a disciplined process: code archaeology, mathematical mapping, and incremental validation. I traced every R matrix operation - Leontief inverses, inventory rates, all of it - and translated each into an explicit Python equivalent. Validation harnesses compared outputs line by line, confirming agreement within a tolerance of 1e-10. Supply-use balances, trade consistency checks, and special commodity treatments were encoded into unit tests. This ensured fidelity to the original economics while opening space for architectural improvements.
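A sketch of the kind of comparison harness described above - the file layout and column structure are assumptions, but the tolerance logic mirrors the 1e-10 check:

```python
import pandas as pd

TOLERANCE = 1e-10

def compare_outputs(r_csv: str, py_csv: str) -> pd.DataFrame:
    """Compare legacy R output against the Python port, row by row."""
    r_out = pd.read_csv(r_csv, index_col=0)
    py_out = pd.read_csv(py_csv, index_col=0)

    # Align on the same industries and columns before comparing.
    r_out, py_out = r_out.align(py_out, join="inner")

    diff = (r_out - py_out).abs()
    failures = diff[diff.gt(TOLERANCE).any(axis=1)]
    if not failures.empty:
        raise AssertionError(f"{len(failures)} rows exceed tolerance {TOLERANCE}")
    return diff.describe()  # summary of deviations for the QA/QC record
```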
Technical wins
OpenIO-Canada datasets let me integrate 33 greenhouse gases, 310 pollutants, and detailed water and mineral use by industry. Sparse matrix methods from SciPy replaced dense R operations, improving both runtime and memory efficiency. Multi-period data structures supported forward-looking hydrocarbon production forecasts and renewable deployment scenarios. Validation frameworks enforced economic identities across every calculation, reducing the risk of silent errors.
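A sketch of the sparse approach, assuming the coefficients matrix is stored in CSC format: SciPy's splu factorizes (I - A) once, so repeated demand scenarios reuse the factorization instead of ever forming a dense inverse.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import splu

def solve_leontief(A: sparse.csc_matrix, final_demand: np.ndarray) -> np.ndarray:
    """Solve x = (I - A)^-1 f without materializing the dense inverse."""
    n = A.shape[0]
    identity = sparse.identity(n, format="csc")
    lu = splu((identity - A).tocsc())   # factorize once
    return lu.solve(final_demand)       # reuse for each demand scenario

# Illustrative toy matrix; the real model spans thousands of industries and commodities.
A = sparse.csc_matrix(np.array([[0.15, 0.25], [0.20, 0.05]]))
f = np.array([100.0, 40.0])
print(solve_leontief(A, f))
```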
Performance
Response times fell from hours to seconds, with horizontal scaling enabling concurrent requests. Memory usage dropped through sparse representation. Benchmarking showed outputs matching the R model within negligible tolerances. A full QA/QC framework documented every comparison, creating transparency and reproducibility. Policymakers could run scenarios on demand. Researchers gained a platform ready for dashboards, risk assessments, and reproducible workflows.
Real-world examples
A 100 MW wind farm in Alberta: $145.2 million of GDP, 1,247 jobs, $8.3 million in provincial tax revenue, with clear accounting of GHG reductions. Oil sands tax projections: multi-year breakdowns of federal and provincial revenues, capturing direct, indirect, and spillover effects. Results that were once buried in spreadsheet outputs became API-ready data points for investor models and policy briefs. SPE papers and Statistics Canada sources could be cross-validated directly within the system.
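As a sketch of what "API-ready" means in practice - the client call and payload fields below are illustrative placeholders; the response values echo the wind farm figures above:

```python
import requests

# Hypothetical client call - endpoint path, field names, and payload values are illustrative.
payload = {"province": "AB", "sector": "wind_power", "capital_spend_musd": 250.0}
resp = requests.post("http://localhost:8000/impacts", json=payload, timeout=60)
resp.raise_for_status()
print(resp.json())
# e.g. {"gdp_musd": 145.2, "jobs": 1247, "provincial_tax_musd": 8.3}
```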
Architecture
Separate modules managed tax, employment, environment, and validation. Configuration via environment variables instead of hardcoding. Continuous testing ensured every calculation aligned with Statistics Canada Supply-Use data. Deployment via Docker enabled portability from local workstations to cloud environments. The stack: FastAPI, Pydantic, SciPy, and sparse linear algebra, with a Redis caching layer and Spark integration slated for future scale.
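A sketch of the typed-configuration pattern - the setting names are assumptions; the point is that data paths and service URLs come from the environment rather than from edits to the code:

```python
from pydantic import BaseSettings  # Pydantic v1; in v2 this class lives in pydantic_settings

class Settings(BaseSettings):
    """Typed configuration pulled from environment variables (names illustrative)."""
    supply_use_path: str = "./data/supply_use.parquet"
    openio_path: str = "./data/openio_canada"
    redis_url: str = "redis://localhost:6379/0"
    result_tolerance: float = 1e-10

    class Config:
        env_prefix = "IO_"  # e.g. IO_SUPPLY_USE_PATH overrides the default

settings = Settings()  # resolved once at startup, injected into the modules that need it
```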
Why this matters
Governments and companies can now conduct multi-scenario assessments quickly, cutting consulting costs and accelerating decision cycles. Researchers gain a transparent and extensible framework for economic and environmental analysis. Technical teams inherit clean, modular codebases instead of sprawling scripts. The modernization bridges the gap between sophisticated economics and modern decision-making workflows, where speed, scale, and transparency matter as much as theoretical precision.
What’s next
Planned features: Monte Carlo simulations to capture uncertainty, automated data refreshes from government feeds, dynamic modeling of NAFTA and global trade flows. Predictive modules could layer on top, moving from descriptive IO analysis to forward-looking economic forecasting. On the technical side, GraphQL interfaces and interactive web dashboards could extend accessibility for non-technical users.
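As a rough sketch of what the Monte Carlo layer might look like - perturbing final demand with an assumed +/-5% uncertainty and reading off quantiles of total output; the distribution and parameters are placeholders, not a planned design:

```python
import numpy as np

rng = np.random.default_rng(42)

def monte_carlo_impacts(leontief_inverse: np.ndarray,
                        final_demand: np.ndarray,
                        rel_sigma: float = 0.05,
                        n_draws: int = 10_000) -> np.ndarray:
    """Propagate assumed +/-5% uncertainty in final demand through the IO model."""
    draws = rng.normal(loc=final_demand,
                       scale=rel_sigma * final_demand,
                       size=(n_draws, final_demand.size))
    outputs = draws @ leontief_inverse.T               # x = L f for each draw
    return np.percentile(outputs.sum(axis=1), [5, 50, 95])  # total-output quantiles

# Toy example reusing the two-sector matrix from earlier.
A = np.array([[0.15, 0.25], [0.20, 0.05]])
L = np.linalg.inv(np.eye(2) - A)
print(monte_carlo_impacts(L, np.array([100.0, 40.0])))
```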
Legacy models hold immense intellectual value, but without modernization they remain underused assets. The transition from static scripts to dynamic APIs is not merely a technical upgrade - it is a structural shift in how economic analysis informs policy and investment.