The One File That Made My Forecasting System Maintainable
My first Python project had magic numbers everywhere: alpha=2.0 here, alpha=0.15 there. Changing anything was a nightmare, because I first had to hunt down every place a parameter was defined. Sometimes I lost track of one, and the resulting inconsistencies across scripts broke them. What changed everything was moving all my parameters into a single config.py file.
A config file, short for “configuration file”, is a file that holds all the parameter settings used by the scripts. Instead of scattering them across scripts, I put them in one place. My sample config file looks like this:
from __future__ import annotations
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict
# Common imports used across all scripts
import pandas as pd
import numpy as np
# ----------------------------
# GLOBAL CONFIGURATION
# ----------------------------
@dataclass
class GlobalConfig:
    """
    Central configuration for the entire forecasting system.
    All scripts import this to ensure consistency.
    """
    # =================================================================
    # BASE DIRECTORIES
    # =================================================================
    project_root: Path = Path(__file__).parent
    data_dir: Path = project_root / "data"
    out_dir: Path = project_root / "output"
    # ... 50+ more parameters for forecast settings, dates, paths
@dataclass decorator
Decorators in Python resemble features in other languages, such as attributes in C# and decorators in TypeScript. A decorator wraps a function or class to add or modify its behavior. In this project, @dataclass automatically generates the __init__, __repr__, and __eq__ methods, saving me from writing 20+ lines of boilerplate code.
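To make the generated methods concrete, here is a minimal sketch. The class name SmoothingParams and its fields are made up for illustration and are not part of my actual config.

```python
from dataclasses import dataclass


@dataclass
class SmoothingParams:
    # Hypothetical fields, for illustration only
    alpha: float = 0.15
    horizon: int = 12


# __init__ is generated: defaults and keyword arguments both work
a = SmoothingParams()
b = SmoothingParams(alpha=0.15, horizon=12)

print(a)       # generated __repr__: SmoothingParams(alpha=0.15, horizon=12)
print(a == b)  # generated __eq__ compares field by field: True
```

Without the decorator, each of those three behaviors would need a hand-written method.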
__post_init__ method
__post_init__ is a “dunder method” (short for “double underscore”), a special method that the __init__ generated by @dataclass calls automatically as its last step. Think of it as: “Do some extra setup after the object is created”. For example, the code below creates the output directories if they don’t exist, so I never have to remember to create output folders before I execute the scripts.
def __post_init__(self):
    # Every folder the pipeline writes to
    directories = [
        self.out_dir,
        self.forecast_input_parquet.parent,
        self.out_forecast,
        self.out_allocation,
        self.out_cv,
        self.out_03b_cv,
        self.out_targets,
        self.out_final,
    ]
    # parents=True also creates missing parent folders;
    # exist_ok=True makes repeated runs harmless
    for directory in directories:
        directory.mkdir(parents=True, exist_ok=True)
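If you want to try this behavior without touching a real project, the sketch below runs the same idea inside a temporary folder. It is a simplified variation, not my actual config: DemoConfig is a hypothetical class, and it derives its paths inside __post_init__ (using field(init=False)) so that the root folder can be passed in.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class DemoConfig:
    # Hypothetical, cut-down config for illustration only
    project_root: Path
    out_dir: Path = field(init=False)       # filled in by __post_init__
    out_forecast: Path = field(init=False)  # filled in by __post_init__

    def __post_init__(self) -> None:
        # Derive the output paths from the root that was passed in
        self.out_dir = self.project_root / "output"
        self.out_forecast = self.out_dir / "forecast"
        # Create them; safe to call again if they already exist
        for directory in (self.out_dir, self.out_forecast):
            directory.mkdir(parents=True, exist_ok=True)


with tempfile.TemporaryDirectory() as tmp:
    cfg = DemoConfig(project_root=Path(tmp))
    created = cfg.out_forecast.is_dir()
    DemoConfig(project_root=Path(tmp))  # second instance: no error, nothing to do

print(created)  # True
```

Creating the object is enough to get the folders; no script has to call mkdir itself.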
How to use the config file
Import config.py at the beginning of any script and reference its parameters.
# Import global config and common libraries
from pathlib import Path  # needed for the Path annotations below

from config import global_config as gcfg, pd, np
...
dataset_xlsx: Path = gcfg.dataset_xlsx
dataset_sheet: str = gcfg.dataset_sheet
holidays_xlsx: Path = gcfg.holidays_xlsx
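The import above relies on config.py exposing a ready-made instance named global_config. The excerpt of my config file does not show that line, so the sketch below assumes it is created at module level. It is squeezed into a single file so it runs on its own, and the file name and sheet name are placeholder values.

```python
from dataclasses import dataclass
from pathlib import Path


# --- config.py side (assumed layout) ---
@dataclass
class GlobalConfig:
    data_dir: Path = Path("data")
    dataset_xlsx: Path = data_dir / "dataset.xlsx"  # placeholder file name
    dataset_sheet: str = "Sheet1"                   # placeholder sheet name


# One shared instance that every script imports
global_config = GlobalConfig()


# --- script side ---
gcfg = global_config  # stands in for: from config import global_config as gcfg


def describe_input(cfg: GlobalConfig = gcfg) -> str:
    """Summarize which workbook and sheet the script will read."""
    return f"{cfg.dataset_xlsx.name} [{cfg.dataset_sheet}]"


print(describe_input())  # dataset.xlsx [Sheet1]
```

Because every script reads from the same instance, changing a value in config.py changes it everywhere at once.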
When to use a config file
If you are working on a one-off analysis with a single script, a config file might be overkill. However, it’s time for a config file once you have:
- Multiple scripts
- Shared parameters
The if __name__ == "__main__" guard
This is one of Python’s most common patterns. It means “only run this code if this file is being run directly (not imported)”. It’s Python’s way of separating “run as a program” (execute the script) from “use as a library” (import functions from it).
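A minimal sketch of the guard in action is below. The smooth helper is a made-up example, not a function from my project.

```python
def smooth(values, alpha=0.15):
    """Simple exponential smoothing (illustrative helper only)."""
    smoothed = [values[0]]
    for value in values[1:]:
        # Blend the new observation with the previous smoothed value
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


def main():
    print(smooth([10.0, 12.0, 11.0]))


# Runs only when this file is executed directly (python this_file.py).
# When another script imports smooth from this file, main() is skipped.
if __name__ == "__main__":
    main()
```

This lets the same file serve as both a runnable script and an importable module.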
What I learned: professional code is about making future changes easy.
