Architecture¶
glm-factor-optimizer keeps the low-level pieces visible. You can use the
automatic workflows, or call the same binning, fitting, scoring, and logging
functions yourself.
Layer 1: Specs and Model Primitives¶
Core modules:
bins.pymodel.pymetrics.pyoptimize.pypenalties.py
Responsibilities:
- create JSON-serializable numeric and categorical specs
- apply specs to dataframes
- fit GLMs
- score predictions
- optimize one factor at a time
- define reusable optimization penalties
Core mechanics live here.
Layer 2: Analysis Helpers¶
Helper modules:
screening.pyaggregation.pysampling.pydiagnostics.pyvalidation.pyruns.py
Responsibilities:
- rank candidate factors
- create representative samples
- aggregate model development tables
- find candidate interactions
- produce validation reports
- save run artifacts
Use them in notebooks or as building blocks for automatic workflows.
Layer 3: Notebook Study¶
Main modules:
study.pyfactor.py
Responsibilities:
- own split train, validation, and holdout samples
- manage accepted and rejected factor specs
- keep audit history
- compare candidate bins to the current model
- refine factors with the current full model fixed
- test interactions explicitly
- finalize and save model artifacts
Use GLMStudy when you want to design factors step by step and keep a record of
the choices you made.
Layer 4: Automatic Workflow¶
Main module:
workflow.py
GLMWorkflow is a compact automatic wrapper for baselines, experiments, and
examples. For close review, use GLMStudy instead.
Layer 5: Spark Backend¶
Spark modules live under:
glm_factor_optimizer.spark
Spark support lives in its own package namespace, so pandas users can install the core package without PySpark. Spark imports are lazy, and the same JSON specs can be used with Spark dataframes and Spark GLM jobs.
Data Flow¶
A common flow is:
raw dataframe
-> split
-> rank candidate raw factors
-> propose factor spec
-> apply spec to train and validation
-> fit model with accepted factors fixed
-> score validation
-> accept or reject
-> fit main effects
-> refine accepted factors
-> test interactions
-> finalize on holdout
-> save artifacts
Why Specs Are JSON Dictionaries¶
Specs are simple dictionaries so they are:
- easy to inspect in notebooks
- easy to write to JSON
- independent of Python class versions
- portable to Spark or other execution engines
- suitable for audit review
That keeps saved specs readable and portable.