Machine Learning in Excel Part 2

·

After publishing the first article on machine learning in Excel, I received several questions asking: “Which specific formulas do you actually use?” and “Can you show concrete Excel functions that replicate ML algorithms?

So, I put together a consolidated reference: a set of Excel functions I actively use, plus additional ones that are highly relevant for ML-style analysis even if I haven’t applied all these formulas in real scenarios.

Anyway, let’s go straight to the list:

1. Data Preparation (Cleaning, Normalization, Outliers, Missing Values)

Core cleaning functions

  • TRIM() – removes extra spaces.
  • CLEAN() – removes non-printable characters.
  • TEXT() – formatting for consistent data types.
  • VALUE() – converts text numbers to numeric.
  • IFERROR(), IFNA() – handles invalid or missing inputs.

Regex functions (advanced parsing)

  • REGEXEXTRACT() – extracts patterns (IDs, numbers, tokens).
  • REGEXREPLACE() – cleans or restructures strings.
  • REGEXMATCH() – boolean pattern detection.

Handling missing values

  • AVERAGE(), MEDIAN(), MODE() – imputations.
  • Custom rule:
    =IF(A2="", MEDIAN(A:A), A2)

Outlier detection

  • PERCENTILE.EXC(), PERCENTILE.INC() – threshold flags.
  • QUARTILE() – IQR logic.
  • STDEV.P(), STDEV.S() – standard deviation anomalies.
  • Outlier flag formula:
    =IF(OR(A2 < Q1 - 1.5*(Q3-Q1), A2 > Q3 + 1.5*(Q3-Q1)), 1, 0)

Normalization / scaling

  • MIN(), MAX() – min-max scaling.
  • STANDARDIZE() – z-score scaling.
  • NORM.S.DIST(), NORM.DIST() – create Gaussian features.
  • NORM.S.INV(), NORM.INV() – reverse transformations.
  • Z.TEST() – test standardized values.

Sorting, filtering, deduplication

  • SORT(), SORTBY() – reorganizing data for model training.
  • FILTER() – removing noisy instances.
  • UNIQUE() – finding class values.
  • REMOVE.DUPLICATES() – critical to avoid data leakage.

Reshaping data

  • TRANSPOSE() – switch orientation (wide ↔ long).
  • WRAPROWS(), WRAPCOLS() – reshape into structured arrays.
  • TAKE(), DROP() – extract dataset slices.

LET() and LAMBDA()

  • LET() – cleaner formulas.
  • LAMBDA() – define reusable ML functions:
    • loss function
    • distance function
    • prediction function
    • batch scoring

MAP(), REDUCE(), SCAN()

  • Essential for vectorized ML pipelines:
    • MAP() – apply function to each row.
    • REDUCE() – accumulate results (e.g., loss).
    • SCAN() – cumulative transformations.

2. Feature Engineering (Numeric, Textual, Categorical, Time-Series)

Categorical encoding

  • XLOOKUP() – label encoding.
  • XMATCH() – numeric encoding from lists.
  • CHOOSE() – simple mapping (e.g., Low/Med/High → 1/2/3).
  • SWITCH() – multi-condition mapping.
  • IF(), IFS() – target-based categories.

Text feature engineering

  • TEXTSPLIT() – tokenization.
  • TEXTJOIN() – recomposition of features.
  • LEN() – length-based features.
  • REGEXEXTRACT(), REGEXREPLACE(), REGEXMATCH() – pattern-based features.

Numeric feature engineering

  • Interaction features:
    =A2 * B2
  • Polynomial features:
    =A2^2, =A2^3
  • Binary flags:
    =IF(A2 > threshold, 1, 0)

Rolling time-series features

  • Rolling mean:
    =AVERAGE(OFFSET(A2,-6,0,7,1))
  • Rolling sum:
    =SUM(OFFSET(A2,-6,0,7,1))

3. Core Statistical & Regression Functions (ML Modelling Basics)

Regression & correlation

  • LINEST() – full regression stats (R², standard errors).
  • LOGEST() – exponential regression.
  • SLOPE(), INTERCEPT() – extract regression parameters.
  • RSQ() – coefficient of determination.
  • STEYX() – standard error of predicted y.
  • CORREL(), PEARSON() – correlation between features.
  • COVARIANCE.P(), COVARIANCE.S() – covariance matrices.

Probability distributions

  • GAMMA(), GAMMALN()
  • BETADIST()
  • T.DIST(), CHISQ.DIST()

Used in statistical testing, ML evaluation, and uncertainty.

4. Matrix Algebra (for OLS, Multi-variable Regression, Custom Models)

These are essential for manually implementing linear models:

  • MMULT() – matrix multiplication.
  • MINVERSE() – matrix inverse (OLS solution).
  • MDETERM() – determinant.
  • TRANSPOSE() – orientation switching.
  • SUMPRODUCT() – dot-product (manual prediction formula).

5. Model Building (Regression, Classification, Time-Series)

5.1 Linear regression (two options)

Quick method (1 line): =LINEST(y_range, x_range, TRUE, TRUE)

Manual matrix regression (multi-variable) using:

  • MMULT()
  • MINVERSE()
  • TRANSPOSE()

5.2 Logistic regression

Probability formula: =1/(1+EXP(-(B0 + B1*x1 + B2*x2)))

5.3 Classification logic

  • Threshold method:
    =IF(prob>0.5,1,0)
  • Multi-class:
    =IFS(class1_prob>0.5,"A", class2_prob>0.5,"B", TRUE,"C")

5.4 Time-series forecasting

  • FORECAST.LINEAR() – simple trend.
  • FORECAST.ETS() – full ETS algorithm.
  • FORECAST.ETS.SEASONALITY() – seasonality detection.
  • FORECAST.ETS.CONFINT() – confidence intervals.
  • FORECAST.ETS.STAT() – quality metrics (MASE, RMSE).
  • MOVING AVERAGE (Toolpak) – smoothing.
  • EXPONENTIAL SMOOTHING (Toolpak) – classical forecasting.

6. Model Evaluation & Scoring

Error metrics

  • MSE: =AVERAGE((actual - predicted)^2)
  • RMSE: =SQRT(AVERAGE((actual - predicted)^2))
  • MAE: =AVERAGE(ABS(actual - predicted))
  • R²: =RSQ(predicted, actual)

Confidence intervals

  • CONFIDENCE.T()
  • CONFIDENCE.NORM()

Classification metrics

  • COUNTIF(), COUNTIFS(), SUMIFS()

Accuracy example: =COUNTIF(correct_predictions, TRUE)/COUNT(correct_predictions)

Batch scoring using MAP()

If you have a model defined as a LAMBDA: =MAP(A2:A500, B2:B500, C2:C500, PredictOne)

7. Optimization (Training Models with Solver)

The Solver Add-In is essential for ML tasks in Excel:

  • training logistic regression
  • fitting nonlinear models
  • optimizing custom objective functions built with LAMBDA
  • minimizing loss functions
  • parameter search for forecasting models

It works with:

  • linear optimization
  • nonlinear optimization
  • evolutionary optimization

8. Deployment in Excel (Turning a Model Into a Practical Tool)

Build a prediction function

Use LAMBDA() to create a reusable prediction model.

Vectorized scoring

Use MAP() to score thousands of rows instantly.

Create a mini-dashboard

  • Slicers
  • Conditional formatting
  • Sparklines
  • Trend charts
  • Outlier flags
  • Seasonality analysis

This transforms Excel into a practical ML environment.

Enjoy!

Treasury tech insights. No fluff.

One email per week. Build logs, tools, and opinions from the trenches.

Subscribe →

Alina Turungiu

17 years in Treasury · Building TreasuryOS · The Hybrid Treasurer

Subscribe

Treasury tech insights. One email per week.

Try TreasuryOS

Modular treasury software. No 6 month implementation. Built by a practitioner.