Automate Treasury

Machine Learning in Excel Part 2

After publishing the first article on machine learning in Excel, I received several questions asking: “Which specific formulas do you actually use?” and “Can you show concrete Excel functions that replicate ML algorithms?

So, I put together a consolidated reference: a set of Excel functions I actively use, plus additional ones that are highly relevant for ML-style analysis even if I haven’t applied all these formulas in real scenarios.

Anyway, let’s go straight to the list:

1. Data Preparation (Cleaning, Normalization, Outliers, Missing Values)

Core cleaning functions

  • TRIM() – removes extra spaces.
  • CLEAN() – removes non-printable characters.
  • TEXT() – formatting for consistent data types.
  • VALUE() – converts text numbers to numeric.
  • IFERROR(), IFNA() – handles invalid or missing inputs.

Regex functions (advanced parsing)

  • REGEXEXTRACT() – extracts patterns (IDs, numbers, tokens).
  • REGEXREPLACE() – cleans or restructures strings.
  • REGEXMATCH() – boolean pattern detection.

Handling missing values

  • AVERAGE(), MEDIAN(), MODE() – imputations.
  • Custom rule:
    =IF(A2="", MEDIAN(A:A), A2)

Outlier detection

  • PERCENTILE.EXC(), PERCENTILE.INC() – threshold flags.
  • QUARTILE() – IQR logic.
  • STDEV.P(), STDEV.S() – standard deviation anomalies.
  • Outlier flag formula:
    =IF(OR(A2 < Q1 - 1.5*(Q3-Q1), A2 > Q3 + 1.5*(Q3-Q1)), 1, 0)

Normalization / scaling

  • MIN(), MAX() – min-max scaling.
  • STANDARDIZE() – z-score scaling.
  • NORM.S.DIST(), NORM.DIST() – create Gaussian features.
  • NORM.S.INV(), NORM.INV() – reverse transformations.
  • Z.TEST() – test standardized values.

Sorting, filtering, deduplication

  • SORT(), SORTBY() – reorganizing data for model training.
  • FILTER() – removing noisy instances.
  • UNIQUE() – finding class values.
  • REMOVE.DUPLICATES() – critical to avoid data leakage.

Reshaping data

  • TRANSPOSE() – switch orientation (wide ↔ long).
  • WRAPROWS(), WRAPCOLS() – reshape into structured arrays.
  • TAKE(), DROP() – extract dataset slices.

LET() and LAMBDA()

  • LET() – cleaner formulas.
  • LAMBDA() – define reusable ML functions:
    • loss function
    • distance function
    • prediction function
    • batch scoring

MAP(), REDUCE(), SCAN()

  • Essential for vectorized ML pipelines:
    • MAP() – apply function to each row.
    • REDUCE() – accumulate results (e.g., loss).
    • SCAN() – cumulative transformations.

2. Feature Engineering (Numeric, Textual, Categorical, Time-Series)

Categorical encoding

  • XLOOKUP() – label encoding.
  • XMATCH() – numeric encoding from lists.
  • CHOOSE() – simple mapping (e.g., Low/Med/High → 1/2/3).
  • SWITCH() – multi-condition mapping.
  • IF(), IFS() – target-based categories.

Text feature engineering

  • TEXTSPLIT() – tokenization.
  • TEXTJOIN() – recomposition of features.
  • LEN() – length-based features.
  • REGEXEXTRACT(), REGEXREPLACE(), REGEXMATCH() – pattern-based features.

Numeric feature engineering

  • Interaction features:
    =A2 * B2
  • Polynomial features:
    =A2^2, =A2^3
  • Binary flags:
    =IF(A2 > threshold, 1, 0)

Rolling time-series features

  • Rolling mean:
    =AVERAGE(OFFSET(A2,-6,0,7,1))
  • Rolling sum:
    =SUM(OFFSET(A2,-6,0,7,1))

3. Core Statistical & Regression Functions (ML Modelling Basics)

Regression & correlation

  • LINEST() – full regression stats (R², standard errors).
  • LOGEST() – exponential regression.
  • SLOPE(), INTERCEPT() – extract regression parameters.
  • RSQ() – coefficient of determination.
  • STEYX() – standard error of predicted y.
  • CORREL(), PEARSON() – correlation between features.
  • COVARIANCE.P(), COVARIANCE.S() – covariance matrices.

Probability distributions

  • GAMMA(), GAMMALN()
  • BETADIST()
  • T.DIST(), CHISQ.DIST()

Used in statistical testing, ML evaluation, and uncertainty.

4. Matrix Algebra (for OLS, Multi-variable Regression, Custom Models)

These are essential for manually implementing linear models:

  • MMULT() – matrix multiplication.
  • MINVERSE() – matrix inverse (OLS solution).
  • MDETERM() – determinant.
  • TRANSPOSE() – orientation switching.
  • SUMPRODUCT() – dot-product (manual prediction formula).

5. Model Building (Regression, Classification, Time-Series)

5.1 Linear regression (two options)

Quick method (1 line): =LINEST(y_range, x_range, TRUE, TRUE)

Manual matrix regression (multi-variable) using:

  • MMULT()
  • MINVERSE()
  • TRANSPOSE()

5.2 Logistic regression

Probability formula: =1/(1+EXP(-(B0 + B1*x1 + B2*x2)))

5.3 Classification logic

  • Threshold method:
    =IF(prob>0.5,1,0)
  • Multi-class:
    =IFS(class1_prob>0.5,"A", class2_prob>0.5,"B", TRUE,"C")

5.4 Time-series forecasting

  • FORECAST.LINEAR() – simple trend.
  • FORECAST.ETS() – full ETS algorithm.
  • FORECAST.ETS.SEASONALITY() – seasonality detection.
  • FORECAST.ETS.CONFINT() – confidence intervals.
  • FORECAST.ETS.STAT() – quality metrics (MASE, RMSE).
  • MOVING AVERAGE (Toolpak) – smoothing.
  • EXPONENTIAL SMOOTHING (Toolpak) – classical forecasting.

6. Model Evaluation & Scoring

Error metrics

  • MSE: =AVERAGE((actual - predicted)^2)
  • RMSE: =SQRT(AVERAGE((actual - predicted)^2))
  • MAE: =AVERAGE(ABS(actual - predicted))
  • R²: =RSQ(predicted, actual)

Confidence intervals

  • CONFIDENCE.T()
  • CONFIDENCE.NORM()

Classification metrics

  • COUNTIF(), COUNTIFS(), SUMIFS()

Accuracy example: =COUNTIF(correct_predictions, TRUE)/COUNT(correct_predictions)

Batch scoring using MAP()

If you have a model defined as a LAMBDA: =MAP(A2:A500, B2:B500, C2:C500, PredictOne)

7. Optimization (Training Models with Solver)

The Solver Add-In is essential for ML tasks in Excel:

  • training logistic regression
  • fitting nonlinear models
  • optimizing custom objective functions built with LAMBDA
  • minimizing loss functions
  • parameter search for forecasting models

It works with:

  • linear optimization
  • nonlinear optimization
  • evolutionary optimization

8. Deployment in Excel (Turning a Model Into a Practical Tool)

Build a prediction function

Use LAMBDA() to create a reusable prediction model.

Vectorized scoring

Use MAP() to score thousands of rows instantly.

Create a mini-dashboard

  • Slicers
  • Conditional formatting
  • Sparklines
  • Trend charts
  • Outlier flags
  • Seasonality analysis

This transforms Excel into a practical ML environment.

Enjoy!

About the author

Alina Turungiu

Treasury Automation Expert | 17+ years in global treasury operations | Founder of TreasuryOS
I help treasury teams eliminate manual work without enterprise budgets or heavy IT involvement. Certified in treasury management, Power Platform, RPA, and Six Sigma. TreasuryOS is my AI builder platform where treasurers describe what they need and get working applications, no coding, no enterprise contracts. At TreasuryEase.com, I share what actually works.