Machine Learning in Excel Part 2

After publishing the first article on machine learning in Excel, I received several questions asking: “Which specific formulas do you actually use?” and “Can you show concrete Excel functions that replicate ML algorithms?”

So, I put together a consolidated reference: a set of Excel functions I actively use, plus additional ones that are highly relevant for ML-style analysis even if I haven’t applied all these formulas in real scenarios.

Anyway, let’s go straight to the list:

1. Data Preparation (Cleaning, Normalization, Outliers, Missing Values)

Core cleaning functions

TRIM() – removes extra spaces.
CLEAN() – removes non-printable characters.
TEXT() – formatting for consistent data types.
VALUE() – converts text numbers to numeric.
IFERROR(), IFNA() – handles invalid or missing inputs.

Regex functions (advanced parsing)

REGEXEXTRACT() – extracts patterns (IDs, numbers, tokens).
REGEXREPLACE() – cleans or restructures strings.
REGEXMATCH() – boolean pattern detection.

Handling missing values

AVERAGE(), MEDIAN(), MODE() – imputations.
Custom rule:
=IF(A2="", MEDIAN(A:A), A2)

Outlier detection

PERCENTILE.EXC(), PERCENTILE.INC() – threshold flags.
QUARTILE() – IQR logic.
STDEV.P(), STDEV.S() – standard deviation anomalies.
Outlier flag formula:

=IF(OR(A2 < Q1 - 1.5*(Q3-Q1), A2 > Q3 + 1.5*(Q3-Q1)), 1, 0)

Normalization / scaling

MIN(), MAX() – min-max scaling.
STANDARDIZE() – z-score scaling.
NORM.S.DIST(), NORM.DIST() – create Gaussian features.
NORM.S.INV(), NORM.INV() – reverse transformations.
Z.TEST() – test standardized values.

Sorting, filtering, deduplication

SORT(), SORTBY() – reorganizing data for model training.
FILTER() – removing noisy instances.
UNIQUE() – finding class values.
REMOVE.DUPLICATES() – critical to avoid data leakage.

Reshaping data

TRANSPOSE() – switch orientation (wide ↔ long).
WRAPROWS(), WRAPCOLS() – reshape into structured arrays.
TAKE(), DROP() – extract dataset slices.

LET() and LAMBDA()

LET() – cleaner formulas.
LAMBDA() – define reusable ML functions:
- loss function
- distance function
- prediction function
- batch scoring

MAP(), REDUCE(), SCAN()

Essential for vectorized ML pipelines:
- MAP() – apply function to each row.
- REDUCE() – accumulate results (e.g., loss).
- SCAN() – cumulative transformations.

2. Feature Engineering (Numeric, Textual, Categorical, Time-Series)

Categorical encoding

XLOOKUP() – label encoding.
XMATCH() – numeric encoding from lists.
CHOOSE() – simple mapping (e.g., Low/Med/High → 1/2/3).
SWITCH() – multi-condition mapping.
IF(), IFS() – target-based categories.

Text feature engineering

TEXTSPLIT() – tokenization.
TEXTJOIN() – recomposition of features.
LEN() – length-based features.
REGEXEXTRACT(), REGEXREPLACE(), REGEXMATCH() – pattern-based features.

Numeric feature engineering

Interaction features:
=A2 * B2
Polynomial features:
=A2^2, =A2^3
Binary flags:
=IF(A2 > threshold, 1, 0)

Rolling time-series features

Rolling mean:
=AVERAGE(OFFSET(A2,-6,0,7,1))
Rolling sum:
=SUM(OFFSET(A2,-6,0,7,1))

3. Core Statistical & Regression Functions (ML Modelling Basics)

Regression & correlation

LINEST() – full regression stats (R², standard errors).
LOGEST() – exponential regression.
SLOPE(), INTERCEPT() – extract regression parameters.
RSQ() – coefficient of determination.
STEYX() – standard error of predicted y.
CORREL(), PEARSON() – correlation between features.
COVARIANCE.P(), COVARIANCE.S() – covariance matrices.

Probability distributions

GAMMA(), GAMMALN()
BETADIST()
T.DIST(), CHISQ.DIST()

Used in statistical testing, ML evaluation, and uncertainty.

4. Matrix Algebra (for OLS, Multi-variable Regression, Custom Models)

These are essential for manually implementing linear models:

MMULT() – matrix multiplication.
MINVERSE() – matrix inverse (OLS solution).
MDETERM() – determinant.
TRANSPOSE() – orientation switching.
SUMPRODUCT() – dot-product (manual prediction formula).

5. Model Building (Regression, Classification, Time-Series)

5.1 Linear regression (two options)

Quick method (1 line): =LINEST(y_range, x_range, TRUE, TRUE)

Manual matrix regression (multi-variable) using:

MMULT()
MINVERSE()
TRANSPOSE()

5.2 Logistic regression

Probability formula: =1/(1+EXP(-(B0 + B1*x1 + B2*x2)))

5.3 Classification logic

Threshold method:
=IF(prob>0.5,1,0)
Multi-class:
=IFS(class1_prob>0.5,"A", class2_prob>0.5,"B", TRUE,"C")

5.4 Time-series forecasting

FORECAST.LINEAR() – simple trend.
FORECAST.ETS() – full ETS algorithm.
FORECAST.ETS.SEASONALITY() – seasonality detection.
FORECAST.ETS.CONFINT() – confidence intervals.
FORECAST.ETS.STAT() – quality metrics (MASE, RMSE).
MOVING AVERAGE (Toolpak) – smoothing.
EXPONENTIAL SMOOTHING (Toolpak) – classical forecasting.

6. Model Evaluation & Scoring

Error metrics

MSE: =AVERAGE((actual - predicted)^2)
RMSE: =SQRT(AVERAGE((actual - predicted)^2))
MAE: =AVERAGE(ABS(actual - predicted))
R²: =RSQ(predicted, actual)

Confidence intervals

CONFIDENCE.T()
CONFIDENCE.NORM()

Classification metrics

COUNTIF(), COUNTIFS(), SUMIFS()

Accuracy example: =COUNTIF(correct_predictions, TRUE)/COUNT(correct_predictions)

Batch scoring using MAP()

If you have a model defined as a LAMBDA: =MAP(A2:A500, B2:B500, C2:C500, PredictOne)

7. Optimization (Training Models with Solver)

The Solver Add-In is essential for ML tasks in Excel:

training logistic regression
fitting nonlinear models
optimizing custom objective functions built with LAMBDA
minimizing loss functions
parameter search for forecasting models

It works with:

linear optimization
nonlinear optimization
evolutionary optimization

8. Deployment in Excel (Turning a Model Into a Practical Tool)

Build a prediction function

Use LAMBDA() to create a reusable prediction model.

Vectorized scoring

Use MAP() to score thousands of rows instantly.

Create a mini-dashboard

Slicers
Conditional formatting
Sparklines
Trend charts
Outlier flags
Seasonality analysis

This transforms Excel into a practical ML environment.

Enjoy!

Treasury tech insights. No fluff.

One email per week. Build logs, tools, and opinions from the trenches.

Subscribe →