After publishing the first article on machine learning in Excel, I received several questions asking: “Which specific formulas do you actually use?” and “Can you show concrete Excel functions that replicate ML algorithms?”
So I put together a consolidated reference: the Excel functions I actively use, plus a few others that are highly relevant for ML-style analysis even though I haven't yet needed them in real projects.
Anyway, let’s go straight to the list:
1. Data Preparation (Cleaning, Normalization, Outliers, Missing Values)
Core cleaning functions
- TRIM() – removes extra spaces.
- CLEAN() – removes non-printable characters.
- TEXT() – formatting for consistent data types.
- VALUE() – converts text numbers to numeric.
- IFERROR(), IFNA() – handles invalid or missing inputs.
Regex functions (advanced parsing; available in recent Microsoft 365 versions)
- REGEXEXTRACT() – extracts patterns (IDs, numbers, tokens).
- REGEXREPLACE() – cleans or restructures strings.
- REGEXMATCH() – boolean pattern detection.
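A quick sketch of the regex trio, assuming raw order strings like "ORD-4821 / 2 units" in column A (the pattern and cell references are illustrative):

```excel
First run of digits ("4821") from the raw string:
=REGEXEXTRACT(A2, "\d+")
Strip everything except digits:
=REGEXREPLACE(A2, "\D", "")
TRUE if the cell contains an order code at all:
=REGEXMATCH(A2, "ORD-\d+")
```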
Handling missing values
- AVERAGE(), MEDIAN(), MODE.SNGL() – simple imputation values (MODE() still works but is a legacy function).
- Custom rule:
=IF(A2="", MEDIAN(A:A), A2)
Outlier detection
- PERCENTILE.EXC(), PERCENTILE.INC() – threshold flags.
- QUARTILE() – IQR logic.
- STDEV.P(), STDEV.S() – standard deviation anomalies.
- Outlier flag formula:
=IF(OR(A2 < Q1 - 1.5*(Q3-Q1), A2 > Q3 + 1.5*(Q3-Q1)), 1, 0)
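Q1 and Q3 in the flag above can come straight from QUARTILE.INC(); a self-contained version, assuming the feature lives in A2:A100 (ranges and helper cells are illustrative):

```excel
Q1 (in, say, D1):
=QUARTILE.INC($A$2:$A$100, 1)
Q3 (in D2):
=QUARTILE.INC($A$2:$A$100, 3)
Outlier flag, filled down next to each value:
=IF(OR(A2 < $D$1 - 1.5*($D$2-$D$1), A2 > $D$2 + 1.5*($D$2-$D$1)), 1, 0)
```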
Normalization / scaling
- MIN(), MAX() – min-max scaling.
- STANDARDIZE() – z-score scaling.
- NORM.S.DIST(), NORM.DIST() – create Gaussian features.
- NORM.S.INV(), NORM.INV() – reverse transformations.
- Z.TEST() – test standardized values.
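As a sketch, assuming the raw feature sits in A2:A100 (fill both formulas down):

```excel
Min-max scaling to [0, 1]:
=(A2 - MIN($A$2:$A$100)) / (MAX($A$2:$A$100) - MIN($A$2:$A$100))
Z-score scaling:
=STANDARDIZE(A2, AVERAGE($A$2:$A$100), STDEV.S($A$2:$A$100))
```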
Sorting, filtering, deduplication
- SORT(), SORTBY() – reorganizing data for model training.
- FILTER() – removing noisy instances.
- UNIQUE() – finding class values.
- Remove Duplicates (a Data-ribbon command, not a worksheet function) – critical to avoid data leakage.
Reshaping data
- TRANSPOSE() – switch row/column orientation.
- WRAPROWS(), WRAPCOLS() – reshape into structured arrays.
- TAKE(), DROP() – extract dataset slices.
LET() and LAMBDA()
- LET() – cleaner formulas.
- LAMBDA() – define reusable ML functions:
- loss function
- distance function
- prediction function
- batch scoring
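For instance, a reusable Euclidean distance function (illustrative; store the definition under a name such as EUCLID in Name Manager so any sheet can call it):

```excel
Definition stored under the name EUCLID:
=LAMBDA(x, y, SQRT(SUMPRODUCT((x - y)^2)))
Usage, comparing two feature-vector rows:
=EUCLID(A2:C2, A3:C3)
```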
MAP(), REDUCE(), SCAN()
- Essential for vectorized ML pipelines:
- MAP() – apply a LAMBDA to each element of an array.
- REDUCE() – accumulate results (e.g., loss).
- SCAN() – cumulative transformations.
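A small sketch of all three, assuming actuals in A2:A100 and predictions in B2:B100 (ranges are illustrative):

```excel
Squared error per row:
=MAP(A2:A100, B2:B100, LAMBDA(a, p, (a - p)^2))
Total squared error accumulated with REDUCE:
=REDUCE(0, (A2:A100 - B2:B100)^2, LAMBDA(acc, v, acc + v))
Running total of a series:
=SCAN(0, A2:A100, LAMBDA(acc, v, acc + v))
```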
2. Feature Engineering (Numeric, Textual, Categorical, Time-Series)
Categorical encoding
- XLOOKUP() – label encoding.
- XMATCH() – numeric encoding from lists.
- CHOOSE() – simple mapping (e.g., Low/Med/High → 1/2/3).
- SWITCH() – multi-condition mapping.
- IF(), IFS() – target-based categories.
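A hedged sketch: label-encode a Size column holding Low/Med/High, either by position in a distinct-value list or by direct mapping (the cell layout is illustrative):

```excel
Position in a distinct-value list (E2:E4 holds Low, Med, High):
=XMATCH(A2, $E$2:$E$4)
Direct mapping with SWITCH, falling back to #N/A for unseen values:
=SWITCH(A2, "Low", 1, "Med", 2, "High", 3, NA())
```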
Text feature engineering
- TEXTSPLIT() – tokenization.
- TEXTJOIN() – recomposition of features.
- LEN() – length-based features.
- REGEXEXTRACT(), REGEXREPLACE(), REGEXMATCH() – pattern-based features.
Numeric feature engineering
- Interaction features:
=A2 * B2
- Polynomial features:
=A2^2, =A2^3
- Binary flags:
=IF(A2 > threshold, 1, 0)
Rolling time-series features
- Rolling mean (7-period window, filled down from the 7th data row so the window doesn't run off the sheet):
=AVERAGE(OFFSET(A8,-6,0,7,1))
- Rolling sum:
=SUM(OFFSET(A8,-6,0,7,1))
3. Core Statistical & Regression Functions (ML Modelling Basics)
Regression & correlation
- LINEST() – full regression stats (R², standard errors).
- LOGEST() – exponential regression.
- SLOPE(), INTERCEPT() – extract regression parameters.
- RSQ() – coefficient of determination.
- STEYX() – standard error of predicted y.
- CORREL(), PEARSON() – correlation between features.
- COVARIANCE.P(), COVARIANCE.S() – covariance matrices.
Probability distributions
- GAMMA(), GAMMALN()
- BETA.DIST() (BETADIST() is the legacy form)
- T.DIST(), CHISQ.DIST()
Used in statistical testing, ML evaluation, and uncertainty estimation.
4. Matrix Algebra (for OLS, Multi-variable Regression, Custom Models)
These are essential for manually implementing linear models:
- MMULT() – matrix multiplication.
- MINVERSE() – matrix inverse (OLS solution).
- MDETERM() – determinant.
- TRANSPOSE() – orientation switching.
- SUMPRODUCT() – dot-product (manual prediction formula).
5. Model Building (Regression, Classification, Time-Series)
5.1 Linear regression (two options)
Quick method (1 line): =LINEST(y_range, x_range, TRUE, TRUE)
Manual matrix regression (multi-variable) using:
MMULT(), MINVERSE(), TRANSPOSE()
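Concretely, the OLS solution β = (XᵀX)⁻¹Xᵀy becomes a single formula, assuming your predictors (with a leading column of 1s for the intercept) sit in X_range and the target in y_range (names are illustrative):

```excel
=MMULT(MINVERSE(MMULT(TRANSPOSE(X_range), X_range)), MMULT(TRANSPOSE(X_range), y_range))
```

In Microsoft 365 the coefficient vector spills automatically; in legacy Excel, confirm with Ctrl+Shift+Enter.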
5.2 Logistic regression
Probability formula: =1/(1+EXP(-(B0 + B1*x1 + B2*x2)))
5.3 Classification logic
- Threshold method:
=IF(prob > 0.5, 1, 0)
- Multi-class:
=IFS(class1_prob > 0.5, "A", class2_prob > 0.5, "B", TRUE, "C")
5.4 Time-series forecasting
- FORECAST.LINEAR() – simple trend.
- FORECAST.ETS() – full ETS algorithm.
- FORECAST.ETS.SEASONALITY() – seasonality detection.
- FORECAST.ETS.CONFINT() – confidence intervals.
- FORECAST.ETS.STAT() – quality metrics (MASE, RMSE).
- Moving Average (Analysis ToolPak) – smoothing.
- Exponential Smoothing (Analysis ToolPak) – classical forecasting.
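For example, assuming dates in A2:A100, values in B2:B100, and a future date in D2 (all ranges illustrative):

```excel
Point forecast for the date in D2:
=FORECAST.ETS(D2, B2:B100, A2:A100)
95% confidence-interval half-width for the same date:
=FORECAST.ETS.CONFINT(D2, B2:B100, A2:A100, 0.95)
Detected season length (in periods):
=FORECAST.ETS.SEASONALITY(B2:B100, A2:A100)
```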
6. Model Evaluation & Scoring
Error metrics
- MSE:
=AVERAGE((actual - predicted)^2)
- RMSE:
=SQRT(AVERAGE((actual - predicted)^2))
- MAE:
=AVERAGE(ABS(actual - predicted))
- R²:
=RSQ(predicted, actual)
Confidence intervals
- CONFIDENCE.T()
- CONFIDENCE.NORM()
Classification metrics
- COUNTIF(), COUNTIFS(), SUMIFS()
Accuracy example: =COUNTIF(correct_predictions, TRUE)/COUNTA(correct_predictions) (use COUNTA rather than COUNT, since COUNT ignores TRUE/FALSE values).
Batch scoring using MAP()
If you have a model defined as a LAMBDA: =MAP(A2:A500, B2:B500, C2:C500, PredictOne)
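PredictOne is a name you define yourself; as an illustrative sketch, a three-feature logistic model with placeholder coefficients baked into the LAMBDA:

```excel
Definition stored under the name PredictOne (coefficients are placeholders):
=LAMBDA(x1, x2, x3, 1/(1 + EXP(-(-2.1 + 0.8*x1 + 1.3*x2 - 0.5*x3))))
Batch scoring across three feature columns:
=MAP(A2:A500, B2:B500, C2:C500, PredictOne)
```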
7. Optimization (Training Models with Solver)
The Solver Add-In is essential for ML tasks in Excel:
- training logistic regression
- fitting nonlinear models
- optimizing custom objective functions built with LAMBDA
- minimizing loss functions
- parameter search for forecasting models
It works with:
- linear optimization
- nonlinear optimization
- evolutionary optimization
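A typical Solver setup for training the logistic model from section 5.2 (cell addresses are illustrative): put the coefficients in G1:G4, compute per-row probabilities and log-loss, then point Solver at the total loss.

```excel
Per-row probability in column E (fill down), features in A:C:
=1/(1 + EXP(-($G$1 + $G$2*A2 + $G$3*B2 + $G$4*C2)))
Per-row log-loss in column F, with actual 0/1 labels in column D:
=-(D2*LN(E2) + (1 - D2)*LN(1 - E2))
Objective cell (Set Objective → Min, By Changing G1:G4, GRG Nonlinear):
=SUM(F2:F500)
```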
8. Deployment in Excel (Turning a Model Into a Practical Tool)
Build a prediction function
Use LAMBDA() to create a reusable prediction model.
Vectorized scoring
Use MAP() to score thousands of rows instantly.
Create a mini-dashboard
- Slicers
- Conditional formatting
- Sparklines
- Trend charts
- Outlier flags
- Seasonality analysis
This transforms Excel into a practical ML environment.
Enjoy!