
AI projects need therapy because business leaders and data scientists are stuck in a bad marriage#

One of the problems I see all the time when we work with customers is that:

  • Data scientists don't understand what business objectives are
  • Business leaders don't understand what problem data scientists are solving

This loop of misunderstanding ends only when the project is finally complete and deployed, and the result is not what either party expected.

  • Business leader: "Uhm, the system doesn't seem to work well: using the model's predictions, we are losing more money than last quarter"

  • Data scientist: "You asked for a system that maximizes prediction accuracy, and I built that for you."

  • Business leader: "Yes, but I wanted a system that maximizes revenue. I don't need a system that is accurate for cases where we don't make money, but always wrong for the cases I care about!"

  • Data scientist: "Yes, but you didn't tell me that. You wanted an accurate system. The system I built is accurate."

  • Business leader: "Yes, but it was clear if you just read between the lines, where I clearly implied what I wanted. Or just read my mind. Why can't you just do that? I pay you a lot of money."

This is a case of a "high accuracy model" that doesn't deliver "business value". The data scientist says "mission accomplished!", the business leader says "this is an utter disaster", and they are both right.

If you are married, you can probably relate this to many unproductive discussions with your spouse. The "yes-but", the "read-my-mind", the "why-can't-you-just-be-like-XYZ?".

We need to replace accuracy metrics with decision value

The Business Problem#

Assume that:

  • The purchase cost for a mid-range industrial CNC mill is around $50,000-$150,000 [ref]

  • The annual gross production, assuming high utilization and a good mix of work, is $200,000-$400,000

  • When a serious failure occurs, the repair cost is $5,000-$20,000 (plus lost production) [ref]

The business leader wants a predictive model that predicts when a CNC milling machine will fail within the next 7 days of operation, using sensor data, so that a new part can be ordered and the machine repaired without downtime.

The Wrong Way#

Suppose the data scientist trains a classifier to predict \(y\), whether a CNC milling machine will fail within the next 7 days

  • \(y = 1\): failure within 7 days

  • \(y = 0\): no failure

given the sensor data \(x\)

Assume the model reports an error rate of 1%, which means 99% accuracy. That sounds excellent at first. However, consider the real data: only 0.5% of all machine operating periods end in failure. That means a trivial model that "always predicts y = 0" (no failure) would have an error rate of only 0.5%. In other words, a completely useless model that never predicts failure looks more accurate than the model we trained.
So the claimed 99% accuracy is meaningless. This is the "a broken clock is right twice a day" fallacy: being right most of the time is easy when failures almost never happen.

The model will fail to identify the rare but critical failure events, which are exactly what the business cares about.

This is a classic example of why accuracy alone is often a useless metric, especially when dealing with skewed (imbalanced) data.
When one class is much rarer than the other, accuracy can hide how poorly a model is performing.

  • If failure events are rare, a model can appear highly accurate by always predicting "no failure"
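A toy illustration of this accuracy paradox, using the synthetic numbers from above (10,000 operating periods with a 0.5% failure rate; none of this comes from real CNC data):

```python
# Toy illustration of the accuracy paradox on imbalanced data:
# 10,000 operating periods, of which 0.5% end in failure.
n_total = 10_000
n_failures = 50

# A trivial baseline that always predicts "no failure" is wrong
# only on the 50 actual failures.
baseline_accuracy = 1 - n_failures / n_total
print(f"baseline accuracy: {baseline_accuracy:.1%}")  # 99.5%

# The trained model with a 1% error rate scores *lower* on accuracy...
model_accuracy = 0.99
print(f"model accuracy:    {model_accuracy:.1%}")     # 99.0%

# ...yet the baseline catches zero failures, the only events
# the business actually cares about.
failures_caught_by_baseline = 0
```

The baseline "wins" on accuracy while delivering exactly zero business value, which is why accuracy alone cannot be the objective.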

The Causal Way#

The right approach is what we at Causify call "end-to-end machine learning", where the business objective is the goal the model is optimized for. In the case of the CNC machines, the right business metric to optimize is the revenue generated by a CNC machine, not how often the model is correct.

  • The data scientist should understand the economics and the business objectives, not just "predict whether a CNC machine will fail within the next 7 days, and maximize accuracy"

For example:

  • Annual gross production: \(G\) = $200,000
  • Major repair (parts + service): \(p\) = $10,000
  • Mean Time Between Failures: MTBF = 100 days
  • Unplanned offline time if parts are not pre-staged: \(d\) = 7 days
  • Effective production days per year: \(D\) = 365
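These business inputs can be translated into per-decision dollar costs. A minimal sketch follows; the mapping is illustrative, and the 2% parts carrying cost is my assumption, not a figure from the text:

```python
# Business inputs from the assumptions above.
G = 200_000       # annual gross production ($)
repair = 10_000   # major repair, parts + service ($)
d = 7             # unplanned offline days if parts are not pre-staged
D = 365           # effective production days per year

daily_production = G / D  # ~$548 of production per day

# Decision-relevant costs: the repair itself happens either way once a
# machine fails, so what matters is the *incremental* cost of each outcome.
C_TP = 0                      # parts pre-staged, repair with negligible downtime
C_FP = 0.02 * repair          # assumed carrying cost of parts bought too early
C_FN = d * daily_production   # lost production while the machine is down
C_TN = 0                      # nothing predicted, nothing happened

print(f"C_FP = ${C_FP:,.0f}, C_FN = ${C_FN:,.0f}")
```

Under these assumed numbers a missed failure costs roughly 19 times a false alarm, the same order of magnitude as the 25x ratio cited in the text.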

What should the data scientist do? ELI5 version#

There are two types of mistakes, and they do not cost the same:

| Situation | Buy parts early | Result |
| --- | --- | --- |
| Predict failure and it happens | Yes | Fast repair |
| Predict failure and it does not happen | Yes | Parts in inventory |
| Miss a real failure | No | Machine down for a week |
| Predict no failure and none happens | No | No cost |

A missed failure costs about 25× more than buying parts early. Therefore, the model must prioritize avoiding missed failures.

The model should be trained using dollar values, not accuracy metrics. This means the model is optimized to minimize total cost, not to maximize prediction accuracy.
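Concretely, "minimize total cost" means models get compared in dollars, not in error rates. In this sketch both candidate models and their mistake counts are hypothetical, and the two costs are assumed round numbers:

```python
# Compare two hypothetical models by total dollar cost, not accuracy.
C_MISS = 5_000   # assumed cost of a missed failure (machine down for a week)
C_EARLY = 200    # assumed cost of a false alarm (parts sit in inventory)

def total_cost(n_missed, n_false_alarms):
    """Dollar cost of a model's mistakes over an evaluation period."""
    return n_missed * C_MISS + n_false_alarms * C_EARLY

# Model A: high accuracy, but it misses most real failures.
cost_a = total_cost(n_missed=40, n_false_alarms=10)   # $202,000
# Model B: lower accuracy (many false alarms), but it catches failures.
cost_b = total_cost(n_missed=2, n_false_alarms=300)   # $70,000

print(cost_a, cost_b)
```

Model B makes far more mistakes in total, yet costs the business far less, because its mistakes are the cheap kind.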

What should the data scientist do? Math version#

  • A Causal Bayesian model outputs a probability of failure. We only act when that probability is financially justified.

  • The best model is the one with the lowest total cost over time.

  • Let the model output $$ p_i = P(\text{fail in next 7 days} \mid \text{features}_i) $$

  • Predict positive (buy parts) if: $$ p_i C_{TP} + (1 - p_i)C_{FP} \le p_i C_{FN} + (1 - p_i)C_{TN} $$

  • Solving for the probability at which the two sides are equal yields the optimal threshold: $$ \tau = \frac{C_{FP} - C_{TN}}{(C_{FP} - C_{TN}) + (C_{FN} - C_{TP})} $$
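The decision rule fits in a few lines. This sketch uses the general Bayes-optimal threshold \( \tau = (C_{FP} - C_{TN}) / ((C_{FP} - C_{TN}) + (C_{FN} - C_{TP})) \) with illustrative costs; the model probabilities are made up:

```python
def optimal_threshold(c_fp, c_fn, c_tp=0.0, c_tn=0.0):
    """Probability above which buying parts early has lower expected cost.

    Derived by solving p*C_TP + (1-p)*C_FP <= p*C_FN + (1-p)*C_TN for p.
    """
    return (c_fp - c_tn) / ((c_fp - c_tn) + (c_fn - c_tp))

# Cheap false alarms -> low threshold -> act early.
tau = optimal_threshold(c_fp=200, c_fn=3_800)
print(f"tau = {tau:.3f}")  # tau = 0.050

# Applying the rule to some hypothetical model outputs:
probs = [0.01, 0.06, 0.30]
actions = ["buy parts" if p >= tau else "wait" for p in probs]
print(actions)  # ['wait', 'buy parts', 'buy parts']
```

With a 5% threshold, even a fairly uncertain failure signal is enough to justify ordering parts.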

Interpretation:

  • If false alarms are cheap, act early (low \(\tau\)).

  • If false alarms are expensive, require a higher predicted failure probability.

  • The cost-weighted cross-entropy is $$ \mathcal{L}_{\text{wCE}} = -\sum_i \left( w_1 y_i \log p_i + w_0 (1-y_i) \log(1 - p_i) \right) $$ where $$ w_1 \propto C_{FN} - C_{TP}, \quad w_0 \propto C_{FP} - C_{TN} $$

The expected cost is: $$ \tilde{\mathcal{L}} = \sum_i \min \Big[ p_i C_{TP} + (1 - p_i)C_{FP}, \; p_i C_{FN} + (1 - p_i)C_{TN} \Big] $$
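Both objectives can be sketched directly from the formulas; the labels, probabilities, and dollar weights below are illustrative assumptions:

```python
import math

def weighted_cross_entropy(y, p, w1, w0):
    """Cost-weighted cross-entropy: missed failures weighted w1, false alarms w0."""
    return -sum(
        w1 * yi * math.log(pi) + w0 * (1 - yi) * math.log(1 - pi)
        for yi, pi in zip(y, p)
    )

def expected_cost(p, c_tp, c_fp, c_fn, c_tn):
    """Expected dollar cost when each prediction takes the cheaper action."""
    return sum(
        min(pi * c_tp + (1 - pi) * c_fp,   # act: buy parts early
            pi * c_fn + (1 - pi) * c_tn)   # wait
        for pi in p
    )

y = [1, 0, 0, 0]               # one true failure, three healthy periods
p = [0.70, 0.10, 0.02, 0.01]   # model probabilities of failure
loss = weighted_cross_entropy(y, p, w1=3_800, w0=200)
cost = expected_cost(p, c_tp=0, c_fp=200, c_fn=3_800, c_tn=0)
print(f"wCE loss: {loss:.1f}, expected cost: ${cost:.0f}")
```

Training minimizes the differentiable weighted cross-entropy, while the expected-cost quantity reports, in dollars, what the model's decisions are expected to cost the business.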

What this means for leadership#

  • The predictive model is directly tied to business impact
  • ROI can be expressed in avoided downtime dollars
  • Cost assumptions (downtime, capital rate, MTBF) are adjustable business inputs
  • The approach balances risk and cost rather than maximizing mathematical accuracy
  • Decisions become consistent and financially defensible