Data science: cost of mistakes

The other day Michael Nielsen shared a story on how researchers are using Twitter to predict stock market movements. The idea isn't actually that new – there have been other attempts to use Twitter (and not only Twitter) to predict stock prices. And actually it's fairly easy to come up with another source of signal that will nicely correlate with the market (if you don't believe that, look up the performance figures of astrology-based trading – the numbers are quite amazing). However, one of the key issues in algorithmic trading is not how good the method is, but how costly the mistakes are. And I believe this is also the key issue in the majority of real-life data science applications.

If overprediction in a diagnostic procedure makes somebody lose her/his kidney, it's not a good procedure. If a trading system makes you lose more money on every 10th trade than you earned on the previous nine, it's not a good system either. Assessment of false positives and false negatives (type I and type II errors) is a standard element of statistical hypothesis testing, but real-life applications require weighting mistakes to understand whether the algorithm is actually usable.
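
As a minimal sketch of what such weighting might look like, here is a toy example in Python: a confusion matrix is combined with an assumed cost matrix to compute the expected cost per case. The labels and the cost values (a missed diagnosis costing 20 times more than a false alarm) are purely illustrative assumptions, not figures from any real study.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical ground truth and predictions for a binary diagnostic test
# (1 = condition present, 0 = condition absent).
y_true = np.array([0, 0, 0, 1, 1, 0, 1, 0, 0, 1])
y_pred = np.array([0, 1, 0, 1, 0, 0, 1, 0, 1, 1])

# Confusion matrix: rows are true classes, columns are predicted classes,
# i.e. [[TN, FP], [FN, TP]] for binary labels 0/1.
cm = confusion_matrix(y_true, y_pred)

# Assumed cost matrix with the same layout as cm: correct calls cost nothing,
# a false positive costs 1 unit, a missed case (false negative) costs 20 units.
costs = np.array([[0, 1],
                  [20, 0]])

# Expected cost per case: weight each cell of the confusion matrix
# by its cost and average over all cases.
expected_cost = (cm * costs).sum() / cm.sum()
print("Confusion matrix:\n", cm)
print("Expected cost per case:", expected_cost)
```

Two models with identical accuracy can differ wildly on this metric once the costs are asymmetric, which is exactly the point: the raw error rate alone doesn't tell you whether the algorithm is usable.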