P-Hacking

P-hacking (also known as 'data dredging' or 'data fishing') is the practice of repeatedly analyzing a set of data in different ways until a statistically significant, but often meaningless, result emerges. Imagine you have a giant dataset and a specific hypothesis you want to prove. Instead of running one clean test, you twist and turn the data, add or remove variables, change the time period, and run dozens or even hundreds of slightly different analyses. Run enough of them and, by pure chance, one is almost certain to spit out a result that looks impressive. The researcher then presents this single “successful” test as if it were their one and only attempt, hiding all the failures. This practice is a major pitfall in academic research and, more importantly for investors, in the development of quantitative investing strategies. A strategy “discovered” through p-hacking may look brilliant in a backtest, but it's likely to fall flat in the real world because the pattern was a statistical ghost, not a genuine market inefficiency.

The P-Value Trap

At the heart of p-hacking lies the p-value. In statistics, a p-value is the probability of seeing a result at least as extreme as the one you observed if there were no real effect at all (the “null hypothesis”). A low p-value (the conventional threshold is less than 0.05) is typically hailed as “statistically significant,” implying that chance alone would rarely produce such a result. The trap is believing this number is infallible. If you run 100 different statistical tests on completely random data where no real relationship exists, you should expect about five of them to produce a p-value below 0.05 just by sheer luck. A p-hacker exploits this. They run those 100 tests, discard the 95 or so that show nothing, and triumphantly publish the handful of “significant” ones. They have found a statistically significant result, but it signifies nothing of value. It's the equivalent of flipping a coin, getting five heads in a row, and declaring you have a magical coin, while conveniently forgetting to mention the dozens of other times you tried and failed.
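The "about five in a hundred" arithmetic can be checked with a small simulation. The sketch below is only an illustration under stated assumptions: each "test" is a simple two-sided z-test on pure Gaussian noise with a known standard deviation, and the function names (`z_test_p_value`, `count_false_positives`, `family_wise_error_rate`) are made up for this example. It also computes the chance that at least one of 100 independent tests on noise comes out "significant", which works out to roughly 99%.

```python
import math
import random
from statistics import NormalDist, mean


def z_test_p_value(sample, sigma=1.0):
    """Two-sided p-value for H0: the true mean is 0 (sigma assumed known)."""
    z = mean(sample) / (sigma / math.sqrt(len(sample)))
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def count_false_positives(n_tests, sample_size=50, alpha=0.05, seed=0):
    """Run n_tests z-tests on pure noise and count how many fall below alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_tests):
        # No real effect exists: every observation is zero-mean noise.
        noise = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
        if z_test_p_value(noise) < alpha:
            hits += 1
    return hits


def family_wise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false positive across n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


if __name__ == "__main__":
    print("significant results out of 100 noise tests:", count_false_positives(100))
    print("P(at least one false positive in 100 tests):", family_wise_error_rate(100))
```

The exact count varies with the random seed, but over many runs it hovers around 5% of the tests performed, even though there is nothing to find.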

How P-Hacking Happens in Investing

P-hacking isn't just an academic problem; it's rampant in the financial industry, where a “proven” new strategy can be worth millions in management fees. Here are a few common ways it manifests:

Cherry-Picking Data

This is the simplest form of p-hacking. A fund manager or researcher selects a time period or a dataset that makes their strategy look as good as possible. For instance, they might showcase a strategy's amazing performance from March 2009 to December 2019. While technically accurate, it conveniently begins right after the market bottom of the 2008 financial crisis and runs through one of the longest bull markets in history. By excluding the crash, they present a misleadingly rosy picture of the strategy's risk and return.
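How much the start date matters can be seen with a toy calculation. The index levels below are invented for illustration (they are not real market data), and `cagr` is a hypothetical helper that applies the standard compound annual growth rate formula, (end / start)^(1 / years) - 1.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two index levels."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Hypothetical year-end index levels: a crash in year 1, then a long recovery.
levels = [100, 60, 75, 85, 95, 110, 125]

# Honest window: starts before the crash (year 0 to year 6).
full_period = cagr(levels[0], levels[-1], years=6)

# Cherry-picked window: starts at the trough (year 1 to year 6).
from_trough = cagr(levels[1], levels[-1], years=5)

if __name__ == "__main__":
    print(f"annualized return including the crash: {full_period:.1%}")
    print(f"annualized return starting at the trough: {from_trough:.1%}")
```

Both figures describe the same underlying series; only the choice of starting point differs, and the trough-start number comes out several times higher.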

Torturing the Data

This classic technique is summed up by the phrase, “If you torture the data long enough, it will confess to anything.” Imagine a researcher trying to prove that stocks with a certain “value” characteristic outperform.

First, they test the P/E ratio against the market. It doesn't work.
Then they try the P/B ratio. Still no luck.
They keep going, testing EV/EBITDA, dividend yield, and free cash flow yield.
Finally, they find that companies with a P/B ratio below 1.2 and a dividend yield above 3.5%, when rebalanced every seven months, showed significant outperformance between 1992 and 2004.

They present this hyper-specific rule as a brilliant discovery, when it was actually the product of a brute-force search for a random correlation.
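The brute-force search described above can be mimicked with a short sketch. Everything here is an assumption made for illustration: each hypothetical "rule" is just a stream of random monthly excess returns with a true mean of zero, and `data_mine` keeps whichever rule has the best in-sample t-statistic, then checks the same rule on fresh data it was not fitted to. The `bonferroni_alpha` helper shows one common (and conservative) way to tighten the significance threshold when many tests are run; it is not something the source prescribes.

```python
import math
import random
from statistics import mean, stdev


def t_stat(returns):
    """t-statistic for H0: the mean return is zero."""
    return mean(returns) / (stdev(returns) / math.sqrt(len(returns)))


def data_mine(n_rules=200, n_months=120, seed=1):
    """Try many worthless rules; return (rule_id, in-sample t, out-of-sample t)
    for the rule that looked best in-sample."""
    rng = random.Random(seed)
    best = None
    for rule_id in range(n_rules):
        # Pure noise: no rule has any real edge.
        in_sample = [rng.gauss(0.0, 0.04) for _ in range(n_months)]
        out_of_sample = [rng.gauss(0.0, 0.04) for _ in range(n_months)]
        score = t_stat(in_sample)
        if best is None or score > best[1]:
            best = (rule_id, score, t_stat(out_of_sample))
    return best


def bonferroni_alpha(alpha, n_tests):
    """Per-test threshold that keeps the overall false-positive rate near alpha."""
    return alpha / n_tests


if __name__ == "__main__":
    rule_id, in_t, out_t = data_mine()
    print(f"best rule #{rule_id}: in-sample t = {in_t:.2f}, out-of-sample t = {out_t:.2f}")
    print("Bonferroni threshold for 200 tests:", bonferroni_alpha(0.05, 200))
```

The winning rule typically shows an in-sample t-statistic that would pass a conventional significance test, while its out-of-sample t-statistic is just another draw from noise. That gap is the signature of a data-mined result.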

Why Should a Value Investor Care?

The entire philosophy of value investing is about digging for fundamental truth and avoiding market fads and fictions. P-hacking is the enemy of this approach because it creates statistical fictions that can lure investors into flawed strategies. It encourages trading on flimsy patterns rather than investing in durable businesses. When you read a report promoting a new, backtested “factor” that supposedly beats the market, your p-hacking alarm should go off. Is this a genuine economic insight, or just the result of a powerful computer mining for flukes? A true value investor knows that a genuine margin of safety comes not just from buying an asset for less than its intrinsic value, but also from building a margin of safety into their own thinking. This means being deeply skeptical of claims that seem too good to be true and relying on timeless business principles over complex, data-mined formulas.

Spotting the Red Flags

You don't need a Ph.D. in statistics to protect yourself. Just keep an eye out for these common warning signs of p-hacking:

Results are too good to be true. Real-world investment returns are lumpy and unpredictable. A backtest showing perfectly smooth, high returns is a major red flag.
The logic is a stretch. Why should stocks of companies whose names have exactly ten letters outperform? If there is no sound economic or business reason for a pattern to exist, it's probably just noise.
Overly complex rules. A strategy that relies on a bizarre combination of seven different, unrelated metrics is more likely to be a data-mined fluke than a robust investment approach. Simplicity is often a sign of strength.
Lack of transparency. The author doesn't clearly state all the variables they tested and all the time periods they considered. Honest research is open about its entire process, not just the winning parts.
It hasn't been replicated. A single study is just a single study. A truly robust market anomaly should be discoverable by independent researchers using different datasets and time periods.
