Correlations
Discover which stocks systematically lead or follow other stocks by one or more days.
The Correlations page shows lead-lag pairs: relationships where the daily return of one stock (the leader) is statistically correlated with the daily return of another stock (the follower) a fixed number of days later.
For example, if BHP leads RIO by 1 day with r = 0.42, it means that on days when BHP moves strongly up or down, RIO tends to move in the same direction the following day, and this effect is strong enough to be statistically significant after correcting for multiple comparisons.
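
As a rough illustration of what a lead-lag correlation measures, the sketch below pairs the leader's return on day t with the follower's return on day t + lag and computes a Pearson r. The function names and the toy data are assumptions made for this example; they are not the page's actual implementation.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def lead_lag_r(leader, follower, lag):
    """Correlate leader[t] with follower[t + lag], for lag >= 1."""
    return pearson(leader[:-lag], follower[lag:])

# Toy data: the follower repeats the leader's return exactly one day later.
leader = [0.010, -0.020, 0.015, 0.003, -0.007, 0.012, -0.004]
follower = [0.000] + leader[:-1]

r_lag1 = lead_lag_r(leader, follower, lag=1)  # perfect one-day lead in this toy case
```

Real pairs are far noisier than this toy case, which is why the page reports r values such as 0.42 rather than values near 1.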
These relationships can arise from genuine economic linkages (same commodity, same supply chain), from index effects (membership and rebalancing flows), or from differential liquidity (a larger, more liquid stock incorporates news faster than a smaller one in the same sector).
⚠ A statistically significant correlation is not a trading signal by itself. Before acting on a pair, always check that the backtest r has the same sign as the training r and is meaningfully different from zero.
The analysis runs on the ~400–600 most liquid ASX stocks — those with a median daily traded value of at least $500k and data on at least 90% of trading days in the training period. Penny stocks and thinly traded names are excluded because their returns are dominated by bid-ask noise rather than genuine price discovery.
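
The two eligibility rules above can be sketched as a simple check. The constant names, the function name and the sample numbers are hypothetical; only the $500k and 90% thresholds come from the description.

```python
from statistics import median

MIN_MEDIAN_VALUE = 500_000  # dollars traded per day
MIN_COVERAGE = 0.90         # share of training days with data

def is_eligible(daily_values, n_training_days):
    """daily_values: traded value on each day the stock has data."""
    if not daily_values:
        return False
    coverage = len(daily_values) / n_training_days
    return median(daily_values) >= MIN_MEDIAN_VALUE and coverage >= MIN_COVERAGE

# Made-up examples over a 10-day training window.
liquid = [800_000, 650_000, 900_000, 700_000, 750_000,
          820_000, 610_000, 690_000, 720_000, 880_000]
thin = [40_000, 1_200_000, 30_000, 25_000, 60_000,
        35_000, 45_000, 50_000, 20_000, 55_000]   # one spike, otherwise illiquid
patchy = [900_000] * 6                            # liquid, but data on only 6 of 10 days

results = [is_eligible(v, 10) for v in (liquid, thin, patchy)]
```

Using the median rather than the mean means a single unusually heavy trading day (as in `thin`) does not make an illiquid stock look eligible.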
Returns are market-adjusted by default: the XAO (All Ordinaries) daily log-return is subtracted from each stock's log-return before any correlations are computed. This removes the common market factor so that a pair only appears if it moves together beyond what you'd expect from both stocks simply tracking the index.
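
A minimal sketch of the adjustment, assuming two aligned lists of daily closing levels for a stock and for the index. The function names and prices are made up for illustration.

```python
import math

def log_returns(closes):
    """Daily log-returns from a list of closing prices."""
    return [math.log(b / a) for a, b in zip(closes, closes[1:])]

def market_adjusted(stock_closes, index_closes):
    """Stock log-return minus index log-return, day by day."""
    stock = log_returns(stock_closes)
    index = log_returns(index_closes)
    return [s - m for s, m in zip(stock, index)]

stock_closes = [100.0, 102.0, 101.0]
index_closes = [8000.0, 8160.0, 8160.0]

adj = market_adjusted(stock_closes, index_closes)
# Day 1: stock and index both rise 2%, so the adjusted return is zero.
# Day 2: the index is flat, so the stock's fall shows up in full.
```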
All correlation discovery uses data strictly before 1 March 2025 (roughly 22 years of history). The period from 1 March 2025 onwards is held out as a backtest and never used during discovery. This means the pairs shown were found in historical data only; the r (backtest) column tells you whether each relationship persisted after the cutoff.
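
The split can be pictured as a strict date cutoff. This is a sketch only; the row layout (a list of date and value tuples) and the names are assumptions for the example.

```python
from datetime import date

CUTOFF = date(2025, 3, 1)  # first day of the held-out period

def split_at_cutoff(rows, cutoff=CUTOFF):
    """rows: list of (date, value). Training is strictly before the cutoff."""
    train = [row for row in rows if row[0] < cutoff]
    backtest = [row for row in rows if row[0] >= cutoff]
    return train, backtest

# Made-up daily returns around the cutoff.
rows = [
    (date(2025, 2, 27), 0.004),
    (date(2025, 2, 28), -0.011),
    (date(2025, 3, 3), 0.007),
    (date(2025, 3, 4), 0.002),
]
train, backtest = split_at_cutoff(rows)
```

Because no backtest row can leak into `train`, a pair's backtest r is a genuine out-of-sample check on what discovery found.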
With ~500 symbols and 20 lags, roughly 5 million (leader, follower, lag) combinations are tested (500 × 499 ordered pairs × 20 lags). At a nominal p < 0.05 threshold, around 250,000 of these would appear significant by chance alone, even if no real relationship existed. To control for this, the analysis applies Benjamini-Hochberg FDR correction across all 5 million tests simultaneously. Only pairs with a corrected p-value below 0.05 and |r| ≥ 0.15 are shown.
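
The arithmetic and the correction step can be sketched as follows. This is a textbook version of the Benjamini-Hochberg adjustment written for illustration, not the page's own code; the p-values and r values are toy numbers chosen to show the effect.

```python
def bh_adjusted(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Size of the testing problem described above.
n_symbols, n_lags = 500, 20
n_tests = n_symbols * (n_symbols - 1) * n_lags   # ordered pairs x lags
chance_hits = 0.05 * n_tests                     # expected at nominal p < 0.05

# Toy example: five tests with raw p-values and training correlations.
p = [0.001, 0.008, 0.039, 0.041, 0.60]
r = [0.31, 0.12, 0.22, 0.18, 0.02]

q = bh_adjusted(p)
passes_fdr = [qi < 0.05 for qi in q]
shown = [ok and abs(ri) >= 0.15 for ok, ri in zip(passes_fdr, r)]
```

In the toy data, the third and fourth tests have raw p-values below 0.05 but adjusted values of 0.05125, so they drop out. The second test passes the correction but fails the |r| ≥ 0.15 floor.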
The training period is split into three equal sub-periods (roughly 2003–2011, 2011–2018, 2018–2025). The same significance test is run independently in each sub-period. A pair is marked stable if it passes in all three — evidence that the relationship is persistent rather than driven by a single market episode. The n_stable column shows how many sub-periods (0–3) the pair was significant in.
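
A small sketch of the stability bookkeeping, assuming the per-sub-period significance results are already available as booleans. The helper names are hypothetical, and the significance test itself is not reproduced here.

```python
def split_three(seq):
    """Split a sequence into three contiguous, near-equal parts."""
    n = len(seq)
    a, b = n // 3, 2 * n // 3
    return seq[:a], seq[a:b], seq[b:]

def count_stable(sub_period_passes):
    """Number of sub-periods (0 to 3) in which the pair was significant."""
    return sum(1 for ok in sub_period_passes if ok)

def stable_label(count, total=3):
    """Check mark when significant everywhere, otherwise a fraction like 2/3."""
    return "\u2713" if count == total else f"{count}/{total}"

early, middle, late = split_three(list(range(9)))

always = stable_label(count_stable([True, True, True]))       # check mark
sometimes = stable_label(count_stable([True, False, True]))   # "2/3"
```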
| Control | What it does |
|---|---|
| Leader symbol | Filter to pairs where this stock is the leader — the one that moves first. Leave blank to see all leaders. |
| Follower symbol | Filter to pairs where this stock is the follower — the one that moves after. Useful for asking "what leads XYZ?" |
| Min strength \|r\| | Minimum absolute Pearson r in the training set. The default of 0.15 already requires a meaningful relationship; raise it to 0.25–0.30 to see only the strongest pairs. |
| Lag range | Restrict to lags within a specific day range. Set both to 1 to see only next-day effects; set min to 5 to look for patterns at a week or longer. |
| Stable only | Show only pairs significant in all three sub-periods of the training set. Strongly recommended when using results for trading decisions. |
| Market-adjusted only | Show only pairs computed after removing the XAO market factor. This is the default and is almost always what you want. |
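
To make the combined effect of these controls concrete, here is a sketch that applies them to a list of rows. The `Pair` fields, the `apply_filters` signature and the sample rows (including the placeholder tickers AAA and BBB) are all made up for illustration.

```python
from dataclasses import dataclass

@dataclass
class Pair:
    leader: str
    follower: str
    lag: int                     # trading days
    r_train: float
    n_stable: int                # 0..3 sub-periods
    market_adjusted: bool = True

def apply_filters(pairs, leader="", follower="", min_abs_r=0.15,
                  lag_min=1, lag_max=20, stable_only=False,
                  market_adjusted_only=True):
    """Return the pairs that pass every active control."""
    kept = []
    for p in pairs:
        if leader and p.leader != leader:
            continue
        if follower and p.follower != follower:
            continue
        if abs(p.r_train) < min_abs_r:
            continue
        if not lag_min <= p.lag <= lag_max:
            continue
        if stable_only and p.n_stable < 3:
            continue
        if market_adjusted_only and not p.market_adjusted:
            continue
        kept.append(p)
    return kept

# Made-up rows for illustration only.
pairs = [
    Pair("BHP", "RIO", 1, 0.42, 3),
    Pair("AAA", "RIO", 5, -0.18, 2),
    Pair("BHP", "BBB", 1, 0.16, 3),
]

what_leads_rio = apply_filters(pairs, follower="RIO")
strong_stable_lag1 = apply_filters(pairs, min_abs_r=0.25, lag_min=1,
                                   lag_max=1, stable_only=True)
```

Note that the strength filter uses the absolute value of r, so a negative pair such as the second row still passes the default 0.15 floor.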
Click any column header to sort. Click a row to open the detail drawer on the right.
| Column | Description |
|---|---|
| Leader | The stock whose return at time t predicts the follower's return. |
| Follower | The stock whose return at time t+lag is predicted. |
| Lag | Number of trading days by which the follower's return trails the leader's. |
| r (train) | Pearson correlation in the training set (before 1 March 2025). Positive = same direction; negative = opposite direction. |
| r (backtest) | The same correlation computed on the held-out period (1 March 2025 onwards). Compare to r (train) to judge whether the relationship has persisted. |
| Stable | ✓ if the pair was significant in all 3 sub-periods; otherwise shows how many (e.g. 2/3). |
| Direction | + for positive correlation (follower moves same way); − for negative (follower moves opposite). |
Clicking a row opens a side panel with:
- A bar chart of r at each lag from 1 to 20 days — useful for seeing whether the correlation is concentrated at a single lag or spread across several days.
- Key statistics: training r, backtest r, FDR-corrected p-value, stability count.
- Links to the stock chart page for both leader and follower.
💡 In the cross-correlation (CCF) bar chart, the dotted horizontal lines mark r = ±0.15. Bars that barely clear this threshold are marginal; focus on bars that extend well beyond it.
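
The values behind such a chart can be sketched as one Pearson r per lag. The code below is an illustration on synthetic data in which the follower copies the leader two days later; the names and the data are assumptions, not the page's implementation.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def ccf_profile(leader, follower, max_lag=20):
    """Return {lag: r} for lags 1..max_lag, pairing leader[t] with follower[t + lag]."""
    return {lag: pearson(leader[:-lag], follower[lag:])
            for lag in range(1, max_lag + 1)}

THRESHOLD = 0.15  # the dotted lines on the chart

rng = random.Random(0)  # fixed seed so the synthetic data is reproducible
leader = [rng.gauss(0.0, 0.01) for _ in range(500)]
follower = [0.0, 0.0] + leader[:-2]  # synthetic follower: leader shifted by 2 days

profile = ccf_profile(leader, follower)
peak_lag = max(profile, key=lambda lag: abs(profile[lag]))
beyond_threshold = [lag for lag, r in profile.items() if abs(r) >= THRESHOLD]
```

Here the profile has a single dominant bar at lag 2. A real pair whose bars hover just past ±0.15 at several lags is a much weaker case.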
It is normal and expected for backtest r to be lower than training r. This is correlation decay, and it is common in empirical finance. A pair where backtest r is still at least 60% of training r (e.g. train 0.40, backtest 0.26, a ratio of 0.65) is holding up well. A pair where backtest r has collapsed to near zero has likely been arbitraged away or was a statistical artifact.
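
The rule of thumb above amounts to a simple ratio. The helper names below are hypothetical; the 0.60 cut-off is the one quoted in the text.

```python
def retention(r_train, r_backtest):
    """Share of the training correlation that survives out-of-sample."""
    return r_backtest / r_train

def is_holding_up(r_train, r_backtest, min_ratio=0.60):
    """True when backtest r keeps at least min_ratio of training r."""
    return retention(r_train, r_backtest) >= min_ratio

ratio = retention(0.40, 0.26)             # about 0.65
holding = is_holding_up(0.40, 0.26)       # keeps more than 60%
collapsed = is_holding_up(0.40, 0.02)     # only about 5% retained
flipped = is_holding_up(-0.30, 0.10)      # sign changed, so the ratio is negative
```

Dividing by the training r keeps the check meaningful for negative pairs: a backtest r of the opposite sign gives a negative ratio and fails.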
Pairs most worth a closer look tend to have the following:
- Stable: significant in all 3 sub-periods. This is the single most important filter.
- |r| ≥ 0.25 in training: a moderate-to-strong relationship, not just above the 0.15 floor.
- Backtest |r| ≥ 0.15 with the same sign as training r: the relationship has not vanished out-of-sample.
- Economically sensible: same sector, same commodity exposure, or a known index composition effect.
- Lag 1: shorter lags are more actionable. By lag 5–10 the window for acting on the signal has typically passed.
Patterns that commonly show up:
- Large-cap leads small-cap (same sector): e.g. BHP leading a junior miner. The larger stock reacts to commodity price changes faster due to higher liquidity and analyst coverage.
- Index members leading near-members: stocks just inside an index threshold tend to lead similar stocks just outside it, because index-tracking flows move the included stock first.
- Negative correlation at lag 1: can appear when two stocks compete for the same pool of day-trader capital; a strong up-day in one draws attention away from the other.
Reasons for caution:
- Pairs with no obvious economic link: with roughly 5 million tests, some spurious correlations will still pass. FDR correction limits the expected share of false discoveries among significant pairs to about 5%; it does not eliminate them.
- n_stable < 3: a pair significant in only 1 or 2 sub-periods may reflect a regime-specific effect (e.g. the GFC, COVID) rather than a persistent relationship.
- Lags > 5: at longer lags, many other factors intervene between the leader moving and the follower being expected to move. The signal-to-noise ratio falls sharply.
- Backtest period is still short: the held-out period began in March 2025. As more time passes the backtest r estimate becomes more reliable.
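
The numeric items in the lists above (stability, training strength, backtest strength, lag) can be combined into a quick screen. This is a sketch only: the function name and flag wording are made up, and requiring the backtest r to share the training r's sign is an assumption added so that negative pairs are handled sensibly. Economic plausibility still needs human judgement.

```python
def review_pair(r_train, r_backtest, n_stable, lag):
    """Return a list of caution flags; an empty list means every numeric check passed."""
    flags = []
    if n_stable < 3:
        flags.append("not significant in all 3 sub-periods")
    if abs(r_train) < 0.25:
        flags.append("training |r| below 0.25")
    same_sign = r_train * r_backtest > 0   # assumption: sign must persist
    if not same_sign or abs(r_backtest) < 0.15:
        flags.append("backtest r weak or sign flipped")
    if lag > 5:
        flags.append("lag longer than 5 days")
    return flags

clean = review_pair(r_train=0.42, r_backtest=0.38, n_stable=3, lag=1)
doubtful = review_pair(r_train=0.20, r_backtest=0.02, n_stable=2, lag=7)
```

A pair with no flags is a candidate for closer study, not a recommendation to trade.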
Suppose you see this row:
| Leader | Follower | Lag | r (train) | r (backtest) | Stable | Direction |
|---|---|---|---|---|---|---|
| BHP | RIO | 1d | 0.42 | 0.38 | ✓ | + |
Reading this: on trading days when BHP's market-adjusted return is notably positive, RIO's market-adjusted return tends to be positive the following day, with a Pearson r of 0.42 in the training set and 0.38 in the backtest period. The relationship was significant in all three sub-periods of the training set.
This would be consistent with a pattern in which BHP, being more liquid and widely held, reacts to iron ore and global commodity news slightly ahead of RIO, which catches up the next day as its own investors and market makers process the same information.
The backtest decay is modest (0.42 → 0.38, about 10%), suggesting the relationship remains intact. A trader might use this as one input into a short-term view on RIO when BHP has a strong day — but not as a standalone mechanical strategy.