The Alternative Data Primer

The Diff

  • Alternative data is any information relevant to an investment decision that can’t be obtained through a trading terminal. Examples include anything outside the firm’s shareholder reports: web traffic data, Google Trends, credit card panels, mobile app usage stats, satellite imagery, scraped web data, etc. [MK: It differs from the information in SEC filings, as the latter is analysed on the merits of the action.]

  • Access to a non-obvious or even non-public source of data can be used to profit from stock mispricing. It takes a while to understand the full impact (if any) of a newly discovered data source on the stock price, and doing so requires backtesting over at least several past quarters. New data can be useless if it mainly correlates with factors already well known for the firm, like seasonality.

  • There are well-documented and previously exploited ways of analysing order numbers (such as placing an order in the last minute of the reporting period to see what order number it gets), but most firms have already closed these loopholes.

  • A new factor is only as good as the model it’s put into; an analyst therefore needs a reasonably accurate financial model of the target firm that explains its cost and revenue drivers.

  • New data rarely stays secret for long; within a short time it becomes available to others, who start trading on it. This is true both for secondary information (credit card panels) and for primary information (a company’s own reports that it sells or distributes). The new data smooths intra-quarter volatility for, say, retailers, while shares see somewhat bigger volatility around the earnings period.

  • When everyone is trading on the same set of data, the value of conflicting information rises, as it becomes a signal in itself that something is going on at the firm.

  • Conflicting signals can be misleading if they’re accepted blindly, without understanding whether they’re systemic or one-off. Cross-checking them against other datasets (e.g., December offline sales vs. the number of blizzards in the region) gives a better understanding.

  • Trading on information obtained through unauthorized access, though, is likely to be treated as insider trading, with all the criminal and financial consequences that entails.
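
The earlier point about new data being useless if it mainly tracks seasonality can be illustrated with a small check. This is only a sketch with made-up quarterly numbers: the series `revenue` and `seasonal_only`, and the helper names `pearson` and `yoy_growth`, are assumptions for illustration, not anything from the source. It compares the raw correlation between a candidate dataset and revenue with the correlation of their year-over-year growth rates, which strips out the same-quarter seasonal pattern.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def yoy_growth(series, period=4):
    """Year-over-year growth of a quarterly series (compares same quarters)."""
    return [series[i] / series[i - period] - 1 for i in range(period, len(series))]


# Made-up quarterly figures, three years each, for illustration only.
revenue = [100, 110, 105, 150, 104, 116, 108, 162, 110, 119, 115, 170]
seasonal_only = [1.0, 2.0, 1.5, 4.0] * 3  # repeats the same pattern every year

raw = pearson(seasonal_only, revenue)
deseasonalized = pearson(yoy_growth(seasonal_only), yoy_growth(revenue))

print(f"raw correlation:            {raw:.2f}")
print(f"year-over-year correlation: {deseasonalized:.2f}")
```

A raw correlation that collapses once both series are put on a year-over-year basis suggests the dataset mostly restates the seasonal pattern the market already knows. A real backtest would use far more history and a proper model of the firm.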

The Datasets

  • Most datasets that get traded on started out as something else, usually a marketing product. They’re mostly cheap (as they’re produced in large numbers), and competitors love them.

  • Data can be sold raw to data firms (which enhance and aggregate it, then sell it on to hedge funds) or directly to hedge funds that are willing to do the heavy lifting themselves.

  • Data processing and matching, however, is very expensive, and it’s a fixed cost, so it makes sense to share it with others. Selling data isn’t cheap either: getting someone to adopt it takes a long and painful sales cycle. Things get easier once there’s a first customer, as information about data sources tends to spread through cross-pollination in the financial industry.

  • The two-sided information asymmetry (knowing what matters to the firm and where to get proper data on what matters) is a hard problem to crack and, when solved properly, creates a temporary competitive advantage for an investor.

  • How quickly raw data obtained directly from a vendor can be processed plays a big role: it yields not just the trading signal, but also knowledge of how others will trade once they receive the data vendor’s report.

  • Conflicts are inevitable between quants (who look at the data) and discretionary portfolio managers (who trade on the data without saying which particular piece of it), so putting a data scientist in one room with an analyst is a good way to ensure that information eventually gets exchanged peacefully.

  • Using data increases fixed costs (the costs of obtaining the data), variable costs (staffing, technology) and, growing even faster, compliance costs. Additional headcount adds management and supervision costs, so investors are no longer as lean as they used to be.
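
The cost-sharing argument above (processing and matching is a fixed cost, so it pays to spread it across buyers) comes down to simple arithmetic. The sketch below assumes an even split and uses a hypothetical cost figure; `cost_per_buyer` and `fixed_cost` are illustrative names, not from the source.

```python
def cost_per_buyer(fixed_processing_cost, n_buyers):
    """Each buyer's share of a fixed cost split evenly across n_buyers."""
    if n_buyers < 1:
        raise ValueError("need at least one buyer")
    return fixed_processing_cost / n_buyers


# Hypothetical figure, for illustration only.
fixed_cost = 1_200_000  # assumed annual cost of cleaning and matching one dataset

for n in (1, 5, 20):
    print(f"{n:>2} buyers -> {cost_per_buyer(fixed_cost, n):,.0f} each")
```

The processing bill stays the same however many funds buy the cleaned dataset, which is why a data firm that aggregates once and sells many times can undercut a fund doing the same work alone.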

source ($)