Want to get a grip on New York City’s dynamic market for apartments? It turns out that crime statistics and, perhaps more surprisingly, restaurant health inspections can be a helpful guide.
In a recently published paper, New York University computer-science professor Anasse Bari and graduate students Rafael Moraes and Jiachen Zhu found that traditional price-forecasting models can be enhanced by adding data sets available through New York City’s “Open Data” project, which makes a range of data published by city agencies and their partners available to the public.
In particular, crime statistics and restaurant health inspections offered the most insightful guide. And in the case of the restaurant data, that was a bit of a surprise, Bari told MarketWatch in an interview.
It’s clear that nice restaurants and nice neighborhoods often go together.
“The intuition came from the fact that in New York, in many places where we have good restaurants it looks like somehow that affects the rents or the housing prices,” Bari said.
But the researchers initially thought the best potential insights might come from social-media reviews. One problem, however, was the well-documented issue of fake reviews. Business owners, after all, have an incentive to post positive reviews online that can distort the ratings.
In contrast, health inspection results can’t be faked. And every restaurant in New York is subject to at least one unannounced inspection a year. So in the end, high health ratings offer a cleaner perspective and provide a more useful signal than reviews, Bari said.
That said, making the model work wasn’t as easy as just going to Open Data and plugging in the figures. Substantial data cleaning and mapping efforts were also required, Bari explained.
To create the final data set, the researchers kept only apartments that appeared at least twice in the sales data and stored their prices and sale dates, which allowed them to calculate the average monthly price growth for apartments in each zip code.
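As a rough illustration of that step, the Python sketch below keeps only apartments with at least two recorded sales and averages a monthly growth rate per zip code. It is not the researchers’ code: the function name, the record layout, the sample figures and the choice of a compounded growth rate between consecutive sales are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def monthly_growth_by_zip(sales):
    """Average monthly price growth per zip code from repeat sales.

    `sales` is an iterable of (apartment_id, zip_code, sale_date, price).
    This record layout is hypothetical, chosen only for illustration.
    """
    by_apartment = defaultdict(list)
    for apt_id, zip_code, sale_date, price in sales:
        by_apartment[(apt_id, zip_code)].append((sale_date, price))

    growth_by_zip = defaultdict(list)
    for (_, zip_code), records in by_apartment.items():
        if len(records) < 2:
            continue  # keep only apartments sold at least twice
        records.sort()  # chronological order
        for (d0, p0), (d1, p1) in zip(records, records[1:]):
            months = (d1.year - d0.year) * 12 + (d1.month - d0.month)
            if months <= 0 or p0 <= 0:
                continue  # skip same-month resales and bad prices
            # Assumed definition: compounded monthly growth between two sales.
            growth_by_zip[zip_code].append((p1 / p0) ** (1 / months) - 1)

    return {z: mean(rates) for z, rates in growth_by_zip.items()}


# Made-up sample records, not real sales data.
sample_sales = [
    ("A", "10001", date(2015, 1, 15), 1_000_000.00),
    ("A", "10001", date(2016, 1, 20), 1_126_825.03),
    ("B", "10001", date(2016, 6, 1), 750_000.00),  # sold once, dropped
    ("C", "10002", date(2016, 3, 1), 500_000.00),
    ("C", "10002", date(2016, 5, 1), 500_000.00),
]
growth = monthly_growth_by_zip(sample_sales)
```

In this made-up sample, apartment B is dropped because it appears only once, so each zip code’s figure rests on a single repeat sale.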
In a backtest, the researchers gathered data through 2017, then used predictive analytics to fuse the alternative data sources with historical prices to make predictions for 2018. They compared their findings to traditional forecast models based solely on historical price data. The results indicated that both the inspections data and the crime statistics led to better predictions and a lower error rate compared with relying solely on historical prices, Bari explained.
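The shape of such a backtest can be sketched in a few lines of Python. Everything here is an assumption for illustration: the data is synthetic, the model is a plain least-squares regression, the error measure is mean absolute error, and the field names `prev_growth` and `alt_signal` are hypothetical. The article does not say which model or error metric the paper actually used.

```python
import random
from statistics import mean


def fit_least_squares(X, y):
    """Solve the normal equations (X^T X) w = X^T y by Gaussian elimination."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * t for r, t in zip(X, y)) for i in range(k)]
    for col in range(k):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, k):
            factor = A[row][col] / A[col][col]
            for j in range(col, k):
                A[row][j] -= factor * A[col][j]
            b[row] -= factor * b[col]
    w = [0.0] * k
    for i in reversed(range(k)):  # back substitution
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, k))) / A[i][i]
    return w


def backtest(rows, use_alt, last_train_year=2017, test_year=2018):
    """Fit on data through `last_train_year`; return mean absolute error on `test_year`."""
    def features(r):
        f = [1.0, r["prev_growth"]]
        if use_alt:
            f.append(r["alt_signal"])  # e.g. an inspection or crime score
        return f

    train = [r for r in rows if r["year"] <= last_train_year]
    test = [r for r in rows if r["year"] == test_year]
    w = fit_least_squares([features(r) for r in train],
                          [r["growth"] for r in train])
    return mean(abs(sum(wi * xi for wi, xi in zip(w, features(r))) - r["growth"])
                for r in test)


# Synthetic data in which the alternative signal carries real information.
rng = random.Random(0)
rows = []
for year in range(2014, 2019):
    for _ in range(40):
        prev, alt = rng.uniform(-0.01, 0.02), rng.uniform(0.0, 1.0)
        growth = 0.001 + 0.5 * prev + 0.01 * alt + rng.gauss(0, 0.0005)
        rows.append({"year": year, "prev_growth": prev,
                     "alt_signal": alt, "growth": growth})

baseline_mae = backtest(rows, use_alt=False)
augmented_mae = backtest(rows, use_alt=True)
```

Because the synthetic growth figures were built to depend on the alternative signal, the augmented model comes out ahead here by construction. With real data, that comparison is exactly what the backtest is meant to test rather than assume.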
“The idea is that the two other data sets are much noisier but they still contain a faint signal that can improve our predictions. This is arguably plausible given the complexity of big cities, where many measurable factors are interconnected and can reinforce each other with certain time delays,” the researchers wrote in the paper.
The paper also serves to underline why investors should be dissuaded from thinking that they can simply plug into a single data set and expect it to deliver market-beating insights.
Bari and other alternative-data experts caution against the hype that has accompanied the explosion of new data sets available to investors seeking an edge on the markets. An increasingly digitized economy, falling data-storage prices, vastly growing computing power and other technological advances have made available a range of nontraditional data sets, from satellite images to credit-card data to web-scraping services, and spawned a rapidly growing industry.
For investors, individual alternative data sets by themselves should generally be expected to deliver results similar to traditional data sets, these experts say. The potential advantage is that a more diverse range of data sets combined with predictive analytics (a term that covers the use of historical data, statistical techniques and machine learning to make forecasts) can provide more accurate predictions than traditional data alone.
“This research project is a proof-of-concept of using two new data sets to make predictions of real-estate markets,” Bari said. An investment firm could apply the same data-science approach described in the paper to other data sets in an effort to improve its real-estate predictions, he said.
“It requires several data experiments and patience,” he said. “It is like we are running a long marathon.”