Investments | Dec 7 2022

Can voice analysis help predict future earnings?

A group of researchers at Ruhr University Bochum, Germany, have just completed an impressively original and methodologically sound research project on how the speech patterns of managers can help to predict future earnings.  

So far earnings prediction models have relied only on numerical financial data, but their research paper – "Listen Closely: Using Vocal Cues to Predict Future Earnings" – suggests that vocal cue models can do better and outperform the standard approaches. 

This work caught my attention immediately as I have been teaching presentation skills for years, of which voice is a fundamental element. 

Predicting earnings has long been a challenge for investment practitioners because the process entails complex interactions between many operational activities.  

Yet, accurate data on this issue is the starting point and single most important input to firm and equity models in this context. In fact, the entire profession of financial analysts revolves around predicting future firm performance. 

Despite all this, the cues literally spoken by managers constitute a so-far overlooked element of this process. Research in the fields of psychology, neuroscience, accounting and finance collectively suggests that vocal cues convey useful information. The structure of voice can accurately reflect cognitive states and thus also emotional states.

However, no study so far has considered predicting future earnings in this manner. Interestingly too, there is anecdotal evidence that although analysts themselves acknowledge the value of vocal cues, they have not been able to incorporate this information systematically into their forecasts.  

Understanding vocal cues

Yet, the case for achieving this is clear. Because speech production is so complex, managers cannot fully prevent what they really think and believe from subtly permeating how they say things. This means that the tone and structure of voice can reveal issues that are not actually stated.

By contrast, people can of course control what they say through the words they choose. 

It is possible to interpret the message conveyed through what is referred to as the granular and sequential nature of vocal waves. Granular means that even small speech units are potentially valuable. For example, anger is associated with a larger jaw opening than sadness, which will in turn affect the voice structure.

Sequential refers to the voice 'melody'. The statement 'you can predict earnings changes' is essentially factual, if stated neutrally. But raising one’s voice at the end converts it into a question, and a clear emphasis on 'you' shifts the focus of the message to the listener.

Slight shifts in an individual’s emotional state can therefore affect the speech-production process, affecting several dimensions of the sound structure, including frequency, amplitude and overall range. At least 24 different emotions can evidently be detected from brief vocal statements and comments.

In this sense, vocal cues can inadvertently leak a speaker’s emotional state, so that, when the content is related to financial results, analysing vocal cues may yield otherwise hidden and implicit insights into real perceptions and valuations of both past and future firm performance. They can even evidently help prevent financial fraud and improve financial risk prediction. 

In other words, what may come across to the average listener as a bland and routine presentation or individual comment can be exposed by the software as a warning of bad times for the company or, conversely, as a sign that a bonanza is on its way.

Furthermore, there is anecdotal evidence that analysts acknowledge the value of vocal cues and may deliberately build non-verbal and vocal cues into their forecasts.

One analyst revealed that their hedge fund even invited FBI agents to teach them how to 'read' what managers are really saying, above and beyond the specific words used. Another analyst stated that they specifically look for these non-verbal cues to assess how realistic managers’ statements are.

On the other hand, deciphering vocal cues remains challenging. Retrieving useful information from vocal cues thus requires a sophisticated model that accounts, amongst other factors, for these granular and sequential vocal sound structures. 

Rich in data

Accordingly, the Bochum research team has developed a suitably high-tech model for analysing recordings of earnings conference calls – an increasingly common form of firm disclosure associated with significant information content.

These calls are teleconferences or webcasts at which managers of public firms discuss, with analysts and investors, the firm’s financial results during a given quarter or fiscal year. The calls consist of a presentation part and a Q&A session.

Because earnings conference calls are usually recorded and readily accessible, they provide a resource that is rich in analysable voice cues.

The paper says: “Although earnings are one of firms’ most thoroughly studied financial numbers, predicting future earnings remains a challenge for both researchers and investment practitioners. Collectively, our results imply that managers’ vocal cues are important information signals of future earnings that investment practitioners currently fail to consider.”

Furthermore, such audio data has become increasingly available in recent years, providing an ever-richer basis for examining information in the context of out-of-sample predictions, that is, predictions made by a model from new or 'unseen' data – the latter meaning not used for training the model.  
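To make the idea of an out-of-sample prediction concrete, here is a minimal Python sketch. It is not the researchers' code: the records, the field names `signal` and `earnings_change`, and both helper functions are invented for illustration. A simple line is fitted on the earlier years only and then scored on a later year that the fit never saw.

```python
def split_by_year(records, first_test_year):
    """Chronological split: earlier years train the model, later years test it."""
    train = [r for r in records if r["year"] < first_test_year]
    test = [r for r in records if r["year"] >= first_test_year]
    return train, test


def fit_line(train):
    """Ordinary least squares for earnings_change = intercept + slope * signal."""
    xs = [r["signal"] for r in train]
    ys = [r["earnings_change"] for r in train]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Invented toy records, NOT data from the study.
records = [
    {"year": 2018, "signal": 1.0, "earnings_change": 2.0},
    {"year": 2019, "signal": 2.0, "earnings_change": 4.0},
    {"year": 2020, "signal": 3.0, "earnings_change": 6.0},
    {"year": 2021, "signal": 4.0, "earnings_change": 8.5},
]

train, test = split_by_year(records, first_test_year=2021)
intercept, slope = fit_line(train)

# Out-of-sample error: measured only on records the fit never used.
errors = [abs(intercept + slope * r["signal"] - r["earnings_change"]) for r in test]
out_of_sample_mae = sum(errors) / len(errors)
```

Splitting by year rather than at random keeps information from later periods out of the training set, which is the point of testing on 'unseen' data.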

With sophisticated technology, it is possible to interpret the message way beyond the text itself, from the most subtle of voice cues. Furthermore, voice analysis can provide an extremely accurate interpretation. 

Doron Reichmann, a post-doctoral researcher on the project, says: “In seeking value-relevant information, research has now progressed from analysing mere accounting numbers to interpreting nuanced physiological responses from managers.”

The Bochum work entails specially constructed model architectures that convert managers’ sound waves into spectrograms, that is, visualisations of sound. Using deep learning models that specialise in detecting patterns in the grid-like structures of these visualisations, they train the models to predict firm earnings for the next year.
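For readers curious about what turning a sound wave into a spectrogram involves, the following is a rough Python sketch under stated assumptions, not the Bochum team's pipeline. The function name `spectrogram`, the frame and hop sizes, and the synthetic 1,000 Hz tone standing in for a voice recording are all choices made here for illustration. The audio is cut into short overlapping frames, and each frame is converted into the strength of its frequency components, giving the grid-like structure that image-style deep learning models can read.

```python
import cmath
import math

SAMPLE_RATE = 8000  # samples per second (assumed for this toy example)
FRAME_SIZE = 64     # samples per frame
HOP = 32            # step between frame starts, so frames overlap by half


def spectrogram(samples, frame_size=FRAME_SIZE, hop=HOP):
    """Return a list of frames, each a list of magnitudes per frequency bin."""
    # A Hann window tapers each frame to reduce spectral leakage at its edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_size - 1))
              for n in range(frame_size)]
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_size)]
        # Naive discrete Fourier transform, keeping non-negative frequencies only.
        magnitudes = []
        for k in range(frame_size // 2 + 1):
            total = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                        for n in range(frame_size))
            magnitudes.append(abs(total))
        frames.append(magnitudes)
    return frames


# Synthetic stand-in for speech: a pure 1,000 Hz tone, 512 samples long.
tone = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(512)]
grid = spectrogram(tone)

# The strongest bin in the first frame should sit at the tone's frequency.
peak_bin = max(range(len(grid[0])), key=lambda k: grid[0][k])
peak_hz = peak_bin * SAMPLE_RATE / FRAME_SIZE
```

Real speech would fill many bins at once and change from frame to frame. The naive transform is used here only to keep the sketch free of dependencies; in practice a fast Fourier transform from a numerical library would take its place.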

In another model architecture, they use a pre-trained speech recognition system provided by Meta AI, which processes the raw waveform of audio signals at the millisecond level.

The researchers then fine-tune this system for earnings prediction. They have thus found a way to tap into a potentially vital information pool that has simply not been used before in this context. Their large sample of conference calls from US public firms over a period of five years yields significant predictive power.

A way to go

The authors warn, however, that their impressive results cannot readily and easily be used in practice. The modelling process has inherent limitations and is only in its infancy. 

The research literally constitutes a big data challenge, as audio data is inherently large and had to be run on high-end computers through a platform that provides a suite of cloud computing services. 

Nonetheless, their work provides initial evidence that trading strategies based on managers’ vocal cues can beat the market by almost 9 per cent on average. Specifically, the vocal-cue models developed for this research project substantially outperform the benchmark models that are currently state of the art.

Moreover, motivated by the desire of analysts to consider vocal cues in their forecasts, the researchers tested whether their model can improve professional analysts’ earnings forecasts.

Combining the forecasts from financial analysts with predictions from the vocal-cue models, which draw on cues that managers provide inadvertently, does indeed systematically outperform analysts’ forecasts alone. This suggests that these cues constitute significant untapped potential for financial analysts.
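How combining two forecasts can beat either one alone is easiest to see with a small Python sketch. The article does not say how the researchers combined the forecasts, so the equal-weight average below is an assumption, and the function names and all the numbers are invented for illustration.

```python
def mean_absolute_error(forecasts, actuals):
    """Average size of the forecast misses, ignoring their direction."""
    return sum(abs(f - a) for f, a in zip(forecasts, actuals)) / len(actuals)


def combine(analyst, model, model_weight=0.5):
    """Weighted average of two forecast series (equal weights by default)."""
    return [(1 - model_weight) * a + model_weight * m
            for a, m in zip(analyst, model)]


# Invented toy figures, NOT results from the study.
actual = [1.0, 2.0, 3.0]    # realised earnings
analyst = [1.4, 2.2, 2.6]   # hypothetical analyst forecasts
model = [0.8, 2.4, 3.2]     # hypothetical vocal-cue model forecasts

combined = combine(analyst, model)

analyst_mae = mean_absolute_error(analyst, actual)
model_mae = mean_absolute_error(model, actual)
combined_mae = mean_absolute_error(combined, actual)
```

The toy figures were chosen so that the two sources often miss in opposite directions, which is why averaging them shrinks the error. If both made the same mistakes, combining them would add nothing.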

Over time, this may well become a major source of information on what to expect from corporate earnings in the future. This is especially so because voice analysis seems to work particularly well in turbulent market phases such as the Covid-19 crisis, during which the value of numbers alone may be constrained.

Indeed, the Bochum project covers the pandemic years of 2019-22, and has proven its worth against this background.  

Returning to my comment on presentation skills, body language is now an established indicator of integrity, intentions, emotions and a whole lot more. Voice may potentially be just as powerful.  

We could be onto something big. After all, this is just the first of many voice projects that the Bochum team are planning.   

Brian Bloch is a freelance journalist based in Germany