3rd Party Academia Whitepapers

Please see below for a selection of technical whitepapers by professionals, academic professors, doctoral students and independent researchers, which focus on sentiment data mining and its use in modern-day trading.


Computing trading strategies based on financial sentiment data using evolutionary optimization

by Ronald Hochreiter

In this paper we apply evolutionary optimization techniques to compute optimal rule-based trading strategies based on financial sentiment data. The sentiment data was extracted from the social media service StockTwits to accommodate the level of bullishness or bearishness of the online trading community towards certain stocks. Numerical results for all stocks from the Dow Jones Industrial Average (DJIA) index are presented and a comparison to classical risk-return portfolio selection is provided.

Keywords: #Evolutionaryoptimization, #sentimentanalysis, #technicaltrading, #portfoliooptimization


A Test of VIX Prediction and a Simple Trading Algorithm based on PsychSignal’s Sentiment Metrics

by Dr. Sergey Yurgenson, Ph.D. of Kaggle, edited by Dr. Matthew S. Checkley, Ph.D.

We demonstrate that PsychSignal’s sentiment metrics contain leading information for the VIX index. Moreover, trading models based only on sentiments outperform both naïve and comparable models without sentiments as an input. These tests show sentiment metrics’ worth in relevant trading algorithms. We argue adding inputs relevant to market dynamics is likely to further enhance the predictive abilities of such algorithms.


Leveraging collective intelligence in organizations

by Dr. Daniel M. Romero, Ph.D. (University of Michigan), Dr. Ágnes Horvát, Ph.D. (Northwestern Institute on Complex Systems) and Dr. Brian Uzzi, Ph.D. (Kellogg School of Management, Northwestern University)

In this study, we examine the resource allocation decisions of a large hedge fund. Each decision concerns the movement of funds in and out of stock holdings. Because hedge funds make a large number of these decisions and the accuracy of each decision can be clearly measured, even a single firm provides remarkably dense data from which to draw valid statistical tests. Our data covers two years during which the organization made approximately 800 decisions per week valued at an average of $700,000 per decision. Financial organizations also have sophisticated communications that are now acquired in their totality and accurately through electronic records. To capture communications dynamics in the organization that may be indicative of collective intelligence mechanisms, we analyze the instant message communications of the organization, which include 22 million IMs exchanged among the 182 personnel of the hedge fund and their network of 8,646 outside contacts. From these IMs we identify two collective intelligence signals that we show to be predictive of the changes in the market’s movement as recorded by public Dow Jones Industrial Average (DJIA) data.

Keywords: #physics, #computerscience, #socialphenomena, #collectiveintelligence, #networkanalysis, #complexitytheory


Leveraging Social Media to Predict Continuation and Reversal in Asset Prices

by Dr. Patrick Houlihan, Ph.D. (Stevens Institute of Technology) and Dr. German G. Creamer, Ph.D. (Columbia University Engineering)

Using features extracted from StockTwits messages between July 2009 and September 2012, we show through simulations that: 1) both message volume and sentiment help explain the diffusion of price information; 2) both message volume and sentiment can be used as features to predict asset price directional moves. We also show that positive and negative sentiment diffuses into an asset's price over a period of days. Our findings suggest statistics derived from both message volume and message sentiment can improve asset price forecasts.

“Groups are only smart when there is a balance between the information that everyone in the group shares and the information that each of the members of the group holds privately. It's the combination of all those pieces of independent information, some of them right, some of them wrong, that keeps the group wise.”
-James Surowiecki, The Wisdom of Crowds

Keywords: #Social Media, #Crowdsourcing, #Sentiment



by Dr. Hailiang Chen, Ph.D. (City University of Hong Kong), Dr. Prabuddha De, Ph.D. (Purdue University), Dr. Yu (Jeffrey) Hu, Ph.D. (Georgia Tech), and Dr. Byoung-Hyoun Hwang, Ph.D. (Purdue University)

Social media has become a popular venue for individuals to share the results of their own analysis on financial securities. This paper investigates the extent to which investor opinions transmitted through social media predict future stock returns and earnings surprises. We conduct textual analysis of articles published on one of the most popular social-media platforms for investors in the United States. We also consider the readers’ perspective as inferred via commentaries written in response to these articles. We find that the views expressed in both articles and commentaries predict future stock returns and earnings surprises.


Investor Sentiment in the Stock Market

by Dr. Malcolm Baker, Ph.D. and Dr. Jeffrey Wurgler, Ph.D.

The history of the stock market is full of events striking enough to earn their own names: the Great Crash of 1929, the ’Tronics Boom of the early 1960s, the Go-Go Years of the late 1960s, the Nifty Fifty bubble of the early 1970s, the Black Monday crash of October 1987, and the Internet or Dot.com bubble of the 1990s. Each of these events refers to a dramatic level or change in stock prices that seems to defy explanation. The standard finance model, in which unemotional investors always force capital market prices to equal the rational present value of expected future cash flows, has considerable difficulty fitting these patterns. Researchers in behavioral finance have therefore been working to augment the standard model with an alternative model built on two basic assumptions.


Quantifying the effects of online bullishness on international financial markets

by Dr. Huina Mao, Ph.D., Dr. Scott Counts, Ph.D., and Dr. Johan Bollen, Ph.D.

Computational methods to gauge investor sentiment from commonly used online data sources that rely on machine learning classifiers and lexicons have shown considerable promise, but suffer from measurement and classification errors. In our work, we develop a simple, direct and unambiguous indicator of online investor sentiment, which is based on Twitter updates and Google search queries. We examine the predictive power of this new investor bullishness indicator for international stock markets. Our results indicate several striking regularities. First, changes in Twitter bullishness predict changes in Google bullishness, indicating that Twitter information precedes Google queries. Second, Twitter and Google bullishness are positively correlated to investor sentiment and lead established investor sentiment surveys. The former, in particular, is a more powerful predictor of changes in sentiment in the stock market than the latter. Third, we observe that high Twitter bullishness predicts increases in stock returns, with these then returning to their fundamental values. We believe that our results may support the investor sentiment hypothesis in behavioural finance.

Keywords: #computationalscience, #investorsentiment, #bigdata, #socialmedia, #internationalfinancialmarkets


Predicting Stock Price Swings with PsychSignal and BigML

by David Gerster

People like to Tweet about stocks, so much so that stock ticker symbols get their own special dollar sign like $AAPL (https://twitter.com/search?q=%24AAPL) or $FB (https://twitter.com/search?q=%24FB). What if you could mine this data for insight into public sentiment about these stocks? Even better, what if you could use this data to predict activity in the stock market? That’s the premise behind PsychSignal (http://psychsignal.com/), a provider of “real time financial sentiment”. They harvest large streams of data from Twitter and other sources, then compute real time sentiment scores (one “bullish” and one “bearish”) for stocks.

Just for fun, we created a dataset combining daily prices and trading volume for AAPL (thanks Quandl (http://www.quandl.com/search/aapl)!) with daily average bullish and bearish scores from PsychSignal. We then trained a simple model to predict the percentage “swing” in Apple’s stock price, defined as the magnitude of the difference between the daily high and daily low divided by the opening price. Looking at the SunBurst view, we see a lot of bright green, which means the model is picking up some interesting correlations.
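
The swing definition above can be written out as a small function. This is only a minimal sketch of the stated formula, assuming it is expressed as a percentage of the opening price; the function name `percentage_swing` and the sample prices are hypothetical and are not taken from the original dataset or model.

```python
def percentage_swing(high: float, low: float, open_price: float) -> float:
    """Daily swing: magnitude of (high - low) divided by the open, in percent.

    Assumption: the result is scaled to a percentage; the source text only
    describes the ratio itself.
    """
    if open_price <= 0:
        raise ValueError("open_price must be positive")
    return 100.0 * abs(high - low) / open_price


# Hypothetical prices, for illustration only (not actual AAPL data).
swing = percentage_swing(high=102.0, low=98.0, open_price=100.0)
print(f"{swing:.2f}%")  # 4.00%
```

Dividing by the opening price makes swings comparable across days on which the stock trades at different price levels.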

For example, if daily volume is more than 16.28 million shares, the bullish sentiment measure is more than 0.99, and the bearish measure is more than 0.78, then the model (https://bigml.com/shared/model/cNfkZSKF4iJu4228jTGmpaUKfGM) strongly predicts that the percentage price swing will be in the highest tercile (shown as “3rd third”).
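
The rule described in this example can be sketched as a simple predicate. The thresholds are the ones quoted in the text; the constant and function names are hypothetical, and this covers only the single branch described here, not the full BigML decision tree.

```python
# Thresholds quoted in the text for one branch of the shared model.
VOLUME_THRESHOLD = 16_280_000  # shares traded per day
BULLISH_THRESHOLD = 0.99       # daily average bullish score
BEARISH_THRESHOLD = 0.78       # daily average bearish score


def matches_high_swing_branch(volume: float, bullish: float, bearish: float) -> bool:
    """True when a day satisfies all three conditions of the described branch.

    A match corresponds to the model predicting a swing in the highest
    tercile ("3rd third"); a non-match says nothing about the prediction,
    because other branches of the tree are not represented here.
    """
    return (
        volume > VOLUME_THRESHOLD
        and bullish > BULLISH_THRESHOLD
        and bearish > BEARISH_THRESHOLD
    )


# Hypothetical inputs, for illustration only.
print(matches_high_swing_branch(volume=20_000_000, bullish=1.20, bearish=0.90))
print(matches_high_swing_branch(volume=10_000_000, bullish=1.20, bearish=0.90))
```

Note that the branch requires both the bullish and the bearish score to be high, which suggests that heavy discussion in either direction tends to accompany larger swings.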


Creation of a Systematic Trading Strategy | Tap into the Pulse of the Markets

by Nick Kirk (University of Washington Department of Applied Mathematics)

Combining sentiment data with technical indicators, and using a machine learning classification technique named Support Vector Machines (SVM), a systematic trading strategy is created. The classifier forecasts with reasonable accuracy the future direction of the daily closing prices for a subset of S&P500 stocks. Based on these predictions, indicators are created, signals are generated, and trading rules make path-dependent actionable decisions to generate orders.

All research and development was implemented in the R software environment for statistical computing and graphics. The following R packages were used: quantstrat, blotter, PerformanceAnalytics, TTR, kernlab, caret, xts, quantmod, doParallel and doMC. End-of-Day (EOD) U.S. stock prices are sourced from QuoteMedia through Quandl’s premium subscription.

The chosen source of sentiment is StockTwits message posts that have been aggregated and scored by PsychSignal.


Bulls, Bears…and Birds? Studying the Correlation between Twitter Sentiment and the S&P500

by Eric D. Brown

In this paper, a research project is described that uses data analysis techniques combined with Natural Language Processing (NLP) to collect and analyze Twitter messages, determine whether sentiment is conveyed and, if so, how well that sentiment compares to existing financial market sentiment measures, such as the American Association of Individual Investors (AAII) sentiment survey (AAII 2012).

At the outset of this project, there were three main goals: 1) to determine if data analytics can be used to automate sentiment analysis from Twitter messages; 2) to determine whether the outcome of the analytical engine is comparable to existing survey methods; and 3) to determine if there is any actionable knowledge contained within Twitter sentiment that can be used to make investing decisions.

Each of the above goals was accomplished.


Analysis of Twitter Messages for Sentiment and Insight for use in Stock Market Decision Making


For as long as there has been a market available for trading assets and financial instruments, there has been an interest in finding methods to gain an edge in that market. This search for an edge has led investors and researchers down many paths with many different approaches to analysis of the markets (Shostak, 1997).

Many theories have been put forth to explain the movements within the stock market with some theories focusing on the underlying business behind a stock's price, other theories focusing on historical price movements and others focusing on the human behavioral aspects of the market. Throughout most of the last century, market participants and academics have created analysis techniques and prediction methods that have been used to determine how and when money is invested into the stock market (Bessembinder & Chan, 1998; Lo, 2004). This type of research has developed over the years and can be loosely categorized into four main areas: 1) Efficient Markets; 2) Behavioral Finance; 3) Fundamental Analysis; and 4) Technical Analysis. While a detailed look into any of these areas is outside the scope of this study, each area is discussed briefly.

Another area of research that has become prominent over the last thirty years is behavioral finance (Thaler, 1999). This area of research starts with the clear distinction that markets are not made of rational actors as described by the EMH and, therefore, the assumption of rational actions is a false assumption (Shleifer, 2000). The study of behavioral finance has grown steadily and is now considered to be one of the most promising areas of research for understanding the markets and the market participants (Thaler, 1999).

The concept of sentiment has become a major element found in much of modern economics and market theory. In fact, the idea of sentiment has evolved from one of market sentiment to that of investor sentiment, whereby researchers and market participants attempt to measure the aggregate sentiment of individual investors as found in surveys released by the National Association of Active Investment Managers (NAAIM) and American Association of Individual Investors (AAII). The AAII and NAAIM surveys are used by many investors to understand the overall sentiment of the market in order to make the necessary adjustments to their portfolios to take advantage of, or to protect themselves from, changes in market sentiment (AAII, 2012; NAAIM, 2012).



Can social microblogging be used to forecast intraday exchange rates?

by Panagiotis Papaioannou, Lucia Russo, Dr. George P. Papaioannou, Ph.D., and Constantinos I. Siettos

The Efficient Market Hypothesis (EMH) is widely accepted to hold true under certain assumptions. One of its implications is that the prediction of stock prices at least in the short run cannot outperform the random walk model. Yet, recently many studies stressing the psychological and social dimension of financial behavior have challenged the validity of the EMH. Towards this aim, over the last few years, internet-based communication platforms and search engines have been used to extract early indicators of social and economic trends. Here, we used Twitter’s social networking platform to model and forecast the EUR/USD exchange rate in a high-frequency intradaily trading scale. Using time series and trading simulations analysis, we provide some evidence that the information provided in social microblogging platforms such as Twitter can in certain cases enhance the forecasting efficiency regarding the very short (intradaily) forex.


Market sentiment and exchange rate directional forecasting

by Dr. Vasilios Plakandaras, Ph.D., Theophilos Papadimitriou, Periklis Gogas, and Konstantinos Diamantaras

The microstructural approach to the exchange rate market claims that order flows on a currency can accurately reflect the short-run dynamics of its exchange rate. In this paper, instead of focusing on order flows analysis we employ an alternative microstructural approach: We focus on investors’ sentiment on a given exchange rate as a possible predictor of its future evolution.

As a proxy of investors’ sentiment we use posts from StockTwits, a message board dedicated to finance. Within StockTwits investors are asked to explicitly state their market expectations. We collect daily data on the nominal exchange rate of four currencies against the U.S. dollar and the extracted market sentiment for the year 2013. Employing econometric and machine learning methodologies we develop models that forecast, in an out-of-sample exercise, the future direction of the four exchange rates. Our empirical findings reject the Efficient Market Hypothesis even in its weak form for all four exchange rates. Overall, we find evidence that investors’ sentiment as expressed in public message boards can be an additional source of information regarding the future directional movement of the exchange rates to the ones proposed by economic theory.

Keywords: #Marketsentiment, #exchangerates, #forecasting, #EfficientMarketHypothesis, #machinelearning


Using Twitter to Model the EUR/USD Exchange Rate

by Dr. Dietmar Janetzko, Ph.D.

Fast, global, and sensitively reacting to political, economic and social events of any kind: these are attributes that social media like Twitter share with foreign exchange markets. Does the former allow us to predict the latter? The leading assumption of this paper is that time series of Tweet counts have predictive content for exchange rate movements. This assumption prompted a Twitter-based exchange rate model that harnesses regARIMA analyses for short-term out-of-sample ex post forecasts of the daily closing prices of EUR/USD spot exchange rates. The analyses made use of Tweet counts collected from January 1, 2012 to September 27, 2013 via the Otter API of topsy.com.

To identify concepts mentioned on Twitter with a predictive potential, the analysis followed a 2-step selection. Firstly, a heuristic qualitative analysis assembled a long list of 594 concepts (e.g., Merkel, Greece, Cyprus, crisis, chaos, growth, unemployment) expected to covary with the ups and downs of the EUR/USD exchange rate. Secondly, cross-validation using window averaging with a fixed-sized rolling origin was deployed to select concepts and corresponding univariate time series that had error scores below chance level as defined by the random walk model that is based only on the EUR/USD exchange rate. With regard to a short list of 17 concepts (covariates), in particular SP (Standard & Poor’s) and risk, the out-of-sample predictive accuracy of the Twitter-based regARIMA model was found to be repeatedly better than that obtained from both the random walk model and a random noise covariate in 1-step ahead forecasts of the EUR/USD exchange rate. This advantage was evident on the level of forecast error metrics (MSFE, MAE) when a majority vote over different estimation windows was conducted.
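
The comparison against a random walk baseline can be illustrated with the two error metrics named above. This is a minimal sketch, not the paper's regARIMA procedure: it assumes the usual definitions of MSFE (mean squared forecast error) and MAE (mean absolute error) and a random walk whose 1-step ahead forecast is simply the previous observation. All function names and numbers are hypothetical.

```python
from typing import List, Sequence


def msfe(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean squared forecast error."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


def mae(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute error."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def random_walk_forecasts(series: Sequence[float]) -> List[float]:
    """1-step ahead random walk forecasts for series[1:]: the previous value."""
    return list(series[:-1])


# Hypothetical daily closing rates and hypothetical model forecasts.
rates = [1.300, 1.310, 1.290, 1.320]
actual = rates[1:]
baseline = random_walk_forecasts(rates)
model = [1.305, 1.295, 1.310]  # stand-in for a covariate-based model

# A model "beats chance" in this sense when its error is below the baseline's.
print("MSFE model < baseline:", msfe(actual, model) < msfe(actual, baseline))
print("MAE  model < baseline:", mae(actual, model) < mae(actual, baseline))
```

In the study, such comparisons are repeated over different estimation windows and summarized by a majority vote, which this sketch does not attempt to reproduce.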

The results challenge the semi-strong form of the efficient market hypothesis (Fama, 1970, 1991) which when applied to the FX market maintains that all publicly available information is already integrated into exchange rates.





The use of social networks like Twitter and Facebook has grown exponentially over the last few years. Twitter, which was founded in 2006, had an estimated 200 million users on January 1, 2011, with more than 95 million tweets sent per day. With this rapid growth and significant adoption, Twitter has become an important tool for businesses and individuals to communicate and share information. In addition, Twitter has rapidly grown as a medium to share ideas and thoughts on investing decisions. This research builds on prior published research and attempts to determine whether there is correlation between Twitter and the stock market by studying sentiment, message volume, price movement and stock volume, as well as the effect that a Twitter user’s reputation may have on sentiment and the stock market.

Keywords: #Sentiment #Analysis, #Decision #Support, #Knowledge #Sharing, #Twitter