How Trump’s Tweets Follow and Move the Stock Market
A Project by Jerry Huang, Roger Zhang, et al.
At first, most of the tweets come from an Android device. The source then switches predominantly to iPhone, mixed with Twitter Web Client and, eventually, Twitter Media Studio. The Twitter Media Studio and Web Client tweets were likely posted by staffers, since these platforms, especially the former, are geared towards press and media teams.
The basic Dow variables (Open, High, Low, Close) mainly reflect the overall strength of the US economy: as the economy keeps growing, these variables simply trend upward. The daily range of the Dow (High - Low, i.e. the day's maximum minus its minimum), on the other hand, is essentially a measure of daily volatility, which is probably why the Dow daily range and VIX Open are highly correlated.
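As a minimal sketch of the relationship described above, the snippet below computes the daily range and its Pearson correlation with VIX Open. The numbers are made up for illustration and are not real market data, and the function names are hypothetical rather than taken from the project.

```python
from math import sqrt

def daily_range(high, low):
    """Daily range = High - Low for each trading day."""
    return [h - l for h, l in zip(high, low)]

def pearson(x, y):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Made-up illustrative values, NOT real Dow or VIX data.
dow_high = [25100.0, 25300.0, 25250.0, 25600.0]
dow_low = [24900.0, 24800.0, 25150.0, 24700.0]
vix_open = [13.0, 17.5, 12.0, 21.0]

ranges = daily_range(dow_high, dow_low)
r = pearson(ranges, vix_open)
print(ranges, round(r, 3))
```

With real data, a value of r close to 1 would support the claim that the daily range and VIX Open move together.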
“How much do Donald Trump’s tweets improve our ability to predict the change in the VIX between a given day and the following day?”
The words Trump tweets most frequently include "country" and "people", as well as references to his opponents, such as Democrats, China, and Mexico. He also frequently mentions "fake news" and "witch hunt".
Variables of Interest and Intuitions:
Financial Predictors:
The difference between the Dow opening and closing prices for each of the five preceding business days.
This information should tell us something about current trends in the stock market and short-term volatility (i.e., whether the tweets on a given day and the subsequent change in VIX are both the result of some "third variable" event or news).
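One way to build these lagged predictors is sketched below. The sign convention (close minus open) and the function name are assumptions made for illustration, and the prices are made up.

```python
def lagged_open_close_diffs(opens, closes, n_lags=5):
    """For each day with enough history, return the open-to-close
    differences of the n_lags preceding business days.

    Assumption: difference is computed as close - open.
    Row j describes day (n_lags + j); column 0 is the previous day,
    column 1 is two days back, and so on.
    """
    diffs = [c - o for o, c in zip(opens, closes)]
    rows = []
    for i in range(n_lags, len(diffs)):
        rows.append([diffs[i - k] for k in range(1, n_lags + 1)])
    return rows

# Made-up prices for seven consecutive business days.
opens = [100, 101, 103, 102, 104, 105, 103]
closes = [101, 103, 102, 104, 105, 103, 106]

features = lagged_open_close_diffs(opens, closes)
print(features)
```

The first five days have no complete history, so only the sixth and seventh days receive a feature row.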
Tweet-based Predictors:
Sentiment analysis of each tweet
This variable is the sum of the positive and negative polarity (valence) of all of Trump's tweets on a given day. We also measure subjectivity: whether his tweets state facts or instead express personal emotion.
e.g., a tweet like "it is snowing" has very low absolute polarity and a low subjectivity score, whereas a tweet like "I hate the snow!" would have high absolute polarity (negative in sign) and a high subjectivity score.
The intuition here is that volatility depends on the magnitude of the polarity and subjectivity rather than on the sign, since VIX does not account for direction.
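The daily aggregation could look roughly like the sketch below. It assumes each tweet has already been scored for polarity (in [-1, 1]) and subjectivity (in [0, 1]) by some sentiment library; the scores, dates, and function name here are made up for illustration. Summing absolute polarity is one reading of "magnitude", not necessarily the exact aggregation used in the project.

```python
from collections import defaultdict

def daily_sentiment(scored_tweets):
    """Aggregate per-tweet sentiment scores into per-day totals.

    scored_tweets: iterable of (date_string, polarity, subjectivity).
    Polarity is summed in absolute value, so strongly positive and
    strongly negative tweets both increase the daily magnitude.
    """
    totals = defaultdict(
        lambda: {"abs_polarity": 0.0, "subjectivity": 0.0, "n_tweets": 0}
    )
    for date, polarity, subjectivity in scored_tweets:
        day = totals[date]
        day["abs_polarity"] += abs(polarity)
        day["subjectivity"] += subjectivity
        day["n_tweets"] += 1
    return dict(totals)

# Hypothetical scores: a neutral, factual tweet and an emotional one
# on the first day, then two opposing tweets on the second day.
scored = [
    ("2019-05-06", 0.0, 0.0),
    ("2019-05-06", -0.8, 0.9),
    ("2019-05-07", 0.5, 0.6),
    ("2019-05-07", -0.5, 0.4),
]
daily = daily_sentiment(scored)
print(daily)
```

Note that on the second day the signed polarities would cancel to zero, while the absolute sum still registers the emotional intensity.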
The number of times on a given day that a tweet referenced certain keywords like "China" or "tariff".
The intuition here is that, since the keywords are economically focused, the total number of keyword mentions measures how economically focused the tweets of the day were.
This is not necessarily an exhaustive or refined list.
Full list of our chosen keywords:
"stock", 'market', "agreement", "negotiator", "negotiation", "trade", "china", "economy", "jobs", "tariff", "employ", "s&p", "auto", "farmer"
"TRUMPINESS"
Media reports indicate that Trump switched from an Android phone to an iPhone in early 2017. Before the switch, Trump used Android for tweeting, while his staffers used other platforms, including Twitter for iPhone and Twitter for Web. We can therefore use the tweets from this pre-iPhone period as a training set, with the device serving as the ground-truth label for whether Trump himself wrote a given tweet.
We extract features from the tweet text and metadata as our predictors. Using TF-IDF, we generate a vector that weights the most important words in Trump's vocabulary. We also extract dummy features that indicate whether a tweet includes a link, picture, video, hashtag, or "@" mention, as well as the sentiment scores (polarity and subjectivity) of each tweet.
We then fit these features to a Random Forest classification model, which gives us the probability that a tweet was posted by Trump himself rather than by a member of his team.
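To illustrate what a TF-IDF vector is, here is a minimal textbook variant (term frequency times log inverse document frequency) in plain Python. It is a sketch only: the project presumably used a library implementation, which typically applies smoothing and normalization that this version omits, and the example documents are invented.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return (vocabulary, vectors) using tf * log(N / df).

    tf is the term's share of the document's tokens; df is the number
    of documents containing the term. Tokenization is a simple
    lowercase whitespace split (an assumption for illustration).
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vocab = sorted(doc_freq)
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([
            (counts[t] / len(tokens)) * math.log(n_docs / doc_freq[t])
            if tokens else 0.0
            for t in vocab
        ])
    return vocab, vectors

# Invented mini-corpus of three short "tweets".
docs = ["fake news media", "great jobs report", "fake news again"]
vocab, vectors = tfidf(docs)
print(vocab)
print(vectors[0])
```

In the first document, "media" receives a higher weight than "fake" or "news" because it appears in only one document, which is the sense in which TF-IDF highlights distinctive words.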
Features include the hour of the tweet; whether the tweet includes a link, a hashtag, or "..." (which indicates threading); the polarity and subjectivity sentiment scores; the year, month, day, and minute of the tweet; and the generated TF-IDF vectors.
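The non-text features listed above could be extracted roughly as follows. The regular expressions, the feature names, and the example tweet and URL are assumptions made for illustration; sentiment scores and TF-IDF vectors would be appended separately.

```python
import re
from datetime import datetime

def tweet_features(text, created_at):
    """Build dummy and time-based features for one tweet.

    text: the tweet body; created_at: a datetime of when it was posted.
    """
    return {
        "has_link": int(bool(re.search(r"https?://", text))),
        "has_hashtag": int(bool(re.search(r"#\w+", text))),
        "has_mention": int(bool(re.search(r"@\w+", text))),
        "has_ellipsis": int("..." in text),  # "..." suggests a threaded tweet
        "year": created_at.year,
        "month": created_at.month,
        "day": created_at.day,
        "hour": created_at.hour,
        "minute": created_at.minute,
    }

# Invented tweet and timestamp.
example = tweet_features(
    "Big day... #Jobs https://example.com/report",
    datetime(2018, 3, 5, 11, 42),
)
print(example)
```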
We then use the standard 0.5 cutoff to classify each tweet as sent by Trump himself or by other members of his team. With this binary classification, we create two datasets: "Unfiltered", which ignores the classification and keeps all tweets, and "Filtered", which keeps only the tweets classified as sent by Trump himself.
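The split into the two datasets can be sketched as below, given the classifier's predicted probabilities. Treating a probability of exactly 0.5 as "Trump" is an assumption, and the function name and example values are hypothetical.

```python
def split_by_author(tweets, trump_probs, cutoff=0.5):
    """Return (unfiltered, filtered) tweet lists.

    unfiltered keeps every tweet; filtered keeps only tweets whose
    predicted probability of being written by Trump is >= cutoff.
    """
    unfiltered = list(tweets)
    filtered = [t for t, p in zip(tweets, trump_probs) if p >= cutoff]
    return unfiltered, filtered

# Placeholder tweets with made-up classifier probabilities.
tweets = ["tweet A", "tweet B", "tweet C"]
probs = [0.9, 0.2, 0.5]
unfiltered, filtered = split_by_author(tweets, probs)
print(len(unfiltered), filtered)
```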
In the Trump version, there is a peak at around hours 11 to 12, whereas in the staff version the peak happens at around hour 20. The middle time is from 15-1.
Conclusion
We used the Dow Jones index and parameters derived from Donald Trump's tweets to build a classification model that predicts whether a day's VIX Open would significantly increase, plateau, or significantly decrease relative to the previous trading day. We first trained a binary classification model on Donald Trump's tweets, using sentiment analysis and TF-IDF vectors, to identify tweets that were actually posted by Donald Trump himself, which presumably have a larger market impact. Then, using the filtered tweets, their sentiment scores, dummy variables indicating whether a keyword is included in the tweets, and the Dow Jones index, we trained a series of Random Forest-based models to classify the change in VIX Open.
For our best model, which includes all of our parameters, we achieved a test accuracy of 75.11%, 4.89% higher than our model that includes only the Dow Jones index. This indicates that the parameters we derived from Donald Trump's tweets are a valid signal and can inform decision making.