Analyzing Trump Tweets with NLP

Matt B Segall
4 min read · Mar 26, 2021

Overview

America seems to be growing ever more divided. Since we just got through one of the most polarizing presidencies in history, I thought it would be important to dissect some of Trump’s public-facing communication. At first, I was thinking of investigating his speeches, but I thought his famous tweets might be a more raw source from which to draw insight. Ultimately, I wanted to attempt to answer the questions: What was Trump always so busy tweeting about? How did those topics change throughout his presidency? And what does this say about his overall style, and about him as a person and leader?

Data

Because Trump is banned from Twitter, his activity is no longer available through their API. So, I began with a dataset I found on Kaggle of 40,000 tweets, from when he first created his account (2009) through April 2020. Notably, this misses most of the COVID pandemic, his failed reelection bid and his second impeachment.

Process

After cleaning the data, I began with some topic modeling, trying multiple tools for representing documents (count vectorizer, TF-IDF) in combination with various dimensionality-reduction techniques (PCA, NMF). Ultimately, count vectorizer + NMF yielded the most plausible collections of words.

One topic particularly interested me. Its top words by weight are:

“white, house, totally, dont, sources, stories, russia, dishonest, reporting, ratings, said, like, good, bad, corrupt, story, cnn, media, fake, news”

So, I named this topic vector “fake news”, and I followed it through the rest of my analysis. Below is the tweet which contains the highest weight of this topic.

After identifying this topic, I wanted to see how the weight of the fake news topic varied through time. Shown below is the strength of the fake news topic in his tweets throughout his presidency, with some markers for significant events and PR crises.
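One way to get a time series like this is to average the per-tweet topic weight by month. The sketch below assumes a pandas DataFrame with hypothetical `date` and `fake_news_weight` columns; the values are made up for illustration, and monthly averaging is my assumption rather than the aggregation the original chart necessarily used.

```python
import pandas as pd

# Made-up weights for illustration only; not values from the real dataset.
df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2018-07-03", "2018-07-20", "2018-08-05",
             "2018-08-21", "2018-09-10", "2018-09-28"]
        ),
        "fake_news_weight": [0.0, 0.1, 0.6, 0.4, 0.05, 0.15],
    }
)

# Average topic weight per calendar month ("MS" labels each bin by month start).
monthly = df.set_index("date")["fake_news_weight"].resample("MS").mean()

# The month with the highest average weight is a candidate "peak".
peak_month = monthly.idxmax()

print(monthly)
print("peak month:", peak_month.strftime("%b %Y"))
```

Averaging rather than summing keeps months with many tweets from dominating the curve, though a sum would be the better choice if the question is total volume of fake news tweeting.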

Surprisingly, this does not track with the major events of his presidency. We might expect him to tweet more about fake news, in an attempt to stoke distrust in the media, immediately following his impeachment, for example.

We do see some peaks around Aug 2018, Sept 2019 and April 2020 though, so let’s check out the highest fake news-weighted tweets from those spikes.
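Pulling the top tweets from a given spike can be sketched as a filter plus a sort. The column names, placeholder texts, and the `top_tweets_in_month` helper below are hypothetical, not taken from the project code.

```python
import pandas as pd

# Placeholder rows for illustration; the texts and weights are made up.
df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2018-08-02", "2018-08-15", "2018-08-29", "2019-09-12", "2019-09-25"]
        ),
        "text": [
            "placeholder tweet A",
            "placeholder tweet B",
            "placeholder tweet C",
            "placeholder tweet D",
            "placeholder tweet E",
        ],
        "fake_news_weight": [0.2, 0.9, 0.5, 0.7, 0.3],
    }
)


def top_tweets_in_month(frame, month, n=3):
    """Return the n tweets with the highest topic weight in a 'YYYY-MM' month."""
    periods = frame["date"].dt.to_period("M")
    in_month = frame[periods == pd.Period(month, freq="M")]
    return in_month.nlargest(n, "fake_news_weight")


top = top_tweets_in_month(df, "2018-08", n=2)
print(top[["date", "text", "fake_news_weight"]])
```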

In all of these tweets, he is talking about those catastrophic events: both the impeachment and the Mueller probe. The timing just isn’t what I would expect! I would love to explore this further by digging deeper into his specific tweets during these peaks and valleys.

Future Ideas

I did get sucked down the fake news rabbit hole a bit. To continue answering my initial questions, I’d like to:

  • Dig more into other topics and how they varied throughout his presidency
  • Compare his speech style and topics to other leaders through history
  • Do a deep NLP dive into his speech at the rally on Jan 6, 2021, which incited the insurrection

Bonus: Trump Tweet Generator

As a fun little side project to this, I trained a fast.ai text generator on the entire collection of tweets, which came up with some rather fun tweets.

trump_tweet_predict('I think Fox News is', 20)
>>>"I think Fox News is a very professional operation like the Wall Street Journal Its indeed SP Media has"
trump_tweet_predict('The Mueller Report is', 20)
>>>"The Mueller Report is considered a disaster It is a JOKE that the Mueller Report was a joke Home"
trump_tweet_predict('The Impeachment', 20)
>>>"The Impeachment Hoax is a ongoing scandal when the FBI is doing so poorly as any Dems in the"
trump_tweet_predict('Hilary Clinton is', 20)
>>>"Hilary Clinton is a drug lords who is very responsible for the massive Russia attacks of backing Clinton Foundation"
trump_tweet_predict('I am a victim of', 20)
>>>"I am a victim of a terrible Witch Hunt During my tenure No Pressure was used by the United"

Thanks!

I appreciate you taking some time to check out my analysis of Trump’s tweets. If you have any feedback, ideas, or just want to get in touch, feel free to reach out on LinkedIn! The code for this project can be found here.
