Defining a Front-Runner: Online Trolls versus Kamala Harris


In this post we use recent work at Marvelous AI to elaborate on Politico's reporting about influence campaigns. We demonstrate a persistent, escalating effort to define Kamala Harris on Twitter; summarize two prominent stories that factor into several narrative themes about her candidacy; present a nascent framework for aggregating and summarizing political narrative themes; and discuss the dozen or so themes defining Kamala Harris during the week of February 7 – 14, 2019.

Setting the Frame

On Wednesday, we discussed recent reporting that active misinformation campaigns are underway in social media. Much of the public conversation about these campaigns is interpreted as implicating Russian involvement. But the terrain is much more chaotic and multi-faceted. Even Wednesday's Politico story stressed the unknown provenance of these attacks:

Researchers and others interviewed for this story say they cannot conclusively point to the actors behind the coordinated activity. It’s unclear if they are rogue hackers, political activists or, as some contend, foreign state actors such as Russia, since it bears the hallmarks of earlier foreign attacks. One of the objectives of the activity, they say, is to divide the left by making the Democratic presidential primary as chaotic and toxic as possible.

The Race So Far

Our social media analysis over the past several months has been suggestive of a wide open multi-tier race with numerous popular candidates. A Monmouth Poll released on February 4 shows over a dozen candidates with net favorable ratings in the double digits.1 More important, though, most of the candidates are still talking to an audience whose majority has not yet formed an opinion of them.

Monmouth University Poll, February 4, 2019

In other words, most candidates still have an opportunity to make a first impression, even with Democratic voters.

Not surprisingly, operatives with agendas of all flavors are taking the opportunity to preemptively frame the candidates in ways that are most favorable to those agendas: opposing campaigns from either party; motivated interest groups; foreign state actors; scam artists.

Twitter Has Crowned a Front-Runner

Twitter mention counts show who people are talking about on Twitter. Since the beginning of the year, that person has been Kamala Harris. Harris leads both in the total count of mentions and in the number of days that she had the most mentions.

Twitter Mention Counts of Potential Democratic Candidates from December 20, 2018 to February 20, 2019
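
The two measures of leadership mentioned above, total mentions and number of days in the lead, can be sketched as follows. This is a minimal illustration that assumes daily mention counts per candidate are already available; the function name, the data layout and the numbers are assumptions made for the example, not the actual pipeline or data.

```python
from collections import Counter

def mention_leaders(daily_counts):
    """Return (total mentions per candidate, days each candidate led).

    daily_counts maps a day label to {candidate: mention count}.
    Ties for the daily lead are credited to every tied candidate.
    """
    totals = Counter()
    days_led = Counter()
    for counts in daily_counts.values():
        totals.update(counts)  # adds each candidate's count to the running total
        if not counts:
            continue
        top = max(counts.values())
        for candidate, n in counts.items():
            if n == top:
                days_led[candidate] += 1
    return totals, days_led

# Illustrative numbers only, not real data.
example = {
    "day 1": {"Harris": 120, "Warren": 90, "Sanders": 80},
    "day 2": {"Harris": 100, "Warren": 150, "Sanders": 70},
    "day 3": {"Harris": 130, "Warren": 60, "Sanders": 110},
}
totals, days_led = mention_leaders(example)
```

In this toy input one candidate briefly takes the daily lead without overtaking the leader on either measure, which is the pattern described for the bursts of attention around other candidates.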

But Warren and Sanders have seen bursts of attention that have momentarily pushed their counts into the lead. Analysts interpret these bursts as a mixture of influence campaigns and genuine interest. 

Nevertheless, we view Harris as a comfortable front-runner in the Twitter mentions race. But we should be careful about how we define “front-runner” in this context. Kamala Harris is the candidate that people are racing to define. For better or worse.

Not all Tweets are the Same

The obvious questions are: who is doing this framing? and what are they saying? Our distribution plots for last week suggest that the answer is all sorts of people.

Weekly Bias Distribution February 14, 2019

But, on average, these users are more right wing and less credible than expected for a Democratic primary electorate.

Average Bias Distribution February 14, 2019

The distribution also does not make clear how many of the left-linking users are preemptively defining Harris on behalf of competing Democratic campaigns. Regardless, a closer look at the substance of these attempts to define the candidate is in order.

Attack of the Trolls

As we discussed Wednesday, the objectives of V.D. troll campaigns are to divide the American public, especially along racial lines. Politico reports that Harris campaign officials share this assessment.

An official with the Harris campaign said they suspect bad actors pushing misinformation and false narratives about the California Democrat are trying to divide African Americans, or to get the media to pay outsized attention to criticism designed to foster divisions among the Democratic primary electorate.

We will take a closer look at two specific high-profile attacks and then attempt to describe the Twitter narrative landscape for Kamala Harris more broadly.

Imposters on “Black Twitter”

Last week the Daily Beast reported about troll networks impersonating African Americans on social media to spread divisive narratives and frames. And intelligence analyst Malcolm Nance calls out a specific line of attack. 

We see this in our own data as well. Along with “Kamala Harris is a cop”, the #ADOS meme has been a persistent and escalating attack since we started collecting data in November. We will look at the relative incidence of these memes over that time in a post soon. For now just note that these examples frame Harris as untrustworthy; as an unreliable ally; or, worse, as a self-interested interloper. Aspects of these themes were also deployed in various ways by right wing trolls against both Obama and Clinton.

Puff, Puff, Pathetic

The Harris campaign has been smart about defining her for all segments of the population that do not yet know her. For black voters, one of her early outreach efforts was to go on The Breakfast Club and openly answer questions on a range of topics. But the attention backbone (propaganda feedback loop) that promotes nonsense onto Fox News latched onto a small segment of the conversation in order to promote a narrative of deception on Harris’ part. 

Representative tweet in the “tupac right” cluster

In the interview, the hosts had proceeded quickly from a discussion of Harris’ pot use in college to another about her interests in music. But host Charlamagne Tha God also interjected a joke about pot use into the pause before her answer. As the San Francisco Chronicle recounts, this was the source of any “confusion.”

During a Thursday appearance on MSNBC, Charlamagne Tha God and DJ Envy accused Fox News of “lying” about what Harris said.

“I mean, we wanted to humanize her, not just talk about politics, talk about what she likes, what she does,” DJ Envy said. “And I asked what she listens to and she said she listens to Snoop Dogg and Tupac at the same time my co-host was still talking about the marijuana and it was just a funny exchange but she was actually answering me and people took it that she was answering Charlamagne and said she was lying, which was not true.”

And later the hosts urge their audience not to fall for the ruse.

In both of these cases, the Harris campaign and its media allies have done a good job of setting right these distortions–or getting out ahead of them. Let’s hope that trend continues. 

Analyzing Social Media Attacks on Twitter

Both of these examples are well documented by somewhat extensive media coverage–although more so the Tupac story than the impersonators. But the Politico piece reports a widespread operation, with tactics refined since the 2016 campaign. 2020 attackers are more subtle. And they lean more heavily on the amplification of existing fissures in American political discourse than on the injection of new messages into the conversation.  

Beyond these well reported cases, what messages are being pushed? Which are effective? With which audiences? As post-mortems of 2016 make clear, the latter two questions are measured and tracked by influence operators. They know which messages are effective; and they know which audiences they impact. But attackers have the advantage of knowing which messages they are trying to push. Defenders are mostly in the dark.

Grokking from 30,000 Feet

We want to present messages meaningfully in aggregation. In order to jump-start a virtuous cycle, we settled on a simple table display. 2 3 For the week ending on February 14, the major topics about Kamala Harris are summarized as:

Major Kamala Harris Topics Feb 7 – 14, 2019

We’ve sorted the topics from liberal to conservative, using the Bias score. And we’ve color-coded the values to make the summary more readable. Before we walk through the topics, let’s talk briefly about the summary.


We group the tweets by the similarity of their messages and report the groups with more than a few dozen examples, given a minimum coherence score. So this is not all of the tweets in our corpus.
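
A minimal sketch of that reporting filter, assuming each cluster already carries its tweets and a coherence score. The `Cluster` type, the function name and both thresholds are hypothetical: the text above only says "more than a few dozen" and "a minimum coherence score", so the numbers below are placeholders.

```python
from dataclasses import dataclass

@dataclass
class Cluster:
    cluster_id: int
    tweets: list
    coherence: float

def reportable(clusters, min_size=36, min_coherence=0.5):
    """Keep clusters that are both large enough and coherent enough.

    min_size and min_coherence are placeholder values, not the
    thresholds actually used for the table.
    """
    return [
        c for c in clusters
        if len(c.tweets) > min_size and c.coherence >= min_coherence
    ]

# Illustrative clusters: one passes, one is too small, one is too incoherent.
clusters = [
    Cluster(0, ["tweet"] * 400, 0.7),
    Cluster(1, ["tweet"] * 10, 0.9),
    Cluster(2, ["tweet"] * 500, 0.2),
]
kept = reportable(clusters)
```

Anything filtered out here never reaches the table, which is why the summary covers only part of the corpus.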

The topics are more akin to themes, but they are enough to get started. The id is arbitrary. The label is hand-assigned by us for this post. We use word lists to characterize these clusters internally.


These values provide an idea of how frequently each of the themes occurred during the week in question. Vol is the total number of tweets in the cluster and repeat is the rate at which they are exact copies (e.g. retweets). One interesting property is that a high repeat score is suggestive of amplification by an influence operation.
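
One way to compute these two values is sketched below. It assumes that "exact copy" means identical text after trimming whitespace, and that the first occurrence of a text counts as the original; both choices, and the function name, are assumptions for illustration.

```python
from collections import Counter

def volume_and_repeat(texts):
    """Return (vol, repeat) for the tweet texts of one cluster.

    vol is the number of tweets. repeat is the share of tweets whose
    text duplicates an earlier tweet in the same cluster.
    """
    vol = len(texts)
    if vol == 0:
        return 0, 0.0
    counts = Counter(t.strip() for t in texts)
    copies = sum(n - 1 for n in counts.values())  # every occurrence after the first
    return vol, copies / vol

# Three identical tweets and one unique tweet: two of the four are copies.
vol, repeat = volume_and_repeat(["same text", "same text", "same text", "other text"])
```

Under this definition a cluster made entirely of one retweeted message approaches a repeat score of 1, while a cluster of unique tweets scores 0.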


These values help to describe how much variation there is in the framing within that cluster. Coherence (coher) is a technical measure of how “similar” the tweets are. Partisanship (partis) is the proportion of tweets that are not in the center. Polarization (polar) reflects the “compromise” between the two partisan “sub-clusters” in the topic: it is the average of the centers of the two sub-clusters, so it ignores volume. For Kamala Harris last week, this value is always positive (i.e. right of center).
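
The partisanship and polarization definitions can be illustrated with a short sketch. It assumes each tweet has a bias score between -1 (left) and 1 (right), that "the center" is a band of ±0.2 around zero, and that the two sub-clusters are simply the tweets to the left and right of that band. The band width, the score range and the function names are assumptions, not details given above.

```python
from statistics import mean

def partisanship(scores, center_band=0.2):
    """Share of tweets whose bias score falls outside the center band."""
    if not scores:
        return 0.0
    outside = [s for s in scores if abs(s) > center_band]
    return len(outside) / len(scores)

def polarization(scores, center_band=0.2):
    """Midpoint of the left and right sub-cluster centers.

    Each sub-cluster contributes only its mean, so the number of
    tweets on each side does not affect the result.
    """
    left = [s for s in scores if s < -center_band]
    right = [s for s in scores if s > center_band]
    if not left or not right:
        return None  # undefined without tweets on both sides
    return (mean(left) + mean(right)) / 2

# Two left tweets, four right tweets, two center tweets (illustrative scores).
scores = [-0.4, -0.6, 0.8, 0.8, 0.8, 0.8, 0.0, 0.1]
partis = partisanship(scores)
polar = polarization(scores)
```

Doubling the number of right-leaning tweets at 0.8 would leave `polar` unchanged, which is what ignoring volume means here.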


These metrics describe how far right or left the conversation is overall. L/R reframes polarization as a ratio. Bias is the average political bias value for the entire set of tweets in the cluster. Negative numbers reflect a left-leaning link pattern; positive numbers reflect a right-leaning link pattern.
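
A hedged sketch of these two values follows. Bias is the plain average described above. The exact formula for L/R is not given, so the ratio below, the distance of the left sub-cluster center from zero divided by that of the right sub-cluster center, is only one plausible reading. The ±0.2 center band and the function names are likewise assumptions.

```python
from statistics import mean

def bias(scores):
    """Average bias over every tweet in the cluster (negative = left)."""
    return mean(scores) if scores else 0.0

def left_right_ratio(scores, center_band=0.2):
    """Assumed L/R: |left sub-cluster center| / |right sub-cluster center|."""
    left = [s for s in scores if s < -center_band]
    right = [s for s in scores if s > center_band]
    if not left or not right:
        return None  # undefined without tweets on both sides
    return abs(mean(left)) / abs(mean(right))

# Illustrative per-tweet bias scores, not real data.
scores = [-0.4, -0.6, 0.8, 0.8, 0.8, 0.8, 0.0, 0.1]
cluster_bias = bias(scores)
l_over_r = left_right_ratio(scores)
```

Under this reading, a ratio below 1 means the right sub-cluster sits farther from the center than the left one, which corresponds to a positive polarization value.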


These are early, dictionary-based estimates of the average emotional content of each cluster. The emotion “anticipation” is abbreviated as antic, the rest are spelled out.
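
A dictionary-based estimate of this kind can be sketched as below. The toy lexicon, the tokenizer and the per-tweet normalization are all hypothetical: the dictionaries and scoring actually used are not described here, and a real lexicon would be far larger.

```python
import re
from collections import Counter

# Hypothetical toy lexicon mapping words to the emotions they signal.
LEXICON = {
    "lying": {"anger", "disgust"},
    "trust": {"trust"},
    "hope": {"anticipation", "joy"},
    "prison": {"fear", "sorrow"},
}

def emotion_scores(texts):
    """Average, over tweets, of the share of tokens signaling each emotion."""
    if not texts:
        return {}
    totals = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        if not tokens:
            continue
        hits = Counter()
        for token in tokens:
            for emotion in LEXICON.get(token, ()):
                hits[emotion] += 1
        for emotion, n in hits.items():
            totals[emotion] += n / len(tokens)
    return {emotion: total / len(texts) for emotion, total in totals.items()}

cluster_emotions = emotion_scores(["She is lying", "I trust her"])
slang_emotions = emotion_scores(["lol smh"])
```

The second call returns an empty result because none of its words are in the lexicon, which is how a slang-heavy cluster can fail to register any emotion scores.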

The Clusters


pot (cluster 22)

This topic is left-of-center commentators going to town on the anti-Harris messaging around the Breakfast Club interview. This was a small volume cluster with too much slang to register emotion scores with our current dictionaries. And snark is not an emotion we track. 

Most representative tweet in the “pot” cluster

tupac left (cluster 26)

This was another left-leaning response to the Breakfast Club attacks. It was higher volume and went more directly at the truth of the attack. It used more conventional language registering an emotionally-varied and moderate tone. 

Representative tweet in the “tupac left” cluster

sexism (cluster 6)

This cluster is a hodgepodge of double-standard observations regarding the Breakfast Club attacks. It was somewhat high-volume and moderately repetitive. The emotional language was varied and somewhat strong.

Most representative tweet in the “sexism” cluster


facts (cluster 5)

This cluster is primarily focused on fact-driven narratives about Senator Harris and her history as a prosecutor. Many center substantially on the Breakfast Club story.

Most representative tweet in the “facts” cluster

But the conversation is varied and knotty. 

Second most representative tweet in the “facts” cluster

This cluster is low volume and emotionally low-intensity but does demonstrate some messaging risk for the Harris campaign, especially in terms of message consistency among perceived surrogates.

justice (cluster 6)

This cluster is small and highly coherent. It is a critique of Harris’ policies relative to a justice frame. Despite being low volume there is a diversity of intense emotional content. The left edge of the Democratic base is likely to pursue these themes throughout the primary. Senator Harris will need to continue to frame her own history on criminal justice issues, but also to more widely deploy surrogates to promote that history.

Representative tweet in the “justice” cluster

DA (cluster 1)

The DA cluster is a catch-all for narratives about Senator Harris’ history as a prosecutor, mostly as San Francisco District Attorney, but also as CA Attorney General. The sentiment is varied, the volume is high, the coherence is low and the repeat rate is high. This is an active conversation with a variety of inter-related influence operations. 

Representative tweet in the “DA” cluster

Of course, there is also some Tupac.

Representative tweet in the “DA” cluster

birther (cluster 11)

This is a complex cluster that we will look at in more detail later. It leverages both left-leaning and right-leaning narratives of legitimacy and is thick with symbols and shorthand. The volume is moderate, but there does not yet seem to be substantial amplification. We named this cluster “birther” because we suspect it occupies the same narrative space as that attack on Obama. Like the Obama attack, we see it as low risk if handled properly. 

Representative tweet in the “birther” cluster


honesty (cluster 0)

The main right-of-center narrative about the Breakfast Club conversation goes to honesty. This is the biggest cluster by far and appears to be heavily promoted, with a high rate of repetition. Honesty is the crux of the narrative complex used to defeat Hillary Clinton. This is a high risk space that will need to be managed actively. Influence agents clearly know that and are working it hard. That said, the Tupac exchange appears to have been won by Team Harris.

Representative tweet in the “honesty” cluster

And even the non-identical tweets are suspiciously similar.

Representative tweet in the “honesty” cluster

prison (cluster 12)

This is another high risk cluster that is only seeing moderate volume and amplification. If the clustering algorithm had merged it with the DA cluster, we’d be getting close to a high-volume, heavily promoted theme. 

Representative tweet in the “prison” cluster

However, this cluster is among the more coherent, despite being circulated both by mainstream political journalists and by far right political operatives.

Representative tweet in the “prison” cluster

This theme is high risk, especially in the context of recent federal criminal justice reform signed by Trump. The high sorrow and anger values are particularly worrying. 

guns (cluster 13)

You’ve seen this one before. Both of our gun themes are low volume, high repetition. We do not expect this to present a particularly high risk to the Harris campaign, despite the high values of emotional content.

Representative tweet in the “guns” cluster

tupac right (cluster 4)

This cluster represents the main Breakfast Club attack. Like the more general honesty theme, it focuses on that aspect of her personality, attempting to frame her as a serial liar. It has medium volume and repetition. This is probably organic right wing content and most of the amplification was from that network. Somewhat surprisingly, it has very little emotional content.

Representative tweet in the “tupac right” cluster


taxes (cluster 10)

This cluster represents a defense of the Trump Tax Cuts by way of an attack on Senator Harris’ credibility. This is the most heavily promoted Kamala Harris theme in our corpus for last week. As in several other themes, a major line of attack is honesty. You can see this in the higher than average rate of language about trust. 

Representative tweet in the “taxes” cluster

Much of the framing is presented as a test of competency. We will discuss the deployment of sexist themes in a post early next week. For the time being, note the attempts to re-frame a passing comment as an indictment of Harris’ entire understanding of the tax code. Of course, theatrics like this are the seeds of mythology in the Fox News bubble. 

Representative tweet in the “taxes” cluster

Right wing operatives have used similar tactics in the gun debate.4 I would expect more of this framing, probably not limited to the tax debate.

reparations (cluster 2)

This is a low volume cluster with a far right bias. 

Representative tweet in the “reparations” cluster

Given the coverage by Breitbart, however, I suspect we haven’t seen the last of it. 

Representative tweet in the “reparations” cluster

willie brown (cluster 20)

This is another far-right cluster. But it is moderately high-volume and heavily promoted.

Representative tweet in the “willie brown” cluster

The narrative is anchored on Breitbart coverage and users are mostly repeating the language of the Breitbart-generated tweets. 

Representative tweet in the “willie brown” cluster

The emotional content is low. And the message is quarantined to a far-right audience. This theme plays a cathartic role for segments of the Trump base, but does not appear destined for a larger audience. We may see more of it. But for now, the risk of wider discussion seems low. 

  1. The same poll found that Democrats are looking, first and foremost, for a candidate who is electable.
  2. We will talk in more detail about the technical components in future posts.
  3. And this is v0, so probably a lot will change over time, too.
  4. For example “dumb libtards don’t even know what semi-automatic means”
