Public Release: 1-Aug-2017
University of Cambridge
‘Celebrity’ Twitter accounts – those with more than 10 million followers – display more bot-like behaviour than users with fewer followers, according to new research.
The researchers, from the University of Cambridge, used data from Twitter to determine whether bots can be accurately detected, how bots behave, and how they impact Twitter activity.
They divided accounts into categories based on total number of followers, and found that accounts with more than 10 million followers tend to retweet at similar rates to bots. In accounts with fewer followers, however, bots tend to retweet far more than humans. These celebrity-level accounts also tweet at roughly the same pace as bots with similar follower numbers, whereas in smaller accounts, bots tweet far more than humans. Their results will be presented at the IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM) in Sydney, Australia.
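As a rough illustration of this kind of grouping, the sketch below sorts accounts into follower bands and averages a retweet rate for bots and humans in each band. Only the 10-million cut-off comes from the text above; the lower band boundaries, the field names and the sample records are all invented for the example and are not taken from the study.

```python
from bisect import bisect_left
from collections import defaultdict
from statistics import mean

# Upper bounds of each follower band. Only the 10-million cut-off comes from
# the text; the lower boundaries are assumptions made for this sketch.
UPPER_BOUNDS = [1_000, 100_000, 1_000_000, 10_000_000]
BAND_NAMES = ["up to 1K", "1K to 100K", "100K to 1M", "1M to 10M", "over 10M"]

def follower_band(followers):
    """Return the band name; a count exactly on a bound falls in the lower band."""
    return BAND_NAMES[bisect_left(UPPER_BOUNDS, followers)]

def mean_retweets_by_band(accounts):
    """Average retweets per day for each (band, kind) pair."""
    groups = defaultdict(list)
    for acc in accounts:
        key = (follower_band(acc["followers"]), acc["kind"])
        groups[key].append(acc["retweets_per_day"])
    return {key: mean(values) for key, values in groups.items()}

# Invented records, purely for illustration.
accounts = [
    {"followers": 800, "kind": "bot", "retweets_per_day": 40},
    {"followers": 900, "kind": "human", "retweets_per_day": 2},
    {"followers": 12_000_000, "kind": "bot", "retweets_per_day": 6},
    {"followers": 15_000_000, "kind": "human", "retweets_per_day": 5},
]
summary = mean_retweets_by_band(accounts)
```

Comparing the bot and human entries within each band is what would show whether the two groups behave alike, as reported for the largest accounts.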
Bots, like people, can be malicious or benign. The term ‘bot’ is often associated with spam, offensive content or political infiltration, but many of the most reputable organisations in the world also rely on bots for their social media channels. For example, major news organisations, such as CNN or the BBC, who produce hundreds of pieces of content daily, rely on automation to share the news in the most efficient way. These accounts, while classified as bots, are seen by users as trustworthy sources of information.
“A Twitter user can be a human and still be a spammer, and an account can be operated by a bot and still be benign,” said Zafar Gilani, a PhD student at Cambridge’s Computer Laboratory, who led the research. “We’re interested in seeing how effectively we can detect automated accounts and what effects they have.”
Bots have been on Twitter for the majority of the social network’s existence – it’s been estimated that anywhere between 40 and 60% of all Twitter accounts are bots. Some bots have tens of millions of followers, although the vast majority have fewer than a thousand – human accounts have a similar distribution.
In order to reliably detect bots, the researchers first used the online tool BotOrNot (since renamed Botometer), which is one of the few available online bot detection tools. However, their initial results showed high levels of inaccuracy. BotOrNot showed low precision in detecting bots that had bot-like characteristics in their account name, profile info, content tweeting frequency and especially redirection to external sources. Gilani and his colleagues then decided to take a manual approach to bot detection.
Four undergraduate students were recruited to manually inspect accounts and determine whether they were bots. This was done using a tool that automatically presented Twitter profiles, and allowed the students to classify the profile and make notes. Each account was collectively reviewed before a final decision was reached.
In order to determine whether an account was a bot (or not), the students looked at different characteristics of each account. These included the account creation date, average tweet frequency, content posted, account description, whether the user replies to tweets, likes or favourites received and the follower-to-friend ratio. A total of 3,535 accounts were analysed: 1,525 were classified as bots and 2,010 as humans.
The students showed very high levels of agreement on whether individual accounts were bots. However, they showed significantly lower levels of agreement with the BotOrNot tool.
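The release does not say how agreement was measured. One common choice for two annotators is Cohen's kappa, which corrects the raw agreement rate for the agreement expected by chance. The sketch below is an illustration of that measure only, with invented labels; it is an assumption that a statistic of this kind would be appropriate here.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Fraction of items on which the two annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1:  # both annotators used one identical label throughout
        return 1.0
    return (observed - expected) / (1 - expected)

# Invented labels: the annotators disagree on one account out of eight.
student = ["bot", "bot", "human", "human", "bot", "human", "human", "bot"]
tool = ["bot", "bot", "human", "human", "bot", "human", "bot", "bot"]
kappa = cohens_kappa(student, tool)
```

A kappa near 1 indicates near-perfect agreement, while a value near 0 means the annotators agree no more often than chance would predict.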
The bot detection algorithm they subsequently developed achieved roughly 86% accuracy in detecting bots on Twitter. The algorithm uses a type of classifier known as a Random Forest, which draws on 21 different features to detect bots; the classifier itself is trained on the original dataset labelled by the human annotators.
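To give a feel for the general idea, the toy sketch below trains a very simplified forest in plain Python: each "tree" is a single-split decision stump fitted to a bootstrap sample using a random subset of features, and the forest predicts by majority vote. This is not the researchers' implementation. Real Random Forests grow full decision trees, and the 21 feature values here are synthetic numbers invented for the example rather than the features used in the study.

```python
import random
from collections import Counter

def train_stump(rows, labels, feature_ids):
    """Find the single (feature, threshold) split with the fewest errors."""
    best = None  # (errors, feature, threshold, left_label, right_label)
    for f in feature_ids:
        values = sorted({row[f] for row in rows})
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2  # midpoint between neighbouring values
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0]
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    if best is None:  # no usable split: always predict the majority class
        majority = Counter(labels).most_common(1)[0][0]
        return (0, float("inf"), majority, majority)
    return best[1:]

def train_forest(rows, labels, n_trees=25, n_features=4, seed=0):
    """Train stumps on bootstrap samples with random feature subsets."""
    rng = random.Random(seed)
    n, width = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]    # sample with replacement
        feats = rng.sample(range(width), n_features)  # random feature subset
        forest.append(train_stump([rows[i] for i in idx],
                                  [labels[i] for i in idx], feats))
    return forest

def predict(forest, row):
    """Majority vote over all stumps in the forest."""
    votes = Counter(left if row[f] <= t else right for f, t, left, right in forest)
    return votes.most_common(1)[0][0]

# Synthetic, well-separated data with 21 features per account (invented).
rng = random.Random(42)
labels = ["bot"] * 20 + ["human"] * 20
rows = [[rng.uniform(9, 10) if y == "bot" else rng.uniform(0, 1) for _ in range(21)]
        for y in labels]
forest = train_forest(rows, labels)
accuracy = sum(predict(forest, r) == y for r, y in zip(rows, labels)) / len(rows)
```

Because the synthetic classes are deliberately easy to separate, accuracy on this toy data says nothing about real-world performance; measuring that would require held-out, human-labelled accounts.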
The researchers found that bot accounts differ from humans in several key ways. Overall, bot accounts generate more tweets than human accounts. They also retweet far more often, and redirect users to external websites far more frequently than human users. The only exception to this was in accounts with more than 10 million followers, where bots and humans showed far more similarity in terms of the volume of tweets and retweets.
“We think this is probably because bots aren’t that good at creating original Twitter content, so they rely a lot more on retweets and redirecting followers to external websites,” said Gilani. “While bots are getting more sophisticated all the time, they’re still pretty bad at one-on-one Twitter conversations, for instance – most of the time, a conversation with a bot will be mostly gibberish.”
Despite the sheer volume of tweets produced by bots, humans still produce higher-quality and more engaging tweets – tweets by human accounts receive on average 19 times more likes and 10 times more retweets than tweets by bot accounts. Bots also spend less time liking other users’ tweets.
“Many people tend to think that bots are nefarious or evil, but that’s not true,” said Gilani. “They can be anything, just like a person. Some of them aren’t exactly legal or moral, but many of them are completely harmless. What I’m doing next is modelling the social cost of these bots – how are they changing the nature and quality of conversations online? What is clear, though, is that bots are here to stay.”