Public Release: 16-Aug-2017
New machine learning algorithms will trace authors of exploitative advertising
NYU Tandon School of Engineering
BROOKLYN, New York – A team of university researchers has devised the first automated techniques to identify ads potentially tied to human trafficking rings and link them to public information from Bitcoin – the primary payment method for online sex ads.
This is the first step toward developing a suite of freely available tools to help police and nonprofit institutions identify victims of sexual exploitation, explained the computer scientists from the New York University Tandon School of Engineering; University of California, Berkeley; and University of California, San Diego.
Human trafficking is a widespread social problem, with an estimated 4.5 million people forced into sexual exploitation, according to the International Labor Organization. In 2016, the National Center for Missing and Exploited Children estimated that 1 in 6 endangered runaways reported to the group were probably sex-trafficking victims.
The Internet has enabled and emboldened human traffickers to advertise sexual services. Law enforcement efforts to trace and disband human trafficking rings are often confounded by the pseudonymous nature of adult ads and the tendency of ring leaders to employ multiple phone numbers and email addresses to avoid detection. Adding to the difficulty is the need to determine which online ads reflect willing participants in the sex trade and which reflect victims forced into prostitution.
The research team’s approach relies on two novel machine learning algorithms. The first is rooted in stylometry, or the analysis of an individual’s writing style to identify authorship. Stylometry can confirm authorship with high confidence and, in the case of online trafficking ads, allows researchers and police to identify cases in which separate advertisements for different individuals share a single author: a telltale sign of a trafficking ring. By automating stylometric analysis, the researchers discovered they could quickly identify groups of ads with a common author on Backpage, one of the most popular sites for online sex ads. (Since this research was conducted, the adult advertising section of Backpage has been discontinued; however, the researchers noted that adult ads remain prevalent, now appearing in multiple sections of the site.)
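To make the idea of grouping ads by writing style concrete, here is a minimal sketch in Python. It is not the researchers' algorithm, which this release does not describe in detail. It assumes a much simpler stand-in: character trigram counts as the style profile, cosine similarity as the comparison, and an arbitrary threshold of 0.5 for deciding that two texts share an author. The function names and the threshold are illustrative assumptions.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n=3):
    """Count overlapping character n-grams, a common stylometric feature."""
    normalized = " ".join(text.lower().split())
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def group_by_style(texts, threshold=0.5):
    """Single-link grouping: texts whose similarity meets the threshold are merged.

    The threshold is an arbitrary illustrative value, not one from the study.
    Returns a sorted list of groups, each a list of indices into `texts`.
    """
    profiles = [char_ngrams(t) for t in texts]
    parent = list(range(len(texts)))

    def find(i):
        # Follow parent links to the group's representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(texts)):
        for j in range(i + 1, len(texts)):
            if cosine_similarity(profiles[i], profiles[j]) >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(texts)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Calling group_by_style on a list of ad texts returns lists of indices, one list per suspected common author. A real system would need far richer features and a threshold tuned on labeled data.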
After identifying groups of ads with a single author, the researchers tested an automated system that uses publicly available information from the Bitcoin mempool and blockchain, the ledgers that record pending and completed transactions, respectively. Because Backpage posts ads as soon as payment is received, the researchers compared the timestamp indicating submission of payment to the timestamp of the ads’ appearance on Backpage. All Bitcoin users maintain accounts, or “wallets,” and tracing payment for ads that have the same author to a unique wallet is a potential method for identifying ownership of the ads, and thus the individuals or groups involved in human trafficking.
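The timestamp comparison described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the researchers' system: the Payment record, its field names, and the 60-second matching window are all hypothetical, and real transaction data would have to be collected from the Bitcoin network separately.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Payment:
    tx_id: str      # hypothetical transaction identifier
    wallet: str     # hypothetical wallet label
    seen_at: float  # Unix time at which the payment was first observed


def candidate_payments(ad_posted_at, payments, max_delay=60.0):
    """Return payments observed at most max_delay seconds before the ad appeared.

    max_delay is an assumed window, not a figure from the study.
    """
    return [
        p for p in payments
        if 0.0 <= ad_posted_at - p.seen_at <= max_delay
    ]


def wallets_shared_by_group(ad_times, payments, max_delay=60.0):
    """Wallets that are candidates for every ad in a group of same-author ads."""
    shared = None
    for posted_at in ad_times:
        wallets = {
            p.wallet
            for p in candidate_payments(posted_at, payments, max_delay)
        }
        shared = wallets if shared is None else shared & wallets
    return shared or set()
```

Any single ad may have many candidate payments inside the window. Intersecting the candidates across several ads attributed to one author is what narrows the set, which is why the author grouping comes first.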
Damon McCoy, an NYU Tandon assistant professor of computer science and engineering and one of the paper’s co-authors, explained that combining these techniques to identify sex ads by both author and Bitcoin owner represents a considerable advancement in assisting law enforcement and nonprofit organizations. “There are hundreds of thousands of these ads placed every year, and any technique that can surface commonalities between ads and potentially shed light on the owners is a big boost for those working to curb exploitation,” he said.
“The technology we’ve built finds connections between ads,” said Rebecca Portnoff, a UC Berkeley doctoral candidate in computer science who developed the algorithm as part of her dissertation. “Is the pimp behind that post for Backpage also behind this post in Craigslist? Is he the same man who keeps receiving Bitcoin for trafficked girls? Questions like these are answerable only through more sophisticated technological tools – exactly what we’ve built in this work – that link ads together using payment mechanisms and the language in the ads themselves.”
The research team also includes NYU Tandon doctoral student Periwinkle Doerfler; Danny Yuxing Huang, a doctoral student at UC San Diego; and Sadia Afroz, a senior researcher at the International Computer Science Institute.
The researchers deployed their automated author identification techniques on a sample of 10,000 real Backpage ads, on a four-week scrape of all adult ads that appeared on Backpage during that period, and on several dozen ads they placed themselves for comparison. They reported an 89 percent true-positive rate for grouping ads by author, significantly more accurate than current stylometric machine learning algorithms. The team also reported a high rate of success in linking the ads they placed themselves to timestamps in the Bitcoin blockchain. They acknowledge, however, that they were unable to verify whether matches they made using real-life ads and Bitcoin transaction information truly correspond to individuals tied to human trafficking; that matter must ultimately be pursued by police.
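For readers unfamiliar with the metric, a true-positive rate for author grouping can be computed in several ways, and this release does not say which definition the team used. The sketch below assumes one common pairwise definition: of all pairs of ads that truly share an author, the fraction that the method also placed in the same group. The function name and inputs are illustrative.

```python
from itertools import combinations


def pairwise_true_positive_rate(true_authors, predicted_groups):
    """Fraction of same-author ad pairs that were also grouped together.

    true_authors[i] is the known author of ad i; predicted_groups[i] is the
    group label the method assigned to ad i. This pairwise definition is an
    assumption made for illustration.
    """
    same_author_pairs = 0
    correctly_grouped = 0
    for i, j in combinations(range(len(true_authors)), 2):
        if true_authors[i] == true_authors[j]:
            same_author_pairs += 1
            if predicted_groups[i] == predicted_groups[j]:
                correctly_grouped += 1
    if same_author_pairs == 0:
        raise ValueError("no same-author pairs to evaluate")
    return correctly_grouped / same_author_pairs
```

The rate says nothing about false positives, so a method that lumps every ad into one group would score perfectly. A full evaluation would pair it with a false-positive measure.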
“Sex trafficking of children hides in plain sight within the vast online escort environment. It’s difficult for investigators to sift through the mounds of data and figure out what is important and what is not when looking for a child,” said Julie Cordua, CEO of Thorn, a nonprofit organization working to prevent human trafficking. “This type of research is critical to advancing this work and helping investigators find children faster and reduce the time in trauma. We’re grateful to academics and researchers who are willing to lend their time and talent to this issue to help find new solutions that move this work forward.”
The researchers intend to refine their strategies in collaboration with law enforcement and nonprofit organizations.
Portnoff will present their paper, “Backpage and Bitcoin: Uncovering Human Traffickers,” this afternoon at the Association for Computing Machinery’s SIGKDD Conference on Knowledge Discovery and Data Mining, one of the world’s leading data mining conferences, which will publish the paper in its proceedings.
This work was supported by grants from the Amazon Web Services Cloud Credits for Research program, Giant Oak, Google, the National Science Foundation, and the U.S. Department of Education.
The researchers also wish to acknowledge Thorn, as well as Chainalysis for providing access to its platform for analyzing transactions on the Bitcoin blockchain.
About the New York University Tandon School of Engineering
The NYU Tandon School of Engineering dates to 1854, the founding date for both the New York University School of Civil Engineering and Architecture and the Brooklyn Collegiate and Polytechnic Institute (widely known as Brooklyn Poly). A January 2014 merger created a comprehensive school of education and research in engineering and applied sciences, rooted in a tradition of invention and entrepreneurship and dedicated to furthering technology in service to society. In addition to its main location in Brooklyn, NYU Tandon collaborates with other schools within NYU, the country’s largest private research university, and is closely connected to engineering programs at NYU Abu Dhabi and NYU Shanghai. It operates Future Labs focused on start-up businesses in downtown Manhattan and Brooklyn and an award-winning online graduate program. For more information, visit http://engineering.nyu.edu.