What is Outrage|Us?

Outrage|Us provides an objective, target-specific measure of anger and hate speech on Twitter.

Headlines proclaim the age of outrage

With all the noise around growing division and polarization worldwide, we often encounter unsubstantiated, qualitative statements about the increasing temperature of public discourse. But how do we, as a society, change something that is not measured?

That’s where Outrage|Us comes in. Outrage|Us provides a regular, target-specific measure of anger and hate speech on Twitter over time. Intended for journalists, activists, policy makers, and social scientists who want objective, consistent analysis rather than anecdotes, vague and sporadic polling, or generic positive/negative sentiment analysis, Outrage|Us cuts through the noise.

For occasional updates, follow us on Twitter.

How it Works

The detailed models and code can be found in the GitHub repo. Below, we show an overview of the classification of a single tweet.

[Figure: pipeline depicting the flow of a single tweet through the model]

Tweets are collected via the Twitter streaming API, which delivers a continuous random sample of public tweets. We collect approximately 700,000 tweets per day. Each of these tweets is then fed to both the anger and hate models.
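The actual collection code lives in the GitHub repo. Purely as a sketch of what sampling from the streaming API can look like, here is a minimal example using tweepy 3.x; the credentials are placeholders, the print handler stands in for whatever hands tweets to the models, and the English-only filter is our assumption rather than the project's documented setup.

```python
import tweepy  # tweepy 3.x; the Stream/StreamListener API changed in 4.0

# Placeholder credentials: substitute your own Twitter API keys.
CONSUMER_KEY, CONSUMER_SECRET = "...", "..."
ACCESS_TOKEN, ACCESS_SECRET = "...", "..."


class SampleListener(tweepy.StreamListener):
    """Receives each sampled tweet from the stream."""

    def on_status(self, status):
        # In the real pipeline, each tweet would be queued for the anger
        # and hate models; printing stands in for that step here.
        print(status.id_str, status.text)

    def on_error(self, status_code):
        # Returning False on HTTP 420 (rate limited) disconnects cleanly.
        return status_code != 420


auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_SECRET)

# The sample endpoint streams a small random slice of all public tweets.
# Restricting to English is an assumption, not necessarily the project's filter.
stream = tweepy.Stream(auth=auth, listener=SampleListener())
stream.sample(languages=["en"])
```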

The anger model classifies the tweet as angry or not, and then identifies the topic of the anger. Angry tweets comprise between 3% and 6% of the daily tweets we collect (roughly 21,000 to 42,000 tweets). Most of these tweets are simply angry, with no obvious topic; they remain unclassified by topic.

The hate model classifies the tweet as hate speech or not, and then identifies the target of that hate. Hate speech is by definition targeted at a group or individual.
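The real models are in the GitHub repo; purely to illustrate the two-stage shape both models share (detect first, then attribute a topic or target), here is a minimal scikit-learn sketch. The toy training sentences, labels, and classifier choice are invented placeholders, not the project's data or architecture.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Stage 1: binary detector (angry / not angry, or hate / not hate).
detector = make_pipeline(TfidfVectorizer(), LogisticRegression())
detector.fit(
    ["this is outrageous", "what a lovely day", "I am furious", "nice weather"],
    [1, 0, 1, 0],  # toy labels: 1 = angry, 0 = not
)

# Stage 2: attribution (topic of the anger, or target of the hate).
attributor = make_pipeline(TfidfVectorizer(), LogisticRegression())
attributor.fit(
    ["furious about the election results", "this traffic is infuriating"],
    ["politics", "traffic"],  # hypothetical topic labels
)

def classify(text):
    """Run one tweet through the two-stage pipeline."""
    if detector.predict([text])[0] == 0:
        return None  # not angry (or not hateful): no topic/target assigned
    return attributor.predict([text])[0]

print(classify("so furious about the election"))  # e.g. 'politics'
print(classify("what a lovely day outside"))      # None
```

One upside of this detect-then-attribute split is that the large majority of tweets, which are neither angry nor hateful, never reach the second stage.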

Once tweets are classified, Outrage|Us identifies relevant news headlines for annotation and selects representative sample tweets. All of this is then shown to you in the final visualization.
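The actual selection criteria are described in the FAQs. As one purely hypothetical scheme, offered only for intuition, representative tweets could be ranked by engagement (say, retweets plus favorites) within each target; both the field names and the engagement formula below are assumptions.

```python
from collections import defaultdict

def top_samples(tweets, k=3):
    """Hypothetical sampler: keep the k highest-engagement tweets per target.

    Each tweet is a dict with 'target', 'retweets', 'favorites', and 'text'
    keys; this is an illustrative sketch, not the project's method.
    """
    by_target = defaultdict(list)
    for t in tweets:
        by_target[t["target"]].append(t)
    return {
        target: sorted(ts, key=lambda t: t["retweets"] + t["favorites"],
                       reverse=True)[:k]
        for target, ts in by_target.items()
    }
```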

Read more about the definitions of hate and anger, and about how sample tweets and news headlines are selected, in the FAQs.

FAQs

Who should be interested in this tool?

What is 'hate'?

What is 'anger'?

How were the models trained?

What is engagement and how is it calculated?

How do we select sample tweets?

How do we select relevant headlines for annotation?

How are we funded?

The Team

Daniel Olmstead

Daniel Olmstead comes to data science as a way of marrying his interests in analysis and design. Over the last 20 years he has been a freelance full-stack web designer, a graphic artist, a dramaturg, and an equity analyst, but it wasn't until he embarked on the Master of Information and Data Science at UC Berkeley that he was finally able to do all the fun things in one place. An English major, Daniel has a deep love of language, and this project, combining linguistics, machine learning, NLP, and data visualization, represents the intersection and culmination of decades of study. You can see a sample of his other work at Omsteadily.com.


Mike Winton

Mike Winton is a data geek at heart, currently working as an engineering leader in Data Science & Engineering at Google for its Search and Assistant products. He has also been the engineering lead for data and metrics for Engineering Education at Google; led a software engineering team for wearables and mobile at Motorola; and founded Google's Developer Relations organization. Prior to Google, he worked at manufacturing and enterprise software companies. He will complete his Master's in Data Science from UC Berkeley in August 2019. He has an earlier Master's in Materials Science, also from Berkeley, and a Bachelor's from the University of Michigan. You can contact him on LinkedIn at https://linkedin.com/in/michaelwinton/.


Ram Iyer

Ram Iyer has 19 years of software development and management experience in the electronic design automation (EDA) industry. He has a strong track record of developing and deploying complex technology to improve customer productivity, and he is passionate about ML/AI and big data technologies. He holds a Bachelor's in Computer Science and has taken several graduate-level courses at Texas A&M and Stanford. He will graduate with a Master's degree in Data Science from UC Berkeley in August 2019.


Alla Hale

Alla Hale creates new products and is passionate about using data to inform design decisions. She received a Bachelor of Chemical Engineering degree from the University of Minnesota, and will complete the Master of Information and Data Science at UC Berkeley in August 2019. Her projects have ranged widely, from photorefractive polymers for updatable holography to semiconductor assembly to epoxy resins for wind turbines, but what drives her is the data-driven new product design process. In addition to hiking, she likes to play with sticks, string, and data.


Testimonials

These are some ways people have told us they envision using Outrage|Us. Please share your terrific use cases with us!

"Outrage|Us gives journalists another source of information about rising levels of hate against specific communities. This data is more granular and real-time, offering the possibility of sounding warnings before rising levels of hate show up in hate crimes data."

“As a member of one of the hated minority groups this is a very useful barometer to gauge when hate is reaching a critical threshold.”

“I teach undergraduates, and I can easily imagine using this as a tool in some of my classes. I certainly love it as a case study in ‘visual representation of data,’ but I think its utility transcends that limited description.”

“I might use this in my classroom to talk about current events.”

“It’s hard to know if my bubble on social media is representative.”

"I oversee a social media team for a national nonprofit. We frequently have to decide whether some outrageous tweet or hateful attack necessitates a formal statement of condemnation from our organization. Outrage|Us would help us confirm when there is significant public upset."

Acknowledgements

This project was started as part of the Capstone course for UC Berkeley’s Master of Information and Data Science program. We would like to acknowledge our instructors, Alberto Todeschini and Stanislav Kelman, for their continued encouragement and persistent requests for forward progress.

We would also like to acknowledge those who provided their time for project feedback or user interviews to help us produce a useful tool. Thank you to Alex Hughes, Ben Arnoldy, Stefan Wojcik, Christian Anthony, Patrick van Kessel, Zachary Steinert-Threlkeld, Armand Kok, Samantha Mah, Kim Darnell, Emily Rapport, and many of our family, friends, and colleagues.
