Counting on Twitter: Harvard's Web Ecology Project (Part One)

Anyone who has read my blog for long knows that I am not big on counting things. Some of it is that I have math anxiety -- a serious vulnerability for someone who spent the first 20 years of his career at MIT! Some of it is that I think people often act as if counting things is the same thing as analyzing things, or as if the only things that count are things that can be counted. I often wage a one-man struggle against the push to quantify the universe -- perhaps as if (arbitrary science fiction reference warning) the world would end if we could just capture all of the billions of names of God. That said, I find myself mellowing more than a little now that I am at USC, am watching my former graduate students struggle to grasp quantitative methods, and am getting to know some of my office mates and colleagues who count things for a living.

There is a particular value in trying to understand the scale on which certain changes in our communication environment are occurring -- at least to capture some order of their magnitude. That's why I have been following with some interest the emergence of a research team at Harvard focused on understanding Twitter and its place in the "web ecology." Many members of the team are graduate students I worked with in a range of capacities during my time at MIT, and I have come to value their insights into digital media. Their data is already helping me to reframe some of the thinking I am doing about spreadable media, and knowing how many people now come to this blog through my tweets, my bet is that you will find what they are doing interesting as well.

In this first installment, the responses come from Dharmishta Rood, whom I met through the Knight News Challenge a few years ago and who took several of my classes during my final year at MIT. I featured one of her essays on the blog last spring. Next time, she will be joined by some other members of the research team.

What do you mean by web ecology? What does the name of your group tell us about the assumptions guiding your research?

We summarize our research by the statement that Web Ecology studies the relationship of the nature of data and the behavior of actors on the internet.

Rather than approaching the Internet from within a single field such as Sociology, the Humanities, Business or Media Studies, Web Ecology focuses on the Web itself, combining methodologies from multiple, often interrelated disciplines to decipher activity online both quantitatively and qualitatively. In our personal research practices we frequently use large-scale data mining to inform our research questions and to further our understanding of the cultures and communities evident online. In addition to providing quantitative analysis of the social layer of the web, we see our role as Web Ecologists as providing open tools for the wider community of researchers. We also see the value of this type of research for businesses interested in marketing and online presence.

What can you tell us about the core methodology you are applying to understanding how Twitter works?

We try to break down Twitter into quantifiable interactions. We understand that there are many factors outside of Twitter: time-specific ones, such as breaking news, the hour of a TV show or a holiday, as well as new trends and information being spread throughout the web. We try to look at all of it within the ecosystem of Twitter itself. At Web Ecology we try to look at what we can measure, namely retweets, mentions, @replies, #hashtags and common keywords within the sea of tweets.
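For readers curious about what counting these interactions might look like in practice, here is a minimal Python sketch. It assumes tweets are available as plain text strings; the regular expressions and the function names `classify` and `tally` are illustrative assumptions, not the Web Ecology Project's actual code.

```python
import re
from collections import Counter

# Illustrative patterns (assumptions, not the project's actual rules).
RETWEET = re.compile(r"(?:^|\s)RT\s+@\w+", re.IGNORECASE)
MENTION = re.compile(r"@(\w+)")
HASHTAG = re.compile(r"#(\w+)")


def classify(tweet):
    """Return the measurable features found in one tweet's text."""
    return {
        "is_retweet": bool(RETWEET.search(tweet)),
        "is_reply": tweet.startswith("@"),
        "mentions": [m.lower() for m in MENTION.findall(tweet)],
        "hashtags": [h.lower() for h in HASHTAG.findall(tweet)],
    }


def tally(tweets):
    """Count how many tweets show each kind of interaction."""
    counts = Counter()
    for text in tweets:
        features = classify(text)
        counts["retweets"] += int(features["is_retweet"])
        counts["replies"] += int(features["is_reply"])
        counts["with_mentions"] += int(bool(features["mentions"]))
        counts["with_hashtags"] += int(bool(features["hashtags"]))
    return counts
```

Simple patterns like these will miscount some edge cases (an email address looks like a mention, for instance), so a sketch of this kind is a starting point for measurement rather than a finished instrument.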

We understand that the web is constrained by various forces and configurations. Rather than a utopian or deterministic perspective, Web Ecology recognizes that the web is not limitless or truly divorceable from various geographic, social, historical, and other realities.

Web Ecology endorses the systematic creation and testing of models, which leads us to a heavily quantitative approach that can then be paired with a qualitative exploration of the findings. We also don't dismiss Internet phenomena as transient cultural fads. We try to view cultural creation on the Internet as impartially as possible, and we see code and users as parts of an inseparable aggregate web phenomenon.

Some of your earliest results dealt with the role of Twitter in the aftermath of the Iran elections. What kinds of data emerged from your investigation? What did that tell us we didn't already know about the Twitter traffic surrounding these events?

Our report cites much of the popular media coverage that both coined the term "Twitter Revolution" for Iran and criticized its hasty declaration.

Using 12 keywords and hashtags, we found that 58% of relevant Twitter conversation did NOT contain the common hashtag #iranelection. Tracking the wider set of terms allowed us to get a much more comprehensive overview of the Twittersphere during the Iranian election.
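The arithmetic behind a figure like this can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the report's 12 terms are not listed here, so the `terms` passed in are placeholders the caller supplies, and relevance is reduced to a simple case-insensitive substring match.

```python
def is_relevant(tweet, terms):
    """True if the tweet contains any tracked term (case-insensitive)."""
    text = tweet.lower()
    return any(term.lower() in text for term in terms)


def share_missing_hashtag(tweets, terms, hashtag):
    """Fraction of relevant tweets that do NOT contain `hashtag`."""
    relevant = [t for t in tweets if is_relevant(t, terms)]
    if not relevant:
        return 0.0
    missing = sum(1 for t in relevant if hashtag.lower() not in t.lower())
    return missing / len(relevant)
```

The point of the calculation is that a single popular hashtag can miss a large part of the conversation, which only becomes visible once a wider set of terms defines what counts as relevant.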

One of the most interesting findings to emerge from the report was the conjunction of two facts: the top 10% of users in our study accounted for 65.5% of total tweets, and one in four tweets was a retweet of another user's content, showing that the users who tweet the most are not always the most influential.
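Both of these measures are easy to compute once each tweet is paired with its author. The Python sketch below is a hedged illustration, not the report's method: it assumes a list with one author name per tweet, treats any text beginning with "RT @" as a retweet, and uses hypothetical function names.

```python
from collections import Counter


def top_share(authors, fraction=0.10):
    """Share of all tweets written by the most active `fraction` of users.

    `authors` holds one entry per tweet (the name of whoever posted it).
    """
    per_user = Counter(authors)
    if not per_user:
        return 0.0
    # Number of users in the top group, at least one.
    k = max(1, round(len(per_user) * fraction))
    top_total = sum(count for _, count in per_user.most_common(k))
    return top_total / len(authors)


def retweet_fraction(tweets):
    """Fraction of tweets whose text starts with 'RT @' (a simplification)."""
    if not tweets:
        return 0.0
    return sum(1 for t in tweets if t.startswith("RT @")) / len(tweets)
```

Comparing the two numbers side by side is what makes the finding interesting: volume is concentrated in a small group of accounts, while retweets show whose content other users chose to pass along.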

[Image: twitter mj_dies(2).jpg]

You've also looked at the Twitter traffic following Michael Jackson's death. What similarities and differences did you find in the discussion surrounding these two events?

As with the Iran election, tweets about Michael Jackson's death were spread across many keywords. One of the most interesting findings was the trajectory of each event over the Twittersphere. In the case of Michael Jackson's death, there were over 279,000 tweets within the first hour of mainstream news reports, whereas with the Iranian election, there were 2,024,166 tweets in total (over eighteen days), but never more than 17,500 tweets in any given hour. Their volume fluctuated during times of unrest.
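A comparison like this rests on grouping tweets by the hour in which they were posted. Here is a minimal Python sketch, assuming each tweet comes with a `datetime` timestamp; the function names are hypothetical and this is not the project's own tooling.

```python
from collections import Counter
from datetime import datetime


def hourly_counts(timestamps):
    """Count tweets per clock hour, keyed by the start of each hour."""
    return Counter(
        ts.replace(minute=0, second=0, microsecond=0) for ts in timestamps
    )


def peak_hour(timestamps):
    """Return (hour_start, tweet_count) for the busiest hour, or None."""
    counts = hourly_counts(timestamps)
    if not counts:
        return None
    return counts.most_common(1)[0]
```

Plotting the hourly counts over time would show the shape of each event: a single sharp spike in one case, a lower but longer-lasting series of peaks in the other.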

Since the excitement on Twitter decreased over time, especially after the first hour, the type of content was inherently very different. We spent time hand-coding tweets (in the social science sense: having individuals read and analyze the tweets according to certain metrics) rather than strictly doing data analysis. The Michael Jackson report sought to understand sentiment on Twitter, rather than the trajectory of a real-time event spanning many days.

[Image: twitter mj_iran(2).jpg]

How important is retweeting to the ecology of the web?

Within Twitter specifically, retweeting is only one of the many ways people can interact with content. It becomes important when new audiences see content from users they do not follow, but another important feature of Twitter is search. Users following a particular topic of interest can come across new content to consume and share.

What do you think Twitter is doing that is different from other kinds of social networks?

Twitter allows users to follow one another asymmetrically, meaning that users do not have to follow those who follow them. From this an interesting dynamic emerges wherein follower counts are meaningful in a different way than the number and type of people a user follows. A user is often valued more for the number of followers: an account that follows immensely more users than follow it is likely spam, whereas a user like Ashton Kutcher (@aplusk) follows only ~300 users but has almost 4,000,000 followers.
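The asymmetry described here can be expressed as a simple ratio. The Python sketch below is only an illustration: the `threshold` value is an arbitrary assumption chosen for the example, not a rule used by Twitter or by the Web Ecology Project, and a real spam check would need far more than this one signal.

```python
def follow_ratio(followers, following):
    """Followers per account followed; higher means a more one-sided audience."""
    return followers / max(following, 1)


def looks_like_spam(followers, following, threshold=0.1):
    """Flag accounts that follow far more users than follow them back.

    `threshold` is a hypothetical cutoff picked for illustration only.
    """
    return follow_ratio(followers, following) < threshold
```

With the approximate figures quoted above, `follow_ratio(4_000_000, 300)` comes out above 13,000, while an account following thousands of users with only a handful of followers falls well below the cutoff.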

Twitter, as it has been called many times over, is a "micro blogging service," meaning the updates contain news and information like blogs, but with many fewer characters. This micro-update style has become a relevant part of other social networks, both during and after the growth of Twitter's userbase.

Dharmishta Rood is Director of Research Relations at the Web Ecology Project and a recent graduate of Harvard's Graduate School of Education. Her work deals with large-scale and interpersonal communication systems like social networks and news. These types of platforms allow users to generate and consume information in ways that further social connections and learning. She is a 2008 Knight News Challenge winner for Populous Project, a free and open-source platform for online news, holds a degree in Design | Media Arts from UCLA and is a Fellow at the Center for Future Civic Media. She tweets @dharmishta and blogs at dharmishta.com.
