Information flow reveals prediction limits in online social activity
James P. Bagrow, Xipei Liu, and Lewis Mitchell
Department of Mathematics & Statistics, University of Vermont, Burlington, VT, United States
Abstract
Modern society depends on the flow of information over online social networks, and popular social platforms now generate significant behavioral data. Yet it remains unclear what fundamental limits may exist when using these data to predict the activities and interests of individuals. Here we apply tools from information theory to estimate the predictive information content of the writings of Twitter users and to what extent that information flows between users. Distinct temporal and social effects are visible in the information flow, and these estimates provide a fundamental bound on the predictive accuracy achievable with these data. Due to the social flow of information, we estimate that approximately 95% of the potential predictive accuracy attainable for an individual is available within the social ties of that individual only, without requiring the individual’s data.
How did they go about it?
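The excerpt does not say which information-theoretic estimator is used, so the following is only a minimal sketch of one common way to turn "predictive information content" into a number. It assumes a Lempel-Ziv style match-length estimator of the entropy rate of a token sequence, and then uses Fano's inequality to convert that entropy rate into an upper bound on prediction accuracy. The function names (`match_length`, `entropy_rate`, `max_predictability`) and the convention that a match must lie entirely in the past are assumptions made for illustration, not details taken from the paper.

```python
import math
from typing import Sequence


def match_length(seq: Sequence, i: int) -> int:
    """Length of the shortest run starting at i that never occurs within seq[:i]."""
    n = len(seq)
    longest = 0
    for j in range(i):
        k = 0
        # Extend the match while it stays inside the past (j + k < i).
        while i + k < n and j + k < i and seq[j + k] == seq[i + k]:
            k += 1
        longest = max(longest, k)
    return longest + 1


def entropy_rate(seq: Sequence) -> float:
    """Match-length estimate of the entropy rate, in bits per token."""
    n = len(seq)
    total = sum(match_length(seq, i) for i in range(n))
    return n * math.log2(n) / total


def _fano_rhs(p: float, z: int) -> float:
    """Right-hand side of Fano's inequality for accuracy p and z possible tokens."""
    if p >= 1.0:
        return 0.0
    binary = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
    return binary + (1 - p) * math.log2(z - 1)


def max_predictability(h: float, z: int) -> float:
    """Largest accuracy p consistent with entropy rate h over z tokens (z >= 2)."""
    if h <= 0:
        return 1.0
    if h >= math.log2(z):
        return 1.0 / z
    lo, hi = 1.0 / z, 1.0
    for _ in range(60):  # bisection; _fano_rhs decreases on [1/z, 1]
        mid = (lo + hi) / 2
        if _fano_rhs(mid, z) > h:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

In this sketch a user's posts would be tokenized into words, `entropy_rate` applied to the resulting sequence, and `max_predictability` called with the vocabulary size as `z`. A lower entropy rate gives a higher ceiling on how often the next word could be guessed correctly.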
We gathered a dataset of n = 927 users of the Twitter social media platform. Users were selected who wrote in English, were active for at least one year, and had comparably sized social networks. We applied both computational tools and human raters to help avoid bots and non-personal accounts. For each user, we retrieved all of their public posts excluding retweets (up to the 3200 most recent public posts, as allowed by Twitter). Examining these texts, we determined each user’s 15 most frequent Twitter contacts and gathered the texts of those users as well, providing us with ego-alter pairs. See Supporting Material (SM) for full details on data collection, filtering, and processing.
Tweetsters rejoice: even those not present in the data can be profiled.
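The step of finding each user's 15 most frequent contacts can be illustrated with a short sketch. It assumes that a "contact" is anyone the user addresses with an @-mention in their own posts; the excerpt does not spell out the exact rule, and the function name `top_contacts` and its parameters are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, List, Optional

# An @-handle that is not preceded by a word character (skips e-mail addresses).
MENTION = re.compile(r"(?<!\w)@(\w+)")


def top_contacts(tweets: Iterable[str], k: int = 15,
                 exclude: Optional[str] = None) -> List[str]:
    """Return the k handles mentioned most often across a user's tweets."""
    ego = exclude.lower() if exclude else None
    counts: Counter = Counter()
    for text in tweets:
        for handle in MENTION.findall(text):
            handle = handle.lower()
            if handle != ego:  # ignore self-mentions
                counts[handle] += 1
    return [handle for handle, _ in counts.most_common(k)]
```

Calling `top_contacts(posts, exclude="some_user")` on one user's posts would give the alters whose own timelines are then collected to form the ego-alter pairs.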
In summary, the ability to repeatedly and accurately predict the text of individuals provides considerable value to the providers of social media, allowing them to develop profiles to identify and track individuals and even manipulate information exposure. That information is so strongly embedded socially underscores the power of the social network: by knowing who are the social ties of an individual and what are the activities of those ties, our results show that one can in principle accurately profile even those individuals who are not present in the data.
PDF here