From I to We: Group Formation and Linguistic Adaption in an Online Xenophobic Forum


Venue: Journal of Social and Political Psychology

Quick takeaway:

  • Linguistic study of a xenophobic online discussion forum using Pennebaker’s LIWC text-analysis system. Users who stay in the group shift from individual to group pronouns and align linguistically with the forum. Cognitive complexity also appears to decrease as users align with the group.


  • Much of identity formation processes nowadays takes place online, indicating that intergroup differentiation may be found in online communities. This paper focuses on identity formation processes in an open online xenophobic, anti-immigrant, discussion forum. Open discussion forums provide an excellent opportunity to investigate open interactions that may reveal how identity is formed and how individual users are influenced by other users. Using computational text analysis and Linguistic Inquiry Word Count (LIWC), our results show that new users change from an individual identification to a group identification over time as indicated by a decrease in the use of “I” and increase in the use of “we”. The analyses also show increased use of “they” indicating intergroup differentiation. Moreover, the linguistic style of new users became more similar to that of the overall forum over time. Further, the emotional content decreased over time. The results indicate that new users on a forum create a collective identity with the other users and adapt to them linguistically.


  • Social influence is broadly defined as any change – emotional, behavioral, or attitudinal – that has its roots in others’ real or imagined presence (Allport, 1954). (pg 77)
  • Regardless of why an individual displays an observable behavioral change that is in line with group norms, social identification with a group is the basis for the change. (pg 77)
  • In social psychological terms, a group is defined as more than two people that share certain goals (Cartwright & Zander, 1968). (pg 77)
  • Processes of social identification, intergroup differentiation and social influence have to date not been studied in online forums. The aim of the present research is to fill this gap and provide information on how such processes can be studied through language used on the forum. (pg 78)
  • The popularity of social networking sites has increased immensely during the last decade. At the same time, offline socializing has shown a decline (Duggan & Smith, 2013). Now, much of the socializing actually takes place online (Ganda, 2014). In order to be part of an online community, the individual must socialize with other users. Through such socializing, individuals create self-representations (Enli & Thumim, 2012). Hence, the processes of identity formation, may to a large extent take place on the Internet in various online forums. (pg 78)
  • For instance, linguistic analyses of American Nazis have shown that use of third person plural pronouns (they, them, their) is the single best predictor of extreme attitudes (Pennebaker & Chung, 2008). (pg 79)
  • Because language can be seen as behavior (Fiedler, 2008), it may be possible to study processes of social influence through linguistic analysis. Thus, our second hypothesis is that the linguistic style of new users will become increasingly similar to the linguistic style of the overall forum over time (H2). (pg 79)
  • This indicates that the content of the posts in an online forum may also change over time as arguments become more fine-tuned and input from both supporting and contradicting members are integrated into an individual’s own beliefs. This is likely to result (linguistically) in an increase in indicators of cognitive complexity. Hence, we hypothesize that the content of the posts will change over time, such that indicators of complex thinking will increase (H3a). (pg 80)
    • I’m not sure what to think about this. I expect from dimension reduction, that as the group becomes more aligned, the overall complex thinking will reduce, and the outliers will leave, at least in the extreme of a stampede condition.
  • This result indicates that after having expressed negativity in the forum, the need for such expressions should decrease. Hence, we expect that the content of the posts will change such that indicators of negative emotions will decrease, over time (H3b). (pg 80)
  • the forum is presented as a “very liberal forum”, where people are able to express their opinions, whatever they may be. This “extreme liberal” idea implies that there is very little censorship, which has resulted in that the forum is highly xenophobic. Nonetheless, due to its liberal self-presentation, the xenophobic discussions are not unchallenged. For example, also anti-racist people join this forum in order to challenge individuals with xenophobic attitudes. This means that the forum is not likely to function as a pure echo chamber, because contradicting arguments must be met with own arguments. Hence, individuals will learn from more experienced users how to counter contradicting arguments in a convincing way. Hence, they are likely to incorporate new knowledge, embrace input and contribute to evolving ideas and arguments. (pg 81)
    • Open debate can lead to the highest level of polarization (M&D)
    • There isn’t a diversity of opinion here. The conversation is polarized, with opponents pushing towards the opposite pole. The question I’d like to see answered is whether extremism has increased in the forum.
  • Natural language analyses of anonymous social media forums also circumvent social desirability biases that may be present in traditional self-rating research, which is a particular important concern in relation to issues related to outgroups (Maass, Salvi, Arcuri, & Semin, 1989; von Hippel, Sekaquaptewa, & Vargas, 1997, 2008). The to-be analyzed media uses “aliases”, yielding anonymity of the users and at the same time allow us to track individuals over time and analyze changes in communication patterns. (pg 81)
    • After seeing “Ready Player One”, I also wonder if the aliases themselves could be looked at using an embedding space built from the terms used by the users? Then you get distance measurements, t-sne projections, etc.
  • Linguistic Inquiry Word Count (LIWC; Pennebaker et al., 2007; Chung & Pennebaker, 2007; Pennebaker, 2011b; Pennebaker, Francis, & Booth, 2001) is a computerized text analysis program that computes a LIWC score, i.e., the percentage of various language categories relative to the number of total words (see also (pg 81)
    • LIWC2015 ($90) is the gold standard in computerized text analysis. Learn how the words we use in everyday language reveal our thoughts, feelings, personality, and motivations. Based on years of scientific research, LIWC2015 is more accurate, easier to use, and provides a broader range of social and psychological insights compared to earlier LIWC versions
  • Figure 1c shows words overrepresented in later posts, i.e. words where the usage of the words correlates positively with how long the users has been active on the forum. The words here typically lack emotional content and are indicators of higher complexity in language. Again, this analysis provides preliminary support for the idea that time on the forum is related to more complex thinking, and less emotionality.
    • WordCloud
  • The second hypothesis was that the linguistic style of new users would become increasingly similar to other users on the forum over time. This hypothesis is evaluated by first z-transforming each LIWC score, so that each has a mean value of zero and a standard deviation of one. Then we measure how each post differs from the standardized values by summing the absolute z-values over all 62 LIWC categories from 2007. Thus, low values on these deviation scores indicate that posts are more prototypical, or highly similar, to what other users write. These deviation scores are analyzed in the same way as for Hypothesis 1 (i.e., by correlating each user score with the number of days on the forum, and then t-testing whether the correlations are significantly different from zero). In support of the hypothesis, the results show an increase in similarity, as indicated by decreasing deviation scores (Figure 2). The mean correlation coefficient between this measure and time on the forum was -.0086, which is significant, t(11749) = -3.77, p < 0.001. (pg 85)
    • [Figure: ForumAlignment] I think it is reasonable to consider this a measure of alignment.
  • Because individuals form identities online and because we see this in the use of pronouns, we also expected to see tendencies of social influence and adaption. This effect was also found, such that individuals’ linguistic style became increasingly similar to other users’ linguistic style over time. Past research has shown that accommodation of communication style occurs automatically when people connect to people or groups they like (Giles & Ogay 2007; Ireland et al., 2011), but also that similarity in communicative style functions as cohesive glue within a group (Reid, Giles, & Harwood, 2005). (pg 86)
  • Still, the results could not confirm an increase in cognitive complexity. It is difficult to determine why this was not observed even though a general trend to conform to the linguistic style on the forum was observed. (pg 87)
    • This is what I would expect. As alignment increases, complexity, as expressed by higher dimensional thinking should decrease.
  • This idea would also be in line with previous research that has shown that expressing oneself decreases arousal (Garcia et al., 2016). Moreover, because the forum is not explicitly racist, individuals may have simply adapted to the social norms on the forum prescribing less negative emotional displays. Finally, a possible explanation for the decrease in negative emotional words might be that users who are very angry leave the forum, because of its non-racist focus, and end up in more hostile forums. An interesting finding that was not part of the hypotheses in the present research is that the third person plural category correlated positively with all four negative emotions categories, suggesting that people using for example ‘they’ express more negative emotions (pg 87)
  • In line with social identity theory (Tajfel & Turner, 1986), we also observe linguistic adaption to the group. Hence, our results indicate that processes of identity formation may take place online. (pg 87)
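
The Hypothesis 2 analysis quoted above (pg 85) is compact enough to sketch in code. This is my own reading of the description, not the authors' implementation: the function names, the use of the sample standard deviation, and the rule that a user needs at least three posts to get a correlation are all assumptions. It standardizes each LIWC category, sums the absolute z-values per post, correlates that deviation score with days on the forum for each user, and then computes a one-sample t statistic over the per-user correlations.

```python
import math
from statistics import mean, stdev


def zscore_columns(rows):
    """Standardize each column (one LIWC category) to mean 0, SD 1 over all posts."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [stdev(c) for c in cols]  # assumption: sample SD; the paper does not say
    return [[(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, mus, sds)]
            for row in rows]


def deviation_scores(rows):
    """Sum of absolute z-values per post; low values = more prototypical posts."""
    return [sum(abs(z) for z in zrow) for zrow in zscore_columns(rows)]


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def one_sample_t(values):
    """t statistic and degrees of freedom for H0: the mean of values is zero."""
    n = len(values)
    return mean(values) / (stdev(values) / math.sqrt(n)), n - 1


def alignment_test(posts, min_posts=3):
    """posts: list of (user, days_on_forum, liwc_scores).

    Returns (mean per-user correlation, t statistic, degrees of freedom).
    min_posts is a hypothetical inclusion rule, not taken from the paper.
    """
    devs = deviation_scores([scores for _, _, scores in posts])
    by_user = {}
    for (user, days, _), dev in zip(posts, devs):
        by_user.setdefault(user, []).append((days, dev))
    correlations = []
    for pairs in by_user.values():
        days, ds = zip(*pairs)
        # Skip users whose correlation would be undefined (no variance).
        if len(pairs) >= min_posts and len(set(days)) > 1 and len(set(ds)) > 1:
            correlations.append(pearson(days, ds))
    t, df = one_sample_t(correlations)
    return mean(correlations), t, df
```

A negative mean correlation means deviation scores shrink with time on the forum, which is the direction the paper reports. The paper's df of 11749 would correspond to 11750 users each contributing one correlation.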

Influence of augmented humans in online interactions during voting events

  • Massimo Stella (Scholar)
  • Marco Cristoforetti (Scholar)
  • Abstract: Overwhelming empirical evidence has shown that online social dynamics mirrors real-world events. Hence, understanding the mechanisms leading to social contagion in online ecosystems is fundamental for predicting, and even maneuvering, human behavior. It has been shown that one of such mechanisms is based on fabricating armies of automated agents that are known as social bots. Using the recent Italian elections as an emblematic case study, here we provide evidence for the existence of a special class of highly influential users, that we name “augmented humans”. They exploit bots for enhancing both their visibility and influence, generating deep information cascades to the same extent of news media and other broadcasters. Augmented humans uniformly infiltrate across the full range of identified clusters of accounts, the latter reflecting political parties and their electoral ranks.
  • Bruter and Harrison [19] shift the focus on the psychological influence that electoral arrangements exert on voters by altering their emotions and behavior. The investigation of voting from a cognitive perspective leads to the concept of electoral ergonomics: Understanding optimal ways in which voters emotionally cope with voting decisions and outcomes leads to a better prediction of the elections. (pg 1)
  • Most of the Twitter interactions are from humans to bots (46%); Humans tend to interact with bots in 56% of mentions, 41% of replies and 43% of retweets. Bots interact with humans roughly in 4% of the interactions, independently on interaction type. This indicates that bots play a passive role in the network but are rather highly mentioned/replied/retweeted by humans. (pg 2)
  • bots’ locations are distributed worldwide and they are present in areas where no human users are geo-localized such as Morocco.  (pg 2)
  • Since the number of social interactions (i.e., the degree) of a given user is an important estimator of the influence of user itself in online social networks [17–22], we consider a null model fixing users’ degree while randomizing their connections, also known as configuration model [23, 24]. (pg 2)
  • During the whole period, bot-bot interactions are more likely than random (Δ > 0), indicating that bots tend to interact more with other bots rather than with humans (Δ < 0) during Italian elections. Since interactions often encode the spread of a given content online [16], the positive assortativity highlights that bots share contents mainly with each other and hence can resonate with the same content, be it news or spam. (pg 2)
  • Differently from previous works, where the semantic content of bots and humans differs in its emotional polarity [12], in here we find that bots mainly repeat the same political content of human users, thus boosting the spreading of hashtags strongly related to the electoral process, such as hashtags referring to the government or to political victory, names of political parties or names of influential politicians (see also 3). (pg 4)
  • Frequencies of individual hashtags during the whole electoral process display some interesting shifts, reported in Table III (Top). For instance, the hashtag #exitpoll, indicating the electoral outcome, becomes 10000 times more frequent on the voting day than before March 4. These shifts indicate that the frequency of hashtags reflects real-world events, thus underlining the strong link between online social dynamics and the real-world electoral process. (pg 4)
  • TABLE II. Top influencers are mostly bots. Hubs characterize influential users and broadcasters in online social systems [17], hence we use degree rankings for identifying the most influential users in the network. (pg 5)
  • bots are mostly influential nodes which tend to interact mostly with other bots rather than humans and, when they interact with human users, they preferentially target the most influential ones. (pg 5)
  • we first filter the network by considering only pair of users with at least one retweet, with either direction, because re-sharing content it is often a good proxy of social endorsement [21]. However, Retweets alone are not sufficient to wash out the noise intrinsic to systems like Twitter, therefore we apply a more selective restriction, by requiring that at least another social action – i.e., either mention or reply – must be present in addition to a retweet [12]. This restrictive selection allows one to filter out all spurious interactions among users with the advantage of not requiring any thresholding approach with respect to the frequency of interactions themselves. (pg 5)
  • The resulting network is what we call the social bulk, i.e. a network core of endorsement and exchange among users. By construction, information flows among users who share strong social relationships and are characterized by similar ideologies: in fact, when a retweet goes from one user to another one, both of them are endorsing the same content, thus making non-directionality a viable approach for representing the endorsement related to content sharing. (pg 5)
  • Fiedler partitioning
  • The relevant literature has used the term “cyborg” for identifying indistinctly bot-assisted human or human-assisted bot accounts generating spam content over social platforms such as Twitter [5, 35]. Here, we prefer to use the term “augmented human” for indicating specifically those human accounts exploiting bots for artificially increasing, i.e. augmenting, their influence in online social platforms, analogously to physical augmentation improving human performances in the real world [36]. (pg 8)
  • Baseline social behavior is defined by the medians of the two observables, like shown in Fig. 6c. This map allows to easily identify four categories of individuals in the social dynamics: i) hidden influentials, generating information cascades rapidly spreading from a small number of followers; ii) influentials, generating information cascades rapidly spreading from a large number of followers; iii) broadcasters, generating information cascades slowly spreading from a large number of followers; iv) common users, generating information cascades slowly spreading from a small number of followers. (pg 9)
  • Hidden influentials, known to be efficient spreaders in viral phenomena [45], are mostly humans: in this category falls the augmented humans, assisted by social bots to increase their online visibility. (pg 10)
  • We define augmented humans as human users having at least 50% + 1 of bot neighbours in the social bulk. We discard users having less than 3 interactions in the social bulk. (pg 10)
  • The most central augmented human in terms of number of social interactions is Utente01, which interacts with 2700 bots and 55 humans in the social bulk. (pg 10)
  • The above cascade analysis reveals that almost 2 out of 3 augmented humans resulted playing an important role in the flow of online content: 67% of augmented humans were either influentials or hidden influentials or broadcasters. These results strongly support the idea that via augmentation even common users can become social influencers without having a large number of followers/friends but rather by recurring to the aid of either armies of bots (e.g., Utente01, a hidden influential) or the selection of a few key helping bots. (pg 11)
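
To keep the definitions above straight, here is a minimal sketch of three steps from the quoted passages: the social bulk filter (pg 5), the augmented-human rule (pg 10), and the median-based map of four user categories (pg 9). It is my interpretation, not the authors' code. The function names and input formats are hypothetical. I read "at least 50% + 1 of bot neighbours" as a strict majority of bulk neighbours, I treat "interactions in the social bulk" as the number of bulk neighbours, and I take the two observables to be follower count and a cascade speed score, with "large" and "rapid" meaning strictly above the median.

```python
from collections import defaultdict
from statistics import median


def social_bulk(interactions):
    """interactions: iterable of (source, target, kind) with kind in
    {"retweet", "mention", "reply"}.

    Keeps an undirected link between two users only if the pair has at least
    one retweet (either direction) and at least one mention or reply.
    """
    kinds = defaultdict(set)
    for src, dst, kind in interactions:
        if src != dst:
            kinds[frozenset((src, dst))].add(kind)
    adj = defaultdict(set)
    for pair, seen in kinds.items():
        if "retweet" in seen and seen & {"mention", "reply"}:
            a, b = tuple(pair)
            adj[a].add(b)
            adj[b].add(a)
    return dict(adj)


def augmented_humans(adj, bots, min_interactions=3):
    """Humans whose bulk neighbours are mostly bots (assumed: strict majority).

    Users with fewer than min_interactions bulk neighbours are discarded.
    """
    found = set()
    for user, neighbours in adj.items():
        if user in bots or len(neighbours) < min_interactions:
            continue
        n_bots = sum(1 for n in neighbours if n in bots)
        if 2 * n_bots > len(neighbours):
            found.add(user)
    return found


def classify_spreaders(users):
    """users: {name: (followers, cascade_speed)}; both observables are assumptions.

    Splits users at the median of each observable into the four categories.
    """
    f_med = median(f for f, _ in users.values())
    s_med = median(s for _, s in users.values())
    labels = {}
    for name, (followers, speed) in users.items():
        many, fast = followers > f_med, speed > s_med
        if fast:
            labels[name] = "influential" if many else "hidden influential"
        else:
            labels[name] = "broadcaster" if many else "common user"
    return labels
```

Because the filter requires two different kinds of action rather than a minimum count, no frequency threshold is needed, which is the advantage the authors point out.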