How social media language can help us predict user personality
- LindaKKaye
As more and more of our everyday interactions and behaviours take place online, this presents psychologists with important questions around how we understand each other based on online behaviour and how this relates to interpersonal interactions.
There are various ways we can infer a user’s personality from their content on social media. A study by Farnadi et al. (2016) did exactly this, using machine learning algorithms to perform a comparative analysis of various state-of-the-art computational personality recognition methods on a varied set of social media data from Facebook, Twitter and YouTube.
So what did they do?
The researchers garnered user data from three large social media platforms as well as personality questionnaire scores, based on the Big-5 personality traits. The platforms and associated data included:
YouTube - 404 vlogs, each with a full transcript and the vlogger’s gender. The goal was to explore how speaking style and visual behaviour related to perceived personality
Facebook - 3,731 English-speaking Facebook users, with metrics such as status updates, group memberships, average network size, and likes
Twitter - tweets from 102 Twitter users
How did they do it?
The researchers extracted linguistic and emotional features from the data using various text analysis tools. These included:
LIWC (Linguistic Inquiry and Word Count) - extracts features including standard counts (e.g., word count), psychological processes (e.g., anger words), relativity (e.g., future-tense verbs), and other linguistic dimensions
NRC - a lexicon that categorises words by emotion and sentiment (positive, negative)
MRC - a psycholinguistic database containing psychological and distributional information about words based on various properties (e.g., number of syllables, number of letters)
SentiStrength - scores text on a sentiment scale from 1 (no sentiment) to 5 (very strong sentiment)
SPLICE - captures linguistic features such as positive and negative self-evaluation by the speaker
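To give a flavour of what these tools do, here is a minimal sketch of lexicon-based feature extraction in the spirit of LIWC or NRC. The mini-lexicons and the sample sentence are invented for illustration; the real tools ship far larger, validated word lists.

```python
import re

# Hypothetical mini-lexicons standing in for LIWC/NRC categories
# (purely illustrative; not the actual word lists).
LEXICONS = {
    "assent": {"agree", "ok", "yes"},
    "negation": {"no", "never", "not"},
    "positive": {"great", "happy", "love"},
}

def extract_features(text):
    """Return simple LIWC-style counts for one post or transcript."""
    words = re.findall(r"[a-z']+", text.lower())
    features = {"word_count": len(words)}
    for category, lexicon in LEXICONS.items():
        # Count how many tokens fall in each lexical category.
        features[category] = sum(1 for w in words if w in lexicon)
    return features

features = extract_features("Yes, I agree - never a bad idea, ok?")
# features now maps each category to its count for this text
```

Each text thus becomes a vector of category counts, which is the form of input the machine learning models described next can work with.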
The researchers tested different machine learning models, from simple ones like decision trees to more complex ones like multi-target stacking. Each model took the features from a social media platform as input and predicted personality scores, which were then compared with the responses from each participant’s personality questionnaire.
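The evaluation loop can be sketched as follows. For simplicity this uses a one-split decision "stump" (a decision tree reduced to a single rule) predicting a single trait, and the feature values and questionnaire scores are invented; the study itself trained full models across all Big-5 traits.

```python
# A single-feature decision stump: heavy use of assent words
# predicts a higher Extraversion score. Thresholds are invented.
def stump_predict(assent_count, threshold=2, hi=4.0, lo=2.5):
    """One-split 'decision tree' over one linguistic feature."""
    return hi if assent_count >= threshold else lo

# (assent_word_count, questionnaire Extraversion score) per user -- toy data
users = [(3, 4.2), (0, 2.1), (5, 3.8), (1, 2.9)]

# Predict from the linguistic feature, then compare against the
# questionnaire ground truth, as in the study's evaluation setup.
predictions = [stump_predict(assent) for assent, _ in users]
mae = sum(abs(p - truth) for p, (_, truth) in zip(predictions, users)) / len(users)
```

The mean absolute error against the questionnaire scores is the kind of quantity used to compare the simple and complex models against each other.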
What did they find?
The researchers examined how user activity and demographics related to personality traits across the YouTube, Facebook, and Twitter datasets. Out of 166 shared features, only 15 showed consistent correlations with personality traits, suggesting that the useful features may differ across social media platforms.
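A feature "consistent" in this sense is one whose correlation with a trait has the same sign on every platform. A minimal sketch of such a consistency check, with a plain Pearson correlation and invented per-platform numbers:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Toy (feature value, trait score) pairs per platform -- invented numbers
platforms = {
    "facebook": ([1, 2, 3, 4], [2.0, 2.5, 3.1, 3.9]),  # rises together
    "youtube":  ([1, 2, 3, 4], [3.9, 3.0, 2.4, 2.0]),  # opposite direction
}

# The feature counts as consistent only if the correlation sign agrees
# across all platforms.
signs = {name: pearson(f, t) > 0 for name, (f, t) in platforms.items()}
consistent = len(set(signs.values())) == 1
```

With these toy numbers the sign flips between platforms, so the feature would not be among the 15 consistent ones.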
The demographic features of age and gender correlated significantly with personality scores across all three datasets. Gender was positively related to Agreeableness on Facebook, meaning female Facebook users scored higher on Agreeableness than men, but the relationship was negative on YouTube and Twitter, where women’s mean Agreeableness scores were lower than men’s.
Several linguistic features were helpful for predicting agreeableness, but this varied by social media platform. For example, word count was positively related to agreeableness on Facebook and Twitter but negatively on YouTube. The number of letters per word, use of adjectives, and positive and negative self-evaluation all predicted agreeableness, again with platform differences: number of letters and adjectives were positively related on Facebook and Twitter but negatively on YouTube; negative self-evaluation was negatively related to agreeableness on Facebook and YouTube but positively on Twitter; and positive self-evaluation was positively related across all platforms.
To detect emotional stability, LIWC features such as health and leisure words were predictive, but again the direction of the relationship varied between platforms. Specifically, health words were negatively related to emotional stability on Facebook and YouTube but positively on Twitter, whereas leisure words were positively related to emotional stability across all platforms.
Openness was related to the use of motion words (e.g., walk, move, go): positively on YouTube but negatively on Facebook and Twitter.
Finally, conscientiousness and extraversion were more consistent in their relationships with linguistic features, in that these relationships held across all platforms. Conscientiousness was negatively related to the use of negation words (e.g., no, never, not) as well as words with a late age of acquisition (AoA), and extraversion was positively related to the use of assent words (e.g., agree, ok, yes).
So what?
Overall, the results from this study suggest that correlations between social media behaviours and personality traits may not always generalise, given that they differ across platforms. However, the findings indicate that there is merit in exploring linguistic features of online behaviour as a window into some aspects of human psychology.
References
Farnadi, G., Sitaraman, G., Sushmita, S., Celli, F., Kosinski, M., Stillwell, D., ... & De Cock, M. (2016). Computational personality recognition in social media. User Modeling and User-Adapted Interaction, 26, 109-142. https://doi.org/10.1007/s11257-016-9171-0