"Same Word, Different Worlds"

Pragmatic Functions of 'Fuck' Across Strong- and Weak-Tie Online Communities

6 Annotators · 15 Pairwise Comparisons · Cohen's Kappa (κ)

AFinLA 2026 Symposium · Pilot Study

Expert (Lead)
Linguistics Student · Lead Annotator · n=300

Peer Rater A
No English Specialization · n=49

Peer Rater B
University Lecturer · Original Study Author · n=49

Peer Rater C
English Major · University Student · n=49

Gemini 3.1 Pro
Google AI · Zero-Shot · n=281

GPT-5.3
OpenAI · Zero-Shot · n=246


Pairwise Kappa Heatmap Matrix

Colour legend: Slight (<0.21) · Fair (0.21–0.40) · Moderate (0.41–0.60) · Substantial (>0.60)
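The matrix covers the 15 possible rater pairs (6 annotators, 6 choose 2 = 15), with κ computed over the items each pair labelled in common. As a minimal sketch of how such a matrix can be reproduced, the Python below computes unweighted Cohen's κ for every pair of annotators over their shared items; the rater names echo the profiles above, but the item IDs, labels, and toy data are hypothetical placeholders rather than the study's actual annotations.

```python
from itertools import combinations
from collections import Counter

def cohen_kappa(a, b):
    """Unweighted Cohen's kappa for two equal-length label sequences."""
    n = len(a)
    categories = set(a) | set(b)
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Expected agreement under independent marginal label distributions.
    ca, cb = Counter(a), Counter(b)
    expected = sum((ca[c] / n) * (cb[c] / n) for c in categories)
    return (observed - expected) / (1 - expected) if expected < 1 else 1.0

def pairwise_kappa(annotations):
    """annotations: dict rater -> {item_id: label}; raters label overlapping subsets."""
    results = {}
    for r1, r2 in combinations(annotations, 2):   # 6 raters -> 15 pairs
        shared = annotations[r1].keys() & annotations[r2].keys()
        a = [annotations[r1][i] for i in shared]
        b = [annotations[r2][i] for i in shared]
        results[(r1, r2)] = (cohen_kappa(a, b), len(shared))
    return results

# Hypothetical toy data: item IDs and function labels are illustrative only.
toy = {
    "Expert": {1: "aggression", 2: "bonding", 3: "emphasis"},
    "Gemini": {1: "aggression", 2: "bonding", 3: "humour"},
    "GPT":    {1: "aggression", 2: "aggression", 3: "aggression"},
}
for pair, (kappa, n) in pairwise_kappa(toy).items():
    print(pair, f"kappa={kappa:.3f}", f"n={n}")
```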

Key Statistical Findings

0.593

Highest Pairwise Agreement

Peer Rater C (English Major) and Gemini reached the strongest agreement in the study (72.3% agreement, n=47), suggesting that the LLM's pragmatic reasoning closely mirrors trained but non-specialist linguistic intuition.

0.415

Expert–AI Alignment

Gemini's agreement with the expert annotator (κ = 0.415) exceeded that of every human peer (κ = 0.155–0.284), suggesting that the LLM has internalized sociopragmatic patterns beyond surface-level sentiment.

≈ 0.40–0.45

Human Peer Consensus

All three peer raters agreed with one another at fair-to-moderate levels (κ = 0.399–0.453), forming an interpretive cluster distinct from the expert's sociolinguistic framework.

0.136–0.218

GPT-5.3 Classification Bias

GPT-5.3 showed the lowest agreement with every other rater, collapsing the five-label taxonomy into an essentially binary aggression/bonding split: 72% of its labels fell into just those two categories.
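A simple way to quantify this collapse is the share of predictions covered by a model's two most frequent labels, as in the sketch below; the prediction list is a hypothetical stand-in, not GPT-5.3's actual output.

```python
from collections import Counter

def top_k_label_share(labels, k=2):
    """Fraction of predictions covered by the k most frequent labels."""
    counts = Counter(labels)
    top_k = sum(count for _, count in counts.most_common(k))
    return top_k / len(labels)

# Hypothetical predictions; a heavily collapsed distribution scores near 1.0.
predictions = ["aggression"] * 110 + ["bonding"] * 70 + ["humour"] * 40 + ["emphasis"] * 26
print(f"Top-2 label share: {top_k_label_share(predictions):.1%}")  # ~73%
```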


Interpretation Scale

κ Range Interpretation
< 0.00 Less than chance
0.00–0.20 Slight
0.21–0.40 Fair
0.41–0.60 Moderate
0.61–0.80 Substantial
0.81–1.00 Almost perfect

Citation: Landis, J.R. & Koch, G.G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33(1), 159–174.
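For reference, the banding used throughout this page can be expressed as a small lookup; this is a minimal sketch assuming the Landis & Koch cut-offs listed in the scale above.

```python
def landis_koch_band(kappa):
    """Map a Cohen's kappa value to its Landis & Koch (1977) interpretation band."""
    if kappa < 0.00:
        return "Less than chance"
    bands = [
        (0.20, "Slight"),
        (0.40, "Fair"),
        (0.60, "Moderate"),
        (0.80, "Substantial"),
        (1.00, "Almost perfect"),
    ]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "Almost perfect"  # guard for values rounding above 1.0

print(landis_koch_band(0.593))  # Moderate
print(landis_koch_band(0.415))  # Moderate
print(landis_koch_band(0.218))  # Fair
```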