AFinLA 2026 Symposium · Pilot Study
- Linguistics Student · Lead Annotator · n=300
- No English Specialization · n=49
- University Lecturer · Original Study Author · n=49
- English Major · University Student · n=49
- Google AI (Gemini) · Zero-Shot · n=281
- OpenAI (ChatGPT) · Zero-Shot · n=246
Peer Rater C (English Major) and Gemini reached the strongest agreement in the study (72.3%, n=47), suggesting that the LLM's pragmatic reasoning closely mirrors the intuitions of trained but non-specialist raters.
Gemini agreed with the expert annotator more closely than any human peer did (κ = 0.155–0.284), suggesting that the LLM tracks sociopragmatic patterns beyond surface-level sentiment, although even these values fall only in the Slight-to-Fair range.
The three peer raters agreed with one another at Fair-to-Moderate levels (κ = 0.399–0.453), forming an interpretive cluster distinct from the expert's sociolinguistic framework.
ChatGPT showed the lowest agreement with every other rater, effectively collapsing the 5-label taxonomy into a binary aggression/bonding split: 72% of its labels fell into just those two categories.
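The percent-agreement and κ figures above reduce to two standard quantities, sketched below in a minimal from-scratch Python reconstruction (not the study's actual pipeline). Cohen's κ is (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e the agreement expected by chance given each rater's label frequencies; all label names other than aggression and bonding are invented placeholders.

```python
from collections import Counter

def percent_agreement(a, b):
    """Share of items on which two raters chose the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e) for two raters."""
    n = len(a)
    p_o = percent_agreement(a, b)
    ca, cb = Counter(a), Counter(b)
    # Chance agreement p_e: probability that two independent raters with
    # these marginal label frequencies happen to pick the same label.
    p_e = sum((ca[lab] / n) * (cb[lab] / n) for lab in set(a) | set(b))
    return (p_o - p_e) / (1 - p_e)

# Toy data: "teasing" and "neutral" are hypothetical labels, since only
# aggression and bonding are named in the findings above.
rater_c = ["bonding", "aggression", "teasing", "bonding", "neutral"]
gemini  = ["bonding", "aggression", "bonding", "bonding", "neutral"]

print(f"raw agreement: {percent_agreement(rater_c, gemini):.1%}")  # 80.0%
print(f"Cohen's kappa: {cohens_kappa(rater_c, gemini):.3f}")       # 0.706
```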
| κ Range | Interpretation |
|---|---|
| < 0.00 | Less than chance |
| 0.00–0.20 | Slight |
| 0.21–0.40 | Fair |
| 0.41–0.60 | Moderate |
| 0.61–0.80 | Substantial |
| 0.81–1.00 | Almost perfect |
Citation: Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. *Biometrics, 33*(1), 159–174.
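The table is straightforward to encode; the sketch below (function name ours) maps a κ value onto these bands and reproduces the Fair-to-Moderate spread of the peer-rater cluster noted above.

```python
def landis_koch_band(kappa: float) -> str:
    """Map a kappa value to its Landis & Koch (1977) interpretation."""
    if kappa < 0.00:
        return "Less than chance"
    bands = [(0.20, "Slight"), (0.40, "Fair"), (0.60, "Moderate"),
             (0.80, "Substantial"), (1.00, "Almost perfect")]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "Almost perfect"  # guard for values rounding slightly above 1.0

print(landis_koch_band(0.399))  # Fair
print(landis_koch_band(0.453))  # Moderate
```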