Sentiment with Sarcasm

Classification | Difficulty: MEDIUM | 11 models compared | Total cost: $0.001715

Prompt

What is the sentiment of this review? Reply with POSITIVE, NEGATIVE, or NEUTRAL.

"Oh great, another update that breaks everything. Love how the new UI makes simple tasks take twice as long. Really appreciate the team's dedication to making my workflow worse."

Expected Output

NEGATIVE

Model Responses

Claude Opus 4.6

Unrated
2,956ms | 130 tokens
**NEGATIVE** The review is entirely sarcastic. Phrases like "Oh great," "Love how... takes twice as long," and "Really appreciate... making my workflow worse" use verbal irony to express frustration and dissatisfaction with the update, UI changes, and overall impact on productivity.

GPT-4.1

Unrated
598ms | 63 tokens
NEGATIVE

GPT-4.1 Mini

Unrated
416ms | 63 tokens
NEGATIVE

GPT-4.1 Nano

Unrated
432ms | 63 tokens
NEGATIVE

o3

Unrated
1,884ms | 80 tokens
NEGATIVE

o3 Mini

Unrated
2,809ms | 205 tokens
NEGATIVE

o4 Mini

Unrated
1,611ms | 144 tokens
NEGATIVE

GPT-4o Mini

Correct
869ms | 63 tokens | $0.000010
NEGATIVE

GPT-4o

Correct
843ms | 63 tokens | $0.000173
NEGATIVE

Claude Haiku 4.5

Partial
961ms | 115 tokens | $0.000323
NEGATIVE The review is clearly negative. The sarcastic language ("Oh great," "Love how," "Really appreciate") combined with complaints about broken functionality, poor UI design, and reduced efficiency all indicate dissatisfaction with the update.

Verdict note: The answer should contain only the label, without an explanation.

Claude Sonnet 4.6

Partial
2,103ms | 131 tokens | $0.001209
The sentiment of this review is **NEGATIVE**. Although the language uses sarcasm and irony (phrases like "Oh great," "Love how," and "Really appreciate"), the actual meaning expresses frustration and dissatisfaction with the update, the new UI, and its impact on the user's workflow.

Verdict note: The answer should contain only the label, without an explanation.
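
The verdicts above appear to follow a simple rule: a bare label matching the expected output is rated Correct, while the right label wrapped in an explanation is rated Partial. A minimal Python sketch of such a rule follows. The function name `grade` and the "Incorrect" verdict are assumptions made for illustration; the page does not show how its verdicts are actually assigned.

```python
import re

# The three labels the prompt allows.
LABELS = ("POSITIVE", "NEGATIVE", "NEUTRAL")


def grade(response: str, expected: str) -> str:
    """Hypothetical grading rule inferred from the verdicts on this page."""
    text = response.strip()
    # Exact label and nothing else: full credit.
    if text == expected:
        return "Correct"
    # Collect every allowed label that appears as a whole word (case-sensitive).
    found = {label for label in LABELS if re.search(rf"\b{label}\b", text)}
    # Only the expected label appears, but with extra text or formatting.
    if found == {expected}:
        return "Partial"
    # "Incorrect" is an assumed verdict; no response on this page received it.
    return "Incorrect"


print(grade("NEGATIVE", "NEGATIVE"))
print(grade("The sentiment of this review is **NEGATIVE**. Although ...", "NEGATIVE"))
```

Under this rule the first call is graded Correct and the second Partial, matching the GPT-4o and Claude Sonnet 4.6 verdicts above.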

Cost & Performance Comparison

Model              Latency  Tokens  Cost       Verdict
Claude Opus 4.6    2,956ms  130     -          Unrated
GPT-4.1            598ms    63      -          Unrated
GPT-4.1 Mini       416ms    63      -          Unrated
GPT-4.1 Nano       432ms    63      -          Unrated
o3                 1,884ms  80      -          Unrated
o3 Mini            2,809ms  205     -          Unrated
o4 Mini            1,611ms  144     -          Unrated
GPT-4o Mini        869ms    63      $0.000010  Correct
GPT-4o             843ms    63      $0.000173  Correct
Claude Haiku 4.5   961ms    115     $0.000323  Partial
Claude Sonnet 4.6  2,103ms  131     $0.001209  Partial
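
As a quick arithmetic check, the four per-model costs listed above can be summed and compared with the total cost reported at the top of the page. The sketch below uses `Decimal` to avoid floating-point rounding; the dictionary name `costs` is illustrative. That the sum matches suggests the seven models shown without a cost did not contribute to the reported total, though the page does not say so explicitly.

```python
from decimal import Decimal

# Per-model costs in USD, copied from the comparison table.
costs = {
    "GPT-4o Mini": Decimal("0.000010"),
    "GPT-4o": Decimal("0.000173"),
    "Claude Haiku 4.5": Decimal("0.000323"),
    "Claude Sonnet 4.6": Decimal("0.001209"),
}

# Start the sum from a Decimal so the result stays a Decimal.
total = sum(costs.values(), Decimal("0"))
print(total)  # 0.001715
```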
Sentiment with Sarcasm — Model Lab — AISpendGuard