Sentiment with Sarcasm

Classification | Difficulty: MEDIUM | 11 models compared | Total cost: $0.001715

Prompt

What is the sentiment of this review? Reply with POSITIVE, NEGATIVE, or NEUTRAL.

"Oh great, another update that breaks everything. Love how the new UI makes simple tasks take twice as long. Really appreciate the team's dedication to making my workflow worse."

Expected Output

NEGATIVE

Model Responses

Claude Opus 4.6

Unrated

2,956ms | 130 tokens

**NEGATIVE**

The review is entirely sarcastic. Phrases like "Oh great," "Love how... takes twice as long," and "Really appreciate... making my workflow worse" use verbal irony to express frustration and dissatisfaction with the update, UI changes, and overall impact on productivity.

GPT-4.1

Unrated

598ms | 63 tokens

NEGATIVE

GPT-4.1 Mini

Unrated

416ms | 63 tokens

NEGATIVE

GPT-4.1 Nano

Unrated

432ms | 63 tokens

NEGATIVE

o3

Unrated

1,884ms | 80 tokens

NEGATIVE

o3 Mini

Unrated

2,809ms | 205 tokens

NEGATIVE

o4 Mini

Unrated

1,611ms | 144 tokens

NEGATIVE

GPT-4o Mini

Correct

869ms | 63 tokens | $0.000010

NEGATIVE

GPT-4o

Correct

843ms | 63 tokens | $0.000173

NEGATIVE

Claude Haiku 4.5

Partial

961ms | 115 tokens | $0.000323

NEGATIVE

The review is clearly negative. The sarcastic language ("Oh great," "Love how," "Really appreciate") combined with complaints about broken functionality, poor UI design, and reduced efficiency all indicate dissatisfaction with the update.

Verdict note: Description should not be part of answer

Claude Sonnet 4.6

Partial

2,103ms | 131 tokens | $0.001209

The sentiment of this review is **NEGATIVE**.

Although the language uses sarcasm and irony (phrases like "Oh great," "Love how," and "Really appreciate"), the actual meaning expresses frustration and dissatisfaction with the update, the new UI, and its impact on the user's workflow.

Verdict note: Description should not be part of answer
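
Grading sketch

The page does not state its grading rule. The verdict notes suggest that a bare label is rated Correct and that the right label accompanied by an explanation is rated Partial. The Python sketch below is one way such a rule might look. The function name `grade`, the `LABELS` tuple, and the "Incorrect" outcome are assumptions for illustration, not part of the benchmark.

```python
import re

# The three labels allowed by the prompt.
LABELS = ("POSITIVE", "NEGATIVE", "NEUTRAL")


def grade(response: str, expected: str) -> str:
    """Hypothetical grader inferred from the verdict notes, not the benchmark's own code."""
    text = response.strip()
    # A bare label with nothing else is a full match.
    if text == expected:
        return "Correct"
    # Otherwise, find which labels appear as whole uppercase words.
    found = {label for label in LABELS if re.search(rf"\b{label}\b", text)}
    # Only the expected label appears, but extra text surrounds it.
    if found == {expected}:
        return "Partial"
    return "Incorrect"


print(grade("NEGATIVE", "NEGATIVE"))
print(grade("The sentiment of this review is **NEGATIVE**.", "NEGATIVE"))
```

Under this rule the GPT-4o responses would be rated Correct and the Claude Haiku 4.5 and Claude Sonnet 4.6 responses Partial, which matches the verdicts shown above.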

Cost & Performance Comparison

Model	Latency	Tokens	Cost	Verdict
Claude Opus 4.6	2,956ms	130	—	—
GPT-4.1	598ms	63	—	—
GPT-4.1 Mini	416ms	63	—	—
GPT-4.1 Nano	432ms	63	—	—
o3	1,884ms	80	—	—
o3 Mini	2,809ms	205	—	—
o4 Mini	1,611ms	144	—	—
GPT-4o Mini	869ms	63	$0.000010	Correct
GPT-4o	843ms	63	$0.000173	Correct
Claude Haiku 4.5	961ms	115	$0.000323	Partial
Claude Sonnet 4.6	2,103ms	131	$0.001209	Partial
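
Cost check

Only four rows in the table carry a cost, so the total cost reported at the top of the page can be checked by summing them. The sketch below does that in Python with `Decimal` to avoid floating-point rounding. The `costs` dictionary is just the table's Cost column retyped by hand.

```python
from decimal import Decimal

# Per-model costs in USD, copied from the table rows that report one.
costs = {
    "GPT-4o Mini": Decimal("0.000010"),
    "GPT-4o": Decimal("0.000173"),
    "Claude Haiku 4.5": Decimal("0.000323"),
    "Claude Sonnet 4.6": Decimal("0.001209"),
}

# Sum with a Decimal start value so the result stays a Decimal.
total = sum(costs.values(), Decimal("0"))
print(total)  # 0.001715
```

The sum matches the reported total of $0.001715, so the seven models without a listed cost contribute nothing to that figure.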