Multi-label Intent Detection

Classification · Difficulty: HARD · 11 models compared · Total cost: $0.002904

Prompt

Classify the user intent. Reply with ALL that apply, comma-separated: BILLING, TECHNICAL, CANCELLATION, UPGRADE, COMPLAINT.

"I've been charged twice this month and your app keeps crashing. I want a refund and I'm considering switching to a competitor unless you can offer me a better plan."

Expected Output

BILLING, TECHNICAL, CANCELLATION, UPGRADE, COMPLAINT
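Several responses below list the same labels in a different order, so comparing a reply to the expected output is more naturally done on sets than on raw strings. The following is a minimal sketch of such a comparison, not the grader this benchmark actually uses. The function names `parse_labels` and `score` and the choice of Jaccard overlap as a partial-credit measure are assumptions made for illustration.

```python
# Hypothetical scorer: names and metrics are illustrative, not from the benchmark.
LABELS = {"BILLING", "TECHNICAL", "CANCELLATION", "UPGRADE", "COMPLAINT"}

def parse_labels(reply: str) -> set[str]:
    """Split a comma-separated reply into a set of known labels."""
    tokens = {part.strip().upper() for part in reply.split(",")}
    return tokens & LABELS  # drop anything that is not a valid label

def score(reply: str, expected: str) -> dict:
    """Compare predicted and expected label sets, ignoring order."""
    predicted, gold = parse_labels(reply), parse_labels(expected)
    union = predicted | gold
    return {
        "exact_match": predicted == gold,
        "missing": sorted(gold - predicted),
        "extra": sorted(predicted - gold),
        "jaccard": len(predicted & gold) / len(union) if union else 1.0,
    }

expected = "BILLING, TECHNICAL, CANCELLATION, UPGRADE, COMPLAINT"
reordered = score("BILLING, TECHNICAL, COMPLAINT, UPGRADE, CANCELLATION", expected)
partial = score("BILLING, TECHNICAL, CANCELLATION", expected)
```

With this approach a reordered reply counts as an exact match, while a reply with only three of the five labels reports which ones are missing and receives partial credit through the overlap score.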

Model Responses

Claude Opus 4.6

Unrated

2,133ms · 95 tokens

BILLING, TECHNICAL, COMPLAINT, UPGRADE, CANCELLATION

GPT-4.1

Unrated

1,073ms · 95 tokens

BILLING, TECHNICAL, CANCELLATION, UPGRADE, COMPLAINT

GPT-4.1 Mini

Unrated

746ms · 90 tokens

BILLING, TECHNICAL, COMPLAINT, UPGRADE

GPT-4.1 Nano

Unrated

404ms · 87 tokens

BILLING, TECHNICAL, CANCELLATION

o3

Unrated

2,295ms · 304 tokens

BILLING, TECHNICAL, CANCELLATION, UPGRADE, COMPLAINT

o3 Mini

Unrated

2,338ms · 358 tokens

BILLING, TECHNICAL, UPGRADE, COMPLAINT

o4 Mini

Unrated

2,968ms · 368 tokens

BILLING, TECHNICAL, CANCELLATION, UPGRADE, COMPLAINT

GPT-4o Mini

Correct

1,136ms · 95 tokens · $0.000023

BILLING, TECHNICAL, CANCELLATION, UPGRADE, COMPLAINT

Claude Haiku 4.5

Correct

453ms · 91 tokens · $0.000147

BILLING, TECHNICAL, COMPLAINT, UPGRADE

GPT-4o

Correct

776ms · 95 tokens · $0.000388

BILLING, TECHNICAL, CANCELLATION, UPGRADE, COMPLAINT

Claude Sonnet 4.6

Incorrect

3,953ms · 218 tokens · $0.002346

Based on the message, the applicable intents are:

**BILLING, TECHNICAL, COMPLAINT, UPGRADE**

**Reasoning:**
- **BILLING** → Charged twice, wants a refund
- **TECHNICAL** → App keeps crashing
- **COMPLAINT** → Expressing dissatisfaction, threatening to leave
- **UPGRADE** → Open to a better plan offer

> *Note: **CANCELLATION** was considered but not applied, as the user expresses a **conditional threat** to switch rather than a definitive cancellation request. However, this message should be flagged as **high churn risk**.*

Verdict note: The intent was cancellation with conditions.
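A verbose response like the one above is awkward to grade automatically, because a label can appear in the explanation (here, CANCELLATION in the note) without being part of the answer. One possible heuristic, sketched below, is to take labels only from the first line that mentions any of them. The helper `extract_answer_labels` is hypothetical, the sample text is abridged from the response above, and the heuristic would fail on replies that discuss a label before giving the answer.

```python
import re

# Hypothetical helper: a heuristic sketch, not the benchmark's grading logic.
LABELS = ("BILLING", "TECHNICAL", "CANCELLATION", "UPGRADE", "COMPLAINT")
LABEL_RE = re.compile(r"\b(" + "|".join(LABELS) + r")\b")

def extract_answer_labels(response: str) -> list[str]:
    """Return the labels found on the first line that mentions any label."""
    for line in response.splitlines():
        found = LABEL_RE.findall(line)
        if found:
            return found
    return []

# Abridged from the verbose response above.
verbose = """Based on the message, the applicable intents are:

**BILLING, TECHNICAL, COMPLAINT, UPGRADE**

> *Note: **CANCELLATION** was considered but not applied.*"""

answer = extract_answer_labels(verbose)
naive = LABEL_RE.findall(verbose)  # scans the whole text, including the note
```

Scanning the whole text would pick up all five labels and make the reply look complete, while the first-line heuristic recovers only the four the model actually committed to.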

Cost & Performance Comparison

Model	Latency	Tokens	Cost	Verdict
Claude Opus 4.6	2,133ms	95	—	—
GPT-4.1	1,073ms	95	—	—
GPT-4.1 Mini	746ms	90	—	—
GPT-4.1 Nano	404ms	87	—	—
o3	2,295ms	304	—	—
o3 Mini	2,338ms	358	—	—
o4 Mini	2,968ms	368	—	—
GPT-4o Mini	1,136ms	95	$0.000023	Correct
Claude Haiku 4.5	453ms	91	$0.000147	Correct
GPT-4o	776ms	95	$0.000388	Correct
Claude Sonnet 4.6	3,953ms	218	$0.002346	Incorrect
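The reported total cost can be cross-checked against the four rows that list a cost. The sketch below copies those rows from the table above and uses `Decimal` to avoid floating-point rounding in the sum. The tuple layout and variable names are assumptions made for illustration.

```python
from decimal import Decimal

# (model, latency_ms, tokens, cost_usd) for the rows that report a cost.
rows = [
    ("GPT-4o Mini", 1136, 95, Decimal("0.000023")),
    ("Claude Haiku 4.5", 453, 91, Decimal("0.000147")),
    ("GPT-4o", 776, 95, Decimal("0.000388")),
    ("Claude Sonnet 4.6", 3953, 218, Decimal("0.002346")),
]

# Sum of the per-model costs, to compare with the page's stated total.
total_cost = sum((cost for *_, cost in rows), Decimal("0"))

# Cost per token, for comparing models that used different token counts.
cost_per_token = {name: cost / tokens for name, _, tokens, cost in rows}

cheapest = min(rows, key=lambda row: row[3])[0]
fastest = min(rows, key=lambda row: row[1])[0]

print(f"Total: ${total_cost}")
```

The sum of the four listed costs comes to $0.002904, which agrees with the total stated at the top of the page. The seven models without a cost figure contribute nothing to it.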