Multi-label Intent Detection

Classification | Difficulty: HARD | 11 models compared | Total cost: $0.002904

Prompt

Classify the user intent. Reply with ALL that apply, comma-separated: BILLING, TECHNICAL, CANCELLATION, UPGRADE, COMPLAINT.

"I've been charged twice this month and your app keeps crashing. I want a refund and I'm considering switching to a competitor unless you can offer me a better plan."

Expected Output

BILLING, TECHNICAL, CANCELLATION, UPGRADE, COMPLAINT
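Because the prompt asks for all applicable labels and does not fix their order, a reply is best compared to the expected output as a set. The following is a minimal sketch of such a check, not the grading method used on this page; the helper names `parse_labels` and `score` are hypothetical.

```python
# Hypothetical scoring helpers; the label set is taken from the prompt.
LABELS = {"BILLING", "TECHNICAL", "CANCELLATION", "UPGRADE", "COMPLAINT"}
EXPECTED = "BILLING, TECHNICAL, CANCELLATION, UPGRADE, COMPLAINT"


def parse_labels(reply: str) -> set[str]:
    """Split a comma-separated reply and keep only known labels."""
    parts = (part.strip().upper() for part in reply.split(","))
    return {part for part in parts if part in LABELS}


def score(reply: str, expected: str = EXPECTED) -> dict:
    """Compare predicted and expected labels, ignoring order."""
    predicted, gold = parse_labels(reply), parse_labels(expected)
    hits = predicted & gold
    return {
        "exact_match": predicted == gold,
        "missing": sorted(gold - predicted),
        "extra": sorted(predicted - gold),
        "precision": len(hits) / len(predicted) if predicted else 0.0,
        "recall": len(hits) / len(gold) if gold else 0.0,
    }


if __name__ == "__main__":
    # Same labels in a different order still count as an exact match.
    print(score("BILLING, TECHNICAL, COMPLAINT, UPGRADE, CANCELLATION"))
    # A reply that omits CANCELLATION has full precision but recall 0.8.
    print(score("BILLING, TECHNICAL, COMPLAINT, UPGRADE"))
```

Reporting `missing` and `extra` alongside the exact-match flag makes partial answers easier to compare than a single correct/incorrect verdict.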

Model Responses

Claude Opus 4.6

Unrated
2,133 ms | 95 tokens
BILLING, TECHNICAL, COMPLAINT, UPGRADE, CANCELLATION

GPT-4.1

Unrated
1,073 ms | 95 tokens
BILLING, TECHNICAL, CANCELLATION, UPGRADE, COMPLAINT

GPT-4.1 Mini

Unrated
746 ms | 90 tokens
BILLING, TECHNICAL, COMPLAINT, UPGRADE

GPT-4.1 Nano

Unrated
404 ms | 87 tokens
BILLING, TECHNICAL, CANCELLATION

o3

Unrated
2,295 ms | 304 tokens
BILLING, TECHNICAL, CANCELLATION, UPGRADE, COMPLAINT

o3 Mini

Unrated
2,338 ms | 358 tokens
BILLING, TECHNICAL, UPGRADE, COMPLAINT

o4 Mini

Unrated
2,968 ms | 368 tokens
BILLING, TECHNICAL, CANCELLATION, UPGRADE, COMPLAINT

GPT-4o Mini

Correct
1,136 ms | 95 tokens | $0.000023
BILLING, TECHNICAL, CANCELLATION, UPGRADE, COMPLAINT

Claude Haiku 4.5

Correct
453 ms | 91 tokens | $0.000147
BILLING, TECHNICAL, COMPLAINT, UPGRADE

GPT-4o

Correct
776 ms | 95 tokens | $0.000388
BILLING, TECHNICAL, CANCELLATION, UPGRADE, COMPLAINT

Claude Sonnet 4.6

Incorrect
3,953 ms | 218 tokens | $0.002346
Based on the message, the applicable intents are:

**BILLING, TECHNICAL, COMPLAINT, UPGRADE**

**Reasoning:**
- **BILLING** → Charged twice, wants a refund
- **TECHNICAL** → App keeps crashing
- **COMPLAINT** → Expressing dissatisfaction, threatening to leave
- **UPGRADE** → Open to a better plan offer

> *Note: **CANCELLATION** was considered but not applied, as the user expresses a **conditional threat** to switch rather than a definitive cancellation request. However, this message should be flagged as **high churn risk**.*

Verdict note: The intent was cancellation with conditions.
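The Claude Sonnet 4.6 reply above wraps its labels in explanatory prose and also names CANCELLATION in a note as a label it rejected, so matching every label anywhere in the text would miscount it. One possible heuristic, sketched here under the assumption that the answer is the first comma-separated run of labels in the reply, is shown below; `first_label_run` is a hypothetical helper, not part of any tool referenced on this page.

```python
import re

# Label set taken from the prompt; matching is case-sensitive on purpose.
LABELS = ("BILLING", "TECHNICAL", "CANCELLATION", "UPGRADE", "COMPLAINT")
_ONE = r"\b(?:" + "|".join(LABELS) + r")\b"
# One label followed by zero or more ", LABEL" continuations.
_RUN = re.compile(_ONE + r"(?:\s*,\s*" + _ONE + r")*")


def first_label_run(reply: str) -> list[str]:
    """Return the labels in the first comma-separated run found in reply."""
    match = _RUN.search(reply)
    if match is None:
        return []
    return [part.strip() for part in match.group(0).split(",")]


if __name__ == "__main__":
    verbose = (
        "Based on the message, the applicable intents are: "
        "**BILLING, TECHNICAL, COMPLAINT, UPGRADE** **Reasoning:** "
        "- **BILLING** → Charged twice, wants a refund "
        "> *Note: **CANCELLATION** was considered but not applied*"
    )
    # Only the first run is returned; the later CANCELLATION mention is ignored.
    print(first_label_run(verbose))
```

This heuristic would fail on a reply that mentions a rejected label before stating its answer, so a stricter output format in the prompt is the more robust fix.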

Cost & Performance Comparison

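| Model | Latency | Tokens | Cost | Verdict |
|---|---|---|---|---|
| Claude Opus 4.6 | 2,133 ms | 95 | - | Unrated |
| GPT-4.1 | 1,073 ms | 95 | - | Unrated |
| GPT-4.1 Mini | 746 ms | 90 | - | Unrated |
| GPT-4.1 Nano | 404 ms | 87 | - | Unrated |
| o3 | 2,295 ms | 304 | - | Unrated |
| o3 Mini | 2,338 ms | 358 | - | Unrated |
| o4 Mini | 2,968 ms | 368 | - | Unrated |
| GPT-4o Mini | 1,136 ms | 95 | $0.000023 | Correct |
| Claude Haiku 4.5 | 453 ms | 91 | $0.000147 | Correct |
| GPT-4o | 776 ms | 95 | $0.000388 | Correct |
| Claude Sonnet 4.6 | 3,953 ms | 218 | $0.002346 | Incorrect |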
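Only four of the eleven rows carry a cost, and those four add up to the headline total of $0.002904. A short sketch of that arithmetic follows, with the rows copied from the table above; the tuple layout and variable names are illustrative assumptions.

```python
from decimal import Decimal

# (model, latency_ms, tokens, cost_usd or None), copied from the table above.
ROWS = [
    ("Claude Opus 4.6", 2133, 95, None),
    ("GPT-4.1", 1073, 95, None),
    ("GPT-4.1 Mini", 746, 90, None),
    ("GPT-4.1 Nano", 404, 87, None),
    ("o3", 2295, 304, None),
    ("o3 Mini", 2338, 358, None),
    ("o4 Mini", 2968, 368, None),
    ("GPT-4o Mini", 1136, 95, "0.000023"),
    ("Claude Haiku 4.5", 453, 91, "0.000147"),
    ("GPT-4o", 776, 95, "0.000388"),
    ("Claude Sonnet 4.6", 3953, 218, "0.002346"),
]

# Decimal avoids float rounding noise on these very small dollar amounts.
priced = [(model, Decimal(cost)) for model, _, _, cost in ROWS if cost is not None]
total = sum((cost for _, cost in priced), Decimal("0"))
cheapest = min(priced, key=lambda row: row[1])
fastest = min(ROWS, key=lambda row: row[1])

if __name__ == "__main__":
    print(f"priced rows: {len(priced)} of {len(ROWS)}")
    print(f"total cost: ${total}")
    print(f"cheapest priced model: {cheapest[0]}")
    print(f"lowest latency: {fastest[0]} ({fastest[1]} ms)")
```

The total therefore reflects only the priced runs; the seven rows without a recorded cost contribute nothing to it.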
Multi-label Intent Detection — Model Lab — AISpendGuard