Types of Classifiers

Common classifier patterns, including binary, multiclass, multilabel, hierarchical, open-set, and ordinal classifiers.

This guide does not cover every type of classifier, only the ones relevant to what we have seen in working with customers.

The first design choice is the shape of the output. Most classifier designs fall into a few common patterns:

| Type | Output | Use when... |
|---|---|---|
| Binary | One of two labels | The decision is naturally yes/no, pass/fail, relevant/irrelevant, or otherwise reducible to two outcomes. |
| Multi-class | Exactly one label from a fixed set | The categories are mutually exclusive, and every input should map to one best answer. |
| Multi-label | Any subset of labels from a fixed set | Multiple labels can apply at once, such as user intents, content topics, risk flags, or product attributes. |
| Hierarchical | One or more labels across levels of a taxonomy | The decision has a natural parent-child structure, such as support category -> issue type -> root cause. |
| Open-set | A known label, other, or a proposed new label | The taxonomy is still evolving, or you expect meaningful inputs that do not fit the current label set. |
| Ordinal / scoring | An ordered label, score band, or Likert-style rating | The output has meaningful order, such as severity, quality, risk, urgency, or degree of fit. |
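
As a rough illustration, the output shapes in the table above could be written down as Python types. This is only a sketch: the alias names and the example label values are assumptions made for illustration, not part of any particular library or API.

```python
from dataclasses import dataclass
from typing import Literal, Optional

# Hypothetical label sets, chosen only to illustrate each output shape.
BinaryLabel = Literal["relevant", "irrelevant"]          # one of two labels
MulticlassLabel = Literal["billing_issue", "technical_issue", "account_issue"]
MultilabelOutput = frozenset[str]      # any subset of a fixed label set
HierarchicalOutput = tuple[str, ...]   # a path through a taxonomy
OrdinalLabel = Literal[1, 2, 3, 4, 5]  # discrete labels whose order matters


@dataclass(frozen=True)
class OpenSetOutput:
    """A known label, "other", or a proposed new label."""
    label: str
    proposed_label: Optional[str] = None  # filled in only when no known label fits
```

Writing the shape down first makes it easier to see which downstream code (routing, reporting, scoring) has to change if the classifier type changes later.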

Binary

The "hello world" of classification. Binary classifiers make one of two possible decisions. If the task can honestly be represented as a yes/no question, start here.

Use binary classifiers when the decision boundary is narrow and easy to reason about. Common label sets include:

  • true / false
  • pass / fail
  • relevant / irrelevant
  • should_escalate / should_not_escalate

Avoid binary classifiers when there is a meaningful third state. If a human expert would often say "unclear", "not enough information", or "partially", a binary label set forces the model to commit to a decision even when it is unsure, which usually produces false certainty.
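
A minimal sketch of enforcing a binary label set on a model's raw text output is shown below. The label names come from the list above; `parse_binary` is a hypothetical helper, and the choice to raise an error on anything unexpected (rather than coerce it to one of the two labels) is an assumption about how you want to surface a hidden third state.

```python
BINARY_LABELS = ("should_escalate", "should_not_escalate")


def parse_binary(raw: str) -> bool:
    """Return True for should_escalate and False for should_not_escalate.

    Anything else raises ValueError instead of being silently mapped to a label.
    """
    label = raw.strip().lower()
    if label not in BINARY_LABELS:
        raise ValueError(f"unexpected label: {raw!r}")
    return label == "should_escalate"
```

If this function raises often on real traffic, that can be a sign the task has a third state and a binary label set is the wrong shape.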

Multiclass

Multiclass classifiers choose exactly one label from a closed set of possible labels. Often, this is the next step up from a binary classifier, or a way to extend a binary decision to include a third class.

Use multiclass classifiers when the labels are mutually exclusive and each input should have one best answer. Some examples:

  • true, false, unclear
  • pass, fail, insufficient_evidence
  • billing_issue, technical_issue, account_issue, product_feedback

Avoid multiclass classifiers when labels can reasonably overlap. For example, a support ticket can be both a billing issue and a cancellation risk. Forcing one label may make downstream routing simpler, but it can erase information the business actually needs.
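
One way to make "exactly one label from a closed set" explicit in code is an enum. The sketch below uses the support labels from the examples above; the class and function names are hypothetical.

```python
from enum import Enum


class TicketCategory(str, Enum):
    BILLING_ISSUE = "billing_issue"
    TECHNICAL_ISSUE = "technical_issue"
    ACCOUNT_ISSUE = "account_issue"
    PRODUCT_FEEDBACK = "product_feedback"


def parse_category(raw: str) -> TicketCategory:
    """Map raw model output to exactly one category.

    Enum lookup by value raises ValueError for anything outside the closed
    set, including outputs that try to name two labels at once.
    """
    return TicketCategory(raw.strip().lower())
```

A model that keeps answering with two categories at once will fail this parser, which is a hint that the task may really be multilabel.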

Multilabel

Multilabel classifiers choose any subset of labels from a closed set. They work best when a single categorical choice cannot fully express the decision.

Use multilabel classifiers for intent detection, topic tagging, risk tagging, product attributes, and other cases where multiple things can be true at once. An example label set might look like:

  • user_asked_follow_up_question
  • user_not_satisfied
  • user_submitted_claim
  • user_filed_case_report
  • user_requested_human_support_agent

Avoid multilabel classifiers when the label set is really a hierarchy or when one label should logically exclude another. In those cases, a multilabel task can produce combinations that look valid syntactically but do not make sense operationally.
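
To illustrate the point about combinations that are syntactically valid but operationally meaningless, here is a small validation sketch. The label set is taken from the list above, with one hypothetical label (`user_satisfied`) added only so there is a contradictory pair to demonstrate; the exclusion rule itself is also an assumption.

```python
ALLOWED_LABELS = frozenset({
    "user_asked_follow_up_question",
    "user_not_satisfied",
    "user_submitted_claim",
    "user_filed_case_report",
    "user_requested_human_support_agent",
    "user_satisfied",  # hypothetical, not in the original label set
})

# Hypothetical rule: these two labels should never be assigned together.
EXCLUSIVE_PAIRS = [("user_satisfied", "user_not_satisfied")]


def find_problems(labels: set[str]) -> list[str]:
    """Return readable problems; an empty list means the subset is usable."""
    problems = [f"unknown label: {name}" for name in sorted(labels - ALLOWED_LABELS)]
    for first, second in EXCLUSIVE_PAIRS:
        if first in labels and second in labels:
            problems.append(f"contradictory labels: {first} + {second}")
    return problems
```

If the list of exclusion rules keeps growing, that is usually a sign the labels belong in a multiclass or hierarchical design instead.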

Multilabel classifiers can also be harder to score precisely, because labels are rarely independent of one another: one label may be chosen more often when another label is also chosen. As a result, aggregate accuracy is often less useful than per-label precision and recall, combined with a review of common co-occurrences.
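
A minimal sketch of per-label scoring and a co-occurrence count follows. It assumes gold and predicted labels are available as parallel lists of sets; the function names are hypothetical, and reporting 0.0 when a metric is undefined (no predictions or no gold instances for a label) is a simplifying assumption.

```python
from collections import Counter
from itertools import combinations


def per_label_metrics(
    gold: list[set[str]], pred: list[set[str]], labels: list[str]
) -> dict[str, dict[str, float]]:
    """Compute precision and recall separately for each label."""
    metrics = {}
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if label in g and label in p)
        fp = sum(1 for g, p in zip(gold, pred) if label not in g and label in p)
        fn = sum(1 for g, p in zip(gold, pred) if label in g and label not in p)
        metrics[label] = {
            # 0.0 is used when the denominator is zero (an assumption).
            "precision": tp / (tp + fp) if tp + fp else 0.0,
            "recall": tp / (tp + fn) if tp + fn else 0.0,
        }
    return metrics


def co_occurrences(pred: list[set[str]]) -> Counter:
    """Count how often each pair of labels is predicted together."""
    counts: Counter = Counter()
    for labels in pred:
        for pair in combinations(sorted(labels), 2):
            counts[pair] += 1
    return counts
```

Comparing the co-occurrence counts for predictions against the same counts for gold labels is one quick way to spot pairs the model links more tightly than human reviewers do.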

Hierarchical

Hierarchical classifiers assign labels across levels of a taxonomy. For example, a support classifier might assign billing -> refund -> duplicate_charge, rather than choosing only one flat category.

Use hierarchical classifiers when the domain already has a meaningful parent-child structure. They are useful for support taxonomies, product taxonomies, policy categories, medical or legal issue trees, and other settings where broad categories break down into narrower subtypes.

There are two common implementation patterns:

  • Predict every level at once, such as category, subcategory, and root cause.
  • Predict stepwise, where the first classifier picks the parent category and later classifiers pick narrower children.
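
Both patterns can be sketched against a nested taxonomy. In the example below, only the path billing -> refund -> duplicate_charge comes from the text; every other node is hypothetical. The `choose` callable stands in for whatever classifier or model call picks one option at each level, and its signature is an assumption.

```python
from typing import Callable

# Hypothetical taxonomy: each key is a label, each value holds its children.
TAXONOMY = {
    "billing": {
        "refund": {"duplicate_charge": {}, "late_refund": {}},
        "invoice": {"wrong_amount": {}},
    },
    "technical": {
        "login": {"password_reset": {}},
    },
}


def is_valid_path(path: list[str]) -> bool:
    """Pattern 1: check an all-at-once prediction against the taxonomy."""
    node = TAXONOMY
    for label in path:
        if label not in node:
            return False
        node = node[label]
    return True


def classify_stepwise(text: str, choose: Callable[[str, list[str]], str]) -> list[str]:
    """Pattern 2: walk top-down, picking one child per level until a leaf."""
    path: list[str] = []
    node = TAXONOMY
    while node:
        label = choose(text, sorted(node))
        if label not in node:
            raise ValueError(f"{label!r} is not a child at this level")
        path.append(label)
        node = node[label]
    return path
```

Predicting every level at once needs a validity check like `is_valid_path`, because nothing stops the model from pairing a child with the wrong parent. The stepwise version cannot produce an invalid path, but an early mistake at the parent level carries through to every later step.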

Open-set

An open-set classifier is a special type of multiclass or (more typically) multilabel classifier that does not restrict the model to a fixed (closed) set of labels. Instead, the model can propose new labels and apply them when no existing label fits.

Open-set labeling requires careful post-processing and can be particularly risky, because the model's own biases shape the labels more than they do in a closed-set task.

Use open-set classifiers when the taxonomy is still evolving, or when you are mining a dataset to discover latent clusters before committing to a closed label set.

Avoid open-set classifiers in production workflows that require stable reporting, routing, or enforcement. A model that can invent labels can also invent slightly different versions of the same label, encode its own biases into the taxonomy, or create categories that are hard for humans to interpret.

Open-set outputs usually need post-processing. At minimum, you should review proposed labels, merge duplicates, normalize wording, and decide which new labels deserve to become part of the closed set.
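
The mechanical part of that post-processing (normalizing wording and merging surface-level duplicates) can be sketched as follows. The known labels and function names are hypothetical. Note that this only merges spelling and formatting variants; deciding whether two differently worded labels mean the same thing, and which labels to promote, still needs human review.

```python
import re
from collections import Counter

# Hypothetical closed set that already exists.
KNOWN_LABELS = {"billing_issue", "technical_issue"}


def normalize(label: str) -> str:
    """Lowercase, trim, and collapse spaces, hyphens, and punctuation to underscores."""
    return re.sub(r"[^a-z0-9]+", "_", label.strip().lower()).strip("_")


def summarize_proposals(proposed: list[str]) -> Counter:
    """Count normalized proposed labels that are not already in the closed set."""
    counts: Counter = Counter()
    for raw in proposed:
        label = normalize(raw)
        if label and label not in KNOWN_LABELS:
            counts[label] += 1
    return counts
```

Sorting the result with `most_common()` gives reviewers a frequency-ranked list of candidate labels to accept, merge, or discard.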

The most common failure mode is treating model-suggested labels as ground truth. Open-set labeling is useful for discovery, but the taxonomy still needs human ownership: for production use, proposed labels usually need expert review before they become part of a stable taxonomy.

Ordinal / scoring

Ordinal classifiers assign one label from an ordered set. The labels are discrete, but their order matters.

Use ordinal classifiers when the task is not just asking what kind of thing something is, but how much of some property it has.

Common output shapes include:

  • low / medium / high risk
  • poor / fair / good / excellent
  • 1 / 2 / 3 / 4 / 5
  • not_urgent / somewhat_urgent / very_urgent
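
One way to keep the order of such labels available to downstream code is an integer-backed enum. The sketch below uses the urgency labels from the list above; the class name, the helper names, and the specific rank values are assumptions.

```python
from enum import IntEnum


class Urgency(IntEnum):
    NOT_URGENT = 0
    SOMEWHAT_URGENT = 1
    VERY_URGENT = 2


def parse_urgency(raw: str) -> Urgency:
    """Look up a label by name; raises KeyError for anything outside the set."""
    return Urgency[raw.strip().upper()]


def ordinal_error(predicted: Urgency, expected: Urgency) -> int:
    """Distance in ranks, so a near miss costs less than a far miss."""
    return abs(predicted - expected)
```

Scoring by rank distance treats confusing not_urgent with very_urgent as a larger error than confusing two adjacent labels, which a plain accuracy metric would not capture.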

Avoid ordinal classifiers when the labels imply more precision than the model or rubric can support. A 1-5 Likert-style score is only useful if humans can reliably agree on what each number means.
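
A simple, rough check of whether humans agree on a scale is to have two reviewers score the same items and compare. The sketch below is one possible approach, not a standard metric: the function name and the three summary statistics are assumptions, and it does not correct for chance agreement.

```python
def agreement_stats(rater_a: list[int], rater_b: list[int]) -> dict[str, float]:
    """Summarize how closely two raters agree on the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty score lists of equal length")
    diffs = [abs(a - b) for a, b in zip(rater_a, rater_b)]
    n = len(diffs)
    return {
        "exact": sum(d == 0 for d in diffs) / n,        # identical scores
        "within_one": sum(d <= 1 for d in diffs) / n,   # off by at most one step
        "mean_abs_diff": sum(diffs) / n,                # average distance in steps
    }
```

If exact agreement is low but within-one agreement is high, the rubric may support fewer, coarser bands (for example low / medium / high) better than a five-point scale.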