Good Extractor Design | Sutro Handbook

As with most topics in this handbook, err on the side of simplicity, atomicity, and task decomposition when building extractors. As with other decision models, keep each extractor focused on one decision path at a time if budget and architecture allow.

We'll break down what field types should be allowed, how you should try to scope them, and what a task should look like in general.

Property	Recommendation	Rationale
Scope	Prefer atomicity when possible. That means fewer fields to extract and less overall decision-making by the model.	Overloading a model with too many tasks at once yields inconsistent results and often forces the use of a larger model. Splitting an extraction task up may improve overall throughput and accuracy.
Schema	Prefer enums and other closed-set fields when possible. If using free-form text, prefer shorter spans (1-2 sentences at most, if possible).	Closed-set field types simplify verification, because each output can be checked against a known list of allowed values.
Task orientation	Prefer "contained within" tasks that locate text and data already present in the input. If you are making an inference about something in the input, a classification task may make more sense. If you're summarizing or abstracting information from an input, structured extraction is not the right primitive.	Contained-within data is typically easier to verify. Abstractive summaries are harder to check for correctness.
Missingness	Define when the model should return `null`, `unknown`, or `not_applicable`, and do not force a value when the evidence is absent.	Many extraction failures are hallucinated values caused by schemas that require an answer even when the input does not contain one.
Use of reasoning	If the task is decision-oriented, having the model emit a structured scratchpad of its decision rationale can improve accuracy and auditability.	As mentioned in the intro, extractors are often decision problems. A scratchpad gives the model something to reference while deciding, and gives the developer a window into potential failure modes.
Support gathering	It can be helpful to force the model to cite where it found each piece of information. For long documents, this makes task accuracy easier to verify.	Citations improve verification and auditability.
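
Several of the recommendations in the table above can be combined in one small schema. The following is a minimal sketch, not a prescribed implementation: it assumes the model's output has already been JSON-decoded into a dict, and every name in it (`PaymentStatus`, `Extraction`, `parse_extraction`, `is_supported`, and the field names) is hypothetical. It shows a single closed-set field, an explicit `unknown` value for missing evidence, a short rationale field, and a contained-within check on the cited evidence.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class PaymentStatus(Enum):
    """Hypothetical closed-set field: the model must pick one of these values."""
    PAID = "paid"
    UNPAID = "unpaid"
    UNKNOWN = "unknown"  # explicit missingness value, so an answer is never forced


@dataclass
class Extraction:
    status: PaymentStatus
    evidence: Optional[str]  # verbatim quote from the input, or None
    rationale: str           # short scratchpad explaining the decision


def parse_extraction(raw: dict) -> Extraction:
    """Validate a JSON-decoded model output against the schema."""
    # Enum lookup raises ValueError for any value outside the closed set.
    status = PaymentStatus(raw["status"])
    return Extraction(
        status=status,
        evidence=raw.get("evidence"),
        rationale=raw.get("rationale", ""),
    )


def is_supported(extraction: Extraction, source_text: str) -> bool:
    """Check that the cited evidence actually appears in the input."""
    evidence = extraction.evidence
    if extraction.status is PaymentStatus.UNKNOWN:
        # No claim was made, so no evidence should be cited.
        return evidence is None
    # A non-empty quote that is a literal substring of the input.
    return bool(evidence) and evidence in source_text


# Usage with a made-up document and a made-up model output.
doc = "Invoice 1042 was paid in full on March 3."
raw = {
    "status": "paid",
    "evidence": "paid in full on March 3",
    "rationale": "The invoice text states payment explicitly.",
}
result = parse_extraction(raw)
print(result.status, is_supported(result, doc))
```

The exact-substring check is deliberately strict. Real outputs may need whitespace or casing normalized before comparison, which this sketch does not attempt.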