Parallel Sampling

Perhaps our favorite trick for increasing consistency is the use of parallel sampling. This is simply setting a model's "n" sampling parameter to >1.

For example, setting n=10 on a classification task, and simply taking the majority vote from the results can eliminate the statistical odds of one random "bad" inferences.

It can add an inference cost penalty, but typically not linearly with n because it can reuse cached input tokens effectively. Typically open-source model providers and inference engines expose this parameter via API.

For multi-model voting, see routers and ensembles.