Frontier model evals are quietly misleading anyone building in this region. A model that scores 92 on MMLU and 88 on a translation benchmark will still answer a customer-service query in Bahasa Indonesia in a register so wrong the user closes the chat.
The interesting work right now is not about model size. It is about the last mile of intent — accent, register, code-switching, and the half-dozen ways a Filipino seller will describe a sizing problem to a Vietnamese supplier through an English UI.
The defensibility surface
The teams pulling ahead are the ones that built a labeled eval set for their actual customers in months one and two, not month twelve. They fine-tune small models on it weekly. Their gross margins look better than those of the GPT-wrapper crowd because their token spend on retries is a third of the industry average.
"The model is rented. The eval set is the asset. Almost nobody pays attention to where that asset gets built."
If you are a founder in this region, the right question is not which model. It is which 2,000 conversations you are willing to label yourself before you hire your first ML engineer. That answer determines whether the product survives next year.
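For a sense of what that labeling work buys you, here is a minimal Python sketch. Everything in it is an illustrative assumption rather than anyone's production schema: the `LabeledExample` fields, the language and register tags, the token count, and the simplification that each retry succeeds independently with the same probability. It shows two things: pass rate broken out by language and register, and how a low pass rate turns into token spend on retries.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LabeledExample:
    """One hand-labeled model reply (hypothetical schema)."""
    language: str   # e.g. "id" for Bahasa Indonesia, "tl" for Tagalog
    register: str   # e.g. "formal", "casual", "code-switched"
    passed: bool    # did the human labeler accept the reply?


def pass_rate_by_slice(examples):
    """Return {(language, register): fraction of replies labeled as passing}."""
    totals = defaultdict(int)
    passes = defaultdict(int)
    for ex in examples:
        key = (ex.language, ex.register)
        totals[key] += 1
        passes[key] += int(ex.passed)
    return {key: passes[key] / totals[key] for key in totals}


def expected_tokens_per_resolution(tokens_per_attempt, pass_rate):
    """Expected tokens spent until the first accepted reply.

    Assumes independent retries that each succeed with probability
    pass_rate, so the expected number of attempts is 1 / pass_rate.
    """
    if not 0 < pass_rate <= 1:
        raise ValueError("pass_rate must be in (0, 1]")
    return tokens_per_attempt / pass_rate


# Toy data, invented for illustration only.
examples = [
    LabeledExample("id", "casual", True),
    LabeledExample("id", "casual", False),
    LabeledExample("id", "formal", True),
    LabeledExample("tl", "code-switched", True),
]

rates = pass_rate_by_slice(examples)
casual_cost = expected_tokens_per_resolution(800, rates[("id", "casual")])
formal_cost = expected_tokens_per_resolution(800, rates[("id", "formal")])
```

Under these toy numbers, the casual Bahasa Indonesia slice passes half the time, so it costs twice as many tokens per resolved query as the formal slice. An aggregate benchmark score hides exactly that gap; a sliced eval set on your own conversations exposes it.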



