Introducing: Specialized Model Avataars (SMAs)

Feb 10, 2025

The age of Agentic AI is upon us — where multiple smart, collaborative agents team up to tackle complex workflows across industries. But there’s a catch: most of these agents run on massive language models (LLMs) that come with eye-watering compute costs and painfully slow response times.

At Avataar.ai, we believe there’s a better way forward: Specialized Model Avataars (‘SMAs’). These are domain-specific & task-driven avataars of distilled small models, built to serve Agentic AI applications via a “mixture-of-experts” strategy. For businesses and innovators alike, that spells game-changing savings in both cost and time: the ability to rival traditional LLM accuracy at a fraction of the inference cost.

In simple terms, SMAs coordinate multiple task-appropriate specialized models, covering many tasks while still keeping each expert lean.

Let’s Start with ‘TSMs’: Our Recent Technical Findings

In a nutshell, Tiny Specialized Models (‘TSMs’) are domain-specific, task-driven variants of distilled small language models, engineered for well-defined problems, that can provide traditional LLM accuracy at a fraction of the cost and inference time.

To dig deeper, you can read more:
Github Code | Technical Report (coming soon!)

Our Journey

Over the past years, we’ve been perfecting Agentic AI solutions for the retail industry — where our customers demand blazing-fast response times and cost-effective infrastructure. Running enormous LLMs quickly became a bottleneck. The overhead of large-scale training and serving just wasn’t sustainable, especially for high-frequency use cases like real-time customer interactions and inventory management.

So we asked ourselves: could we pack the same capabilities into a domain-specific, task-specialized distilled language model, and how miniaturized could that model be? That is how we arrived at TSMs.

How We Tested the TSM Concept

Choosing a Target Problem

We selected the specialized task of solving differential polynomials: a complex but well-defined area that works well for stress-testing an AI model’s capacity to learn domain-specific tasks (think of a math teacher who specializes in differentiation).

Small (But Mighty) Model

We started by choosing four distilled models of varying sizes: SmolLM-135M-Instruct (135 million parameters), SmolLM-360M-Instruct (360 million parameters), Qwen-0.5B-Instruct (494 million parameters) and Qwen-Math-1.5B-Instruct (1.5 billion parameters). Our intention was to see how miniaturized these models could get while still rivalling the accuracy levels of LLMs.

Using a dataset of differential polynomial problems, we first ran an LLM (the teacher model) over the inputs to generate candidate outputs, then filtered them with a score function to obtain a dataset of input-output pairs with Chain-of-Thought reasoning. We then fine-tuned the distilled small models on these filtered datasets using a next-token-prediction cross-entropy loss.
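
To make the filtering step concrete, here is a minimal sketch in plain Python. The coefficient-dict polynomial representation, the toy teacher, and all function names are illustrative assumptions, not our actual pipeline; a real run would query the teacher LLM and parse its Chain-of-Thought output.

```python
# Minimal sketch of the candidate-filtering step, assuming a hypothetical
# teacher that returns several candidate derivatives per input polynomial.
# Polynomials are represented as {exponent: coefficient} dicts.

def differentiate(poly):
    """Exact derivative of a polynomial, used as the scoring reference."""
    return {n - 1: c * n for n, c in poly.items() if n != 0}

def score(candidate, poly):
    """Score function: 1.0 if the candidate matches the true derivative."""
    return 1.0 if candidate == differentiate(poly) else 0.0

def filter_pairs(inputs, teacher_generate, threshold=1.0):
    """Keep only (input, output) pairs whose score clears the threshold."""
    kept = []
    for poly in inputs:
        for candidate in teacher_generate(poly):
            if score(candidate, poly) >= threshold:
                kept.append((poly, candidate))
                break  # one verified output per input is enough
    return kept

# Toy "teacher" emitting one wrong and one right candidate for 3x^2 + 5x.
def toy_teacher(poly):
    return [{1: 6}, differentiate(poly)]

dataset = filter_pairs([{2: 3, 1: 5}], toy_teacher)
```

The key design point is that the score function is automated and exact, so no human labelling is needed to keep only verified teacher outputs.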

The SmolLM models have a near-0% baseline accuracy on the task, which made them more interesting. To train these, we generated outputs on the entire set of initial examples using Qwen-Math-1.5B-Instruct, which resulted in a dataset of 300,000 input-output pairs after filtering. Subsequently, we used supervised fine-tuning with the same hyperparameters.

Finally, for all models, we fine-tuned further using reinforcement learning, reusing the score function as the reward function.
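
As an illustration of reusing the score function as a reward, the toy REINFORCE loop below trains a categorical "policy" over a handful of candidate answers. This is a hedged sketch under simplifying assumptions, not our training code; a real run would update the language model's token probabilities rather than a small logit vector.

```python
import math
import random

# Toy REINFORCE loop: the same score function used for data filtering is
# reused directly as the reward. The "policy" is a categorical
# distribution over a few fixed candidate answers (an assumption made
# purely for illustration).

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def reinforce(candidates, reward_fn, steps=500, lr=0.5, seed=0):
    rng = random.Random(seed)
    logits = [0.0] * len(candidates)
    for _ in range(steps):
        probs = softmax(logits)
        i = rng.choices(range(len(candidates)), weights=probs)[0]
        r = reward_fn(candidates[i])
        # REINFORCE update: scale the log-prob gradient of the sampled
        # action by its reward (no update when the reward is zero).
        for j in range(len(logits)):
            grad = (1.0 if j == i else 0.0) - probs[j]
            logits[j] += lr * r * grad
    return softmax(logits)

# Reward 1.0 only for the correct derivative of 3x^2 + 5x.
correct = {1: 6, 0: 5}
cands = [{1: 6}, correct, {2: 3}]
probs = reinforce(cands, lambda c: 1.0 if c == correct else 0.0)
```

After training, the policy concentrates its probability mass on the candidate that earns the reward, which is the same effect the RL stage has on the fine-tuned model's outputs.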

Cost & Accuracy

Supervised fine-tuning followed by reinforcement learning yielded accuracy rates of 99.1% for Diff-TSM-Qwen-1.5B and 97.6% for Diff-TSM-Qwen-0.5B, rivalling the best-in-class large language models (e.g. OpenAI o1 at 95.1% accuracy). To our delight, Diff-TSM-SmolLM-360M and Diff-TSM-SmolLM-135M also achieved 97.3% and 95.4% accuracy, again rivalling the LLMs.
Training each Diff-TSM-Qwen model took ~2.5 hours on a single H200 GPU, costing only ~$10 per model. The Diff-TSM-SmolLM models took 3.7-4.6 hours, costing $15-$20 per model.

Benchmarked Results

We went on to evaluate the performance of baseline models up to 8B parameters, including those trained for the math domain without task specificity, and OpenAI’s o-series of models as representatives of large “intelligent” LLMs. Our Diff-TSMs rivalled the performance of OpenAI o1 in a significantly smaller model size.

The figure below plots cost per million tokens (in $, as a line chart; lower is better) alongside accuracy (in %, as a bar chart; higher is better). Our models showcase accuracy levels comparable to the LLMs at a fraction of the inference cost, in line with the distilled models. OpenAI o1 is ~580 times costlier than our Diff-TSM-SmolLM-135M for similar task-specific accuracy.

Figure 3: Performance comparison of models on the differentiation task. The chart illustrates accuracy in the bar chart (higher is better, in %) and Cost per million tokens in the line chart (lower is better, in $s).
Deepseek-Distill models are evaluated with a longer 10k response length to accommodate their extremely long chain-of-thoughts.


Future Work: Our plans ahead

Evolve TSMs into ‘SMAs’:

Domain-specific & task-driven avataars of distilled models built to serve Agentic AI applications via a “mixture-of-experts” strategy. We are investigating ways to combine multiple small TSMs to cover a broader range of tasks, which is emerging as an exciting direction. For instance, a collection of expert models (individual TSMs) could be coordinated by a gating mechanism that routes queries to the appropriate task-driven specialist; this mixture-of-experts strategy could cover many tasks while still keeping each expert lean.
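
To illustrate the gating idea, here is a minimal sketch of a router dispatching queries to registered specialists. The class name, the expert stubs, and the task-keyed routing are assumptions for illustration only; a production gate could itself be a small learned classifier rather than an explicit task label.

```python
from typing import Callable, Dict

# Hedged sketch of a mixture-of-experts gate: each specialist TSM is
# registered under a task name, and the router dispatches queries to the
# matching expert. The lambdas below stand in for real model calls.

class SMARouter:
    def __init__(self) -> None:
        self.experts: Dict[str, Callable[[str], str]] = {}

    def register(self, task: str, expert: Callable[[str], str]) -> None:
        """Add a specialist model for a given task."""
        self.experts[task] = expert

    def route(self, task: str, query: str) -> str:
        """Send the query to the task-appropriate specialist."""
        expert = self.experts.get(task)
        if expert is None:
            raise KeyError(f"no specialist registered for task {task!r}")
        return expert(query)

router = SMARouter()
router.register("differentiation", lambda q: f"[Diff-TSM] d/dx of {q}")
router.register("integration", lambda q: f"[Int-TSM] integral of {q}")
answer = router.route("differentiation", "3x^2 + 5x")
```

Because each expert stays lean, adding coverage for a new task is a matter of registering one more specialist rather than retraining a monolithic model.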

Scalable generation of agent-specific SMAs:

Driven by recent advancements, we are getting closer to a structured way of using domain-specific, task-driven user prompts to specify guard-rails, or boundaries, for the target domain and task (the trajectory guard-rails), and then conducting a data-free knowledge distillation workflow that creates specialized student models rivalling the accuracy of traditional LLM teacher models. This is a powerful enabler for creating custom domain-specific & task-driven “avataars” of distilled models. Each domain may present unique challenges, but the general methodology of data-free knowledge distillation coupled with reinforcement learning on automated rewards applies broadly.

Additional data-driven KD for available data:

While data-free distillation limits TSMs to matching the accuracy of their teacher LLMs, enhancing the flow with data-driven KD holds the promise of beating the teacher’s accuracy. Recent advancements offer multiple approaches that reduce the amount of labelled ground-truth data needed. Where available, adding highly structured & curated ground-truth data to KD allows TSMs to surpass the accuracy of LLMs.

Improving the RL fine-tuning layer:

Our supervised fine-tuning & reinforcement learning techniques can be enhanced with more advanced reward schemes or human-in-the-loop feedback, which will further expand the capabilities of SMAs. Techniques like curriculum learning (starting with simple tasks and gradually increasing difficulty) will help the models master more complex expressions.
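
A curriculum schedule can be as simple as ordering problems by a difficulty proxy. In the sketch below we use polynomial degree as that proxy, which is our own illustrative assumption rather than a prescribed measure.

```python
# Illustrative curriculum schedule: order differentiation problems by
# polynomial degree so training starts with simple expressions.
# Polynomials are {exponent: coefficient} dicts.

def difficulty(poly):
    """Proxy for difficulty: the polynomial's degree."""
    return max(poly) if poly else 0

def curriculum_stages(problems, stage_size=2):
    """Sort by difficulty, then split into fixed-size training stages."""
    ordered = sorted(problems, key=difficulty)
    return [ordered[i:i + stage_size]
            for i in range(0, len(ordered), stage_size)]

problems = [{5: 1}, {1: 2}, {3: 4, 1: 1}, {2: 7}]
stages = curriculum_stages(problems)
```

Training would then iterate over the stages in order, only advancing once the model performs well on the current stage.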

Push the boundaries of how compact individual TSMs can get:

We are also interested in exploring the limits of how small a model can be for a given task, pushing the boundary of model minimization while maintaining performance. Thanks to their compact footprint, TSMs could eventually live on on-premise servers or even edge devices, further driving down operational costs and improving response times.


The Big Picture: Implications

Universal Applicability

In Customer Support, TSMs can triage incoming tickets, generate knowledge-base articles, and resolve routine inquiries. In Finance, they can expedite fraud detection, assist with regulatory compliance, or deliver personalized investment insights. Healthcare would benefit from automated patient data management, clinical documentation assistance, and even support for medical research. Meanwhile, HR functions can leverage TSMs for candidate screening, onboarding, and ongoing employee engagement. In Supply Chain and Logistics, they can analyze complex data to optimize routes, manage inventory, and forecast demand. Marketing teams could use TSMs to craft targeted campaigns, conduct sentiment analysis, and generate compelling promotional content. Whether for Retail, Manufacturing, Insurance, or Software Development, TSMs can be retrained to tackle virtually any domain-specific challenge on par with traditional LLMs, making them indispensable in today’s Agentic AI revolution.

Slashing Overheads and Making your Data a Defensible Moat

With TSMs, you’re no longer bound to colossal LLM frameworks. Instead, you can enjoy near-instant inference and minimal infrastructure costs.
And better yet, your data stays yours: the new emerging defensible moat!

Phew, SaaS is not Dead

While the Agentic AI revolution was, until now, considered a disruption to the SaaS industry, TSMs offer SaaS players a strong right to win: they can capitalize on their domain-specific data and build their own domain-specific TSMs. At Avataar.AI Labs, we have found ways to do this at scale. More to come from our labs, so keep watching this space!

Edge Deployments

Thanks to their compact footprint, TSMs will soon be able to live on on-premise servers or even edge devices, further driving down operational costs and improving response times.


A Shout-Out to Our Open-Source Partners & Community

We can’t overstate the importance of open-source collaboration in these findings. Our heartfelt thanks go out to our friends in the community for their open-source frameworks that sparked the creativity and lateral thinking behind TSMs. To give back, we’re releasing our pre-trained model with weights, evals, and benchmark code so that the entire AI community can build on this foundational direction and democratize intelligence even further.

A Shout-Out to All Our Gurus Over the Past Decade

Special thanks to our gurus (kind mentors) who have guided us over the past decade of our R&D journey. Thank you!


Final Thoughts

The future of AI doesn’t have to be dominated by behemoth models. Using the TSM approach, we’re proving that miniaturized models can be just as powerful — and far more practical. TSMs open up a world of possibilities for any enterprise looking to harness the power of Agentic AI without breaking the bank.
Stay tuned for upcoming findings, deeper technical dives, and our continued efforts to push the boundaries of what these models can do. If you’ve been searching for an alternative to massive LLMs, look no further — SMAs are here, and they’re set to revolutionize how we think about AI agents.

To dig deeper, you can read more:
Github Code | Technical Report (coming soon!)

For inquiries or a deeper discussion, write to us at contactus@avataar.ai
