This post was written by Daniela Shikhmakher, Data Engineer at Start.io.

LLMs are reshaping how organizations process and leverage unstructured text data. From intelligent chatbots to automated content generation and advanced analytics, LLMs are rapidly becoming fundamental components of modern data stacks. For data engineering teams, LLMs create a major opportunity – they allow us to reduce human-in-the-loop workflows and automate complex semantic tasks. On the other hand, the challenge of building production AI systems is not just about calling a model API; it's about engineering repeatable and reliable infrastructure. 
 
This article describes how we evolved from a single LLM-based classification project into building our own internal utils package – where generic abstraction helped us, and where it nearly slowed us down. 

The Initial Use Case: A Single Classification System 
We began with a focused goal: develop an LLM-based classification framework that would automatically classify daily ingested data into internal audience segmentation taxonomies. 

The architecture included: 

  • Data processing layer 
  • Caching mechanisms 
  • RAG (Retrieval-Augmented Generation) component 
  • Output parsing and validation 
     

At first, everything was in a single repository with one configuration and one business problem, and it worked well. 
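As a rough sketch, the wiring between those layers might look like the following. All class and method names here are illustrative stand-ins, not the actual Start.io code:

```python
from dataclasses import dataclass

@dataclass
class ClassificationResult:
    record_id: str
    label_id: str

class InMemoryCache:
    """Caching layer: avoids re-classifying text we have already seen."""
    def __init__(self):
        self._store = {}
    def get(self, key):
        return self._store.get(key)
    def set(self, key, value):
        self._store[key] = value

class Pipeline:
    """Wires the layers together: cache -> RAG retrieval -> LLM call -> parsing."""
    def __init__(self, cache, retriever, llm, parser):
        self.cache = cache
        self.retriever = retriever
        self.llm = llm
        self.parser = parser

    def classify(self, record_id: str, text: str) -> ClassificationResult:
        cached = self.cache.get(text)
        if cached is not None:
            return ClassificationResult(record_id, cached)
        examples = self.retriever.retrieve(text)           # RAG component
        raw = self.llm.complete(examples=examples, text=text)
        label = self.parser.parse(raw)                     # output parsing/validation
        self.cache.set(text, label)
        return ClassificationResult(record_id, label)
```

In a single-repository setup like the one described, each of these collaborators can be a concrete class; the coupling only becomes a problem once multiple projects need to swap pieces out.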

The Pattern Emerges 
Once more classification use cases appeared, each project had different taxonomies, label data, business rules, and evaluation metrics. But they all shared the same core requirements – requirements best served by a common LLM-based framework. 

The Next Level of Abstraction 
There are already tools that abstract across models and providers, such as LiteLLM and Amazon Bedrock. These tools address model abstraction by providing a unified interface for interacting with multiple LLMs. This allows developers to switch models or providers without rewriting large portions of their integration code. Our problem was different – we needed to abstract across classification systems. 
  

The Decision: Build a Reusable Utilities Layer 
We decided to build our own utilities package not to create a full framework, but to avoid repeating the same setup code across classification tasks. Before starting development, we asked what stays consistent across projects and decided to generalize only those stable parts.  
  

What We Chose to Generalize 
We identified reusable components that consistently appeared in every classification system. 

1. Classification Mechanics – we needed to support two types of classification: single-class and multi-class. 
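A minimal sketch of that distinction, using a hypothetical mode enum and validator (illustrative, not our production code):

```python
from enum import Enum

class ClassificationMode(Enum):
    SINGLE = "single"   # exactly one label per item
    MULTI = "multi"     # zero or more labels per item

def validate_labels(labels, mode, taxonomy):
    """Enforce the contract of each classification mode against a known taxonomy."""
    unknown = [label for label in labels if label not in taxonomy]
    if unknown:
        raise ValueError(f"unknown labels: {unknown}")
    if mode is ClassificationMode.SINGLE and len(labels) != 1:
        raise ValueError("single-class mode requires exactly one label")
    return labels
```

Encoding the mode explicitly lets the shared layer reject malformed results once, instead of every project re-implementing the same checks.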

2. Prompt Management – across projects, we required: 

  • System / user / assistant message formatting 
  • Dynamic few-shot / one-shot / zero-shot 
  • Injection into the user prompt 
  • Controlled prompt construction patterns 

While prompt content varies by domain, the mechanics of building prompts are consistent. 
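Those shared mechanics can be captured in a small message builder. The sketch below assumes the common chat-message format (system/user/assistant roles); the function name and signature are hypothetical:

```python
def build_messages(system_prompt, user_text, examples=None):
    """Assemble chat messages in the system/user/assistant format.

    Few-shot examples are injected as alternating user/assistant turns;
    passing no examples yields a zero-shot prompt, one example a one-shot prompt.
    """
    messages = [{"role": "system", "content": system_prompt}]
    for example_input, example_label in (examples or []):
        messages.append({"role": "user", "content": example_input})
        messages.append({"role": "assistant", "content": example_label})
    messages.append({"role": "user", "content": user_text})
    return messages
```

Because the construction pattern is fixed and only the content varies, each project supplies its own system prompt and example set while reusing the same assembly code.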

3. LLM Response Parsing – we repeatedly implemented: 

  • Extracting structured classification from raw LLM responses 
  • Parsing single-class outputs (e.g., returning a single label ID) 
  • Parsing multi-class outputs (e.g., structured JSON objects) 
  • Result selection strategies for handling multiple classifications 
  • Output validation 

Reliable structured output, like valid JSON or clean label IDs, is essential for all projects and should exist in a utils layer. 
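The two parsing paths might look like the following sketch. The tolerance for markdown-fenced JSON reflects a common LLM quirk; the exact heuristics are illustrative assumptions, not our exact implementation (requires Python 3.9+ for `str.removeprefix`):

```python
import json
import re

def parse_single_label(raw: str, valid_ids: set) -> str:
    """Extract a single numeric label ID from a raw LLM response."""
    match = re.search(r"\b\d+\b", raw)
    if not match or match.group() not in valid_ids:
        raise ValueError(f"no valid label ID in response: {raw!r}")
    return match.group()

def parse_multi_label(raw: str, valid_ids: set) -> list:
    """Parse a JSON list of label objects, keeping only known IDs."""
    # Tolerate models that wrap their JSON output in markdown fences.
    cleaned = raw.strip().removeprefix("```json").removesuffix("```").strip()
    data = json.loads(cleaned)
    return [item["label_id"] for item in data if item.get("label_id") in valid_ids]
```

Validating against the set of known IDs at the parsing boundary keeps hallucinated labels from ever reaching downstream tables.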

4. Cost Estimation and Observability – every project required cost and token usage tracking. 
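A simple accumulator covers most of this need. The per-1K-token prices below are placeholders, not real vendor rates, and the class name is hypothetical:

```python
from dataclasses import dataclass

@dataclass
class UsageTracker:
    """Accumulates token usage and estimated cost across LLM calls."""
    input_price_per_1k: float    # illustrative price per 1K input tokens
    output_price_per_1k: float   # illustrative price per 1K output tokens
    input_tokens: int = 0
    output_tokens: int = 0

    def record(self, input_tokens: int, output_tokens: int) -> None:
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens

    @property
    def estimated_cost(self) -> float:
        return (self.input_tokens / 1000 * self.input_price_per_1k
                + self.output_tokens / 1000 * self.output_price_per_1k)
```

Because every project calls the LLM through the shared layer, recording usage in one place gives cost observability for free.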

What We Explicitly Did Not Generalize 
Equally important was defining what should remain at the project level. 
We did not abstract: 

  • Label loading logic (each project sources labels differently) 
  • Classification taxonomies (categories, hierarchies, segmentation rules) 
  • Domain-specific workflows (offline batch processing vs. real-time applications) 

These components evolve with business needs. Freezing them behind a generic interface would reduce flexibility. 

When the Generic Code Started to Hurt 
The first version of our utilities layer was helpful. Then we pushed abstraction too far. 

  1.  Configuration Explosion 
    We attempted to make everything configurable. The configuration object grew increasingly complex. Eventually, only the developers who built the system fully understood how to configure it correctly. When configuration becomes a domain language of its own, abstraction has gone too far. Instead of simplifying development, it added complexity. 
  2. Model and API Volatility 
    Between projects, new LLM models appeared, APIs evolved, and versions changed. When the abstraction layer is too generic, this volatility hits hard: frequent model and API changes break the shared code, force constant updates, and make the system fragile and difficult to maintain. 
  3. Each Task Needs Its Own Metrics 
    We first built one generic Test Grader, but it proved brittle and hard to maintain: single-class and multi-class projects use very different metrics, and evaluation also depends on the specific task, so a single grader could not reasonably cover all cases. 

The Real Cost of Over-Generic Code 
Over-abstraction can create real engineering problems. As our utilities layer became more generic, even small changes required navigating multiple layers of code, making debugging less transparent and demanding deep internal knowledge to understand system behavior. What was meant to speed up development ended up slowing it down: adding a minor feature felt heavier than it should have. In this case, generic code became a constraint rather than an enabler. 
  

Our Solution 
To solve the problems caused by over-generic code, we adopted a clear architectural principle: abstract the infrastructure and keep business logic concrete. Our shared utilities layer focuses only on stable, reusable components: LLM interaction, prompt formatting, output parsing, multi-class handling, cost estimation, and logging. Everything that varies by project – like taxonomy definitions, domain workflows, and evaluation methods – remains in the application itself. This separation allows us to reuse common infrastructure without forcing rigid, one-size-fits-all solutions, keeping each project flexible and maintainable. 
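One way to picture that split: the shared layer exposes a generic entry point and receives the project-specific pieces as plain functions. Everything below is an illustrative sketch; the names, taxonomy, and stub model are invented for the example:

```python
# Shared utils layer (stable infrastructure): a generic batch classifier
# that takes project-supplied hooks instead of baking in business logic.
def classify_batch(texts, llm, build_prompt, parse_response):
    """Run classification over a batch; prompt wording and parsing
    are injected by the project, not hard-coded in the shared layer."""
    return [parse_response(llm.complete(build_prompt(text))) for text in texts]

# Project layer (concrete business logic): taxonomy and parsing stay local.
TAXONOMY = {"1": "sports", "2": "finance"}

def build_prompt(text):
    labels = ", ".join(f"{k}={v}" for k, v in sorted(TAXONOMY.items()))
    return f"Labels: {labels}\nReturn the best label ID for: {text}"

def parse_response(raw):
    label_id = raw.strip()
    if label_id not in TAXONOMY:
        raise ValueError(f"unknown label ID: {label_id!r}")
    return label_id
```

The shared function never imports a taxonomy or an evaluation metric; when a project's rules change, only the project-level hooks change.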

Tools like LiteLLM and Amazon Bedrock make it easy to switch models and handle API differences. Our utils package enables reuse across classification frameworks. 

Both are abstraction layers – but at different levels. One operates at the model provider level. The other operates at the application infrastructure level. 

Recognizing the correct abstraction axis is critical. 

Abstract the wrong layer, and you build unnecessary complexity. 

Abstract the right layer, and you eliminate duplication without sacrificing agility. 

Conclusion 
LLMs are transforming how organizations use unstructured data, but building an LLM-based framework is not just about models – it's about sustainable infrastructure. Just as tools like LiteLLM make it easy to switch between models, our utilities package lets us switch between classification tasks easily, without rewriting common components. Abstraction must be intentional: too little leads to duplication, too much slows innovation. The key is focusing on what's truly stable and resisting the urge to generalize everything else.