Building Applications with LLMs
A practical guide for developers
Large Language Models have fundamentally changed how we think about building software. But despite all the hype, there's surprisingly little practical guidance on how to actually integrate LLMs into production applications.
Understanding the LLM API
At its core, working with an LLM is straightforward: you send a prompt, you get a response. But the devil is in the details. Let's look at a basic integration:
```javascript
import OpenAI from 'openai';

const openai = new OpenAI();

async function generateResponse(userMessage) {
  const completion = await openai.chat.completions.create({
    model: 'gpt-4',
    messages: [
      {
        role: 'system',
        content: 'You are a helpful assistant.'
      },
      {
        role: 'user',
        content: userMessage
      }
    ],
    temperature: 0.7,
    max_tokens: 1000
  });

  return completion.choices[0].message.content;
}
```

Prompt Engineering Patterns
The quality of your prompts directly determines the quality of your outputs. Here are some patterns that consistently produce better results:
```javascript
const systemPrompt = `
You are an expert code reviewer. Your task is to analyze code
and provide constructive feedback.

## Guidelines:
- Focus on correctness, performance, and readability
- Be specific about issues and provide examples
- Suggest improvements, don't just criticize
- Rate severity: Critical, Warning, or Suggestion

## Output Format:
Respond in JSON with this structure:
{
  "summary": "Brief overview",
  "issues": [
    {
      "severity": "Warning",
      "line": 42,
      "description": "...",
      "suggestion": "..."
    }
  ],
  "score": 85
}
`;
```
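Asking for JSON is only half the battle: models sometimes wrap the payload in Markdown fences or drift from the schema, so parse defensively before trusting the structure. The helper below is a minimal sketch (the function name and shape checks are our own, not part of any SDK) that validates against the format declared in the prompt above:

```javascript
// Hypothetical helper: parse and sanity-check the model's review output.
// Returns the parsed object, or null so the caller can retry or fall back.
function parseReviewResponse(raw) {
  // Strip Markdown code fences the model may have added around the JSON
  const cleaned = raw
    .trim()
    .replace(/^```(?:json)?\s*/i, '')
    .replace(/```\s*$/, '');

  let parsed;
  try {
    parsed = JSON.parse(cleaned);
  } catch {
    return null; // not valid JSON at all
  }

  // Minimal shape check against the prompt's declared output format
  if (typeof parsed.summary !== 'string' || !Array.isArray(parsed.issues)) {
    return null;
  }
  return parsed;
}
```

A null return is a useful signal: rather than crashing, you can re-prompt the model with the parse error appended, which often fixes the output on the second attempt.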
RAG: Retrieval-Augmented Generation
LLMs have knowledge cutoffs and can't access your specific data. RAG solves this by retrieving relevant context before generating a response:
```javascript
// Note: getEmbedding and vectorDB are placeholders for your embedding
// provider and vector store of choice.
async function ragQuery(userQuestion) {
  // 1. Convert question to embedding
  const embedding = await getEmbedding(userQuestion);

  // 2. Search vector database for relevant docs
  const relevantDocs = await vectorDB.search(embedding, {
    topK: 5,
    threshold: 0.7
  });

  // 3. Build context from retrieved documents
  const context = relevantDocs
    .map(doc => doc.content)
    .join('\n\n');

  // 4. Generate response with context
  return generateResponse(`
Context:
${context}

Question: ${userQuestion}

Answer based only on the provided context.
`);
}
```

Handling Errors Gracefully
LLM APIs can fail for many reasons: rate limits, timeouts, invalid responses. Robust error handling is essential for production applications.
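A common building block is a retry wrapper with exponential backoff, so transient failures like rate limits don't surface to users. The sketch below is a generic helper of our own devising (not part of the OpenAI SDK); the `isRetryable` predicate lets you retry only errors that merit it, such as HTTP 429 or 5xx:

```javascript
// Hypothetical helper: retry an async call with exponential backoff.
// Only errors for which isRetryable(err) returns true are retried.
async function withRetries(fn, { maxRetries = 3, baseDelayMs = 500, isRetryable = () => true } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Give up on the last attempt, or if the error isn't transient
      if (attempt >= maxRetries || !isRetryable(err)) throw err;

      // Backoff doubles each attempt: 500ms, 1s, 2s, ...
      const delay = baseDelayMs * 2 ** attempt;
      await new Promise(resolve => setTimeout(resolve, delay));
    }
  }
}
```

You would then wrap calls like `withRetries(() => generateResponse(msg), { isRetryable: e => e.status === 429 || e.status >= 500 })`, passing through validation errors immediately while absorbing rate limits and server hiccups.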