Building Applications with LLMs
A practical guide for developers
Large Language Models have fundamentally changed how we think about building software. But despite all the hype, there's surprisingly little practical guidance on how to actually integrate LLMs into production applications.
Understanding the LLM API
At its core, working with an LLM is straightforward: you send a prompt, you get a response. But the devil is in the details. Let's look at a basic integration:
```javascript
import OpenAI from 'openai';

const openai = new OpenAI();

async function generateResponse(userMessage) {
  const completion = await openai.chat.completions.create({
    model: 'gpt-4',
    messages: [
      {
        role: 'system',
        content: 'You are a helpful assistant.'
      },
      {
        role: 'user',
        content: userMessage
      }
    ],
    temperature: 0.7,
    max_tokens: 1000
  });

  return completion.choices[0].message.content;
}
```

Prompt Engineering Patterns
The quality of your prompts directly determines the quality of your outputs. Here are some patterns that consistently produce better results:
```javascript
const systemPrompt = `
You are an expert code reviewer. Your task is to analyze code
and provide constructive feedback.

## Guidelines:
- Focus on correctness, performance, and readability
- Be specific about issues and provide examples
- Suggest improvements, don't just criticize
- Rate severity: Critical, Warning, or Suggestion

## Output Format:
Respond in JSON with this structure:
{
  "summary": "Brief overview",
  "issues": [
    {
      "severity": "Warning",
      "line": 42,
      "description": "...",
      "suggestion": "..."
    }
  ],
  "score": 85
}
`;
```
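Asking for JSON is only half the battle: models sometimes wrap the payload in Markdown fences or drift from the schema, so parse defensively before trusting the structure. The helper below is a minimal sketch (the function name and shape checks are our own, not part of any SDK) that validates against the format declared in the prompt above:

```javascript
// Hypothetical helper: parse and sanity-check the model's review output.
// Returns the parsed object, or null so the caller can retry or fall back.
function parseReviewResponse(raw) {
  // Strip Markdown code fences the model may have added around the JSON
  const cleaned = raw
    .trim()
    .replace(/^```(?:json)?\s*/i, '')
    .replace(/```\s*$/, '');

  let parsed;
  try {
    parsed = JSON.parse(cleaned);
  } catch {
    return null; // not valid JSON at all
  }

  // Minimal shape check against the prompt's declared output format
  if (typeof parsed.summary !== 'string' || !Array.isArray(parsed.issues)) {
    return null;
  }
  return parsed;
}
```

A null return is a useful signal: rather than crashing, you can re-prompt the model with the parse error appended, which often fixes the output on the second attempt.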
RAG: Retrieval-Augmented Generation
LLMs have knowledge cutoffs and can't access your specific data. RAG solves this by retrieving relevant context before generating a response:
```javascript
// Note: getEmbedding and vectorDB are placeholders for your embedding
// provider and vector store of choice.
async function ragQuery(userQuestion) {
  // 1. Convert question to embedding
  const embedding = await getEmbedding(userQuestion);

  // 2. Search vector database for relevant docs
  const relevantDocs = await vectorDB.search(embedding, {
    topK: 5,
    threshold: 0.7
  });

  // 3. Build context from retrieved documents
  const context = relevantDocs
    .map(doc => doc.content)
    .join('\n\n');

  // 4. Generate response with context
  return generateResponse(`
Context:
${context}

Question: ${userQuestion}

Answer based only on the provided context.
`);
}
```

Handling Errors Gracefully
LLM APIs can fail for many reasons: rate limits, timeouts, invalid responses. Robust error handling is essential for production applications.
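A common building block is a retry wrapper with exponential backoff, so transient failures like rate limits don't surface to users. The sketch below is a generic helper of our own devising (not part of the OpenAI SDK); the `isRetryable` predicate lets you retry only errors that merit it, such as HTTP 429 or 5xx:

```javascript
// Hypothetical helper: retry an async call with exponential backoff.
// Only errors for which isRetryable(err) returns true are retried.
async function withRetries(fn, { maxRetries = 3, baseDelayMs = 500, isRetryable = () => true } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Give up on the last attempt, or if the error isn't transient
      if (attempt >= maxRetries || !isRetryable(err)) throw err;

      // Backoff doubles each attempt: 500ms, 1s, 2s, ...
      const delay = baseDelayMs * 2 ** attempt;
      await new Promise(resolve => setTimeout(resolve, delay));
    }
  }
}
```

You would then wrap calls like `withRetries(() => generateResponse(msg), { isRetryable: e => e.status === 429 || e.status >= 500 })`, passing through validation errors immediately while absorbing rate limits and server hiccups.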