If you have ever asked ChatGPT to "write some Python code" and received something that kind of works, you have experienced the difference between using an LLM and engineering with an LLM. Prompt engineering is not about clever phrasing — it is about interface design.
The Problem with Vague Prompts
Write a function to validate an email.
This prompt is ambiguous. Which language? What counts as valid? Should it check MX records or just regex? The model guesses — and its guess may not match your requirements.
Pattern 1: Structured System Prompts
Treat the system prompt as an API contract. Define role, constraints, output format, and examples.
You are a senior Python developer.
TASK: Write a robust email validation function.
CONSTRAINTS:
- Use only the Python standard library
- Return a tuple: (is_valid: bool, error_message: str)
- Do NOT use regex for the full validation
OUTPUT FORMAT:
```python
def validate_email(email: str) -> tuple[bool, str]:
...
```
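To make the contract concrete, here is a minimal sketch of sending it through the OpenAI Python SDK (v1+, module-level client); the model name and user message are placeholders:

```python
import openai  # assumes the v1+ SDK and OPENAI_API_KEY in the environment

SYSTEM_PROMPT = """You are a senior Python developer.
TASK: Write a robust email validation function.
CONSTRAINTS:
- Use only the Python standard library
- Return a tuple: (is_valid: bool, error_message: str)
- Do NOT use regex for the full validation
OUTPUT FORMAT: a single Python code block."""

response = openai.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; any chat model works here
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},  # the contract
        {"role": "user", "content": "Write validate_email for our signup form."},
    ],
)
print(response.choices[0].message.content)
```

Keeping the contract in the system role, and the specific request in the user role, means you can reuse the contract across many requests without restating it.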
Pattern 2: Chain-of-Thought Reasoning
For complex logic, force the model to think step by step before answering. This dramatically improves accuracy on reasoning tasks.
Before writing any code, explain your approach:
1. What edge cases must this handle?
2. What standard library modules are relevant?
3. What is your validation strategy?
Then write the implementation.
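One way to enforce that ordering is to split the request into two turns: reasoning first, implementation second. A sketch under the same SDK assumption; the prompts are illustrative:

```python
import openai

# Turn 1: ask for the plan only -- no code allowed yet.
messages = [
    {"role": "system", "content": "You are a senior Python developer."},
    {"role": "user", "content": (
        "We need an email validation function. Before writing any code, "
        "explain: 1) edge cases, 2) relevant stdlib modules, 3) your "
        "validation strategy. Do not write code yet."
    )},
]
plan = openai.chat.completions.create(model="gpt-4o-mini", messages=messages)

# Turn 2: the model implements against its own stated plan.
messages.append({"role": "assistant", "content": plan.choices[0].message.content})
messages.append({"role": "user", "content": "Now write the implementation."})
code = openai.chat.completions.create(model="gpt-4o-mini", messages=messages)
print(code.choices[0].message.content)
```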
Pattern 3: Output Schemas with JSON Mode
When building features that consume LLM output programmatically, unstructured text is a liability. Use JSON mode to enforce structure.
```python
import json

import openai

response = openai.chat.completions.create(
    model="gpt-4o-mini",
    # JSON mode requires the word "JSON" somewhere in the prompt, and it
    # helps to spell out the exact keys you expect.
    messages=[{"role": "user", "content": 'Classify this email as spam or not '
               'spam. Respond in JSON: {"label": "spam" or "not_spam"}.'}],
    response_format={"type": "json_object"},
)

# Parse reliably
data = json.loads(response.choices[0].message.content)
```
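Note that JSON mode guarantees well-formed JSON, not your schema; the model can still omit or rename keys. A defensive parse, assuming the `label` key from the prompt above:

```python
# json.loads will succeed under JSON mode, but the keys can still drift.
try:
    label = json.loads(response.choices[0].message.content)["label"]
except (json.JSONDecodeError, KeyError):
    label = None  # count this as a parse failure, then retry or fall back
```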
Pattern 4: Few-Shot Examples
When consistency matters more than creativity, give examples of the desired input/output pairs.
EXAMPLES:
Input: "hello@example.com"
Output: {"valid": true, "domain": "example.com"}
Input: "not-an-email"
Output: {"valid": false, "reason": "missing @ symbol"}
Building a Prompt Pipeline
Real applications rarely use a single prompt. They chain several into pipelines, as in the sketch after this list:
- Classifier prompt — determine user intent
- Router prompt — select the appropriate handler
- Generation prompt — produce the final output with context
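A minimal sketch of that three-stage flow; the `ask` helper, stage prompts, and intent labels are all illustrative, and stage 2 uses a plain dict where a real system might use a router prompt:

```python
import openai

def ask(system: str, user: str) -> str:
    """Hypothetical helper: one chat completion per pipeline stage."""
    response = openai.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
    )
    return response.choices[0].message.content

def handle(user_message: str) -> str:
    # Stage 1: classifier prompt -- map the message to a small closed set.
    intent = ask("Classify intent as exactly one of: question, bug_report, "
                 "feedback. Reply with the label only.", user_message).strip()
    # Stage 2: router -- a plain dict stands in for a router prompt here.
    handler_prompt = {
        "question": "Answer concisely and cite your sources.",
        "bug_report": "Ask for reproduction steps, then suggest a triage label.",
    }.get(intent, "Thank the user and summarize their feedback for the team.")
    # Stage 3: generation prompt -- produce the final reply with context.
    return ask(handler_prompt, user_message)
```

Each stage stays small and testable on its own, which is the point: you can measure and fix the classifier without touching the generator.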
Measuring Prompt Quality
Subjective evaluation does not scale. Track these metrics, using a harness like the one sketched after this list:
- Parse rate — % of responses that match your expected schema
- Accuracy — human-annotated correctness on a test set
- Latency & cost — tokens per request and response time
- Consistency — variance across multiple runs with the same input
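A minimal harness for parse rate, accuracy, and consistency; `run_prompt` and the test-case shape are assumptions, and latency and cost come from your API client's usage stats rather than this loop:

```python
import json
import statistics

def evaluate(run_prompt, test_cases: list[dict], runs_per_case: int = 3) -> dict:
    """run_prompt: any callable str -> str. Each test case: {"input", "expected"}."""
    total = len(test_cases) * runs_per_case
    parsed = correct = 0
    distinct_outputs = []  # 1.0 means fully consistent across runs
    for case in test_cases:
        outputs = [run_prompt(case["input"]) for _ in range(runs_per_case)]
        distinct_outputs.append(len(set(outputs)))
        for raw in outputs:
            try:
                data = json.loads(raw)
            except json.JSONDecodeError:
                continue  # failed parses count against parse_rate and accuracy
            parsed += 1
            correct += data == case["expected"]
    return {
        "parse_rate": parsed / total,
        "accuracy": correct / total,
        "mean_distinct_outputs": statistics.mean(distinct_outputs),
    }
```

Run this on every prompt change, and regressions show up as numbers instead of vibes.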
Bottom Line
Prompt engineering is not a dark art. It is structured communication with a probabilistic system. The developers who treat it as interface design — with schemas, constraints, and validation — build AI features that actually ship.