If you have ever asked ChatGPT to "write some Python code" and received something that kind of works, you have experienced the difference between using an LLM and engineering with an LLM. Prompt engineering is not about clever phrasing — it is about interface design.

The Problem with Vague Prompts

Write a function to validate an email.

This prompt is ambiguous. Which language? What counts as valid? Should it check MX records or just regex? The model guesses — and its guess may not match your requirements.

Pattern 1: Structured System Prompts

Treat the system prompt as an API contract. Define role, constraints, output format, and examples.

You are a senior Python developer. 

TASK: Write a robust email validation function.

CONSTRAINTS:
- Use only the Python standard library
- Return a tuple: (is_valid: bool, error_message: str)
- Do NOT use regex for the full validation

OUTPUT FORMAT:
```python
def validate_email(email: str) -> tuple[bool, str]:
    ...
```
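A response that satisfies this contract might look like the sketch below. It is deliberately minimal — real-world email rules (per RFC 5321) are far stricter — but it honors every constraint: standard library only, tuple return, no regex.

```python
def validate_email(email: str) -> tuple[bool, str]:
    """Validate an email address with plain string operations (no regex)."""
    if email.count("@") != 1:
        return False, "email must contain exactly one @ symbol"
    local, domain = email.split("@")
    if not local:
        return False, "local part before @ is empty"
    if "." not in domain:
        return False, "domain must contain a dot"
    if domain.startswith(".") or domain.endswith("."):
        return False, "domain cannot start or end with a dot"
    return True, ""
```

Because the output format was pinned down in advance, the caller can unpack the tuple without inspecting free-form prose.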

Pattern 2: Chain-of-Thought Reasoning

For complex logic, force the model to think step by step before answering. On multi-step reasoning tasks, this reliably improves accuracy.

Before writing any code, explain your approach:
1. What edge cases must this handle?
2. What standard library modules are relevant?
3. What is your validation strategy?

Then write the implementation.
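This preamble can be made reusable by prepending it to any task string. The `with_reasoning` helper below is a hypothetical illustration, not a library API:

```python
COT_PREAMBLE = """Before writing any code, explain your approach:
1. What edge cases must this handle?
2. What standard library modules are relevant?
3. What is your validation strategy?

Then write the implementation."""

def with_reasoning(task: str) -> str:
    """Append a chain-of-thought preamble to a task description."""
    return f"{task}\n\n{COT_PREAMBLE}"
```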

Pattern 3: Output Schemas with JSON Mode

When building features that consume LLM output programmatically, unstructured text is a liability. Use JSON mode to enforce structure.

import json

from openai import OpenAI

client = OpenAI()

# JSON mode requires the word "JSON" to appear somewhere in the messages,
# or the API rejects the request.
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{
        "role": "user",
        "content": "Classify this email as spam or not spam. "
                   "Respond in JSON with a 'label' field.",
    }],
    response_format={"type": "json_object"},
)

# Parse reliably
data = json.loads(response.choices[0].message.content)
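Even with JSON mode, validate before you trust: the mode guarantees syntactically valid JSON, not your schema. A minimal guard might look like this — the `label` field and its allowed values are assumptions for illustration:

```python
import json

def parse_classification(raw: str) -> dict:
    """Parse a model response and check it against the expected schema,
    raising ValueError instead of letting malformed output flow downstream."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"response was not valid JSON: {exc}") from exc
    if not isinstance(data, dict) or "label" not in data:
        raise ValueError("response missing required 'label' field")
    if data["label"] not in {"spam", "not_spam"}:
        raise ValueError(f"unexpected label: {data['label']!r}")
    return data
```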

Pattern 4: Few-Shot Examples

When consistency matters more than creativity, give examples of the desired input/output pairs.

EXAMPLES:
Input: "hello@example.com"
Output: {"valid": true, "domain": "example.com"}

Input: "not-an-email"
Output: {"valid": false, "reason": "missing @ symbol"}
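In a chat-style API, few-shot pairs are typically encoded as alternating user/assistant turns rather than pasted into one prompt. A sketch, reusing the two examples above:

```python
# Few-shot pairs: (user input, expected assistant output)
EXAMPLES = [
    ('"hello@example.com"', '{"valid": true, "domain": "example.com"}'),
    ('"not-an-email"', '{"valid": false, "reason": "missing @ symbol"}'),
]

def build_few_shot_messages(query: str) -> list[dict]:
    """Encode examples as alternating user/assistant turns, then the query."""
    messages = []
    for user_input, assistant_output in EXAMPLES:
        messages.append({"role": "user", "content": user_input})
        messages.append({"role": "assistant", "content": assistant_output})
    messages.append({"role": "user", "content": query})
    return messages
```

Putting the examples in assistant turns, rather than describing them in prose, shows the model exactly the shape of output it should produce.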

Building a Prompt Pipeline

Real applications rarely use a single prompt. They use pipelines:

  1. Classifier prompt — determine user intent
  2. Router prompt — select the appropriate handler
  3. Generation prompt — produce the final output with context
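A minimal version of that pipeline is sketched below, with the model client injected as a plain callable so each stage stays testable. The intent labels and handler prompts are illustrative assumptions, and the router here is a simple lookup rather than a second model call:

```python
from typing import Callable

HANDLERS = {
    "question": "Answer the question concisely:",
    "command": "Carry out the instruction step by step:",
    "chat": "Respond conversationally:",
}

def run_pipeline(user_message: str, call_llm: Callable[[str], str]) -> str:
    """Classify intent, route to a handler prompt, then generate the reply.
    `call_llm(prompt) -> str` stands in for your model client."""
    # Stage 1: classifier prompt
    intent = call_llm(
        "Classify the intent of this message as 'question', "
        f"'command', or 'chat':\n{user_message}"
    ).strip()
    # Stage 2: route (falling back to 'chat' on an unrecognized label)
    handler_prompt = HANDLERS.get(intent, HANDLERS["chat"])
    # Stage 3: generation prompt with context
    return call_llm(f"{handler_prompt}\n{user_message}")
```

Injecting the client also makes the pipeline cheap to test: a stub callable can play the model in unit tests.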

Measuring Prompt Quality

Subjective evaluation does not scale. Track these metrics:

  • Parse rate — % of responses that match your expected schema
  • Accuracy — human-annotated correctness on a test set
  • Latency & cost — tokens per request and response time
  • Consistency — variance across multiple runs with the same input
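The first and last of these metrics take only a few lines to compute. A sketch, assuming each run's response is collected as a raw string:

```python
import json
from collections import Counter

def parse_rate(responses: list[str]) -> float:
    """Fraction of responses that parse as JSON objects."""
    ok = 0
    for raw in responses:
        try:
            if isinstance(json.loads(raw), dict):
                ok += 1
        except json.JSONDecodeError:
            pass
    return ok / len(responses) if responses else 0.0

def consistency(responses: list[str]) -> float:
    """Fraction of runs that agree with the modal response for one input."""
    if not responses:
        return 0.0
    modal_count = Counter(responses).most_common(1)[0][1]
    return modal_count / len(responses)
```

Run these over a fixed test set on every prompt change, and regressions show up as numbers instead of vibes.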

Bottom Line

Prompt engineering is not a dark art. It is structured communication with a probabilistic system. The developers who treat it as interface design — with schemas, constraints, and validation — build AI features that actually ship.