Eight prompt strategies for an 11.68:1 aspect ratio
An A/B testing harness wired into the kiosk admin: same scene description × 8 prompt approaches × 3 expanded variants = 24 candidate images per test. Each approach attacks the same problem from a different angle (panoramic mural, drone overhead, letterbox, tilt-shift, mini-figures, etc.). Approach 8 — the production winner — stacks vague language, explicit numerics, and complete-bodies framing into one prompt. A separate gpt-4o-mini layer expands user input into 3 variants under a strict word-preservation rule.
Eight strategies, one switchable parameter
The kiosk admin's ?promptApproach=N URL parameter selects which strategy generates the prompt. Same scene description, same expansion, eight different composition philosophies. Approach 8 won because it stacks three independent constraints in one prompt.
| # | Strategy | Mental model |
|---|---|---|
| 1 | Control (no constraints) | Baseline — naive prompt for diffing against |
| 2 | EWS (establishing wide shot) + numeric crop band | "Subjects in middle distance, vertically centered" |
| 3 | Panoramic mural / Bayeux Tapestry | Frame the AI as painting a horizontal frieze |
| 4 | High-overhead drone (45°) | Ground plane fills 80%, subjects shrink naturally |
| 5 | Letterbox display | Tell model the format is letterboxed; content lives in the band |
| 6 | Tilt-shift diorama | "Detailed miniature… razor sharp across horizontal center" |
| 7 | Mini — small distant figures | Pure size language — "hundreds of yards away" |
| 8 ✓ | Combined (production winner) | Mini's size words + grouping + explicit 43%–57% band |
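The write-up names the `?promptApproach=N` URL parameter but not how it's parsed; the helper below is an illustrative sketch (the function name and fallback behavior are assumptions, not from the repo) of clamping the parameter to a valid approach id with the production winner as the default:

```typescript
// Hypothetical sketch — the real kiosk admin route reads this from the
// request URL. Only the ?promptApproach=N parameter name comes from the
// write-up; parseApproach and its fallback are illustrative.
type ApproachId = 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8;

function parseApproach(path: string, fallback: ApproachId = 8): ApproachId {
  // Base URL only so relative kiosk paths parse; the host is a placeholder.
  const raw = new URL(path, "http://kiosk.local").searchParams.get("promptApproach");
  const n = Number(raw);
  // Anything outside the 1–8 integer range falls back to the production winner.
  return Number.isInteger(n) && n >= 1 && n <= 8 ? (n as ApproachId) : fallback;
}
```

A missing or out-of-range parameter silently selects approach 8, so the kiosk always produces a usable prompt even if an admin link goes stale.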
Six things that make this real prompt engineering
Switchable harness, not opinion
Eight strategies hidden behind a single promptApproach parameter. Same scene × 8 × 3 expansions = 24 candidates per test. Selected the production strategy from real comparison data, not from guesses about what "should" work for an ultra-wide aspect.
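The 8 × 3 = 24 candidate matrix can be sketched as a simple cross product; the `Candidate` shape and `buildTestMatrix` helper below are illustrative assumptions, not code from the repo:

```typescript
// Hypothetical sketch of the A/B matrix the admin harness enumerates.
// The real StepGenerateTest component drives image generation; this only
// shows the cross product: 8 approaches × 3 expanded variants = 24 candidates.
interface Candidate {
  approach: number;     // 1..8 — which prompt strategy
  variantIndex: number; // 0..2 — which gpt-4o-mini expansion
  description: string;  // the expanded scene description
}

function buildTestMatrix(expandedVariants: string[]): Candidate[] {
  const candidates: Candidate[] = [];
  for (let approach = 1; approach <= 8; approach++) {
    expandedVariants.forEach((description, variantIndex) => {
      candidates.push({ approach, variantIndex, description });
    });
  }
  return candidates;
}
```

Rendering all 24 side by side is what turns strategy selection into a data question rather than a taste question.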
Stacked constraints beat any single technique
Approach 8 wins because it layers three independent biases: vague language ("tiny and distant, hundreds of yards away") to bias the latent space, explicit numerics ("between 43 percent and 57 percent of image height") to anchor placement, and complete-bodies language to keep the group cohesive.
Reusable constants — improvements propagate
SCENE_STYLE locks aesthetic across all 8 ("professional custom automotive airbrush illustration… not photographic, not cartoon, not anime"). IDENTITY_PRESERVATION lists what must survive (faces, hairstyles, decals, body proportions). NEGATIVE_TAIL is the negative-prompt guardrail.
Word-preservation rule on the expansion model
gpt-4o-mini sits in front of the image generator and expands one user line into 3 variants. Without an explicit rule, the model softens user intent ("neon green" → "bright green"). The system prompt is load-bearing: "copy their exact adjectives word-for-word."
Structured-output validation, not vibes
Expansion returns JSON with exactly 3 { title, description } objects. Zod-style runtime validation rejects malformed responses; the user can fall back to their raw description after 3 failed expansions. The model has one job and is held to it.
"Intended use" sentence in the negative-prompt tail
Every approach ends with: "…intended use: a custom automotive LED light bar insert — high-impact 51.4-inch panoramic display." This surprisingly nudges the model toward more dramatic, display-aware compositions. Cheap to include, with measurable lift across all 8 approaches.
Two-stage prompt pipeline
User input → gpt-4o-mini expansion (3 × { title, description }) → APPROACHES[1..8], switchable via ?promptApproach=N → final prompt: SCENE_STYLE + IDENTITY_PRESERVATION + approach-specific composition + NEGATIVE_TAIL.
Three snippets, in execution order
Real excerpts from app/api/generate-insert/route.ts (the 8 approaches), lib/promptConstants.ts (the reusable constants), and app/api/expand-prompt/route.ts (the word-preservation expansion). Reading order: catalog the strategies → see the shared building blocks → see the rule that protects user intent.
Eight strategies under one switchable parameter
Each approach is a function that composes the four shared blocks (SCENE_STYLE, IDENTITY_PRESERVATION, approach-specific composition, NEGATIVE_TAIL) into a single image-model prompt. A ?promptApproach=N URL parameter selects which one runs. Approach 8 — the production winner — stacks three independent biases in one prompt; that's the source of its lift over single-technique approaches.
// app/api/generate-insert/route.ts
import { SCENE_STYLE, IDENTITY_PRESERVATION, NEGATIVE_TAIL } from "@/lib/promptConstants";
type ApproachId = 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8;
const APPROACHES: Record<ApproachId, (subject: string, scene: string) => string> = {
// 1 — Control. Naive prompt; baseline for diffing against.
1: (subject, scene) =>
`${subject} in ${scene}. ${SCENE_STYLE} ${IDENTITY_PRESERVATION} ${NEGATIVE_TAIL}`,
// 2 — EWS + numeric crop constraint. "Establishing wide shot, vertically centered."
2: (subject, scene) =>
`Establishing wide shot of ${subject} in ${scene}. Subjects positioned in the middle
distance, vertically centered. ${SCENE_STYLE} ${IDENTITY_PRESERVATION} ${NEGATIVE_TAIL}`,
// 3 — Panoramic mural / Bayeux Tapestry. Reframes the model's task as "paint a frieze."
3: (subject, scene) =>
`A panoramic horizontal mural in the style of a Bayeux Tapestry register: ${subject}
within ${scene}, rendered as a continuous narrative band across the frame.
${SCENE_STYLE} ${IDENTITY_PRESERVATION} ${NEGATIVE_TAIL}`,
// 4 — High-overhead drone (45°). Ground plane fills 80%, subjects shrink naturally.
4: (subject, scene) =>
`High-overhead drone view (45-degree angle) of ${subject} in ${scene}. Ground plane
fills approximately 80 percent of the frame; subjects appear small from this elevation.
${SCENE_STYLE} ${IDENTITY_PRESERVATION} ${NEGATIVE_TAIL}`,
// 5 — Letterbox display. "Format is letterboxed and content lives in the band."
5: (subject, scene) =>
`A letterbox display showing ${subject} in ${scene}. The format is letterboxed and all
subject content lives within the central horizontal band. ${SCENE_STYLE}
${IDENTITY_PRESERVATION} ${NEGATIVE_TAIL}`,
// 6 — Tilt-shift diorama. Miniature world, razor sharp across the horizontal center.
6: (subject, scene) =>
`A detailed miniature tilt-shift diorama: ${subject} arranged within ${scene}, razor
sharp across the horizontal center, soft falloff above and below. ${SCENE_STYLE}
${IDENTITY_PRESERVATION} ${NEGATIVE_TAIL}`,
// 7 — Mini / distant figures. Pure size language: "hundreds of yards away."
7: (subject, scene) =>
`${subject} appearing tiny and distant within ${scene}, like seeing them from hundreds
of yards away. ${SCENE_STYLE} ${IDENTITY_PRESERVATION} ${NEGATIVE_TAIL}`,
// 8 — PRODUCTION WINNER. Stacks three independent biases:
// - vague size language ("tiny and distant, hundreds of yards away")
// - explicit numeric crop constraint ("43%–57% of image height")
// - complete-bodies-grouping language (head to feet, all together at center)
8: (subject, scene) =>
`Everything from the reference photo appears tiny and distant, far away in the
landscape, like seeing it from hundreds of yards away. ${subject} are all close
beside each other at the center of the frame, complete head to feet, roof to tires.
All important content fits between 43 percent and 57 percent of the image height.
The scene is ${scene}. ${SCENE_STYLE} ${IDENTITY_PRESERVATION} ${NEGATIVE_TAIL}`,
};
export function buildPrompt(
variant: { subject: string; scene: string },
approach: ApproachId = 8,
): string {
return APPROACHES[approach](variant.subject, variant.scene)
.replace(/\s+/g, " ")
.trim();
}

Improvements propagate across all 8 strategies
Each approach composes the same four blocks. SCENE_STYLE locks the aesthetic. IDENTITY_PRESERVATION is the explicit list of what must survive intact. NEGATIVE_TAIL is the guardrail. Editing any one of these improves all 8 approaches simultaneously — that's the whole point of the structure.
// lib/promptConstants.ts
/**
* Aesthetic lock — the same across all 8 approaches. Prevents the model
* from drifting into photographic realism, cartoon, or digital painting
* styles that don't match the airbrushed automotive product.
*/
export const SCENE_STYLE = `
Professional custom automotive airbrush illustration. Vibrant high-saturation colors,
visible brush technique, soft glow on highlights, subtle gradient backgrounds.
NOT photographic realism. NOT cartoon. NOT anime. NOT digital painting. NOT 3D render.
Render style: hand-painted automotive mural with airbrushed depth.
`.trim();
/**
* Explicit list of what MUST survive into the generated image. Without this,
* the model "improves" subjects by reshaping faces, dropping decals, or
* changing brand markings — all of which break the use case (a customer's
* own vehicle on their own light bar).
*/
export const IDENTITY_PRESERVATION = `
Preserve all of the following from the reference photo intact:
faces, hairstyles, fur and markings on pets, clothing, accessories,
body proportions, poses, expressions, vehicle make and model,
visible logos and brand marks, decals, text, paint colors and finishes.
Cast realistic shadows consistent with the scene's lighting direction.
`.trim();
/**
* Negative-prompt tail. The "intended use" sentence at the end was a
* surprise win — measurably nudges the model toward more dramatic,
* display-aware compositions across every approach.
*/
export const NEGATIVE_TAIL = `
Do not add any text, words, watermarks, borders, logos, or signatures.
No frames, no captions, no UI overlays.
Intended use: a custom automotive LED light bar insert —
high-impact 51.4-inch panoramic display.
`.trim();

Word-preservation rule + structural validation
Without an explicit rule, gpt-4o-mini softens user input ("upside down pineapples on fire" → "unique pineapple themed scene"). That drift cascades — the image model then misses what the customer actually asked for. The system prompt is load-bearing: it forces the model to copy adjectives word-for-word into all 3 variants. Then a Zod schema rejects malformed responses; after 3 retries the user can bypass and use their raw description.
// app/api/expand-prompt/route.ts
import OpenAI from "openai";
import { z } from "zod";
const ExpansionSchema = z.object({
variants: z
.array(
z.object({
title: z.string().min(2).max(80),
description: z.string().min(10).max(280),
}),
)
.length(3),
});
const SYSTEM_PROMPT = `You expand a customer's one-line scene description into
exactly 3 fuller scene options for an AI image generator.
CRITICAL — WORD PRESERVATION RULE:
Preserve EVERY word the customer used. Copy their exact adjectives, modifiers,
and descriptors into ALL 3 options word-for-word. If they said
"upside down pineapples", write "upside down pineapples" — NOT just "pineapples"
and NOT "unusual pineapple-themed scene." Do not soften, generalize, or
"improve" their language. Their words are load-bearing.
Each option should:
- Add atmosphere, environment, and mood AROUND the customer's exact words
- Stay 1–2 sentences
- Be visually concrete (lighting, surroundings, time of day)
- Never invent subjects the customer didn't mention
Return JSON: { "variants": [ {"title", "description"}, ... ] } with exactly 3 entries.`;
export async function POST(req: Request) {
const { rawDescription } = await req.json();
const openai = new OpenAI();
const MAX_ATTEMPTS = 3;
let lastError: unknown = null;
for (let attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
try {
const completion = await openai.chat.completions.create({
model: "gpt-4o-mini",
temperature: 1.0,
response_format: { type: "json_object" },
messages: [
{ role: "system", content: SYSTEM_PROMPT },
{ role: "user", content: rawDescription },
],
});
const json = JSON.parse(completion.choices[0].message.content ?? "{}");
const parsed = ExpansionSchema.safeParse(json);
if (!parsed.success) {
lastError = parsed.error;
continue;
}
// Bonus check: every variant must include each significant word from the
// customer's input. Catches the model's most common cheat — paraphrasing
// adjectives into synonyms.
const significantWords = extractSignificant(rawDescription);
const allPreserved = parsed.data.variants.every((v) =>
significantWords.every((w) => v.description.toLowerCase().includes(w)),
);
if (!allPreserved) {
lastError = new Error("word-preservation check failed");
continue;
}
return Response.json({ variants: parsed.data.variants, attempts: attempt });
} catch (err) {
lastError = err;
}
}
// Graceful degrade — after 3 failures the kiosk UI offers the user
// a "skip expansion, use my exact words" path.
return Response.json(
{ error: "expansion_failed", canUseRaw: true, lastError: String(lastError) },
{ status: 502 },
);
}
function extractSignificant(input: string): string[] {
// Strip stop words; lowercase. Keeps "upside", "down", "pineapples", "fire".
const STOP = new Set(["a", "an", "the", "of", "in", "on", "at", "and", "or", "to", "with"]);
return input
.toLowerCase()
.replace(/[^a-z0-9\s'-]/g, "")
.split(/\s+/)
.filter((w) => w.length > 2 && !STOP.has(w));
}

Source
Excerpts from app/api/generate-insert/route.ts, lib/promptConstants.ts, and app/api/expand-prompt/route.ts in the kiosk configurator. The full setup also includes a kiosk admin route (app/design/steps/StepGenerateTest.tsx) that runs all 8 approaches × 3 variants = 24 images side-by-side for live A/B comparison — that's how Approach 8 was selected from real data, not a guess. Pairs with the AI Pipeline case study (variance scoring, 7-level fallback, opentype.js text burn-in). Happy to walk through the harness on a call.