Zen Agent: Generation, Evaluation, and Feedback Loop Documentation
Overview
The Zen Agent system implements an iterative refinement loop built on three core agents that work together to generate, evaluate, and improve content. This document explains how these agents interact and how the loop is orchestrated.
Core Agents
1. ArtifactAgent (src/lib/server/zen-agent/agents/artifactAgent.ts)
The ArtifactAgent is responsible for generating or refining content based on:
- Current artifact state
- Feedback from previous iterations
- Configuration context and criteria
Key Function:
createArtifact(
artifact: string, // Current artifact (or "No {type} yet." for first iteration)
feedback: string, // Feedback to incorporate
config: AgentConfig // Configuration with context, criteria, schema
): Promise<string>
Process:
- Takes the current artifact and feedback as input
- Generates a prompt incorporating the feedback and ranking criteria
- Either generates with schema validation or raw text generation
- Returns the new/refined artifact
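The exact prompt wording lives in artifactAgent.ts. As a rough sketch of the shape of this function, assuming hypothetical generateText/generateObject LLM helpers (these names are illustrative, not the actual helpers used in the codebase):
// Sketch only: generateText and generateObject are assumed helper names,
// not the actual functions used in artifactAgent.ts.
export async function createArtifact(
  artifact: string,
  feedback: string,
  config: AgentConfig
): Promise<string> {
  const prompt = [
    `You are improving a ${config.artifactType} for a ${config.contextType}.`,
    `Context:\n${config.context}`,
    `Current ${config.artifactType}:\n${artifact}`,
    `Feedback to incorporate:\n${feedback}`,
    `Ranking criteria:\n${config.rankingCriteria}`
  ].join('\n\n');

  // Schema-constrained generation when a schema is configured, raw text otherwise
  if (config.schema) {
    const object = await generateObject(prompt, config.schema);
    return JSON.stringify(object);
  }
  return generateText(prompt);
}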
2. EvaluationAgent (src/lib/server/zen-agent/agents/evaluationAgent.ts)
The EvaluationAgent compares two artifacts and determines which is better:
Key Function:
evaluateArtifacts(
firstArtifact: string, // Original/old artifact
secondArtifact: string, // New/refined artifact
config: AgentConfig // Configuration with evaluation criteria
): Promise<string> // Returns JSON with result: "First", "Second", or "Tie"
Evaluation Schema:
{
result: "First" | "Second" | "Tie",
explanation: string
}
Process:
- Compares two artifacts based on ranking criteria
- Uses structured output (JSON schema) for consistent evaluation
- Returns which artifact is better with explanation
- Includes parseResult() helper for models without JSON support
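A minimal sketch of how this comparison could be implemented, again assuming a hypothetical generateObject structured-output helper (the schema shape matches the evaluation schema above; everything else is illustrative):
// Sketch only: generateObject is an assumed structured-output helper.
const evaluationSchema = {
  type: 'object',
  properties: {
    result: { type: 'string', enum: ['First', 'Second', 'Tie'] },
    explanation: { type: 'string' }
  },
  required: ['result', 'explanation']
};

export async function evaluateArtifacts(
  firstArtifact: string,
  secondArtifact: string,
  config: AgentConfig
): Promise<string> {
  const prompt = [
    `Compare two versions of a ${config.artifactType} against these criteria:`,
    config.rankingCriteria,
    `First:\n${firstArtifact}`,
    `Second:\n${secondArtifact}`,
    `Answer which version is better: "First", "Second", or "Tie".`
  ].join('\n\n');

  const evaluation = await generateObject(prompt, evaluationSchema);
  return JSON.stringify(evaluation); // Caller parses { result, explanation }
}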
3. FeedbackAgent (src/lib/server/zen-agent/agents/feedbackAgent.ts)
The FeedbackAgent provides constructive criticism to guide improvements:
Key Function:
createFeedbackPrompt(
artifact: string, // Current artifact to critique
config: AgentConfig // Configuration with evaluation criteria
): Promise<string> // Returns detailed feedback
Process:
- Analyzes the current artifact against ranking criteria
- Provides demanding, precise, constructive criticism
- Focuses on areas for improvement
- Returns actionable feedback for the next iteration
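A sketch of what this function might look like, assuming the same hypothetical generateText helper as above (prompt wording is illustrative):
// Sketch only: generateText is an assumed text-generation helper.
export async function createFeedbackPrompt(
  artifact: string,
  config: AgentConfig
): Promise<string> {
  const prompt = [
    `You are a demanding, precise reviewer of a ${config.artifactType}.`,
    `Critique the artifact below against these criteria:`,
    config.rankingCriteria,
    `Artifact:\n${artifact}`,
    `Return constructive, actionable feedback focused on concrete improvements.`
  ].join('\n\n');

  return generateText(prompt);
}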
The Generation-Evaluation-Feedback Loop
Basic Loop Implementation (src/routes/api/(protected)/zen-agent/jobs/+server.ts)
The main job execution endpoint implements the core loop:
// Simplified loop structure
for (let i = 1; i <= iterations; i++) {
// 1. GENERATION: Create new artifact incorporating feedback
const newArtifact = await createArtifact(artifact, feedback, config);
// 2. EVALUATION: Compare old vs new artifact
const evaluation = await evaluateArtifacts(artifact, newArtifact, config);
const result = JSON.parse(evaluation).result;
// 3. SELECTION: Keep better artifact
if (result === 'Second') {
artifact = newArtifact; // New is better, replace
}
// If 'First' or 'Tie', keep current artifact
// 4. FEEDBACK: Generate feedback for next iteration
feedback = await createFeedback(artifact, config);
}
Key Loop Characteristics
- Iterative Refinement: Each iteration builds on the previous result
- Competitive Selection: Only improvements are kept (survival of the fittest)
- Guided Evolution: Feedback directs the next generation attempt
- Configurable Iterations: Typically 2-10 iterations depending on use case
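Before the first pass, the loop inputs are seeded so createArtifact can run without a previous result. A minimal sketch of that setup (the exact initial values live in jobs/+server.ts; the placeholder string follows the convention noted for createArtifact above, and config stands for the job's AgentConfig):
// Sketch of first-iteration inputs; config is the AgentConfig passed to the job.
let artifact = `No ${config.artifactType} yet.`; // Placeholder artifact for iteration 1
let feedback = '';                               // No feedback exists before the first pass
const iterations = 3;                            // 2-10 is the typical range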
Production Implementation
The production system uses the implementation in /src/routes/api/(protected)/zen-agent/jobs/+server.ts. This is the only orchestration pattern actively used in the codebase.
Core Production Loop
// From jobs/+server.ts - The actual implementation used in production
for (let i = 1; i <= iterations; i++) {
// 1. GENERATION: Create new artifact
const newArtifact = await createArtifact(artifact, feedback, config);
// 2. EVALUATION: Compare artifacts
const evaluation = await evaluateArtifacts(artifact, newArtifact, config);
// 3. PARSING: Handle different model response formats
let result = '';
if (modelHasNoJSONSupport()) {
result = await parseResult(evaluation);
} else {
const cleanedEvaluation = stripMarkdownCodeBlock(evaluation);
const parsedEval = JSON.parse(cleanedEvaluation);
result = parsedEval.result;
}
// 4. SELECTION: Keep better artifact
if (result === 'Second') {
artifact = newArtifact;
}
// 5. FEEDBACK: Generate for next iteration
feedback = await createFeedback(artifact, config);
}
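The stripMarkdownCodeBlock helper exists because some models wrap JSON answers in markdown code fences even when asked for raw JSON. Its actual implementation lives in the codebase; a minimal sketch of what such a helper could do:
// Sketch: remove a surrounding ```json ... ``` fence if present, otherwise return the text as-is.
function stripMarkdownCodeBlock(text: string): string {
  const match = text.trim().match(/^```(?:json)?\s*([\s\S]*?)\s*```$/);
  return match ? match[1] : text.trim();
}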
Special Case: Single Iteration
When iterations = 1, the system skips evaluation and feedback:
if (iterations === 1) {
// Direct generation without refinement loop
const newArtifact = await createArtifact(artifact, feedback, config);
// Save and return immediately
await pool.query(`UPDATE ${tableName} SET ${columnName} = $1 WHERE id = $2`, [
{ status: 'completed', result: newArtifact },
recordId
]);
return newArtifact;
}
Experimental Patterns (Not Used in Production)
The codebase contains several experimental orchestration patterns that are not currently integrated into the production system. These are located in /src/lib/server/zen-agent/systems/ and include:
- zenAgents.ts: Simple sequential loop for testing
- evolution/: Population-based evolutionary algorithms
- tasks/subtasks.ts: Nested refinement for multi-part tasks
- proposal/: Various proposal-specific refinement strategies
These files appear to be development/testing implementations and are not imported or used by any production routes.
Flow Diagrams
Production Loop Flow Diagram
flowchart TD
Start([Start]) --> Check{iterations == 1?}
Check -->|Yes| SingleGen[Generate artifact without refinement]
SingleGen --> SaveSingle[Save to database]
SaveSingle --> EndSingle([Return artifact])
Check -->|No| Init[Initialize: artifact='', feedback='']
Init --> Loop{i <= iterations?}
Loop -->|Yes| UpdateStatus1[Update DB: 'Creating analysis']
UpdateStatus1 --> Gen[1. GENERATE: createArtifact]
Gen --> UpdateStatus2[Update DB: 'Evaluating results']
UpdateStatus2 --> Eval[2. EVALUATE: evaluateArtifacts]
Eval --> Parse{Model has JSON support?}
Parse -->|No| ParseFallback["parseResult()"]
Parse -->|Yes| StripMarkdown[Strip markdown blocks]
StripMarkdown --> ParseJSON["JSON.parse()"]
ParseFallback --> Decision{Result?}
ParseJSON --> Decision
Decision -->|Second is better| Update[artifact = newArtifact]
Decision -->|First is better| Keep[Keep current artifact]
Decision -->|Tie| Keep
Update --> UpdateStatus3[Update DB: 'Creating feedback']
Keep --> UpdateStatus3
UpdateStatus3 --> Feedback[3. FEEDBACK: createFeedback]
Feedback --> SaveIteration{Final iteration?}
SaveIteration -->|Yes| SaveFinal[Save to database]
SaveFinal --> End([Return final artifact])
SaveIteration -->|No| Increment[i++]
Increment --> Loop
Loop -->|No| End
Error Handling Flow
flowchart TD
Start([Try block]) --> Operation[Agent Operation]
Operation --> Success{Success?}
Success -->|Yes| Continue([Continue loop])
Success -->|No| ErrorCheck{Error type?}
ErrorCheck -->|Quota/429| Wait[Wait 60 seconds]
Wait --> UpdateDB[Update DB: 'Waiting for API']
UpdateDB --> Retry{Retry < MAX?}
ErrorCheck -->|Parse Error| RetryParse{Retry < MAX?}
ErrorCheck -->|Other| Log[Log error]
Log --> RetryOther{Retry < MAX?}
Retry -->|Yes| Operation
Retry -->|No| Fail[Mark as failed]
RetryParse -->|Yes| Operation
RetryParse -->|No| DefaultFirst[Use 'First' as default]
RetryOther -->|Yes| Operation
RetryOther -->|No| Fail
DefaultFirst --> Continue
Fail --> End([Return error])
Configuration Structure
Each agent operates based on an AgentConfig:
interface AgentConfig {
// Context
contextType: string; // What is being analyzed (e.g., "RFP", "Project")
context: string; // The actual content to analyze
// Artifact definition
artifactType: string; // What to generate (e.g., "Summary", "Analysis")
artifactTypeFileName: string; // For file naming
// Evaluation criteria
rankingCriteria: string; // Criteria for evaluation and improvement
// Optional
schema?: any; // JSON schema for structured output
matchingData?: string; // Additional context (e.g., company skills)
opportunityColumnName?: string; // Database column for results
}
Error Handling and Retries
The production implementation includes robust error handling:
// From jobs/+server.ts
while (!iterationSuccess && retryCount < MAX_RETRIES) {
try {
// Generation-Evaluation-Feedback cycle
const newArtifact = await createArtifact(artifact, feedback, config);
const evaluation = await evaluateArtifacts(artifact, newArtifact, config);
// ... process results ...
iterationSuccess = true;
} catch (error) {
// Handle quota/rate-limit errors by pausing before retrying
if (isQuotaError(error)) {
await wait(60000); // Wait 60 seconds
continue;
}
// Other errors may trigger retry
retryCount++;
}
}
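isQuotaError and wait are small helpers referenced in the snippet above. Their names come from the snippet; the implementations below are plausible sketches only, and the exact error checks depend on the provider SDK in use:
// Sketch: detect HTTP 429 / quota-style failures; the real check may inspect the SDK's error type.
function isQuotaError(error: unknown): boolean {
  const message = error instanceof Error ? error.message : String(error);
  return message.includes('429') || /quota|rate limit/i.test(message);
}

// Sketch: promise-based sleep used between retries.
function wait(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}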
Best Practices
1. Iteration Count
- 1 iteration: Simple generation without refinement
- 2-3 iterations: Standard refinement for most use cases
- 5-10 iterations: Deep refinement for critical content
- Population-based (experimental evolution/ pattern only): 4-10 individuals, 5-10 generations
2. Feedback Quality
- Be specific and constructive in ranking criteria
- Focus on measurable improvements
- Avoid generic or vague feedback prompts
3. Evaluation Consistency
- Use structured output (JSON schema) when possible
- Implement parseResult() fallback for non-JSON models (see the sketch after this list)
- Consider multiple evaluation perspectives
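parseResult() is the fallback mentioned above for models without reliable JSON output. The real helper lives in evaluationAgent.ts (and, since it is awaited in the production loop, it may itself call a model); the sketch below only shows the simplest keyword-scanning approach:
// Sketch: scan free-form evaluation text for a verdict keyword; default to 'First'
// (keep the current artifact) when no verdict can be recovered, mirroring the
// 'First' default in the error-handling flow.
async function parseResult(evaluation: string): Promise<string> {
  const match = evaluation.match(/\b(First|Second|Tie)\b/i);
  if (!match) return 'First';
  return match[1].charAt(0).toUpperCase() + match[1].slice(1).toLowerCase();
}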
4. Performance Optimization
- Use parallel processing for population-based approaches
- Cache intermediate results for debugging
- Implement proper error handling and retries
5. Configuration Design
// Good configuration example
const config: AgentConfig = {
contextType: 'Tender Document',
context: tenderContent,
artifactType: 'Technical Requirements Analysis',
artifactTypeFileName: 'tech-requirements',
rankingCriteria: `
Evaluate based on:
1. Completeness: All requirements identified
2. Clarity: Clear, unambiguous descriptions
3. Structure: Logical organization
4. Actionability: Requirements are implementable
`,
schema: technicalRequirementsSchema
};
Extension Points
Adding New Agent Types
- Create a configuration in /src/lib/server/zen-agent/configs/
- Define the AgentType enum value
- Add a case in the getConfig() function
- Implement any special processing logic
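As an illustration of the third step, a hedged sketch of wiring a new agent type into getConfig(). The enum members, field values, and import path below are assumptions for the example, not the actual contents of /src/lib/server/zen-agent/configs/:
// Sketch: adding a hypothetical RiskAssessment agent type.
import type { AgentConfig } from '../types'; // path assumed

export enum AgentType {
  TechnicalRequirements = 'technicalRequirements',
  RiskAssessment = 'riskAssessment' // Hypothetical new agent type
}

export function getConfig(type: AgentType, context: string): AgentConfig {
  switch (type) {
    case AgentType.RiskAssessment:
      return {
        contextType: 'Tender Document',
        context,
        artifactType: 'Risk Assessment',
        artifactTypeFileName: 'risk-assessment',
        rankingCriteria: 'Completeness, likelihood/impact ratings, mitigation quality'
      };
    default:
      throw new Error(`Unknown agent type: ${type}`);
  }
}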
Custom Orchestration Patterns
- Create a new orchestration file in /systems/
- Import the three core agents
- Implement custom loop logic
- Export for use in API endpoints
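A minimal sketch of such a custom orchestration, following the same generate-evaluate-feedback cycle as the production loop but without database status updates. The agent import paths and the createFeedbackPrompt name follow the agent sections above (the production snippets refer to this call as createFeedback); the AgentConfig import path and the refine function itself are illustrative assumptions:
// Sketch of a custom orchestration file under /src/lib/server/zen-agent/systems/.
import { createArtifact } from '../agents/artifactAgent';
import { evaluateArtifacts } from '../agents/evaluationAgent';
import { createFeedbackPrompt } from '../agents/feedbackAgent';
import type { AgentConfig } from '../types'; // path assumed

export async function refine(config: AgentConfig, iterations: number): Promise<string> {
  let artifact = `No ${config.artifactType} yet.`;
  let feedback = '';

  for (let i = 1; i <= iterations; i++) {
    const candidate = await createArtifact(artifact, feedback, config);
    const evaluation = await evaluateArtifacts(artifact, candidate, config);
    if (JSON.parse(evaluation).result === 'Second') {
      artifact = candidate; // Keep the candidate only when it wins the comparison
    }
    feedback = await createFeedbackPrompt(artifact, config);
  }
  return artifact;
}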
Integration with Other Systems
The loop can be integrated with:
- Database persistence (as shown in jobs endpoint)
- Real-time status updates
- Webhook notifications
- Batch processing systems
Monitoring and Debugging
Logging Points
Each agent logs key events:
// ArtifactAgent
await logger.log('Creating artifact', 'artifactAgent', { artifactType });
await logger.log('Generated artifact', 'artifactAgent', { length });
// EvaluationAgent
await logger.log('Evaluating artifacts', 'evaluationAgent', { artifactType });
await logger.log('Generated evaluation', 'evaluationAgent', { result });
// FeedbackAgent
await logger.log('Creating feedback', 'feedbackAgent', { artifactType });
await logger.log('Generated feedback', 'feedbackAgent', { length });
File Output for Debugging
The system writes intermediate results for analysis:
writeToMarkdown(timestampedDir, `artifact-${i}`, artifact);
writeToMarkdown(timestampedDir, `evaluation-${i}`, evaluation);
writeToMarkdown(timestampedDir, `feedback-${i}`, feedback);
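writeToMarkdown is a small file helper; a plausible sketch is shown below (the real helper may differ in naming and error handling):
// Sketch: persist an intermediate result as a markdown file for later inspection.
import { mkdirSync, writeFileSync } from 'node:fs';
import { join } from 'node:path';

function writeToMarkdown(dir: string, name: string, content: string): void {
  mkdirSync(dir, { recursive: true });
  writeFileSync(join(dir, `${name}.md`), content, 'utf-8');
}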
Summary
The Gen-Eval-Feedback loop is a powerful pattern for iterative content improvement. The three agents work together in a cycle:
- Generate: Create or refine content based on feedback
- Evaluate: Compare and select the better version
- Feedback: Provide guidance for the next iteration
This pattern drives continuous improvement while preventing regression, since a new artifact replaces the current one only when the evaluator judges it better. Beyond the production loop, the codebase also contains other orchestration patterns, from simple sequential processing to population-based evolutionary algorithms, though these remain experimental.
The modular design allows for easy extension and customization while maintaining the core improvement loop that drives quality enhancement across all content types in the Zen Agent system.