Zen Agent: Generation, Evaluation, and Feedback Loop Documentation

Overview

The Zen Agent system implements an iterative refinement loop in which three core agents work together to generate, evaluate, and improve content over successive iterations. This document explains how the agents interact and how the loop is orchestrated.

Core Agents

1. ArtifactAgent (src/lib/server/zen-agent/agents/artifactAgent.ts)

The ArtifactAgent is responsible for generating or refining content based on:

  • Current artifact state
  • Feedback from previous iterations
  • Configuration context and criteria

Key Function:

typescript
createArtifact(
  artifact: string,      // Current artifact (or "No {type} yet." for first iteration)
  feedback: string,      // Feedback to incorporate
  config: AgentConfig    // Configuration with context, criteria, schema
): Promise<string>

Process:

  1. Takes the current artifact and feedback as input
  2. Generates a prompt incorporating the feedback and ranking criteria
  3. Either generates with schema validation or raw text generation
  4. Returns the new/refined artifact
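
For example, a first-iteration call might look like the following (a minimal sketch based on the signature above; the argument values are illustrative, not taken from the codebase):

typescript
// Illustrative first-iteration call: no prior artifact or feedback exists yet
const artifact = await createArtifact(
	'No Summary yet.', // placeholder used when nothing has been generated
	'',                // no feedback on the first pass
	config             // AgentConfig with context, criteria, and optional schema
);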

2. EvaluationAgent (src/lib/server/zen-agent/agents/evaluationAgent.ts)

The EvaluationAgent compares two artifacts and determines which is better:

Key Function:

typescript
evaluateArtifacts(
  firstArtifact: string,   // Original/old artifact
  secondArtifact: string,  // New/refined artifact
  config: AgentConfig      // Configuration with evaluation criteria
): Promise<string>        // Returns JSON with result: "First", "Second", or "Tie"

Evaluation Schema:

typescript
{
  result: "First" | "Second" | "Tie",
  explanation: string
}

Process:

  1. Compares two artifacts based on ranking criteria
  2. Uses structured output (JSON schema) for consistent evaluation
  3. Returns which artifact is better with explanation
  4. Includes parseResult() helper for models without JSON support
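
A typical caller parses the returned JSON and branches on the result field, roughly as follows (sketch; assumes the model returned valid JSON):

typescript
// Sketch: interpret the evaluation verdict (assumes valid JSON output)
const evaluation = await evaluateArtifacts(oldArtifact, newArtifact, config);
const { result, explanation } = JSON.parse(evaluation);
console.log(explanation); // why the evaluator preferred one artifact
// 'Second' means the new/refined artifact won the comparison
const winner = result === 'Second' ? newArtifact : oldArtifact;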

3. FeedbackAgent (src/lib/server/zen-agent/agents/feedbackAgent.ts)

The FeedbackAgent provides constructive criticism to guide improvements:

Key Function:

typescript
createFeedbackPrompt(
  artifact: string,       // Current artifact to critique
  config: AgentConfig     // Configuration with evaluation criteria
): Promise<string>        // Returns detailed feedback

Process:

  1. Analyzes the current artifact against ranking criteria
  2. Provides demanding, precise, constructive criticism
  3. Focuses on areas for improvement
  4. Returns actionable feedback for the next iteration
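
In the loop, the feedback returned here becomes the feedback argument of the next generation call (minimal sketch; the production loop invokes this step as createFeedback):

typescript
// Sketch: feedback from iteration i guides generation in iteration i + 1
const feedback = await createFeedback(artifact, config);
const refinedArtifact = await createArtifact(artifact, feedback, config);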

The Generation-Evaluation-Feedback Loop

Basic Loop Implementation (src/routes/api/(protected)/zen-agent/jobs/+server.ts)

The main job execution endpoint implements the core loop:

typescript
// Simplified loop structure
for (let i = 1; i <= iterations; i++) {
	// 1. GENERATION: Create new artifact incorporating feedback
	const newArtifact = await createArtifact(artifact, feedback, config);

	// 2. EVALUATION: Compare old vs new artifact
	const evaluation = await evaluateArtifacts(artifact, newArtifact, config);
	const result = JSON.parse(evaluation).result;

	// 3. SELECTION: Keep better artifact
	if (result === 'Second') {
		artifact = newArtifact; // New is better, replace
	}
	// If 'First' or 'Tie', keep current artifact

	// 4. FEEDBACK: Generate feedback for next iteration
	feedback = await createFeedback(artifact, config);
}

Key Loop Characteristics

  1. Iterative Refinement: Each iteration builds on the previous result
  2. Competitive Selection: Only improvements are kept (survival of the fittest)
  3. Guided Evolution: Feedback directs the next generation attempt
  4. Configurable Iterations: Typically 2-10 iterations depending on use case

Production Implementation

The production system uses the implementation in /src/routes/api/(protected)/zen-agent/jobs/+server.ts. This is the only orchestration pattern actively used in the codebase.

Core Production Loop

typescript
// From jobs/+server.ts - The actual implementation used in production
for (let i = 1; i <= iterations; i++) {
	// 1. GENERATION: Create new artifact
	const newArtifact = await createArtifact(artifact, feedback, config);

	// 2. EVALUATION: Compare artifacts
	const evaluation = await evaluateArtifacts(artifact, newArtifact, config);

	// 3. PARSING: Handle different model response formats
	let result = '';
	if (modelHasNoJSONSupport()) {
		result = await parseResult(evaluation);
	} else {
		const cleanedEvaluation = stripMarkdownCodeBlock(evaluation);
		const parsedEval = JSON.parse(cleanedEvaluation);
		result = parsedEval.result;
	}

	// 4. SELECTION: Keep better artifact
	if (result === 'Second') {
		artifact = newArtifact;
	}

	// 5. FEEDBACK: Generate for next iteration
	feedback = await createFeedback(artifact, config);
}
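
The stripMarkdownCodeBlock helper is not reproduced in this document; a plausible minimal implementation, assuming it only removes a surrounding Markdown code fence (e.g. a ```json ... ``` wrapper), could look like this:

typescript
// Assumed behavior: strip a surrounding ```json ... ``` fence if one is present
function stripMarkdownCodeBlock(text: string): string {
	const match = text.trim().match(/^```(?:json)?\s*([\s\S]*?)\s*```$/);
	return match ? match[1] : text.trim();
}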

Special Case: Single Iteration

When iterations = 1, the system skips evaluation and feedback:

typescript
if (iterations === 1) {
	// Direct generation without refinement loop
	const newArtifact = await createArtifact(artifact, feedback, config);

	// Save and return immediately
	await pool.query(`UPDATE ${tableName} SET ${columnName} = $1 WHERE id = $2`, [
		{ status: 'completed', result: newArtifact },
		recordId
	]);

	return newArtifact;
}

Experimental Patterns (Not Used in Production)

The codebase contains several experimental orchestration patterns that are not currently integrated into the production system. These are located in /src/lib/server/zen-agent/systems/ and include:

  • zenAgents.ts: Simple sequential loop for testing
  • evolution/: Population-based evolutionary algorithms
  • tasks/subtasks.ts: Nested refinement for multi-part tasks
  • proposal/: Various proposal-specific refinement strategies

These files appear to be development/testing implementations and are not imported or used by any production routes.

Flow Diagrams

Production Loop Flow Diagram

mermaid
flowchart TD
    Start([Start]) --> Check{iterations == 1?}

    Check -->|Yes| SingleGen[Generate artifact without refinement]
    SingleGen --> SaveSingle[Save to database]
    SaveSingle --> EndSingle([Return artifact])

    Check -->|No| Init[Initialize: artifact='', feedback='']
    Init --> Loop{i <= iterations?}

    Loop -->|Yes| UpdateStatus1[Update DB: 'Creating analysis']
    UpdateStatus1 --> Gen[1. GENERATE: createArtifact]
    Gen --> UpdateStatus2[Update DB: 'Evaluating results']
    UpdateStatus2 --> Eval[2. EVALUATE: evaluateArtifacts]

    Eval --> Parse{Model has JSON support?}
    Parse -->|No| ParseFallback["parseResult()"]
    Parse -->|Yes| StripMarkdown[Strip markdown blocks]
    StripMarkdown --> ParseJSON["JSON.parse()"]

    ParseFallback --> Decision{Result?}
    ParseJSON --> Decision

    Decision -->|Second is better| Update[artifact = newArtifact]
    Decision -->|First is better| Keep[Keep current artifact]
    Decision -->|Tie| Keep

    Update --> UpdateStatus3[Update DB: 'Creating feedback']
    Keep --> UpdateStatus3
    UpdateStatus3 --> Feedback[3. FEEDBACK: createFeedback]

    Feedback --> SaveIteration{Final iteration?}
    SaveIteration -->|Yes| SaveFinal[Save to database]
    SaveFinal --> End([Return final artifact])

    SaveIteration -->|No| Increment[i++]
    Increment --> Loop

    Loop -->|No| End

Error Handling Flow

mermaid
flowchart TD
    Start([Try block]) --> Operation[Agent Operation]
    Operation --> Success{Success?}

    Success -->|Yes| Continue([Continue loop])
    Success -->|No| ErrorCheck{Error type?}

    ErrorCheck -->|Quota/429| Wait[Wait 60 seconds]
    Wait --> UpdateDB[Update DB: 'Waiting for API']
    UpdateDB --> Retry{Retry < MAX?}

    ErrorCheck -->|Parse Error| RetryParse{Retry < MAX?}
    ErrorCheck -->|Other| Log[Log error]
    Log --> RetryOther{Retry < MAX?}

    Retry -->|Yes| Operation
    Retry -->|No| Fail[Mark as failed]

    RetryParse -->|Yes| Operation
    RetryParse -->|No| DefaultFirst[Use 'First' as default]

    RetryOther -->|Yes| Operation
    RetryOther -->|No| Fail

    DefaultFirst --> Continue
    Fail --> End([Return error])

Configuration Structure

Each agent operates based on an AgentConfig:

typescript
interface AgentConfig {
	// Context
	contextType: string; // What is being analyzed (e.g., "RFP", "Project")
	context: string; // The actual content to analyze

	// Artifact definition
	artifactType: string; // What to generate (e.g., "Summary", "Analysis")
	artifactTypeFileName: string; // For file naming

	// Evaluation criteria
	rankingCriteria: string; // Criteria for evaluation and improvement

	// Optional
	schema?: any; // JSON schema for structured output
	matchingData?: string; // Additional context (e.g., company skills)
	opportunityColumnName?: string; // Database column for results
}

Error Handling and Retries

The production implementation includes robust error handling:

typescript
// From jobs/+server.ts
while (!iterationSuccess && retryCount < MAX_RETRIES) {
	try {
		// Generation-Evaluation-Feedback cycle
		const newArtifact = await createArtifact(artifact, feedback, config);
		const evaluation = await evaluateArtifacts(artifact, newArtifact, config);
		// ... process results ...
		iterationSuccess = true;
	} catch (error) {
		// Handle quota errors by pausing before the next attempt
		if (isQuotaError(error)) {
			await wait(60000); // Wait 60 seconds
			continue;
		}
		// Other errors may trigger retry
		retryCount++;
	}
}

Best Practices

1. Iteration Count

  • 1 iteration: Simple generation without refinement
  • 2-3 iterations: Standard refinement for most use cases
  • 5-10 iterations: Deep refinement for critical content
  • Population-based (experimental evolutionary patterns): 4-10 individuals, 5-10 generations

2. Feedback Quality

  • Be specific and constructive in ranking criteria
  • Focus on measurable improvements
  • Avoid generic or vague feedback prompts

3. Evaluation Consistency

  • Use structured output (JSON schema) when possible
  • Implement parseResult() fallback for non-JSON models (a sketch follows this list)
  • Consider multiple evaluation perspectives
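
The parseResult() implementation is not shown in this document; a hypothetical fallback that scans free-form model output for a verdict could look like this:

typescript
// Hypothetical fallback: naive keyword scan over free-form evaluator output
async function parseResult(evaluation: string): Promise<string> {
	if (/\bsecond\b/i.test(evaluation)) return 'Second';
	if (/\btie\b/i.test(evaluation)) return 'Tie';
	return 'First'; // conservative default, matching the error-handling flow above
}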

4. Performance Optimization

  • Use parallel processing for population-based approaches (see the sketch after this list)
  • Cache intermediate results for debugging
  • Implement proper error handling and retries
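
For population-based runs, candidate generation can be parallelized; a sketch assuming the createArtifact calls are independent of one another:

typescript
// Sketch: generate a population of candidate artifacts in parallel
const populationSize = 4;
const candidates = await Promise.all(
	Array.from({ length: populationSize }, () => createArtifact(artifact, feedback, config))
);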

5. Configuration Design

typescript
// Good configuration example
const config: AgentConfig = {
	contextType: 'Tender Document',
	context: tenderContent,
	artifactType: 'Technical Requirements Analysis',
	artifactTypeFileName: 'tech-requirements',
	rankingCriteria: `
    Evaluate based on:
    1. Completeness: All requirements identified
    2. Clarity: Clear, unambiguous descriptions
    3. Structure: Logical organization
    4. Actionability: Requirements are implementable
  `,
	schema: technicalRequirementsSchema
};

Extension Points

Adding New Agent Types

  1. Create configuration in /src/lib/server/zen-agent/configs/
  2. Define the AgentType enum value
  3. Add case in getConfig() function
  4. Implement any special processing logic
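
In practice this usually amounts to one more branch in getConfig(); the agent type and config module below are hypothetical:

typescript
// Hypothetical example of wiring a new agent type into getConfig()
import { riskAssessmentConfig } from '../configs/riskAssessmentConfig'; // hypothetical config module

function getConfig(type: AgentType): AgentConfig {
	switch (type) {
		case AgentType.RiskAssessment: // hypothetical enum value added in step 2
			return riskAssessmentConfig;
		// ...existing cases...
		default:
			throw new Error(`Unknown agent type: ${type}`);
	}
}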

Custom Orchestration Patterns

  1. Create new orchestration file in /systems/
  2. Import the three core agents
  3. Implement custom loop logic
  4. Export for use in API endpoints
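
A custom orchestration might look like the following sketch (import paths are assumed from the agent locations listed earlier, and a simple best-of-N strategy is used purely for illustration):

typescript
// Sketch: a best-of-N orchestration built on the core agents
// (import paths and the AgentConfig import location are assumptions)
import { createArtifact } from '../agents/artifactAgent';
import { evaluateArtifacts } from '../agents/evaluationAgent';
import type { AgentConfig } from '../types';

export async function bestOfN(config: AgentConfig, n: number): Promise<string> {
	let best = await createArtifact(`No ${config.artifactType} yet.`, '', config);
	for (let i = 1; i < n; i++) {
		const candidate = await createArtifact(`No ${config.artifactType} yet.`, '', config);
		const verdict = JSON.parse(await evaluateArtifacts(best, candidate, config)).result;
		if (verdict === 'Second') best = candidate; // keep whichever artifact wins
	}
	return best;
}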

Integration with Other Systems

The loop can be integrated with:

  • Database persistence (as shown in jobs endpoint)
  • Real-time status updates
  • Webhook notifications
  • Batch processing systems
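
For example, a webhook notification could be fired once a job's final artifact is saved (sketch; the endpoint URL and payload shape are illustrative):

typescript
// Illustrative webhook call after the refinement loop completes
await fetch('https://example.com/webhooks/zen-agent', {
	method: 'POST',
	headers: { 'Content-Type': 'application/json' },
	body: JSON.stringify({ jobId: recordId, status: 'completed', artifact })
});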

Monitoring and Debugging

Logging Points

Each agent logs key events:

typescript
// ArtifactAgent
await logger.log('Creating artifact', 'artifactAgent', { artifactType });
await logger.log('Generated artifact', 'artifactAgent', { length });

// EvaluationAgent
await logger.log('Evaluating artifacts', 'evaluationAgent', { artifactType });
await logger.log('Generated evaluation', 'evaluationAgent', { result });

// FeedbackAgent
await logger.log('Creating feedback', 'feedbackAgent', { artifactType });
await logger.log('Generated feedback', 'feedbackAgent', { length });

File Output for Debugging

The system writes intermediate results for analysis:

typescript
writeToMarkdown(timestampedDir, `artifact-${i}`, artifact);
writeToMarkdown(timestampedDir, `evaluation-${i}`, evaluation);
writeToMarkdown(timestampedDir, `feedback-${i}`, feedback);

Summary

The Generation-Evaluation-Feedback loop is a powerful pattern for iterative content improvement. The three agents work together in a cycle:

  1. Generate: Create or refine content based on feedback
  2. Evaluate: Compare and select the better version
  3. Feedback: Provide guidance for the next iteration

This pattern ensures continuous improvement while preventing regression, as only better artifacts are kept. The system is flexible enough to support various orchestration patterns from simple sequential processing to complex evolutionary algorithms.

The modular design allows for easy extension and customization while maintaining the core improvement loop that drives quality enhancement across all content types in the Zen Agent system.