Zen Agent: Generation, Evaluation, and Feedback Loop Documentation

Overview

The Zen Agent system implements an iterative refinement loop in which three core agents work together to generate, evaluate, and improve content over successive iterations. This document explains how the agents interact and how the loop is orchestrated.

Core Agents

1. ArtifactAgent (src/lib/server/zen-agent/agents/artifactAgent.ts)

The ArtifactAgent is responsible for generating or refining content based on:

  • Current artifact state
  • Feedback from previous iterations
  • Configuration context and criteria

Key Function:

typescript
createArtifact(
  artifact: string,      // Current artifact (or "No {type} yet." for first iteration)
  feedback: string,      // Feedback to incorporate
  config: AgentConfig    // Configuration with context, criteria, schema
): Promise<string>

Process:

  1. Takes the current artifact and feedback as input
  2. Generates a prompt incorporating the feedback and ranking criteria
  3. Either generates with schema validation or raw text generation
  4. Returns the new/refined artifact
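
For example, a first-iteration call might look like the following (a minimal sketch based on the signature above; the argument values are illustrative, not taken from the codebase):

typescript
// Illustrative first-iteration call: no prior artifact or feedback exists yet
const artifact = await createArtifact(
	'No Summary yet.', // placeholder used when nothing has been generated
	'',                // no feedback on the first pass
	config             // AgentConfig with context, criteria, and optional schema
);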

2. EvaluationAgent (src/lib/server/zen-agent/agents/evaluationAgent.ts)

The EvaluationAgent compares two artifacts and determines which is better:

Key Function:

typescript
evaluateArtifacts(
  firstArtifact: string,   // Original/old artifact
  secondArtifact: string,  // New/refined artifact
  config: AgentConfig      // Configuration with evaluation criteria
): Promise<string>        // Returns JSON with result: "First", "Second", or "Tie"

Evaluation Schema:

typescript
{
  result: "First" | "Second" | "Tie",
  explanation: string
}

Process:

  1. Compares two artifacts based on ranking criteria
  2. Uses structured output (JSON schema) for consistent evaluation
  3. Returns which artifact is better with explanation
  4. Includes parseResult() helper for models without JSON support
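
A typical caller parses the returned JSON and branches on the result field, roughly as follows (sketch; assumes the model returned valid JSON):

typescript
// Sketch: interpret the evaluation verdict (assumes valid JSON output)
const evaluation = await evaluateArtifacts(oldArtifact, newArtifact, config);
const { result, explanation } = JSON.parse(evaluation);
console.log(explanation); // why the evaluator preferred one artifact
// 'Second' means the new/refined artifact won the comparison
const winner = result === 'Second' ? newArtifact : oldArtifact;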

3. FeedbackAgent (src/lib/server/zen-agent/agents/feedbackAgent.ts)

The FeedbackAgent provides constructive criticism to guide improvements:

Key Function:

typescript
createFeedbackPrompt(
  artifact: string,       // Current artifact to critique
  config: AgentConfig     // Configuration with evaluation criteria
): Promise<string>        // Returns detailed feedback

Process:

  1. Analyzes the current artifact against ranking criteria
  2. Provides demanding, precise, constructive criticism
  3. Focuses on areas for improvement
  4. Returns actionable feedback for the next iteration
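
In the loop, the feedback returned here becomes the feedback argument of the next generation call (minimal sketch; the production loop invokes this step as createFeedback):

typescript
// Sketch: feedback from iteration i guides generation in iteration i + 1
const feedback = await createFeedback(artifact, config);
const refinedArtifact = await createArtifact(artifact, feedback, config);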

The Generation-Evaluation-Feedback Loop

Basic Loop Implementation (src/routes/api/(protected)/zen-agent/jobs/+server.ts)

The main job execution endpoint implements the core loop:

typescript
// Simplified loop structure
for (let i = 1; i <= iterations; i++) {
	// 1. GENERATION: Create new artifact incorporating feedback
	const newArtifact = await createArtifact(artifact, feedback, config);

	// 2. EVALUATION: Compare old vs new artifact
	const evaluation = await evaluateArtifacts(artifact, newArtifact, config);
	const result = JSON.parse(evaluation).result;

	// 3. SELECTION: Keep better artifact
	if (result === 'Second') {
		artifact = newArtifact; // New is better, replace
	}
	// If 'First' or 'Tie', keep current artifact

	// 4. FEEDBACK: Generate feedback for next iteration
	feedback = await createFeedback(artifact, config);
}

Key Loop Characteristics

  1. Iterative Refinement: Each iteration builds on the previous result
  2. Competitive Selection: Only improvements are kept (survival of the fittest)
  3. Guided Evolution: Feedback directs the next generation attempt
  4. Configurable Iterations: Typically 2-10 iterations depending on use case

Production Implementation

The production system uses the implementation in /src/routes/api/(protected)/zen-agent/jobs/+server.ts. This is the only orchestration pattern actively used in the codebase.

Core Production Loop

typescript
// From jobs/+server.ts - The actual implementation used in production
for (let i = 1; i <= iterations; i++) {
	// 1. GENERATION: Create new artifact
	const newArtifact = await createArtifact(artifact, feedback, config);

	// 2. EVALUATION: Compare artifacts
	const evaluation = await evaluateArtifacts(artifact, newArtifact, config);

	// 3. PARSING: Handle different model response formats
	let result = '';
	if (modelHasNoJSONSupport()) {
		result = await parseResult(evaluation);
	} else {
		const cleanedEvaluation = stripMarkdownCodeBlock(evaluation);
		const parsedEval = JSON.parse(cleanedEvaluation);
		result = parsedEval.result;
	}

	// 4. SELECTION: Keep better artifact
	if (result === 'Second') {
		artifact = newArtifact;
	}

	// 5. FEEDBACK: Generate for next iteration
	feedback = await createFeedback(artifact, config);
}
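
The stripMarkdownCodeBlock helper is not reproduced in this document; a plausible minimal implementation, assuming it only removes a surrounding Markdown code fence (e.g. a ```json ... ``` wrapper), could look like this:

typescript
// Assumed behavior: strip a surrounding ```json ... ``` fence if one is present
function stripMarkdownCodeBlock(text: string): string {
	const match = text.trim().match(/^```(?:json)?\s*([\s\S]*?)\s*```$/);
	return match ? match[1] : text.trim();
}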

Special Case: Single Iteration

When iterations = 1, the system skips evaluation and feedback:

typescript
if (iterations === 1) {
	// Direct generation without refinement loop
	const newArtifact = await createArtifact(artifact, feedback, config);

	// Save and return immediately
	await pool.query(`UPDATE ${tableName} SET ${columnName} = $1 WHERE id = $2`, [
		{ status: 'completed', result: newArtifact },
		recordId
	]);

	return newArtifact;
}

Experimental Patterns (Not Used in Production)

The codebase contains several experimental orchestration patterns that are not currently integrated into the production system. These are located in /src/lib/server/zen-agent/systems/ and include:

  • zenAgents.ts: Simple sequential loop for testing
  • evolution/: Population-based evolutionary algorithms
  • tasks/subtasks.ts: Nested refinement for multi-part tasks
  • proposal/: Various proposal-specific refinement strategies

These files appear to be development/testing implementations and are not imported or used by any production routes.

Flow Diagrams

Production Loop Flow Diagram

mermaid
flowchart TD
    Start([Start]) --> Check{iterations == 1?}

    Check -->|Yes| SingleGen[Generate artifact without refinement]
    SingleGen --> SaveSingle[Save to database]
    SaveSingle --> EndSingle([Return artifact])

    Check -->|No| Init[Initialize: artifact='', feedback='']
    Init --> Loop{i <= iterations?}

    Loop -->|Yes| UpdateStatus1[Update DB: 'Creating analysis']
    UpdateStatus1 --> Gen[1. GENERATE: createArtifact]
    Gen --> UpdateStatus2[Update DB: 'Evaluating results']
    UpdateStatus2 --> Eval[2. EVALUATE: evaluateArtifacts]

    Eval --> Parse{Model has JSON support?}
    Parse -->|No| ParseFallback["parseResult()"]
    Parse -->|Yes| StripMarkdown[Strip markdown blocks]
    StripMarkdown --> ParseJSON["JSON.parse()"]

    ParseFallback --> Decision{Result?}
    ParseJSON --> Decision

    Decision -->|Second is better| Update[artifact = newArtifact]
    Decision -->|First is better| Keep[Keep current artifact]
    Decision -->|Tie| Keep

    Update --> UpdateStatus3[Update DB: 'Creating feedback']
    Keep --> UpdateStatus3
    UpdateStatus3 --> Feedback[3. FEEDBACK: createFeedback]

    Feedback --> SaveIteration{Final iteration?}
    SaveIteration -->|Yes| SaveFinal[Save to database]
    SaveFinal --> End([Return final artifact])

    SaveIteration -->|No| Increment[i++]
    Increment --> Loop

    Loop -->|No| End

Error Handling Flow

mermaid
flowchart TD
    Start([Try block]) --> Operation[Agent Operation]
    Operation --> Success{Success?}

    Success -->|Yes| Continue([Continue loop])
    Success -->|No| ErrorCheck{Error type?}

    ErrorCheck -->|Quota/429| Wait[Wait 60 seconds]
    Wait --> UpdateDB[Update DB: 'Waiting for API']
    UpdateDB --> Retry{Retry < MAX?}

    ErrorCheck -->|Parse Error| RetryParse{Retry < MAX?}
    ErrorCheck -->|Other| Log[Log error]
    Log --> RetryOther{Retry < MAX?}

    Retry -->|Yes| Operation
    Retry -->|No| Fail[Mark as failed]

    RetryParse -->|Yes| Operation
    RetryParse -->|No| DefaultFirst[Use 'First' as default]

    RetryOther -->|Yes| Operation
    RetryOther -->|No| Fail

    DefaultFirst --> Continue
    Fail --> End([Return error])

Configuration Structure

Each agent operates based on an AgentConfig:

typescript
interface AgentConfig {
	// Context
	contextType: string; // What is being analyzed (e.g., "RFP", "Project")
	context: string; // The actual content to analyze

	// Artifact definition
	artifactType: string; // What to generate (e.g., "Summary", "Analysis")
	artifactTypeFileName: string; // For file naming

	// Evaluation criteria
	rankingCriteria: string; // Criteria for evaluation and improvement

	// Optional
	schema?: any; // JSON schema for structured output
	matchingData?: string; // Additional context (e.g., company skills)
	opportunityColumnName?: string; // Database column for results
}

Error Handling and Retries

The production implementation includes robust error handling:

typescript
// From jobs/+server.ts
while (!iterationSuccess && retryCount < MAX_RETRIES) {
	try {
		// Generation-Evaluation-Feedback cycle
		const newArtifact = await createArtifact(artifact, feedback, config);
		const evaluation = await evaluateArtifacts(artifact, newArtifact, config);
		// ... process results ...
		iterationSuccess = true;
	} catch (error) {
		// Handle quota errors by pausing before the next attempt
		if (isQuotaError(error)) {
			await wait(60000); // Wait 60 seconds
			continue;
		}
		// Other errors may trigger retry
		retryCount++;
	}
}

Best Practices

1. Iteration Count

  • 1 iteration: Simple generation without refinement
  • 2-3 iterations: Standard refinement for most use cases
  • 5-10 iterations: Deep refinement for critical content
  • Population-based (experimental evolutionary patterns): 4-10 individuals, 5-10 generations

2. Feedback Quality

  • Be specific and constructive in ranking criteria
  • Focus on measurable improvements
  • Avoid generic or vague feedback prompts

3. Evaluation Consistency

  • Use structured output (JSON schema) when possible
  • Implement parseResult() fallback for non-JSON models (a sketch follows this list)
  • Consider multiple evaluation perspectives
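
The parseResult() implementation is not shown in this document; a hypothetical fallback that scans free-form model output for a verdict could look like this:

typescript
// Hypothetical fallback: naive keyword scan over free-form evaluator output
async function parseResult(evaluation: string): Promise<string> {
	if (/\bsecond\b/i.test(evaluation)) return 'Second';
	if (/\btie\b/i.test(evaluation)) return 'Tie';
	return 'First'; // conservative default, matching the error-handling flow above
}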

4. Performance Optimization

  • Use parallel processing for population-based approaches (see the sketch after this list)
  • Cache intermediate results for debugging
  • Implement proper error handling and retries
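
For population-based runs, candidate generation can be parallelized; a sketch assuming the createArtifact calls are independent of one another:

typescript
// Sketch: generate a population of candidate artifacts in parallel
const populationSize = 4;
const candidates = await Promise.all(
	Array.from({ length: populationSize }, () => createArtifact(artifact, feedback, config))
);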

5. Configuration Design

typescript
// Good configuration example
const config: AgentConfig = {
	contextType: 'Tender Document',
	context: tenderContent,
	artifactType: 'Technical Requirements Analysis',
	artifactTypeFileName: 'tech-requirements',
	rankingCriteria: `
    Evaluate based on:
    1. Completeness: All requirements identified
    2. Clarity: Clear, unambiguous descriptions
    3. Structure: Logical organization
    4. Actionability: Requirements are implementable
  `,
	schema: technicalRequirementsSchema
};

Extension Points

Adding New Agent Types

  1. Create configuration in /src/lib/server/zen-agent/configs/
  2. Define the AgentType enum value
  3. Add case in getConfig() function
  4. Implement any special processing logic
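
In practice this usually amounts to one more branch in getConfig(); the agent type and config module below are hypothetical:

typescript
// Hypothetical example of wiring a new agent type into getConfig()
import { riskAssessmentConfig } from '../configs/riskAssessmentConfig'; // hypothetical config module

function getConfig(type: AgentType): AgentConfig {
	switch (type) {
		case AgentType.RiskAssessment: // hypothetical enum value added in step 2
			return riskAssessmentConfig;
		// ...existing cases...
		default:
			throw new Error(`Unknown agent type: ${type}`);
	}
}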

Custom Orchestration Patterns

  1. Create new orchestration file in /systems/
  2. Import the three core agents
  3. Implement custom loop logic
  4. Export for use in API endpoints
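
A custom orchestration might look like the following sketch (import paths are assumed from the agent locations listed earlier, and a simple best-of-N strategy is used purely for illustration):

typescript
// Sketch: a best-of-N orchestration built on the core agents
// (import paths and the AgentConfig import location are assumptions)
import { createArtifact } from '../agents/artifactAgent';
import { evaluateArtifacts } from '../agents/evaluationAgent';
import type { AgentConfig } from '../types';

export async function bestOfN(config: AgentConfig, n: number): Promise<string> {
	let best = await createArtifact(`No ${config.artifactType} yet.`, '', config);
	for (let i = 1; i < n; i++) {
		const candidate = await createArtifact(`No ${config.artifactType} yet.`, '', config);
		const verdict = JSON.parse(await evaluateArtifacts(best, candidate, config)).result;
		if (verdict === 'Second') best = candidate; // keep whichever artifact wins
	}
	return best;
}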

Integration with Other Systems

The loop can be integrated with:

  • Database persistence (as shown in jobs endpoint)
  • Real-time status updates
  • Webhook notifications
  • Batch processing systems
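
For example, a webhook notification could be fired once a job's final artifact is saved (sketch; the endpoint URL and payload shape are illustrative):

typescript
// Illustrative webhook call after the refinement loop completes
await fetch('https://example.com/webhooks/zen-agent', {
	method: 'POST',
	headers: { 'Content-Type': 'application/json' },
	body: JSON.stringify({ jobId: recordId, status: 'completed', artifact })
});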

Monitoring and Debugging

Logging Points

Each agent logs key events:

typescript
// ArtifactAgent
await logger.log('Creating artifact', 'artifactAgent', { artifactType });
await logger.log('Generated artifact', 'artifactAgent', { length });

// EvaluationAgent
await logger.log('Evaluating artifacts', 'evaluationAgent', { artifactType });
await logger.log('Generated evaluation', 'evaluationAgent', { result });

// FeedbackAgent
await logger.log('Creating feedback', 'feedbackAgent', { artifactType });
await logger.log('Generated feedback', 'feedbackAgent', { length });

File Output for Debugging

The system writes intermediate results for analysis:

typescript
writeToMarkdown(timestampedDir, `artifact-${i}`, artifact);
writeToMarkdown(timestampedDir, `evaluation-${i}`, evaluation);
writeToMarkdown(timestampedDir, `feedback-${i}`, feedback);

Summary

The Generation-Evaluation-Feedback loop is a powerful pattern for iterative content improvement. The three agents work together in a cycle:

  1. Generate: Create or refine content based on feedback
  2. Evaluate: Compare and select the better version
  3. Feedback: Provide guidance for the next iteration

This pattern ensures continuous improvement while preventing regression, as only better artifacts are kept. The system is flexible enough to support various orchestration patterns from simple sequential processing to complex evolutionary algorithms.

The modular design allows for easy extension and customization while maintaining the core improvement loop that drives quality enhancement across all content types in the Zen Agent system.