LLM-Based Tender Extraction Specification

Overview

This specification outlines the implementation of an LLM-powered feature that automatically extracts key information (name, description, and submission deadline) from tender documents when a user uploads them on the opportunities page.

Goals

  1. Automatic Information Extraction: When a user uploads a tender document, automatically extract:

    • Tender name
    • Tender description
    • Submission deadline
  2. Structured Output: Use JSON schema validation to ensure consistent, structured output from the LLM

  3. Rate Limiting: Implement rate limiting to prevent API abuse and control costs

  4. User Experience: Show extracted information immediately in the opportunities list

Technical Architecture

Frontend Flow

  1. User uploads tender document(s) via the existing upload button on /zen-agent/opportunities
  2. After file upload completes, trigger LLM extraction API call
  3. Display loading state while extraction is in progress
  4. Update opportunity with extracted data
  5. Show the updated opportunity in the list with extracted details
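
A minimal sketch of steps 2 through 5 above, as it might look in the opportunities page component. The fetch URL omits the (protected) route group, since SvelteKit route groups do not appear in URLs; isExtracting, opportunities, and the list-update logic are illustrative names, not existing code:

typescript
let isExtracting = false; // drives the loading indicator while extraction runs (step 3)

async function requestTenderExtraction(opportunityId: string, fileName: string, content: string) {
	isExtracting = true;
	try {
		// Step 2: call the extraction endpoint after the file upload completes
		const res = await fetch('/api/zen-agent/opportunities/extract-tender-info', {
			method: 'POST',
			headers: { 'Content-Type': 'application/json' },
			body: JSON.stringify({ content, fileName, opportunityId })
		});
		if (!res.ok) throw new Error(`Extraction failed with status ${res.status}`);

		// Steps 4-5: merge the extracted fields into the opportunity shown in the list
		const { extractedData } = await res.json();
		opportunities = opportunities.map((o) =>
			o.id === opportunityId ? { ...o, ...extractedData } : o
		);
	} finally {
		isExtracting = false;
	}
}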

Backend API Endpoint

Create a new endpoint: POST /api/(protected)/zen-agent/opportunities/extract-tender-info

Request Body:

json
{
	"content": "string", // The uploaded tender content
	"fileName": "string", // Original filename for context
	"opportunityId": "string" // The opportunity ID to update
}

Response:

json
{
	"success": true,
	"extractedData": {
		"name": "string",
		"description": "string",
		"submissionDeadline": "string" // ISO date format
	}
}
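
A skeleton of the endpoint handler, showing authentication, request-body validation, and the response shape. The locals.user check assumes the existing authentication system populates it, and buildExtractionPrompt and extractTenderInfo are the utilities sketched later in this document; their names are assumptions, not existing code:

typescript
// /api/(protected)/zen-agent/opportunities/extract-tender-info/+server.ts (skeleton)
import { json, error } from '@sveltejs/kit';
import type { RequestHandler } from './$types';
// Planned utility from Phase 1 of the Implementation Plan; function names are assumptions
import { buildExtractionPrompt, extractTenderInfo } from '$lib/server/zen-agent/util/llm/tenderExtraction';

export const POST: RequestHandler = async ({ request, locals }) => {
	if (!locals.user) throw error(401, 'Unauthorized');

	const { content, fileName, opportunityId } = await request.json();
	if (typeof content !== 'string' || !content.trim() || typeof opportunityId !== 'string') {
		throw error(400, 'content and opportunityId are required');
	}
	// fileName can be added to the prompt for extra context if useful

	// Rate limiting and the database update are covered in the sections below
	const prompt = buildExtractionPrompt(content);
	const extractedData = await extractTenderInfo(prompt);

	return json({ success: true, extractedData });
};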

LLM Integration

Provider

Use the existing Google AI (Gemini) integration available at /src/lib/server/zen-agent/util/llm/googleAI.ts

  • Model: gemini-2.5-flash-lite-preview-06-17
  • Use generateWithSchema function for structured JSON output

Structured Output Schema

typescript
interface TenderExtractionResult {
	name: string;
	description: string;
	submissionDeadline: string | null; // ISO date or null if not found
}

Prompt Design

Analyze the following tender document and extract key information.

Instructions:
1. Extract the tender/opportunity name or title
2. Write a concise description (max 200 words) summarizing what is being procured
3. Find the submission deadline date and convert to ISO format (YYYY-MM-DD)
4. If any information cannot be found, use null for that field
5. Be factual and only extract information explicitly stated in the document

Document content:
---
{content}
---
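
A sketch of how this template could be assembled in code, capping the input length in the spirit of the website-import pattern referenced at the end of this document. The helper name, its placement in the planned tenderExtraction.ts utility, and the 15,000-character cap are assumptions:

typescript
const MAX_CONTENT_CHARS = 15_000; // assumed cap, mirroring the website-import input limit

export function buildExtractionPrompt(content: string): string {
	// Fill the {content} placeholder of the template above with the length-capped document text
	return `Analyze the following tender document and extract key information.

Instructions:
1. Extract the tender/opportunity name or title
2. Write a concise description (max 200 words) summarizing what is being procured
3. Find the submission deadline date and convert to ISO format (YYYY-MM-DD)
4. If any information cannot be found, use null for that field
5. Be factual and only extract information explicitly stated in the document

Document content:
---
${content.slice(0, MAX_CONTENT_CHARS)}
---`;
}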

Implementation Example

typescript
import { generateWithSchema } from '$lib/server/zen-agent/util/llm/googleAI';

// Define the schema for structured output
const extractionSchema = {
	type: 'object',
	properties: {
		name: { type: 'string' },
		description: { type: 'string' },
		submissionDeadline: { type: ['string', 'null'] }
	},
	required: ['name', 'description', 'submissionDeadline']
};

// Make the LLM call; `prompt` is the Prompt Design template above with the document content interpolated
const extractedData = await generateWithSchema(prompt, extractionSchema, 0.3);
const result = JSON.parse(extractedData); // parse the JSON string returned by the model

Rate Limiting

Following the pattern from website-import:

typescript
import { error } from '@sveltejs/kit';
// createRateLimit is the existing Upstash-backed rate-limit helper (see Dependencies)

// 5 requests per minute per user for tender extraction
const ratelimit = createRateLimit(5, '1 m', 'tender-extraction');
const { success } = await ratelimit.limit(locals.user.id);

if (!success) {
	throw error(429, 'Rate limit exceeded. Please wait before extracting another tender.');
}

Database Updates

After successful extraction, update the opportunity record:

sql
UPDATE opportunities
SET
  name = $1,
  description = $2,
  submission_deadline = $3,
  updated_at = CURRENT_TIMESTAMP
WHERE id = $4 AND owner_id = $5
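
A sketch of running this update from inside the endpoint handler, assuming a node-postgres style client; the project's actual data-access layer may differ, and the $lib/server/db import path is hypothetical:

typescript
import { pool } from '$lib/server/db'; // hypothetical import; use the project's existing DB client

const { rowCount } = await pool.query(
	`UPDATE opportunities
	 SET name = $1, description = $2, submission_deadline = $3, updated_at = CURRENT_TIMESTAMP
	 WHERE id = $4 AND owner_id = $5`,
	[extractedData.name, extractedData.description, extractedData.submissionDeadline, opportunityId, locals.user.id]
);

// Guard against updating an opportunity the caller does not own
if (rowCount === 0) {
	throw error(404, 'Opportunity not found or not owned by the current user');
}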

Implementation Plan

Phase 1: Backend API Development

  1. Create extraction API endpoint (/api/(protected)/zen-agent/opportunities/extract-tender-info/+server.ts)

    • Implement rate limiting (5 requests per minute)
    • Validate request body
    • Implement LLM call with structured output
    • Update opportunity in database
    • Add comprehensive error handling and logging
  2. Add LLM utility function (/src/lib/server/zen-agent/util/llm/tenderExtraction.ts)

    • Create dedicated function for tender extraction
    • Implement retry logic for LLM failures
    • Add response validation against schema
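
A minimal sketch of the tenderExtraction.ts utility from item 2 above, with simple retry logic and response validation. The retry count, backoff, and validation checks are assumptions, not requirements of this spec:

typescript
// /src/lib/server/zen-agent/util/llm/tenderExtraction.ts (sketch)
import { generateWithSchema } from '$lib/server/zen-agent/util/llm/googleAI';

export interface TenderExtractionResult {
	name: string;
	description: string;
	submissionDeadline: string | null; // ISO date or null if not found
}

// JSON schema from the Structured Output Schema section above
const extractionSchema = {
	type: 'object',
	properties: {
		name: { type: 'string' },
		description: { type: 'string' },
		submissionDeadline: { type: ['string', 'null'] }
	},
	required: ['name', 'description', 'submissionDeadline']
};

const MAX_ATTEMPTS = 3; // assumed retry budget for transient LLM failures

export async function extractTenderInfo(prompt: string): Promise<TenderExtractionResult> {
	let lastError: unknown = new Error('Tender extraction failed after retries');
	for (let attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
		try {
			const raw = await generateWithSchema(prompt, extractionSchema, 0.3);
			const parsed = JSON.parse(raw) as TenderExtractionResult;
			// Validate the response shape before returning it
			if (typeof parsed.name !== 'string' || typeof parsed.description !== 'string') {
				throw new Error('LLM response did not match the extraction schema');
			}
			return parsed;
		} catch (err) {
			lastError = err;
			// Brief backoff before retrying a transient failure
			if (attempt < MAX_ATTEMPTS) {
				await new Promise((resolve) => setTimeout(resolve, 500 * attempt));
			}
		}
	}
	throw lastError;
}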

Phase 2: Frontend Integration

  1. Modify opportunities page (/src/routes/(app)/(public)/zen-agent/opportunities/+page.svelte)

    • Add extraction API call after file upload
    • Show loading state during extraction
    • Handle extraction errors gracefully
    • Update opportunity list with extracted data
  2. Add loading states

    • Show spinner or progress indicator during extraction
    • Display extracted fields as they populate
    • Allow manual editing of extracted fields

Phase 3: Database Schema Updates

  1. Add submission_deadline column to opportunities table

    sql
    ALTER TABLE opportunities
    ADD COLUMN IF NOT EXISTS submission_deadline DATE;

    Note: Currently, submission_deadline is extracted from the JSON path tender_ai_summary->'ai_analysis'->'dates_and_duration'->>'submission_deadline'. Adding it as a proper column will improve query performance and data consistency. The frontend already expects this field.

  2. Update existing data (migrate from JSON to column)

    sql
    UPDATE opportunities
    SET submission_deadline = (tender_ai_summary->'ai_analysis'->'dates_and_duration'->>'submission_deadline')::DATE
    WHERE tender_ai_summary->'ai_analysis'->'dates_and_duration'->>'submission_deadline' IS NOT NULL;
  3. Add extraction metadata (optional)

    sql
    ALTER TABLE opportunities
    ADD COLUMN IF NOT EXISTS extraction_metadata JSONB;

Phase 4: Testing & Error Handling

  1. Test various document formats

    • PDF documents
    • Plain text
    • Multiple file uploads
    • Documents in different languages
  2. Error scenarios

    • Rate limit exceeded
    • LLM extraction failures
    • Invalid document content
    • Network timeouts
  3. Monitoring

    • Log extraction success/failure rates
    • Monitor LLM token usage
    • Track average extraction times
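
A small sketch of the timing and success/failure logging from item 3, kept free of document content in line with the Data Privacy note below. The logger calls and field names are illustrative, and the surrounding variables come from the endpoint sketches above:

typescript
// Inside the extraction endpoint, around the LLM call
const startedAt = Date.now();
try {
	const extractedData = await extractTenderInfo(prompt);
	// Log outcome and duration only; never log the tender content itself
	console.info('tender-extraction: success', { opportunityId, durationMs: Date.now() - startedAt });
} catch (err) {
	console.error('tender-extraction: failure', { opportunityId, durationMs: Date.now() - startedAt });
	throw err;
}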

Security Considerations

  1. Input Validation (a sketch of these checks follows this list)

    • Sanitize all user inputs
    • Limit document size (e.g., max 10MB)
    • Validate file types
  2. Rate Limiting

    • Per-user rate limits to prevent abuse
    • Monitor for suspicious patterns
  3. Data Privacy

    • Don't log sensitive tender content
    • Ensure proper user authorization
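
A minimal sketch of the size and type checks from the Input Validation item above. The 10MB limit comes from that item; the accepted MIME types and the helper name are assumptions:

typescript
const MAX_DOCUMENT_BYTES = 10 * 1024 * 1024; // max 10MB, per the Input Validation item above
const ALLOWED_TYPES = ['application/pdf', 'text/plain']; // assumed accepted types

function validateTenderUpload(file: File): void {
	if (file.size > MAX_DOCUMENT_BYTES) {
		throw new Error('Document exceeds the 10MB size limit');
	}
	if (!ALLOWED_TYPES.includes(file.type)) {
		throw new Error(`Unsupported file type: ${file.type || 'unknown'}`);
	}
}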

Future Enhancements

  1. Multi-language Support

    • Detect document language
    • Extract in original language
    • Translate if needed
  2. Advanced Extraction

    • Budget/value extraction
    • Technical requirements
    • Evaluation criteria
  3. Confidence Scoring

    • Add confidence scores to extracted fields
    • Highlight low-confidence extractions for review
  4. Batch Processing

    • Support extraction of multiple tenders in one request
    • Queue system for large batches

Success Metrics

  1. Accuracy: 90%+ accuracy on name and deadline extraction
  2. Performance: < 5 seconds average extraction time
  3. Adoption: 70%+ of uploaded tenders use extraction
  4. Error Rate: < 5% extraction failure rate

Company Profile Extraction Reference Implementation

The website import feature at /api/(protected)/zen-agent/website-import provides a good reference for this implementation:

Key Patterns from Company Profile Extraction:

  1. Rate Limiting Implementation:

    typescript
    const ratelimit = createRateLimit(1, '30 s', 'website-import');
    const { success } = await ratelimit.limit(locals.user.id);
  2. LLM Call Pattern:

    • Uses generate() function from googleAI.ts
    • Implements prompt engineering with language support
    • Cleans markdown fences from LLM output (see the sketch after this list)
    • Has fallback for LLM failures
  3. Error Handling:

    • Comprehensive try-catch blocks
    • Specific error messages for different failure modes
    • Returns partial results when possible
  4. Content Processing:

    • Extracts and cleans HTML content
    • Limits input text length (15000 chars)
    • Validates minimum content length
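
For illustration, a typical way to clean markdown fences from LLM output, as described in item 2 above; this is a generic sketch, not the exact code used in website-import:

typescript
// Strip a leading ```json (or bare ```) fence and a trailing ``` fence, if present
function stripMarkdownFences(raw: string): string {
	return raw
		.trim()
		.replace(/^```(?:json)?\s*/i, '')
		.replace(/```$/, '')
		.trim();
}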

Dependencies

  • Google AI (Gemini) API access via existing googleAI.ts utilities
  • Upstash Redis for rate limiting via createRateLimit function
  • Existing authentication system
  • Existing file upload infrastructure

Timeline Estimate

  • Phase 1 (Backend): 2-3 days
  • Phase 2 (Frontend): 2 days
  • Phase 3 (Database): 1 day
  • Phase 4 (Testing): 2 days

Total: 7-8 days of development