LLM-Based Tender Extraction Specification
Overview
This specification describes an LLM-powered feature that automatically extracts key information (name, description, and submission deadline) from tender documents as soon as a user uploads them on the opportunities page.
Goals
Automatic Information Extraction: When a user uploads a tender document, automatically extract:
- Tender name
- Tender description
- Submission deadline
Structured Output: Use JSON schema validation to ensure consistent, structured output from the LLM
Rate Limiting: Implement rate limiting to prevent API abuse and control costs
User Experience: Show extracted information immediately in the opportunities list
Technical Architecture
Frontend Flow
- User uploads tender document(s) via the existing upload button on /zen-agent/opportunities
- After file upload completes, trigger LLM extraction API call
- Display loading state while extraction is in progress
- Update opportunity with extracted data
- Show the updated opportunity in the list with extracted details
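The flow above could be sketched as a small client-side helper. This is a minimal sketch under assumptions: `runExtraction`, `ExtractionState`, and the injected `postJson` callback are hypothetical names, not part of the existing codebase. In the real page, `postJson` would presumably wrap a `fetch` call to the extraction endpoint, and `onState` would update the Svelte component state.

```typescript
// Hypothetical UI state for one opportunity while extraction runs.
type ExtractionState =
  | { status: 'loading' }
  | { status: 'done'; name: string; description: string; submissionDeadline: string | null }
  | { status: 'error'; message: string };

interface ExtractRequest {
  content: string;
  fileName: string;
  opportunityId: string;
}

interface ExtractResponse {
  success: boolean;
  extractedData?: { name: string; description: string; submissionDeadline: string | null };
}

// Assumed transport: in the page this would wrap fetch() for the extraction endpoint.
type PostJson = (body: ExtractRequest) => Promise<ExtractResponse>;

async function runExtraction(
  body: ExtractRequest,
  postJson: PostJson,
  onState: (state: ExtractionState) => void
): Promise<void> {
  onState({ status: 'loading' }); // show the loading indicator before the call starts
  try {
    const res = await postJson(body);
    if (!res.success || !res.extractedData) {
      onState({ status: 'error', message: 'Extraction failed' });
      return;
    }
    onState({ status: 'done', ...res.extractedData });
  } catch (err) {
    // Network errors and rejected requests end up here.
    onState({ status: 'error', message: err instanceof Error ? err.message : 'Unknown error' });
  }
}
```

Passing the transport and the state callback in as parameters keeps the helper testable without a browser or a running server.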
Backend API Endpoint
Create new endpoint: POST /api/(protected)/zen-agent/opportunities/extract-tender-info
Request Body:
{
"content": "string", // The uploaded tender content
"fileName": "string", // Original filename for context
"opportunityId": "string" // The opportunity ID to update
}
Response:
{
"success": true,
"extractedData": {
"name": "string",
"description": "string",
"submissionDeadline": "string | null" // ISO date (YYYY-MM-DD), or null if not found
}
}
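Before any LLM call is made, the endpoint needs to reject malformed request bodies. The following is one possible sketch of that check; `parseExtractRequest` is a hypothetical helper name, and the rule that all three fields must be non-empty strings is an assumption drawn from the request shape above.

```typescript
interface ExtractRequestBody {
  content: string;
  fileName: string;
  opportunityId: string;
}

type ParseResult =
  | { ok: true; body: ExtractRequestBody }
  | { ok: false; errors: string[] };

// Returns the typed body, or a list of problems suitable for a 400 response.
function parseExtractRequest(input: unknown): ParseResult {
  if (typeof input !== 'object' || input === null) {
    return { ok: false, errors: ['body must be a JSON object'] };
  }
  const record = input as Record<string, unknown>;
  const errors: string[] = [];
  for (const key of ['content', 'fileName', 'opportunityId'] as const) {
    const value = record[key];
    if (typeof value !== 'string' || value.trim() === '') {
      errors.push(`${key} must be a non-empty string`);
    }
  }
  if (errors.length > 0) return { ok: false, errors };
  return {
    ok: true,
    body: {
      content: record['content'] as string,
      fileName: record['fileName'] as string,
      opportunityId: record['opportunityId'] as string
    }
  };
}
```

Collecting every problem instead of stopping at the first one lets the client show all validation errors in a single round trip.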
LLM Integration
Provider
Use the existing Google AI (Gemini) integration available at /src/lib/server/zen-agent/util/llm/googleAI.ts
- Model: gemini-2.5-flash-lite-preview-06-17
- Use the generateWithSchema function for structured JSON output
Structured Output Schema
interface TenderExtractionResult {
name: string;
description: string;
submissionDeadline: string | null; // ISO date or null if not found
}
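Even with schema-constrained output, it may be worth checking the parsed result at runtime, in particular that the deadline is a real calendar date. The sketch below is one way to do that; `isIsoDate` and `validateExtraction` are hypothetical helper names, not existing functions in the codebase.

```typescript
interface TenderExtractionResult {
  name: string;
  description: string;
  submissionDeadline: string | null;
}

// True only for real calendar dates written as YYYY-MM-DD.
function isIsoDate(value: string): boolean {
  const match = /^(\d{4})-(\d{2})-(\d{2})$/.exec(value);
  if (!match) return false;
  const year = Number(match[1]);
  const month = Number(match[2]);
  const day = Number(match[3]);
  const date = new Date(Date.UTC(year, month - 1, day));
  // Date.UTC rolls invalid dates over (e.g. Feb 30 becomes Mar 2), so compare the parts back.
  return (
    date.getUTCFullYear() === year &&
    date.getUTCMonth() === month - 1 &&
    date.getUTCDate() === day
  );
}

// Returns a typed result, or null if the value does not match the expected shape.
function validateExtraction(raw: unknown): TenderExtractionResult | null {
  if (typeof raw !== 'object' || raw === null) return null;
  const record = raw as Record<string, unknown>;
  const name = record['name'];
  const description = record['description'];
  const deadline = record['submissionDeadline'];
  if (typeof name !== 'string' || typeof description !== 'string') return null;
  let submissionDeadline: string | null = null;
  if (deadline !== null) {
    if (typeof deadline !== 'string' || !isIsoDate(deadline)) return null;
    submissionDeadline = deadline;
  }
  return { name, description, submissionDeadline };
}
```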
Prompt Design
Analyze the following tender document and extract key information.
Instructions:
1. Extract the tender/opportunity name or title
2. Write a concise description (max 200 words) summarizing what is being procured
3. Find the submission deadline date and convert to ISO format (YYYY-MM-DD)
4. If the submission deadline cannot be found, use null for that field
5. Be factual and only extract information explicitly stated in the document
Document content:
---
{content}
---
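Filling the {content} placeholder deserves a little care, because tender text can contain characters that a naive string replacement treats specially. A minimal sketch follows; `fillPrompt` is a hypothetical helper, and the template here is an abbreviated stand-in for the full prompt text above.

```typescript
// Abbreviated stand-in for the full prompt; only the {content} placeholder matters here.
const TEMPLATE = [
  'Analyze the following tender document and extract key information.',
  'Document content:',
  '---',
  '{content}',
  '---'
].join('\n');

function fillPrompt(template: string, content: string): string {
  if (!template.includes('{content}')) {
    throw new Error('template is missing the {content} placeholder');
  }
  // A replacer function inserts the text literally. A plain string replacement
  // would interpret sequences such as "$&" or "$$" found in the document.
  return template.replace('{content}', () => content);
}
```

For example, a document containing the text `Budget: $$100` would lose one dollar sign with `template.replace('{content}', content)`, while the replacer-function form keeps it intact.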
Implementation Example
import { generateWithSchema } from '$lib/server/zen-agent/util/llm/googleAI';
// Define the schema for structured output
const extractionSchema = {
type: 'object',
properties: {
name: { type: 'string' },
description: { type: 'string' },
submissionDeadline: { type: ['string', 'null'] }
},
required: ['name', 'description', 'submissionDeadline']
};
// Make the LLM call
const extractedData = await generateWithSchema(prompt, extractionSchema, 0.3);
const result = JSON.parse(extractedData);
Rate Limiting
Following the pattern from website-import:
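import { error } from '@sveltejs/kit';
// createRateLimit is the same helper used by website-import (its import path is not shown here)
// 5 requests per minute per user for tender extraction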
const ratelimit = createRateLimit(5, '1 m', 'tender-extraction');
const { success } = await ratelimit.limit(locals.user.id);
if (!success) {
throw error(429, 'Rate limit exceeded. Please wait before extracting another tender.');
}
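To make the intended behaviour concrete (five extractions allowed, the sixth rejected until the window has passed), here is an in-memory fixed-window limiter. It is an illustration only: it is not how `createRateLimit` works internally, the window algorithm used by the Upstash-backed helper is not specified in this document, and `FixedWindowLimiter` with its `now` parameter is a hypothetical name introduced for this sketch.

```typescript
class FixedWindowLimiter {
  private readonly max: number;
  private readonly windowMs: number;
  private readonly windows = new Map<string, { start: number; count: number }>();

  constructor(max: number, windowMs: number) {
    this.max = max;
    this.windowMs = windowMs;
  }

  // `now` is injectable so the behaviour can be checked without waiting.
  limit(key: string, now: number = Date.now()): { success: boolean; remaining: number } {
    const current = this.windows.get(key);
    if (!current || now - current.start >= this.windowMs) {
      // First request, or the previous window has expired: start a new window.
      this.windows.set(key, { start: now, count: 1 });
      return { success: true, remaining: this.max - 1 };
    }
    if (current.count >= this.max) {
      return { success: false, remaining: 0 };
    }
    current.count += 1;
    return { success: true, remaining: this.max - current.count };
  }
}
```

An in-memory map like this only works for a single server process, which is one reason a shared store such as Redis is the better fit for production.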
Database Updates
After successful extraction, update the opportunity record:
UPDATE opportunities
SET
name = $1,
description = $2,
submission_deadline = $3,
updated_at = CURRENT_TIMESTAMP
WHERE id = $4 AND owner_id = $5
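The `owner_id` condition means the statement silently updates zero rows when the opportunity does not exist or belongs to another user, so the caller should check the affected row count. A minimal sketch, assuming an injected `query` function that reports a `rowCount` (the shape node-postgres uses); the actual database client is not named in this document, and `saveExtraction` is a hypothetical helper.

```typescript
interface QueryResult {
  rowCount: number;
}

// Assumed database access function; swap in the project's real client.
type Query = (sql: string, params: unknown[]) => Promise<QueryResult>;

interface ExtractedTender {
  name: string;
  description: string;
  submissionDeadline: string | null;
}

const UPDATE_SQL = `
UPDATE opportunities
SET name = $1, description = $2, submission_deadline = $3, updated_at = CURRENT_TIMESTAMP
WHERE id = $4 AND owner_id = $5`;

// Resolves to true if exactly one row was updated, false if nothing matched.
async function saveExtraction(
  query: Query,
  data: ExtractedTender,
  opportunityId: string,
  ownerId: string
): Promise<boolean> {
  // Parameter order must match the $1..$5 placeholders in the statement.
  const params = [data.name, data.description, data.submissionDeadline, opportunityId, ownerId];
  const result = await query(UPDATE_SQL, params);
  return result.rowCount === 1;
}
```

When `saveExtraction` resolves to false, the endpoint could respond with 404 rather than reporting success for an update that never happened.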
Implementation Plan
Phase 1: Backend API Development
Create extraction API endpoint (/api/(protected)/zen-agent/opportunities/extract-tender-info/+server.ts)
- Implement rate limiting (5 requests per minute)
- Validate request body
- Implement LLM call with structured output
- Update opportunity in database
- Add comprehensive error handling and logging
Add LLM utility function (/src/lib/server/zen-agent/util/llm/tenderExtraction.ts)
- Create dedicated function for tender extraction
- Implement retry logic for LLM failures
- Add response validation against schema
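The retry logic for LLM failures could take the form of a small generic wrapper with exponential backoff. This is a sketch, not the planned implementation: `withRetry`, the default of three attempts, and the 500 ms base delay are all assumptions chosen for illustration.

```typescript
type Sleep = (ms: number) => Promise<void>;

const defaultSleep: Sleep = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Runs fn up to `attempts` times, waiting baseDelayMs, 2x, 4x, ... between failures.
async function withRetry<T>(
  fn: () => Promise<T>,
  attempts: number = 3,
  baseDelayMs: number = 500,
  sleep: Sleep = defaultSleep
): Promise<T> {
  let lastError: unknown = new Error('withRetry called with attempts < 1');
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // No point in waiting after the final attempt.
      if (attempt < attempts - 1) {
        await sleep(baseDelayMs * 2 ** attempt);
      }
    }
  }
  throw lastError;
}
```

A call site might look like `withRetry(() => generateWithSchema(prompt, extractionSchema, 0.3))`. Keep in mind that each retry is another billed LLM call, so the attempt count interacts with the cost goals stated earlier.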
Phase 2: Frontend Integration
Modify opportunities page (/src/routes/(app)/(public)/zen-agent/opportunities/+page.svelte)
- Add extraction API call after file upload
- Show loading state during extraction
- Handle extraction errors gracefully
- Update opportunity list with extracted data
Add loading states
- Show spinner or progress indicator during extraction
- Display extracted fields as they populate
- Allow manual editing of extracted fields
Phase 3: Database Schema Updates
Add submission_deadline column to opportunities table
ALTER TABLE opportunities ADD COLUMN IF NOT EXISTS submission_deadline DATE;
Note: Currently, submission_deadline is extracted from the JSON path tender_ai_summary->'ai_analysis'->'dates_and_duration'->>'submission_deadline'. Adding it as a proper column will improve query performance and data consistency. The frontend already expects this field.
Update existing data (migrate from JSON to column)
UPDATE opportunities
SET submission_deadline = (tender_ai_summary->'ai_analysis'->'dates_and_duration'->>'submission_deadline')::DATE
WHERE tender_ai_summary->'ai_analysis'->'dates_and_duration'->>'submission_deadline' IS NOT NULL;
Add extraction metadata (optional)
ALTER TABLE opportunities ADD COLUMN IF NOT EXISTS extraction_metadata JSONB;
Phase 4: Testing & Error Handling
Test various document formats
- PDF documents
- Plain text
- Multiple file uploads
- Documents in different languages
Error scenarios
- Rate limit exceeded
- LLM extraction failures
- Invalid document content
- Network timeouts
Monitoring
- Log extraction success/failure rates
- Monitor LLM token usage
- Track average extraction times
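The three monitoring items can be derived from one log record per extraction attempt. The sketch below shows the aggregation only; the `ExtractionEvent` shape and the `summarize` function are hypothetical, and how events are stored or shipped to a monitoring system is left open.

```typescript
// Hypothetical per-attempt log record; it deliberately contains no tender content.
interface ExtractionEvent {
  ok: boolean;
  durationMs: number;
  tokens: number;
}

interface ExtractionSummary {
  total: number;
  failureRate: number; // fraction between 0 and 1
  avgDurationMs: number;
  totalTokens: number;
}

function summarize(events: ExtractionEvent[]): ExtractionSummary {
  if (events.length === 0) {
    return { total: 0, failureRate: 0, avgDurationMs: 0, totalTokens: 0 };
  }
  const failures = events.filter((e) => !e.ok).length;
  const totalDuration = events.reduce((sum, e) => sum + e.durationMs, 0);
  const totalTokens = events.reduce((sum, e) => sum + e.tokens, 0);
  return {
    total: events.length,
    failureRate: failures / events.length,
    avgDurationMs: totalDuration / events.length,
    totalTokens
  };
}
```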
Security Considerations
Input Validation
- Sanitize all user inputs
- Limit document size (e.g., max 10MB)
- Validate file types
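The size and file-type checks might look like the sketch below. The 10MB figure comes from the example above; the extension allow-list is an assumption based on the formats named in the testing phase (PDF and plain text), and `validateUpload` is a hypothetical helper name.

```typescript
const MAX_UPLOAD_BYTES = 10 * 1024 * 1024; // 10MB, per the example limit above
const ALLOWED_EXTENSIONS = ['.pdf', '.txt']; // assumed allow-list, adjust as needed

// Returns null when the upload is acceptable, otherwise a human-readable reason.
function validateUpload(fileName: string, sizeBytes: number): string | null {
  const dot = fileName.lastIndexOf('.');
  const extension = dot === -1 ? '' : fileName.slice(dot).toLowerCase();
  if (!ALLOWED_EXTENSIONS.includes(extension)) {
    return `unsupported file type: ${extension || '(none)'}`;
  }
  if (sizeBytes <= 0) return 'file is empty';
  if (sizeBytes > MAX_UPLOAD_BYTES) return 'file exceeds the 10MB limit';
  return null;
}
```

An extension check is only a first filter, since a file name is user-controlled; inspecting the actual content type on the server would be a stronger validation.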
Rate Limiting
- Per-user rate limits to prevent abuse
- Monitor for suspicious patterns
Data Privacy
- Don't log sensitive tender content
- Ensure proper user authorization
Future Enhancements
Multi-language Support
- Detect document language
- Extract in original language
- Translate if needed
Advanced Extraction
- Budget/value extraction
- Technical requirements
- Evaluation criteria
Confidence Scoring
- Add confidence scores to extracted fields
- Highlight low-confidence extractions for review
Batch Processing
- Support multiple tender extraction in one request
- Queue system for large batches
Success Metrics
- Accuracy: 90%+ accuracy on name and deadline extraction
- Performance: < 5 seconds average extraction time
- Adoption: 70%+ of uploaded tenders use extraction
- Error Rate: < 5% extraction failure rate
Company Profile Extraction Reference Implementation
The website import feature at /api/(protected)/zen-agent/website-import provides a good reference for this implementation:
Key Patterns from Company Profile Extraction:
Rate Limiting Implementation:
const ratelimit = createRateLimit(1, '30 s', 'website-import');
const { success } = await ratelimit.limit(locals.user.id);
LLM Call Pattern:
- Uses generate() function from googleAI.ts
- Implements prompt engineering with language support
- Cleans markdown fences from LLM output
- Has fallback for LLM failures
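Cleaning markdown fences from LLM output can be done with a few string operations. The sketch below is an independent illustration of the idea, not the code used by website-import; `stripMarkdownFences` is a hypothetical name.

```typescript
// Three backticks, built with repeat() so this example stays easy to embed in documents.
const FENCE = '`'.repeat(3);

// Removes one surrounding markdown code fence, if present, and trims whitespace.
function stripMarkdownFences(text: string): string {
  let out = text.trim();
  if (out.startsWith(FENCE)) {
    const firstNewline = out.indexOf('\n');
    // Drop the whole opening fence line, including an optional language tag such as "json".
    out = firstNewline === -1 ? '' : out.slice(firstNewline + 1);
  }
  if (out.endsWith(FENCE)) {
    out = out.slice(0, -FENCE.length);
  }
  return out.trim();
}
```

The cleaned string can then be handed to `JSON.parse`. With schema-constrained output this step may rarely trigger, but it is cheap insurance against a model wrapping its JSON in a fence.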
Error Handling:
- Comprehensive try-catch blocks
- Specific error messages for different failure modes
- Returns partial results when possible
Content Processing:
- Extracts and cleans HTML content
- Limits input text length (15000 chars)
- Validates minimum content length
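Applied to tender text, the length checks could be sketched as follows. The 15000-character cap is the figure quoted above for website-import; the minimum of 100 characters is a made-up placeholder, since no number is given here, and `prepareContent` is a hypothetical helper. Collapsing all whitespace is a simplification that discards line structure.

```typescript
const MAX_CONTENT_CHARS = 15000; // cap used by the website-import reference
const MIN_CONTENT_CHARS = 100; // hypothetical threshold, no value is specified

type PreparedContent =
  | { ok: true; text: string; truncated: boolean }
  | { ok: false; reason: string };

function prepareContent(raw: string): PreparedContent {
  // Collapse runs of whitespace so padding does not count toward the limits.
  const cleaned = raw.replace(/\s+/g, ' ').trim();
  if (cleaned.length < MIN_CONTENT_CHARS) {
    return { ok: false, reason: `content too short (${cleaned.length} characters)` };
  }
  const truncated = cleaned.length > MAX_CONTENT_CHARS;
  return {
    ok: true,
    text: truncated ? cleaned.slice(0, MAX_CONTENT_CHARS) : cleaned,
    truncated
  };
}
```

Returning the `truncated` flag lets the caller log or surface the fact that only the first part of a long tender was analysed, which matters if the deadline appears late in the document.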
Dependencies
- Google AI (Gemini) API access via existing googleAI.ts utilities
- Upstash Redis for rate limiting via createRateLimit function
- Existing authentication system
- Existing file upload infrastructure
Timeline Estimate
- Phase 1 (Backend): 2-3 days
- Phase 2 (Frontend): 2 days
- Phase 3 (Database): 1 day
- Phase 4 (Testing): 2 days
Total: 7-8 days of development