LLM-Based Tender Extraction Specification

Overview

This specification describes an LLM-powered feature that automatically extracts key information (name, description, and submission deadline) from tender documents when a user uploads them on the opportunities page.

Goals

Automatic Information Extraction: When a user uploads a tender document, automatically extract:
- Tender name
- Tender description
- Submission deadline
Structured Output: Use JSON schema validation to ensure consistent, structured output from the LLM
Rate Limiting: Implement rate limiting to prevent API abuse and control costs
User Experience: Show extracted information in the opportunities list as soon as extraction completes

Technical Architecture

Frontend Flow

User uploads tender document(s) via the existing upload button on /zen-agent/opportunities
After file upload completes, trigger LLM extraction API call
Display loading state while extraction is in progress
Update opportunity with extracted data
Show the updated opportunity in the list with extracted details
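
The flow above can be modelled as a small state machine on the client. The following is a minimal sketch, not the page's actual code: the state names, event names, and `nextState` function are hypothetical and exist only to illustrate the idle, loading, done, and error transitions.

```typescript
// Hypothetical UI state for one opportunity row; all names are illustrative.
type ExtractedData = { name: string; description: string; submissionDeadline: string | null };

type ExtractionState =
	| { status: 'idle' }
	| { status: 'extracting' }
	| { status: 'done'; data: ExtractedData }
	| { status: 'error'; message: string };

type ExtractionEvent =
	| { type: 'uploadCompleted' }
	| { type: 'extractionSucceeded'; data: ExtractedData }
	| { type: 'extractionFailed'; message: string };

function nextState(state: ExtractionState, event: ExtractionEvent): ExtractionState {
	switch (event.type) {
		case 'uploadCompleted':
			// The upload finished, so the extraction call starts and the loading state is shown.
			return { status: 'extracting' };
		case 'extractionSucceeded':
			// Ignore results that arrive when no extraction is in progress.
			return state.status === 'extracting' ? { status: 'done', data: event.data } : state;
		case 'extractionFailed':
			return state.status === 'extracting' ? { status: 'error', message: event.message } : state;
	}
}
```

Keeping the transitions in a pure function makes the loading and error behaviour easy to unit test without rendering the page.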

Backend API Endpoint

Create new endpoint: POST /api/(protected)/zen-agent/opportunities/extract-tender-info

Request Body:

json

{
	"content": "string", // The uploaded tender content
	"fileName": "string", // Original filename for context
	"opportunityId": "string" // The opportunity ID to update
}

Response:

json

{
	"success": true,
	"extractedData": {
		"name": "string",
		"description": "string",
		"submissionDeadline": "string" // ISO date (YYYY-MM-DD), or null if not found
	}
}
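
Because the endpoint receives untrusted JSON, the request body should be checked before any LLM call is made. The sketch below assumes the three fields shown in the request body above; the function name `parseExtractTenderRequest` and the error wording are hypothetical.

```typescript
interface ExtractTenderRequest {
	content: string;
	fileName: string;
	opportunityId: string;
}

type ParseResult =
	| { ok: true; value: ExtractTenderRequest }
	| { ok: false; error: string };

// Checks that the body is an object with three non-empty string fields.
function parseExtractTenderRequest(body: unknown): ParseResult {
	if (typeof body !== 'object' || body === null) {
		return { ok: false, error: 'Request body must be a JSON object' };
	}
	const record = body as Record<string, unknown>;
	for (const field of ['content', 'fileName', 'opportunityId'] as const) {
		const value = record[field];
		if (typeof value !== 'string' || value.trim() === '') {
			return { ok: false, error: `"${field}" must be a non-empty string` };
		}
	}
	return {
		ok: true,
		value: {
			content: record.content as string,
			fileName: record.fileName as string,
			opportunityId: record.opportunityId as string
		}
	};
}
```

A handler could return a 400 response with `error` whenever `ok` is false.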

LLM Integration

Provider

Use the existing Google AI (Gemini) integration available at /src/lib/server/zen-agent/util/llm/googleAI.ts

Model: gemini-2.5-flash-lite-preview-06-17
Use generateWithSchema function for structured JSON output

Structured Output Schema

typescript

interface TenderExtractionResult {
	name: string;
	description: string;
	submissionDeadline: string | null; // ISO date or null if not found
}
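
A TypeScript interface is erased at runtime, so the LLM output still needs a runtime check. The following is a minimal sketch that follows the interface above (only submissionDeadline may be null); the helper names `isIsoDate` and `parseTenderExtraction` are hypothetical.

```typescript
interface TenderExtractionResult {
	name: string;
	description: string;
	submissionDeadline: string | null;
}

// True only for real calendar dates written as YYYY-MM-DD.
function isIsoDate(value: string): boolean {
	if (!/^\d{4}-\d{2}-\d{2}$/.test(value)) return false;
	const parsed = new Date(`${value}T00:00:00Z`);
	// The round trip rejects dates such as 2025-02-30 that would otherwise roll over.
	return !Number.isNaN(parsed.getTime()) && parsed.toISOString().slice(0, 10) === value;
}

// Parses the raw LLM output and throws if it does not match the interface.
function parseTenderExtraction(raw: string): TenderExtractionResult {
	const data: unknown = JSON.parse(raw);
	if (typeof data !== 'object' || data === null) {
		throw new Error('LLM output is not a JSON object');
	}
	const { name, description, submissionDeadline } = data as Record<string, unknown>;
	if (typeof name !== 'string' || typeof description !== 'string') {
		throw new Error('name and description must be strings');
	}
	let deadline: string | null = null;
	if (submissionDeadline !== null) {
		if (typeof submissionDeadline !== 'string' || !isIsoDate(submissionDeadline)) {
			throw new Error('submissionDeadline must be null or a YYYY-MM-DD date');
		}
		deadline = submissionDeadline;
	}
	return { name, description, submissionDeadline: deadline };
}
```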

Prompt Design

Analyze the following tender document and extract key information.

Instructions:
1. Extract the tender/opportunity name or title
2. Write a concise description (max 200 words) summarizing what is being procured
3. Find the submission deadline date and convert to ISO format (YYYY-MM-DD)
4. If the submission deadline cannot be found, use null for that field
5. Be factual and only extract information explicitly stated in the document

Document content:
---
{content}
---
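
Filling the {content} placeholder could look like the sketch below. It makes two assumptions: the template is abbreviated here (the numbered instructions are replaced by a single placeholder line), and the 15000-character cap is borrowed from the website import reference rather than specified for tenders. `buildExtractionPrompt` is a hypothetical name.

```typescript
// Abbreviated template; the real one carries the full numbered instructions.
const PROMPT_TEMPLATE = [
	'Analyze the following tender document and extract key information.',
	'',
	'Instructions: (numbered list from the prompt above)',
	'',
	'Document content:',
	'---',
	'{content}',
	'---'
].join('\n');

function buildExtractionPrompt(content: string, maxChars = 15000): string {
	const trimmed = content.trim();
	// Cap the input so very large documents do not inflate token usage.
	const limited = trimmed.length > maxChars ? trimmed.slice(0, maxChars) : trimmed;
	// A replacer function keeps "$&"-style sequences in the document from being
	// interpreted as replacement patterns by String.prototype.replace.
	return PROMPT_TEMPLATE.replace('{content}', () => limited);
}
```

Tender documents often contain currency amounts, so the replacer function matters: a plain string replacement would treat sequences such as `$&` specially.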

Implementation Example

typescript

import { generateWithSchema } from '$lib/server/zen-agent/util/llm/googleAI';

// Define the schema for structured output
const extractionSchema = {
	type: 'object',
	properties: {
		name: { type: 'string' },
		description: { type: 'string' },
		submissionDeadline: { type: ['string', 'null'] }
	},
	required: ['name', 'description', 'submissionDeadline']
};

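// Make the LLM call. `prompt` is the Prompt Design template with {content} replaced by the document text.
const extractedData = await generateWithSchema(prompt, extractionSchema, 0.3);
// The response is parsed as JSON and should match TenderExtractionResult
const result: TenderExtractionResult = JSON.parse(extractedData);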

Rate Limiting

Following the pattern from website-import:

typescript

// 5 requests per minute per user for tender extraction
const ratelimit = createRateLimit(5, '1 m', 'tender-extraction');
const { success } = await ratelimit.limit(locals.user.id);

if (!success) {
	throw error(429, 'Rate limit exceeded. Please wait before extracting another tender.');
}
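
To make the intended "5 requests per minute per user" behaviour concrete, here is an in-memory sliding-window limiter. It is an illustration only, not the Upstash-backed createRateLimit used in production: `createInMemoryRateLimit` is a hypothetical helper that could stand in for it in unit tests, and it is synchronous where the real limiter is awaited.

```typescript
interface RateLimitResult {
	success: boolean;
	remaining: number;
}

// Allows at most `limit` calls per `windowMs` for each key (for example a user ID).
function createInMemoryRateLimit(
	limit: number,
	windowMs: number,
	now: () => number = () => Date.now()
) {
	const hits = new Map<string, number[]>();
	return {
		limit(key: string): RateLimitResult {
			const current = now();
			// Keep only the timestamps that still fall inside the window.
			const recent = (hits.get(key) ?? []).filter((t) => t > current - windowMs);
			const success = recent.length < limit;
			if (success) recent.push(current);
			hits.set(key, recent);
			return { success, remaining: limit - recent.length };
		}
	};
}
```

Injecting the clock through `now` lets tests advance time without waiting for a real minute to pass.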

Database Updates

After successful extraction, update the opportunity record:

sql

UPDATE opportunities
SET
  name = $1,
  description = $2,
  submission_deadline = $3,
  updated_at = CURRENT_TIMESTAMP
WHERE id = $4 AND owner_id = $5;
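
The parameter order matters here, since $5 carries the ownership check. The sketch below builds the statement and its values together, assuming a client that accepts a parameterized `{ text, values }` query object (node-postgres does; the spec does not say which client the project uses). `buildOpportunityUpdate` is a hypothetical helper.

```typescript
interface ExtractedFields {
	name: string;
	description: string;
	submissionDeadline: string | null;
}

function buildOpportunityUpdate(
	data: ExtractedFields,
	opportunityId: string,
	ownerId: string
): { text: string; values: (string | null)[] } {
	const text = [
		'UPDATE opportunities',
		'SET',
		'  name = $1,',
		'  description = $2,',
		'  submission_deadline = $3,',
		'  updated_at = CURRENT_TIMESTAMP',
		'WHERE id = $4 AND owner_id = $5'
	].join('\n');
	// The values array must follow the $1..$5 order used in the statement.
	return {
		text,
		values: [data.name, data.description, data.submissionDeadline, opportunityId, ownerId]
	};
}
```

If the update affects zero rows, the opportunity either does not exist or belongs to another user, and the endpoint should not report success.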

Implementation Plan

Phase 1: Backend API Development

Create extraction API endpoint (/api/(protected)/zen-agent/opportunities/extract-tender-info/+server.ts)
- Implement rate limiting (5 requests per minute)
- Validate request body
- Implement LLM call with structured output
- Update opportunity in database
- Add comprehensive error handling and logging
Add LLM utility function (/src/lib/server/zen-agent/util/llm/tenderExtraction.ts)
- Create dedicated function for tender extraction
- Implement retry logic for LLM failures
- Add response validation against schema
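
The retry logic for LLM failures could be a small generic wrapper. This is a sketch under assumptions: `withRetry` is a hypothetical name, and the attempt count and backoff delays are placeholders rather than values taken from this spec.

```typescript
// Runs `operation` up to `maxAttempts` times, backing off exponentially between attempts.
async function withRetry<T>(
	operation: () => Promise<T>,
	maxAttempts = 3,
	baseDelayMs = 500
): Promise<T> {
	let lastError: unknown;
	for (let attempt = 1; attempt <= maxAttempts; attempt++) {
		try {
			return await operation();
		} catch (err) {
			lastError = err;
			if (attempt < maxAttempts) {
				// Delays grow as baseDelayMs, 2x, 4x, ...
				const delayMs = baseDelayMs * 2 ** (attempt - 1);
				await new Promise((resolve) => setTimeout(resolve, delayMs));
			}
		}
	}
	// Every attempt failed, so surface the last error to the caller.
	throw lastError;
}
```

Each retry is another billable LLM call, so the wrapper is best limited to transient failures and a small number of attempts.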

Phase 2: Frontend Integration

Modify opportunities page (/src/routes/(app)/(public)/zen-agent/opportunities/+page.svelte)
- Add extraction API call after file upload
- Show loading state during extraction
- Handle extraction errors gracefully
- Update opportunity list with extracted data
Add loading states
- Show spinner or progress indicator during extraction
- Display extracted fields as they populate
- Allow manual editing of extracted fields

Phase 3: Database Schema Updates

Add submission_deadline column to opportunities table

sql

ALTER TABLE opportunities
ADD COLUMN IF NOT EXISTS submission_deadline DATE;

Note: Currently, submission_deadline is extracted from the JSON path tender_ai_summary->'ai_analysis'->'dates_and_duration'->>'submission_deadline'. Adding it as a proper column will improve query performance and data consistency. The frontend already expects this field.

Update existing data (migrate from JSON to column)

sql

UPDATE opportunities
SET submission_deadline = (tender_ai_summary->'ai_analysis'->'dates_and_duration'->>'submission_deadline')::DATE
WHERE tender_ai_summary->'ai_analysis'->'dates_and_duration'->>'submission_deadline' IS NOT NULL;

Add extraction metadata (optional)

sql

ALTER TABLE opportunities
ADD COLUMN IF NOT EXISTS extraction_metadata JSONB;

Phase 4: Testing & Error Handling

Test various document formats
- PDF documents
- Plain text
- Multiple file uploads
- Documents in different languages
Error scenarios
- Rate limit exceeded
- LLM extraction failures
- Invalid document content
- Network timeouts
Monitoring
- Log extraction success/failure rates
- Monitor LLM token usage
- Track average extraction times
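
The error scenarios listed above each need a distinct response so that the frontend can react appropriately. Below is one possible mapping. Only the 429 status and its message come from this spec; the other status codes, messages, and the `ExtractionFailure` names are assumptions.

```typescript
type ExtractionFailure = 'rate_limited' | 'invalid_content' | 'llm_failure' | 'timeout';

const HTTP_ERRORS: Record<ExtractionFailure, { status: number; message: string }> = {
	// From the rate limiting section of this spec.
	rate_limited: { status: 429, message: 'Rate limit exceeded. Please wait before extracting another tender.' },
	// Assumed mappings for the remaining scenarios.
	invalid_content: { status: 400, message: 'The document content could not be processed.' },
	llm_failure: { status: 502, message: 'Extraction failed. Please fill in the fields manually.' },
	timeout: { status: 504, message: 'Extraction timed out. Please try again.' }
};

function toHttpError(failure: ExtractionFailure): { status: number; message: string } {
	return HTTP_ERRORS[failure];
}
```

Typing the table as a `Record` over the union means adding a new failure kind without a mapping becomes a compile error.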

Security Considerations

Input Validation
- Sanitize all user inputs
- Limit document size (e.g., max 10MB)
- Validate file types
Rate Limiting
- Per-user rate limits to prevent abuse
- Monitor for suspicious patterns
Data Privacy
- Don't log sensitive tender content
- Ensure proper user authorization
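
The input validation points above can be enforced before any content reaches the LLM. The sketch below is a set of assumptions: it reads the 10MB example as 10 × 1024 × 1024 bytes, allows only PDF and plain text (the formats named in the test plan), and uses a hypothetical `validateUpload` helper.

```typescript
const MAX_DOCUMENT_BYTES = 10 * 1024 * 1024; // 10 MB example limit
const ALLOWED_MIME_TYPES: string[] = ['application/pdf', 'text/plain']; // assumed allow-list

interface UploadedFileInfo {
	name: string;
	type: string; // MIME type reported for the upload
	size: number; // size in bytes
}

// Returns an error message, or null when the file is acceptable.
function validateUpload(file: UploadedFileInfo): string | null {
	if (!ALLOWED_MIME_TYPES.includes(file.type)) {
		return `Unsupported file type: ${file.type || 'unknown'}`;
	}
	if (file.size === 0) {
		return 'File is empty';
	}
	if (file.size > MAX_DOCUMENT_BYTES) {
		return 'File exceeds the 10 MB limit';
	}
	return null;
}
```

The reported MIME type is supplied by the client, so a server-side check of the actual file contents is still advisable.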

Future Enhancements

Multi-language Support
- Detect document language
- Extract in original language
- Translate if needed
Advanced Extraction
- Budget/value extraction
- Technical requirements
- Evaluation criteria
Confidence Scoring
- Add confidence scores to extracted fields
- Highlight low-confidence extractions for review
Batch Processing
- Support multiple tender extraction in one request
- Queue system for large batches

Success Metrics

Accuracy: 90%+ accuracy on name and deadline extraction
Performance: < 5 seconds average extraction time
Adoption: 70%+ of uploaded tenders use extraction
Error Rate: < 5% extraction failure rate
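
The performance and error-rate targets can be checked directly from extraction logs. The sketch below assumes a hypothetical log entry shape with a success flag and a duration. Accuracy and adoption need labelled data and upload counts, so they are out of scope here.

```typescript
interface ExtractionLogEntry {
	succeeded: boolean;
	durationMs: number;
}

interface ExtractionSummary {
	failureRate: number; // fraction between 0 and 1
	averageSeconds: number;
	meetsTargets: boolean; // failure rate < 5% and average time < 5 s
}

// Returns null when there are no entries, since no rate can be computed.
function summarizeExtractions(entries: ExtractionLogEntry[]): ExtractionSummary | null {
	if (entries.length === 0) return null;
	const failures = entries.filter((entry) => !entry.succeeded).length;
	const totalMs = entries.reduce((sum, entry) => sum + entry.durationMs, 0);
	const failureRate = failures / entries.length;
	const averageSeconds = totalMs / entries.length / 1000;
	return {
		failureRate,
		averageSeconds,
		meetsTargets: failureRate < 0.05 && averageSeconds < 5
	};
}
```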

Company Profile Extraction Reference Implementation

The website import feature at /api/(protected)/zen-agent/website-import provides a good reference for this implementation:

Key Patterns from Company Profile Extraction:

Rate Limiting Implementation:

typescript

const ratelimit = createRateLimit(1, '30 s', 'website-import');
const { success } = await ratelimit.limit(locals.user.id);

LLM Call Pattern:
- Uses generate() function from googleAI.ts
- Implements prompt engineering with language support
- Cleans markdown fences from LLM output
- Has fallback for LLM failures
Error Handling:
- Comprehensive try-catch blocks
- Specific error messages for different failure modes
- Returns partial results when possible
Content Processing:
- Extracts and cleans HTML content
- Limits input text length (15000 chars)
- Validates minimum content length
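
Of these patterns, cleaning markdown fences is the one most likely to be needed again if the model wraps its JSON in a code fence. The following is a minimal sketch with a hypothetical `stripMarkdownFences` helper; it is not the website import implementation.

```typescript
// Removes a wrapping markdown code fence (with optional language tag) from LLM output.
function stripMarkdownFences(output: string): string {
	const fence = '`'.repeat(3);
	const trimmed = output.trim();
	const lines = trimmed.split('\n');
	const opensWithFence = lines[0].startsWith(fence);
	const closesWithFence = lines[lines.length - 1].trim() === fence;
	if (lines.length >= 2 && opensWithFence && closesWithFence) {
		// Drop the first and last lines and keep what is between them.
		return lines.slice(1, -1).join('\n').trim();
	}
	return trimmed;
}
```

Output that has no wrapping fence is returned unchanged apart from trimming, so the helper is safe to apply before every JSON.parse call.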

Dependencies

Google AI (Gemini) API access via existing googleAI.ts utilities
Upstash Redis for rate limiting via createRateLimit function
Existing authentication system
Existing file upload infrastructure

Timeline Estimate

Phase 1 (Backend): 2-3 days
Phase 2 (Frontend): 2 days
Phase 3 (Database): 1 day
Phase 4 (Testing): 2 days

Total: 7-8 days of development
