API Reference
Complete documentation for the Struxio REST API.
Struxio provides a RESTful API for managing documents, templates, extractions, and batch jobs. This document describes all endpoints available in the open-source release.
Authentication
All API endpoints (except /v1/health) require authentication using a static API key.
Include the API key in the Authorization header of every request:
Authorization: Bearer <YOUR_STRUXIO_API_KEY>
The API key is defined by the STRUXIO_API_KEY environment variable in your .env file.
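As a rough illustration of the header above, the following Python sketch prepares an authenticated request using only the standard library. The base URL and the helper names build_request and send are assumptions made for this example; the reference does not say where your instance is hosted.

```python
import json
import os
import urllib.request

# Assumption: a local deployment. Replace with the address of your own instance.
BASE_URL = "http://localhost:8000"


def build_request(method, path, api_key, body=None):
    """Prepare a request carrying the Bearer token. Nothing is sent here."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(BASE_URL + path, data=data, method=method)
    req.add_header("Authorization", f"Bearer {api_key}")
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


def send(req):
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Read the same variable the server uses; the fallback is only a placeholder.
api_key = os.environ.get("STRUXIO_API_KEY", "<YOUR_STRUXIO_API_KEY>")
list_documents = build_request("GET", "/v1/documents", api_key)
```

Calling send(list_documents) would perform the request against a running instance. Keeping request construction separate from sending makes the header logic easy to check without a server.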
1. Documents
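Documents are the files (PDFs and images) that you want to extract data from. Uploads follow a direct-to-S3 pattern: the client sends the file straight to object storage through a pre-signed URL instead of routing it through the API server.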
Check or Request Upload URL
Endpoint: POST /v1/documents/check
Checks whether a document with the given MD5 hash already exists, which avoids re-uploading duplicates. If it does not exist, the response contains a pre-signed S3 URL to which you upload the file directly.
Request Body:
{
"md5_hash": "a1b2c3d4e5f60718293a4b5c6d7e8f90",
"file_name": "invoice-123.pdf",
"file_type": "application/pdf",
"size_bytes": 1048576
}
Response (Document does not exist):
{
"exists": false,
"upload_url": "https://s3.example.com/bucket/...",
"s3_key": "raw/.../invoice-123.pdf"
}
Note: If exists is true, the document is returned directly and you can skip the upload step.
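The request body for this endpoint can be assembled from a local file. The sketch below is one way to do it in Python; it assumes that md5_hash is the lowercase hexadecimal MD5 digest of the file contents, which the reference does not state explicitly, and build_check_payload is a hypothetical helper name.

```python
import hashlib
import mimetypes
import os


def build_check_payload(path):
    """Hash the file in 1 MiB chunks and assemble the /v1/documents/check body."""
    md5 = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1024 * 1024), b""):
            md5.update(chunk)
    # Guess the MIME type from the extension; fall back to a generic type.
    file_type, _ = mimetypes.guess_type(path)
    return {
        "md5_hash": md5.hexdigest(),  # assumption: hex digest of the raw bytes
        "file_name": os.path.basename(path),
        "file_type": file_type or "application/octet-stream",
        "size_bytes": os.path.getsize(path),
    }
```

Reading in chunks keeps memory use flat for large PDFs.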
Direct S3 Upload (Client-Side)
Perform an HTTP PUT request directly to the upload_url provided in the previous step.
curl -X PUT -T invoice-123.pdf "https://s3.example.com/bucket/..."
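A Python counterpart to the curl command might look like the sketch below. One detail worth knowing: urllib adds a form-encoded Content-Type by default when a body is present, so the sketch sets the type explicitly. Whether the pre-signed URL requires a particular Content-Type depends on how the server signed it, which this reference does not specify; build_upload_request is a hypothetical helper name.

```python
import urllib.request


def build_upload_request(upload_url, file_bytes, file_type):
    """Prepare the PUT that sends the raw file bytes to the pre-signed URL."""
    req = urllib.request.Request(upload_url, data=file_bytes, method="PUT")
    # Assumption: the declared file_type is acceptable to the signed URL.
    req.add_header("Content-Type", file_type)
    return req


def upload(req):
    """Send the prepared PUT and return the HTTP status code."""
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

No Authorization header is added here, because a pre-signed URL carries its own credentials in the query string.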
Confirm Upload
Endpoint: POST /v1/documents/confirm
Call this endpoint after a successful S3 upload to register the document in the Struxio database.
Request Body:
{
"s3_key": "raw/.../invoice-123.pdf",
"md5_hash": "a1b2c3d4e5f60718293a4b5c6d7e8f90",
"file_name": "invoice-123.pdf",
"file_type": "application/pdf",
"size_bytes": 1048576
}
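The confirm body repeats the fields sent to /v1/documents/check and adds the s3_key from the check response. A small sketch of that merge, with build_confirm_payload as a hypothetical helper name:

```python
def build_confirm_payload(check_payload, check_response):
    """Combine the check request body with the s3_key returned by the check call.

    Returns None when the document already exists, since there is nothing
    to upload or confirm in that case.
    """
    if check_response.get("exists"):
        return None
    return {"s3_key": check_response["s3_key"], **check_payload}
```

Reusing the original check payload keeps the hash, name, type, and size consistent between the two calls.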
List Documents
Endpoint: GET /v1/documents
Returns a paginated list of documents.
Get Document
Endpoint: GET /v1/documents/{id}
Returns metadata for a specific document.
Delete Document
Endpoint: DELETE /v1/documents/{id}
Deletes a document from the database and storage.
2. Templates
Templates define the schema and prompt instructions used to extract structured data from your documents.
Create a Template
Endpoint: POST /v1/templates
Request Body:
{
"name": "Invoice Schema",
"json_schema": {
"type": "object",
"properties": {
"invoice_number": { "type": "string" },
"total_amount": { "type": "number" },
"date": { "type": "string", "format": "date" }
},
"required": ["invoice_number", "total_amount"]
},
"prompt_template": "Extract the invoice number, total amount, and date from this invoice. If a field is missing, omit it."
}
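To make the role of json_schema concrete, the sketch below checks an extraction result against the required list and the top-level property types of a schema like the one above. It is a deliberately minimal client-side sanity check written for this example, not a full JSON Schema validator and not a description of how Struxio validates output; the function names are hypothetical.

```python
# Mapping from JSON Schema type names to Python types (top-level checks only).
JSON_TYPES = {"string": str, "object": dict, "array": list, "boolean": bool}


def matches(value, type_name):
    """Return True if value is compatible with the given JSON Schema type name."""
    if type_name == "number":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    expected = JSON_TYPES.get(type_name)
    return expected is None or isinstance(value, expected)


def check_result(data, schema):
    """List missing required fields and top-level type mismatches."""
    problems = []
    for name in schema.get("required", []):
        if name not in data:
            problems.append(f"missing required field: {name}")
    for name, spec in schema.get("properties", {}).items():
        if name in data and not matches(data[name], spec.get("type")):
            problems.append(f"{name}: expected {spec.get('type')}")
    return problems
```

An empty list means the result has every required field and the expected top-level types. Formats such as "date" are not checked by this sketch.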
List Templates
Endpoint: GET /v1/templates
Provides a list of all your created templates.
3. Extractions
Extractions represent the actual AI processing of a document using a specific template.
Start an Extraction (Asynchronous)
Endpoint: POST /v1/extractions
Starts an extraction job for a previously uploaded document.
Request Body:
{
"document_id": "uuid-of-document",
"template_id": "uuid-of-template"
}
Response: Returns the extraction ID, which you can use to poll for status.
Start an Inline Extraction (Synchronous)
Endpoint: POST /v1/extractions/inline
For smaller files, you can skip the S3 upload flow entirely. Send the file as a Base64-encoded string, and the API processes it and returns the extracted data in a single synchronous request.
Request Body:
{
"template_id": "uuid-of-template",
"file_name": "receipt.jpg",
"file_type": "image/jpeg",
"file_base64": "/9j/4AAQSkZJRgABAQ...",
"md5_hash": "optional-hash"
}
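Building this body from raw file bytes could be sketched as follows. The example assumes standard Base64 (not the URL-safe variant) and a hex MD5 digest of the unencoded bytes; neither detail is confirmed by this reference, and build_inline_payload is a hypothetical helper name.

```python
import base64
import hashlib


def build_inline_payload(template_id, file_name, file_type, file_bytes):
    """Assemble the /v1/extractions/inline body from raw file bytes."""
    return {
        "template_id": template_id,
        "file_name": file_name,
        "file_type": file_type,
        # Assumption: standard Base64 alphabet, encoded as an ASCII string.
        "file_base64": base64.b64encode(file_bytes).decode("ascii"),
        # Optional field; assumption: hex digest of the original bytes.
        "md5_hash": hashlib.md5(file_bytes).hexdigest(),
    }
```

Base64 inflates the payload by roughly a third, which is one reason to keep this route for small files.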
Poll Extraction Status
Endpoint: GET /v1/extractions/{id}
Checks the status of an asynchronous extraction. Statuses include Pending, Processing, Completed, and Failed. Once the status is Completed, the response payload includes the extracted JSON data.
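A polling loop for this endpoint might look like the sketch below. It assumes the response is a JSON object with a "status" field holding one of the strings listed above; the field name, the interval, and the timeout are illustrative choices, not documented values. The fetch argument stands for any callable that performs GET /v1/extractions/{id} and returns the decoded body.

```python
import time

# Assumption: these two statuses mean the job will not change any further.
TERMINAL_STATUSES = {"Completed", "Failed"}


def poll_extraction(fetch, extraction_id, interval=2.0, timeout=300.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call fetch(extraction_id) until the status is terminal or time runs out."""
    deadline = clock() + timeout
    while True:
        result = fetch(extraction_id)
        if result.get("status") in TERMINAL_STATUSES:
            return result
        if clock() >= deadline:
            raise TimeoutError(f"extraction {extraction_id} still not finished")
        sleep(interval)
```

Passing sleep and clock as parameters keeps the loop testable without real delays. A fixed interval is the simplest choice; a backoff schedule may suit long-running jobs better.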
4. Batch Jobs
Batch jobs allow you to process hundreds or thousands of documents at once against a single template.
Submit a Batch Job
Endpoint: POST /v1/batches
Request Body:
{
"name": "Monthly Invoices Q3",
"template_id": "uuid-of-template",
"document_ids": [
"uuid-doc-1",
"uuid-doc-2",
"uuid-doc-3"
]
}
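When document IDs are collected from several sources, the list may contain repeats. The sketch below builds the batch body and drops duplicates while preserving order. How the API itself treats duplicate IDs is not stated in this reference, so the deduplication is a defensive assumption, and build_batch_payload is a hypothetical helper name.

```python
def build_batch_payload(name, template_id, document_ids):
    """Assemble the /v1/batches body, removing repeated document IDs."""
    # dict.fromkeys keeps the first occurrence of each ID in original order.
    unique_ids = list(dict.fromkeys(document_ids))
    if not unique_ids:
        raise ValueError("a batch needs at least one document ID")
    return {
        "name": name,
        "template_id": template_id,
        "document_ids": unique_ids,
    }
```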
Poll Batch Status
Endpoint: GET /v1/batches/{id}
Returns the overall status of the batch job (e.g., how many extractions have succeeded, failed, or are still pending).
5. Models
List Available AI Models
Endpoint: GET /v1/models
Returns a list of AI models configured in the core database (e.g., gemini-1.5-pro, gemini-1.5-flash). Note that templates can optionally specify a target model; otherwise, the system defaults to the top-tier configured model.