
API Reference

Complete documentation for the Struxio REST API.


Struxio provides a RESTful API for managing documents, templates, extractions, and batch jobs. This document describes all endpoints available in the open-source release.

Authentication

All API endpoints (except /v1/health) require authentication using a static API key.

Include the API key in the Authorization header of every request:

Authorization: Bearer <YOUR_STRUXIO_API_KEY>

The API key is defined by the STRUXIO_API_KEY environment variable in your .env file.
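A minimal Python sketch of attaching the header using only the standard library. The base URL `http://localhost:8000` is an assumption (point it at your own deployment), and `struxio_request` is a hypothetical helper, not part of Struxio. The function only builds the request; `urllib.request.urlopen(req)` would send it.

```python
import json
import os
import urllib.request

# Assumption: replace with the address of your own Struxio deployment.
BASE_URL = "http://localhost:8000"


def struxio_request(method, path, body=None, api_key=None):
    """Build (but do not send) an authenticated request to the Struxio API.

    struxio_request is a hypothetical helper for illustration.
    """
    # Fall back to the same environment variable the server reads.
    key = api_key if api_key is not None else os.environ["STRUXIO_API_KEY"]
    data = None if body is None else json.dumps(body).encode("utf-8")
    req = urllib.request.Request(BASE_URL + path, data=data, method=method)
    req.add_header("Authorization", f"Bearer {key}")
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


req = struxio_request("GET", "/v1/documents", api_key="example-key")
print(req.get_method(), req.full_url, req.get_header("Authorization"))
# prints: GET http://localhost:8000/v1/documents Bearer example-key
```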


1. Documents

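Documents are the files (PDFs and images) you want to extract data from. Rather than sending file contents through the API server, Struxio has clients upload them directly to S3: request an upload URL, PUT the file to S3, then confirm the upload.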

Check or Request Upload URL

Endpoint: POST /v1/documents/check

Checks whether a document with the given MD5 hash already exists, so that a file Struxio already stores is not uploaded again. If it does not exist, the response contains a pre-signed S3 URL to which you upload the file directly.

Request Body:

{
  "md5_hash": "a1b2c3d4e5f6g7h8i9j0k1l2m3n4o5p6",
  "file_name": "invoice-123.pdf",
  "file_type": "application/pdf",
  "size_bytes": 1048576
}

Response (Document does not exist):

{
  "exists": false,
  "upload_url": "https://s3.example.com/bucket/...",
  "s3_key": "raw/.../invoice-123.pdf"
}

Note: If exists is true, the document is returned directly and you can skip the upload step.
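The fields for the check request can be computed locally before calling the API. A Python sketch using only the standard library; `build_check_payload` is a hypothetical name, and falling back to `application/octet-stream` when the type cannot be guessed is an assumption (the accepted `file_type` values are not listed here).

```python
import hashlib
import mimetypes
import os


def build_check_payload(path):
    """Compute the body for POST /v1/documents/check from a local file."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        # Hash in 1 MiB chunks so large files are never fully loaded into memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            md5.update(chunk)
    file_type, _encoding = mimetypes.guess_type(path)
    return {
        "md5_hash": md5.hexdigest(),  # 32 lowercase hex characters
        "file_name": os.path.basename(path),
        # Assumption: generic binary type when the extension is unknown.
        "file_type": file_type or "application/octet-stream",
        "size_bytes": os.path.getsize(path),
    }
```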

Direct S3 Upload (Client-Side)

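Send an HTTP PUT request with the file contents directly to the upload_url returned in the previous step. The pre-signed URL carries its own credentials, so do not add the Struxio Authorization header to this request.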

curl -X PUT -T invoice-123.pdf "https://s3.example.com/bucket/..."
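The same upload in Python, as a sketch. Whether Struxio signs the URL for a specific Content-Type is not stated here; if it does, the header you send must match it, so this sketch sends the `file_type` you declared in the check request. `build_s3_put` is a hypothetical helper.

```python
import urllib.request


def build_s3_put(upload_url, data, content_type):
    """Prepare a PUT of raw file bytes to a pre-signed S3 URL (not sent yet)."""
    # Set Content-Type explicitly: for any request with a body, urllib otherwise
    # sends application/x-www-form-urlencoded. No Authorization header is added,
    # because the pre-signed URL already carries its own signature.
    return urllib.request.Request(
        upload_url, data=data, method="PUT", headers={"Content-Type": content_type}
    )


# Usage:
# with open("invoice-123.pdf", "rb") as f:
#     req = build_s3_put(upload_url, f.read(), "application/pdf")
# urllib.request.urlopen(req)  # performs the upload
```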

Confirm Upload

Endpoint: POST /v1/documents/confirm

Call this endpoint after the S3 upload succeeds to register the document in the Struxio database. Send the s3_key returned by the check endpoint along with the same file metadata you sent there.

Request Body:

{
  "s3_key": "raw/.../invoice-123.pdf",
  "md5_hash": "a1b2c3d4e5f6g7h8i9j0k1l2m3n4o5p6",
  "file_name": "invoice-123.pdf",
  "file_type": "application/pdf",
  "size_bytes": 1048576
}
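Putting the three steps together, here is a Python sketch of the whole upload flow. The HTTP plumbing is passed in as two hypothetical callables so the control flow stands on its own. Skipping the confirm call when exists is true is an assumption drawn from the note on the check endpoint (the document is already registered). The response of the confirm endpoint is not specified here, so the function simply passes it through.

```python
def upload_document(call_api, put_file, payload, file_bytes):
    """Check, upload, and confirm one document.

    call_api(method, path, body) -> dict and put_file(url, data, content_type)
    are hypothetical callables you supply (e.g. wrappers around an HTTP client).
    payload holds md5_hash, file_name, file_type, and size_bytes.
    """
    check = call_api("POST", "/v1/documents/check", payload)
    if check["exists"]:
        # Same MD5 already stored: skip the upload (and, by assumption, the confirm).
        return check
    put_file(check["upload_url"], file_bytes, payload["file_type"])
    # The confirm body is the check body plus the s3_key the API handed back.
    confirm_body = {"s3_key": check["s3_key"], **payload}
    return call_api("POST", "/v1/documents/confirm", confirm_body)
```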

List Documents

Endpoint: GET /v1/documents

Returns a paginated list of documents.

Get Document

Endpoint: GET /v1/documents/{id}

Returns metadata for a specific document.

Delete Document

Endpoint: DELETE /v1/documents/{id}

Deletes a document from the database and storage.


2. Templates

Templates define the schema and prompt instructions used to extract structured data from your documents.

Create a Template

Endpoint: POST /v1/templates

Request Body:

{
  "name": "Invoice Schema",
  "json_schema": {
    "type": "object",
    "properties": {
      "invoice_number": { "type": "string" },
      "total_amount": { "type": "number" },
      "date": { "type": "string", "format": "date" }
    },
    "required": ["invoice_number", "total_amount"]
  },
  "prompt_template": "Extract the invoice number, total amount, and date from this invoice. If a field is missing, omit it."
}
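To make the json_schema concrete: an extraction result that omits a field listed in required (which the prompt allows when a field is missing) does not satisfy the schema. The Python sketch below checks a result against only the required and basic type keywords. It is a hand-rolled subset for illustration, not a full JSON Schema validator and not a description of how Struxio validates output. A complete validator, such as the `jsonschema` package, would be the practical choice.

```python
# Subset of JSON Schema: only "required" and simple "type" checks.
# Other keywords, such as "format": "date", are ignored.
JSON_TYPES = {
    "string": str,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
}


def schema_errors(data, schema):
    """Return human-readable problems with a flat extraction result."""
    errors = [
        f"missing required field: {name}"
        for name in schema.get("required", [])
        if name not in data
    ]
    for name, spec in schema.get("properties", {}).items():
        kind = spec.get("type")
        if name not in data or not isinstance(kind, str) or kind not in JSON_TYPES:
            continue
        value = data[name]
        # bool is a subclass of int in Python, but JSON Schema keeps them distinct.
        bool_mismatch = isinstance(value, bool) and kind != "boolean"
        if bool_mismatch or not isinstance(value, JSON_TYPES[kind]):
            errors.append(f"{name}: expected {kind}")
    return errors


invoice_schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "total_amount": {"type": "number"},
        "date": {"type": "string", "format": "date"},
    },
    "required": ["invoice_number", "total_amount"],
}
print(schema_errors({"invoice_number": "INV-123"}, invoice_schema))
# prints: ['missing required field: total_amount']
```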

List Templates

Endpoint: GET /v1/templates

Returns a list of all templates you have created.


3. Extractions

An extraction is the AI processing of one document against one template.

Start an Extraction (Asynchronous)

Endpoint: POST /v1/extractions

Starts an extraction job for a previously uploaded document.

Request Body:

{
  "document_id": "uuid-of-document",
  "template_id": "uuid-of-template"
}

Response: the ID of the new extraction, which you can use to poll its status with GET /v1/extractions/{id}.

Start an Inline Extraction (Synchronous)

Endpoint: POST /v1/extractions/inline

For small files, you can skip the S3 upload flow entirely. Send the file as a base64-encoded string, and the API processes it and returns the extracted data in the same synchronous request.

Request Body:

{
  "template_id": "uuid-of-template",
  "file_name": "receipt.jpg",
  "file_type": "image/jpeg",
  "file_base64": "/9j/4AAQSkZJRgABAQ...",
  "md5_hash": "optional-hash"
}
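Base64 encoding grows the payload by about a third, because every 3 input bytes become 4 output characters. This is one reason the inline route suits small files. Below is a Python sketch that builds the request body from raw bytes. `build_inline_payload` is a hypothetical name. Sending plain base64 with no data-URI prefix is an assumption based on the example above, and including the optional md5_hash is a choice made by this sketch.

```python
import base64
import hashlib


def build_inline_payload(template_id, file_name, file_type, data):
    """Build the body for POST /v1/extractions/inline from raw file bytes."""
    return {
        "template_id": template_id,
        "file_name": file_name,
        "file_type": file_type,
        # Standard base64 with no "data:...;base64," prefix (assumption).
        "file_base64": base64.b64encode(data).decode("ascii"),
        # md5_hash is optional in the API; this sketch always sends it.
        "md5_hash": hashlib.md5(data).hexdigest(),
    }


# Usage:
# with open("receipt.jpg", "rb") as f:
#     body = build_inline_payload("uuid-of-template", "receipt.jpg", "image/jpeg", f.read())
```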

Poll Extraction Status

Endpoint: GET /v1/extractions/{id}

Check the status of an asynchronous extraction. Statuses include Pending, Processing, Completed, and Failed. When the status is Completed, the response payload includes the extracted JSON data.
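A client typically polls this endpoint until the extraction reaches a terminal status. Below is a Python sketch that uses exponential backoff. The status field name, the exact status strings, and the `get_extraction` callable are assumptions. The intervals are arbitrary defaults, not documented limits. Because Failed is also terminal, check the returned status before reading the extracted data.

```python
import time

# Assumption: the response has a "status" field holding one of the values above.
TERMINAL_STATUSES = {"Completed", "Failed"}


def wait_for_extraction(get_extraction, extraction_id, interval=1.0,
                        max_interval=30.0, timeout=600.0,
                        sleep=time.sleep, clock=time.monotonic):
    """Poll until the extraction finishes, doubling the delay after each try.

    get_extraction(extraction_id) -> dict is a hypothetical callable wrapping
    GET /v1/extractions/{id}. Raises TimeoutError if the next wait would
    pass the deadline.
    """
    deadline = clock() + timeout
    delay = interval
    while True:
        extraction = get_extraction(extraction_id)
        if extraction["status"] in TERMINAL_STATUSES:
            return extraction
        if clock() + delay > deadline:
            raise TimeoutError(
                f"extraction {extraction_id} still {extraction['status']}"
            )
        sleep(delay)
        delay = min(delay * 2, max_interval)  # cap the backoff
```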


4. Batch Jobs

Batch jobs allow you to process hundreds or thousands of documents at once against a single template.

Submit a Batch Job

Endpoint: POST /v1/batches

Request Body:

{
  "name": "Monthly Invoices Q3",
  "template_id": "uuid-of-template",
  "document_ids": [
    "uuid-doc-1",
    "uuid-doc-2",
    "uuid-doc-3"
  ]
}

Poll Batch Status

Endpoint: GET /v1/batches/{id}

Returns the overall status of the batch job (e.g., how many extractions have succeeded, failed, or are still pending).


5. Models

List Available AI Models

Endpoint: GET /v1/models

Returns a list of the AI models configured in the core database (e.g., gemini-1.5-pro, gemini-1.5-flash). A template can optionally specify a target model; if it does not, the system defaults to the top-tier configured model.