Unstructured-IO/unstructured-js-client

TypeScript SDK for the Unstructured API

This is a TypeScript client for the Unstructured API.

This is ahead of the currently published version (v0.10.6). Please refer here for usage.

SDK Installation

NPM

npm install unstructured-client

Yarn

yarn add unstructured-client

SDK Example Usage

Only the files parameter is required. See the general partition page for all available parameters.

import { UnstructuredClient } from "unstructured-client";
import { PartitionResponse } from "unstructured-client/dist/sdk/models/operations";
import * as fs from "fs";

const key = "YOUR-API-KEY";

const client = new UnstructuredClient({
    security: {
        apiKeyAuth: key,
    },
});

const filename = "sample-docs/layout-parser-paper.pdf";
const data = fs.readFileSync(filename);

client.general.partition({
    partitionParameters: {
        // Note that this currently only supports a single file
        files: {
            content: data,
            fileName: filename,
        },
        // Other partition params
        strategy: "fast",
    },
}).then((res: PartitionResponse) => {
    if (res.statusCode === 200) {
        console.log(res.elements);
    }
}).catch((e) => {
    console.log(e.statusCode);
    console.log(e.body);
});

Change the base URL

If you are self hosting the API, or developing locally, you can change the server URL when setting up the client.

const client = new UnstructuredClient({
    serverURL: "http://localhost:8000",
    security: {
        apiKeyAuth: key,
    },
});

// OR

const client = new UnstructuredClient({
    serverURL: "https://my-server-url",
    security: {
        apiKeyAuth: key,
    },
});

Custom HTTP Client

The TypeScript SDK makes API calls using an HTTPClient that wraps the native Fetch API. This client is a thin wrapper around fetch that lets you attach hooks around the request lifecycle to modify requests or handle errors and responses.

The HTTPClient constructor takes an optional fetcher argument that can be used to integrate a third-party HTTP client or when writing tests to mock out the HTTP client and feed in fixtures.

The following example shows how to use the "beforeRequest" hook to add a custom header and a timeout to requests, and how to use the "requestError" hook to log errors:

import { UnstructuredClient } from "unstructured-client";
import { HTTPClient } from "unstructured-client/lib/http";

const httpClient = new HTTPClient({
  // fetcher takes a function that has the same signature as native `fetch`.
  fetcher: (request) => {
    return fetch(request);
  }
});

httpClient.addHook("beforeRequest", (request) => {
  const nextRequest = new Request(request, {
    signal: request.signal || AbortSignal.timeout(5000)
  });

  nextRequest.headers.set("x-custom-header", "custom value");

  return nextRequest;
});

httpClient.addHook("requestError", (error, request) => {
  console.group("Request Error");
  console.log("Reason:", `${error}`);
  console.log("Endpoint:", `${request.method} ${request.url}`);
  console.groupEnd();
});

const sdk = new UnstructuredClient({ httpClient });
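
For tests, the fetcher option described above can point at a function that returns canned responses instead of calling the network. The following is a minimal sketch, not part of the SDK: mockFetcher, fixture, and calls are hypothetical names, and the fixture's shape is invented for illustration.

```typescript
// Made-up fixture for illustration; real API responses will differ.
const fixture = [{ type: "Title", text: "Hello" }];

// Records every request the mock receives so a test can inspect it later.
const calls: string[] = [];

// Has the same signature as native `fetch`, but never touches the network.
const mockFetcher: typeof fetch = async (input, init) => {
  const request = new Request(input, init);
  calls.push(`${request.method} ${request.url}`);

  return new Response(JSON.stringify(fixture), {
    status: 200,
    headers: { "content-type": "application/json" },
  });
};
```

Given the constructor shown earlier, such a function would be passed in as new HTTPClient({ fetcher: mockFetcher }).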

PartitionParameters

See the general partition page for all available parameters.

Splitting PDF by pages

To speed up processing of long PDF files, set the splitPdfPage parameter to true. The client then splits the PDF into smaller batches before sending them to the API and combines the individual responses into a single result. This works only for PDF files, so don't set it for other file types. The size of each batch is determined internally and can vary between 2 and 20 pages per split.

The number of parallel requests is controlled by the splitPdfConcurrencyLevel parameter. It defaults to 5 and cannot exceed 15, which keeps resource usage and costs in check.

import { SplitPdfHook } from "unstructured-client/hooks/custom/SplitPdfHook";

...

client.general.partition({
    partitionParameters: {
        files: {
            content: data,
            fileName: filename,
        },
        // Set splitPdfPage parameter to false in order to disable splitting PDF
        splitPdfPage: true,
        // Modify splitPdfConcurrencyLevel to change the limit of parallel requests
        splitPdfConcurrencyLevel: 10,
    },
}).then((res: PartitionResponse) => {
    if (res.statusCode === 200) {
        console.log(res.elements);
    }
}).catch((e) => {
    console.log(e.statusCode);
    console.log(e.body);
});
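
The SDK's batching logic is internal, but the general idea of splitting pages on the client and capping the number of parallel requests can be sketched as follows. This is an illustration only, not the SDK's implementation: splitIntoBatches and mapWithConcurrency are hypothetical helpers, and the caller-supplied batch size is an assumption (the SDK chooses a size between 2 and 20 pages itself).

```typescript
// Split page numbers 1..pageCount into consecutive batches of up to `batchSize` pages.
function splitIntoBatches(pageCount: number, batchSize: number): number[][] {
  const batches: number[][] = [];
  for (let start = 1; start <= pageCount; start += batchSize) {
    const end = Math.min(start + batchSize - 1, pageCount);
    batches.push(Array.from({ length: end - start + 1 }, (_, i) => start + i));
  }
  return batches;
}

// Run `worker` over all items with at most `limit` calls in flight,
// returning the results in the original item order.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;

  // Each lane pulls the next unclaimed index until none are left.
  async function lane(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const lanes = Array.from({ length: Math.min(limit, items.length) }, () => lane());
  await Promise.all(lanes);
  return results;
}
```

In this sketch, a 45-page document with a batch size of 20 yields three batches, and a limit of 5 means no more than five of the worker calls run at once. Results come back in page order regardless of which batch finishes first.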

Requirements

For supported JavaScript runtimes, please consult RUNTIMES.md.

File uploads

Certain SDK methods accept files as part of a multi-part request. It is possible and typically recommended to upload files as a stream rather than reading the entire contents into memory. This avoids excessive memory consumption and potentially crashing with out-of-memory errors when working with very large files. The following example demonstrates how to attach a file stream to a request.

Tip

Depending on your JavaScript runtime, there are convenient utilities that return a handle to a file without reading the entire contents into memory:

  • Node.js v20+: Since v20, Node.js comes with a native openAsBlob function in node:fs.
  • Bun: The native Bun.file function produces a file handle that can be used for streaming file uploads.
  • Browsers: All supported browsers return an instance of File when reading the value from an <input type="file"> element.
  • Node.js v18: A file stream can be created using the fileFrom helper from fetch-blob/from.js.

import { openAsBlob } from "node:fs";
import { UnstructuredClient } from "unstructured-client";
import { Strategy } from "unstructured-client/sdk/models/shared";

const unstructuredClient = new UnstructuredClient({
    security: {
        apiKeyAuth: "YOUR_API_KEY",
    },
});

async function run() {
    const result = await unstructuredClient.general.partition({
        partitionParameters: {
            files: await openAsBlob("./sample-file"),
            strategy: Strategy.Auto,
        },
    });

    // Handle the result
    console.log(result);
}

run();
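
To see what a file handle gives you without calling the API, the sketch below opens a throwaway file with openAsBlob and inspects it. It assumes Node.js v20 or later; describeFile and the temporary file are hypothetical and exist only for this illustration.

```typescript
import { mkdtempSync, openAsBlob, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Write a small throwaway file so the sketch is self-contained.
const demoDir = mkdtempSync(join(tmpdir(), "upload-demo-"));
const demoPath = join(demoDir, "sample.txt");
writeFileSync(demoPath, "hello, unstructured");

// Open the file as a Blob backed by the file on disk, then read only the
// first chunk of its stream instead of loading the whole file up front.
async function describeFile(
  filePath: string,
): Promise<{ size: number; firstChunkBytes: number }> {
  const blob = await openAsBlob(filePath);
  const reader = blob.stream().getReader();
  const { value } = await reader.read();
  await reader.cancel();
  return { size: blob.size, firstChunkBytes: value?.byteLength ?? 0 };
}
```

The size is available from the handle itself, and the content is pulled chunk by chunk only when something consumes the stream, which is what makes this form suitable for large uploads.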

Maturity

This SDK is in beta, and there may be breaking changes between versions without a major version update. Therefore, we recommend pinning usage to a specific package version. This way, you can install the same version each time without breaking changes unless you are intentionally looking for the latest version.

Contributions

While we value open-source contributions to this SDK, this library is generated programmatically. Feel free to open a PR or a GitHub issue as a proof of concept and we'll do our best to include it in a future release!

SDK Created by Speakeasy