PartitionParameters

Fields

| Field | Type | Required | Description | Example |
| --- | --- | --- | --- | --- |
| `files` | File \| Blob \| shared.Files | ✔️ | The file to extract | |
| `chunkingStrategy` | shared.ChunkingStrategy | ➖ | Use one of the supported strategies to chunk the returned elements. Currently supports: 'basic', 'by_page', 'by_similarity', or 'by_title' | |
| `combineUnderNChars` | number | ➖ | If chunking strategy is set, combine elements until a section reaches a length of n chars. Default: 500 | |
| `coordinates` | boolean | ➖ | If true, return coordinates for each element. Default: false | |
| `encoding` | string | ➖ | The encoding method used to decode the text input. Default: utf-8 | |
| `extractImageBlockTypes` | string[] | ➖ | The types of elements to extract, for use in extracting image blocks as base64-encoded data stored in metadata fields | |
| `gzUncompressedContentType` | string | ➖ | If the file is gzipped, use this content type after unzipping | |
| `hiResModelName` | string | ➖ | The name of the inference model used when strategy is hi_res | |
| `includeOrigElements` | boolean | ➖ | When a chunking strategy is specified, each returned chunk will include the elements consolidated to form that chunk as `.metadata.orig_elements`. Default: true | |
| `includePageBreaks` | boolean | ➖ | If true, the output will include page breaks if the filetype supports it. Default: false | |
| `languages` | string[] | ➖ | The languages present in the document, for use in partitioning and/or OCR | |
| `maxCharacters` | number | ➖ | If chunking strategy is set, cut off new sections after reaching a length of n chars (hard max). Default: 500 | |
| `multipageSections` | boolean | ➖ | If chunking strategy is set, determines if sections can span multiple pages. Default: true | |
| `newAfterNChars` | number | ➖ | If chunking strategy is set, cut off new sections after reaching a length of n chars (soft max). Default: 1500 | |
| `ocrLanguages` | string[] | ➖ | The languages present in the document, for use in partitioning and/or OCR | |
| `outputFormat` | shared.OutputFormat | ➖ | The format of the response. Supported formats are application/json and text/csv. Default: application/json | |
| `overlap` | number | ➖ | Specifies the length of a string ('tail') to be drawn from each chunk and prefixed to the next chunk as a context-preserving mechanism. By default, this only applies to split-chunks where an oversized element is divided into multiple chunks by text-splitting. Default: 0 | |
| `overlapAll` | boolean | ➖ | When `true`, apply overlap between 'normal' chunks formed from whole elements and not subject to text-splitting. Use this with caution as it entails a certain level of 'pollution' of otherwise clean semantic chunk boundaries. Default: false | |
| `pdfInferTableStructure` | boolean | ➖ | Deprecated! Use skip_infer_table_types to opt out of table extraction for any file type. If false and strategy=hi_res, no Table Elements will be extracted from PDF files regardless of skip_infer_table_types contents. | |
| `similarityThreshold` | number | ➖ | A value between 0.0 and 1.0 describing the minimum similarity two elements must have to be included in the same chunk. Note that similar elements may be separated to meet chunk-size criteria; this value can only guarantee that two elements with similarity below the threshold will appear in separate chunks. | |
| `skipInferTableTypes` | string[] | ➖ | The document types for which table extraction should be skipped. Default: [] | |
| `splitPdfConcurrencyLevel` | number | ➖ | Maximum number of concurrent requests made when splitting a PDF. Ignored on backend. | |
| `splitPdfPage` | boolean | ➖ | Whether the PDF file should be split on the client. Ignored on backend. | |
| `startingPageNumber` | number | ➖ | When a PDF is split into pages before being sent to the API, providing this value allows page numbers to be assigned correctly. Introduced in 1.0.27. | |
| `strategy` | shared.Strategy | ➖ | The strategy to use for partitioning PDF/image. Options are fast, hi_res, auto. Default: auto | auto |
| `uniqueElementIds` | boolean | ➖ | When `true`, assign UUIDs to element IDs, which guarantees their uniqueness (useful when using them as primary keys in a database). Otherwise a SHA-256 of element text is used. Default: false | |
| `xmlKeepTags` | boolean | ➖ | If true, will retain the XML tags in the output. Otherwise it will simply extract the text from within the tags. Only applies to partition_xml. | |
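
Several of the chunking fields above only make sense in combination, so a small pre-flight check can catch mistakes before a request is sent. The sketch below is illustrative only: `ChunkingOptions` and `checkChunkingOptions` are hypothetical names that mirror a handful of the fields in the table, not the SDK's own types or functions, and the `overlapAll` rule is an inference from the field descriptions rather than documented behaviour.

```typescript
// Hypothetical local type mirroring a few fields from the table above.
// It is NOT the SDK's own PartitionParameters type.
type ChunkingOptions = {
  chunkingStrategy?: "basic" | "by_page" | "by_similarity" | "by_title";
  combineUnderNChars?: number;
  maxCharacters?: number;
  newAfterNChars?: number;
  overlap?: number;
  overlapAll?: boolean;
  similarityThreshold?: number;
};

// Hypothetical helper: returns human-readable warnings, or an empty array
// when nothing in the options contradicts the field descriptions.
function checkChunkingOptions(opts: ChunkingOptions): string[] {
  const warnings: string[] = [];

  // These fields are described as applying only "if chunking strategy is set".
  const needStrategy = ["combineUnderNChars", "maxCharacters", "newAfterNChars"] as const;
  if (opts.chunkingStrategy === undefined) {
    for (const name of needStrategy) {
      if (opts[name] !== undefined) {
        warnings.push(`${name} has no effect without chunkingStrategy`);
      }
    }
  }

  // similarityThreshold is described as a value between 0.0 and 1.0.
  // The negated range test also rejects NaN.
  const threshold = opts.similarityThreshold;
  if (threshold !== undefined && !(threshold >= 0 && threshold <= 1)) {
    warnings.push("similarityThreshold must be between 0.0 and 1.0");
  }

  // Assumption: with overlap at its default of 0 there is no tail to carry
  // over, so overlapAll would have nothing to apply.
  if (opts.overlapAll === true && (opts.overlap ?? 0) <= 0) {
    warnings.push("overlapAll is set but overlap is 0");
  }

  return warnings;
}

// Consistent options: a strategy is set and overlapAll has a positive overlap.
const consistentWarnings = checkChunkingOptions({
  chunkingStrategy: "by_title",
  maxCharacters: 500,
  overlap: 50,
  overlapAll: true,
});

// Inconsistent options: no strategy, and a threshold outside the range.
const inconsistentWarnings = checkChunkingOptions({
  maxCharacters: 500,
  similarityThreshold: 1.5,
});

console.log(consistentWarnings);
console.log(inconsistentWarnings);
```

The helper returns warnings instead of throwing, so a caller can decide whether to log them or block the request. The server remains the authority on which combinations are actually accepted.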
