Releases: Unstructured-IO/unstructured

0.4.0

11 Jan 18:05

MthwRobinson

Added generic partition brick that detects the file type and routes a file to the appropriate partitioning brick.
Added a file type detection module.
Updated partition_html and partition_eml to support file-like objects in 'rb' mode.
Added cleaning brick clean_ordered_bullets for removing ordered bullets.
Added extract brick extract_ordered_bullets for extracting ordered bullets.
Added tests for clean_ordered_bullets and extract_ordered_bullets.
Added partition_docx for pre-processing Word documents.
Added new regex patterns to extract email header information.
Added new functions parse_received_data and partition_header to extract header information.
Added new function partition_text to parse plain text files.
Added new cleaner functions extract_ip_address, extract_ip_address_name, extract_mapi_id, and extract_datetimetz.
Added new Image element and function find_embedded_images to find embedded images.
Added get_directory_file_info for summarizing information about source documents
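
The generic partition brick described above detects a file's type and hands the file to a format-specific brick. The following is a minimal sketch of that detect-then-dispatch idea, not the library's implementation: detect_file_type, partition_any, the lookup tables, and the text handler are all hypothetical, and the detection rules (a few magic-byte prefixes with a file-extension fallback) are assumptions.

```python
from pathlib import Path
from typing import Callable, Dict, List, Optional

# Leading "magic" bytes for a few formats. DOCX and XLSX are both ZIP
# archives that share one signature, so they are left to the extension table.
MAGIC_PREFIXES: Dict[bytes, str] = {
    b"%PDF": "pdf",
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpg",
}

EXTENSIONS: Dict[str, str] = {
    ".pdf": "pdf",
    ".docx": "docx",
    ".html": "html",
    ".htm": "html",
    ".eml": "eml",
    ".txt": "text",
}


def detect_file_type(filename: str, head: bytes) -> Optional[str]:
    """Guess a file type from leading bytes, falling back to the extension."""
    for prefix, file_type in MAGIC_PREFIXES.items():
        if head.startswith(prefix):
            return file_type
    return EXTENSIONS.get(Path(filename).suffix.lower())


def _partition_text_stub(data: bytes) -> List[str]:
    """Hypothetical handler: split UTF-8 text into non-empty paragraphs."""
    text = data.decode("utf-8")
    return [chunk.strip() for chunk in text.split("\n\n") if chunk.strip()]


# Maps a detected type to its handler; real code would register one per format.
HANDLERS: Dict[str, Callable[[bytes], List[str]]] = {
    "text": _partition_text_stub,
}


def partition_any(filename: str, data: bytes) -> List[str]:
    """Detect the file type, then route the bytes to the matching handler."""
    file_type = detect_file_type(filename, data[:16])
    handler = HANDLERS.get(file_type) if file_type else None
    if handler is None:
        raise ValueError(f"No handler for {filename!r} (detected type: {file_type})")
    return handler(data)


print(partition_any("notes.txt", b"First paragraph.\n\nSecond paragraph."))
# ['First paragraph.', 'Second paragraph.']
```

Checking leading bytes before the extension lets a mislabeled PDF still be detected as a PDF, while formats that share a container signature fall through to the extension table.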

0.3.5

05 Jan 00:50

qued

Add support for local inference
Add new pattern to recognize plain text dash bullets
Add test for bullet patterns
Fix partition_html so it can process div tags that have both text and child elements
Add ability to extract document metadata from .docx, .xlsx, and .jpg files.
Helper functions for identifying and extracting phone numbers
Add new function extract_attachment_info that extracts and decodes attachments from an email.
Staging brick to convert a list of Elements to a pandas dataframe.
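
These notes do not show the phone number helpers themselves. Below is a minimal regex sketch, assuming North American formats such as (555) 123-4567 or +1 555 123 4567; PHONE_NUMBER_PATTERN, is_possible_phone_number, and extract_phone_number are hypothetical names, and the library's own pattern may accept different formats.

```python
import re
from typing import Optional

# Hypothetical North American pattern: an optional +1 country code, a 3-digit
# area code (optionally in parentheses), then 3 + 4 digits. Groups may be
# separated by a space, dot, or dash. The lookarounds stop a match from
# starting or ending in the middle of a longer run of digits.
PHONE_NUMBER_PATTERN = re.compile(
    r"(?<!\d)"
    r"(?:\+?1[ .-]?)?"
    r"(?:\(\d{3}\)|\d{3})[ .-]?"
    r"\d{3}[ .-]?\d{4}"
    r"(?!\d)"
)


def is_possible_phone_number(text: str) -> bool:
    """True if the whole (stripped) string looks like a phone number."""
    return PHONE_NUMBER_PATTERN.fullmatch(text.strip()) is not None


def extract_phone_number(text: str) -> Optional[str]:
    """Return the first phone-number-like substring, or None."""
    match = PHONE_NUMBER_PATTERN.search(text)
    return match.group(0) if match else None
```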

0.3.4

21 Dec 15:29

MthwRobinson

Python 3.7 compatibility

0.3.3

20 Dec 20:03

yuming-long

Removes basicConfig from the logger configuration
Adds the partition_email partitioning brick
Adds the replace_mime_encodings cleaning brick
Small fix to HTML parsing related to processing list items with sub-tags
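
A cleaner such as replace_mime_encodings presumably targets the MIME quoted-printable escapes that often show up in raw email bodies, such as =E2=80=99 for a right single quotation mark. Below is a minimal sketch of that kind of cleaner, assuming quoted-printable input and UTF-8 text; it relies on the standard library's quopri module, and replace_mime_encodings_sketch is a hypothetical name rather than the library's function.

```python
import quopri


def replace_mime_encodings_sketch(text: str, encoding: str = "utf-8") -> str:
    """Decode quoted-printable escapes such as '=E2=80=99' back to characters."""
    # quopri works on bytes: encode the text, decode the QP escapes (including
    # '=' soft line breaks), then read the resulting bytes in the given charset.
    return quopri.decodestring(text.encode(encoding)).decode(encoding)
```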

0.3.2

15 Dec 22:20

MthwRobinson

Added translate_text brick for translating text between languages
Add an apply method to make it easier to apply cleaners to elements
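
The apply method lets several cleaners run over an element in one call. Below is a minimal sketch of that chaining pattern, assuming each cleaner is a plain str -> str function; TextElement is a hypothetical stand-in, not the library's Element class, and its error handling is an assumption.

```python
from dataclasses import dataclass
from typing import Callable

Cleaner = Callable[[str], str]


@dataclass
class TextElement:
    """Hypothetical stand-in for a document element that carries text."""

    text: str

    def apply(self, *cleaners: Cleaner) -> None:
        """Run each cleaner over the text in order, then store the result."""
        cleaned = self.text
        for cleaner in cleaners:
            cleaned = cleaner(cleaned)
            # Each cleaner must return a str so the next one can consume it.
            if not isinstance(cleaned, str):
                raise ValueError("Cleaners must return a str.")
        # Only assign once every cleaner succeeded, so a failure leaves text intact.
        self.text = cleaned


element = TextElement("  Hello   WORLD  ")
element.apply(str.strip, str.lower, lambda s: " ".join(s.split()))
print(element.text)  # hello world
```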

0.3.1

14 Dec 18:00

MthwRobinson

Added __init__.py to the partition module

0.3.0

14 Dec 16:39

MthwRobinson

Implement staging brick for Argilla. Converts lists of Text elements to Argilla dataset classes.
Removes the local PDF parsing code along with its dependencies and tests.
Reorganizes the staging bricks in the unstructured.partition module
Allow entities to be passed into the Datasaur staging brick
Added HTML escapes to the replace_unicode_quotes brick
Fix partition_pdf to raise ValueError on bad responses
Adds partition_html for partitioning HTML documents.
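
One reading of "HTML escapes" in the replace_unicode_quotes entry is that escaped forms such as &ldquo; or &#8220; are handled alongside the raw curly-quote characters. Below is a minimal sketch under that assumption, using html.unescape from the standard library; replace_unicode_quotes_sketch and the quote table are hypothetical, not the library's brick.

```python
import html

# Typographic quotes and the ASCII characters they are replaced with.
UNICODE_QUOTES = {
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
}


def replace_unicode_quotes_sketch(text: str) -> str:
    """Replace curly quotes, raw or HTML-escaped, with straight ASCII quotes."""
    # Decode character references such as &ldquo; or &#8220; first, so the
    # escaped forms are handled exactly like the raw characters.
    text = html.unescape(text)
    for fancy, plain in UNICODE_QUOTES.items():
        text = text.replace(fancy, plain)
    return text
```

Note that html.unescape decodes every character reference, so &amp; also becomes &; a cleaner that should touch only quotes would need a narrower substitution.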

0.2.4

11 Nov 00:31

yuming-long

Add an alternative way of importing Final to support Google Colab
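
typing.Final is only part of the standard library from Python 3.8 onward, so code that must also run on older interpreters usually needs a fallback import. Below is a minimal sketch of that common pattern, assuming the typing_extensions backport is installed wherever typing.Final is missing; the release notes do not say which fallback the project actually used, and MAX_RETRIES is a hypothetical constant.

```python
try:
    # typing.Final is part of the standard library from Python 3.8 onward.
    from typing import Final
except ImportError:  # only reached on older interpreters
    # Assumed fallback: the typing_extensions backport provides the same name.
    from typing_extensions import Final  # type: ignore

# Hypothetical constant annotated as Final (type checkers flag reassignment).
MAX_RETRIES: Final = 3
```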

0.2.3

10 Nov 21:37

MthwRobinson

Add cleaning bricks for removing prefixes and postfixes
Add cleaning bricks for extracting text before and after a pattern
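
The cleaning bricks above strip a pattern from the start or end of a string and pull out the text on either side of a match. Below is a minimal sketch of those four operations with Python's re module; the _sketch functions, their parameters (ignore_case, index), and the whitespace trimming are assumptions rather than the library's signatures.

```python
import re


def clean_prefix_sketch(text: str, pattern: str, ignore_case: bool = False) -> str:
    """Remove a match of pattern at the start of text, then trim leading spaces."""
    flags = re.IGNORECASE if ignore_case else 0
    return re.sub(rf"^\s*(?:{pattern})", "", text, flags=flags).lstrip()


def clean_postfix_sketch(text: str, pattern: str, ignore_case: bool = False) -> str:
    """Remove a match of pattern at the end of text, then trim trailing spaces."""
    flags = re.IGNORECASE if ignore_case else 0
    return re.sub(rf"(?:{pattern})\s*$", "", text, flags=flags).rstrip()


def _nth_match(text: str, pattern: str, index: int) -> re.Match:
    """Return the index-th (0-based) match of pattern, or raise ValueError."""
    matches = list(re.finditer(pattern, text))
    if not 0 <= index < len(matches):
        raise ValueError(f"{pattern!r} matched {len(matches)} time(s); need index {index}")
    return matches[index]


def extract_text_before_sketch(text: str, pattern: str, index: int = 0) -> str:
    """Return the text preceding the index-th match of pattern."""
    return text[: _nth_match(text, pattern, index).start()].rstrip()


def extract_text_after_sketch(text: str, pattern: str, index: int = 0) -> str:
    """Return the text following the index-th match of pattern."""
    return text[_nth_match(text, pattern, index).end():].lstrip()
```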

0.2.2

08 Nov 22:07

MthwRobinson

Add staging brick for Datasaur
