New model addition: MarkupLM #692
Comments
Hi there! 👋 This does sound pretty interesting! I would imagine the built-in document parser should be sufficient. I'd be happy to review if you (or another community member) would like to open a PR!
It's not entirely obvious to me from the Hugging Face docs what this model does, but if it's able to map an xpath -> goal / feature, then we could support local agentic solutions, or at least writing out instructions (running them would be a different story). For example: benefit: That's just one app idea, here's another: a browser LLM-powered site cloner: then pass those divs as css-selectors or xpath-selectors? to select a div to clone / translate to Next.js components using GPT-4 or some Hugging Face model that excels at coding front-end things. benefit: here are the app ideas I have that may leverage this (unsure tbh what the model does out of the box):
curious what others think
@xenova - sounds good! I'll try to take a crack at it, and if any community members would like to help or offer their advice, that'd be appreciated.
@jonathanpv - my main focus with the model has been to fine-tune it for cybersecurity-related tasks. Here's a first draft of a fine-tuned model I trained: pogzyb/markuplm-phish. In my experience, a fine-tuned MarkupLM performed better than a fine-tuned BERT on phishing/malicious website classification. The final goal of my project is to create a browser extension, with the added benefit that the user's data stays local to their machine, like you pointed out. I think the HTML-to-Selenium code generation idea is a good one!
oh wow nice!
yep i wonder if thats all thats needed for an agent
oh wow reader mode would be a great feature, that's a good idea
Model description
MarkupLM is BERT, but applied to HTML pages instead of raw text documents. It seems like there could be a lot of interesting uses for this type of model in the browser.
Prerequisites
Additional information
I think the most difficult part of the implementation will be MarkupLM's preprocessing. Specifically, MarkupLM uses a combination of a "feature extractor" and a "tokenizer". The "feature extractor" extracts nodes and xpaths from HTML strings. These nodes and xpaths are then fed to the "tokenizer" to produce xpath tag and subscript sequences. The Python implementation uses BeautifulSoup, so the JavaScript implementation might need a 3rd party HTML parsing library if DOMParser doesn't cut it.
In short, the model needs 2 additional xpath inputs ('xpath_tags_seq' and 'xpath_subs_seq') on top of the usual ones: 'input_ids', 'token_type_ids', 'attention_mask', 'xpath_tags_seq', 'xpath_subs_seq'
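To make the two preprocessing steps above more concrete, here is a rough sketch in Python. It is not the actual Hugging Face implementation: it swaps BeautifulSoup for the standard library's `html.parser`, and the names `extract_nodes_and_xpaths` and `xpath_to_seqs`, the tiny `TAG_VOCAB`, and the padding values are all made up for illustration. The subscript convention (no subscript when a tag has no same-named siblings, a 1-based index otherwise) is my understanding of the reference behaviour and should be checked against it.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not become the "current" parent.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class Node:
    """Minimal element node; children holds Node objects and text strings in document order."""
    def __init__(self, tag, parent=None):
        self.tag, self.parent, self.children = tag, parent, []


class _TreeBuilder(HTMLParser):
    """Builds a small element tree (a stand-in for what BeautifulSoup provides)."""
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = Node("[document]")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Climb to the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.current.children.append(text)


def xpath_of(node):
    """XPath of an element; subscript omitted when it has no same-named siblings (assumed convention)."""
    parts = []
    while node.parent is not None:
        same = [c for c in node.parent.children if isinstance(c, Node) and c.tag == node.tag]
        sub = 0 if len(same) == 1 else same.index(node) + 1
        parts.append(f"/{node.tag}" + (f"[{sub}]" if sub else ""))
        node = node.parent
    return "".join(reversed(parts))


def extract_nodes_and_xpaths(html):
    """Step 1 ("feature extractor"): text nodes plus the xpath of each node's parent element."""
    builder = _TreeBuilder()
    builder.feed(html)
    builder.close()
    nodes, xpaths = [], []

    def walk(node):
        for child in node.children:
            if isinstance(child, str):
                nodes.append(child)
                xpaths.append(xpath_of(node))
            else:
                walk(child)

    walk(builder.root)
    return nodes, xpaths


# Hypothetical values for illustration only; the real vocabulary and limits come from the model config.
TAG_VOCAB = {"html": 0, "body": 1, "h1": 2, "p": 3}
UNK_TAG_ID = len(TAG_VOCAB)
PAD_TAG_ID = UNK_TAG_ID + 1
PAD_SUB = 1001
MAX_DEPTH = 4


def xpath_to_seqs(xpath, max_depth=MAX_DEPTH):
    """Step 2 ("tokenizer" side): one xpath -> fixed-length tag-id and subscript sequences."""
    tags, subs = [], []
    for step in xpath.strip("/").split("/"):
        name, _, rest = step.partition("[")
        tags.append(TAG_VOCAB.get(name, UNK_TAG_ID))
        subs.append(int(rest[:-1]) if rest else 0)
    tags, subs = tags[:max_depth], subs[:max_depth]
    pad = max_depth - len(tags)
    return tags + [PAD_TAG_ID] * pad, subs + [PAD_SUB] * pad
```

For example, `extract_nodes_and_xpaths("<html><body><h1>Title</h1><p>First</p><p>Second <b>bold</b></p></body></html>")` yields the text nodes alongside xpaths such as `/html/body/p[2]/b`. In the real pipeline, as far as I understand it, each node's sequences are repeated for every token that node is split into, which is why they line up with `input_ids`. A JavaScript port would presumably do the same tree walk over whatever DOMParser (or a 3rd party parser) returns.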
Your contribution
I added huggingface/optimum#1784 in optimum, but I'm not much of a JavaScript developer. I'd be happy to try either implementing the preprocessing or the pipeline, but I would need some guidance/regular reviews.