Importing HTML Headings inserts text, not heading and export to html wrong elements #1692

Open
constip opened this issue Aug 13, 2019 · 5 comments · May be fixed by #2533

Comments

constip commented Aug 13, 2019

When importing HTML via \PhpOffice\PhpWord\Shared\Html::addHtml, all heading elements are handled by parseHeading as normal text, with the paragraph style set to Heading1, Heading2, and so on. They should instead be added through the usual addTitle.

This seems to work fine when exporting to Word2007. When exporting to HTML, however, addTitle produces an h1, h2, ... element, while the imported heading produces <p class="Heading1">, <p class="Heading2">, and so on. That is a problem because the heading styles don't apply to those paragraphs.
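To reproduce the difference, a minimal sketch along these lines may help. It assumes PHPWord is installed through Composer (the `vendor/autoload.php` path is an assumption), and the title style values and heading texts are arbitrary placeholders. It adds one heading through each path and prints the HTML export so the generated elements can be compared.

```php
<?php
// Sketch only: assumes PHPWord was installed with Composer.
require 'vendor/autoload.php';

use PhpOffice\PhpWord\IOFactory;
use PhpOffice\PhpWord\PhpWord;
use PhpOffice\PhpWord\Shared\Html;

$phpWord = new PhpWord();
// Arbitrary style for depth-1 titles, so there is a heading style to apply.
$phpWord->addTitleStyle(1, ['size' => 20, 'bold' => true]);
$section = $phpWord->addSection();

// Path 1: heading imported from HTML (goes through Html::addHtml).
Html::addHtml($section, '<h1>Imported heading</h1>');

// Path 2: heading added through the regular API.
$section->addTitle('Native heading', 1);

// Export to HTML and inspect which element each heading ends up in.
$writer = IOFactory::createWriter($phpWord, 'HTML');
$htmlOutput = $writer->getContent();
echo $htmlOutput;
```

Searching the printed markup for the two heading texts shows which element and class each path generated.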

Also, when exporting to HTML, a title (depth 0) is written as an h0 element, while its styles are applied to an element named Title.

Finally, when exporting to HTML, all text elements get style="margin-top: 0; margin-bottom: 0;". This should not be there. They also don't get the Normal class, so paragraph styles aren't applied.

This is minor, but please also consider applying the default font to body only, instead of * {font-family: Arial; font-size: 10pt;}. That way the font is set once and all other elements inherit it, rather than having it reset on every element. The current rule causes problems when, for example, a span is used inside an h1 (the span's font size is wrong).

Author

constip commented Aug 13, 2019

On an unrelated note, I would suggest using $doc->loadHTML( ... ) instead of loadXML in PhpOffice\PhpWord\Shared\Html::addHtml. HTML is, more often than not, not valid XHTML, so parsing it as XML only causes problems. Adding libxml_use_internal_errors(true); would round that off by keeping the parser's complaints from being emitted as warnings.
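As a rough illustration of the suggestion, the sketch below feeds the same markup to both parsers using plain DOMDocument, outside of PHPWord. The sample markup and variable names are made up for the example, and this is not how addHtml is actually implemented.

```php
<?php
// Typical real-world HTML: unclosed <br> and <p>, so it is not well-formed XML.
$html = '<body><p>First line<br>Second line<p>Next paragraph</body>';

// Collect parser errors internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$xmlDoc = new DOMDocument();
$xmlOk = $xmlDoc->loadXML($html);    // strict XML parser
libxml_clear_errors();

$htmlDoc = new DOMDocument();
$htmlOk = $htmlDoc->loadHTML($html); // lenient HTML parser
libxml_clear_errors();

// Restore the previous error-handling mode.
libxml_use_internal_errors($previous);

var_dump($xmlOk);  // bool(false): the markup is rejected as XML
var_dump($htmlOk); // bool(true): the HTML parser recovers
```

The strict parser rejects the markup outright, while the HTML parser recovers and still builds a DOM that can be walked.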

0b10011 added a commit to 0b10011/PHPWord that referenced this issue Aug 30, 2019
Contributor

0b10011 commented Aug 30, 2019

I have a fix for this (see previous references), but I'm waiting on #1669 to be merged so I can get it in. (It may be a while; it's a large PR and merges seem to be delayed by a couple of weeks.)

0b10011 added a commit to 0b10011/PHPWord that referenced this issue Sep 3, 2019
@BenaddiRar

Any update?

The error still occurs for me every time.

@jhedstrom

This is still an issue with headings, and the fixes in the commits from @0b10011 seem to work. Any chance to get those in?

JoppeDC commented Dec 21, 2023

Can't believe this is still an issue, four years later.

oleibman added a commit to oleibman/PHPWord that referenced this issue Dec 24, 2023
Fix PHPOffice#1692. Builds on work started some time ago by @0b10011, to whom primary credit is due.

Html Reader does not process the `head` section of the document, and, in particular, does not process its `style` section. It will, however, process inline styles, so 0b10011's model of adding the title as a text run (with styles) will work well once this change is applied. However, that model would not deal with the alternative method of assigning a Title Style, and just adding the title as text. In order to accommodate that, I have removed the declaration of heading font styles in the head section, and now generate them all inline in the body. This has the added benefit of being able to read the doc as html, then saving it as docx, preserving, at least in part, any user-defined font styles. Note that html does have pre-defined title styles, but docx does not.

@constip suggests in the original issue that margin top and bottom are being applied too frequently. I believe that was addressed by recently merged PR PHPOffice#2475. It is also suggested that the `*` css selector be dropped in favor of `body`. 2475 added the body selector. I agree that this renders the `*` selector unnecessary, and, as stated in the issue, it can cause problems. This PR drops that selector. It is also suggested that `loadHTML` be used instead of `loadXML`. This is not as easy a change as it seems, because loadHTML uses ISO-8859-1 charset rather than UTF-8, so I will not attempt that change.
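
The charset concern can be seen with plain DOMDocument. The sketch below is not part of the PR and uses made-up sample text. It compares loadHTML on UTF-8 input with and without a charset hint, where the hint is a prepended meta tag, a commonly used workaround rather than anything PHPWord is known to do.

```php
<?php
$html = '<p>Café</p>'; // UTF-8 encoded input with one non-ASCII character

$previous = libxml_use_internal_errors(true);

// No charset declared: the parser has to guess the encoding of the bytes.
$plain = new DOMDocument();
$plain->loadHTML($html);

// Workaround (an assumption, not PHPWord code): declare the charset up front.
$meta = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">';
$hinted = new DOMDocument();
$hinted->loadHTML($meta . $html);

libxml_clear_errors();
libxml_use_internal_errors($previous);

$plainText  = $plain->getElementsByTagName('p')->item(0)->textContent;
$hintedText = $hinted->getElementsByTagName('p')->item(0)->textContent;

// Compare what each document holds against the original string.
var_dump($plainText === 'Café');
var_dump($hintedText === 'Café');
```

If the first comparison is false, the unhinted parse has misread the UTF-8 bytes, which is the behavior the commit message describes.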
oleibman linked a pull request on Dec 25, 2023 that will close this issue