
PDF conversion is slow when CJK font is used #2120

Closed
martijnbrinkers opened this issue Apr 9, 2024 · 5 comments · Fixed by #2178
Labels
performance Too slow renderings
Comments

@martijnbrinkers

If a CJK font is used, PDF conversion is slow.

Steps to reproduce (on AlmaLinux 9, but other Linux systems with a similar font can be used as well):

Install the font:

$ sudo dnf install google-noto-sans-cjk-ttc-fonts

Install WeasyPrint:

$ python3 -m venv weasyprint
$ cd weasyprint
$ source ./bin/activate
$ pip install weasyprint

$ weasyprint --version
WeasyPrint version 61.2

Create an HTML file, test.html, with Chinese characters:

<html>
<head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
&#27231;&#27083;&#20661;&#21048;&#30332;&#34892;&#35336;&#21123;&#19979;&#25919;&#24220;&#20661;&#21048;&#30340;&#20729;&#26684;&#21450;&#25910;&#30410;&#29575;
</body>
</html>

Convert the HTML to PDF with the font installed:

$ time weasyprint test.html test.pdf

real    0m2.920s
user    0m2.757s
sys     0m0.157s

Now remove the font:

$ sudo dnf remove google-noto-sans-cjk-ttc-fonts

Convert the HTML to PDF without the font installed:

$ time weasyprint test.html test.pdf

real    0m0.480s
user    0m0.391s
sys     0m0.093s

For a simple HTML file, conversion to PDF takes about 6 times longer (2.92 s vs 0.48 s) when a CJK font is used.
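Single runs like the ones above can be noisy (file cache, Python start-up), so one option is to repeat each conversion and keep the best time. A minimal sketch using only the Python standard library; `time_command` is a hypothetical helper, not part of WeasyPrint:

```python
import statistics
import subprocess
import time


def time_command(cmd, runs=5):
    """Run `cmd` `runs` times and return (best, median) wall-clock seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises CalledProcessError if the command fails,
        # so a broken run is never reported as a fast one.
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return min(durations), statistics.median(durations)
```

For example, call `time_command(["weasyprint", "test.html", "test.pdf"])` once with google-noto-sans-cjk-ttc-fonts installed and once without. With the single-run timings above, the ratio is 2.920 / 0.480 ≈ 6.1.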

Is there a way to improve this?

@liZe
Member

liZe commented Apr 9, 2024

Hi!

This extra time is spent removing the unused characters from the font (font subsetting). Noto CJK is huge (~20 MB on my computer) and includes a lot of characters, so this can take a long time.
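To make the imbalance concrete: decoding the numeric character references in test.html shows that the subset only needs a handful of code points, while the font itself covers tens of thousands of glyphs. This standard-library sketch computes the code points to keep; `codepoints_to_keep` is a hypothetical helper, and it deliberately simplifies what a real subsetter does.

```python
import html

# Body of test.html from the issue, as numeric character references.
BODY = (
    "&#27231;&#27083;&#20661;&#21048;&#30332;&#34892;&#35336;&#21123;&#19979;"
    "&#25919;&#24220;&#20661;&#21048;&#30340;&#20729;&#26684;&#21450;&#25910;"
    "&#30410;&#29575;"
)


def codepoints_to_keep(text):
    """Return the sorted distinct code points used by `text`, ignoring whitespace.

    Simplification: a real subsetter typically also keeps glyphs reachable
    through OpenType layout features (variants, ligatures) and .notdef.
    """
    return sorted({ord(ch) for ch in text if not ch.isspace()})


text = html.unescape(BODY)
keep = codepoints_to_keep(text)
print(text)
print(len(text), "characters,", len(keep), "distinct code points")
```

The 20 characters reduce to 18 distinct code points (債 and 券 appear twice), which is consistent with the cost coming from the size of the font rather than the amount of text.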

> Is there a way to improve this?

Most of the time is spent in fonttools, and its subsetter has already been reported to be "slow" in fonttools/fonttools#2147. As proposed in that issue, we could try to use hb-subset instead.

liZe added the performance (Too slow renderings) label on Apr 9, 2024
@martijnbrinkers
Author

If I switch to an older release (52.2), conversion is much faster: 0.8 s (version 52.2) vs 2.9 s (version 61.2).

@liZe
Member

liZe commented Apr 9, 2024

> If I switch to an older release (52.2) conversion is much faster. 0.8 sec (version 52.2) vs 2.9 sec (version 61.2).

That’s because font subsetting was done by Cairo, which we don’t use anymore. We’d probably get equivalent performance by using hb-subset.

liZe added a commit that referenced this issue Apr 30, 2024
@liZe
Member

liZe commented Apr 30, 2024

We’ve opened a highly experimental branch that uses HarfBuzz to subset fonts, which you can try: hb-subset. If your version of HarfBuzz is recent enough (4.1, I think) and the harfbuzz-subset library is installed (it is sometimes provided by the harfbuzz package and sometimes by a separate package, depending on the distribution), it should be much faster.
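To check the two prerequisites mentioned above, a small standard-library sketch can look for the harfbuzz-subset shared library and read the HarfBuzz version through `hb_version_string()`, which is part of HarfBuzz's public C API. The helper names are hypothetical, the 4.1 minimum is only the estimate given above, and `ctypes.util.find_library` may miss a library whose file name does not follow the usual pattern on a given platform.

```python
import ctypes
import ctypes.util

# Estimated minimum HarfBuzz version ("4.1, I think"); not a confirmed requirement.
MINIMUM_VERSION = (4, 1)


def has_harfbuzz_subset():
    """Return True if a harfbuzz-subset shared library can be located."""
    return ctypes.util.find_library("harfbuzz-subset") is not None


def harfbuzz_version():
    """Return the HarfBuzz version string, or None if the library cannot be loaded."""
    path = ctypes.util.find_library("harfbuzz")
    if path is None:
        return None
    try:
        lib = ctypes.CDLL(path)
    except OSError:
        return None
    # C signature: const char *hb_version_string(void);
    lib.hb_version_string.argtypes = []
    lib.hb_version_string.restype = ctypes.c_char_p
    return lib.hb_version_string().decode("ascii")


def is_recent_enough(version, minimum=MINIMUM_VERSION):
    """Compare the major.minor part of a version string with `minimum`."""
    major_minor = tuple(int(part) for part in version.split(".")[:2])
    return major_minor >= minimum
```

For example, `print(has_harfbuzz_subset(), harfbuzz_version())` prints whether the subset library was found and which HarfBuzz version is installed (or `None`).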

@liZe
Member

liZe commented Jun 7, 2024

@martijnbrinkers I’ve opened a draft pull request, if you want to test it.

As the code is quite small and includes a fallback to the old way, we may consider merging this pull request. We still have to check where harfbuzz-subset is available (probably on Windows, maybe on macOS, and only on some Linux distributions) and update the related documentation.
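The fallback described above amounts to trying the fast subsetter first and using the fontTools path only when HarfBuzz's subsetter cannot be used. The sketch below illustrates that control flow only; it is not the code from #2178, and `SubsetterUnavailable`, `subset_font` and the backend callables are hypothetical names.

```python
from typing import Callable, Set, Tuple

# A backend takes raw font bytes and the code points to keep,
# and returns the bytes of the subset font.
Backend = Callable[[bytes, Set[int]], bytes]


class SubsetterUnavailable(Exception):
    """Raised by a backend whose library (e.g. harfbuzz-subset) is missing."""


def subset_font(font_data: bytes, codepoints: Set[int],
                preferred: Backend, fallback: Backend) -> Tuple[bytes, str]:
    """Subset with `preferred`, falling back to `fallback` if it is unavailable.

    Returns the subset bytes and the name of the backend that produced them.
    """
    try:
        return preferred(font_data, codepoints), preferred.__name__
    except SubsetterUnavailable:
        # Only a missing library triggers the fallback; other errors propagate,
        # so a genuine bug in the fast path is not silently hidden.
        return fallback(font_data, codepoints), fallback.__name__
```

Catching only the "unavailable" case keeps real subsetting bugs visible; a more permissive version could fall back on any exception, at the cost of masking them.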

liZe added a commit that referenced this issue Jun 8, 2024
liZe closed this as completed in #2178 on Jun 8, 2024
liZe added this to the 63.0 milestone on Jun 8, 2024