As a user and a developer, I want to display Japanese characters in charts #760

Open
kyokonishito opened this issue Apr 3, 2019 · 9 comments

@kyokonishito

As a user and a developer, I want to display Japanese characters in charts

Expected behavior

  • We can see all Japanese characters in labels and legends of all charts correctly.

Actual behavior

  • All Japanese characters in labels and legends of all charts are displayed as squares.

Steps to reproduce the behavior

import pixiedust
import pandas as pd
df = pd.DataFrame([["四月", 26],["May", 10],["June", 5]], columns=['key', 'value'])
display(df)

@vabarbosa
Member

@kyokonishito can you confirm this happens specifically when using the matplotlib renderer? if you switch to bokeh, do you see the characters properly?

@vabarbosa
Member

@DTAIEB i believe the issue is that the default font used by matplotlib does not support Japanese, Chinese, etc. characters.

we could:

  1. start adding/including fonts packages.
  2. implement an option like: display(df, font_name='someFontName') and just pass the font_name to matplotlib to use
  3. make no changes and just recommend users choose a different renderer when dealing with international characters

my vote would be for 2 but this would leave it up to the user to make sure they have the preferred font already installed in their system and know the name so it can be passed in display.
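
for reference, option 2 could look roughly like this on the matplotlib side. this is only a sketch: `bar_chart_with_font` and its parameters are hypothetical names, not PixieDust code, and it assumes a font file that supports the needed characters already exists at `font_path`.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib import font_manager


def bar_chart_with_font(keys, values, font_path):
    """Draw a bar chart whose tick labels use the font file at font_path.

    Hypothetical helper for illustration only; not part of PixieDust.
    """
    # FontProperties(fname=...) points matplotlib at a specific font file
    prop = font_manager.FontProperties(fname=font_path)
    fig, ax = plt.subplots()
    positions = list(range(len(keys)))
    ax.bar(positions, list(values))
    ax.set_xticks(positions)
    # Apply the font only to the labels that carry user-supplied text
    ax.set_xticklabels(list(keys), fontproperties=prop)
    return fig, prop
```

a call such as `bar_chart_with_font(["四月", "May", "June"], [26, 10, 5], "/path/to/font.ttf")` would then draw the labels with that font, where the path is a placeholder for a font installed by the user.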

@kyokonishito
Author

@vabarbosa I confirmed that bokeh displays Japanese correctly. However, the pie chart does not have a bokeh option.
I would prefer your option 2. OSS fonts that can display Japanese are available. When I display a word cloud in Japanese on Watson Studio, I run the following code, which sets the font path for wordcloud:

import os
if not os.path.exists('/home/dsxuser/work/ipaexg00301/ipaexg.ttf'):
    !wget https://oscdl.ipa.go.jp/IPAexfont/ipaexg00301.zip
    !unzip ipaexg00301.zip
else:
    print('IPA font already installed')

from wordcloud import WordCloud

wordcloud = WordCloud(
    background_color="white",
    font_path="/home/dsxuser/work/ipaexg00301/ipaexg.ttf",
).generate(text)
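
The `!wget` and `!unzip` lines above only work inside a notebook. A rough standard-library equivalent might look like the sketch below. `ensure_font` is a hypothetical helper name, and it assumes the archive unpacks to the relative path given in `font_file`.

```python
import os
import urllib.request
import zipfile


def ensure_font(zip_url, font_file, dest_dir="."):
    """Download and unpack zip_url into dest_dir unless font_file already exists.

    font_file is the path of the .ttf relative to dest_dir.
    Returns the absolute path to the font file.
    """
    font_path = os.path.abspath(os.path.join(dest_dir, font_file))
    if os.path.exists(font_path):
        return font_path  # already installed, nothing to do
    os.makedirs(dest_dir, exist_ok=True)
    # With no target filename, remote URLs are fetched into a temporary file
    zip_path, _ = urllib.request.urlretrieve(zip_url)
    try:
        with zipfile.ZipFile(zip_path) as archive:
            archive.extractall(dest_dir)
    finally:
        urllib.request.urlcleanup()  # remove any temporary download
    if not os.path.exists(font_path):
        raise FileNotFoundError(font_path)
    return font_path
```

With the values from the code above, the call would be `ensure_font("https://oscdl.ipa.go.jp/IPAexfont/ipaexg00301.zip", "ipaexg00301/ipaexg.ttf", "/home/dsxuser/work")`, and the returned path can be passed as `font_path`.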

@vabarbosa
Member

vabarbosa commented Apr 4, 2019

@kyokonishito as i think about this more i am wondering if it is best to keep with pixiedust convention and put the font_path field in the Chart Options dialog? since it is only specific to matplotlib we can show the field in options dialog when using matplotlib and hide it when not using matplotlib.

in either case, i think we should probably also store the font_path value in the pixiedust user preference db, so once you set it you do not need to keep setting it for every chart.
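
to make the "set it once" idea concrete, here is a minimal sketch. it does not use the real pixiedust preference db: `PreferenceStore` and `resolve_font_path` are hypothetical names, and a small sqlite table stands in for whatever storage pixiedust actually uses.

```python
import sqlite3


class PreferenceStore:
    """Tiny key/value store; a hypothetical stand-in for a user preference db."""

    def __init__(self, db_path=":memory:"):
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS prefs (key TEXT PRIMARY KEY, value TEXT)"
        )

    def set(self, key, value):
        with self.conn:  # commits on success
            self.conn.execute(
                "INSERT OR REPLACE INTO prefs (key, value) VALUES (?, ?)",
                (key, value),
            )

    def get(self, key, default=None):
        row = self.conn.execute(
            "SELECT value FROM prefs WHERE key = ?", (key,)
        ).fetchone()
        return row[0] if row else default


def resolve_font_path(options, prefs):
    """Use an explicit font_path and remember it; otherwise use the saved one."""
    font_path = options.get("font_path")
    if font_path:
        prefs.set("font_path", font_path)
        return font_path
    return prefs.get("font_path")
```

the first chart that passes `font_path` saves it, and later charts that omit the option pick up the saved value.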

i want to wait for feedback from @DTAIEB before committing any changes. however, in the meantime you can try using my working branch, which has a temporary fix to keep you from being blocked:

  1. install pixiedust from my working branch:
!pip install --upgrade --no-deps git+https://github.com/pixiedust/pixiedust.git@va-working-branch#egg=pixiedust
  2. restart kernel
  3. set the font_path parameter for display to the full path to the font file
display(df, font_path='/Users/va/blue/tools/jp-font/ipaexg00301/ipaexg.ttf')

and you should see the characters displayed properly.

@kyokonishito
Author

@vabarbosa Thank you! I tried your working branch. It works fine.
However, there is one strange issue. I get an error if I set font_path using a string variable instead of a string literal.

import pixiedust
import pandas as pd

df = pd.DataFrame([["四月", 26],["May", 10],["June", 5]],  columns=['key', 'value'])
#display(df, font_path='/home/dsxuser/work/ipaexg00301/ipaexg.ttf') #this works fine !

jp_font_path='/home/dsxuser/work/ipaexg00301/ipaexg.ttf'
display(df, font_path=jp_font_path) # this has [Errno 2] No such file or directory: 'jp_font_path'
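
The error message names the variable rather than its value, which suggests (this is only a guess, not confirmed from the PixieDust source) that the option is read as raw text from the call instead of being evaluated. If that is the case, a fix could resolve the text against the user's variables, roughly as in this sketch. `resolve_option` is a hypothetical name.

```python
import ast


def resolve_option(raw, namespace):
    """Turn the source text of an option value into an actual value.

    raw is the text as written in the call, e.g. "'/a/b.ttf'" or "jp_font_path".
    namespace is a dict of user variables (in a notebook, the user namespace).
    Hypothetical helper for illustration only.
    """
    try:
        # Handles quoted strings, numbers and other literals
        return ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        pass
    # Not a literal: treat it as a bare variable name
    if raw.isidentifier() and raw in namespace:
        return namespace[raw]
    raise NameError("cannot resolve option value: " + raw)
```

With this, both `font_path='/home/dsxuser/work/ipaexg00301/ipaexg.ttf'` and `font_path=jp_font_path` would resolve to the same path.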


@DTAIEB
Member

DTAIEB commented Apr 5, 2019

@vabarbosa Yes I agree with #2. To make it easier, perhaps we could also have a drop down contextual option to let users pick from some of the most common fonts.

@kyokonishito
Author

kyokonishito commented Apr 14, 2019

@vabarbosa
I will introduce PixieDust in my session at a Japan IBM Cloud Community event on Apr 26. May I present your temporary fix to the Japanese attendees in my session if the module is not fixed by Apr 26?

(The error with a string variable still occurs.)

@vabarbosa
Member

@kyokonishito i will keep you posted, but yes, feel free to use the temporary fix in my branch in the meantime.

@kyokonishito
Author

@vabarbosa Thanks! I will introduce your working branch as a temporary fix in my session.


3 participants