Search Reta Vortaro offline xdxf file using code interpreter #1

Open
stefangrotz opened this issue Nov 17, 2023 · 3 comments

@stefangrotz

stefangrotz commented Nov 17, 2023

You can search through the revo.xdxf file using code interpreter:

[Image: chatGPT-revo]

Used prompt

I used this prompt, which includes a minified version of some example Python code and some information about the data structure:

# reta vortaro - revo.xdxf
When asked about the reta vortaro, revo or generally about complex multi-lingual dictionary questions, you can search revo.xdxf using python. The Esperanto words can be found in the <ar> elements, examples in <ex> and translations to other languages in <dtrn>. For non-Esperanto word searches always search in dtrn and return the corresponding Esperanto word and example if not specified differently. Here is an example of the data structure of translations: <dtrn> /de/ Beispiel, Muster, Vorbild</dtrn> There can be additional elements inside of the elements described above. Write robust code that can handle messy XML.

Here is an example of such a search for an Esperanto word:

import xml.etree.ElementTree as D
def A(file_path,word):
   C='def';E=D.parse(file_path);F=E.getroot()
   for A in F.findall('.//ar'):
   	B=A.find('k')
   	if B is not None and B.text.strip()==word:G=A.find(C).text if A.find(C)is not None else'No definition found';H=[A.text for A in A.findall('.//def/ex')]or['No examples found'];I=[A.text for A in A.findall('dtrn')]or['No translations found'];return{'word':word,'definition':G,'examples':H,'translations':I}
B='revo.xdxf'
C=A(B,'krokodili')
print(C)
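
For anyone reading along, the minified example can be unpacked into something easier to follow. This is only a sketch, not the code used in the prompt: the names `text_of`, `lookup` and `SAMPLE` are made up here, and the sample XML merely imitates the structure described in the prompt rather than quoting revo.xdxf. It also assumes that `<ex>` and `<dtrn>` may sit anywhere inside an `<ar>`, and it collects nested text with `itertext()` so that extra child elements do not break the search.

```python
import io
import xml.etree.ElementTree as ET


def text_of(element):
    """Join all text inside an element (nested children included), collapsing whitespace."""
    return " ".join("".join(element.itertext()).split())


def lookup(source, word):
    """Return definition, examples and translations for an Esperanto headword, or None."""
    root = ET.parse(source).getroot()
    for article in root.iter("ar"):
        headword = article.find(".//k")
        # Skip articles without a headword instead of crashing on them.
        if headword is None or text_of(headword) != word:
            continue
        definition = article.find(".//def")
        return {
            "word": word,
            # Only the leading text of <def>, as in the minified version.
            "definition": (definition.text or "").strip() if definition is not None else None,
            "examples": [text_of(ex) for ex in article.iter("ex")],
            "translations": [text_of(t) for t in article.iter("dtrn")],
        }
    return None


# Made-up sample in the shape described in the prompt; not taken from revo.xdxf.
SAMPLE = """<xdxf><ar><k>ekzemplo</k><def>io tipa <ex>bona <b>ekzemplo</b></ex>
<dtrn> /de/ Beispiel, Muster, Vorbild</dtrn></def></ar></xdxf>"""

result = lookup(io.StringIO(SAMPLE), "ekzemplo")
print(result)
```

To try it on the real file, pass the path instead of the `StringIO` object, e.g. `lookup("revo.xdxf", "krokodili")`. It still parses the whole file on every call, so it does not address the speed problem.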

Conclusion

IMO it is currently too slow to include in EsperantoGPT. It takes almost one minute to look up a word. I tried to make it use minified code to speed things up, but this hasn't worked so far.

@stefangrotz stefangrotz changed the title Reta Vortaro offline file using code interpreter Search Reta Vortaro offline xdxf file using code interpreter Nov 17, 2023
@stefangrotz

Also discussed in the revo repo: revuloj/revo-fonto#61

@stefangrotz

I managed to get a result after 20 seconds, still pretty slow.

Todo:

  • Improve the prompt to make it more reliable, especially for non-Esperanto word searches
  • Experiment with different file formats, which might be quicker and more reliable (csv?)
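
For the non-Esperanto searches in the first todo item, a reverse lookup over `<dtrn>` could look roughly like the sketch below. It assumes every translation element starts with a `/xx/` language prefix followed by a comma-separated list, as in the `<dtrn> /de/ Beispiel, Muster, Vorbild</dtrn>` example from the prompt. The function name `find_esperanto`, the regular expression and the sample data are made up for illustration and have not been checked against the real revo.xdxf.

```python
import io
import re
import xml.etree.ElementTree as ET

# Assumed shape of a translation: "/de/ Beispiel, Muster, Vorbild"
LANG_PREFIX = re.compile(r"^\s*/([^/\s]+)/\s*(.*)$", re.S)


def find_esperanto(source, foreign_word, lang=None):
    """Return Esperanto headwords whose <dtrn> lists contain foreign_word."""
    wanted = foreign_word.strip().lower()
    matches = []
    root = ET.parse(source).getroot()
    for article in root.iter("ar"):
        headword = article.find(".//k")
        if headword is None:
            continue
        for dtrn in article.iter("dtrn"):
            parsed = LANG_PREFIX.match("".join(dtrn.itertext()))
            if not parsed:
                continue  # no language prefix, skip this element
            code, body = parsed.groups()
            if lang is not None and code != lang:
                continue
            terms = [term.strip().lower() for term in body.split(",")]
            if wanted in terms:
                matches.append("".join(headword.itertext()).strip())
                break  # one hit per article is enough
    return matches


# Made-up sample data, only imitating the structure described in the prompt.
SAMPLE = """<xdxf>
<ar><k>ekzemplo</k><def><dtrn> /de/ Beispiel, Muster, Vorbild</dtrn>
<dtrn> /en/ example</dtrn></def></ar>
<ar><k>hundo</k><def><dtrn> /de/ Hund</dtrn></def></ar>
</xdxf>"""

german_hits = find_esperanto(io.StringIO(SAMPLE), "Muster", lang="de")
print(german_hits)
```

Matching whole comma-separated terms instead of substrings avoids false hits such as "Hund" inside a longer word, at the cost of missing multi-word or annotated translations.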

@stefangrotz

stefangrotz commented Nov 28, 2023

The SQL file also works (45 seconds). This is my prompt for it:

Reta Vortaro - revo-inx.db

Use this file to search for Esperanto words and their translations.

  1. Database Structure:

    • nodo: Contains main entries (Esperanto words). Key columns are mrk (unique marker) and kap (the word).
    • traduko: Holds translations. Key columns are mrk (linking to nodo) and lng (language code of the translation).
    • var: Stores variations of words. Key columns are mrk and var (variation).
    • Other tables like referenco, uzo, malong, bildo, artikolo, vortspeco, agordo may contain additional information but may not always have relevant data.
  2. Finding Words:

    • To locate an Esperanto word, query the nodo table using the kap column.
    • Example SQL: SELECT mrk FROM nodo WHERE kap = 'desired_word';
  3. Getting Translations:

    • Once you have the mrk from nodo, use it to find translations in the traduko table.
    • Example SQL: SELECT txt FROM traduko WHERE mrk = 'obtained_mrk' AND lng = 'language_code';
  4. Word Variations:

    • To find variations of a word, use the mrk in the var table.
    • Example SQL: SELECT var FROM var WHERE mrk = 'obtained_mrk';
  5. Additional Information:

    • For more details like usage or references, use the mrk to query tables like uzo or referenco.
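
The two-step lookup from the prompt (find `mrk` in `nodo`, then query `traduko`) can also be done in one joined query from Python. The sketch below assumes the file is an SQLite database and that the tables have exactly the columns named above. To keep it runnable without the real file, it builds a tiny in-memory stand-in; the `mrk` value and the rows are invented, and `translations` is a hypothetical helper name.

```python
import sqlite3


def translations(conn, word, lang):
    """Return translations of an Esperanto word into one language."""
    rows = conn.execute(
        "SELECT t.txt FROM nodo AS n "
        "JOIN traduko AS t ON t.mrk = n.mrk "
        "WHERE n.kap = ? AND t.lng = ? "
        "ORDER BY t.txt",
        (word, lang),  # parameters instead of string formatting
    ).fetchall()
    return [row[0] for row in rows]


# In-memory stand-in with made-up rows; the real schema may differ.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodo (mrk TEXT PRIMARY KEY, kap TEXT)")
conn.execute("CREATE TABLE traduko (mrk TEXT, lng TEXT, txt TEXT)")
conn.execute("INSERT INTO nodo VALUES (?, ?)", ("ekzempl.0o", "ekzemplo"))
conn.executemany(
    "INSERT INTO traduko VALUES (?, ?, ?)",
    [
        ("ekzempl.0o", "de", "Beispiel"),
        ("ekzempl.0o", "de", "Muster"),
        ("ekzempl.0o", "en", "example"),
    ],
)

german = translations(conn, "ekzemplo", "de")
print(german)
```

Against the real file the connection would be opened with `sqlite3.connect("revo-inx.db")` instead of the in-memory database.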
