llama3 custom regex split #6965

Merged 88 commits on May 9, 2024

Commits on Apr 26, 2024

  1. 6fbab2d
  2. d2cfc22
  3. Moved header files
    dragnil1 authored and ggerganov committed Apr 26, 2024
    54f93eb
  4. Resolved issues
    dragnil1 authored and ggerganov committed Apr 26, 2024
    1c924e4
  5. 4056dc5
  6. Updated/merged the deepseek coder pr
    jaggzh authored and ggerganov committed Apr 26, 2024
    c8e7d95
  7. Refactored code
    dragnil1 authored and ggerganov committed Apr 26, 2024
    4c3e882
  8. Adding unicode regex mappings
    dragnil1 authored and ggerganov committed Apr 26, 2024
    a5710a4
  9. Adding unicode regex function
    dragnil1 authored and ggerganov committed Apr 26, 2024
    7e308ed
  10. feeaf4f
  11. Fixed issues
    dragnil1 authored and ggerganov committed Apr 26, 2024
    7535803
  12. 36d9832
  13. 06d3e69
  14. lint : fix whitespaces
    ggerganov committed Apr 26, 2024
    c56e19d
  15. 7a44e44
  16. d999cf6
  17. aeafb43
  18. tests : add sample usage
    ggerganov committed Apr 26, 2024
    e1b2bf7
  19. ed42711
  20. 4907e41
  21. e8c206b
  22. e989176
  23. e3f6dc7
  24. 9b4d63a
  25. 43e12ce
  26. 1b9b79d
  27. lint : fix
    ggerganov committed Apr 26, 2024
    8791e94
  28. a774d70
  29. wip
    ggerganov committed Apr 26, 2024
    c160818

Commits on Apr 27, 2024

  1. 96965f6
  2. ad92983
  3. minor
    ggerganov committed Apr 27, 2024
    4434c9d
  4. unicode : set bomb
    ggerganov committed Apr 27, 2024
    a22645c
  5. unicode : set bomb
    ggerganov committed Apr 27, 2024
    2affd0b
  6. ce5485a
  7. 91eaa41
  8. unicode : try fix windows
    ggerganov committed Apr 27, 2024
    581c4a0

Commits on Apr 28, 2024

  1. b97add5
  2. Merge branch 'master' into gg/bpe-preprocess
    ggml-ci
    ggerganov committed Apr 28, 2024
    d63cc90
  3. unicode : clean-up
    ggerganov committed Apr 28, 2024
    e972e6c
  4. unicode : simplify
    ggerganov committed Apr 28, 2024
    ee6d1b3
  5. llama3 custom regex split
    jaime-m-p committed Apr 28, 2024
    e11fe2f
  6. convert : add convert-hf-to-gguf-update.py
    ggml-ci
    ggerganov committed Apr 28, 2024
    7642973
  7. lint : update
    ggerganov committed Apr 28, 2024
    4e3e6d8
  8. convert : add falcon
    ggml-ci
    ggerganov committed Apr 28, 2024
    1c888eb
  9. 1545550
  10. lint : fix
    ggerganov committed Apr 28, 2024
    491f233
  11. lint : fix
    ggerganov committed Apr 28, 2024
    e8dd4a1
  12. 02fd977
  13. convert : add comments
    ggerganov committed Apr 28, 2024
    0f9058c
  14. convert : exercise contractions
    ggml-ci
    ggerganov committed Apr 28, 2024
    7808150
  15. Using char32_t for codepoints
    jaime-m-p committed Apr 28, 2024
    5cc4b2c
  16. lint : fix
    ggerganov committed Apr 28, 2024
    7b1210f
  17. already exists unicode_tolower()
    jaime-m-p committed Apr 28, 2024
    6e4d2af
  18. Typing
    jaime-m-p committed Apr 28, 2024
    2a48873
  19. Restore BOM
    jaime-m-p committed Apr 28, 2024
    0cf9ed3

Commits on Apr 29, 2024

  1. ef4cca9
  2. tests : refactor vocab tests
    ggml-ci
    ggerganov committed Apr 29, 2024
    43708d2
  3. tests : add more vocabs and tests
    ggml-ci
    ggerganov committed Apr 29, 2024
    c68d259
  4. unicode : cleanup
    ggerganov committed Apr 29, 2024
    af05268
  5. c21ab18
  6. 866e394
  7. Fix merge
    jaime-m-p committed Apr 29, 2024
    a0c870d
  8. 120cf37
  9. tests : disable obsolete
    ggml-ci
    ggerganov committed Apr 29, 2024
    9a7d430
  10. tests : use faster bpe test
    ggml-ci
    ggerganov committed Apr 29, 2024
    6d6ce93
  11. 3202676
  12. 80cb312
  13. Merge remote-tracking branch 'upstream/gg/bpe-preprocess' into gg/bpe-preprocess
    jaime-m-p committed Apr 29, 2024
    b66cdd1
  14. Move unused variable value
    jaime-m-p committed Apr 29, 2024
    5c38f6e
  15. GPT2 custom regex split
    jaime-m-p committed Apr 29, 2024
    1d8fcc0

Commits on Apr 30, 2024

  1. Add alternative regex for custom split llama3
    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
    jaime-m-p and ggerganov committed Apr 30, 2024
    2cd1eb0
  2. Style
    jaime-m-p committed Apr 30, 2024
    0c6d820

Commits on May 3, 2024

  1. Add bruteforce random tests for token encoding
    jaime-m-p committed May 3, 2024
    3e3e283
  2. wip: fixing unicode codepoint ranges
    jaime-m-p committed May 3, 2024
    4d441e4

Commits on May 4, 2024

  1. 798b576
  2. Fix merge
    jaime-m-p committed May 4, 2024
    69a49ac
  3. 8fd849e
  4. llama3 custom regex split: fix \s
    jaime-m-p committed May 4, 2024
    67832e5
  5. Restore BOM
    jaime-m-p committed May 4, 2024
    edf375d

Commits on May 7, 2024

  1. Style
    jaime-m-p committed May 7, 2024
    a5fa2fe
  2. wip: generate NDF table
    jaime-m-p committed May 7, 2024
    def3d13
  3. Ignore special tokens for testing
    jaime-m-p committed May 7, 2024
    7761f8e

Commits on May 8, 2024

  1. Clean gen-unicode-data.py
    jaime-m-p committed May 8, 2024
    70ca1fe
  2. Refactor random tokenizer test
    jaime-m-p committed May 8, 2024
    77cbb79
  3. ea47119

Commits on May 9, 2024

  1. lint : fix
    ggerganov committed May 9, 2024
    8de8b6d
  2. 12a7b69