LLM: Optimize cohere model #10878

Merged
merged 9 commits into intel-analytics:main on May 7, 2024

@hzjane hzjane commented Apr 25, 2024

Description

Optimize c4ai-command-r-v01 (35B)

  • RMSNorm
  • MLP
  • normal kv_cache
  • fused qkv
  • fused rope (gives incorrect output)
  • esimd_sdp
  • flash_attention
  • quantize kv cache
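
To illustrate the "fused qkv" item above, here is a minimal pure-Python sketch of the general idea: the three projection weight matrices are stacked once, a single matrix-vector product is computed, and the result is split back into q, k, and v. This is not the code from this PR, which is not shown on this page; the helper names `linear`, `fuse_qkv`, and `fused_qkv_forward` are hypothetical, and plain lists stand in for tensors.

```python
def linear(x, weight):
    """Return weight @ x, where weight is a list of rows and x is a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def fuse_qkv(wq, wk, wv):
    """Stack the q, k and v weight rows into one matrix (done once, at load time)."""
    return wq + wk + wv


def fused_qkv_forward(x, w_qkv, q_size, kv_size):
    """Run one projection with the stacked weights, then split the output."""
    out = linear(x, w_qkv)
    q = out[:q_size]
    k = out[q_size:q_size + kv_size]
    v = out[q_size + kv_size:q_size + 2 * kv_size]
    return q, k, v


# Toy weights (hypothetical values): 2 query outputs, 1 key output, 1 value output.
x = [1, 2]
wq = [[1, 0], [0, 1]]
wk = [[2, 1]]
wv = [[0, 3]]

w_qkv = fuse_qkv(wq, wk, wv)
q, k, v = fused_qkv_forward(x, w_qkv, q_size=len(wq), kv_size=len(wk))

# The fused path matches three separate projections.
separate = (linear(x, wq), linear(x, wk), linear(x, wv))
print((q, k, v) == separate)  # True
```

The fused path does the same arithmetic as three separate projections; on real hardware the benefit comes from launching one larger matrix multiplication instead of three smaller ones.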

Performance: https://github.com/analytics-zoo/nano/issues/1322#issuecomment-2078491160

@hzjane hzjane marked this pull request as ready for review May 6, 2024 02:08
@hzjane hzjane requested a review from glorysdj May 6, 2024 02:15

@glorysdj glorysdj left a comment

LGTM

@hzjane hzjane merged commit 191b184 into intel-analytics:main May 7, 2024
16 of 18 checks passed
@jason-dai

Need to add an example, and update "verified model" table.

hzjane commented May 7, 2024

> Need to add an example, and update "verified model" table.

OK, I will add it later.
