Using the Chinese-Llama-2-7b.ggmlv3.q4_1.bin model: UnicodeDecodeError: 'utf-8' #12

Open
xyz-dev opened this issue Jul 26, 2023 · 4 comments

xyz-dev commented Jul 26, 2023

Running CPU inference on a MacBook inside Docker; the error occurs when using Chinese input.
GGML model:
https://huggingface.co/LinkSoul/Chinese-Llama-2-7b-ggml/blob/main/Chinese-Llama-2-7b.ggmlv3.q4_1.bin

llama.cpp: loading model from /app/model/llama-2-7b-chat.ggmlv3.q4_1.bin
llama_model_load_internal: format = ggjt v3 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 4000
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_head_kv = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: n_gqa = 1
llama_model_load_internal: rnorm_eps = 1.0e-06
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: freq_base = 10000.0
llama_model_load_internal: freq_scale = 1
llama_model_load_internal: ftype = 3 (mostly Q4_1)
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 0.08 MB
llama_model_load_internal: mem required = 4537.35 MB (+ 2000.00 MB per state)
llama_new_context_with_model: kv self size = 2000.00 MB
AVX = 1 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
Llama.generate: prefix-match hit
Llama.generate: prefix-match hit
Llama.generate: prefix-match hit
Llama.generate: prefix-match hit
Llama.generate: prefix-match hit
Llama.generate: prefix-match hit
Llama.generate: prefix-match hit
Llama.generate: prefix-match hit
Running on CPU with llama.cpp.
Caching examples at: '/app/gradio_cached_examples/19'
Caching example 1/5
Caching example 2/5
Caching example 3/5
Caching example 4/5
Caching example 5/5
Caching complete

Running on local URL: http://0.0.0.0:7860

Could not create share link. Missing file: /usr/local/lib/python3.9/site-packages/gradio/frpc_linux_amd64_v0.2.

Please check your internet connection. This can happen if your antivirus software blocks the download of this file. You can install manually by following these steps:

  1. Download this file: https://cdn-media.huggingface.co/frpc-gradio-0.2/frpc_linux_amd64
  2. Rename the downloaded file to: frpc_linux_amd64_v0.2
  3. Move the file to this location: /usr/local/lib/python3.9/site-packages/gradio
    Keyboard interruption in main thread... closing server.
    llama.cpp: loading model from /app/model/llama-2-7b-chat.ggmlv3.q4_1.bin
    llama_model_load_internal: format = ggjt v3 (latest)
    llama_model_load_internal: n_vocab = 32000
    llama_model_load_internal: n_ctx = 4000
    llama_model_load_internal: n_embd = 4096
    llama_model_load_internal: n_mult = 256
    llama_model_load_internal: n_head = 32
    llama_model_load_internal: n_head_kv = 32
    llama_model_load_internal: n_layer = 32
    llama_model_load_internal: n_rot = 128
    llama_model_load_internal: n_gqa = 1
    llama_model_load_internal: rnorm_eps = 1.0e-06
    llama_model_load_internal: n_ff = 11008
    llama_model_load_internal: freq_base = 10000.0
    llama_model_load_internal: freq_scale = 1
    llama_model_load_internal: ftype = 3 (mostly Q4_1)
    llama_model_load_internal: model size = 7B
    llama_model_load_internal: ggml ctx size = 0.08 MB
    llama_model_load_internal: mem required = 4561.77 MB (+ 2000.00 MB per state)
    llama_new_context_with_model: kv self size = 2000.00 MB
    AVX = 1 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
    Llama.generate: prefix-match hit
    Traceback (most recent call last):
    File "/usr/local/lib/python3.9/site-packages/gradio/routes.py", line 439, in run_predict
    output = await app.get_blocks().process_api(
    File "/usr/local/lib/python3.9/site-packages/gradio/blocks.py", line 1389, in process_api
    result = await self.call_function(
    File "/usr/local/lib/python3.9/site-packages/gradio/blocks.py", line 1108, in call_function
    prediction = await utils.async_iteration(iterator)
    File "/usr/local/lib/python3.9/site-packages/gradio/utils.py", line 347, in async_iteration
    return await iterator.__anext__()
    File "/usr/local/lib/python3.9/site-packages/gradio/utils.py", line 340, in __anext__
    return await anyio.to_thread.run_sync(
    File "/usr/local/lib/python3.9/site-packages/anyio/to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
    File "/usr/local/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
    File "/usr/local/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 807, in run
    result = context.run(func, *args)
    File "/usr/local/lib/python3.9/site-packages/gradio/utils.py", line 323, in run_sync_iterator_async
    return next(iterator)
    File "/usr/local/lib/python3.9/site-packages/gradio/utils.py", line 692, in gen_wrapper
    yield from f(*args, **kwargs)
    File "/app/app.py", line 115, in generate
    for response in generator:
    File "/app/model.py", line 105, in run
    text = str(b_text, encoding="utf-8")
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe4 in position 0: unexpected end of data
    Llama.generate: prefix-match hit
    Traceback (most recent call last):
    File "/usr/local/lib/python3.9/site-packages/gradio/routes.py", line 439, in run_predict
    output = await app.get_blocks().process_api(
    File "/usr/local/lib/python3.9/site-packages/gradio/blocks.py", line 1389, in process_api
    result = await self.call_function(
    File "/usr/local/lib/python3.9/site-packages/gradio/blocks.py", line 1108, in call_function
    prediction = await utils.async_iteration(iterator)
    File "/usr/local/lib/python3.9/site-packages/gradio/utils.py", line 347, in async_iteration
    return await iterator.__anext__()
    File "/usr/local/lib/python3.9/site-packages/gradio/utils.py", line 340, in __anext__
    return await anyio.to_thread.run_sync(
    File "/usr/local/lib/python3.9/site-packages/anyio/to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
    File "/usr/local/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
    File "/usr/local/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 807, in run
    result = context.run(func, *args)
    File "/usr/local/lib/python3.9/site-packages/gradio/utils.py", line 323, in run_sync_iterator_async
    return next(iterator)
    File "/usr/local/lib/python3.9/site-packages/gradio/utils.py", line 692, in gen_wrapper
    yield from f(*args, **kwargs)
    File "/app/app.py", line 115, in generate
    for response in generator:
    File "/app/model.py", line 105, in run
    text = str(b_text, encoding="utf-8")
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe4 in position 0: unexpected end of data
    Llama.generate: prefix-match hit
    Llama.generate: prefix-match hit
    Traceback (most recent call last):
    File "/usr/local/lib/python3.9/site-packages/gradio/routes.py", line 439, in run_predict
    output = await app.get_blocks().process_api(
    File "/usr/local/lib/python3.9/site-packages/gradio/blocks.py", line 1389, in process_api
    result = await self.call_function(
    File "/usr/local/lib/python3.9/site-packages/gradio/blocks.py", line 1108, in call_function
    prediction = await utils.async_iteration(iterator)
    File "/usr/local/lib/python3.9/site-packages/gradio/utils.py", line 347, in async_iteration
    return await iterator.__anext__()
    File "/usr/local/lib/python3.9/site-packages/gradio/utils.py", line 340, in __anext__
    return await anyio.to_thread.run_sync(
    File "/usr/local/lib/python3.9/site-packages/anyio/to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
    File "/usr/local/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
    File "/usr/local/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 807, in run
    result = context.run(func, *args)
    File "/usr/local/lib/python3.9/site-packages/gradio/utils.py", line 323, in run_sync_iterator_async
    return next(iterator)
    File "/usr/local/lib/python3.9/site-packages/gradio/utils.py", line 692, in gen_wrapper
    yield from f(*args, **kwargs)
    File "/app/app.py", line 115, in generate
    for response in generator:
    File "/app/model.py", line 105, in run
    text = str(b_text, encoding="utf-8")
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe5 in position 0: unexpected end of data
liltom-eth (Owner) commented:

Have you tried Chinese-Llama-2-7b GPU version?

xyz-dev (Author) commented Jul 27, 2023

Have you tried Chinese-Llama-2-7b GPU version?

I am a beginner. I don't have GPU resources, so I want to run it on my MacBook using the CPU.

liltom-eth (Owner) commented Jul 28, 2023

@xyz-dev check this for more details.
You might need to change text = str(b_text, encoding="utf-8") to support some Chinese characters.
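
One possible workaround, as a rough sketch (this is not the repo's actual code, and `byte_chunks` is a hypothetical stand-in for the per-chunk bytes that model.py currently decodes): use an incremental UTF-8 decoder so that a multi-byte Chinese character split across two streamed chunks is buffered until it is complete, instead of raising UnicodeDecodeError.

```python
import codecs

def decode_stream(byte_chunks):
    # The incremental decoder buffers incomplete multi-byte sequences between chunks.
    decoder = codecs.getincrementaldecoder("utf-8")()
    for b_text in byte_chunks:
        text = decoder.decode(b_text)  # returns "" while a character is still incomplete
        if text:
            yield text
    tail = decoder.decode(b"", final=True)  # flush whatever is left at the end
    if tail:
        yield tail

# Example: "你" is three bytes in UTF-8; split across two chunks it still decodes cleanly.
chunks = ["你".encode("utf-8")[:2], "你".encode("utf-8")[2:]]
print("".join(decode_stream(chunks)))  # 你
```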

touchtop commented:

I have tested some Chinese Llama 2 models in GGML format and found that the above issue does seem to occur in streaming mode; however, everything works when I turn off streaming while generating the answer. I don't know why, but I hope it's a useful clue. :)

BTW, my machine is Win10 + WSL2 + Ubuntu 22.04 + an NVIDIA RTX 2080 8GB.
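
That behavior is consistent with how UTF-8 works: Chinese characters are encoded as three bytes (lead bytes 0xe4/0xe5, matching the bytes in the errors above), so in streaming mode a chunk can end in the middle of a character and fail to decode on its own, while the complete, non-streamed response decodes fine. A small standalone illustration (not code from this repo):

```python
data = "你好".encode("utf-8")        # b'\xe4\xbd\xa0\xe5\xa5\xbd'

# Non-streaming: decoding the complete byte string works.
print(str(data, encoding="utf-8"))   # 你好

# Streaming: a chunk that stops mid-character fails just like the log above.
chunk = data[:4]                     # b'\xe4\xbd\xa0\xe5' ends with a partial character
try:
    str(chunk, encoding="utf-8")
except UnicodeDecodeError as e:
    print(e)  # 'utf-8' codec can't decode byte 0xe5 in position 3: unexpected end of data
```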
