import lancedb from 'vectordb'
import express from 'express'
import { pipeline } from '@xenova/transformers'
import { Schema, Field, FixedSizeList, Float64, Int32, Utf8 } from "apache-arrow";

const pipe = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');
const db = await lancedb.connect('data/sample-lancedb')

const embed_fun = {
  sourceColumn: 'text',
  embed: async function (batch) {
    let result = []
    for (let text of batch) {
      const res = await pipe(text, { pooling: 'mean', normalize: true })
      result.push(Array.from(res['data']))
    }
    return (result)
  }
}

const schema = new Schema([
  new Field("id", new Int32()),
  new Field("text", new Utf8()),
  new Field("type", new Utf8()),
  new Field("vector", new FixedSizeList(384, new Field("item", new Float64())))
]);

debugger;
const tables = await db.tableNames()
let table
if (!tables.includes("food_table")) {
  table = await db.createTable({ name: "food_table", schema, embed_fun })
} else {
  table = await db.openTable('food_table', embed_fun)
}

const app = express()
app.use(express.json())

app.get('/', async (req, res) => {
  const results = await table.search("a sweet fruit to eat").metricType("cosine").limit(2).execute()
  res.json(results)
})

app.post('/', async (req, res) => {
  await table.add(req.body)
  res.send('OK')
})

const port = 3000
app.listen(port, () => {
  console.log(`Listening port on ${port}`)
})
This appears to happen because the user is specifying the "vector" column, which is also the output of the embed function. The application logic is unable to handle this scenario.
I guess it depends on how we want to handle this. Do we want to treat this as a user error, or do we want to add logic to check for columns matching the embedding functions?
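The second option, checking for columns that match the embedding function, could look roughly like the sketch below. It is only an illustration under stated assumptions: `schemaDeclaresEmbeddingOutput` is a hypothetical helper rather than part of `vectordb`, the `destColumn` property and the `"vector"` fallback are assumed names for the embedding output column, and the schema is treated as any object with a `fields` array whose entries have a `name`.

```javascript
// Hypothetical helper, not part of the vectordb API.
// Assumption: the embedding function writes to `destColumn`, falling back to "vector".
function embeddingOutputColumn(embedding, fallback = "vector") {
  return embedding.destColumn ?? fallback;
}

// Returns true when the schema already declares the column that the
// embedding function is expected to produce.
function schemaDeclaresEmbeddingOutput(schema, embedding) {
  const dest = embeddingOutputColumn(embedding);
  return schema.fields.some((field) => field.name === dest);
}

// Plain objects standing in for an Arrow schema and the repro's embed_fun.
const exampleSchema = {
  fields: [{ name: "id" }, { name: "text" }, { name: "type" }, { name: "vector" }],
};
const exampleEmbedding = { sourceColumn: "text" };

const overlaps = schemaDeclaresEmbeddingOutput(exampleSchema, exampleEmbedding);
```

With a check like this, the library could either raise a clear error up front (the "user error" route) or switch to a code path that reconciles the declared column with the generated one.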
LanceDB version
v0.4.19
What happened?
When adding data, we get the error:
This comes from the line:
https://github.com/apache/arrow/blob/6a28035c2b49b432dc63f5ee7524d76b4ed2d762/js/src/table.ts#L136-L137
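When this check fails, it can help to find out which field makes the two schemas non-equivalent. The following is a minimal sketch, assuming only that both objects expose a `fields` array whose entries have `name` and `nullable` properties (as Arrow JS `Schema` and `Field` objects do). `diffNullability` is a hypothetical debugging helper, not an Arrow or LanceDB API, and the sample objects are made up to mirror this report.

```javascript
// Hypothetical debugging helper: lists fields whose nullability differs
// between two schema-like objects ({ fields: [{ name, nullable }] }).
function diffNullability(expected, actual) {
  const actualByName = new Map(actual.fields.map((f) => [f.name, f]));
  const mismatches = [];
  for (const field of expected.fields) {
    const other = actualByName.get(field.name);
    // Only report fields present in both schemas with different nullability.
    if (other !== undefined && other.nullable !== field.nullable) {
      mismatches.push({
        name: field.name,
        expected: field.nullable,
        actual: other.nullable,
      });
    }
  }
  return mismatches;
}

// Illustrative stand-ins for `schema` and `batch.schema`.
const tableSchema = {
  fields: [
    { name: "id", nullable: false },
    { name: "vector", nullable: false },
  ],
};
const batchSchema = {
  fields: [
    { name: "id", nullable: false },
    { name: "vector", nullable: true },
  ],
};

const mismatches = diffNullability(tableSchema, batchSchema);
```

Passing the real `schema` and `batch.schema` objects to a helper like this in the debugger should surface the differing field directly.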
The one difference is that in `schema`, `vector` is not nullable, while in `batch.schema`, `vector` is nullable.
Are there known steps to reproduce?
Here is the user provided repro: https://paste.mozilla.org/udbe1bNs
Original message: https://discord.com/channels/1030247538198061086/1197630540271067258/1237552085525074000
Copy of repro
TypeError: Table and inner RecordBatch schemas must be equivalent.