sniff_csv how to parse the Columns value safely #12071
-
Hi,

```python
import duckdb

sample_filename = "path/to/my/file.csv"
conn = duckdb.connect(":memory:", config={"threads": 1})
query = conn.execute(
    "SELECT Delimiter, Quote, Escape, HasHeader, Columns FROM sniff_csv(?)",
    [sample_filename],
)
query_result = query.fetchone()
(delimiter, quote, escape, has_header, encoded_columns) = query_result
col_query = conn.execute(f"SELECT {encoded_columns}")
decoded_columns = col_query.fetchone()[0].keys()
```

but that is quite SQL-injection-adjacent, and it fails for some specific CSVs that have an apostrophe in some of their header names (see cloud_services_adoption.csv for one such example). I was able to sidestep the problem by changing the last two lines of the original code to:

```python
col_query = conn.execute(f"DESCRIBE SELECT * FROM '{sample_filename}' LIMIT 0")
decoded_columns = [item[0] for item in col_query.fetchall()]
```

which works even for the files with apostrophes. But it runs the sniffer a second time, which is not ideal; I would still like to be able to read this information directly from the original sniff_csv results. Is there any better way of getting the column names, please? Or is it a bug that the output of sniff_csv does not have the apostrophes escaped?

EDIT: raised as #12089
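One small hardening for the DESCRIBE workaround: the f-string interpolates the path directly into a SQL string literal, so a filename containing an apostrophe would break it too. A minimal sketch of a quoting helper, using the standard SQL rule of doubling embedded single quotes (`quote_sql_literal` is a hypothetical name, not a DuckDB API):

```python
def quote_sql_literal(value: str) -> str:
    """Render a Python string as a SQL single-quoted literal.

    Embedded single quotes are escaped by doubling them, per the
    standard SQL rule, so a path like it's_data.csv stays intact.
    """
    return "'" + value.replace("'", "''") + "'"

# Hypothetical usage with the DESCRIBE workaround above:
# conn.execute(f"DESCRIBE SELECT * FROM {quote_sql_literal(sample_filename)} LIMIT 0")
```

This only guards the filename, of course; it does not make interpolating `encoded_columns` into `SELECT {...}` safe.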
Replies: 2 comments 6 replies
-
Would it make sense for the Columns value to be emitted as JSON? Then one could potentially use duckdb's json functions to parse the output. |
-
I think that the "correct" fix here is that the `columns` type should be a list of structs, and not a varchar |
> I think that the "correct" fix here is that the `columns` type should be a list of structs, and not a varchar
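With a LIST of STRUCTs, DuckDB's Python client would surface the value as a list of dicts, so no string re-parsing would be needed at all. A sketch of what consuming that shape might look like (the value below is illustrative, not real sniff_csv output):

```python
# Hypothetical list-of-structs Columns field as the DuckDB Python
# client would return it: a plain list of dicts.
columns = [
    {"name": "country", "type": "VARCHAR"},
    {"name": "user's count", "type": "BIGINT"},  # apostrophe is harmless here
]

column_names = [col["name"] for col in columns]
column_types = {col["name"]: col["type"] for col in columns}
```

That would sidestep both the injection concern and the unescaped-apostrophe bug in one change.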