Releases: pathwaycom/pathway

v0.12.0

10 Jun 06:06

Added

  • pw.PyObjectWrapper that enables passing python objects of any type to the engine.
  • cache_strategy option added for pw.io.http.rest_connector. It enables cache configuration, which is useful for duplicated requests.
  • allow_misses argument to Table.ix and Table.ix_ref methods. When it is set, rows whose keys are missing are filled with None values.
  • pw.io.deltalake.write output connector that streams the changes of a given table into a DeltaLake storage.
  • pw.io.airbyte.read now supports data extraction with Google Cloud Runs.
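
The allow_misses behavior can be pictured with a plain-Python model. This is only a sketch of the semantics described in the note, not Pathway code: the helper ix_lookup and the dict-based tables are hypothetical, and raising KeyError in the strict case is an assumption.

```python
from typing import Any

Row = dict[str, Any]


def ix_lookup(
    table: dict[str, Row],
    keys: list[str],
    columns: list[str],
    allow_misses: bool = False,
) -> list[Row]:
    """Look up each key in `table`; a hypothetical stand-in for Table.ix."""
    result = []
    for key in keys:
        if key in table:
            result.append({c: table[key][c] for c in columns})
        elif allow_misses:
            # Missing key: emit a row filled with None values.
            result.append({c: None for c in columns})
        else:
            # Assumption: a strict lookup fails on a missing key.
            raise KeyError(key)
    return result


owners = {"a": {"name": "Alice"}, "b": {"name": "Bob"}}
rows = ix_lookup(owners, ["a", "z"], ["name"], allow_misses=True)
print(rows)  # [{'name': 'Alice'}, {'name': None}]
```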

Removed

  • BREAKING: Removed Table.having method.
  • BREAKING: Removed pw.DATE_TIME_UTC, pw.DATE_TIME_NAIVE and pw.DURATION as dtype markers. Use pw.DateTimeUtc, pw.DateTimeNaive and pw.Duration instead, which are wrappers for the corresponding pandas types.
  • BREAKING: Removed class transformers from public API: pw.ClassArg, pw.attribute, pw.input_attribute, pw.input_method, pw.method, pw.output_attribute and pw.transformer.
  • BREAKING: Removed several methods from pw.indexing module: binsearch_oracle, filter_cmp_helper, filter_smallest_k and prefix_sum_oracle.

v0.11.2

27 May 08:33

Added

  • pathway.assert_table_has_schema and pathway.table_transformer now accept the allow_subtype argument, which, if True, allows column types in the Table to be subtypes of the types in the Schema.
  • next method to pw.io.python.ConnectorSubject (python connector) that enables passing values of any type to the engine, not only values that are json-serializable. The next method should be the preferred way of passing values from the python connector.

Changed

  • The format argument of pw.io.python.read is deprecated. A data format is inferred from the method used (next_json, next_str, next_bytes) and the provided schema.

Removed

  • Removed pw.numba_apply and numba dependency.

Fixed

  • Fixed pw.this desugaring bug, where __getitem__ in .ix context was not working properly.
  • pw.io.sqlite.read now checks if the data matches the passed schema.

v0.11.1

16 May 19:30

Added

  • query and query_as_of_now of pathway.stdlib.indexing.data_index.DataIndex now accept a column with data of type str | None in the metadata_column parameter.
  • pathway.xpacks.connectors.sharepoint module under Pathway for Business License.

v0.11.0

10 May 14:56

Added

  • Embedders in the LLM xpack now have a get_embedding_dimension method that returns the number of dimensions used by the chosen embedder.
  • pathway.stdlib.indexing.nearest_neighbors, with implementations of pathway.stdlib.indexing.data_index.InnerIndex based on k-NN via LSH (implemented in Pathway), and k-NN provided by USearch library.
  • pathway.stdlib.indexing.vector_document_index, with a few predefined instances of pathway.stdlib.indexing.data_index.DataIndex.
  • pathway.stdlib.indexing.bm25, with implementations of pathway.stdlib.indexing.data_index.InnerIndex based on BM25 index provided by Tantivy.
  • pathway.stdlib.indexing.full_text_document_index, with a predefined instance of pathway.stdlib.indexing.data_index.DataIndex.
  • Introduced the reranker module under llm.xpacks. It includes a few re-ranking strategies and utility functions for RAG applications.

Changed

  • BREAKING: windowby generates IDs of produced rows differently than in the previous version.
  • BREAKING: pw.io.csv.write prints printable non-ascii characters as regular text, not \u{xxxx}.
  • BREAKING: Connector methods pw.io.elasticsearch.read, pw.io.debezium.read, pw.io.fs.read, pw.io.jsonlines.read, pw.io.kafka.read, pw.io.python.read, pw.io.redpanda.read, pw.io.s3.read now check the type of the input data. Previously, no check was performed when the provided format was "json" or "jsonlines". If the data is inconsistent with the provided schema, the row is skipped and an error message is emitted.
  • BREAKING: query and query_as_of_now methods of pathway.stdlib.indexing.data_index.DataIndex now return pathway.JoinResult, to allow resolving column name conflicts (between columns in the table with queries and table with index data).
  • BREAKING: DataIndex methods query and query_as_of_now now return score in a column named _pw_index_reply_score (defined as _SCORE variable in pathway.stdlib.indexing.colnames.py).
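
The new input type checking (skip the row, emit an error) can be illustrated in plain Python. This is a sketch of the described behavior only, not the connectors' implementation: SCHEMA, parse_rows and the logger name are hypothetical.

```python
import json
import logging

logger = logging.getLogger("schema_check_sketch")

# Hypothetical schema: column name -> expected Python type.
SCHEMA = {"user": str, "amount": int}


def parse_rows(lines, schema):
    """Yield rows that match `schema`; skip and report the rest."""
    for line in lines:
        row = json.loads(line)
        bad = [
            name
            for name, expected in schema.items()
            if not isinstance(row.get(name), expected)
        ]
        if bad:
            # Inconsistent row: report it and move on without emitting it.
            logger.error("skipping row %r: bad columns %s", row, bad)
            continue
        yield {name: row[name] for name in schema}


lines = ['{"user": "ann", "amount": 3}', '{"user": "bob", "amount": "three"}']
good = list(parse_rows(lines, SCHEMA))
print(good)  # [{'user': 'ann', 'amount': 3}]
```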

Removed

  • BREAKING: pathway.stdlib.indexing.data_index.VectorDocumentIndex class. Some predefined instances are now obtained via methods provided in pathway.stdlib.indexing.vector_document_index.
  • BREAKING: with_distances parameter of query and query_as_of_now methods in pathway.stdlib.indexing.data_index.DataIndex. Instead of 'distance', we now operate with a more general term 'score' (higher = better). For distance-based indices, score is usually defined as negative distance. Score is now always included in the answer, as long as the underlying index returns something that indicates the quality of a match.
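
The "score is negative distance, higher = better" convention can be shown with a small plain-Python ranking. This is an illustrative sketch, not the DataIndex implementation: euclidean, top_k and the sample vectors are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query, documents, k):
    """Return the k best (name, score) pairs, best first."""
    # Score is the negated distance, so a closer document scores higher.
    scored = [(name, -euclidean(query, vec)) for name, vec in documents.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


docs = {"a": (3.0, 4.0), "b": (0.0, 1.0), "c": (6.0, 8.0)}
best = top_k((0.0, 0.0), docs, 2)
print(best)  # [('b', -1.0), ('a', -5.0)]
```

Sorting by descending score gives the same order as sorting by ascending distance, which is why one "higher = better" column can serve both distance-based and other indices.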

v0.10.1

30 Apr 12:25

Added

  • query method to VectorStoreServer, providing an API compatible with DataIndex.
  • AdaptiveRAGQuestionAnswerer to xpacks.question_answering. End-to-end pipeline and accompanying code for Private RAG showcase.

v0.10.0

24 Apr 22:21

Added

  • Pathway now warns when unintentionally creating a Table with an empty universe.
  • pw.io.kafka.write in raw and plaintext formats now supports output for tables with multiple columns. For such tables, it requires specifying the column to use as the value of the produced Kafka messages, and optionally accepts a column to use as the key.
  • pw.io.kafka.write can now output values from the table using Kafka message headers in 'raw' and 'plaintext' output format.

Changed

  • instance arguments to groupby, join, with_id_from now determine how entries are distributed between machines.
  • flatten results remain on the same machine as their source entries.
  • join sends each record between machines at most once.
  • BREAKING: flatten, join, groupby (if used with instance), with_id_from (if used with instance) generate IDs of the produced rows differently than in the previous versions.
  • pathway spawn with multiple workers prints only output from the first worker.
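
One way to picture "instance determines how entries are distributed between machines" is a deterministic mapping from the instance value to a worker. The release notes do not describe the actual scheme, so the hashing below is purely an assumption, and worker_for is a hypothetical helper.

```python
import zlib


def worker_for(instance: str, n_workers: int) -> int:
    """Map an instance value to a worker index (hypothetical scheme)."""
    # crc32 is used instead of hash() so the result is stable across runs.
    return zlib.crc32(instance.encode("utf-8")) % n_workers


events = [("eu", 1), ("us", 2), ("eu", 3), ("us", 4)]
placement = {}
for instance, value in events:
    placement.setdefault(worker_for(instance, 4), []).append((instance, value))

# All entries that share an instance land on the same worker.
print(placement)
```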

v0.9.0

18 Apr 21:01

Added

  • pw.reducers.latest and pw.reducers.earliest that return the value with the maximal and minimal processing time assigned, respectively.
  • pw.io.kafka.write can now produce messages containing raw bytes in case the table consists of a single binary column and raw mode is specified. Similarly, this method will provide plaintext messages if plaintext mode is chosen and the table consists of a single string-typed column.
  • pw.io.pubsub.write connector for publishing Pathway tables into Google PubSub.
  • Argument strict_prompt to answer_with_geometric_rag_strategy and answer_with_geometric_rag_strategy_from_index that allows optimizing prompts for smaller open-source LLM models.
  • Temporarily switched LiteLLMChat's generation method to the sync version due to a bug when using json mode with Ollama.
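
The semantics of the latest and earliest reducers listed above can be modeled in plain Python. This is a sketch only: the functions and the (processing_time, value) pairs are hypothetical stand-ins, not the pw.reducers API.

```python
def latest(entries):
    """Return the value with the maximal processing time."""
    return max(entries, key=lambda entry: entry[0])[1]


def earliest(entries):
    """Return the value with the minimal processing time."""
    return min(entries, key=lambda entry: entry[0])[1]


# Each entry is a (processing_time, value) pair.
readings = [(10, "cold"), (30, "hot"), (20, "warm")]
print(latest(readings), earliest(readings))  # hot cold
```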

Changed

  • BREAKING: pw.io.kafka.read no longer parses messages from UTF-8 when raw mode is specified. To preserve the previous behavior, use the plaintext mode.
  • BREAKING: Table.flatten now flattens one column and spreads every other column of the table, instead of taking other columns from the argument list.
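
The new Table.flatten behavior (flatten one column, spread every other column) can be modeled with lists of dicts. This is a plain-Python sketch of the semantics, not Pathway code; the flatten function here is a hypothetical stand-in.

```python
def flatten(rows, column):
    """Emit one row per element of `column`, copying all other columns."""
    out = []
    for row in rows:
        for item in row[column]:
            new_row = dict(row)      # every other column is carried over
            new_row[column] = item   # the flattened column holds one element
            out.append(new_row)
    return out


rows = [{"user": "ann", "tags": ["x", "y"]}, {"user": "bob", "tags": ["z"]}]
flat = flatten(rows, "tags")
for row in flat:
    print(row)
```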

v0.8.6

10 Apr 20:16

Added

  • pw.io.bigquery.write connector for writing Pathway tables into Google BigQuery.
  • filepath_globpattern parameter to the query method of VectorStoreClient, for specifying which files should be considered in the query.
  • Improved compatibility of pw.Json with standard methods such as len(), int(), float(), bool(), iter(), reversed() when feasible.
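
Restricting a query to files that match a glob pattern, as filepath_globpattern does, can be sketched with the standard library. The exact matching rules used by VectorStoreClient are not stated here, so fnmatchcase is an assumption (note that its * also matches path separators), and filter_by_glob is a hypothetical helper.

```python
from fnmatch import fnmatchcase

documents = [
    {"path": "docs/guide.md", "text": "How to start"},
    {"path": "docs/api.md", "text": "API reference"},
    {"path": "notes/todo.txt", "text": "Buy milk"},
]


def filter_by_glob(docs, filepath_globpattern=None):
    """Keep only documents whose path matches the pattern (None keeps all)."""
    if filepath_globpattern is None:
        return list(docs)
    return [d for d in docs if fnmatchcase(d["path"], filepath_globpattern)]


selected = [d["path"] for d in filter_by_glob(documents, "docs/*.md")]
print(selected)  # ['docs/guide.md', 'docs/api.md']
```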

Changed

  • pw.io.postgres.write can now parallelize writes to several threads if several workers are configured.
  • Pathway now checks types of pointers rigorously. Indexing a table with a number or types of columns that differ from those used to create the index now results in a TypeError.
  • pw.Json.as_float() method now supports integer JSON values.

v0.8.5

27 Mar 22:03

Added

  • New function answer_with_geometric_rag_strategy_from_index, which allows using answer_with_geometric_rag_strategy without first retrieving documents from the index.
  • Added support for custom state serialization to udf_reducer.
  • Introduced instance parameter in AsyncTransformer. All calls with a given (instance, processing_time) pair are returned at the same processing time. Ordering is preserved within a single instance.
  • Added successful, failed, finished properties to AsyncTransformer. They return tables with successful calls, failed calls and all finished calls, respectively.

Changed

  • Property result of AsyncTransformer is deprecated. Property successful should be used instead.
  • pw.io.csv.read, pw.io.jsonlines.read, pw.io.fs.read, pw.io.plaintext.read now handle path as a glob pattern and read all matched files and directories recursively.
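
Treating a path as a glob pattern and reading matched directories recursively can be sketched with the standard library. This only illustrates the described behavior and is not how the connectors are implemented: expand_path is a hypothetical helper and the file layout is made up for the demo.

```python
import glob
import os
import tempfile


def expand_path(pattern):
    """Return all files matched by `pattern`, descending into matched directories."""
    files = set()
    for match in glob.glob(pattern):
        if os.path.isdir(match):
            # A matched directory contributes every file beneath it.
            for root, _dirs, names in os.walk(match):
                files.update(os.path.join(root, name) for name in names)
        else:
            files.add(match)
    return sorted(files)


with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "data", "2024"))
    for parts in (("data", "a.csv"), ("data", "2024", "b.csv"), ("other.csv",)):
        with open(os.path.join(tmp, *parts), "w") as handle:
            handle.write("x\n1\n")
    # "da*" matches the "data" directory but not "other.csv".
    found = [os.path.relpath(p, tmp) for p in expand_path(os.path.join(tmp, "da*"))]
    print(found)
```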

v0.8.4

18 Mar 17:52

Fixed

  • Pathway now requires the LiteLLM package only if you use one of the LiteLLM wrappers.
  • Retries are implemented in pw.io.airbyte.read.
  • State processing protocol is updated in pw.io.airbyte.read.