[MAINTENANCE] Remove batching regex from FilePathDataConnector #9898

Merged on May 16, 2024 (100 commits). The diff view shows changes from 1 commit.

Commits
8f7754a: wip (joshua-stauffer, May 7, 2024)
8b2b948: Merge branch 'develop' into m/v1-290/remove_batching_regex_from_asset… (joshua-stauffer, May 7, 2024)
46547ce: remove batching regex from FilePathDataConnector (joshua-stauffer, May 7, 2024)
6cf78be: docstrings (joshua-stauffer, May 7, 2024)
b256a90: fix unit tests (joshua-stauffer, May 7, 2024)
845f17c: remove regex from assets (joshua-stauffer, May 7, 2024)
981fae7: fix some tests (joshua-stauffer, May 7, 2024)
4840009: schema sync (cdkini, May 8, 2024)
53846b4: patch test_config.py (cdkini, May 8, 2024)
65f159c: clean up more tests (cdkini, May 8, 2024)
afc5609: patch filesystem data connector tests (cdkini, May 8, 2024)
cc3a83f: Merge branch 'develop' of https://github.com/great-expectations/great… (cdkini, May 8, 2024)
0a6b5f3: Merge branch 'develop' of https://github.com/great-expectations/great… (cdkini, May 8, 2024)
0336144: Merge branch 'develop' of https://github.com/great-expectations/great… (cdkini, May 9, 2024)
50ba788: update dbfs connector (cdkini, May 9, 2024)
2fb1e03: more cleanup (cdkini, May 9, 2024)
e41fb92: update partitioner name (cdkini, May 9, 2024)
be24e0d: get pandas filesystem passing (cdkini, May 9, 2024)
99e495f: more work (cdkini, May 9, 2024)
d33dd65: more progress (cdkini, May 9, 2024)
0e366bd: misc updates (cdkini, May 9, 2024)
729ea7c: schema sync (cdkini, May 9, 2024)
2338e77: unit test patch (cdkini, May 9, 2024)
f93eaa9: spark tests (cdkini, May 9, 2024)
7364ec9: mypy (cdkini, May 9, 2024)
a99fb2c: Merge branch 'develop' of https://github.com/great-expectations/great… (cdkini, May 10, 2024)
698f94a: mypy (cdkini, May 10, 2024)
75800ba: Merge branch 'develop' of https://github.com/great-expectations/great… (cdkini, May 10, 2024)
1bb978d: mypy (cdkini, May 10, 2024)
22d903e: mypy (cdkini, May 10, 2024)
c1730f7: another one (cdkini, May 10, 2024)
ddecfb8: Merge branch 'develop' of https://github.com/great-expectations/great… (cdkini, May 13, 2024)
59c1828: fix test_filesystem_asset_workflows.py (joshua-stauffer, May 13, 2024)
69db012: s3 connector tests (cdkini, May 13, 2024)
d882f89: Merge branch 'm/v1-290/remove_batching_regex_from_asset_sig' of https… (cdkini, May 13, 2024)
51dd490: spark s3 datasource tests (cdkini, May 13, 2024)
bf97780: add another fork to FilePathDataConnector (joshua-stauffer, May 13, 2024)
183a9cd: use batch def (joshua-stauffer, May 13, 2024)
84294dd: Merge branch 'develop' of https://github.com/great-expectations/great… (cdkini, May 14, 2024)
4680832: Merge branch 'develop' of https://github.com/great-expectations/great… (cdkini, May 14, 2024)
23f704e: patch integration conftest (cdkini, May 14, 2024)
666301a: dbfs tests (cdkini, May 14, 2024)
195c7de: spark filesystem datasource tests (cdkini, May 14, 2024)
7648dac: more spark filesystem datasource tests (cdkini, May 14, 2024)
a199710: mypy (cdkini, May 14, 2024)
749855e: mypy (cdkini, May 14, 2024)
c6eda54: Merge branch 'develop' of https://github.com/great-expectations/great… (cdkini, May 14, 2024)
e53bdea: fix spark tests using regex in asset init (joshua-stauffer, May 14, 2024)
e69263d: delete test connection test (cdkini, May 14, 2024)
b77f498: patch pandas s3 datasource tests (cdkini, May 14, 2024)
5494e27: Merge branch 'm/v1-290/remove_batching_regex_from_asset_sig' of https… (cdkini, May 14, 2024)
a3d56b6: add _preprocess_batching_regex overrides (cdkini, May 14, 2024)
d1b6394: gcs (cdkini, May 14, 2024)
fcf6877: misc (cdkini, May 14, 2024)
a13d048: Merge branch 'develop' of https://github.com/great-expectations/great… (cdkini, May 14, 2024)
44421b2: patch test (cdkini, May 14, 2024)
80861e2: use alternative syntax (cdkini, May 14, 2024)
ee16dc2: Merge branch 'develop' of https://github.com/great-expectations/great… (cdkini, May 14, 2024)
b3f338b: directory data connector returns fully qualified path (joshua-stauffer, May 15, 2024)
722cadc: patch more (cdkini, May 15, 2024)
497613a: docs tests (cdkini, May 15, 2024)
78044e6: update fixture, remove test_connectio ntest (joshua-stauffer, May 15, 2024)
e1e4322: update glossary_batch_request (cdkini, May 15, 2024)
7c84cf6: Merge branch 'm/v1-290/remove_batching_regex_from_asset_sig' of https… (cdkini, May 15, 2024)
4f9f3ab: update doc snippet (joshua-stauffer, May 15, 2024)
32c4acd: add name to batch def (joshua-stauffer, May 15, 2024)
b859e22: update script to pass while waiting for datasource docs updates (joshua-stauffer, May 15, 2024)
97d57d8: update batch_parameter_keys to not duplicate (joshua-stauffer, May 15, 2024)
3e16ecd: ensure tests are using canonical regexes (joshua-stauffer, May 15, 2024)
0e2e76a: more tests which should be failing (joshua-stauffer, May 15, 2024)
07ce4b4: Merge branch 'develop' of https://github.com/great-expectations/great… (cdkini, May 15, 2024)
3ba54af: Merge branch 'm/v1-290/remove_batching_regex_from_asset_sig' of https… (cdkini, May 15, 2024)
eef02c2: update pyi (joshua-stauffer, May 15, 2024)
aeed333: fix tests (joshua-stauffer, May 15, 2024)
e28da70: update pyi files (cdkini, May 15, 2024)
91dec17: start cleaning up mypy errors (cdkini, May 15, 2024)
ef1d2d2: more mypy (cdkini, May 15, 2024)
3e74ceb: non-functional test cleanup (joshua-stauffer, May 15, 2024)
5438034: fix patch path (joshua-stauffer, May 15, 2024)
b0bfd54: more mypy (cdkini, May 15, 2024)
3d8bfae: patch fluent conftest (cdkini, May 15, 2024)
6f83cfa: gcs test (cdkini, May 15, 2024)
ee31783: cleanup azure tests (joshua-stauffer, May 15, 2024)
ebca39f: patch test_get_batch_list_from_fully_specified_batch_request (cdkini, May 15, 2024)
53fff98: fix how_to_connect_to_data_on_s3_using_pandas.py (joshua-stauffer, May 15, 2024)
a0006b5: start on docs errors (cdkini, May 15, 2024)
0aabd8d: Merge branch 'm/v1-290/remove_batching_regex_from_asset_sig' of https… (cdkini, May 15, 2024)
3ef4691: test moving preprocess step (cdkini, May 15, 2024)
b9cbaea: preprocess another instance (cdkini, May 15, 2024)
0a29b23: add comments (cdkini, May 15, 2024)
636e137: Merge branch 'develop' of https://github.com/great-expectations/great… (cdkini, May 16, 2024)
601293b: try to only preprocess when regex is provided through partioner (cdkini, May 16, 2024)
fd4ab11: add back regex preprocessing (joshua-stauffer, May 16, 2024)
835fabd: bugfix (joshua-stauffer, May 16, 2024)
44a647f: fix docs tests by removing unnecessary prefix (joshua-stauffer, May 16, 2024)
f5d7c41: [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], May 16, 2024)
9b50e3d: cleanup remaining tests (joshua-stauffer, May 16, 2024)
3f5c8ab: remove comments (joshua-stauffer, May 16, 2024)
5d26129: cleanup unnecessary fork (joshua-stauffer, May 16, 2024)
b9e6f81: add back fork for legacy tests ;( (joshua-stauffer, May 16, 2024)

Diff view
@@ -212,7 +212,8 @@ def _get_unfiltered_batch_definition_list(
         batch_definition_set = set()
         # if the batch request hasn't specified a batching_regex, fallback to a default
         if batch_request.partitioner:
-            batching_regex = self._preprocess_batching_regex(batch_request.partitioner.regex)
+            # --- HEY JOSH! ---: Is this where we want to preprocess?
+            batching_regex = batch_request.partitioner.regex
joshua-stauffer marked this conversation as resolved.
         else:
             # todo: remove
             batching_regex = MATCH_ALL_PATTERN
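In the hunk above, the partitioner's regex is now used unmodified, with `MATCH_ALL_PATTERN` as the fallback when the batch request has no partitioner. A minimal sketch of that selection, assuming hypothetical `Partitioner` and `BatchRequest` stand-ins that carry only what the hunk touches, and assuming `MATCH_ALL_PATTERN` is a pattern matching any path (its real value is not shown in this diff):

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumption: stand-in for the connector's default; the real value is not in this diff.
MATCH_ALL_PATTERN = re.compile(r".*")


@dataclass
class Partitioner:
    """Hypothetical minimal partitioner: only the compiled regex matters here."""

    regex: re.Pattern


@dataclass
class BatchRequest:
    """Hypothetical minimal batch request with an optional partitioner."""

    partitioner: Optional[Partitioner] = None


def select_batching_regex(batch_request: BatchRequest) -> re.Pattern:
    """Use the partitioner's regex as-is when present, else match everything."""
    if batch_request.partitioner:
        # No preprocessing at this point, mirroring the "+" lines of the hunk.
        return batch_request.partitioner.regex
    return MATCH_ALL_PATTERN


monthly = re.compile(r"yellow_tripdata_(?P<year>\d{4})-(?P<month>\d{2})\.csv")
print(select_batching_regex(BatchRequest(Partitioner(monthly))).pattern)
print(select_batching_regex(BatchRequest()).pattern)
```

Because the fallback is returned by identity, callers that key caches on the pattern object see one stable key for "no partitioner".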
@@ -299,7 +300,9 @@ def _get_batch_spec_params_file(self, batch_definition: LegacyBatchDefinition) -
         """File specific implementation of batch spec parameters"""
         if not batch_definition.batching_regex:
             raise RuntimeError("BatchDefinition must contain a batching_regex.") # noqa: TRY003
-        batching_regex = self._preprocess_batching_regex(batch_definition.batching_regex)
+
+        # --- HEY JOSH! ---: Is this where we want to preprocess?
+        batching_regex = batch_definition.batching_regex
 
         regex_parser = RegExParser(
             regex_pattern=batching_regex,
@@ -330,6 +333,7 @@ def _get_batch_spec_params_directory(self, batch_definition: LegacyBatchDefiniti
 
     def _preprocess_batching_regex(self, regex: re.Pattern) -> re.Pattern:
         """Add the FILE_PATH_BATCH_SPEC_KEY group to regex if not already present."""
+        # --- HEY JOSH! ---: Is this where we want to preprocess?
         regex_parser = RegExParser(
             regex_pattern=regex,
             unnamed_regex_group_prefix=self._unnamed_regex_group_prefix,
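The hunk above shows only the start of `_preprocess_batching_regex`, whose docstring says it adds the `FILE_PATH_BATCH_SPEC_KEY` group when the regex lacks it. A minimal sketch of that idea using only the standard `re` module, assuming the added group simply wraps the whole pattern and assuming a placeholder key value of `"path"`; the real constant and the `RegExParser` internals are not shown here:

```python
import re

# Assumption: placeholder for FILE_PATH_BATCH_SPEC_KEY; the real value is not in this diff.
FILE_PATH_BATCH_SPEC_KEY = "path"


def preprocess_batching_regex(regex: re.Pattern) -> re.Pattern:
    """Return regex with a named group spanning the whole match, adding it if absent.

    Assumes the pattern has no leading inline global flags such as (?i), which
    cannot be moved inside a group; pass flags to re.compile instead.
    """
    if FILE_PATH_BATCH_SPEC_KEY in regex.groupindex:
        return regex  # group already present: return the pattern unchanged
    wrapped = f"(?P<{FILE_PATH_BATCH_SPEC_KEY}>{regex.pattern})"
    return re.compile(wrapped, regex.flags)  # keep the original flags


raw = re.compile(r"data_(?P<year>\d{4})\.csv")
processed = preprocess_batching_regex(raw)
match = processed.match("data_2024.csv")
print(match.group(FILE_PATH_BATCH_SPEC_KEY), match.group("year"))  # data_2024.csv 2024
```

Calling the function again on `processed` returns the same object, so the step is idempotent. In this sketch the wrapper becomes group 1, so every unnamed group shifts one position to the right; any scheme that names unnamed groups by position would produce different names before and after preprocessing.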
@@ -346,7 +350,7 @@ def _get_data_references_cache(
         self, batching_regex: re.Pattern
     ) -> Dict[str, List[LegacyBatchDefinition] | None]:
         """Access a map where keys are data references and values are LegacyBatchDefinitions."""
-        batching_regex = self._preprocess_batching_regex(regex=batching_regex)
+        # --- HEY JOSH! ---: Is this where we want to preprocess?
         batch_definitions = self._data_references_cache[batching_regex]
         if batch_definitions:
             return batch_definitions
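With the preprocessing line removed, `_get_data_references_cache` indexes `self._data_references_cache` with whatever regex the caller passes, which is why the comments ask where preprocessing should happen. A minimal sketch of why the placement matters, using a hypothetical `RegexKeyedCache` (not the connector's real cache) keyed by compiled patterns and the same assumed wrapping rule and `"path"` placeholder as a simple preprocessing step: a raw pattern and its preprocessed form are different keys, so mixing them computes and stores the same matches twice.

```python
import re
from typing import Dict, List

# Assumption: placeholder key and wrapping rule; the real preprocessing is not shown here.
FILE_PATH_BATCH_SPEC_KEY = "path"


def preprocess(regex: re.Pattern) -> re.Pattern:
    """Wrap the pattern in the named group unless it already has it."""
    if FILE_PATH_BATCH_SPEC_KEY in regex.groupindex:
        return regex
    return re.compile(f"(?P<{FILE_PATH_BATCH_SPEC_KEY}>{regex.pattern})", regex.flags)


class RegexKeyedCache:
    """Hypothetical cache: batching regex -> data references it matches."""

    def __init__(self, data_references: List[str]) -> None:
        self._data_references = data_references
        self._cache: Dict[re.Pattern, List[str]] = {}
        self.misses = 0  # how often the matches had to be recomputed

    def get(self, batching_regex: re.Pattern) -> List[str]:
        if batching_regex not in self._cache:
            self.misses += 1
            self._cache[batching_regex] = [
                ref for ref in self._data_references if batching_regex.match(ref)
            ]
        return self._cache[batching_regex]


refs = ["data_2023.csv", "data_2024.csv", "notes.txt"]
raw = re.compile(r"data_(?P<year>\d{4})\.csv")

mixed = RegexKeyedCache(refs)
mixed.get(raw)              # one caller passes the raw regex
mixed.get(preprocess(raw))  # another passes the preprocessed one
print(mixed.misses)         # 2: same logical regex, two cache entries

key = preprocess(raw)
consistent = RegexKeyedCache(refs)
consistent.get(key)
consistent.get(key)
print(consistent.misses)    # 1
```

Preprocessing in exactly one place, either always before the lookup or never, keeps one entry per logical regex.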
@@ -396,6 +400,7 @@ def _build_batch_definition(
     def _build_batch_identifiers(
         self, data_reference: str, batching_regex: re.Pattern
     ) -> Optional[IDDict]:
+        # --- HEY JOSH ---: Is this where we want to preprocess?
         regex_parser = RegExParser(
             regex_pattern=batching_regex,
             unnamed_regex_group_prefix=self._unnamed_regex_group_prefix,
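`_build_batch_identifiers` and `_get_batch_spec_params_file` both hand the batching regex to `RegExParser` together with an `unnamed_regex_group_prefix`. A minimal stand-in using only `re`, assuming that named groups become identifier keys and that unnamed groups are keyed by a prefix plus their position; the prefix value and this naming rule are assumptions for illustration, not the library's documented behavior:

```python
import re
from typing import Dict, Optional

# Assumption: illustrative prefix; the connector's real value is not in this diff.
UNNAMED_REGEX_GROUP_PREFIX = "batch_request_param_"


def build_batch_identifiers(
    data_reference: str,
    batching_regex: re.Pattern,
    unnamed_prefix: str = UNNAMED_REGEX_GROUP_PREFIX,
) -> Optional[Dict[str, Optional[str]]]:
    """Map each capture group of batching_regex to its value in data_reference.

    Returns None when the data reference does not match the regex.
    """
    match = batching_regex.match(data_reference)
    if match is None:
        return None
    # groupindex maps name -> position; invert it to look up names by position.
    names_by_position = {pos: name for name, pos in batching_regex.groupindex.items()}
    identifiers: Dict[str, Optional[str]] = {}
    for position in range(1, batching_regex.groups + 1):
        key = names_by_position.get(position, f"{unnamed_prefix}{position}")
        identifiers[key] = match.group(position)  # None if the group did not take part
    return identifiers


regex = re.compile(r"data_(?P<year>\d{4})-(\d{2})\.csv")
print(build_batch_identifiers("data_2024-05.csv", regex))  # {'year': '2024', 'batch_request_param_2': '05'}
print(build_batch_identifiers("notes.txt", regex))  # None
```

Returning `None` for a non-matching reference mirrors the `Optional[IDDict]` return type in the hunk; whether the real parser uses positional names for unnamed groups is not visible in this diff.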