-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-41558: [C++] Improve fixed_width_test_util.h #41575
Conversation
|
@felipecrv Hello. Is this what you were thinking? I would appreciate it if you could review it when you have time. Thank you. |
#include <cstddef> | ||
#include <cstdint> | ||
#include <memory> | ||
#include <vector> | ||
|
||
#include "arrow/array/builder_primitive.h" | ||
#include "arrow/builder.h" | ||
#include "arrow/type.h" | ||
#include "arrow/util/checked_cast.h" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant reducing the code in the header. We should keep the necessary includes.
The class should only declare the public function members here and all the private members and template code can be defined in the .cc
file (to be created) outside of the class.
See how chunk_resolver.h/cc
does it reducing the bloat in the header:
arrow/cpp/src/arrow/chunk_resolver.cc
Lines 44 to 45 in 51689a0
template <typename T> | |
inline std::vector<int64_t> MakeChunksOffsets(const std::vector<T>& chunks) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, so I need to organize the code according to C++ conventions. I should include only the declarations in the .h file and put the implementations in the .cc file.
I will refer to the code you shared, organize it accordingly, and then upload it.
Thank you for your guidance.
|
|
1 similar comment
|
@@ -120,84 +170,6 @@ class NestedListGenerator { | |||
} | |||
return nested_builder->Finish(); | |||
} | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should have the definition of NestedListArray
in the .cc
as well.
75ed690
to
d4961bc
Compare
@felipecrv Hello, and thank you for your review. I have applied the changes as advised, but I am a bit unsure about one aspect, particularly because I am not very familiar with template functions. Should the Additionally, I made modifications to the I would greatly appreciate your advice on these matters. Thank you! |
Sorry, I missed that it was called from others. We can make it a non-template by using - /// \tparam Visit a function type with signature
- /// void(const std::shared_ptr<DataType>& inner_type,
- /// const std::vector<int>& list_sizes)
- template <class Visit>
static void VisitAllNestedListConfigurations(
- const std::vector<std::shared_ptr<DataType>>& inner_value_types, Visit&& visit,
+ const std::vector<std::shared_ptr<DataType>>& inner_value_types,
+ const std::function<void(const std::shared_ptr<DataType>& inner_type,
+ const std::vector<int>& list_sizes)> &visit,
int max_depth = 3, int max_power_of_2_size = 32) {
We avoid relative paths in CMakeLists.txt because that gets very confusing very quickly. You should list this new arrow/cpp/src/arrow/CMakeLists.txt Lines 637 to 643 in 2ca9ad2
The trick to find where to list new |
I think I applied everything you suggested:
Additionally, I want to express my gratitude for your patience and thorough review. It has been instrumental in completing this PR. Thank you! cc @felipecrv |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additionally, I want to express my gratitude for your patience and thorough review. It has been instrumental in completing this PR.
Thank you for not giving up ;) C++ is weird.
I added comments about includes, but other than that everything is looking very good.
// specific language governing permissions and limitations | ||
// under the License. | ||
|
||
#include "fixed_width_test_util.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The linter is complaining about this line because includes should be fully qualified based on the cpp/src/
root, so this should be "arrow/testing/fixed_width_test_util.h"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the removal of include bloat in the header, you should also add all the includes for things referred from this file:
"arrow/array/builder_primitive.h"
forNumericBuilder
andArrowBuilder
"arrow/array/builder_base.h"
forMakeBuilder
[1]<cstdint>
<limits>
<vector>
<functional>
"arrow/type.h"
[1] my header had an include to arrow/builder.h
, but that was unnecessary now that I thought more about it
#include <vector> | ||
|
||
#include "arrow/builder.h" | ||
#include "arrow/testing/visibility.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should try to "include what you use" [1] as much as possible. Even when headers that we need are already included by other headers.
<memory>
<functional>
<cstdint>
"arrow/type.h"
forDataType
(complete type is needed for the pointers and vector declarations)"arrow/type_fwd.h"
forward-declaresArrayBuilder
(which we don't need to be a complete type in the header because we only pointers/references to it in this header)
With these included, you should be able to remove "arrow/builder.h"
which is a very expensive header.
[1] https://github.com/include-what-you-use/include-what-you-use/blob/master/docs/WhyIWYU.md
template Status AppendNumeric<Int8Type>(ArrayBuilder* builder, int64_t* next_value); | ||
template Status AppendNumeric<Int16Type>(ArrayBuilder* builder, int64_t* next_value); | ||
template Status AppendNumeric<Int32Type>(ArrayBuilder* builder, int64_t* next_value); | ||
template Status AppendNumeric<Int64Type>(ArrayBuilder* builder, int64_t* next_value); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need these (in this context) because since this template is in an anon namespace the compiler will only instantiate the 4 instances that are referred to in this compilation unit.
8a3f0b5
to
f76416f
Compare
@felipecrv I applied your suggestions Thank you for review :) |
After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit 7aff9d5. There was 1 benchmark result indicating a performance regression:
The full Conbench report has more details. It also includes information about 2 possible false positives for unstable benchmarks that are known to sometimes produce them. |
### Rationale for this change Improve the `fixed_width_test_util.h`. ### What changes are included in this PR? - Move the `fixed_width_test_util.h` to `arrow/testing` - Divide the `fixed_width_test_util` to `.cc` and `.h` - Remove unused headers ### Are these changes tested? Yes ### Are there any user-facing changes? No * GitHub Issue: apache#41558 Authored-by: Hyunseok Seo <hsseo0501@gmail.com> Signed-off-by: Felipe Oliveira Carvalho <felipekde@gmail.com>
Rationale for this change
Improve the
fixed_width_test_util.h
.What changes are included in this PR?
fixed_width_test_util.h
toarrow/testing
fixed_width_test_util
to.cc
and.h
Are these changes tested?
Yes
Are there any user-facing changes?
No