-
Notifications
You must be signed in to change notification settings - Fork 45
feat: add find field (by id and name) support to schema #180
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
feat: add find field (by id and name) support to schema #180
Conversation
nullccxsy
commented
Aug 15, 2025
- add insensitive way to find schemafield(list, struct, map)
- change the complexity of find name to O(1)
- test insensitive way to find schemafield(list, struct, map)
adc4817
to
ec73749
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
3b0653a
to
ae4515e
Compare
0394c01
to
ef4b2de
Compare
src/iceberg/schema.cc
Outdated
NametoIdVisitor::GetPath(path, std::string(field.name()), case_sensitive_); | ||
short_full_path = GetPath(short_path, std::string(field.name()), case_sensitive_); | ||
name_to_id_[full_path] = field.field_id(); | ||
name_to_id_.emplace(short_full_path, field.field_id()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check the return value and error out if failed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it fails, it means that the short name conflicts with the canonical name. In this case, the short name is discarded, so don't need to check the return value.
8319453
to
8ba65bf
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the update! I think we are close to the finish line.
Please rename to title to |
Please note that the java impl has |
src/iceberg/schema.h
Outdated
Result<Status> InitIdToIndexMap() const; | ||
Result<Status> InitNameToIndexMap() const; | ||
Result<Status> InitLowerCaseNameToIndexMap() const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are they "Ensure" rather than Init
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May name it LazyIdToIndexMap
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The purpose here is not "Init" ( Maybe reinitialize ) Mapping, it do Ensure
the mapping to be "initialized", I guess EnusreIdToIndexMap()
is ok to me.
Generally, LazyIdToIndexMap
seems to be another api like Result<const map<...>&> LazyIdToIndexMap
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally I feel a bit afraid about lazy initialize mapping, since it would be thread-unsafetly. But maybe it's not a severe problem here.
For lazy initialization, we may leverage |
So this requires a mutex and three once_flag? It might be huge but it's ok to me... |
ok, this will be fixed |
We can support this is another patch, currently we can add a todo here. |
0f85748
to
f0dbff4
Compare
9de800c
to
c3b4bd3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks @nullccxsy!
d5f8d92
to
4df8eb7
Compare
src/iceberg/schema.h
Outdated
/// | ||
/// Enables std::unordered_map to directly accept std::string_view lookup keys | ||
/// without creating temporary std::string objects, using C++20's transparent lookup. | ||
struct string_hash { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to move it to string_utils.h so type.h can reuse it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we can do this altogether in #194
- Add GetFieldByName method to support nested field queries with dot notation - Implement InitNameToIndexMap and InitIdToIndexMap for field mapping - Introduce SchemaFieldVisitor for schema traversal - Implement name pruning logic for element and value in nested structure
- Introduce IdVisitor for schema traversal to init id_to_index_(Map) - Introduce NameVisitor for schema traversal to init name_to_index_(Map), lowercase_name_to_index_(map)
…ava impl 1. Refactor IdVisitor: Add VisitNested() to handle NestedType, make Visit() generic for all Type hierarchy (avoid anti-pattern) 2. Merge id_to_index_ + full_schemafield_ into id_to_field_ (std::unordered_map<int32_t, const ref<SchemaField>>) to reduce memory footprint 3. Align with Java impl's idToField design (support future alias/identifier fields without refactor)
- change the type of id from size_t to int_32 - rename GetPath to BuildPath - fix bug new_short_path should build from short_path, not path
…sult call in macros - Added logic to detect and handle duplicate names in NameToIdVisitor to prevent invalid schema errors. - Fixed duplicate result call issue in src/iceberg/util/macros.h by optimizing ICEBERG_RETURN_UNEXPECTED macro.
… conversion function - Implemented detection for duplicate field IDs, Names in visitors, returning kInvalidSchema error with message. - Updated lowercase conversion in NameToIdVisitor
…reduce copies and unify initialization - Switched field_ and schema_ members in test classes to std::unique_ptr for unique ownership and to avoid duplicate and avoid creating duplicate objects.
… std::string creation
17bcac2
to
9aa221d
Compare