Skip to content

Conversation

josecsotomorales
Copy link
Contributor

@josecsotomorales josecsotomorales commented Sep 16, 2025

Description

  • Make StringConverter to serialize Databricks complex objects (arrays/maps/structs), JDBC arrays/structs, generic collections, and fall back to toString() so the driver no longer hits Invalid conversion to String
  • Relax ComplexDataTypeParser so ISO timestamps with T separators and timezone offsets parse correctly, avoiding Arrow ingestion failures
  • Extend the unit suite to cover the new string handling and offset timestamp scenarios

Testing

  • Added specific tests which can be executed with mvn -Dtest=StringConverterTest,ComplexDataTypeParserTest test

Additional Notes to the Reviewer

  • Sanity-check the new fallback path in StringConverter—it now trusts arbitrary toString() implementations when no explicit formatter applies

- extend `StringConverter` to stringify Databricks complex wrappers, JDBC arrays/structs, and generic collections while falling back to `toString` for unknown objects (src/main/java/com/databricks/jdbc/api/impl/converters/StringConverter.java:25)
- relax `ComplexDataTypeParser` timestamp parsing to accept ISO strings with `T` separators and timezone offsets (src/main/java/com/databricks/jdbc/api/impl/ComplexDataTypeParser.java:196)
- broaden regression coverage for the new string handling and offset timestamp parsing (src/test/java/com/databricks/jdbc/api/impl/converters/StringConverterTest.java:166, src/test/java/com/databricks/jdbc/api/impl/ComplexDataTypeParserTest.java:134)
…lidation

  - Extend StringConverter to stringify Databricks complex wrappers (DatabricksArray,
    DatabricksMap, DatabricksStruct), JDBC arrays/structs, and generic collections
    while falling back to toString() for unknown objects
  - Add VARIANT type validation to ensure only String objects are accepted,
    maintaining backward compatibility
  - Relax ComplexDataTypeParser timestamp parsing to accept ISO strings with
    'T' separators and timezone offsets (e.g., "2025-03-18T12:08:31.552223-07:00")
  - Broaden test coverage for new string handling and offset timestamp parsing
@josecsotomorales
Copy link
Contributor Author

@jayantsing-db, this PR fixed other major issues with complex datatype parsing. Can you please review?

…lidation

  - Extend StringConverter to stringify Databricks complex wrappers (DatabricksArray,
    DatabricksMap, DatabricksStruct), JDBC arrays/structs, and generic collections
    while falling back to toString() for unknown objects
  - Add VARIANT type validation to ensure only String objects are accepted,
    maintaining backward compatibility
  - Relax ComplexDataTypeParser timestamp parsing to accept ISO strings with
    'T' separators and timezone offsets (e.g., "2025-03-18T12:08:31.552223-07:00")
  - Broaden test coverage for new string handling and offset timestamp parsing

Signed-off-by: josecsotomorales <[email protected]>
…ality

- Test SQL Array handling with mock objects and null cases
- Test generic Struct conversion with attribute arrays and null handling
- Test generic Map and Collection string conversion with mixed types
- Test Java array conversion for primitive and object arrays
- Test string escaping functionality via reflection
- Test nested data structure conversion (maps in collections, etc.)
- Add SQLException error path testing for robustness

These tests improve branch coverage to meet the 85% requirement while
ensuring all new complex type conversion paths are properly tested.

Signed-off-by: josecsotomorales <[email protected]>
@josecsotomorales
Copy link
Contributor Author

@jayantsing-db, this PR fixed other major issues with complex datatype parsing. Can you please review?

Do you have any feedback on this PR? This currently affects users who encounter string parsing issues in fields with complex data types. It would be great if we could get it merged.

Copy link
Collaborator

@samikshya-db samikshya-db left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one minor comment, LGTM otherwise. Adding @shivam2680 to double check conversions

Address PR review feedback by simplifying timestamp parsing logic:
- Try Timestamp.valueOf() on original text first
- If that fails, try with T replaced by space
- Only fall back to custom parser for complex formats
- Extract custom parsing logic to separate method for clarity

Also update NEXT_CHANGELOG.md to properly describe all changes in this PR.

Signed-off-by: josecsotomorales <[email protected]>
@josecsotomorales josecsotomorales force-pushed the fix-complex-type-conversion branch from 3d252ec to 7c7140c Compare September 25, 2025 19:07
- Refactor ComplexDataTypeParser to reuse TimestampConverter logic,
  eliminating duplicate timestamp parsing code while maintaining
  backward compatibility for timezone offset handling
- Remove unnecessary stripQuotes() call since JsonNode.asText()
  already strips JSON quotes
- Remove redundant DatabricksStruct instanceof check in StringConverter
- Add defensive null checks for map entries and collection parameters
- Standardize comma handling pattern across all StringConverter methods
- Use WildcardUtil.isNullOrEmpty() for consistency with codebase

Signed-off-by: josecsotomorales <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants