WstxSAXParser error handling when used with JAXB validation #196

@winfriedgerlach

Use case: JAXB unmarshalling with schema validation enabled, unmarshalling an invalid XML document
Environment: WstxSAXParser 6.6.0 for parsing, JDK 17 Xerces for validation, jaxb-runtime 4.0.4
Expected outcome: org.xml.sax.SAXParseException with a useful message (e.g., "Content of element x incomplete, y expected")
Actual outcome: java.lang.AssertionError without any useful message or cause

We are using the Woodstox SAX parser with JAXB as a drop-in replacement for JDK's default SAX parser due to better performance. We have enabled XML Schema validation for JAXB by calling Unmarshaller.setSchema(). As Woodstox does not provide a javax.xml.validation.SchemaFactory, JAXB automatically falls back to JDK's built-in Xerces for XML Schema validation.
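For reference, the expected outcome can be reproduced without JAXB or Woodstox. The following is a minimal JDK-only sketch, assuming that the validation step boils down to a `ValidatorHandler` sitting between the SAX parser and the downstream handler. The class name `ValidationSketch` and the tiny schema are made up for illustration and are not taken from our application.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.ValidatorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class ValidationSketch {
    // Hypothetical schema: element x must contain exactly one child y.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='x'><xs:complexType><xs:sequence>"
      + "<xs:element name='y' type='xs:string'/>"
      + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    /** Returns null for a valid document, otherwise the validator's message. */
    static String validate(String xml) {
        try {
            Schema schema = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
            // No ErrorHandler is set, so validation errors are thrown.
            ValidatorHandler validator = schema.newValidatorHandler();
            validator.setContentHandler(new DefaultHandler());

            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // required for schema validation
            XMLReader reader = factory.newSAXParser().getXMLReader();
            reader.setContentHandler(validator);
            reader.parse(new InputSource(new StringReader(xml)));
            return null;
        } catch (SAXParseException e) {
            return e.getMessage();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(validate("<x><y>ok</y></x>")); // valid document
        System.out.println(validate("<x/>"));             // y is missing
    }
}
```

With the JDK's default SAX parser, the second call reports the validator's message about the incomplete content of `x`, which is the kind of message we would like to see with Woodstox as well.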

We can confirm that this setup works: XML messages are parsed and validated. Unfortunately, due to the error handling in WstxSAXParser, the helpful SAXParseException generated by Xerces during XML validation never reaches the caller. Instead, jaxb-runtime throws an AssertionError in UnmarshallingContext.endDocument():

https://github.com/eclipse-ee4j/jaxb-ri/blob/13402b7f1f19c840206936b56ad73e693eff2463/jaxb-ri/runtime/impl/src/main/java/org/glassfish/jaxb/runtime/v2/runtime/unmarshaller/UnmarshallingContext.java#L584-L585

When looking at the relevant code in Woodstox, two things come to mind:

    } catch (IOException io) {
        throwSaxException(io);
    } catch (XMLStreamException strex) {
        throwSaxException(strex);
    } finally {
        if (mContentHandler != null) {
            mContentHandler.endDocument();
        }

  1. the SAXParseException from schema validation is not handled in the catch (might be OK?)
  2. in finally, endDocument() is called, which leads to the AssertionError in UnmarshallingContext.endDocument()
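The effect of point 2 can be shown in isolation. This is a minimal sketch, not Woodstox or JAXB code: `StrictHandler` is a hypothetical stand-in for a handler that, loosely like UnmarshallingContext, checks its own state in endDocument(), and `parseWithFinally` is a hypothetical driver that mimics the try/finally shape quoted above.

```java
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

class FinallySketch {
    /** Fails on the first element, then expects never to see endDocument(). */
    static class StrictHandler extends DefaultHandler {
        private boolean elementOpen;

        @Override
        public void startElement(String uri, String local, String qName,
                                 Attributes atts) throws SAXException {
            elementOpen = true;
            // Stand-in for the validation error raised downstream.
            throw new SAXParseException("Content of element x incomplete", null);
        }

        @Override
        public void endDocument() {
            if (elementOpen) {
                throw new AssertionError(); // no message, no cause
            }
        }
    }

    /** Mimics a parser that always fires endDocument() from finally. */
    static void parseWithFinally(ContentHandler handler) throws SAXException {
        try {
            handler.startDocument();
            handler.startElement("", "x", "x", new AttributesImpl());
            handler.endElement("", "x", "x");
        } finally {
            // If this throws, the pending SAXParseException is discarded.
            handler.endDocument();
        }
    }

    /** Returns the simple class name of whatever the caller ends up seeing. */
    static String observe() {
        try {
            parseWithFinally(new StrictHandler());
            return "no exception";
        } catch (Throwable t) {
            return t.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(observe()); // prints "AssertionError"
    }
}
```

Because an exception thrown from a finally block replaces the one already in flight, the caller only sees the AssertionError and the SAXParseException with the useful message is lost.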

If I understand the JDK's Javadoc correctly, it seems to be wrong to call endDocument() in case of a "fatal error", which may explain the different behavior when using the Xerces SAX parser:

https://github.com/openjdk/jdk/blob/b419e9517361ed9d28f8ab2f5beacf5adfe3db91/src/java.xml/share/classes/org/xml/sax/ContentHandler.java#L144-L151

Could the endDocument() call be moved out of the finally block without breaking anything else? Looking forward to your opinion!
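To make the question concrete, here is a rough sketch of the alternative, under the assumption that endDocument() is fired only after the event loop has completed normally. `Recorder` and `fire` are hypothetical names for illustration. This is not a patch against WstxSAXParser, and any cleanup the real finally block may perform is not modeled here.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

class EndDocumentOnSuccessSketch {
    /** Records the callbacks it receives; optionally fails on startElement. */
    static class Recorder extends DefaultHandler {
        final List<String> events = new ArrayList<>();
        private final boolean fail;

        Recorder(boolean fail) {
            this.fail = fail;
        }

        @Override
        public void startDocument() {
            events.add("startDocument");
        }

        @Override
        public void startElement(String uri, String local, String qName,
                                 Attributes atts) throws SAXException {
            if (fail) {
                throw new SAXParseException("invalid", null);
            }
            events.add("startElement");
        }

        @Override
        public void endDocument() {
            events.add("endDocument");
        }
    }

    /** Fires endDocument() as the last regular step instead of from finally. */
    static void fire(ContentHandler handler) throws SAXException {
        handler.startDocument();
        handler.startElement("", "x", "x", new AttributesImpl());
        handler.endElement("", "x", "x");
        handler.endDocument(); // reached only if nothing above threw
    }

    static List<String> run(boolean fail) {
        Recorder recorder = new Recorder(fail);
        try {
            fire(recorder);
        } catch (SAXException e) {
            recorder.events.add("caught " + e.getMessage());
        }
        return recorder.events;
    }

    public static void main(String[] args) {
        System.out.println(run(false));
        System.out.println(run(true));
    }
}
```

In the failing run the handler never receives endDocument(), so the original SAXParseException reaches the caller unchanged, while the successful run still ends with endDocument() as before.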
