Spring Introduce hook in ResourceEntityResolver to override fallback to remote access for DTD/XSD resolution

This is definitely true on Spring 6.0.2, I suspect it's equally true for 5.3.x and likely everything going back many years...

I would not characterize this as a security issue, but it seems like a poor default decision to allow the org.springframework.beans.factory.xml.ResourceEntityResolver class to fall through to remote http(s) resolution of XML entities.

I would think at minimum that ought to be off by default, or certainly an optional behavior at least, but there's no obvious way I saw to override that behavior save for duplicating the class and injecting the replacement into our custom bean defintion reader.

There's also a nasty bug in Java's XML parser that causes it to handle null coming back from an EntityResolver by just doing its own resolution, but throwing an exception out of the resolveEntity call does prevent that. So I think it would be a good improvement to either default throw, or at least provide some kind of option to do so.

Comment From: sbrannen

In which scenarios are you using ResourceEntityResolver to resolve DTDs or XSDs in XML from untrusted sources?

Comment From: scantor

It happens automatically when you happen to, for example, reference a schema with an import in it. If you accomodate the import in the appropriate way via spring.schemas, it works, but an accidental mistake on the classpath can leave the import un-hijacked, so to speak, and it falls through to remotely attempt resolution.

That's how we noticed the behavior. We've frequently had to react to accidental remote fetches over the years, this time I just happened to have the time to bottom out the logic and locate the spot I needed to replace, but once I realized what it was doing, I felt it was worth mentioning.

I don't argue that it's unreasonable to allow remote access (well, I would, but I'm not going to argue it here), but as a default, it didn't seem ideal to me, and there also didn't appear to be any direct means of preventing it without actually copying the Spring class and injecting the "part" of it that isn't doing that.

Comment From: scantor

I guess to be fair, there's another consideration here, which is that the documentation and behavior of Java's XML parser, with regard to the EntityResolver interface, is that returning null results in the systemId being directly resolved anyway. So another way to frame this I guess is, if all you're doing is fetching the systemId remotely, that's effectively what Java's doing already. So it's plausible to say "hey, write less code".

So having said that, if you were to consider this behavior suboptimal and change it, you'd really have to change it to throw an exception, since removing the section of code outright won't change the behavior anyway.

Comment From: jhoeller

We could log a warning (or info statement) whenever we fall back to remote access there. After all, accidental remote access is unfortunate for cloud deployment, even more so for native images where one certainly intends to include all relevant resources.

Comment From: scantor

Another possible suggestion: rewrite the ResourceEntityResolver slightly to expose a protected method for doing the remote fetch step, so subclassers could easily block that step without having ot duplicate the earlier section of code with the local resolution.

Comment From: sbrannen

Let's go with a combination of the two proposals.

Extract the External dtd/xsd lookup via https functionality to a new protected method that subclasses can override.
Log a warning (or info) statement in that newly extracted method.

Comment From: sbrannen

Superseded by #29697

Comment From: scantor

It appears the hook/fix for this was done without allowing the protected method to raise an IOException, so there's no way to properly signal back "don't do this" since returning null will just fall through.

Comment From: simonbasle

@scantor since the protected method is expected to be overridden, isn't it possible cover that case by having the override throw a RuntimeException ?

Comment From: scantor

I don't know what that will do exactly, since your base class doesn't catch that. It may be more fatal than it should otherwise be, but it will definitely cause non-localized logging and handling of the condition.

Comment From: simonbasle

the fact that the fallback is hit means the resourcePath could not be defined. if an override of that particular flavor of resolver can't be allowed to fall back to resolving the sytemId as a URL and can't produce a fallback InputSource, then I'd argue it has indeed encountered a fatal case that calling code needs to deal with anyway.

I'm not sure we can add the throws SAXException, IOException clause at this point, breaking implementations that could potentially have been made since this was released. although it's unlikely there's been any in these 4 months except for yours)...

Comment From: scantor

Ok, I did run a quick trial and it lands the exception in more or less the same try/catch block as it did before, so I can live with that answer, I just wanted to note the inconsistency.