TL;DR:
It'd be very useful to have a parameter to turn off the body returned by BasicErrorController. There're already some customization options available as to what to include in the body, but there's no switch to omit the entire body. In the following case, the failure only happened in rare cases, and didn't affect the actual response. However, the response was rendered unusable due to the error attributes added by BasicErrorController
.
Long story:
We recently upgraded from 2.1.2.RELEASE to 2.3.4, and in the process, were hit by a quite hard to find bug.
- We use
spring-cloud-starter-netflix-zuul
, and have written custom Zuul filters. One of those filters manipulates the response. - After the upgrade, we started seeing instances where the following would be appended to the response. This appeared to be happening randomly.
{
"timestamp": "2021-05-06T03:52:06.332+00:00",
"status": 200,
"error": "OK",
"message": "",
"path": "/my-service/v2.1/search/"
}
- After lots of digging, we found that WebMvcMetricsFilter.getTimer was throwing an exception. We export metrics to Graphite using a custom
HierarchicalNameMapper
, and due to a change in behavior in the code, the mapper was mapping two different metrics to the same hierarchical path. This manifested as the following exception in the logs:
A metric named some_prefix.gateway.gauge.http.server.requests.percentile.0_5.localhost.some_path.200 already exists
at c.c.m.MetricRegistry.register(MetricRegistry.java:156)
at i.m.c.i.d.DropwizardMeterRegistry.newGauge(DropwizardMeterRegistry.java:98)
- Due to the aforementioned exception,
BasicErrorController.error
was triggered, which returns a map of error attributes that translates to the JSON being appended to the response.
In the end, we fixed the HierarchicalNameMapper
, and things returned to normal. However, the behavior described above isn't new; BasicErrorController
has existed since 1.0.0
, and the only reason we didn't see this issue before was because there was no uncaught exception in the Spring Boot filters.
Comment From: bclozel
Hello @asarkar
I'm sorry you had to deal with this situation. Is that "change in behavior in the code" in Spring Boot or somewhere else? Is there something helpful we could add to the release notes to help other developers?
If I understand correctly, this problem is the unfortunate consequence of several things happening at the same time:
- a "change in behavior" that causes a bug in a custom metrics component (and throws exceptions)
- Servlet response wrappers (probably involved with the Zuul filters) that manipulate the response body and prevents the response from being committed
I'm not sure that providing a configuration property to disable the error response body entirely would have helped here. At best, you would have been able to hide the problem: the response body would look right but metrics would still be wrong and your application would be throwing lots of exceptions at runtime.
If I'm not mistaken, Spring Boot is already checking in several places that the response is not already committed before trying to write to the response body in error handling cases. Maybe Zuul here is making those checks useless. Catching exceptions thrown by the metrics infrastructure would also hide the problem.
I guess you noticed the issue by looking at HTTP responses? Did something fail in your test suite? How did you track down the source of the problem? Did you have the relevant exceptions stacktraces in the logs?
Comment From: asarkar
@bclozel
- "change in behavior in the code" is our problem, not Spring's.
configuration property to disable the error response body entirely
- This would be helpful. The exception to create a new timer (from
WebMvcMetricsFilter
) is already logged at error level, so there's no chance of it going unnoticed. Also, metrics is a non-functional requirements, and even if we didn't have one metric for the call, the response is still perfectly valid. It doesn't look good for us if we fail to provide a response because we couldn't create a metric.
response is not already committed
- It's possible the response isn't committed in our custom Zuul filters. I'd not go out my way to fix it at this time, because
spring-cloud-starter-netflix-zuul
is in maintenance mode; we plan to move to Spring Cloud Gateway in near future.
Catching exceptions thrown by the metrics infrastructure would also hide the problem.
- You can always log it at error level if the body is disabled (or always). You don't have to catch it even, you could just log the error attributes goes in the response body.
Did something fail in your test suite? 5. Yes.
How did you track down the source of the problem? 6. I work with some smart people, and I've good debugging skills.
Did you have the relevant exceptions stacktraces in the logs? 7. Not something that directly led me to the root cause. There was a stacktrace that indicated the failure to create the metric, but nothing from
BasicErrorController.
(also see point 4)
Comment From: bclozel
Thanks for your comments.
This is indeed a very specific situation and the information at hand was enough to track down the origin of the problem.
Disabling the response body would only partially hide the problem, as other metrics and the HTTP response (status and headers) might still be wrong. I don't think we should introduce this feature and I don't see how the current support could be improved to work around such issues.
I think that excluding the ErrorMvcAutoConfiguration
is the best way to ensure that error handling is completely disabled and prevents such problems.
Thanks!