After a Spring Boot WebSocket client connects to the Spring Boot WebSocket server, the server sends some heartbeats, and then after a few minutes the trace logs show "Connection to broker closed.". Removing setTaskScheduler() from the client seems to make it work without problems.
WebSocket client with SockJsClient and heartbeats enabled. WebSocket server with enableStompBrokerRelay() (ActiveMQ) and withSockJS().
Here is a POC with the client and server; it needs an ActiveMQ broker: https://github.com/rd-o/websocket-spring-boot-bug
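For reference, the client side looks roughly like this (a minimal sketch with assumed values and a placeholder URL; the actual code is in the linked POC):
import java.util.Collections;
import java.util.List;
import org.springframework.messaging.simp.stomp.StompSessionHandlerAdapter;
import org.springframework.scheduling.concurrent.ThreadPoolTaskScheduler;
import org.springframework.web.socket.client.standard.StandardWebSocketClient;
import org.springframework.web.socket.messaging.WebSocketStompClient;
import org.springframework.web.socket.sockjs.client.SockJsClient;
import org.springframework.web.socket.sockjs.client.Transport;
import org.springframework.web.socket.sockjs.client.WebSocketTransport;

List<Transport> transports = Collections.singletonList(
        new WebSocketTransport(new StandardWebSocketClient()));
WebSocketStompClient stompClient = new WebSocketStompClient(new SockJsClient(transports));

// Configuring a TaskScheduler is what enables client-side heartbeats;
// removing this call is what avoids the disconnects described above
ThreadPoolTaskScheduler scheduler = new ThreadPoolTaskScheduler();
scheduler.initialize();
stompClient.setTaskScheduler(scheduler);
stompClient.setDefaultHeartbeat(new long[] {10000, 10000});  // assumed intervals

stompClient.connect("ws://localhost:4321/gs-guide-websocket",  // placeholder URL
        new StompSessionHandlerAdapter() {});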
Comment From: rd-o
2020-01-02 17:07:49.992 DEBUG 3329 --- [nboundChannel-6] org.springframework.web.SimpLogging : Cleaning up connection state for session 81b3512603de443c92c3d65c63756df7
2020-01-02 17:07:49.992 TRACE 3329 --- [tboundChannel-8] org.springframework.web.SimpLogging : Encoding STOMP ERROR, headers={message=[failed to forward heart-beat in session 81b3512603de443c92c3d65c63756df7]}
2020-01-02 17:07:49.993 DEBUG 3329 --- [nboundChannel-6] org.springframework.web.SimpLogging : Closing TCP connection in session 81b3512603de443c92c3d65c63756df7
2020-01-02 17:07:49.993 DEBUG 3329 --- [ent-scheduler-2] org.springframework.web.SimpLogging : Cleaning up connection state for session 81b3512603de443c92c3d65c63756df7
2020-01-02 17:07:49.994 TRACE 3329 --- [tboundChannel-8] s.w.s.s.t.s.WebSocketServerSockJsSession : Preparing to write SockJsFrame content='a["ERROR\nmessage:failed to forward heart-beat in session 81b3512603de443c92c3d6...(truncated)'
2020-01-02 17:07:49.994 TRACE 3329 --- [tboundChannel-8] s.w.s.s.t.s.WebSocketServerSockJsSession : Writing SockJsFrame content='a["ERROR\nmessage:failed to forward heart-beat in session 81b3512603de443c92c3d6...(truncated)'
2020-01-02 17:07:49.994 TRACE 3329 --- [tboundChannel-9] org.springframework.web.SimpLogging : Encoding STOMP ERROR, headers={message=[Connection to broker closed.]}
2020-01-02 17:07:49.994 TRACE 3329 --- [tboundChannel-8] o.s.w.s.adapter.NativeWebSocketSession : Sending TextMessage payload=[a["ERROR\n..], byteCount=120, last=true], StandardWebSocketSession[id=40594c37-39f4-75e0-9927-9216996c87d3, uri=ws://localhost:4321/gs-guide-websocket/236/81b3512603de443c92c3d65c63756df7/websocket]
2020-01-02 17:07:49.995 TRACE 3329 --- [tboundChannel-9] .s.h.ConcurrentWebSocketSessionDecorator : Another send already in progress: session id '81b3512603de443c92c3d65c63756df7':, "in-progress" send time 2 (ms), buffer size 62 bytes
2020-01-02 17:07:49.995 TRACE 3329 --- [tboundChannel-8] s.w.s.s.t.s.WebSocketServerSockJsSession : Preparing to write SockJsFrame content='a["ERROR\nmessage:Connection to broker closed.\ncontent-length:0\n\n\u0000"]'
2020-01-02 17:07:49.996 DEBUG 3329 --- [tboundChannel-9] s.w.s.s.t.s.WebSocketServerSockJsSession : Closing SockJS session 81b3512603de443c92c3d65c63756df7 with CloseStatus[code=1002, reason=null]
Comment From: wilkinsona
This may be a symptom of AMQ-4710. @rd-o do you see the same problem with a different broker?
Comment From: spring-projects-issues
If you would like us to look at this issue, please provide the requested information. If the information is not provided within the next 7 days this issue will be closed.
Comment From: spring-projects-issues
Closing due to lack of requested feedback. If you would like us to look at this issue, please provide the requested information and we will re-open the issue.
Comment From: kandonorbert
We are facing the same issue after updating the Spring Boot version from 1.5.12.RELEASE to 2.1.9.RELEASE. The ActiveMQ version is 5.15.3. Randomly we get an exception in the logs stating that the connection was lost (code 1002). ActiveMQ is up and running, with no errors in its logs. It looks like only a restart of the Spring Boot application fixes the connection; it is not re-established without a restart. We had no such problems in the past 2.5 years with the old Spring Boot version and the same ActiveMQ instance. Now we are planning to go live with the new Spring Boot, but this issue looks like a very risky one. Any feedback would be appreciated.
Comment From: wilkinsona
1002 indicates that the connection was closed due to a protocol error. Without some more information, the code alone isn't enough to confirm that it's the same issue as was originally reported here. If you can capture the problem with debug or trace logging enabled, that may help to determine the nature of the error. If you manage to do that or you can provide a minimal sample that reproduces the problem, please open a new issue and we can investigate.
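For anyone trying to capture this, trace logging along these lines in application.properties should surface the relay and SockJS activity (a sketch; exactly which categories to raise is a judgment call):
logging.level.org.springframework.web.socket=TRACE
logging.level.org.springframework.messaging.simp=TRACE
logging.level.org.springframework.messaging.tcp=TRACE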
Comment From: lrozenblyum
@kandonorbert, @rd-o were you able to pin down the problem or find a solution for it? It looks like we're affected by a similar random 1002 issue after upgrading Spring WebSocket from 4.3.17.RELEASE to 5.3.3.
Comment From: rd-o
@lrozenblyum
I couldn't find a solution for this
Comment From: jbakoc1
@rd-o I have the same issue and I believe it is related to the Page Visibility functionality (https://developer.mozilla.org/en-US/docs/Web/API/Page_Visibility_API). After some time the browser suspends the tab and the heartbeat is not processed. But I don't know why we are getting 1002 instead of 1001. @wilkinsona, can we reopen the issue and provide you with all the info you need?
Comment From: wilkinsona
> But I don't know why we are getting 1002 instead of 1001
This sort of protocol-level problem is unlikely to be caused by Spring Boot, as it's only involved in configuring some of the components and not in their intercommunication. If your client is receiving a 1002, it's likely to be coming from the STOMP broker. Which broker are you using?
Comment From: jbakoc1
RabbitMQ 3.8.16 with STOMP plugin
Comment From: jbakoc1
> But I don't know why we are getting 1002 instead of 1001
> This sort of protocol-level problem is unlikely to be caused by Spring Boot, as it's only involved in configuring some of the components and not in their intercommunication. If your client is receiving a 1002, it's likely to be coming from the STOMP broker. Which broker are you using?
What kind of configuration do you mean? There are no firewalls involved between the Spring host and the Rabbit broker, and the default TCP port 61613 is open between them. Can you point me in the right direction?
Comment From: wilkinsona
I mean the configuration of Spring Framework's STOMP support.
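That is, a broker relay registration along these lines (a sketch with placeholder host/port; the heartbeat intervals shown are the relay's documented defaults for its shared "system" connection):
import org.springframework.context.annotation.Configuration;
import org.springframework.messaging.simp.config.MessageBrokerRegistry;
import org.springframework.web.socket.config.annotation.EnableWebSocketMessageBroker;
import org.springframework.web.socket.config.annotation.StompEndpointRegistry;
import org.springframework.web.socket.config.annotation.WebSocketMessageBrokerConfigurer;

@Configuration
@EnableWebSocketMessageBroker
public class WebSocketConfig implements WebSocketMessageBrokerConfigurer {

    @Override
    public void configureMessageBroker(MessageBrokerRegistry registry) {
        registry.setApplicationDestinationPrefixes("/app");
        registry.enableStompBrokerRelay("/topic", "/queue")
                .setRelayHost("localhost")   // placeholder
                .setRelayPort(61613)         // default STOMP port
                // Heartbeats on the relay's "system" TCP connection to the broker
                .setSystemHeartbeatSendInterval(10000)
                .setSystemHeartbeatReceiveInterval(10000);
    }

    @Override
    public void registerStompEndpoints(StompEndpointRegistry registry) {
        registry.addEndpoint("/gs-guide-websocket").withSockJS();
    }
}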
Can you reproduce the problem with debug or trace level logging enabled both in your Boot app and in the Rabbit broker? It's not yet clear to me why you think this is a Spring Boot problem rather than a problem in one of the other components that's involved. Seeing some logs that capture the problem occurring should help to narrow down the cause.
Comment From: jbakoc1
@wilkinsona here is the log from Spring Boot. I am trying to reach the Rabbit admin to get the logs from Rabbit... spring-trace-logs.txt
Comment From: wilkinsona
Thanks for the logs.
It looks like the app has received a DISCONNECT from the client:
2021-10-01 11:41:08,042 TRACE org.springframework.web.socket.messaging.StompSubProtocolHandler [https-openssl-apr-443-exec-2] From client: DISCONNECT receipt=close-2 session=5d1d057a-c5b2-2105-a087-098804f469d7
It forwards this to the broker:
2021-10-01 11:41:08,043 DEBUG org.springframework.core.log.CompositeLog [clientInboundChannel-8] Forwarding DISCONNECT receipt=close-2 session=5d1d057a-c5b2-2105-a087-098804f469d7
The connection to the broker is then closed:
2021-10-01 11:41:08,047 DEBUG org.springframework.core.log.CompositeLog [tcp-client-scheduler-1] TCP connection to broker closed in session 5d1d057a-c5b2-2105-a087-098804f469d7
And this results in an error being sent to the client:
2021-10-01 11:41:08,049 TRACE org.springframework.core.log.CompositeLog [clientOutboundChannel-7] Encoding STOMP ERROR, headers={message=[Connection to broker closed.]}
It looks to me like something is going wrong in the handling of the DISCONNECT. Either the connection is being closed when it should not have been, or the relay isn't coping as it should with the connection being closed. This code is part of Spring Framework, so we'll transfer this issue so that the Framework team can take a look.
Comment From: rd-o
> @rd-o I have the same issue and I believe it is related to the Page Visibility functionality (https://developer.mozilla.org/en-US/docs/Web/API/Page_Visibility_API). After some time the browser suspends the tab and the heartbeat is not processed. But I don't know why we are getting 1002 instead of 1001.
@jbakoc1 In my case both the client and the server are using Spring Boot.
Comment From: jbakoc1
Hey @snicoll, do we have any update on this issue?
Comment From: itunic
Hello, is there a solution to this problem?
Comment From: mohammed-eladly-afaqy
Hello, is there any update on this issue?
Comment From: fabioportieri
Same problem with Spring Boot 2.7.3 and RabbitMQ 3.11, with Angular as the client. Is there a workaround? Maybe implementing a custom heartbeat mechanism?
Comment From: fabioportieri
I think I may have found the reason, at least in my case, given this log:
2023-04-05T16:50:36.608+02:00 TRACE 11760 --- [boundChannel-99] o.s.w.s.adapter.NativeWebSocketSession : Sending TextMessage payload=[a["\n"]], byteCount=7, last=true], StandardWebSocketSession[id=cb107f0a-8291-a48b-62fe-0daf796a5f1b, uri=/websocket/notificationFE/604/ajdkf4vv/websocket]
2023-04-05T16:50:41.577+02:00 DEBUG 11760 --- [ient-loop-nio-7] r.n.resources.PooledConnectionProvider : [82f5cc3f, L:/127.0.0.1:64122 ! R:localhost/127.0.0.1:15673] Channel closed, now: 1 active connections, 0 inactive connections and 0 pending acquire requests.
2023-04-05T16:50:41.579+02:00 DEBUG 11760 --- [ient-loop-nio-7] reactor.netty.ReactorNetty : [82f5cc3f, L:/127.0.0.1:64122 ! R:localhost/127.0.0.1:15673] Non Removed handler: StompMessageDecoder, context: ChannelHandlerContext(StompMessageDecoder, [id: 0x82f5cc3f, L:/127.0.0.1:64122 ! R:localhost/127.0.0.1:15673]), pipeline: DefaultChannelPipeline{(reactor.left.onChannelReadIdle = reactor.netty.ReactorNetty$InboundIdleStateHandler), (StompMessageDecoder = org.springframework.messaging.tcp.reactor.ReactorNettyTcpClient$StompMessageDecoder), (reactor.right.reactiveBridge = reactor.netty.channel.ChannelOperationsHandler)}
2023-04-05T16:50:41.579+02:00 DEBUG 11760 --- [ient-loop-nio-7] reactor.netty.ReactorNetty : [82f5cc3f, L:/127.0.0.1:64122 ! R:localhost/127.0.0.1:15673] Non Removed handler: reactor.left.onChannelReadIdle, context: ChannelHandlerContext(reactor.left.onChannelReadIdle, [id: 0x82f5cc3f, L:/127.0.0.1:64122 ! R:localhost/127.0.0.1:15673]), pipeline: DefaultChannelPipeline{(reactor.left.onChannelReadIdle = reactor.netty.ReactorNetty$InboundIdleStateHandler), (StompMessageDecoder = org.springframework.messaging.tcp.reactor.ReactorNettyTcpClient$StompMessageDecoder), (reactor.right.reactiveBridge = reactor.netty.channel.ChannelOperationsHandler)}
2023-04-05T16:50:41.580+02:00 DEBUG 11760 --- [ient-loop-nio-7] r.n.r.DefaultPooledConnectionProvider : [82f5cc3f, L:/127.0.0.1:64122 ! R:localhost/127.0.0.1:15673] onStateChange(ChannelOperations{PooledConnection{channel=[id: 0x82f5cc3f, L:/127.0.0.1:64122 ! R:localhost/127.0.0.1:15673]}}, [disconnecting])
2023-04-05T16:50:41.581+02:00 TRACE 11760 --- [oundChannel-100] s.w.s.s.t.s.WebSocketServerSockJsSession : Preparing to write SockJsFrame content='a["ERROR\nmessage:Connection to broker closed.\ncontent-length:0\n\n\u0000"]'
2023-04-05T16:50:41.581+02:00 TRACE 11760 --- [oundChannel-100] s.w.s.s.t.s.WebSocketServerSockJsSession : Writing SockJsFrame content='a["ERROR\nmessage:Connection to broker closed.\ncontent-length:0\n\n\u0000"]'
2023-04-05T16:50:41.581+02:00 TRACE 11760 --- [oundChannel-100] o.s.w.s.adapter.NativeWebSocketSession : Sending TextMessage payload=[a["ERROR\n..], byteCount=76, last=true], StandardWebSocketSession[id=cb107f0a-8291-a48b-62fe-0daf796a5f1b, uri=/websocket/notificationFE/604/ajdkf4vv/websocket]
2023-04-05T16:50:41.582+02:00 DEBUG 11760 --- [oundChannel-100] s.w.s.s.t.s.WebSocketServerSockJsSession : Closing SockJS session ajdkf4vv with CloseStatus[code=1002, reason=null]
2023-04-05T16:50:41.582+02:00 TRACE 11760 --- [oundChannel-100] s.w.s.s.t.s.WebSocketServerSockJsSession : Writing SockJsFrame content='c[1002,""]'
2023-04-05T16:50:41.582+02:00 TRACE 11760 --- [oundChannel-100] o.s.w.s.adapter.NativeWebSocketSession : Sending TextMessage payload=[c[1002,""]], byteCount=10, last=true], StandardWebSocketSession[id=cb107f0a-8291-a48b-62fe-0daf796a5f1b, uri=/websocket/notificationFE/604/ajdkf4vv/websocket]
2023-04-05T16:50:41.582+02:00 DEBUG 11760 --- [oundChannel-100] o.s.w.s.adapter.NativeWebSocketSession : Closing StandardWebSocketSession[id=cb107f0a-8291-a48b-62fe-0daf796a5f1b, uri=/websocket/notificationFE/604/ajdkf4vv/websocket]
2023-04-05T16:50:41.585+02:00 DEBUG 11760 --- [oundChannel-100] s.w.s.h.LoggingWebSocketHandlerDecorator : WebSocketServerSockJsSession[id=ajdkf4vv] closed with CloseStatus[code=1002, reason=null]
2023-04-05T16:50:41.585+02:00 DEBUG 11760 --- [oundChannel-100] o.s.w.s.m.SubProtocolWebSocketHandler : Clearing session ajdkf4vv
2023-04-05T16:50:41.586+02:00 DEBUG 11760 --- [oundChannel-100] i.d.m.w.w.WebsocketEventsListener : Websocket session disconnected: ajdkf4vv
It appears the culprit here is the Netty TCP client: most likely there is a timeout setting for idle connections that in turn closes the WebSocket. More information here: https://github.com/reactor/reactor-netty/issues/612
EDIT: I tried patching my ReactorNettyTcpClient, setting this pool:
this.poolResources = ConnectionProvider.builder("tcp-client-pool")
        .maxConnections(10000)
        .maxIdleTime(Duration.ofSeconds(Integer.MAX_VALUE))
        .maxLifeTime(Duration.ofSeconds(Integer.MAX_VALUE))
        .disposeTimeout(Duration.ofSeconds(Integer.MAX_VALUE))
        .disposeInactivePoolsInBackground(Duration.ofSeconds(Integer.MAX_VALUE), Duration.ofSeconds(Integer.MAX_VALUE))
        .build();
but it didn't have any effect; anyway, the trace log is this:
2023-04-05T17:53:50.771+02:00 DEBUG 18280 --- [ient-loop-nio-3] r.n.resources.PooledConnectionProvider : [b874ae11, L:/127.0.0.1:57159 ! R:localhost/127.0.0.1:15673] Channel closed, now: 1 active connections, 0 inactive connections and 0 pending acquire requests.
2023-04-05T17:53:50.772+02:00 DEBUG 18280 --- [ient-loop-nio-3] reactor.netty.ReactorNetty : [b874ae11, L:/127.0.0.1:57159 ! R:localhost/127.0.0.1:15673] Non Removed handler: StompMessageDecoder, context: ChannelHandlerContext(StompMessageDecoder, [id: 0xb874ae11, L:/127.0.0.1:57159 ! R:localhost/127.0.0.1:15673]), pipeline: DefaultChannelPipeline{(reactor.left.onChannelReadIdle = reactor.netty.ReactorNetty$InboundIdleStateHandler), (StompMessageDecoder = org.springframework.messaging.tcp.reactor.ReactorNettyTcpClient$StompMessageDecoder), (reactor.right.reactiveBridge = reactor.netty.channel.ChannelOperationsHandler)}
2023-04-05T17:53:50.772+02:00 DEBUG 18280 --- [ient-loop-nio-3] reactor.netty.ReactorNetty : [b874ae11, L:/127.0.0.1:57159 ! R:localhost/127.0.0.1:15673] Non Removed handler: reactor.left.onChannelReadIdle, context: ChannelHandlerContext(reactor.left.onChannelReadIdle, [id: 0xb874ae11, L:/127.0.0.1:57159 ! R:localhost/127.0.0.1:15673]), pipeline: DefaultChannelPipeline{(reactor.left.onChannelReadIdle = reactor.netty.ReactorNetty$InboundIdleStateHandler), (StompMessageDecoder = org.springframework.messaging.tcp.reactor.ReactorNettyTcpClient$StompMessageDecoder), (reactor.right.reactiveBridge = reactor.netty.channel.ChannelOperationsHandler)}
2023-04-05T17:53:50.775+02:00 TRACE 18280 --- [ient-loop-nio-3] reactor.netty.channel.ChannelOperations : [b874ae11, L:/127.0.0.1:57159 ! R:localhost/127.0.0.1:15673] Disposing ChannelOperation from a channel
java.lang.Exception: ChannelOperation terminal stack
at reactor.netty.channel.ChannelOperations.terminate(ChannelOperations.java:465)
at reactor.netty.channel.ChannelOperations.onInboundClose(ChannelOperations.java:434)
at reactor.netty.channel.ChannelOperationsHandler.channelInactive(ChannelOperationsHandler.java:73)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241)
at io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:392)
at io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:357)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241)
at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:81)
at io.netty.handler.timeout.IdleStateHandler.channelInactive(IdleStateHandler.java:277)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1405)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:901)
at io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:813)
at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:174)
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:167)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Thread.java:834)
Currently trying to figure out why the channel gets disposed.
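For anyone experimenting along the same lines: rather than patching the framework class, a customized Reactor Netty client can be supplied to the relay via setTcpClient(). A sketch, with placeholder host/port, and SO_KEEPALIVE picked purely as an example of a custom TCP option:
import io.netty.channel.ChannelOption;
import org.springframework.messaging.simp.config.MessageBrokerRegistry;
import org.springframework.messaging.simp.stomp.StompReactorNettyCodec;
import org.springframework.messaging.tcp.reactor.ReactorNettyTcpClient;

// Inside configureMessageBroker(MessageBrokerRegistry registry):
// build a relay TCP client with custom Netty options instead of the default one
ReactorNettyTcpClient<byte[]> tcpClient = new ReactorNettyTcpClient<>(
        client -> client.host("localhost").port(15673)              // placeholders
                .option(ChannelOption.SO_KEEPALIVE, Boolean.TRUE),  // example option
        new StompReactorNettyCodec());
registry.enableStompBrokerRelay("/topic").setTcpClient(tcpClient);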
Comment From: fabioportieri
I solved my issue. There was actually no need to mess with Netty/Reactor or Spring, since the problem was in my RabbitMQ broker; it went away once I configured it with
tcp_listen_options.keepalive = true
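In rabbitmq.conf, that is:
# AMQP listener keep-alive (the setting quoted above)
tcp_listen_options.keepalive = true
# presumably the STOMP listener the relay connects to has the equivalent option
# (an assumption; worth verifying against the RabbitMQ STOMP plugin docs)
stomp.tcp_listen_options.keepalive = true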
Comment From: snicoll
@rd-o and those affected, can you please check if configuring the broker to keep the connection alive fixes the issue for you as well?
Comment From: spring-projects-issues
If you would like us to look at this issue, please provide the requested information. If the information is not provided within the next 7 days this issue will be closed.
Comment From: spring-projects-issues
Closing due to lack of requested feedback. If you would like us to look at this issue, please provide the requested information and we will re-open the issue.