Restarting some of the system's business packages makes the system inaccessible; it recovers after the gateway is restarted.

Blade Unresolved 2 119
263778608 Sword Saint 2024-09-19 10:54
Bounty: 5

I. What are the steps to reproduce this problem?

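1. Restart some of the system's business packages, such as the order and oa packages. This sometimes makes other business packages inaccessible: API calls to the user and system packages fail with 404, the auth package fails authentication, the gateway becomes unavailable, and so on.

2. In the end the gateway has to be restarted before the system returns to normal.

3. Is it that the routes stored on the gateway can no longer be found after the restart? Or is there some kind of stale-version inconsistency that makes connections fail? For example, data in redis cannot be accessed either.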
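One way to test the first hypothesis is to ask the registry directly which instances it currently holds for a service and compare them with what the gateway is routing to. The sketch below assumes a Nacos 1.x server (the client classes in the stack trace further down suggest a 1.x client), the default `/nacos` context path, and the default group. The address `127.0.0.1:8848` and the service name `blade-user` in the usage note are placeholders, and `NacosInstanceQuery` is a hypothetical helper, not part of BladeX.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for inspecting what the registry knows about a service. */
final class NacosInstanceQuery {

    /** Builds the Nacos 1.x Open API URL that lists the instances of one service. */
    static String instanceListUrl(String serverAddr, String serviceName, boolean healthyOnly) {
        try {
            return "http://" + serverAddr + "/nacos/v1/ns/instance/list"
                    + "?serviceName=" + URLEncoder.encode(serviceName, "UTF-8")
                    + "&healthyOnly=" + healthyOnly;
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is guaranteed to exist on every JVM.
            throw new IllegalStateException(e);
        }
    }

    /** Performs a plain GET and returns the response body (JSON) as text. */
    static String fetch(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setConnectTimeout(3000);
        conn.setReadTimeout(3000);
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            StringBuilder body = new StringBuilder();
            String line;
            while ((line = reader.readLine()) != null) {
                body.append(line).append('\n');
            }
            return body.toString();
        } finally {
            conn.disconnect();
        }
    }
}
```

Calling `NacosInstanceQuery.fetch(NacosInstanceQuery.instanceListUrl("127.0.0.1:8848", "blade-user", true))` right after a restart would show whether the registry already lists the new instance. If it does while the gateway still fails, the stale data is more likely on the gateway side.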


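II. What result did you expect? What did you actually see?

Expected: restarting any business package has no effect on the rest of the system. Only the service of the package being restarted should be affected; other services should not be, and the gateway in particular must stay available.

Actual: the gateway is unavailable and the other business packages cannot be accessed.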



III. What product and version are you using? On what operating system?

bladex 

springblade 2.6.0.RELEASE

IV. Please provide the detailed error stack trace. This is important.

Logs from the other business services:

2024-09-18 22:38:13.302 ERROR 324061 --- [ing.beat.sender] com.alibaba.nacos.client.naming          : [NA] failed to request 

java.net.SocketTimeoutException: connect timed out
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:589)
        at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
        at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)
        at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
        at sun.net.www.http.HttpClient.<init>(HttpClient.java:242)
        at sun.net.www.http.HttpClient.New(HttpClient.java:339)
        at sun.net.www.http.HttpClient.New(HttpClient.java:357)
        at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1220)
        at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1156)
        at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1050)
        at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:984)
        at com.alibaba.nacos.client.naming.net.HttpClient.request(HttpClient.java:86)
        at com.alibaba.nacos.client.naming.net.NamingProxy.callServer(NamingProxy.java:427)
        at com.alibaba.nacos.client.naming.net.NamingProxy.reqAPI(NamingProxy.java:462)
        at com.alibaba.nacos.client.naming.net.NamingProxy.reqAPI(NamingProxy.java:395)
        at com.alibaba.nacos.client.naming.net.NamingProxy.sendBeat(NamingProxy.java:337)
        at com.alibaba.nacos.client.naming.beat.BeatReactor$BeatTask.run(BeatReactor.java:108)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)


2024-09-19 01:58:33.123 ERROR 324061 --- [ing.beat.sender] com.alibaba.nacos.client.naming          : [NA] failed to request 

java.net.SocketTimeoutException: connect timed out
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:589)
        at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
        at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)
        at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
        at sun.net.www.http.HttpClient.<init>(HttpClient.java:242)
        at sun.net.www.http.HttpClient.New(HttpClient.java:339)
        at sun.net.www.http.HttpClient.New(HttpClient.java:357)
        at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1220)
        at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1156)
        at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1050)
        at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:984)
        at com.alibaba.nacos.client.naming.net.HttpClient.request(HttpClient.java:86)
        at com.alibaba.nacos.client.naming.net.NamingProxy.callServer(NamingProxy.java:427)
        at com.alibaba.nacos.client.naming.net.NamingProxy.reqAPI(NamingProxy.java:462)
        at com.alibaba.nacos.client.naming.net.NamingProxy.reqAPI(NamingProxy.java:395)
        at com.alibaba.nacos.client.naming.net.NamingProxy.sendBeat(NamingProxy.java:337)
        at com.alibaba.nacos.client.naming.beat.BeatReactor$BeatTask.run(BeatReactor.java:108)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)




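Part of the gateway log: requests are rejected outright with a 401 authentication failure. After the gateway is restarted, however, access works normally.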

2024-09-19 09:52:31.230  INFO 5476 --- [r-http-epoll-31] o.springblade.gateway.filter.AuthFilter  : -----AuthFilter filter isSkip-----/blade-user/getUserInfo?phone=17603059577
2024-09-19 09:52:31.233  INFO 5476 --- [r-http-epoll-31] o.s.g.filter.GlobalResponseLogFilter     : 

================ Gateway Response Start  ================
<=== 401 GET: /blade-user/getUserInfo?phone=17603059577
===Headers===  transfer-encoding: [chunked]
===Headers===  Content-Type: [application/json;charset=UTF-8]
================  Gateway Response End  =================




V. If you have more details, please provide them below.

2 answers
  • 2024-09-19 11:27

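    Restarting any service disconnects it from nacos, and there is a heartbeat cycle between the gateway and nacos. Until that cycle comes around, the gateway still believes the service is offline, so naturally it cannot connect. Only after the next heartbeat cycle completes is the service marked as healthy and reachable again.

    The quick fix is to restart the gateway once the service has restarted successfully. That way you do not have to wait for the heartbeat cycle; the change takes effect as soon as the gateway is back up.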
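To make the heartbeat-cycle explanation above concrete, here is a minimal Java sketch of a consumer that caches a registry's instance list and refreshes it only at a fixed interval. It is a toy model, not the actual code of nacos or the gateway. The class name `CachedInstanceList`, the injectable clock, and the 10-second interval in the usage note are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical model of a consumer that caches a registry's instance list. */
final class CachedInstanceList {
    private final Supplier<List<String>> registry; // source of truth, e.g. the naming server
    private final LongSupplier clockMillis;         // injectable clock so the model is testable
    private final long refreshIntervalMillis;       // assumed refresh period
    private List<String> cached = Collections.emptyList();
    private long lastRefresh;
    private boolean loaded;

    CachedInstanceList(Supplier<List<String>> registry,
                       LongSupplier clockMillis,
                       long refreshIntervalMillis) {
        this.registry = registry;
        this.clockMillis = clockMillis;
        this.refreshIntervalMillis = refreshIntervalMillis;
    }

    /** Returns the cached list, reloading it only when the interval has elapsed. */
    List<String> instances() {
        long now = clockMillis.getAsLong();
        if (!loaded || now - lastRefresh >= refreshIntervalMillis) {
            cached = new ArrayList<>(registry.get());
            lastRefresh = now;
            loaded = true;
        }
        return cached;
    }

    /** Drops the cache, which is roughly what restarting the consumer achieves. */
    void reset() {
        loaded = false;
    }
}
```

With a 10-second interval, a lookup made 5 seconds after the first load still returns the old list even if the registry changed in the meantime, and a lookup at 10 seconds picks up the change. In this model the stale window is bounded by the refresh interval, and `reset()` plays the role of the gateway restart.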

  • 2024-10-18 14:59

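    It is not as you describe:

    "Only after the next heartbeat cycle completes is the service marked as healthy and reachable again."

    It has been several hours and the business service is still unreachable. The error is reported on the gateway side: the connection to the business service is refused.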

    The gateway error log is as follows:

    ================  Gateway Request End  =================

    ERROR 24327 --- [r-http-epoll-14] a.w.r.e.AbstractErrorWebExceptionHandler : [91d3fb11] 500 Server Error for HTTP POST "/blade-b2ecproject/b2ec/queryTodayHistoryBalance"

    io.netty.channel.AbstractChannel$AnnotatedConnectException: finishConnect(..) failed: 拒绝连接: /192.168.123.50:9116
    Caused by: java.net.ConnectException: finishConnect(..) failed: 拒绝连接
            at io.netty.channel.unix.Errors.throwConnectException(Errors.java:124) ~[netty-transport-native-unix-common-4.1.51.Final.jar!/:4.1.51.Final]
            at io.netty.channel.unix.Socket.finishConnect(Socket.java:243) ~[netty-transport-native-unix-common-4.1.51.Final.jar!/:4.1.51.Final]
            at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.doFinishConnect(AbstractEpollChannel.java:672) [netty-transport-native-epoll-4.1.51.Final-linux-x86_64.jar!/:4.1.51.Final]
            at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.finishConnect(AbstractEpollChannel.java:649) [netty-transport-native-epoll-4.1.51.Final-linux-x86_64.jar!/:4.1.51.Final]
            at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.epollOutReady(AbstractEpollChannel.java:529) [netty-transport-native-epoll-4.1.51.Final-linux-x86_64.jar!/:4.1.51.Final]
            at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:465) ~[netty-transport-native-epoll-4.1.51.Final-linux-x86_64.jar!/:4.1.51.Final]
            at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378) ~[netty-transport-native-epoll-4.1.51.Final-linux-x86_64.jar!/:4.1.51.Final]
            at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) ~[netty-common-4.1.51.Final.jar!/:4.1.51.Final]
            at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[netty-common-4.1.51.Final.jar!/:4.1.51.Final]
            at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[netty-common-4.1.51.Final.jar!/:4.1.51.Final]
            at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_191]

    -------

    The business service restarted normally, and I saw no errors reported.

    The business service startup log is as follows:

    2024-10-18 14:45:15.565  INFO 30885 --- [sync-executor-1] o.s.core.launch.StartEventListener       : ---[BLADE-B2ECPROJECT]---启动完成,当前使用的端口:[9116],环境变量:[prod]---
    2024-10-18 14:45:16.037  INFO 30885 --- [  XNIO-1 task-2] io.undertow.servlet                      : Initializing Spring DispatcherServlet 'dispatcherServlet'
    2024-10-18 14:45:16.038  INFO 30885 --- [  XNIO-1 task-2] o.s.web.servlet.DispatcherServlet        : Initializing Servlet 'dispatcherServlet'
    2024-10-18 14:45:16.127  INFO 30885 --- [  XNIO-1 task-2] o.s.web.servlet.DispatcherServlet        : Completed initialization in 88 ms


    --

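    I have tried restarting the gateway, and also rolling back to the old business package and restarting the service; either one makes it work again.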
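Since the gateway reports a refused connection to 192.168.123.50:9116 while the service log shows port 9116 in use, one thing worth checking is whether that exact address accepts TCP connections from the gateway host. Below is a minimal probe using only the JDK. The class name `PortProbe` and the 2-second timeout in the usage note are illustrative choices, not part of BladeX.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical helper: checks whether a TCP endpoint accepts connections. */
final class PortProbe {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers "connection refused", connect timeouts and unresolved hosts alike.
            return false;
        }
    }
}
```

If `PortProbe.canConnect("192.168.123.50", 9116, 2000)` returns false when run on the gateway machine while the service is up, that may suggest the instance is listening on, or registered under, a different address than the one the gateway is dialing. If it returns true, the refused connections more likely come from an address the gateway cached earlier.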
