A service crashes once the volume of produced data grows

Blade · Unresolved · 1 reply · 1027 views
wangjie (剑圣) 2022-05-11 17:19

I. What are the steps to reproduce the issue?

1. 

==============  Sql Start  ==============

Execute SQL : select id, id_hex from factory_secondtest_data_SSD_1 where is_deleted = 0 and id_hex in ('D81897') and id_hex != '000000' and time >= 1644564004000 and 1652253604488 >= time

Execute Time: 0.367ms

==============  Sql  End   ==============


2022-05-11 15:31:13.130  INFO [blade-iot,4c8401370367262c,4c8401370367262c,true] 13611 --- [  XNIO-1 task-1] o.s.core.mp.plugins.SqlLogInterceptor    : 


==============  Sql Start  ==============

Execute SQL : select id, id_hex from factory_secondtest_data_SSD_1 where is_deleted = 0 and id_hex in ('D82D03') and id_hex != '000000' and time >= 1644564564000 and 1652254164485 >= time

Execute Time: 0.455ms

==============  Sql  End   ==============


2022-05-11 15:31:13.131  INFO [blade-iot,1307c52cb2f7cf8f,1307c52cb2f7cf8f,false] 13611 --- [ XNIO-1 task-12] o.s.core.mp.plugins.SqlLogInterceptor    : 


==============  Sql Start  ==============

Execute SQL : select id, id_hex from factory_secondtest_data_SSD_1 where is_deleted = 0 and id_hex in ('D82D22') and id_hex != '000000' and time >= 1644564440000 and 1652254040042 >= time

Execute Time: 0.296ms

==============  Sql  End   ==============


2022-05-11 15:31:15.663 ERROR [blade-iot,,,] 13611 --- [t.remote.worker] c.a.n.c.remote.client.grpc.GrpcClient    : Server check fail, please check server 47.134.186.192 ,port 9848 is available , error ={}


java.util.concurrent.TimeoutException: Waited 3000 milliseconds (plus 4260 milliseconds, 423900 nanoseconds delay) for com.alibaba.nacos.shaded.io.grpc.stub.ClientCalls$GrpcFuture@477b491f[status=PENDING, info=[GrpcFuture{clientCall={delegate={delegate=ClientCallImpl{method=MethodDescriptor{fullMethodName=Request/request, type=UNARY, idempotent=false, safe=false, sampledToLocalTracing=true, requestMarshaller=com.alibaba.nacos.shaded.io.grpc.protobuf.lite.ProtoLiteUtils$MessageMarshaller@583c8ed6, responseMarshaller=com.alibaba.nacos.shaded.io.grpc.protobuf.lite.ProtoLiteUtils$MessageMarshaller@5faeb374, schemaDescriptor=com.alibaba.nacos.api.grpc.auto.RequestGrpc$RequestMethodDescriptorSupplier@652d09c7}}}}}]]

    at com.alibaba.nacos.shaded.com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:508)
    at com.alibaba.nacos.common.remote.client.grpc.GrpcClient.serverCheck(GrpcClient.java:146)
    at com.alibaba.nacos.common.remote.client.grpc.GrpcClient.connectToServer(GrpcClient.java:268)
    at com.alibaba.nacos.common.remote.client.RpcClient.reconnect(RpcClient.java:529)
    at com.alibaba.nacos.common.remote.client.RpcClient$3.run(RpcClient.java:374)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)


2022-05-11 15:31:15.663 ERROR [blade-iot,,,] 13611 --- [t.remote.worker] c.a.n.c.remote.client.grpc.GrpcClient    : Server check fail, please check server 47.134.186.192 ,port 9848 is available , error ={}


java.util.concurrent.TimeoutException: Waited 3000 milliseconds (plus 4259 milliseconds, 819861 nanoseconds delay) for com.alibaba.nacos.shaded.io.grpc.stub.ClientCalls$GrpcFuture@1c3b0e0a[status=PENDING, info=[GrpcFuture{clientCall={delegate={delegate=ClientCallImpl{method=MethodDescriptor{fullMethodName=Request/request, type=UNARY, idempotent=false, safe=false, sampledToLocalTracing=true, requestMarshaller=com.alibaba.nacos.shaded.io.grpc.protobuf.lite.ProtoLiteUtils$MessageMarshaller@583c8ed6, responseMarshaller=com.alibaba.nacos.shaded.io.grpc.protobuf.lite.ProtoLiteUtils$MessageMarshaller@5faeb374, schemaDescriptor=com.alibaba.nacos.api.grpc.auto.RequestGrpc$RequestMethodDescriptorSupplier@652d09c7}}}}}]]

    ... (stack trace identical to the first occurrence above, omitted)


2022-05-11 15:31:15.664  INFO [blade-iot,1307c52cb2f7cf8f,1307c52cb2f7cf8f,false] 13611 --- [ XNIO-1 task-12] o.s.core.mp.plugins.SqlLogInterceptor    : 


==============  Sql Start  ==============

Execute SQL : select id, id_hex from factory_secondtest_data_SSD_1 where is_deleted = 0 and id_hex in ('D82D23') and id_hex != '000000' and time >= 1644564440000 and 1652254040044 >= time

Execute Time: 0.696ms

==============  Sql  End   ==============

2. 

3.


II. What result did you expect, and what did you actually see?

The runtime log is shown above. One of the services suddenly crashes and is deregistered from Nacos, then comes back online automatically after a while. With only a few machines producing data everything is fine, but once more machines are added, some service crashes. Could you help analyze the cause?

III. Which product and version are you using, and on what operating system?


IV. Please provide the detailed error stack trace; this is important.


V. If you have more details, please provide them below.

1 Answer
  • 2022-05-11 18:09

    From the error this is a connection timeout: the heartbeat check fails, so the service is taken offline. Once the next heartbeat check succeeds, the service registers again.

    java.util.concurrent.TimeoutException: Waited 3000 milliseconds
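
    For context, this exception is thrown while the Nacos client waits at most 3000 ms for the reply to its periodic gRPC server check. Below is a minimal sketch of that wait-with-timeout pattern (hypothetical code, not Nacos's actual implementation; only the 3000 ms figure is taken from the log above):

    import java.util.concurrent.*;

    public class ServerCheckSketch {
        private final ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor();

        // Ask the remote server whether it is healthy; the real client sends a gRPC request.
        private Future<Boolean> sendHealthRequest() {
            // Placeholder: simulate a request that never completes while the server is overloaded.
            return new CompletableFuture<>();
        }

        public void startServerCheck() {
            scheduler.scheduleWithFixedDelay(() -> {
                Future<Boolean> reply = sendHealthRequest();
                try {
                    // If no answer arrives within 3000 ms, this throws TimeoutException,
                    // which matches the "Server check fail" error in the log above.
                    reply.get(3000, TimeUnit.MILLISECONDS);
                } catch (TimeoutException | InterruptedException | ExecutionException e) {
                    // The client then marks the connection unhealthy and reconnects;
                    // from the registry's point of view the instance goes offline.
                    System.err.println("Server check fail: " + e);
                }
            }, 0, 5, TimeUnit.SECONDS);
        }

        public static void main(String[] args) {
            new ServerCheckSketch().startServerCheck();
        }
    }

    The check itself is cheap; it only fails when the machine is too busy to answer within the window, which is why load is the thing to investigate.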

    When this happens, there are generally two likely causes, and each can be checked specifically:

    1. An endpoint takes too long to execute, exceeding the wait time, so the heartbeat fails and the service goes offline. (Fix: monitor which endpoints are slow, call them a few times to see whether the service deregisters, then optimize them — see the interceptor sketch after this list.)

    2. Memory or CPU is over-consumed, causing the service to go offline. (Fix: shut down a few of the services first, then make the calls again and see whether the service still crashes.)
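
    For the first cause, a simple way to find slow endpoints is to log every request whose handling time exceeds a threshold. A minimal Spring MVC sketch (hypothetical class name and threshold; the 3000 ms figure mirrors the heartbeat wait in the log):

    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;
    import org.springframework.web.servlet.HandlerInterceptor;

    public class SlowRequestInterceptor implements HandlerInterceptor {
        private static final long THRESHOLD_MS = 3000; // matches the 3000 ms heartbeat wait

        @Override
        public boolean preHandle(HttpServletRequest request, HttpServletResponse response,
                                 Object handler) {
            // Record when handling starts so afterCompletion can compute the cost.
            request.setAttribute("startTime", System.currentTimeMillis());
            return true;
        }

        @Override
        public void afterCompletion(HttpServletRequest request, HttpServletResponse response,
                                    Object handler, Exception ex) {
            long start = (Long) request.getAttribute("startTime");
            long cost = System.currentTimeMillis() - start;
            if (cost > THRESHOLD_MS) {
                // Endpoints logged here are the candidates to optimize first.
                System.err.printf("SLOW REQUEST %s took %d ms%n",
                        request.getRequestURI(), cost);
            }
        }
    }

    Register it through a WebMvcConfigurer's addInterceptors method; anything it flags is worth profiling before blaming the registry.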
