BladeX Docker deployment: Saber does not work properly

Blade  Unresolved  2  1675
brucedong 2020-10-06 12:26

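I. What are the steps to reproduce the problem?

   0. Deployed with bladex/script/docker/app/deploy.sh. During deployment the images could not be pulled from harbor, so I changed the image addresses in docker-compose.yaml and rebuilt report. All docker containers then started successfully; nothing else was changed: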

image.png

  1. Built the Saber project directly with yarn build to generate the dist files;

  2. Uploaded all files under dist to the /usr/share/nginx/html/ directory of the docker container app_web-nginx_1:

image.png

  3. Opened http://192.168.1.102:8000 in the browser and got an unknown error:

image.png

  4. Debugging with F12 showed a 500 error:


    Request URL: http://192.168.1.102:8000/api/blade-auth/oauth/captcha
    Request Method: GET
    Status Code: 500 Internal Server Error

    Response text:

    {"code":500,"data":null,"message":"Failed to handle request [GET http://192.168.1.102/blade-auth/oauth/captcha]: finishConnect(..) failed: Connection refused: /172.30.0.91:8100"}
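The response says the gateway could not open a TCP connection to 172.30.0.91:8100. One way to confirm this independently of the browser is a plain TCP probe, run from a machine or container attached to the same docker network. The following is a minimal sketch; `can_connect` is a hypothetical helper name, not part of BladeX, and it assumes Python 3 is available wherever it is run.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper for illustration only.
    """
    try:
        # create_connection performs the TCP handshake and raises OSError
        # (for example ConnectionRefusedError or a timeout) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `can_connect("172.30.0.91", 8100)` returning False from inside the docker network would match the "Connection refused" in the response above and point at the target service rather than at the frontend.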

  5. The WEB-NGINX configuration file is as follows:

/ # cat /etc/nginx/nginx.conf 
user  root;
worker_processes  1;
error_log  /var/log/nginx/error.log warn;
pid        /var/run/nginx.pid;
events {
    worker_connections  1024;
}
http {
    include       /etc/nginx/mime.types;
    default_type  application/octet-stream;
    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for"';
    access_log  /var/log/nginx/access.log  main;
    sendfile        on;
    #tcp_nopush     on;
    keepalive_timeout  65;
    #gzip  on;
    #include /etc/nginx/conf.d/*.conf;
    upstream gateway {
                 server 172.30.0.81;
                 server 172.30.0.82;
                 server 172.30.0.83;
             }
    server {
      listen       8000;
      server_name  web;
      root         /usr/share/nginx/html;
      location / {
      }
      location ^~ /oauth/redirect {
           rewrite ^(.*)$ /index.html break;
      }
      location ^~ /api {
           proxy_set_header Host $host;
           proxy_set_header X-Real-IP $remote_addr;
           proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
           proxy_buffering off;
           rewrite ^/api/(.*)$ /$1 break;
           proxy_pass http://gateway;
      }
    }
}
/ # 
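The `rewrite ^/api/(.*)$ /$1 break;` line in the config above strips the /api prefix before the request is proxied to the gateway upstream, which is why the error message shows /blade-auth/oauth/captcha without /api. The sketch below reproduces that mapping with Python's `re` module. It only approximates nginx's PCRE matching, which is enough for this simple pattern, and `upstream_path` is a hypothetical name used for illustration.

```python
import re

# Same pattern as the nginx rule: rewrite ^/api/(.*)$ /$1 break;
API_REWRITE = re.compile(r"^/api/(.*)$")


def upstream_path(request_path: str) -> str:
    """Return the path that would be forwarded to the gateway upstream."""
    # Paths that do not match the pattern are passed through unchanged.
    return API_REWRITE.sub(r"/\1", request_path, count=1)


print(upstream_path("/api/blade-auth/oauth/captcha"))  # /blade-auth/oauth/captcha
```

So the nginx layer itself appears to route the request as intended; the failure happens one hop later, when the gateway calls 172.30.0.91:8100.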


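II. What result did you expect? What did you actually see?

Expected to log in normally; in practice the API calls return a 500 error.


III. What product and version are you using? On which operating system?

BladeX / latest version / CentOS7 / Docker 19


IV. Please provide the detailed error stack trace; this is important.

The error is reported in the browser; the message is shown above.


V. If you have more details, please provide them below.

2 answers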
  •  admin (original poster)
    2020-10-06 12:33

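     1. The log shows connection refused, meaning the connection could not be established. This is a network problem: the service at 172.30.0.91 cannot be reached.

     2. So you need to check whether all services registered with nacos successfully. Please also post the services registered in nacos together with their registered IPs, and use docker logs -f xxx to check whether the services started successfully.

     3. If the internal network address is unreachable, requests will naturally be refused. Also, a docker deployment needs to sit on a single server; with multiple servers you need docker swarm so that the internal networks can reach each other.

     4. The deployment includes two nginx instances: one on port 8000 that serves the frontend, and one on port 88 that exposes the gateway externally. You can use the host-IP:88 address to test whether the backend APIs are reachable. For details see: https://sns.bladex.cn/article-14982.html

     5. You only deployed gateway1 and gateway2, which correspond to subnet IPs 81 and 82, so the 83 entry that web-nginx reverse-proxies to can be removed.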
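Point 4 above can be checked without the frontend by issuing plain GET requests against the port-88 nginx and looking only at the status codes. This is a minimal sketch using the Python standard library; `http_status` is a hypothetical helper, and the assumption that the gateway paths are served on port 88 without the /api prefix is not confirmed by this thread.

```python
import urllib.error
import urllib.request


def http_status(url: str, timeout: float = 5.0):
    """Return the HTTP status code of a GET, or None if no HTTP response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 500.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure and similar.
        return None
```

Calling it as `http_status("http://192.168.1.102:88/blade-auth/oauth/captcha")` (assumed path) separates two cases: a 500 means nginx and the gateway responded and the failure is further back, while None means port 88 itself could not be reached.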

    Follow-up from the author: 2020-10-06 12:33

    #1. docker is installed on a single server

    #2. Nacos service list

    Service name    Group name     Clusters  Instances  Healthy instances  Protection threshold triggered
    blade-user      DEFAULT_GROUP  1         1          1                  false
    blade-auth      DEFAULT_GROUP  1         2          2                  false
    blade-zipkin    DEFAULT_GROUP  1         1          1                  false
    blade-report    DEFAULT_GROUP  1         1          1                  false
    blade-admin     DEFAULT_GROUP  1         1          1                  false
    blade-turbine   DEFAULT_GROUP  1         1          1                  false
    blade-desk      DEFAULT_GROUP  1         1          1                  false
    blade-log       DEFAULT_GROUP  1         1          1                  false
    blade-system    DEFAULT_GROUP  1         1          1                  false
    blade-gateway   DEFAULT_GROUP  1         2          2                  false
    blade-resource  DEFAULT_GROUP  1         1          1                  false

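    #3. Nacos is registered at 172.30.0.48

    image.png

    #4. Port 88 is reachable and 88/doc.html can be opened, but the "Authorization module" and the "Workbench module" show the same error; only the System module loads normally:

    image.png

    #5. gateway 83 has been removed from the nginx configuration, but it made no difference.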


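    Answer: 2020-10-06 12:33

    That means the services were not all deployed successfully. Run docker logs -f xxx against each docker service and check what logs blade-auth, blade-gateway, blade-user and the other services print.


    Also, use postman to bypass the frontend and call the token endpoint directly to see whether it returns correctly. If it does not, the problem has nothing to do with the frontend, and you should focus on troubleshooting the deployed backend services. See this post: https://sns.bladex.cn/article-14982.html


    In general, problems with a single-server docker deployment are mostly caused by the network segment or by the nacos configuration not being read, which prevents the services from calling each other. Focus your troubleshooting there.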

    Follow-up from the author: 2020-10-06 12:33

    The unreachable port 8100 belongs to the blade-auth service:

    image.png

    Follow-up from the author: 2020-10-06 12:33


    2020-10-06 09:34:08.243  WARN 1 --- [ing.beat.sender] com.alibaba.nacos.client.naming          : failed to request http://172.30.0.48:8848/nacos/v1/ns/instance/beat?app=blade-auth&namespaceId=public&port=8100&clusterName=DEFAULT&ip=172.30.0.91&serviceName=DEFAULT_GROUP%40%40blade-auth&encoding=UTF-8 from 172.30.0.48
    2020-10-06 09:34:08.244 ERROR 1 --- [ing.beat.sender] com.alibaba.nacos.client.naming          : [NA] failed to request

    java.net.ConnectException: Connection refused (Connection refused)
        at java.net.AbstractPlainSocketImpl.doConnect(Unknown Source) ~[na:1.8.0_265]
        at java.net.AbstractPlainSocketImpl.connectToAddress(Unknown Source) ~[na:1.8.0_265]
        at java.net.AbstractPlainSocketImpl.connect(Unknown Source) ~[na:1.8.0_265]
        at java.net.SocksSocketImpl.connect(Unknown Source) ~[na:1.8.0_265]
        at java.net.Socket.connect(Unknown Source) ~[na:1.8.0_265]
        at sun.net.NetworkClient.doConnect(Unknown Source) ~[na:1.8.0_265]
        at sun.net.www.http.HttpClient.openServer(Unknown Source) ~[na:1.8.0_265]
        at sun.net.www.http.HttpClient.openServer(Unknown Source) ~[na:1.8.0_265]
        at sun.net.www.http.HttpClient.<init>(Unknown Source) ~[na:1.8.0_265]
        at sun.net.www.http.HttpClient.New(Unknown Source) ~[na:1.8.0_265]
        at sun.net.www.http.HttpClient.New(Unknown Source) ~[na:1.8.0_265]
        at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(Unknown Source) ~[na:1.8.0_265]
        at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(Unknown Source) ~[na:1.8.0_265]
        at sun.net.www.protocol.http.HttpURLConnection.plainConnect(Unknown Source) ~[na:1.8.0_265]
        at sun.net.www.protocol.http.HttpURLConnection.connect(Unknown Source) ~[na:1.8.0_265]
        at sun.net.www.protocol.http.HttpURLConnection.getOutputStream0(Unknown Source) ~[na:1.8.0_265]
        at sun.net.www.protocol.http.HttpURLConnection.getOutputStream(Unknown Source) ~[na:1.8.0_265]
        at com.alibaba.nacos.client.naming.net.HttpClient.request(HttpClient.java:82) ~[nacos-client-1.2.1.jar!/:na]
        at com.alibaba.nacos.client.naming.net.NamingProxy.callServer(NamingProxy.java:433) [nacos-client-1.2.1.jar!/:na]
        at com.alibaba.nacos.client.naming.net.NamingProxy.reqAPI(NamingProxy.java:482) [nacos-client-1.2.1.jar!/:na]
        at com.alibaba.nacos.client.naming.net.NamingProxy.reqAPI(NamingProxy.java:401) [nacos-client-1.2.1.jar!/:na]
        at com.alibaba.nacos.client.naming.net.NamingProxy.sendBeat(NamingProxy.java:343) [nacos-client-1.2.1.jar!/:na]
        at com.alibaba.nacos.client.naming.beat.BeatReactor$BeatTask.run(BeatReactor.java:108) [nacos-client-1.2.1.jar!/:na]
        at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) [na:1.8.0_265]
        at java.util.concurrent.FutureTask.run(Unknown Source) [na:1.8.0_265]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(Unknown Source) [na:1.8.0_265]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source) [na:1.8.0_265]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [na:1.8.0_265]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [na:1.8.0_265]
        at java.lang.Thread.run(Unknown Source) [na:1.8.0_265]

    2020-10-06 09:34:08.245 ERROR 1 --- [ing.beat.sender] com.alibaba.nacos.client.naming          : request: /nacos/v1/ns/instance/beat failed, servers: [172.30.0.48:8848], code: 500, msg: java.net.ConnectException: Connection refused (Connection refused)
    2020-10-06 09:34:08.247 ERROR 1 --- [ing.beat.sender] com.alibaba.nacos.client.naming          : [CLIENT-BEAT] failed to send beat: {"cluster":"DEFAULT","ip":"172.30.0.91","metadata":{"preserved.register.source":"SPRING_CLOUD"},"period":5000,"port":8100,"scheduled":false,"serviceName":"DEFAULT_GROUP@@blade-auth","stopped":false,"weight":1.0}, code: 500, msg: failed to req API:/api//nacos/v1/ns/instance/beat after all servers([172.30.0.48:8848]) tried: java.net.ConnectException: Connection refused (Connection refused)
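The log shows the blade-auth instance at 172.30.0.91:8100 failing to send its heartbeat to Nacos at 172.30.0.48:8848. To compare this with what Nacos itself has registered, the instance list can be read from the Nacos v1 Open API (the client in the trace is nacos-client-1.2.1). The following is a sketch under assumptions: the helper names are hypothetical, the default namespace and group are assumed, and the response is assumed to carry a `hosts` array whose entries have `ip`, `port` and `healthy` fields.

```python
import json
import urllib.parse
import urllib.request


def instance_list_url(server: str, service: str) -> str:
    """Build the Nacos v1 instance-list URL for a service (default namespace assumed)."""
    query = urllib.parse.urlencode({"serviceName": service})
    return f"http://{server}/nacos/v1/ns/instance/list?{query}"


def summarize_hosts(payload: dict) -> list:
    """Reduce an instance-list response to (ip, port, healthy) tuples."""
    # Field names are assumptions about the v1 response shape.
    return [(h.get("ip"), h.get("port"), h.get("healthy"))
            for h in payload.get("hosts", [])]


def fetch_instances(server: str, service: str, timeout: float = 5.0) -> list:
    """Query Nacos and return the summarized instance list."""
    with urllib.request.urlopen(instance_list_url(server, service), timeout=timeout) as resp:
        return summarize_hosts(json.load(resp))
```

For example, `fetch_instances("172.30.0.48:8848", "blade-auth")` would list the IPs and ports Nacos hands to the gateway. If an entry points at an address that refuses connections, that instance is the one to inspect with docker logs.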


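    Answer: 2020-10-06 12:33

     1. First check whether nacos is running in standalone mode; if it is not, switch it to standalone (the yml defaults to standalone mode).

     2. Is only the service on 91 reporting the error, or both 91 and 92? If it is only 91, stop the 91 service, start only 92, and check whether requests then go through.

     3. Get the services running first, then focus on troubleshooting the IP connectivity problem.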
