分析Centos系统下LNMP频繁502 Bad Gateway问题

最近VPS总是出现 Nginx 502 Bad Gateway 错误,导致网页无法正常访问,但FTP和SSH正常连接,很是伤脑筋!这次好好整治一下!

根据问题,应该是 php-fpm 出了问题,先查看日志文件 /usr/local/php/logs/php-fpm.log

日志内容大致如下(借用了一下28、29两天的记录):

Jan 28 22:50:00.309235 [NOTICE] fpm_unix_init_main(), line 284: getrlimit(nofile): max:1024, cur:1024
Jan 28 22:50:00.309552 [NOTICE] fpm_event_init_main(), line 88: libevent: using epoll
Jan 28 22:50:00.309617 [NOTICE] fpm_init(), line 52: fpm is running, pid 7967
Jan 28 22:50:00.310444 [NOTICE] fpm_children_make(), line 352: child 7968 (pool default) started
Jan 28 22:50:00.311328 [NOTICE] fpm_children_make(), line 352: child 7969 (pool default) started
Jan 28 22:50:00.312208 [NOTICE] fpm_children_make(), line 352: child 7970 (pool default) started
Jan 28 22:50:00.313161 [NOTICE] fpm_children_make(), line 352: child 7971 (pool default) started
Jan 28 22:50:00.314210 [NOTICE] fpm_children_make(), line 352: child 7972 (pool default) started
Jan 28 22:50:00.314242 [NOTICE] fpm_event_loop(), line 107: libevent: entering main loop
Jan 29 16:58:30.845059 [NOTICE] fpm_got_signal(), line 70: received SIGUSR2
Jan 29 16:58:30.856418 [NOTICE] fpm_pctl(), line 256: switching to 'reloading' state
Jan 29 16:58:30.856449 [NOTICE] fpm_pctl_kill_all(), line 172: sending signal 3 SIGQUIT to child 7972 (pool default)
Jan 29 16:58:30.856463 [NOTICE] fpm_pctl_kill_all(), line 172: sending signal 3 SIGQUIT to child 7971 (pool default)
Jan 29 16:58:30.856475 [NOTICE] fpm_pctl_kill_all(), line 172: sending signal 3 SIGQUIT to child 7970 (pool default)
Jan 29 16:58:30.856487 [NOTICE] fpm_pctl_kill_all(), line 172: sending signal 3 SIGQUIT to child 7969 (pool default)
Jan 29 16:58:30.856537 [NOTICE] fpm_pctl_kill_all(), line 172: sending signal 3 SIGQUIT to child 7968 (pool default)
Jan 29 16:58:30.856546 [NOTICE] fpm_pctl_kill_all(), line 181: 5 children are still alive
Jan 29 16:58:35.845629 [NOTICE] fpm_pctl_kill_all(), line 172: sending signal 15 SIGTERM to child 7972 (pool default)
Jan 29 16:58:35.845662 [NOTICE] fpm_pctl_kill_all(), line 172: sending signal 15 SIGTERM to child 7971 (pool default)
Jan 29 16:58:35.845671 [NOTICE] fpm_pctl_kill_all(), line 172: sending signal 15 SIGTERM to child 7970 (pool default)
Jan 29 16:58:35.845678 [NOTICE] fpm_pctl_kill_all(), line 172: sending signal 15 SIGTERM to child 7969 (pool default)
Jan 29 16:58:35.845686 [NOTICE] fpm_pctl_kill_all(), line 172: sending signal 15 SIGTERM to child 7968 (pool default)
Jan 29 16:58:35.845691 [NOTICE] fpm_pctl_kill_all(), line 181: 5 children are still alive
Jan 29 16:58:36.450929 [NOTICE] fpm_got_signal(), line 48: received SIGCHLD
Jan 29 16:58:36.451016 [WARNING] fpm_children_bury(), line 215: child 7968 (pool default) exited on signal 15 SIGTERM after 65319.297391 seconds from start
Jan 29 16:58:36.451062 [WARNING] fpm_children_bury(), line 215: child 7971 (pool default) exited on signal 15 SIGTERM after 65319.294705 seconds from start
Jan 29 16:58:36.451088 [WARNING] fpm_children_bury(), line 215: child 7972 (pool default) exited on signal 15 SIGTERM after 65319.293684 seconds from start
Jan 29 16:58:36.451103 [NOTICE] fpm_got_signal(), line 48: received SIGCHLD
Jan 29 16:58:36.451834 [NOTICE] fpm_got_signal(), line 48: received SIGCHLD
Jan 29 16:58:36.451868 [WARNING] fpm_children_bury(), line 215: child 7969 (pool default) exited on signal 15 SIGTERM after 65319.297343 seconds from start
Jan 29 16:58:36.451891 [WARNING] fpm_children_bury(), line 215: child 7970 (pool default) exited on signal 15 SIGTERM after 65319.296487 seconds from start
Jan 29 16:58:36.451903 [NOTICE] fpm_pctl_exec(), line 95: reloading: execvp("/usr/local/php/bin/php-cgi", {"/usr/local/php/bin/php-cgi", "--fpm", "--fpm-config", "/usr/local/php/etc/php-fpm.conf"})

满眼的NOTICE错误,据观察至后几天,错误日志都是如此!据网络资料分析说,这类错误大都是由于php线程打开文件句柄受限导致的错误,这里综合各位童鞋的分析,整理记录如下,希望能解决此类 502 问题!

首先检查一下ulimit -n的值,SSH输入命令:

# ulimit -n
返回:65535

1、提升服务器的文件句柄打开

SSH命令:# vi /etc/security/limits.conf,在结尾处添加以下内容:

* soft nofile 65535
* hard nofile 65535

2、提升nginx的进程文件打开数

# vi /usr/local/nginx/conf/nginx.conf
查看 worker_rlimit_nofile 51200;

3、修改 php-fpm.conf 配置文件

前面确认了 ulimit -n 值为 65535,/usr/local/php/etc/php-fpm.conf 中的选项 rlimit_files 确保和此数值一致。

<value name="rlimit_files">65535</value>
<value name="max_requests">10240</value>

4、修改 sysctl.conf

# vi /etc/sysctl.conf

底部添加

fs.file-max=65535

至此,重启 /root/lnmp restart 生效,看看还有没有类似错误信息出现!

ps.为减小php-fpm.log文件大小,可将 /usr/local/php/etc/php-fpm.conf 中的 Log level 由 notice 修改为 ERROR,这样能降低日志的生成速度!

Log level
    <value name="log_level">Error</value>

2012.02.20更新:貌似 502 Bad Gateway 已经消失了!!