When Tomcat stops responding to Apache

Today our multi-node tomcat servers became unresponsive to user/web traffic.  A quick look at our monitoring tools indicated that the tomcat servers were running healthily.  While the application administrator looked at catalina.out to see if we were missing something, I dug into the load balancer logs.  I immediately saw the following errors:

[Date] [error] ajp_read_header: ajp_ilink_receive failed
[Date] [error] (70007)The timeout specified has expired: proxy: read response failed from IP_ADDRESS:8009 (HOSTNAME)
[Date] [error] proxy: BALANCER: (balancer://envCluster). All workers are in error state for route (nodeName)

So we understood the problem, next we needed to understand why and how to fix it.  The AJP documentation confirmed that the default AJP connection pool is configured with a size of 200 and an accept count (request queue when all connections are busy) of 100. We confirmed that these matched our tomcat AJP settings.  Increasing the MaxClients setting in Apache’s configuration and a quick restart put us back in business.

Note:

Examining logs we could see that today there was a marked increase in testing activity that exposed this problem.  A further read of the Tomcat AJP documentation revealed that the connections remain open indefinitely until the client closes them unless the ‘keepAliveTimeout’ is set on the AJP connection pool.  Ideally, the AJP connections should grow as load increases and then reduce back to an optimal number when load decreases. The ‘keepAliveTimeout’ has the effect of closing all connections that are inactive.  Our keepAliveTimeout settings were set and working but I thought I should include that information here since if we didn’t have that setting this problem would most likely have manifested much earlier than it did.

Solution:

Configure Apache ‘MaxClients’ to be equal to the total number of Tomcat AJP ‘maxConnections’ accross all nodes.

This was already set, however you will also want to make sure you configure Tomcat AJP ‘keepAliveTimeout’ to close connections after a period of inactivity.

References:
Tomcat AJP: http://tomcat.apache.org/tomcat-7.0-doc/config/ajp.html
Apache MPM Worker: http://httpd.apache.org/docs/2.2/mod/worker.html