Kuali

When Tomcat stops responding to Apache

Today our multi-node tomcat servers became unresponsive to user/web traffic.  A quick look at our monitoring tools indicated that the tomcat servers themselves appeared healthy.  While the application administrator looked at catalina.out to see if we were missing something, I dug into the load balancer logs.  I immediately saw the following errors:

[Date] [error] ajp_read_header: ajp_ilink_receive failed
[Date] [error] (70007)The timeout specified has expired: proxy: read response failed from IP_ADDRESS:8009 (HOSTNAME)
[Date] [error] proxy: BALANCER: (balancer://envCluster). All workers are in error state for route (nodeName)

So we understood the problem; next we needed to understand why it happened and how to fix it.  The AJP documentation confirmed that the default AJP connection pool has a size of 200 and an accept count (the request queue used when all connections are busy) of 100.  We confirmed that these matched our tomcat AJP settings.  Increasing the MaxClients setting in Apache’s configuration and a quick restart put us back in business.
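For reference, the relevant knob lives in Apache’s worker MPM configuration.  A minimal sketch, with illustrative values rather than our production numbers (MaxClients must not exceed ServerLimit x ThreadsPerChild):

<IfModule worker.c>
    # MaxClients caps the number of simultaneous Apache worker threads.
    # Size it to the combined Tomcat AJP capacity, e.g. 2 nodes x 200
    # connections per node = 400.
    ServerLimit       16
    ThreadsPerChild   25
    MaxClients        400
</IfModule>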

Note:

Examining the logs, we could see a marked increase in testing activity today, which is what exposed this problem.  A further read of the Tomcat AJP documentation revealed that AJP connections remain open indefinitely, until the client closes them, unless ‘keepAliveTimeout’ is set on the AJP connector.  Ideally, the number of AJP connections should grow as load increases and then shrink back to an optimal level when load decreases; ‘keepAliveTimeout’ accomplishes this by closing any connection that has been inactive longer than the timeout.  Our keepAliveTimeout was already set and working, but I am including it here because without that setting this problem would most likely have manifested much earlier than it did.

Solution:

Configure Apache ‘MaxClients’ to be equal to the total number of Tomcat AJP ‘maxConnections’ across all nodes.

This was already set in our case; however, you will also want to make sure you configure the Tomcat AJP ‘keepAliveTimeout’ so that connections are closed after a period of inactivity.
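For illustration, the keepAliveTimeout goes on the AJP connector in Tomcat’s conf/server.xml.  A hedged sketch — the values below are examples, not our exact settings (keepAliveTimeout is in milliseconds):

<!-- AJP connector: 200-thread pool, 100-deep accept queue,
     close connections idle for more than 60 seconds -->
<Connector port="8009" protocol="AJP/1.3" redirectPort="8443"
           maxThreads="200" acceptCount="100"
           keepAliveTimeout="60000" />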

References:
Tomcat AJP: http://tomcat.apache.org/tomcat-7.0-doc/config/ajp.html
Apache MPM Worker: http://httpd.apache.org/docs/2.2/mod/worker.html

 

memcached

In support of the Kuali project.

Setting up true failover for the Kuali application servers.  Previously, if a node went down, the users on it would need to re-authenticate.  The following procedure configures the system so it can lose a node and the users on that node will not lose their session.

My part on the system side was fairly straightforward:

yum install memcached
iptables -I INPUT -m state --state NEW -m tcp -p tcp --dport 11211 -j ACCEPT
service iptables save
chkconfig memcached on
service memcached start
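To confirm the daemon is up and reachable before touching Tomcat, you can query its stats interface; a quick sketch (assumes nc/netcat is installed):

# ask memcached for its stats over the default port
echo stats | nc -w 1 localhost 11211 | grep -E 'uptime|curr_connections'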

With that configured, the work to enable Tomcat to leverage memcached can begin:

Parts of the following information were found at www.bradchen.com.

Download the most recent copies of the required jars and install them in the tomcat_dir/lib directory:
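The exact jar names and versions will depend on when you download them; in a memcached-session-manager setup the set typically looks something like the following (file names are placeholders — use whatever versions you actually downloaded):

# copy the session-manager jars into Tomcat's lib directory
cp memcached-session-manager-<version>.jar \
   memcached-session-manager-tc7-<version>.jar \
   spymemcached-<version>.jar \
   tomcat_dir/lib/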

Then, on each Tomcat installation, open tomcat_dir/conf/context.xml and add the following lines inside the <Context> tag:

<Manager className="de.javakaffee.web.msm.MemcachedBackupSessionManager"
    memcachedNodes="n1:localhost:11211"
    requestUriIgnorePattern=".*\.(ico|png|gif|jpg|css|js)$" />

If memcached is listening on a different port, change the value in memcachedNodes; port 11211 is the default port for memcached.
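For the multi-node failover described above, memcachedNodes can list every memcached instance in the cluster, and failoverNodes can tell each Tomcat which node to treat as backup-only.  A sketch assuming two app servers (hostnames are placeholders, and the failoverNodes value would differ on each Tomcat):

<Manager className="de.javakaffee.web.msm.MemcachedBackupSessionManager"
    memcachedNodes="n1:app-node1:11211,n2:app-node2:11211"
    failoverNodes="n1"
    requestUriIgnorePattern=".*\.(ico|png|gif|jpg|css|js)$" />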

Open tomcat_dir/conf/server.xml, look for the following lines:

<Server port="8005" ...>
    ...
    <Connector port="8080" protocol="HTTP/1.1" ...>
    ...
    <Connector port="8009" protocol="AJP/1.3" ...>

Change the ports so the two installations listen on different ports.  Though optional, I would also disable the HTTP/1.1 connector by commenting out its <Connector> tag, as the setup documented here only requires the AJP connector to be enabled.
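For example, the second installation might end up looking like this (the port numbers are arbitrary; just pick ones that do not collide with the first instance):

<Server port="8006" ...>
    ...
    <!-- HTTP/1.1 connector commented out; only AJP is needed behind the balancer
    <Connector port="8081" protocol="HTTP/1.1" ...>
    -->
    <Connector port="8010" protocol="AJP/1.3" ...>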

Finally, look for this line, also in tomcat_dir/conf/server.xml:

<Engine name="Catalina" defaultHost="localhost" ...>

Add the jvmRoute attribute and assign it a value that is different for each installation. For example:

<Engine name="Catalina" defaultHost="localhost" jvmRoute="jvm1" ...>

And, for the second instance:

<Engine name="Catalina" defaultHost="localhost" jvmRoute="jvm2" ...>

That’s it for Tomcat configuration. This configuration uses memcached-session-manager’s default serialization strategy and enables sticky session support. For more configuration options, refer to the links in the references section.

In our Apache load balancer we add the following definition:

ProxyPass /REFpath balancer://Cluster_Name
ProxyPassReverse /REFpath balancer://Cluster_Name

<Proxy balancer://Cluster_Name>
   BalancerMember ajp://HOSTNAME:8009/REFpath route=jvm1  timeout=600 min=10 max=100 ttl=60 retry=120 connectiontimeout=10
   BalancerMember ajp://HOSTNAME:8009/REFpath route=jvm2  timeout=600 min=10 max=100 ttl=60 retry=120 connectiontimeout=10
   BalancerMember ajp://HOSTNAME:8009/REFpath route=jvm3  timeout=600 min=10 max=100 ttl=60 retry=120 connectiontimeout=10
   BalancerMember ajp://HOSTNAME:8009/REFpath route=jvm4  timeout=600 min=10 max=100 ttl=60 retry=120 connectiontimeout=10
   ProxySet lbmethod=byrequests
   ProxySet stickysession=JSESSIONID|jsessionid
   ProxySet nofailover=On
</Proxy>

Note that the BalancerMember lines point to the ports and jvmRoutes configured above.  This sets up a load balancer that dispatches web requests to multiple Tomcat installations.  When one of the Tomcat instances is shut down, requests are served by the others that are still up.  As a result, users do not experience downtime when a Tomcat instance is taken down for maintenance or application redeployment.

This step also sets up sticky sessions: if a user begins a session on instance 1, she will be served by instance 1 for the entire session unless, of course, that instance goes down.  This can be beneficial in a clustered environment, as application servers can use session data stored locally without contacting a remote memcached.