Disk Woes

I hope to never need this document again, but it seemed worth writing up in case someone else needs the information.  I powered my desktop off for a planned power outage.  When I powered it back on, the system failed to boot, reporting either “Error 17” or “Error 25”; in short, the software RAID (mirrored disks) was corrupted.  The timing could not have been worse: the power outage included our data center, so I had to power on over 100 systems without my desktop!  Thank God for Live CDs!!  Following the power-on there were other issues to deal with, so it was almost a week before I could get back to my failed desktop.  Here is what I tried:

First I tried a SATA-to-USB cable.  Since the drive was part of a RAID pair this didn’t work, and I didn’t waste a lot of time on it, but it did help me discover which disk was bad.

Knowing which disk was suspect, I confirmed the failed drive using the BIOS and boot sequence on my desktop: it was /dev/sda that had failed.  I was able to get a replacement disk of the same size from our desktop support team.  With the new disk installed, here is what I did and the results.
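
If you need to track down which member of the mirror is bad, checks like the following are a reasonable starting point from the Live CD.  These are not the exact commands from my session, just generic ones (smartctl comes from the smartmontools package):

# Look for I/O errors against either disk in the kernel log
dmesg | grep -i -E 'error|fail' | grep -E 'sd[ab]'
# SMART health summary for each member of the mirror
sudo smartctl -H /dev/sda
sudo smartctl -H /dev/sdb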

Boot the system to an Ubuntu Live CD

I don’t have time to add much description now, but the commands and their sequence should hopefully help for now.  Feel free to post a question in the comments if you have any.

# Check the array status and try to assemble the arrays from the Live CD
sudo mdadm --query --detail /dev/md/1
sudo mdadm --assemble --scan
sudo mdadm --query --detail /dev/md/1
sudo mdadm --assemble
sudo mdadm --assemble --scan

# Check the state of each array
sudo mdadm --query --detail /dev/md/1
sudo mdadm --query --detail /dev/md/0
sudo mdadm --query --detail /dev/md/2
sudo mdadm --query --detail /dev/md/3

# Stop all of the arrays
sudo mdadm --stop /dev/md/0
sudo mdadm --stop /dev/md/1
sudo mdadm --stop /dev/md/2
sudo mdadm --stop /dev/md/3

# Verify each array actually stopped (md2 and md3 needed a second --stop)
sudo mdadm --query --detail /dev/md/0
sudo mdadm --query --detail /dev/md/1
sudo mdadm --query --detail /dev/md/2
sudo mdadm --stop /dev/md/2
sudo mdadm --query --detail /dev/md/3
sudo mdadm --stop /dev/md/3

# Review the partition tables on both disks
sudo fdisk -l

# Reassemble the arrays and check their status
cat /proc/mdstat
sudo mdadm --assemble --scan

cat /proc/mdstat
# Mount to confirm the data on the surviving disk is still readable
sudo mount /dev/md3 /mnt
cat /proc/mdstat
sudo mount /dev/sdb1 /mnt

sudo fdisk -l

sudo mdadm --stop /dev/md/0

cat /proc/mdstat
# Mark the failed disk's partitions as failed in their arrays
sudo mdadm --manage /dev/md0 --fail /dev/sda1
sudo mdadm --manage /dev/md0 --fail /dev/sda
sudo mdadm --manage /dev/md1 --fail /dev/sda2
sudo mdadm --manage /dev/md2 --fail /dev/sda3
cat /proc/mdstat
# Copy the partition table from the good disk (sdb) onto the new disk (sda),
# dumping sda's table before and after for reference
sudo sfdisk -d /dev/sda > sda.out
sudo sfdisk -d /dev/sdb | sudo sfdisk /dev/sda
sudo sfdisk -d /dev/sda > sda.out

sudo fdisk -l
# Add the new disk's partitions back into each array
sudo mdadm --manage /dev/md0 --add /dev/sda1
sudo mdadm --manage /dev/md1 --add /dev/sda2
sudo mdadm --manage /dev/md2 --add /dev/sda3
sudo mdadm --manage /dev/md3 --add /dev/sda5
# Watch the rebuild progress
cat /proc/mdstat
watch cat /proc/mdstat

Every 2.0s: cat /proc/mdstat        Mon Aug 17 13:15:31 2015

Personalities : [raid1]
md0 : active raid1 sda1[2] sdb1[1]
      4093888 blocks super 1.1 [2/2] [UU]

md1 : active raid1 sda2[2] sdb2[1]
      819136 blocks super 1.0 [2/2] [UU]

md3 : active raid1 sda5[2] sdb5[1]
      278538048 blocks super 1.1 [2/1] [_U]
      [==============>......]  recovery = 70.4% (196127360/278538048) finish=15.0min speed=91334K/sec
      bitmap: 0/3 pages [0KB], 65536KB chunk

md2 : active raid1 sda3[2] sdb3[1]
      204668800 blocks super 1.1 [2/2] [UU]
      bitmap: 0/2 pages [0KB], 65536KB chunk

unused devices: <none>
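
One follow-up step that is not captured in the commands above: because /dev/sda is a brand-new disk, the boot loader likely needs to be reinstalled on it once the arrays finish rebuilding, since mdadm only mirrors the partitions, not the MBR.  Something along these lines, run from the repaired system (or a chroot into it from the Live CD), should cover it:

# Reinstall GRUB to the MBR of the replaced disk
sudo grub-install /dev/sda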

Good Luck

This work was done in support of the Kuali project.

Setting up true failover for the Kuali application servers.  Currently, if a node goes down, the users on that node need to re-authenticate.  The following procedure configures the system so it can lose a node and the users on that node will not lose their sessions.

My part on the system side was fairly straightforward:

# Install memcached, open its port in the firewall, and enable the service at boot
yum install memcached
iptables -I INPUT -m state --state NEW -m tcp -p tcp --dport 11211 -j ACCEPT
service iptables save
chkconfig memcached on
service memcached start
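
To confirm memcached is actually up and reachable from the application servers, a quick test like this works (nc is just a convenient check, not part of the original setup; replace HOSTNAME with the memcached host):

# Should print memcached statistics if the daemon is listening on 11211
echo stats | nc -w 2 HOSTNAME 11211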

With that configured, the work to enable Tomcat to leverage memcached can begin:

Parts of the following information were found in other write-ups online.

Download the most recent copies of the required jars and install them in the tomcat_dir/lib directory (illustrated below).
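
The original links are not reproduced here, but for reference the set of jars is typically the memcached-session-manager core jar, the matching Tomcat-version-specific jar, and the spymemcached client.  The file names and versions below are only illustrative, not taken from the original post:

# Copy into tomcat_dir/lib on each Tomcat node (versions are examples only)
cp memcached-session-manager-1.8.3.jar tomcat_dir/lib/
cp memcached-session-manager-tc7-1.8.3.jar tomcat_dir/lib/
cp spymemcached-2.11.1.jar tomcat_dir/lib/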

Next, on each Tomcat instance, open tomcat_dir/conf/context.xml and add the following lines inside the <Context> tag:

<Manager className="de.javakaffee.web.msm.MemcachedBackupSessionManager"
    memcachedNodes="n1:HOSTNAME1:11211,n2:HOSTNAME2:11211"
    requestUriIgnorePattern=".*\.(ico|png|gif|jpg|css|js)$" />

Replace HOSTNAME1 and HOSTNAME2 with the hosts running memcached.  If memcached is listening on a different port, change the value in memcachedNodes; port 11211 is the default port for memcached.

Open tomcat_dir/conf/server.xml and look for the following lines:

<Server port="8005" ...>
    <Connector port="8080" protocol="HTTP/1.1" ...>
    <Connector port="8009" protocol="AJP/1.3" ...>

Change the ports so the two installations listen on different ports, as in the sketch below.  This is optional, but I would also disable the HTTP/1.1 connector by commenting out its <Connector> tag, as the setup documented here only requires the AJP connector to be enabled.
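
For example, the second instance's server.xml might end up looking like this (the exact port numbers are just one possible choice, not taken from the original post):

<Server port="8006" ...>
    <!-- HTTP connector disabled; only AJP is needed behind the load balancer -->
    <!-- <Connector port="8081" protocol="HTTP/1.1" ...> -->
    <Connector port="8010" protocol="AJP/1.3" ...>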

Finally, look for this line, also in tomcat_dir/conf/server.xml:

<Engine name="Catalina" defaultHost="localhost" ...>

Add the jvmRoute property and assign it a value that is different between the two installations. For example:

<Engine name="Catalina" defaultHost="localhost" jvmRoute="jvm1" ...>

And, for the second instance:

<Engine name="Catalina" defaultHost="localhost" jvmRoute="jvm2" ...>

That’s it for the Tomcat configuration. This configuration uses memcached-session-manager’s default serialization strategy and enables sticky session support. For more configuration options, refer to the memcached-session-manager documentation.

In our Apache load balancer, we add the following definition:

ProxyPass /REFpath balancer://Cluster_Name
ProxyPassReverse /REFpath balancer://Cluster_Name

<Proxy balancer://Cluster_Name>
   BalancerMember ajp://HOSTNAME:8009/REFpath route=jvm1  timeout=600 min=10 max=100 ttl=60 retry=120 connectiontimeout=10
   BalancerMember ajp://HOSTNAME:8009/REFpath route=jvm2  timeout=600 min=10 max=100 ttl=60 retry=120 connectiontimeout=10
   BalancerMember ajp://HOSTNAME:8009/REFpath route=jvm3  timeout=600 min=10 max=100 ttl=60 retry=120 connectiontimeout=10
   BalancerMember ajp://HOSTNAME:8009/REFpath route=jvm4  timeout=600 min=10 max=100 ttl=60 retry=120 connectiontimeout=10
   ProxySet lbmethod=byrequests
   ProxySet stickysession=JSESSIONID|jsessionid
   ProxySet nofailover=Off
</Proxy>

Note that the BalancerMember lines point to the ports and jvmRoutes configured above.  This sets up a load balancer that dispatches web requests to multiple Tomcat installations. When one of the Tomcat instances is shut down, requests are served by the other one that is still up. As a result, users do not experience downtime when one of the Tomcat instances is taken down for maintenance or application redeployment.
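
For the balancer definition to work, the relevant proxy modules must be loaded.  Exactly which LoadModule lines are required depends on the Apache version and how it was built, but they are typically along these lines:

LoadModule proxy_module modules/mod_proxy.so
LoadModule proxy_ajp_module modules/mod_proxy_ajp.so
LoadModule proxy_balancer_module modules/mod_proxy_balancer.so
# Apache 2.4 only; on 2.2 the byrequests method is part of mod_proxy_balancer
LoadModule lbmethod_byrequests_module modules/mod_lbmethod_byrequests.so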

This step also sets up sticky sessions. What this means is that if a user begins a session on instance 1, she will be served by instance 1 throughout the entire session, unless of course that instance goes down. This can be beneficial in a clustered environment, as application servers can use session data stored locally without contacting a remote memcached.