<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Research Computing @ USF</title>
	<atom:link href="http://rc.usf.edu/blog/?feed=rss2" rel="self" type="application/rss+xml" />
	<link>http://rc.usf.edu/blog</link>
	<description>Because sometimes it takes more than a "high-end" PC to get the job done.</description>
	<lastBuildDate>Thu, 30 Apr 2009 03:13:56 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>2009 Updates</title>
		<link>http://rc.usf.edu/blog/?p=21</link>
		<comments>http://rc.usf.edu/blog/?p=21#comments</comments>
		<pubDate>Thu, 30 Apr 2009 03:04:02 +0000</pubDate>
		<dc:creator>brs</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://rc.usf.edu/blog/?p=21</guid>
		<description><![CDATA[We haven&#8217;t posted here in a while and I&#8217;d like to change the frequency of our updates to this blog.  I guess it&#8217;s because I&#8217;m not very communicative (I suppose we need a PR person), but there&#8217;s been a lot of stuff going on behind the scenes these last few months.

&#8220;Large&#8221; Memory Sun Host
We recent [...]]]></description>
			<content:encoded><![CDATA[<p>We haven&#8217;t posted here in a while and I&#8217;d like to change the frequency of our updates to this blog.  I guess it&#8217;s because I&#8217;m not very communicative (I suppose we need a PR person), but there&#8217;s been a lot of stuff going on behind the scenes these last few months.</p>
<p><span id="more-21"></span></p>
<p><strong>&#8220;Large&#8221; Memory Sun Host</strong></p>
<p>We recently acquired the host formerly known as &#8220;Sunblast&#8221;, a v890 with 16 UltraSPARC IV 1.3GHz CPUs and 64GB of RAM.  We added this to the cluster in order to facilitate the execution of large memory, multi-threaded simulations using HFSS, ANSYS, and some other FEM and Multi-Physics applications.  Currently, we&#8217;re testing with a few users over HFSS.  The system had been under-utilized before, accepting logins and running XDMCP sessions for various users.  Now that it is part of the grid and is properly managed by GridEngine, we can put this hardware to work doing what it was meant to.  Hopefully, we&#8217;ll be able to add to or replace this system with some newer multi-core, multi-socket systems with even more RAM (on the order of 128GB or more).  A general announcement of availability should hit the Circe and Sunblast mailing lists within the next couple of days.</p>
<p><strong>MSL Cluster</strong></p>
<p>Dr. Ivan Oleynik of the Physics department just added 36 Dual-Quad Core Opteron 2384s with 16GB of RAM on InfiniBand with the help of a DoD grant.  After ironing out some issues with the IB Verbs library and OpenMPI,  we recompiled our OpenMPI libraries to support PSM (QLogic&#8217;s user-space IB driver interface) along with OpenIB and released the system for use today.  Within a few weeks, those systems should be part of a low-priority queue where extra cycles can be utilized by other researchers.</p>
<p><strong>Storage</strong></p>
<p>After several months of testing and design, the new storage system has been brought online, providing 12TB of space for /home directories.  The storage is clustered between two NFS servers which attach to two x4500s acting as backing stores.  The x4500s are mirrored using ZFS to provide data redundancy and high-availability.</p>
<p><strong>Networking</strong></p>
<p>Our Force10 S50 switches have all been updated to the latest OS version, FTOS 7.8.1.0.  This OS is based on NetBSD and provides greater stability, performance, and features than the previous OS, SFTOS.  We&#8217;ve also made necessary changes to our network configuration to allow for multiple uplinks to USF&#8217;s network for redundancy.  The OS upgrade alone has helped to reduce issues with dropped packets and false positives that have been plaguing our monitoring system for quite some time.</p>
<p><strong>Software</strong></p>
<p>We&#8217;re planning updates to supported applications such as Matlab, Maple, Vasp, and others as well as the decommissioning of several older versions of these and other applications and libraries.  These will be announced in the coming weeks.</p>
]]></content:encoded>
			<wfw:commentRss>http://rc.usf.edu/blog/?feed=rss2&amp;p=21</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Cluster Build-Out, 2008</title>
		<link>http://rc.usf.edu/blog/?p=11</link>
		<comments>http://rc.usf.edu/blog/?p=11#comments</comments>
		<pubDate>Tue, 02 Sep 2008 16:10:26 +0000</pubDate>
		<dc:creator>brs</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://rc.usf.edu/blog/?p=11</guid>
		<description><![CDATA[See below the fold!

Our latest and greatest project was the construction of a 120 node beowulf cluster paid for by a National Science Foundation grant.  Each node, a SunFire x4150, contains two Quad-Core Xeon X5460s, 16GB of RAM, and is connected via InfiniBand.  Full specifications are provided below.  The project involved a number of preliminary [...]]]></description>
			<content:encoded><![CDATA[<p style="text-align: left;">See below the fold!</p>
<p style="text-align: left;"><span id="more-11"></span></p>
<p style="text-align: left;">Our latest and greatest project was the construction of a 120 node beowulf cluster paid for by a National Science Foundation grant.  Each node, a SunFire x4150, contains two Quad-Core Xeon X5460s, 16GB of RAM, and is connected via InfiniBand.  Full specifications are provided below.  The project involved a number of preliminary steps, not the least of which was finding a suitable data center.  Our 12&#8242;x40&#8242;x10&#8242; cube was no longer sufficient for running new hardware (and is, arguably, insufficient for what&#8217;s currently running in it!).  Luckily, the Central Florida Regional Data Center in the SVC building was the perfect place for this new system.  Since it was in the middle of a renovation and already had raised floors, it was easy to jack in some more power and feed in some water for cooling.</p>
<p style="text-align: left;">To keep costs down for our HVAC needs, we went with InRow APC units.  Since USF sports a central chilled water facility, it was simply a matter of running some pipes. The units themselves book-end the cluster and there is one between each rack.</p>
<p style="text-align: left;">Here are some preliminary photos of the work.</p>
<div id="attachment_17" class="wp-caption aligncenter" style="width: 310px"><a href="http://rc.usf.edu/blog/wp-content/uploads/2008/09/img_0034.jpg"><img class="size-medium wp-image-17" title="pipe_install" src="http://rc.usf.edu/blog/wp-content/uploads/2008/09/img_0034.jpg" alt="Installing Chilled Water Pipes" width="300" height="225" /></a><p class="wp-caption-text">Installing Chilled Water Pipes</p></div>
<div id="attachment_18" class="wp-caption aligncenter" style="width: 310px"><a href="http://rc.usf.edu/blog/wp-content/uploads/2008/09/img_0037.jpg"><img class="size-medium wp-image-18" title="inrow_rc" src="http://rc.usf.edu/blog/wp-content/uploads/2008/09/img_0037.jpg" alt="APC InRow RC Units" width="300" height="225" /></a><p class="wp-caption-text">APC InRow RC Units</p></div>
<div id="attachment_19" class="wp-caption aligncenter" style="width: 310px"><a href="http://rc.usf.edu/blog/wp-content/uploads/2008/09/img_0033.jpg"><img class="size-medium wp-image-19" title="breaker_box" src="http://rc.usf.edu/blog/wp-content/uploads/2008/09/img_0033.jpg" alt="Electrical Panel Install" width="300" height="225" /></a><p class="wp-caption-text">Electrical Panel Install</p></div>
<div id="attachment_16" class="wp-caption aligncenter" style="width: 310px"><a href="http://rc.usf.edu/blog/wp-content/uploads/2008/09/img_0046.jpg"><img class="size-medium wp-image-16" title="one_rack" src="http://rc.usf.edu/blog/wp-content/uploads/2008/09/img_0046.jpg" alt="This is one completed rack" width="300" height="225" /></a><p class="wp-caption-text">This is one completed rack</p></div>
<p>More to come soon!</p>
]]></content:encoded>
			<wfw:commentRss>http://rc.usf.edu/blog/?feed=rss2&amp;p=11</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Perl-Fu: Automating ILOM SSH Sessions</title>
		<link>http://rc.usf.edu/blog/?p=10</link>
		<comments>http://rc.usf.edu/blog/?p=10#comments</comments>
		<pubDate>Fri, 16 May 2008 00:53:43 +0000</pubDate>
		<dc:creator>aastaneh</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://rc.usf.edu/blog/?p=10</guid>
		<description><![CDATA[We at USF Research Computing take pride in automating tasks as much as possible. We write script after script, tool after tool, to make our lives easier so we have more time to conquer the real challenges. One of the tasks we would like to automate with the new cluster on the way is talking [...]]]></description>
			<content:encoded><![CDATA[<p>We at USF Research Computing take pride in automating tasks as much as possible. We write script after script, tool after tool, to make our lives easier so we have more time to conquer the real challenges. One of the tasks we would like to automate with the new cluster on the way is talking to the administrative interfaces on the nodes. ILOM, or &#8220;Integrated Lights-Out Manager&#8221;, allows us to tell the machine to do real neat things remotely, like reboot and prepare to PXE. ILOMs have their own network interface and are accessible via SSH.</p>
<p>Brian and I thought- cool. Let&#8217;s script these operations through SSH then, and make our lives easier. If we can automate these tasks, it would be a lot easier to manage maintenance of 120 new machines.</p>
<p>More, after the jump.</p>
<p><span id="more-10"></span></p>
<p>Usually, when you wish to remotely execute a command with SSH, you just tack whatever you want on the end of the ssh command, and it will execute the command on the remote host and return the results to you. However, not all systems support this. Notably, the ILOM interfaces that you find on switches and some Sun hardware do not support this kind of thing:<br />
<code><br />
ssh root@ilom1 ls<br />
Password:<br />
Invalid operation<br />
</code></p>
<p>Invalid operation? But why, you ask? You can ssh into the host and execute commands there, but you can&#8217;t push a command out to be executed? What gives?</p>
<p>Simple. SSH daemons can operate in two modes- interactive and non-interactive. Interactive mode is when you log into the host and type commands there. Non-interactive is when you pass commands like the failed example above. The problem is, some ssh daemons, mostly on switch hardware and other network appliances, do not support the non-interactive mode. Which means that if you want to admin these devices, you have to log in to each of them and type in the commands, one by one.</p>
<p>Wrong.</p>
<p>Brian had a really cool idea. What if we wrote a program that initiated an interactive ssh session and we passed commands to it for it to execute? In short, the program would connect like a human would with his ssh client.</p>
<p>Enter Net::SSH::Perl, which is a module that allows you to initiate ssh sessions, and has all the features your standard ssh client has. We could just start an interactive session with it, and execute a whole bunch of commands through the script. One problem though- interactive sessions expect the user to type in the password into the TTY. That&#8217;s lame. How can we avoid that? It isn&#8217;t like we have SSH keys installed on the hosts, and even if they supported keys, none would be in place the first time we connect. I wouldn&#8217;t want to set that up for 120 machines by hand.</p>
<p>Basic UNIX knowledge tells us that every program has STDIN (program input), STDOUT (program output), and STDERR (out-of-band diagnostic messages/errors). So, in our case, what we need is to start a separate process and push our password and our commands through STDIN, and read our output through STDOUT/STDERR.</p>
<p>Perl saves the day again, with a module called IPC::Open3. It allows us to create those handles and attach them to a separate process for us to manipulate. In our case, we just want to fork. The parent process will be responsible for sending commands and aggregating output, and the child will be responsible for actually initiating the connection and invoking those commands and reading the output. We will also use IO::Select to manage reading from STDOUT/STDERR.</p>
<p>Final outcome: we can execute commands and process their output. We wrote a script just to print the MAC address of the ILOM:<br />
<code><br />
7:0 devel # ./ssh_test<br />
MAC Address: 00:14:4F:20:D6:90<br />
</code></p>
<p>Here&#8217;s some code for you to play with: <a href="http://rc.usf.edu/blog/files/ssh_test">ssh_test</a> Edit to your heart&#8217;s content.</p>
<p><strong>A WARNING:</strong> This piece of perl does stuff that the designers of SSH did not intend. This script involves storing a password, and if you do not give this file proper permissions, you are guaranteeing a way to get rooted. <strong>You&#8217;ve been warned.</strong> (Might I suggest chmod 700 with chown root:root?)</p>
<p>With that said, Happy Hacking! <img src='http://rc.usf.edu/blog/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://rc.usf.edu/blog/?feed=rss2&amp;p=10</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Ultimate Cluster Monitoring, Part I</title>
		<link>http://rc.usf.edu/blog/?p=7</link>
		<comments>http://rc.usf.edu/blog/?p=7#comments</comments>
		<pubDate>Thu, 13 Mar 2008 15:10:03 +0000</pubDate>
		<dc:creator>aastaneh</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://rc.usf.edu/blog/?p=7</guid>
		<description><![CDATA[Brian and I have been throwing around the idea of implementing a webapp that would do the following:

 Monitor Gigabit and Infiniband switch interfaces for throughput, types of traffic, errors, etc.
 Monitor UPS&#8217;s for Voltage, Battery Capacity and Temperature.
 Tie it together with SGE to see network performance on a job-by-job basis.
 Link it to [...]]]></description>
			<content:encoded><![CDATA[<p>Brian and I have been throwing around the idea of implementing a webapp that would do the following:</p>
<ul>
<li> Monitor Gigabit and Infiniband switch interfaces for throughput, types of traffic, errors, etc.</li>
<li> Monitor UPS&#8217;s for Voltage, Battery Capacity and Temperature.</li>
<li> Tie it together with SGE to see network performance on a job-by-job basis.</li>
<li> Link it to our current Ganglia installation to see status of individual nodes on a job-by-job basis.</li>
<li> Have all that information accessible in the same place and somehow have it look pretty.</li>
</ul>
<p>Well, Brian didn&#8217;t have the time to implement it himself, so he bequeathed the daunting task to me.  Let&#8217;s see how that turned out, after the jump.</p>
<p><span id="more-7"></span></p>
<p>The first step was to figure out how to harvest all of this information. Lucky for us, our Force10 Gigabit and Cisco Infiniband switches talk SNMP. I put two and two together, and I realized that MRTG was perfect for this task. The only problem was, most MRTG installations (and the config file generator) produce one giant config file. When Apache serves you the data in your browser, you have a ton of graphs to sift through. What a pain. So, my solution to this problem was this-</p>
<ul>
<li>Create a single MRTG config file for each service per interface per host. We have quite a few switches and UPS devices; meaning- lots of files. Oh, configure them to use rrdtool.</li>
<li>Write templates for each service you want to monitor.</li>
<li>Hand-hack a perl script per type of device to generate all the config files using the templates and regular expression replacement. Each perl script reads a simple newline-delimited flat file containing all the hosts you want to monitor for a particular type of device. Each output file should be named using the hostname, interface, and service.</li>
<li>Generate a global MRTG config file that reads in all the small config files and generates .rrd files. Configure crontab to run MRTG against the global file every 5 minutes.</li>
<li>Tie it all together using the 14all.cgi script by making symlinks to that script with the naming convention.</li>
</ul>
<p>Don&#8217;t worry. I&#8217;ll give you a simple example.</p>
<p><strong> 1. Generate the config files</strong></p>
<p>You want to monitor all the incoming and outgoing bits on each active interface on a switch. Here&#8217;s a template that will do this: <a href="http://rc.usf.edu/blog/files/bitstemplate">bitstemplate</a>. I would take a look at this, if I were you <img src='http://rc.usf.edu/blog/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' /> </p>
<p>See HOST and INTERFACE in that file? We are going to replace those for each host we specify and for each active interface the host has. Here&#8217;s a perl script that will generate the files for you: <a href="http://rc.usf.edu/blog/files/genswitchconf">genswitchconf</a>. It&#8217;s pretty short, and you can add more templates to load for a more comprehensive solution.</p>
<p>One more thing- make a file called switchhosts and add one hostname per line. When you run the script, the perl script will determine what interfaces are active for each host, and generate a config file for each one.</p>
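<p>If you would rather see the substitution idea before opening the Perl, here is a rough shell sketch of it. It is not the genswitchconf script: the function names <code>render_template</code> and <code>config_name</code>, and the underscore naming scheme, are made up for illustration. It only assumes the template marks its substitution points with the literal words HOST and INTERFACE, as described above.</p>

```shell
#!/bin/bash
# Sketch only: fill an MRTG template for one host/interface pair.
# render_template and config_name are hypothetical helpers, not part of
# genswitchconf. The template is assumed to contain the literal
# placeholders HOST and INTERFACE.
render_template() {
    local template="$1" host="$2" interface="$3"
    # Use | as the sed delimiter so interface names containing / still work.
    sed -e "s|HOST|${host}|g" -e "s|INTERFACE|${interface}|g" "$template"
}

# Hypothetical naming scheme: hostname, interface, and service in one name.
config_name() {
    local host="$1" interface="$2" service="$3"
    printf '%s_%s_%s.cfg\n' "$host" "$interface" "$service"
}
```

<p>Something like <code>render_template bitstemplate switch1 3 &gt; "$(config_name switch1 3 bits)"</code> would then write one small config file per interface.</p>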
<p><strong>2. Tell mrtg to use the config files</strong></p>
<p>Copy all the config files to a location, perhaps /etc/mrtg. Next, create a file called /etc/mrtg.cfg using this wicked cool one-liner:<br />
<code> for i in `ls /etc/mrtg/*.cfg`; do echo "Include: $i"&gt;&gt; /etc/mrtg.cfg; done</code></p>
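<p>One thing to watch with the one-liner: it appends, so running it twice gives you every Include line twice. A possible variant that rebuilds the file from scratch each time is sketched below. The function name <code>build_global_cfg</code> is hypothetical, and both paths are parameters so you can try it on a scratch directory first.</p>

```shell
#!/bin/bash
# Sketch only: regenerate the global MRTG file on every run instead of
# appending to it. build_global_cfg is a hypothetical name.
build_global_cfg() {
    local cfg_dir="$1" global_cfg="$2" f
    for f in "$cfg_dir"/*.cfg; do
        [ -e "$f" ] || continue    # the glob matched nothing
        echo "Include: $f"
    done > "$global_cfg"           # truncate, then write all lines at once
}
```

<p>Calling <code>build_global_cfg /etc/mrtg /etc/mrtg.cfg</code> should leave the same file as a single run of the one-liner on a fresh system.</p>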
<p>Then put this in crontab using crontab -e:<br />
<code>*/5 * * * *  env LANG=C /path/to/mrtg /etc/mrtg.cfg --logging /var/log/mrtg.log</code></p>
<p>Your .rrd files should automagically be generated. Check the logfile for errors.</p>
<p><strong>3. Use 14all.cgi to make it easily web-accessible</strong></p>
<p>This perl script is cool.  Stick it in the /cgi-bin/ of your webspace, and configure it to use symlinks to load the desired metric. This ain&#8217;t an Apache or MRTG/14all.cgi tutorial, so the logistics of this are outside the scope of this article. Next, use this even more awesome one-liner to create your symlinks:<br />
<code>for i in `ls /etc/mrtg/*.cfg`; do ln -s /usr/lib/mrtg/cgi-bin/14all.cgi /usr/lib/mrtg/cgi-bin/`basename $i .cfg`.cgi &amp;&gt; /dev/null; done</code></p>
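<p>The same loop can be written as a small function that is safe to re-run after you add more config files. This is just a sketch: <code>link_cgi_wrappers</code> is a made-up name, and it assumes 14all.cgi already sits in the directory you pass as the second argument.</p>

```shell
#!/bin/bash
# Sketch only: create one symlink to 14all.cgi per MRTG config file.
# link_cgi_wrappers is a hypothetical name; cgi_dir is assumed to already
# contain 14all.cgi.
link_cgi_wrappers() {
    local cfg_dir="$1" cgi_dir="$2" f name
    for f in "$cfg_dir"/*.cfg; do
        [ -e "$f" ] || continue
        name=$(basename "$f" .cfg)
        # -f replaces an existing link, so repeated runs do not error out
        ln -sf "$cgi_dir/14all.cgi" "$cgi_dir/$name.cgi"
    done
}
```

<p>For the paths used above that would be <code>link_cgi_wrappers /etc/mrtg /usr/lib/mrtg/cgi-bin</code>.</p>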
<p><strong>4. Write a perl script to generate html to browse the device</strong></p>
<p>I have a sample for the current example: <a href="http://rc.usf.edu/blog/files/genswitchhtml">genswitchhtml</a>. When you run it, you should get a single html file per switch, containing a table which links to the 14all.cgi with the desired host and interface. Copy these to your webspace. I&#8217;d write an index for all these, if I were you.</p>
<p><strong>5. Enjoy.</strong></p>
<p>It&#8217;s a basic solution, but it gets the job done. I generated config files for 8 switches, a ton of UPSs, and a few Infiniband switches, making a one-stop shop for looking at performance metrics for all these devices. In the next article, I&#8217;ll show you how to use SGE to your advantage so you can see metrics on a per-job basis. Have Fun!</p>
<p>-Amin</p>
]]></content:encoded>
			<wfw:commentRss>http://rc.usf.edu/blog/?feed=rss2&amp;p=7</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Librarian-NG: Solve your unresolved linkage dependencies FAST.</title>
		<link>http://rc.usf.edu/blog/?p=6</link>
		<comments>http://rc.usf.edu/blog/?p=6#comments</comments>
		<pubDate>Thu, 24 Jan 2008 21:08:16 +0000</pubDate>
		<dc:creator>aastaneh</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://rc.usf.edu/blog/?p=6</guid>
		<description><![CDATA[We at Research Computing build and manage many kinds of software written in all kinds of languages (C, Fortran, R), and built with many types of compilers (gcc, Intel, PGI). Consequently, we have a ton of libraries. When building software to install on our systems, it usually takes a long time.  Not all developers [...]]]></description>
			<content:encoded><![CDATA[<p>We at Research Computing build and manage many kinds of software written in all kinds of languages (C, Fortran, R), and built with many types of compilers (gcc, Intel, PGI). Consequently, we have a ton of libraries. When building software to install on our systems, it usually takes a long time.  Not all developers use automake, and therefore, it makes the process hard on us. Usually, during the linking stage of the build, the build system (make, usually) will abort, kicking and screaming about some function call whose library is not present.</p>
<p>If there was only a way to speed up the process..</p>
<p><span id="more-6"></span></p>
<p>It would be great to just ask the computer:</p>
<p><strong>&#8220;Hey, You got function foobar installed?&#8221;</strong></p>
<p>And it would be even greater if the computer replied:</p>
<p><strong>&#8220;Yea, I got it, and here&#8217;s the library that has it!&#8221;</strong></p>
<p>Guess what? In the nineties a professor at USF named David Rabson implemented such a solution called Librarian. It was written in C, used ndbm as its database backend, and did the job. You gave it the symbol/function name, and it gave you a list of possible candidates to link against. Good times.</p>
<p>But then Brian and I looked at it. We both agreed, it did the job. But there were some key features that we wanted that were missing, primarily being able to query using the all-powerful regular expression. Since ndbm didn&#8217;t support regexes, and after some research I found that MySQL did.. one thing led to another and..</p>
<p>Enter Librarian-NG. I ended up reimplementing the whole sucker in Perl. Even edited the man pages.</p>
<p>Example:</p>
<p><code><br />
[aastaneh@host ~]$ lbnlookup malloc<br />
Symbol Name   Location<br />
---------------------------------------<br />
malloc:  /usr/lib/libc.a<br />
malloc:  /usr/lib/syslinux/com32/libcom32.a<br />
malloc:  /usr/lib64/libc.a<br />
malloc:  /opt/priv/openmpi-1.2.4/pgi-7.0-7/lib/libopen-pal.a<br />
malloc:  /opt/priv/openmpi-1.2.3-pgi-7.0-i386/lib/libopen-pal.a<br />
</code></p>
<p>Download: <a href="http://rc.usf.edu/blog/files/librarian-ng-1.0.0.tar.gz">librarian-ng-1.0.0.tar.gz</a></p>
]]></content:encoded>
			<wfw:commentRss>http://rc.usf.edu/blog/?feed=rss2&amp;p=6</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>zfs_restore: the cool way to restore from backups</title>
		<link>http://rc.usf.edu/blog/?p=5</link>
		<comments>http://rc.usf.edu/blog/?p=5#comments</comments>
		<pubDate>Wed, 16 Jan 2008 19:19:42 +0000</pubDate>
		<dc:creator>aastaneh</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://rc.usf.edu/blog/?p=5</guid>
		<description><![CDATA[Most sysadmins know that ZFS natively keeps snapshots of files, which makes it perfect for an incremental-backup solution. Unfortunately, when trying to restore a lost file using ZFS, (especially remotely in this case) the process can be rather complex (EDIT: not really complex or difficult at all&#8230; it&#8217;s just that it could be even easier), [...]]]></description>
			<content:encoded><![CDATA[<p>Most sysadmins know that ZFS natively keeps snapshots of files, which makes it perfect for an incremental-backup solution. Unfortunately, when trying to restore a lost file using ZFS, (especially remotely in this case) the process can be rather complex (EDIT: not really complex or difficult at all&#8230; it&#8217;s just that it could be even easier), which is a shame considering its usefulness.</p>
<p>Until today.</p>
<p>Using the powers of ZFS, rsync, and Bash, I have devised a solution which makes accessing and restoring backups so simple, even normal users can do it!</p>
<p><span id="more-5"></span>Let&#8217;s say I have an account on a multi-user system, and I accidentally deleted a file in my home directory:<br />
<code><br />
[aastaneh@login0 ~]$ rm fftw-3.1.2.tar.gz<br />
</code><br />
It&#8217;s a really important file, and I need it back right now. I call my trusty sysadmin to restore the file for me. He can either enter in a bunch of hard-to-remember esoteric commands, or do this:<br />
<code><br />
[root@backups ~]# zfs_restore aastaneh<br />
Now in ZFS File Restore Environment<br />
Restricted Command Set: dates setdate restore [file], ls, cd [dir], exit<br />
backupsh snapshot&gt;<br />
</code><br />
The admin is now given a custom command shell which has a simple set of commands. We use &#8216;dates&#8217; first to give the listing of all the snapshots present:<br />
<code><br />
backupsh snapshot&gt; dates<br />
20071206000201  20071209000143  20071212000103  20071216000135  20071218132440<br />
20071207000228  20071210000134  20071213000210  20071217000054  20071219000106<br />
20071208000251  20071211000107  20071214010339  20071218000233<br />
</code><br />
The &#8216;20071219000106&#8217; looks the most recent. We choose it with &#8216;setdate&#8217;:<br />
<code><br />
backupsh snapshot&gt; setdate 20071219000106<br />
</code><br />
Now we are in my home directory from yesterday. The admin has the ability to dig around in my home directory using &#8216;cd&#8217; and &#8216;ls&#8217;.<br />
<code><br />
backupsh 20071219000106&gt; ls<br />
.                        .ssh                     gmxtest.tgz<br />
..                       .viminfo                 gromacs-3.3.2<br />
.Xauthority              abinit-5.4.4             gromacs-3.3.2.tar.gz<br />
.bash_history            abinit-5.4.4.tar.gz      gromacs-clean<br />
.bash_logout             abinitconfigure          gromacs-configure.patch<br />
.bash_profile            configure-working        grompp.out<br />
.bashrc                  discrep.ieee             id_rsa<br />
.elinks                  discrep.reg              id_rsa.pub<br />
.forward                 discrep.reg-1030         mail<br />
.lesshst                 fftw-3.1.2               maillog<br />
.matlab                  fftw-3.1.2.tar.gz        scratch<br />
.mbox                    gmxtest-3.3.2.tgz<br />
</code><br />
We see the lost fftw-3.1.2.tar.gz, so we restore the file with &#8216;restore&#8217; (which just calls rsync on the file with the right options):<br />
<code><br />
backupsh 20071219000106&gt; restore fftw-3.1.2.tar.gz<br />
building file list ...<br />
1 file to consider<br />
fftw-3.1.2.tar.gz<br />
2736360 100%   92.08MB/s    0:00:00 (xfer#1, to-check=0/1) sent 2736798 bytes  received 42 bytes  5473680.00 bytes/sec<br />
total size is 2736360  speedup is 1.0<br />
backupsh 20071219000106&gt; exit<br />
(root@backups) Thu Dec 20 13:33:37<br />
</code><br />
Voila! My day is saved. My file is now restored to my home directory, timestamped so that the restore process does not clobber files.<br />
<code><br />
[aastaneh@login0 ~]$ ls fftw*.gz*<br />
fftw-3.1.2.tar.gz-20071219000106</code></p>
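<p>Two small pieces of the session above are easy to sketch in shell. This is not the zfs_restore source, and both function names are invented for illustration. The snapshot names are fixed-width YYYYMMDDhhmmss strings, so a plain sort puts them in date order, and the restored file simply gets the snapshot name tacked on the end.</p>

```shell
#!/bin/bash
# Sketch only, not taken from zfs_restore. Both helpers are hypothetical.

# Read snapshot names (one per line) on stdin and print the newest one.
# Fixed-width YYYYMMDDhhmmss names sort chronologically as plain text.
latest_snapshot() {
    sort | tail -n 1
}

# Build the timestamped name a restored file is given, so the restore
# never clobbers a file that already exists in the home directory.
restored_name() {
    local file="$1" snapshot="$2"
    printf '%s-%s\n' "$(basename "$file")" "$snapshot"
}
```

<p>With the listing shown earlier, piping the output of &#8216;dates&#8217; (one name per line) into <code>latest_snapshot</code> would pick 20071219000106.</p>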
<p><strong> Requirements</strong>: The script must be running on a system with ZFS snapshots, probably a Solaris system. This script was written assuming a NIS install somewhere, but I can imagine working around that if NIS isn&#8217;t used on the network. Rsync was also used, but another copy tool could probably be substituted (scp?).</p>
<p><strong>Caveats</strong>:</p>
<ul>
<li>If the script does not work successfully the first time, first check with the ZFS utilities to see if a mountpoint has been defined already for the user you are trying to restore. You will have to manually perform the unmount before the script will work properly.</li>
<li> The restored file gets copied to &#8216;~user&#8217;, not to the subdirectory it was restored from.</li>
<li> This script needs some better sanity checks to ensure that future users don&#8217;t try to traverse up the directory structure to / or something silly
</li>
</ul>
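<p>For the last caveat, one possible sanity check is sketched below. It is not in the current script, and <code>is_safe_relpath</code> is a made-up name. It only looks at the text of the path, so it does not catch symlinks that point outside the snapshot.</p>

```shell
#!/bin/bash
# Sketch only: reject paths that could climb out of the snapshot directory.
# is_safe_relpath is a hypothetical helper. It checks the string alone and
# does not resolve symlinks.
is_safe_relpath() {
    local path="$1"
    case $path in
        ''|/*)                return 1 ;;   # empty or absolute path
        ..|../*|*/..|*/../*)  return 1 ;;   # any ".." path component
    esac
    return 0
}
```

<p>The &#8216;cd&#8217; and &#8216;restore&#8217; commands could call this on their argument and refuse to continue when it returns non-zero.</p>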
<p><strong>Code</strong>: <a href="http://rc.usf.edu/blog/files/zfs_restore">zfs_restore source</a></p>
]]></content:encoded>
			<wfw:commentRss>http://rc.usf.edu/blog/?feed=rss2&amp;p=5</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>IPMI Monitoring for x4500 &amp; Dell PowerEdge with Nagios</title>
		<link>http://rc.usf.edu/blog/?p=4</link>
		<comments>http://rc.usf.edu/blog/?p=4#comments</comments>
		<pubDate>Mon, 14 Jan 2008 15:57:22 +0000</pubDate>
		<dc:creator>brs</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://rc.usf.edu/blog/?p=4</guid>
		<description><![CDATA[So, you&#8217;ve got some sweet new hardware in your server room and you have it up and running with your latest and greatest production software stack.  How are you to monitor all the ins-and-outs of the hardware &#8212; fan speeds, chassis and CPU temperatures, power supply status, etc. &#8212; and even if you can [...]]]></description>
			<content:encoded><![CDATA[<p>So, you&#8217;ve got some sweet new hardware in your server room and you have it up and running with your latest and greatest production software stack.  How are you to monitor all the ins-and-outs of the hardware &#8212; fan speeds, chassis and CPU temperatures, power supply status, etc. &#8212; and even if you can get read outs of this information, what are good thresholds for every given metric?</p>
<p>Lucky for us, most new server hardware comes with some on-board hardware that provides an IPMI service.  IPMI refers to <strong>Intelligent Platform Management Interface </strong>and it provides various mechanisms for chassis power control, system event logging, hardware monitoring, and even serial console access.  On Dell systems, an integrated BMC or <strong>Baseboard Management Controller</strong> provides the necessary hardware interface to provide the IPMI service while piggy-backed to one of the system&#8217;s NICs to provide remote accessibility.  On the Sun x4500, the ILOM or <strong>Integrated Lights-Out Manager</strong> unit provides the IPMI services through an attached service processor with its own NIC and on-board operating system (that happens to be Linux).</p>
<p><span id="more-4"></span>The really cool thing is that vendors who implement IPMI like Sun and Dell tend to do a pretty complete job populating all of the potential data points for the IPMI SDR (<strong>Sensor Data Record</strong>) and they also build-in factory specified thresholds for determining the failure of any particular component.  All temperatures, fan speeds, status indicators, etc. have established nominal operating ranges and IPMI has a very easy way to determine if a device is operating within those established parameters.</p>
<p>The first thing that we&#8217;ll have to do to set up IPMI monitoring through Nagios is, of course, to make sure that you have a ready and running copy of Nagios running somewhere and that you are familiar with adding plugins to Nagios&#8217; command list.  If you don&#8217;t get started reading here: <a href="http://www.nagios.org/docs/">http://www.nagios.org/docs</a></p>
<p>The next thing you&#8217;ll need is a current installation of OpenIPMI and the OpenIPMI tools (only necessary on Linux, in this case).  It is useful to have these installed on each Linux host with an IPMI adapter, such as a Dell PowerEdge server.  With CentOS, you need only issue the following yum command to get this software:</p>
<p><code>[root@host ~]# yum install OpenIPMI OpenIPMI-tools</code></p>
<p>This will provide the necessary command line tools such as &#8216;ipmitool&#8217; and the init script we&#8217;ll use to load the appropriate kernel modules.  Once you&#8217;ve installed these packages, you can edit and run the following script to establish a base configuration for your IPMI adapter:</p>
<p><code>#!/bin/bash<br />
# Amin Astaneh, Research Computing, USF<br />
# ipmiscript: configures ipmi for network access on a node.<br />
CHANNEL=1<br />
#<br />
# First, ensure that all the modules are loaded<br />
service ipmi restart<br />
#<br />
# Use NIS to determine what IP address to use. This merely adds 200 to the third octet.<br />
HOST=`hostname`<br />
#IPADDR=`ypcat hosts.byaddr | awk "/$HOST/ { str=\\$1; split(str,ip,\".\"); printf \"%s.%s.%s.%s\", ip[1], ip[2], ip[3] + 200, ip[4] }"`<br />
#<br />
# In this case, just hard-code whichever address you want for the adapter<br />
#<br />
IPADDR=x.x.x.x<br />
NETMASK=y.y.y.y<br />
GATEWAY=z.z.z.z<br />
#<br />
ROOT_PW='XXXXXXXXXX'<br />
NAGIOS_PW='XXXXXXXXXX'<br />
#<br />
# The package name is OpenIPMI-tools<br />
ipmitool lan set $CHANNEL ipaddr $IPADDR<br />
ipmitool lan set $CHANNEL ipsrc static<br />
ipmitool lan set $CHANNEL netmask $NETMASK<br />
ipmitool lan set $CHANNEL defgw ipaddr $GATEWAY<br />
ipmitool user set password 2 "$ROOT_PW"<br />
ipmitool lan set $CHANNEL access on<br />
ipmitool user set name 3 nagios<br />
ipmitool user set password 3 "$NAGIOS_PW"<br />
ipmitool channel setaccess $CHANNEL 3 privilege=3 ipmi=on<br />
ipmitool mc reset cold<br />
#</code></p>
<p>I&#8217;ve seen that in many cases, you&#8217;ll need to completely power-cycle the box for the IPMI adapter to actually work.  This can mean either a full reboot or a complete unplugging of the system power.</p>
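<p>The address convention from the commented-out NIS line in the script (BMC address = host address with 200 added to the third octet) can be sketched as a small shell function.  This is only an illustration: <code>bmc_addr</code> is a hypothetical helper name, not part of ipmitool or of the script above, and the range check is an added assumption so that a host whose third octet is above 55 is rejected instead of being mapped to an invalid address.</p>

```shell
# bmc_addr HOST_IP [OFFSET]
# Hypothetical helper: print the BMC address for a host address by adding
# OFFSET (200 by default, as in the script above) to the third octet.
# Prints nothing and returns 1 if the input is not a dotted quad or if the
# shifted octet would exceed 255.
bmc_addr() {
    echo "$1" | awk -F'.' -v off="${2:-200}" '
        NF != 4 { exit 1 }
        {
            # every octet must be a plain number no larger than 255
            for (i = 1; i != 5; i++)
                if ($i !~ /^[0-9]+$/ || $i + 0 > 255) exit 1
            if ($3 + off > 255) exit 1
            printf "%s.%s.%s.%s\n", $1, $2, $3 + off, $4
        }'
}
```

<p>For example, <code>bmc_addr 10.1.2.3</code> prints <code>10.1.202.3</code>, which could be assigned to <code>IPADDR</code> in place of the hard-coded value.</p>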
<p>Before we get down to playing with this device, what about configuring the ILOM on the x4500?  Well, you&#8217;ll first want to read through the <strong>Sun Integrated Lights Out Manager (ILOM) Administration Guide</strong> provided here: <a href="http://docs.sun.com/app/docs/coll/x4500-rels-ilom">http://docs.sun.com/app/docs/coll/x4500-rels-ilom</a>.  You&#8217;ll want to set up a basic configuration where you can log in to the device via ssh as administrator.  Beyond that, all we need to do here is set up a user account.  Go ahead and log into the device and issue the following commands to create the &#8216;nagios&#8217; user:</p>
<p><code>-&gt; cd /SP/users<br />
-&gt; create nagios<br />
-&gt; cd nagios<br />
-&gt; set password='XXXXXXXXX'<br />
-&gt; set role=Operator<br />
-&gt; exit</code></p>
<p>You&#8217;ll now be able to use the &#8216;nagios&#8217; user on both the Dell BMCs and the Sun ILOM to access IPMI SDR information for monitoring the devices.  Let&#8217;s have a look at some sample output from &#8216;ipmitool&#8217; (see ipmitool(1) in the man pages for information on command line syntax).</p>
<p>Here is an example using &#8216;ipmitool&#8217; on a Dell PowerEdge server:</p>
<p><code> [user@host ~]$ ipmitool -H x.x.x.x -U nagios -P 'XXXXXXXXX' -L OPERATOR -I lan sdr list all<br />
Temp             | disabled          | ns<br />
Temp             | disabled          | ns<br />
Ambient Temp     | 25 degrees C      | ok<br />
CMOS Battery     | 0x00              | ok<br />
VCORE            | 0x01              | ok<br />
VDDIO            | 0x01              | ok<br />
VDDA             | 0x01              | ok<br />
VTT              | 0x01              | ok<br />
VCORE            | 0x01              | ok<br />
VDDIO            | 0x01              | ok<br />
...</code></p>
<p>The nice thing about &#8216;sdr list all&#8217; is that all available metrics are read and processed against the built-in threshold values.  This makes for very easy parsing.  The sensors with &#8216;ns&#8217; in the 3rd column are unavailable, so we can grep them out pretty easily.  The x4500 output follows the same format.  The only difference is the LAN interface used (see option &#8216;-I lan&#8217; in the above command line).  For the x4500, we&#8217;ll be using the LANplus interface.  Here&#8217;s an example:</p>
<p><code> [user@host ~]$ ipmitool -H x.x.x.x -U nagios -P 'XXXXXXXXX' -L OPERATOR -I lanplus sdr list all<br />
proc.p0.t_core   | 51 degrees C      | ok<br />
proc.p1.t_core   | 49 degrees C      | ok<br />
dbp.t_amb        | 25 degrees C      | ok<br />
io.front.t_amb   | 39 degrees C      | ok<br />
io.rear.t_amb    | 40 degrees C      | ok<br />
proc.front.t_amb | 29 degrees C      | ok<br />
proc.rear.t_amb  | 34 degrees C      | ok<br />
ft0.prsnt        | 0x02              | ok<br />
ft0.f0.speed     | 7700 RPM          | ok<br />
ft0.f1.speed     | 7800 RPM          | ok<br />
...</code></p>
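<p>The filtering idea described above (ignore &#8216;ok&#8217; and &#8216;ns&#8217;, keep everything else) can be sketched as a shell function that reads the sensor listing on standard input.  This is a minimal illustration, not the plugin itself; <code>sdr_problems</code> is a hypothetical name, and the sketch assumes the three-column, pipe-separated layout shown in the two listings.</p>

```shell
# sdr_problems
# Hypothetical helper: read "ipmitool ... sdr list all" output on stdin and
# print only the sensors whose status column is neither "ok" nor "ns",
# as name:reading:status with the column padding removed.
sdr_problems() {
    awk -F'|' '
        NF >= 3 {
            # strip the padding that ipmitool adds around each column
            for (i = 1; i != NF + 1; i++) gsub(/^[ \t]+|[ \t]+$/, "", $i)
            if ($NF == "ok" || $NF == "ns") next
            print $1 ":" $2 ":" $NF
        }'
}
```

<p>It would be used by piping either of the commands above into it, for example <code>ipmitool -I lanplus -H x.x.x.x -U nagios -P 'XXXXXXXXX' -L OPERATOR sdr list all | sdr_problems</code>.  No output means every available sensor is within its thresholds.</p>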
<p>Nagios plugins in bash are incredibly easy to write, and since we have all of the output we&#8217;ll ever need for monitoring purposes, let&#8217;s just make the &#8216;ipmitool&#8217; commands above the basis for our monitoring scripts.  Here&#8217;s one for the Dells:</p>
<p><code>#!/bin/bash<br />
###################################<br />
# check_x4500<br />
#<br />
# Checks status of machine via ipmi<br />
#<br />
# Amin Astaneh, aastaneh@rc.usf.edu<br />
# 10-01-2008<br />
###################################<br />
# We can add 200 to the third octet for our Dell hosts so that we don't have to create a separate nagios host for the ipmi adapter<br />
HOST=`echo $1 | awk -F'.' '{ printf "%s.%s.%s.%s", $1, $2, $3 + 200, $4 }'`<br />
LOGS=/usr/share/nagios/logs<br />
IPMI="ipmitool -I lan -H $HOST -U nagios -P rc_ipmi_info -L OPERATOR"<br />
RESULTS=$($IPMI sdr list all | egrep -v '^.*\|.*\|.*(ok|ns)$' | awk -F'|' '{ print $1":"$2":"$NF}');<br />
#<br />
if [ -n "$RESULTS" ]; then<br />
echo "######### Status Display   ###########" &gt; $LOGS/$HOST.log<br />
echo "$RESULTS" &gt;&gt; $LOGS/$HOST.log<br />
echo "######### System Event Log ###########" &gt;&gt; $LOGS/$HOST.log<br />
$IPMI sel list &gt;&gt; $LOGS/$HOST.log<br />
echo "WARNING: &lt;a href=\"https://nagios.rc.usf.edu/logs/$HOST.log\"&gt;See Logfile&lt;/a&gt;";<br />
exit 1;<br />
else<br />
echo "OK: All Components Online."<br />
exit 0;<br />
fi</code></p>
<p>In this script, we simply look for lines in the output of <strong>sdr list all</strong> that contain something other than &#8216;ok&#8217; or &#8216;ns&#8217; in the 3rd column.  If any such line is returned, we know something is awry and we print a line containing a URL to a log file along with WARNING or CRITICAL.  Nagios itself determines whether there is a problem from the script&#8217;s exit code (1 for WARNING, 2 for CRITICAL).  The log file contains the offending sensor readings from the command as well as a listing of the system event log.</p>
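<p>One case that this logic does not distinguish is ipmitool returning no sensor data at all (for instance when the BMC is unreachable): the filtered result is then empty, which reads as OK.  The sketch below shows one way to map the three cases onto the standard Nagios plugin return codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).  The function name <code>sdr_state</code> and the wording of the UNKNOWN and CRITICAL messages are assumptions made for illustration.</p>

```shell
# sdr_state
# Hypothetical helper: read sdr output on stdin, print a one-line status and
# return the matching Nagios plugin code:
#   0 = OK, 2 = CRITICAL, 3 = UNKNOWN (no sensor lines were seen at all).
sdr_state() {
    awk -F'|' '
        NF >= 3 {
            seen++
            status = $NF
            gsub(/[ \t]/, "", status)
            if (status == "ok" || status == "ns") next
            bad++
        }
        END {
            if (!seen) { print "UNKNOWN: no sensor data returned"; exit 3 }
            if (bad)   { print "CRITICAL: " bad " sensor(s) outside thresholds"; exit 2 }
            print "OK: All Components Online."
        }'
}
```

<p>A plugin built this way could end with <code>$IPMI sdr list all | sdr_state; exit $?</code> so that the exit code always matches the text Nagios displays.</p>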
<p>With a couple of small modifications, the Dell script above can be used with the x4500 ILOM, as we see here:</p>
<p><code>#!/bin/bash<br />
###################################<br />
# check_x4500<br />
#<br />
# Checks status of machine via ipmi<br />
#<br />
# Amin Astaneh, aastaneh@rc.usf.edu<br />
# 10-01-2008<br />
###################################<br />
#<br />
HOST=$1<br />
LOGS=/usr/share/nagios/logs<br />
IPMI="ipmitool -I lanplus -H $HOST -U nagios -P rc_ipmi_info -L OPERATOR"<br />
RESULTS=$($IPMI sdr list all | egrep -v '^.*\|.*\|.*(ok|ns)$' | \<br />
awk -F'|' '{ print $1":"$2":"$NF}');<br />
#<br />
if [ -n "$RESULTS" ]; then<br />
echo "######### Status Display   ###########" &gt; $LOGS/$HOST.log<br />
echo "$RESULTS" &gt;&gt; $LOGS/$HOST.log<br />
echo "######### System Event Log ###########" &gt;&gt; $LOGS/$HOST.log<br />
$IPMI sel list &gt;&gt; $LOGS/$HOST.log<br />
echo "CRITICAL: &lt;a href=\"https://nagios.rc.usf.edu/logs/$HOST.log\"&gt;See Logfile&lt;/a&gt;";<br />
exit 2;<br />
else<br />
echo "OK: All Components Online."<br />
exit 0;<br />
fi</code></p>
<p>This is the same check, except that the ILOMs are defined as their own hosts in Nagios (so we don&#8217;t need to do any address translation) and we use the LANplus interface to communicate with the device.</p>
<p>Now, all that&#8217;s left to do is to add these plugins to Nagios and assign them as services to the appropriate hosts.  You then have complete monitoring of all the hardware&#8217;s vital statistics, using the factory-specified ranges as thresholds, with a very minimal amount of scripting.  A big thanks to Amin for getting these scripts written.</p>
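<p>As a rough sketch of that last step, the function below prints a Nagios command definition and a service definition for one of the plugins.  The names are assumptions: <code>print_ipmi_objects</code> is a hypothetical helper, the plugin file name and host name are whatever you pass in, and <code>generic-service</code> is the service template from the Nagios sample configuration, which your site may not use.  <code>$USER1$</code> and <code>$HOSTADDRESS$</code> are standard Nagios macros for the plugin directory and the address of the host being checked.</p>

```shell
# print_ipmi_objects PLUGIN HOST
# Hypothetical helper: print a Nagios command definition that runs PLUGIN
# with the host address as its only argument, plus a service definition
# that attaches the check to HOST.
print_ipmi_objects() {
    plugin=$1
    host=$2
    printf 'define command {\n'
    printf '    command_name    %s\n' "$plugin"
    printf '    command_line    $USER1$/%s $HOSTADDRESS$\n' "$plugin"
    printf '}\n\n'
    printf 'define service {\n'
    printf '    use                     generic-service\n'
    printf '    host_name               %s\n' "$host"
    printf '    service_description     IPMI Hardware\n'
    printf '    check_command           %s\n' "$plugin"
    printf '}\n'
}
```

<p>The output can be reviewed and then appended to one of your Nagios object configuration files before reloading Nagios.</p>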
]]></content:encoded>
			<wfw:commentRss>http://rc.usf.edu/blog/?feed=rss2&amp;p=4</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Kickstart and ssh</title>
		<link>http://rc.usf.edu/blog/?p=3</link>
		<comments>http://rc.usf.edu/blog/?p=3#comments</comments>
		<pubDate>Fri, 07 Dec 2007 21:17:43 +0000</pubDate>
		<dc:creator>brs</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://rc.usf.edu/blog/?p=3</guid>
		<description><![CDATA[Have you ever set up a remote, automated kickstart system only to have it fail on some esoteric piece of newfangled hardware?  Were you then disappointed to find that while kickstart supports VNC connections, it does not allow ssh connections so that you could get a list of all the bleeding-edge, barely-supported hardware off [...]]]></description>
			<content:encoded><![CDATA[<p>Have you ever set up a remote, automated kickstart system only to have it fail on some esoteric piece of newfangled hardware?  Were you then disappointed to find that while kickstart supports VNC connections, it does not allow ssh connections so that you could get a list of all the bleeding-edge, barely-supported hardware off the box and run some diagnostics in a shell?  Well, be disappointed no more!</p>
<p><span id="more-3"></span>Enabling ssh connections to an automated kickstart configuration is fairly painless.  To start, you&#8217;ll need a few things:</p>
<ol>
<li>An accessible copy of /usr/sbin/sshd.  If you are installing over NFS, this can be copied into your distribution directory.  If you are installing over HTTP or FTP, you&#8217;ll need to use curl to download the binary from wherever you host it.</li>
<li>A basic sshd_config file and an ssh host key (ssh_host_rsa_key in the example below)</li>
<li>An ssh public key that can be used to authenticate a root login during the installation</li>
</ol>
<p>A really nice thing about the base system contained in an Anaconda install is that it includes all of the libraries necessary to fire up an sshd daemon during the installation.  This made our work incredibly easy.  At this point, we assume that you have a working ks.cfg that you push out to your remote hosts for automated installations.  What you&#8217;ll need to do is add a %pre script to the file that configures sshd and starts the daemon at the very beginning of the installer.  The following is the relevant part of an example ks.cfg file; the last line, under &#8220;Start sshd&#8221;, is what actually launches the daemon:</p>
<pre><code>%pre
echo "root:x:0:0:root:/root:/bin/sh" &gt; /etc/passwd
echo 'root:&lt;your_hash_here&gt;:13732:0:99999:7:::' &gt; /etc/shadow
echo "sshd:x:74:74::/var/empty/sshd:/sbin/nologin" &gt;&gt; /etc/passwd

# Make necessary directories and files for logins
mkdir -p /var/empty/sshd
chown root /var/empty/sshd
chmod 700 /var/empty/sshd
mkdir -p /var/log
touch /var/log/btmp
chmod 600 /var/log/btmp
chmod 400 /etc/shadow

# Add public key support
mkdir -p /root/.ssh
chmod 700 /root/.ssh
cp /mnt/source/ssh/authorized_keys2 /root/.ssh

# Start sshd
/mnt/source/ssh/sshd -f /mnt/source/ssh/sshd_config</code></pre>
<p>
For this script to work, you&#8217;ll need a bunch of things that this document won&#8217;t cover; we&#8217;ll focus strictly on the sshd part.  The script requires a directory, $KSROOT/ssh, where $KSROOT is the directory where your CentOS/RedHat/Fedora installation media is located (the script reaches it as /mnt/source).  The following files need to be in this directory:</p>
<ul>
<li>authorized_keys2: Put a public key in here for authentication purposes</li>
<li>sshd_config: A basic sshd configuration.  An example is provided below</li>
<li>ssh_host_rsa_key: You can copy this from an existing installation that you want to mirror or generate a new one with ssh-keygen</li>
<li>sshd: The sshd binary from an install of the OS you are installing with Anaconda</li>
</ul>
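<p>Before kicking off an install, it can save a failed attempt to confirm that the directory really contains all four files.  A minimal sketch follows; <code>missing_ks_ssh_files</code> is a hypothetical helper name, and the check only tests that each file exists and is non-empty, not that its contents are valid.</p>

```shell
# missing_ks_ssh_files DIR
# Hypothetical helper: print each file from the list above that is absent
# or empty under DIR; return 1 if any is missing, 0 otherwise.
missing_ks_ssh_files() {
    dir=$1
    rc=0
    for f in authorized_keys2 sshd_config ssh_host_rsa_key sshd; do
        if [ ! -s "$dir/$f" ]; then
            echo "missing: $dir/$f"
            rc=1
        fi
    done
    return $rc
}
```

<p>Run it on the install server as <code>missing_ks_ssh_files "$KSROOT/ssh"</code>; it prints nothing when the directory is complete.</p>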
<p>The following sshd_config should be sufficient for most cases:</p>
<pre><code>Port 22
Protocol 2

# Logging
SyslogFacility AUTH
LogLevel ERROR

# Authentication:
PermitRootLogin yes

HostKey /mnt/source/ssh/ssh_host_rsa_key

# To disable tunneled clear text passwords, change to no here!
PasswordAuthentication yes

# Change to no to disable s/key passwords
ChallengeResponseAuthentication no

UsePAM no

# Accept locale-related environment variables
AcceptEnv LANG LC_CTYPE LC_NUMERIC LC_TIME LC_COLLATE LC_MONETARY LC_MESSAGES
AcceptEnv LC_PAPER LC_NAME LC_ADDRESS LC_TELEPHONE LC_MEASUREMENT
AcceptEnv LC_IDENTIFICATION LC_ALL</code></pre>
<p></p>
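<p>For HTTP or FTP installs, where the files cannot simply be read from the mounted install tree, the curl approach mentioned at the start could look roughly like the sketch below.  Everything specific here is an assumption: <code>fetch_ssh_files</code> is a hypothetical helper, and the base URL and destination directory are whatever you choose.  The curl options used are <code>-s</code> (silent), <code>-f</code> (fail on HTTP errors) and <code>-o</code> (output file).</p>

```shell
# fetch_ssh_files BASE_URL DEST
# Hypothetical helper for a %pre script: download the four files listed
# earlier from BASE_URL into DEST, then fix permissions so that the binary
# is executable and the host key is private (sshd will not use a host key
# that other users can read).
fetch_ssh_files() {
    base=$1
    dest=$2
    mkdir -p "$dest" || return 1
    for f in authorized_keys2 sshd_config ssh_host_rsa_key sshd; do
        curl -s -f -o "$dest/$f" "$base/$f" || return 1
    done
    chmod 700 "$dest/sshd"
    chmod 600 "$dest/ssh_host_rsa_key"
}
```

<p>With this variant, the paths in the %pre script and the <code>HostKey</code> line in sshd_config would point at the chosen destination directory in place of /mnt/source/ssh.</p>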
<p align="left">Obviously, you should not expect these configuration files to be drop-in ready, and you will need to do a little tweaking to get them to play nicely with your environment.  The nice thing is that you can now log in remotely to your host as it is installing (or, in this case, if the install fails) to get a list of the hardware, check the kernel ring buffer with dmesg, and run some basic diagnostics.  Even though VNC allows you to see the installer, it does not give you a shell.  This is, to my knowledge, the best way to get a remote shell during an Anaconda Kickstart install.</p>
]]></content:encoded>
			<wfw:commentRss>http://rc.usf.edu/blog/?feed=rss2&amp;p=3</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>
