<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-212269981320258846</id><updated>2011-12-22T07:33:46.710-08:00</updated><category term='Script.aculo.us'/><category term='openid'/><category term='postgresql'/><category term='s3'/><category term='dd'/><category term='rsync'/><category term='dtd'/><category term='acl'/><category term='boost'/><category term='upgrade'/><category term='pack'/><category term='windows 7'/><category term='rewrite'/><category term='exceptions'/><category term='firefox'/><category term='persistent connection'/><category term='sshfs'/><category term='make'/><category term='freebase'/><category term='css'/><category term='spring'/><category term='c++0x'/><category term='sun'/><category term='virtual'/><category term='email'/><category term='aws'/><category term='unison'/><category term='c++'/><category term='binlog'/><category term='bind'/><category term='static publishing'/><category term='xml'/><category term='ami'/><category term='xfs'/><category term='ext3'/><category term='mql'/><category term='fastcgi'/><category term='shell scripting'/><category term='mysql'/><category term='java'/><category term='cloudfront'/><category term='utf-8'/><category term='ocsp'/><category term='prototype.js. 
jQuery'/><category term='oracle'/><category term='mvc'/><category term='struts'/><category term='ha'/><category term='longhorn'/><category term='ssl'/><category term='connection pool'/><category term='ubuntu'/><category term='verisign'/><category term='json'/><category term='ide'/><category term='google'/><category term='subversion'/><category term='dom'/><category term='jdbc'/><category term='shibboleth'/><category term='javascript'/><category term='ec2'/><category term='beanshell'/><category term='perl'/><category term='persistentfs'/><category term='hosts'/><category term='pivot'/><category term='gzip'/><category term='wsdl'/><category term='snapshot'/><category term='dbi'/><category term='memcache'/><category term='cmake'/><category term='debian'/><category term='kiss'/><category term='iostreams'/><category term='nfs'/><category term='firewall'/><category term='schwartz'/><category term='kdevelop'/><category term='catalog'/><category term='ebs'/><category term='crl'/><category term='apache'/><category term='jsonp'/><category term='cvs'/><category term='php'/><category term='sockets'/><category term='mount'/><category term='jsp'/><category term='lucene'/><category term='yslow'/><category term='cdn'/><category term='mt'/><category term='nas'/><category term='seo'/><category term='elb'/><category term='search'/><category term='standards'/><category term='tse'/><category term='psp'/><category term='simpledb'/><category term='ssi'/><category term='sqs'/><title type='text'>Bimport Throtz</title><subtitle type='html'>The musings of a software developer on Castle Hill, Shaftesbury, Dorset, England.</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://bimport.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://bimport.blogspot.com/'/><link rel='hub' 
href='http://pubsubhubbub.appspot.com/'/><author><name>Rob Staveley (Tom)</name><uri>http://www.blogger.com/profile/12685264964545576535</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://2.bp.blogspot.com/_mmq4zdFTiOI/Sp-aftOmhZI/AAAAAAAAPJM/GSiGOc7pRsc/S220/Look-8.jpg'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>46</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-212269981320258846.post-5531266501275454887</id><published>2011-12-22T07:30:00.000-08:00</published><updated>2011-12-22T07:33:46.713-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='windows 7'/><category scheme='http://www.blogger.com/atom/ns#' term='hosts'/><title type='text'>Nine hosts in %SystemRoot%\System32\Drivers\etc\hosts</title><content type='html'>If you are running a web server locally for development on your Windows XP/Vista/7 system and you have set up (say) &lt;b&gt;local.yet-another-site.com&lt;/b&gt; locally as a development site for &lt;b&gt;www.yet-another-site.com&lt;/b&gt; for the 9th time, you might scratch your head and wonder why DNS resolution isn't working.&lt;br /&gt;&lt;br /&gt;I've not found Google forth-coming on this matter, but the problem is that a maximum of nine hosts can be listed next to an IP address in &lt;b&gt;%SystemRoot%\System32\Drivers\etc\hosts&lt;/b&gt;, and that means that with the following:&lt;br /&gt;&lt;code&gt;&lt;/code&gt;&lt;br /&gt;&lt;blockquote class="tr_bq"&gt;&lt;pre&gt;&lt;code&gt;127.0.0.1&amp;nbsp;&amp;nbsp; &amp;nbsp;localhost h1 h2 h3 h4 h5 h6 h7 h8 h9 h10&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/blockquote&gt;...you are able to PING localhost, h1, h2, .. 
h8 but h9 and h10 do not resolve.&lt;br /&gt;&lt;br /&gt;Don't go soul-searching to identify your top 10 and shuffle your list of development hosts, though. Happily you can have this:&lt;br /&gt;&lt;code&gt;&lt;/code&gt;&lt;br /&gt;&lt;blockquote class="tr_bq"&gt;&lt;pre&gt;&lt;code&gt;127.0.0.1&amp;nbsp;&amp;nbsp;&amp;nbsp; localhost h1 h2 h3 h4 h5 h6 h7 h8 &lt;br /&gt;127.0.0.1&amp;nbsp;&amp;nbsp;&amp;nbsp; h9 h10&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/blockquote&gt;Yes, it is a bit pathetic, and the Darwinistas of the Mac world are laughing at Windows users once again... but this works.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/212269981320258846-5531266501275454887?l=bimport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bimport.blogspot.com/feeds/5531266501275454887/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bimport.blogspot.com/2011/12/nine-hosts-in-systemrootsystem32drivers.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/5531266501275454887'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/5531266501275454887'/><link rel='alternate' type='text/html' href='http://bimport.blogspot.com/2011/12/nine-hosts-in-systemrootsystem32drivers.html' title='Nine hosts in %SystemRoot%\System32\Drivers\etc\hosts'/><author><name>Rob Staveley (Tom)</name><uri>http://www.blogger.com/profile/12685264964545576535</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' 
src='http://2.bp.blogspot.com/_mmq4zdFTiOI/Sp-aftOmhZI/AAAAAAAAPJM/GSiGOc7pRsc/S220/Look-8.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-212269981320258846.post-7689580313105169691</id><published>2011-11-02T04:52:00.000-07:00</published><updated>2011-11-02T04:53:52.442-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='windows 7'/><category scheme='http://www.blogger.com/atom/ns#' term='virtual'/><title type='text'>Overcoming a Network Filter limit on a Windows 7 DEV box</title><content type='html'>In the past, my development set-up has typically consisted of a main development machine with my usual development tools and a Linux &lt;i&gt;puppy&lt;/i&gt; sitting beside it. The puppy is there to act as a server for my code. However, the might of my DEV box now is such that I really ought to be making better use of virtualisation and using some of the redundant CPU power of my DEV box to run the server too.&lt;br /&gt;&lt;br /&gt;I find Oracle's &lt;a href="https://www.virtualbox.org/wiki/VirtualBox"&gt;VM VirtualBox&lt;/a&gt; nice to work with. Microsoft's Virtual PC isn't great for Linux, and you can even run a 64-bit server in VirtualBox. &lt;br /&gt;&lt;br /&gt;I run &lt;b&gt;server software&lt;/b&gt; in my puppy. I want to run clients from the host PC, but I'd like to be able to run other clients on my LAN too. To achieve that, I need to run &lt;a href="http://www.virtualbox.org/manual/ch06.html#idp12181312"&gt;bridged networking&lt;/a&gt;, with a static IP address assigned to my virtual server; it makes no sense to run DHCP and then have to mess around with HOSTS files in the various clients. I can write my service code in Windows, have it run in a Linux server, and then test it with various devices on my LAN.&lt;br /&gt;&lt;br /&gt;I was surprised to run into a "Filters currently installed on the system have reached the limit" error when trying to install the bridged adapter filter. 
I was &lt;a href="https://forums.virtualbox.org/viewtopic.php?f=7&amp;amp;t=29365"&gt;not alone&lt;/a&gt;. I needed to install the VirtualBox Bridged Networking driver, but I'd hit a limit, reached because of the necessity of running a Cisco VPN client amongst other bits and pieces for development. Fortunately there is a &lt;i&gt;soft limit &lt;/i&gt;set to 8 for &lt;b&gt;MaxNumfilters&lt;/b&gt; in &lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Network&lt;/span&gt;, but &lt;a href="http://social.technet.microsoft.com/Forums/en/w7itpronetworking/thread/4deb27fc-33ce-4fc0-a26f-3fec5b57733d"&gt;this can be increased to the hard limit&lt;/a&gt;, which is 14 (or 0xe in hexadecimal) in Windows 7 via RegEdit. With &lt;b&gt;MaxNumfilters&lt;/b&gt; set to its 14 hard limit, I can power down my old puppy and virtualise.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/212269981320258846-7689580313105169691?l=bimport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bimport.blogspot.com/feeds/7689580313105169691/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bimport.blogspot.com/2011/11/overcoming-network-filter-limit-on.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/7689580313105169691'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/7689580313105169691'/><link rel='alternate' type='text/html' href='http://bimport.blogspot.com/2011/11/overcoming-network-filter-limit-on.html' title='Overcoming a Network Filter limit on a Windows 7 DEV box'/><author><name>Rob Staveley 
(Tom)</name><uri>http://www.blogger.com/profile/12685264964545576535</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://2.bp.blogspot.com/_mmq4zdFTiOI/Sp-aftOmhZI/AAAAAAAAPJM/GSiGOc7pRsc/S220/Look-8.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-212269981320258846.post-6079663429013385731</id><published>2011-10-23T09:08:00.000-07:00</published><updated>2011-10-24T13:19:40.896-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='boost'/><category scheme='http://www.blogger.com/atom/ns#' term='ubuntu'/><title type='text'>Boost 1.47 on Ubuntu Oneiric</title><content type='html'>&lt;span style="font-size: large;"&gt;Why install 1.47?&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;&lt;span style="font-size: small;"&gt;[K]Ubuntu Oneiric&lt;/span&gt; &lt;/span&gt;comes with Boost 1.46 rather than 1.47. If you want to work with &lt;i&gt;boost_asio&lt;/i&gt;, that's a nuisance, because the release documentation assumes you are already on 1.47. 
I avoid changing my set-up from the norm, if I can help it, but this upgrade/installation is a necessary evil.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;What's the &lt;i&gt;recommended&lt;/i&gt; quick upgrade approach?&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;&lt;span style="font-size: small;"&gt;This is a bum steer.&amp;nbsp; Don't start typing.&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;&lt;span style="font-size: small;"&gt;The Personal Package Archives (PPAs) include a repository called PurpleKarrot, which you can add to your repositories &lt;a href="http://stackoverflow.com/questions/6605754/building-boost-on-linux"&gt;as follows&lt;/a&gt;:&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;blockquote&gt;&lt;pre class="lang-c prettyprint"&gt;&lt;code&gt;&lt;span class="pln"&gt;sudo add&lt;/span&gt;&lt;span class="pun"&gt;-&lt;/span&gt;&lt;span class="pln"&gt;apt&lt;/span&gt;&lt;span class="pun"&gt;-&lt;/span&gt;&lt;span class="pln"&gt;repository ppa&lt;/span&gt;&lt;span class="pun"&gt;:&lt;/span&gt;&lt;span class="pln"&gt;purplekarrot&lt;/span&gt;&lt;span class="pun"&gt;/&lt;/span&gt;&lt;span class="pln"&gt;ppa&lt;/span&gt;&lt;span class="pun"&gt;&lt;/span&gt;&lt;span class="pln"&gt;&lt;br /&gt;sudo apt&lt;/span&gt;&lt;span class="pun"&gt;-&lt;/span&gt;&lt;span class="pln"&gt;get update&lt;/span&gt;&lt;span class="pun"&gt;&lt;/span&gt;&lt;span class="pln"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;span style="font-size: large;"&gt;&lt;span style="font-size: small;"&gt;&lt;/span&gt;&lt;/span&gt;&lt;/blockquote&gt;&lt;span style="font-size: large;"&gt;&lt;span style="font-size: small;"&gt;To fetch the latest packages you &lt;i&gt;ought&lt;/i&gt; to be able to do the following:&lt;/span&gt; &lt;/span&gt;&lt;br /&gt;&lt;blockquote&gt;&lt;pre class="lang-c prettyprint"&gt;&lt;code&gt;&lt;span class="pln"&gt;sudo apt&lt;/span&gt;&lt;span 
class="pun"&gt;-&lt;/span&gt;&lt;span class="pln"&gt;get upgrade&lt;/span&gt;&lt;span class="pln"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/blockquote&gt;&lt;span style="font-size: small;"&gt;However, on Kubuntu you see the message '&lt;/span&gt;The following packages have been kept back&lt;span style="font-size: small;"&gt;', and sure enough none of the 1.47 packages are installed&lt;/span&gt;. I've not tried this with Ubuntu. It is conceivable that KDE depends on Boost, and that's the cause of the conflict.&lt;br /&gt;&lt;br /&gt;Don't be naive, like me, and assume that&lt;strike&gt;&lt;span style="font-size: large;"&gt;&lt;span style="font-size: small;"&gt; to upgrade your Boost libraries you need to do a &lt;a href="http://www.debian-administration.org/articles/69%20"&gt;dist-upgrade&lt;/a&gt;:&lt;/span&gt; &lt;/span&gt;&lt;/strike&gt;&lt;br /&gt;&lt;blockquote&gt;&lt;pre class="lang-c prettyprint"&gt;&lt;strike&gt;&lt;code&gt;&lt;span class="pln"&gt;sudo apt&lt;/span&gt;&lt;span class="pun"&gt;-&lt;/span&gt;&lt;span class="pln"&gt;get dist-upgrade&lt;/span&gt;&lt;span class="pln"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/code&gt;&lt;/strike&gt;&lt;/pre&gt;&lt;/blockquote&gt;&lt;strike&gt;&lt;span style="font-size: large;"&gt;&lt;span style="font-size: small;"&gt;This upgrades all packages which have newer ones available and satisfies &lt;/span&gt;&lt;/span&gt;&lt;/strike&gt;&lt;br /&gt;&lt;strike&gt;&lt;span style="font-size: large;"&gt;&lt;span style="font-size: small;"&gt;dependencies.&lt;/span&gt;&lt;/span&gt;&lt;/strike&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;&lt;span style="font-size: small;"&gt;You'll get a ton of dependency errors, as indicated in a small print comment &lt;a href="http://askubuntu.com/questions/61384/where-do-i-find-an-up-to-date-version-of-boost%20"&gt;here&lt;/a&gt;. 
Alarming though that is, removing the repository from &lt;i&gt;/etc/apt/sources.list.d&lt;/i&gt; is quite painless, and an &lt;i&gt;apt-get update&lt;/i&gt; shows the system is healthy.&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;&lt;span style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;&lt;span style="font-size: small;"&gt;&lt;span style="font-size: large;"&gt;So what's the actual approach we need to take?&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;&lt;span style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;&lt;span style="font-size: small;"&gt;It looks like we need to use the &lt;i&gt;old Icelandic supermarket&lt;/i&gt; Bjam to build the libraries with a 1.47 Boost download, and keep the installation separate from the system. Bearing in mind the system has 1.46 installed, we need to be sure to have a complete installation to avoid pulling in 1.46 artifacts from the system.&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;&lt;span style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-size: small;"&gt;The download takes a while and the build takes a while. Best to make tea during the download and do some gardening during the build. 
I created ~/lib/boost_1_47_0 in my home directory, which left me needing to set up CMakeLists.txt as follows:&lt;/span&gt;&lt;br /&gt;&lt;blockquote class="tr_bq"&gt;&lt;span style="font-size: small;"&gt;set (BOOST_DIR "$ENV{HOME}/lib/boost_1_47_0")&lt;/span&gt;&lt;br /&gt;&lt;span style="font-size: small;"&gt;set (BOOST_LIBDIR "${BOOST_DIR}/stage/lib")&lt;/span&gt;&lt;br /&gt;&lt;span style="font-size: small;"&gt;include_directories("${BOOST_DIR}")&lt;/span&gt;&lt;/blockquote&gt;&lt;blockquote&gt;&lt;span style="font-size: small;"&gt;link_directories("${BOOST_LIBDIR}")&lt;/span&gt;&lt;br /&gt;&lt;span style="font-size: small;"&gt;set (EXTRA_LIBS ${EXTRA_LIBS} boost_system boost_filesystem boost_thread pthread)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-size: small;"&gt;add_executable(XXXXX main.cpp)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-size: small;"&gt;target_link_libraries(XXXXX ${EXTRA_LIBS})&lt;/span&gt;&lt;/blockquote&gt;&lt;span style="font-size: small;"&gt;&amp;nbsp;&lt;/span&gt;&lt;span style="font-size: large;"&gt;&lt;span style="font-size: small;"&gt;&amp;nbsp;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;&lt;span style="font-size: small;"&gt;This works nicely for me, allowing my project to use 1.47, and it doesn't conflict with the 1.46 system libraries.&lt;/span&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/212269981320258846-6079663429013385731?l=bimport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bimport.blogspot.com/feeds/6079663429013385731/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bimport.blogspot.com/2011/10/boost-147-on-ubuntu-oneiric.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/6079663429013385731'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/212269981320258846/posts/default/6079663429013385731'/><link rel='alternate' type='text/html' href='http://bimport.blogspot.com/2011/10/boost-147-on-ubuntu-oneiric.html' title='Boost 1.47 on Ubuntu Oneiric'/><author><name>Rob Staveley (Tom)</name><uri>http://www.blogger.com/profile/12685264964545576535</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://2.bp.blogspot.com/_mmq4zdFTiOI/Sp-aftOmhZI/AAAAAAAAPJM/GSiGOc7pRsc/S220/Look-8.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-212269981320258846.post-4774851409809632547</id><published>2011-10-23T07:30:00.000-07:00</published><updated>2011-10-23T07:30:20.749-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='ide'/><category scheme='http://www.blogger.com/atom/ns#' term='c++0x'/><category scheme='http://www.blogger.com/atom/ns#' term='c++'/><category scheme='http://www.blogger.com/atom/ns#' term='subversion'/><category scheme='http://www.blogger.com/atom/ns#' term='kdevelop'/><category scheme='http://www.blogger.com/atom/ns#' term='cmake'/><category scheme='http://www.blogger.com/atom/ns#' term='cvs'/><category scheme='http://www.blogger.com/atom/ns#' term='make'/><title type='text'>KDevelop Impressions</title><content type='html'>&lt;span style="font-size: large;"&gt;Where am I coming from?&lt;/span&gt; &lt;br /&gt;&lt;br /&gt;I occasionally try shaking things up in my development environment, to check whether I'm not missing out on things, clinging onto  Luddite command-line ways. 
In that spirit I decided that it was time to look at KDE's KDevelop, which is now mature-looking in KDevelop4, and see what it feels like creating a new C++ project in that environment.&lt;br /&gt;&lt;br /&gt;What I wanted to do was to set up a project that allows me to breathe life back into a &amp;gt;10 year old C/C++ project, which would benefit from having C++0x goodness applied to it, and being recoded from the ground up. My existing project consists of two Makefiles (GNU/UN*X and Win32), source code, assets and no project files, because it wasn't built in an IDE. I was attracted to KDevelop, because it embraces cross-platform compilation, and thought that maybe I'd benefit from whatever we are supposed to call &lt;a href="http://en.wikipedia.org/wiki/IntelliSense"&gt;&lt;i&gt;intellisense&lt;/i&gt;&lt;/a&gt; in the Open Source world.&lt;br /&gt;&lt;br /&gt;However, one of the things that puts me off IDEs is that you often find &lt;i&gt;the tail wagging the dog&lt;/i&gt;, and sure enough there were some new practices forced on me that I wouldn't have chosen otherwise.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;Version Control&lt;/span&gt; &lt;br /&gt;&lt;br /&gt;The first of these was version control.&lt;br /&gt;&lt;br /&gt;I was pleased to see that CVS was available as a version control option, and I thought that I could use my familiar repository for the experiment, CVS &lt;i&gt;preview version 1.12.9&lt;/i&gt;. After some floundering, I found that KDevelop wasn't going to play ball with my repository; it uses fully-qualified client-side paths, which chokes my version of CVS.&lt;br /&gt;&lt;br /&gt;I didn't really want to look into upgrading/downgrading CVS on the server and also didn't want to dig too deep into KDevelop to see if I could get it to use relative paths. 
I am a great believer in &lt;i&gt;going with the flow&lt;/i&gt;, and I use Subversion and Git for my main client, so I thought there would be safety in numbers setting up Subversion locally. &lt;br /&gt;&lt;br /&gt;&lt;a href="http://svnbook.red-bean.com/"&gt;Setting up&lt;/a&gt; Subversion on a Linux box is of course a snap, and initialising a repository locally even more so. This was familiar territory. Making it play ball with the KDevelop IDE was less smooth.&lt;br /&gt;&lt;br /&gt;My first surprise was that creating a new project with Subversion version control expects the repository module to exist already. Using the console is comfortable for me, but I expected to be able to create a repository module when creating the project itself in KDevelop. It looks like &lt;a href="http://home.messiah.edu/%7Esh1278/docs/Subversion_with_KDevelop.html"&gt;you were able to do this with KDevelop3&lt;/a&gt;, but KDevelop4 (version 4.7.1) only appears to let you create a project by checking out an existing module. So I created a directory at the command line and imported it into Subversion and checked out that module to get me started.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;CMake&lt;/span&gt; &lt;br /&gt;&lt;br /&gt;&lt;br /&gt;I've been aware of CMake for some time, but hadn't considered it for my own projects.&lt;br /&gt;&lt;br /&gt;I was interested to see that CMakeLists.txt (CMake's equivalent of a Makefile) sits at the heart of the KDevelop project. Along with the minimal &lt;i&gt;MyProject&lt;/i&gt;.kdev4 project file, I was pleased to see that this appears to be all that KDevelop needs for the project. Human-readable text files, which are brief. 
Very nice.&lt;br /&gt;&lt;br /&gt;It made sense to get to grips with CMake outside the IDE, so that I could focus on CMake concerns without stumbling on IDE issues, and that's what I did following a &lt;a href="http://www.cmake.org/cmake/help/cmake_tutorial.html"&gt;basic tutorial&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;I am very pleased with CMake. I wish I'd started using it a long time ago. &lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;Linkage&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;This isn't really relevant to KDevelop, but when you try something new other new things often get dragged in, unexpected. &lt;br /&gt;&lt;br /&gt;To get to work with CMake, I knew that I would need to link the static libraries, so I pulled in some basic experimental threads code for &lt;i&gt;boost_asio&lt;/i&gt;, and saw loads of linkage errors. I had recently run a distribution upgrade on my laptop to Oneiric, and thought the distribution libraries must be out of kilter.&lt;br /&gt;&lt;br /&gt;I was wrong. The library versions set up by Aptitude were fine and CMake had been fine. My only error in CMake was that I needed to add &lt;i&gt;boost_system&lt;/i&gt; to the libraries, which had not been the case in the previous Boost libraries.&lt;br /&gt;&lt;br /&gt;However, being wary of the unfamiliar, I dropped out of CMake and down into a command-line to invoke the following:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;  g++ -I. -lboost_system -lboost_filesystem -lboost_thread &lt;b&gt;main.cpp&lt;/b&gt;&lt;br /&gt;&lt;/blockquote&gt;The order is an old bad habit, which I have been getting away with. There was nothing wrong with the library versions I was pulling in, my command line error was linking them before the main.o file, which has the external requirements, which they needed to satisfy.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;For linkage to work I needed:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;g++ -I. 
&lt;b&gt;main.cpp&lt;/b&gt; -lboost_system -lboost_filesystem -lboost_thread &lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;&lt;br /&gt;The linkage order in gcc 4.6.1 matters more than its predecessors. It makes sense that having main.o linked first makes it quicker to pull the dependencies out of the subsequent libraries.&lt;br /&gt;&lt;br /&gt;My CMakeLists.txt only needed boost_system to be added, just like my Makefiles needed it, but CMake correctly specified the link order, which I need to correct in my old project's Makefiles.&lt;br /&gt;&lt;br /&gt;+1 to CMake, and therefore KDevelop. It is nice to have concerns like that taken care of for you. &lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;Adding files to the project&lt;/span&gt; &lt;br /&gt;&lt;br /&gt;&lt;br /&gt;My next move was to add a class to the project.&lt;br /&gt;&lt;br /&gt;The jury is out whether it is a disappointment that KDevelop doesn't have a class wizard for creating the class. I confess I've never really liked those wizards in Microsoft's Visual C++ and Borland (yeah, I know I'm old), however I was unimpressed that the IDE assumed that I'd want to add a file in my home directory, but directories are sticky so that's no big deal. I think I like the balance, which KDevelop strikes here. You can navigate around classes in a class browser, but you create them as files. It is hard to see wizards working well with meta-programming, so the compromise is sensible. &lt;br /&gt;&lt;br /&gt;What puzzled me, though, was how the new file stands with respect to version control. When you add a file from the IDE, it automatically adds it to Subversion. I like that. What was less elegant was the fact that it did not seem to realise that the file had been added, and having written code, I found that the right-click menu offers you the option to add the file (and then tells you you have already done that &lt;i&gt;sucker&lt;/i&gt;!) 
and doesn't give you the option of committing it.&lt;br /&gt;&lt;br /&gt;Stop grumbling. This is FOSS. I guess that fix is something I could contribute to the KDevelop project, if I get familiar enough with it to dig into its source, and someone else doesn't beat me to it.&lt;br /&gt;&lt;br /&gt;Reloading the project lets you commit new code, and once you are dealing with code that's already been committed, it works OK, with only minor quibbles (like the fact that the cursor is in the diff box when you commit, when you'd expect it to be in the comments).&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;Intellisense&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;&lt;span style="font-size: small;"&gt;I was disappointed to find &lt;i&gt;intellisense&lt;/i&gt; attempting code completion when I was adding comments to a README.txt file. That doesn't bode well. I may simply switch it off, if it doesn't help productivity. Hand-on-heart, I've not started coding sufficiently in anger in this environment to have a view on code completion yet, but I hate it when I find unexpected text deposited into my source.&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;&lt;span style="font-size: small;"&gt;&amp;nbsp;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;Last thoughts&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;These are only initial impressions. KDevelop looks like quite a polished environment, but you can tell that it hasn't benefited from commercial QA. I am going to hang in with it and test it in anger by seeing through the &lt;i&gt;porting&lt;/i&gt; of my old code. 
If I find that I really am too &lt;i&gt;old school&lt;/i&gt; to move to this IDE or that its warts are too many, I'll thank it for bringing CMake to my attention.&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/212269981320258846-4774851409809632547?l=bimport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bimport.blogspot.com/feeds/4774851409809632547/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bimport.blogspot.com/2011/10/kdevelop-impressions.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/4774851409809632547'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/4774851409809632547'/><link rel='alternate' type='text/html' href='http://bimport.blogspot.com/2011/10/kdevelop-impressions.html' title='KDevelop Impressions'/><author><name>Rob Staveley (Tom)</name><uri>http://www.blogger.com/profile/12685264964545576535</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://2.bp.blogspot.com/_mmq4zdFTiOI/Sp-aftOmhZI/AAAAAAAAPJM/GSiGOc7pRsc/S220/Look-8.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-212269981320258846.post-2090572690965645726</id><published>2011-10-03T05:12:00.000-07:00</published><updated>2011-10-03T05:12:58.324-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='ec2'/><title type='text'>Amazon Silk</title><content type='html'>Perhaps I'm turning into an Amazon &lt;i&gt;fanboi&lt;/i&gt;, but I find myself very much liking the approach under-pinning the up-coming &lt;a href="http://amazonsilk.wordpress.com/"&gt;Silk&lt;/a&gt; browser. 
They say it is going to kick off initially as an exclusive for Kindle Fire, which makes sense. Silk will give that tablet a real advantage over other mobile devices.&lt;br /&gt;&lt;br /&gt;So the browser becomes split between EC2 and an "app" that runs on your mobile device. Most of the beef is in EC2; the "app" is a light-weight user of bandwidth. Neat. Watch the video, if you haven't already done so:&lt;br /&gt;&lt;br /&gt;&lt;iframe allowfullscreen="" frameborder="0" height="315" src="http://www.youtube.com/embed/_u7F_56WhHk" width="560"&gt;&lt;/iframe&gt;&lt;br /&gt;&lt;br /&gt;This approach is so &lt;b&gt;right&lt;/b&gt; that it is bound to reach other devices, including the desktop. The assembly processes running in EC2 are bound to be shared, which I guess has some security implications, but I was wondering what this means for web application developers like me.&lt;br /&gt;&lt;br /&gt;For starters, this is another nail in the coffin for voting applications that use IP addresses. That approach was always a bit flimsy, but we are going to see an increasing number of clients appearing with the same REMOTE_ADDR, where the IP address corresponds to some server in EC2. Applications should of course be using &lt;a href="http://en.wikipedia.org/wiki/X-Forwarded-For"&gt;X-Forwarded-For&lt;/a&gt;, but the value of that IP address is likely to be diminished too.&lt;br /&gt;&lt;br /&gt;A good approach to caching becomes all the more relevant. EC2 will presumably become something of a CDN for clients accessing common resources, and the more proxy-friendly we can make web applications, the better we can make it for all concerned. 
More reason to get expiry headers right.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/212269981320258846-2090572690965645726?l=bimport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bimport.blogspot.com/feeds/2090572690965645726/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bimport.blogspot.com/2011/10/amazon-silk.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/2090572690965645726'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/2090572690965645726'/><link rel='alternate' type='text/html' href='http://bimport.blogspot.com/2011/10/amazon-silk.html' title='Amazon Silk'/><author><name>Rob Staveley (Tom)</name><uri>http://www.blogger.com/profile/12685264964545576535</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://2.bp.blogspot.com/_mmq4zdFTiOI/Sp-aftOmhZI/AAAAAAAAPJM/GSiGOc7pRsc/S220/Look-8.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://img.youtube.com/vi/_u7F_56WhHk/default.jpg' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-212269981320258846.post-5306756516198792087</id><published>2011-07-05T10:32:00.000-07:00</published><updated>2011-07-05T10:32:39.301-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='sshfs'/><category scheme='http://www.blogger.com/atom/ns#' term='ebs'/><category scheme='http://www.blogger.com/atom/ns#' term='ami'/><title type='text'>sshfs to the rescue</title><content type='html'>I'm building a new AMI in Amazon and am getting to the bottom of a problem with log4j. 
I always seem to have problems with log4j with JBOSS, but that's another story.&lt;br /&gt;&lt;br /&gt;I want to run a quick directory comparison on two locally mounted directories to see what's different. The natural thing to do in EC2 is to snapshot and mount, so that both EBS volumes are accessible from the same EC2 host, and that wouldn't be a bad idea, but a quicker approach is simply to mount a remote directory on a host using sshfs.&lt;br /&gt;Here's the drill:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Install sshfs via the package manager: &lt;span style="color: #674ea7;"&gt;sudo apt-get install sshfs&lt;/span&gt; &lt;/li&gt;&lt;li&gt;Put my working user into the privileged &lt;i&gt;fuse&lt;/i&gt; group: &lt;span style="color: #674ea7;"&gt;sudo gpasswd -a $USER fuse&lt;/span&gt;&lt;/li&gt;&lt;li&gt;Create a directory to use as the mount point for the remote directory: &lt;span style="color: #674ea7;"&gt;mkdir ~/remote&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="color: #674ea7;"&gt;&lt;span style="color: black;"&gt;Mount the directory:&lt;/span&gt;&lt;/span&gt; &lt;span style="color: #674ea7;"&gt;sshfs -o idmap=user -o uid=1000 -o gid=1000 $USER@MY-REMOTE-IP-ADDRESS:/MY-REMOTE-DIRECTORY ~/remote &lt;span style="color: black;"&gt;(the remote user ID and group ID are 1000:1000 and this maps the local user ID to them)&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="color: #674ea7;"&gt;&lt;span style="color: black;"&gt;Then use diff recursively for the directory comparison.&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="color: #674ea7;"&gt;&lt;span style="color: black;"&gt;Clean up afterwards: &lt;/span&gt;fusermount -u ~/remote&lt;/span&gt; &lt;span style="color: #674ea7;"&gt;&lt;span style="color: black;"&gt;(this unmounts)&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;This is an example of &lt;a href="http://en.wikipedia.org/wiki/Filesystem_in_Userspace"&gt;FUSE&lt;/a&gt;, which I find really liberating. 
It is something that you expect on the KDE or Gnome desktop, where it runs behind dialogue boxes in much the same way, but you don't necessarily think of using it in the cloud... or maybe that's just me with my head in the wrong clouds.&lt;br /&gt;&lt;span style="color: #674ea7;"&gt;&lt;span style="color: black;"&gt; &lt;/span&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/212269981320258846-5306756516198792087?l=bimport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bimport.blogspot.com/feeds/5306756516198792087/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bimport.blogspot.com/2011/07/sshfs-to-rescue.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/5306756516198792087'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/5306756516198792087'/><link rel='alternate' type='text/html' href='http://bimport.blogspot.com/2011/07/sshfs-to-rescue.html' title='sshfs to the rescue'/><author><name>Rob Staveley (Tom)</name><uri>http://www.blogger.com/profile/12685264964545576535</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://2.bp.blogspot.com/_mmq4zdFTiOI/Sp-aftOmhZI/AAAAAAAAPJM/GSiGOc7pRsc/S220/Look-8.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-212269981320258846.post-6020900983844216085</id><published>2011-06-22T13:06:00.000-07:00</published><updated>2011-06-22T13:06:35.468-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='jsp'/><category scheme='http://www.blogger.com/atom/ns#' term='mvc'/><category scheme='http://www.blogger.com/atom/ns#' term='spring'/><category 
scheme='http://www.blogger.com/atom/ns#' term='struts'/><category scheme='http://www.blogger.com/atom/ns#' term='php'/><category scheme='http://www.blogger.com/atom/ns#' term='Script.aculo.us'/><category scheme='http://www.blogger.com/atom/ns#' term='jsonp'/><category scheme='http://www.blogger.com/atom/ns#' term='prototype.js. jQuery'/><title type='text'>Struts legacy and jQuery web services</title><content type='html'>I have a new application that needs to mash up with an old application. The old application is built using an unfashionable design paradigm: Java Struts2, Dojo and working on my own. The new application is PHP, JSONP, jQuery and working with company. My colleagues are comfortable with the new technology set. Making the right decision with respect to the new application is much easier than for the old, but today's soul-searching concerns the old app. How do I move forwards with it? I need to add some flesh to what's there, but really don't want to add to its legacy.&lt;br /&gt;&lt;br /&gt;Struts2 emerged from the WebWork Framework and thankfully is not much like the original Struts. Struts2/WebWork tags work nicely enough with &lt;i&gt;view&lt;/i&gt; JSPs. Integrating Spring for MVC wiring is natural for Struts2 and you find that its tags work very nicely for the MVC paradigm. My only real problem with Struts2 is that I don't spend enough time with it to feel like I can churn out code without giving it much thought. I couldn't ever really get on with Dojo (the preferred Struts2 JavaScript framework) and wound up favouring server side processing and/or using Prototype.js/Script.aculo.us for drag 'n' drop, but Prototype.js - like Dojo - is more technology I want to consign to history. &lt;br /&gt;&lt;br /&gt;I don't really want to keep building old technology. I didn't ever really get much value out of Dojo, and my Prototype.js is looking unloved now. 
jQuery is great to work with, because it has become a &lt;i&gt;Lingua Franca&lt;/i&gt; in web development.&lt;br /&gt;&lt;br /&gt;How should I proceed with building on the model in the old application to support features that &lt;b&gt;belong&lt;/b&gt; in the old application and are needed for a mash-up with the new application?&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Web services&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;I was really tempted to &lt;i&gt;keep on truckin'&lt;/i&gt; with Struts2, but it makes sense to build new technology in jQuery+JSONP, and present everything else that I need in the old application as a JSONP web service, with the JSP code.&lt;br /&gt;&lt;br /&gt;Bye-bye, Struts2/WebWork tags. My view pages are pretty much static HTML, which flesh out content via JSONP. Bye-bye, complicated controllers. My model/controllers concern themselves with building light-weight JSON objects rather than setting up beans for views. The beauty is that &lt;i&gt;most of the work&lt;/i&gt; has always been in views, but now my JSP views are not much different from PHP pages which I can use to demonstrate functionality. Out with Java beans and in with JavaScript objects.&lt;br /&gt;&lt;br /&gt;The jury is out on whether jQuery is going to feel different from the old technologies in the new pages. I expect it will, 
but perhaps they ought to feel new and different anyhow.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/212269981320258846-6020900983844216085?l=bimport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bimport.blogspot.com/feeds/6020900983844216085/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bimport.blogspot.com/2011/06/struts-legacy-and-jquery-web-services.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/6020900983844216085'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/6020900983844216085'/><link rel='alternate' type='text/html' href='http://bimport.blogspot.com/2011/06/struts-legacy-and-jquery-web-services.html' title='Struts legacy and jQuery web services'/><author><name>Rob Staveley (Tom)</name><uri>http://www.blogger.com/profile/12685264964545576535</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://2.bp.blogspot.com/_mmq4zdFTiOI/Sp-aftOmhZI/AAAAAAAAPJM/GSiGOc7pRsc/S220/Look-8.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-212269981320258846.post-4282436942065511673</id><published>2011-06-17T03:13:00.000-07:00</published><updated>2011-06-17T03:13:42.290-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='memcache'/><category scheme='http://www.blogger.com/atom/ns#' term='php'/><title type='text'>Memcache responsibility</title><content type='html'>I shared a link with colleagues a while back for MongoDB:&lt;a href="http://nosql.mypopescu.com/post/1016320617/mongodb-is-web-scale"&gt; MongoDB is Web Scale&lt;/a&gt;. 
The line that tickled one of my colleagues most was: &lt;span style="font-family: &amp;quot;Times New Roman&amp;quot;,&amp;quot;serif&amp;quot;; font-size: 12pt;"&gt;&lt;i&gt;You turn it on and it scales right up&lt;/i&gt;.&lt;/span&gt; The terrible truth is that there are so many great technologies out there (MongoDB included), that we often don't give ourselves enough time to understand them properly.&lt;br /&gt;&lt;br /&gt;Memcache is something that I realised I needed a lower-level understanding of this week.&lt;br /&gt;&lt;br /&gt;This was my initial understanding:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Memcache allows you to make use of spare RAM in your production cluster, and make it available to your applications for caching. The more you scale out, the more benefit you get, because the RAM is shared as one great big pool.&lt;/li&gt;&lt;li&gt;Installation is very simple. You start a daemon on each node in the cluster, and that's it. &lt;span style="font-family: &amp;quot;Times New Roman&amp;quot;,&amp;quot;serif&amp;quot;; font-size: 12pt;"&gt;&lt;i&gt;You turn it on and it scales right up&lt;/i&gt;.&lt;/span&gt; Typically, you run one instance per node, and conventionally have it listen for TCP/IP connections on port 11211.&lt;/li&gt;&lt;li&gt;In the world of PHP you &lt;a href="http://www.php.net/manual/en/memcache.connect.php"&gt;connect&lt;/a&gt;(!), and &lt;a href="http://php.net/manual/en/memcache.set.php"&gt;set data&lt;/a&gt; using a key with a &lt;a href="http://en.wikipedia.org/wiki/Time_to_live"&gt;TTL&lt;/a&gt; and an optional compression flag and &lt;a href="http://www.php.net/manual/en/memcache.get.php"&gt;get&lt;/a&gt; it simply with the key.&lt;/li&gt;&lt;li&gt;I appreciated that the key used for get/set needed to be contrived in a clever manner to make sure that it was unique within the application and unique for all applications. 
A simple approach for caching SQL results was to generate a key from an MD5 of the concatenation of the application name and the SQL query used for the results - i.e. &lt;code&gt;$key = md5('MyApplication'.$sql)&lt;/code&gt;, adding servername to the concatenation, if there are multiple discrete versions of the application.&lt;/li&gt;&lt;/ul&gt;What I couldn't reconcile myself to was how the Memcache instances knew about each other to achieve the scale-out, and make the data available across the cluster. When the memcached daemons are instantiated, they aren't told about each other, so how do they swap information with each other?&lt;br /&gt;&lt;br /&gt;I ran some experiments, connecting to different Memcache instances in the cluster and setting and getting, and found to my horror that they don't know about each other. Set a key on one node and the other nodes don't know about it! &lt;br /&gt;&lt;br /&gt;My knee-jerk reaction was to fire an e-mail to the long-suffering System Administrators to complain that the cluster wasn't set up properly, and my next foolish digression was to look at &lt;a href="http://repcached.lab.klab.org/"&gt;repcached&lt;/a&gt;'s replication patch as a way to replicate data across the cluster. Fool that I am.&lt;br /&gt;&lt;br /&gt;What I had failed to recognise was the essential design philosophy of Memcache, &lt;a href="http://code.google.com/p/memcached/wiki/NewOverview#What_are_the_Design_Philosophies"&gt;summarised&lt;/a&gt; in this paragraph:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;&lt;b&gt;Servers are Disconnected From Each Other&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Memcached servers are generally unaware of each other. There is no crosstalk, no syncronization, no broadcasting. The lack of interconnections means adding more servers will usually add more capacity as you expect. 
There might be exceptions to this rule, but they are exceptions and carefully regarded.&lt;/blockquote&gt;What I'd failed to appreciate is that the responsibility for scale-out is on the client, and that the client has the responsibility to connect to the &lt;b&gt;right &lt;/b&gt;Memcache node to set/get data.&lt;br /&gt;&lt;br /&gt;In your standard high level Perl or PHP client, this is achieved by:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Requiring a list of Memcache instances to be &lt;a href="http://www.php.net/manual/en/memcache.addserver.php"&gt;added&lt;/a&gt; to the client in a consistent order.&amp;nbsp;&lt;/li&gt;&lt;li&gt;Connecting to the required instances as and when needed.&lt;/li&gt;&lt;li&gt;Determining &lt;i&gt;which&lt;/i&gt; instance to connect to, typically by a simple modulus on the CRC of the key used to reference the data, as a crude load-spreading technique.&lt;/li&gt;&lt;/ol&gt;Where I'd been going wrong was in explicitly &lt;i&gt;connecting&lt;/i&gt; to Memcache instances, when I should have left it to the client library to select the right instance in a &lt;a href="http://www.php.net/manual/en/memcache.addserver.php"&gt;list&lt;/a&gt; to connect to as and when required.&lt;br /&gt;&lt;br /&gt;The responsibilities are as follows:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;The server (memcached) is responsible for maintaining a fast hash with a simple TCP/IP &lt;a href="http://code.sixapart.com/svn/memcached/trunk/server/doc/protocol.txt"&gt;protocol&lt;/a&gt; for getting and setting data referenced by keys.&lt;/li&gt;&lt;li&gt;The client library is responsible for determining which server to connect to, and handles zlib &lt;a href="http://www.php.net/manual/en/memcache.setcompressthreshold.php"&gt;compression&lt;/a&gt;, if your data size warrants it. 
It presents a simple API for you to work with, handling the sockets interface, and spreading the data across the Memcache nodes, typically using the key CRC modulus trick to select the node.&lt;/li&gt;&lt;li&gt;Your implementation is responsible for setting up a consistent list of Memcache instances for the client library to work with, and you are responsible for generating unique keys in what is most likely a shared resource, setting appropriate TTLs and making sensible judgments about compression.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;
&lt;!--[if gte mso 9]&gt;&lt;xml&gt;  &lt;w:LatentStyles DefLockedState="false" DefUnhideWhenUsed="true"  DefSemiHidden="true" DefQFormat="false" DefPriority="99"  LatentStyleCount="267"&gt;   &lt;w:LsdException Locked="false" Priority="0" SemiHidden="false"   UnhideWhenUsed="false" QFormat="true" Name="Normal"/&gt;   &lt;w:LsdException Locked="false" Priority="9" SemiHidden="false"   UnhideWhenUsed="false" QFormat="true" Name="heading 1"/&gt;   &lt;w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 2"/&gt;   &lt;w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 3"/&gt;   &lt;w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 4"/&gt;   &lt;w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 5"/&gt;   &lt;w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 6"/&gt;   &lt;w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 7"/&gt;   &lt;w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 8"/&gt;   &lt;w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 9"/&gt;   &lt;w:LsdException Locked="false" Priority="39" Name="toc 1"/&gt;   &lt;w:LsdException Locked="false" Priority="39" Name="toc 2"/&gt;   &lt;w:LsdException Locked="false" Priority="39" Name="toc 3"/&gt;   &lt;w:LsdException Locked="false" Priority="39" Name="toc 4"/&gt;   &lt;w:LsdException Locked="false" Priority="39" Name="toc 5"/&gt;   &lt;w:LsdException Locked="false" Priority="39" Name="toc 6"/&gt;   &lt;w:LsdException Locked="false" Priority="39" Name="toc 7"/&gt;   &lt;w:LsdException Locked="false" Priority="39" Name="toc 8"/&gt;   &lt;w:LsdException Locked="false" Priority="39" Name="toc 9"/&gt;   &lt;w:LsdException Locked="false" Priority="35" QFormat="true" Name="caption"/&gt;   &lt;w:LsdException Locked="false" 
Priority="10" SemiHidden="false"   UnhideWhenUsed="false" QFormat="true" Name="Title"/&gt;   &lt;w:LsdException Locked="false" Priority="1" Name="Default Paragraph Font"/&gt;   &lt;w:LsdException Locked="false" Priority="11" SemiHidden="false"   UnhideWhenUsed="false" QFormat="true" Name="Subtitle"/&gt;   &lt;w:LsdException Locked="false" Priority="22" SemiHidden="false"   UnhideWhenUsed="false" QFormat="true" Name="Strong"/&gt;   &lt;w:LsdException Locked="false" Priority="20" SemiHidden="false"   UnhideWhenUsed="false" QFormat="true" Name="Emphasis"/&gt;   &lt;w:LsdException Locked="false" Priority="59" SemiHidden="false"   UnhideWhenUsed="false" Name="Table Grid"/&gt;   &lt;w:LsdException Locked="false" UnhideWhenUsed="false" Name="Placeholder Text"/&gt;   &lt;w:LsdException Locked="false" Priority="1" SemiHidden="false"   UnhideWhenUsed="false" QFormat="true" Name="No Spacing"/&gt;   &lt;w:LsdException Locked="false" Priority="60" SemiHidden="false"   UnhideWhenUsed="false" Name="Light Shading"/&gt;   &lt;w:LsdException Locked="false" Priority="61" SemiHidden="false"   UnhideWhenUsed="false" Name="Light List"/&gt;   &lt;w:LsdException Locked="false" Priority="62" SemiHidden="false"   UnhideWhenUsed="false" Name="Light Grid"/&gt;   &lt;w:LsdException Locked="false" Priority="63" SemiHidden="false"   UnhideWhenUsed="false" Name="Medium Shading 1"/&gt;   &lt;w:LsdException Locked="false" Priority="64" SemiHidden="false"   UnhideWhenUsed="false" Name="Medium Shading 2"/&gt;   &lt;w:LsdException Locked="false" Priority="65" SemiHidden="false"   UnhideWhenUsed="false" Name="Medium List 1"/&gt;   &lt;w:LsdException Locked="false" Priority="66" SemiHidden="false"   UnhideWhenUsed="false" Name="Medium List 2"/&gt;   &lt;w:LsdException Locked="false" Priority="67" SemiHidden="false"   UnhideWhenUsed="false" Name="Medium Grid 1"/&gt;   &lt;w:LsdException Locked="false" Priority="68" SemiHidden="false"   UnhideWhenUsed="false" Name="Medium Grid 2"/&gt;   &lt;w:LsdException 
Locked="false" Priority="69" SemiHidden="false"   UnhideWhenUsed="false" Name="Medium Grid 3"/&gt;   &lt;w:LsdException Locked="false" Priority="70" SemiHidden="false"   UnhideWhenUsed="false" Name="Dark List"/&gt;   &lt;w:LsdException Locked="false" Priority="71" SemiHidden="false"   UnhideWhenUsed="false" Name="Colorful Shading"/&gt;   &lt;w:LsdException Locked="false" Priority="72" SemiHidden="false"   UnhideWhenUsed="false" Name="Colorful List"/&gt;   &lt;w:LsdException Locked="false" Priority="73" SemiHidden="false"   UnhideWhenUsed="false" Name="Colorful Grid"/&gt;   &lt;w:LsdException Locked="false" Priority="60" SemiHidden="false"   UnhideWhenUsed="false" Name="Light Shading Accent 1"/&gt;   &lt;w:LsdException Locked="false" Priority="61" SemiHidden="false"   UnhideWhenUsed="false" Name="Light List Accent 1"/&gt;   &lt;w:LsdException Locked="false" Priority="62" SemiHidden="false"   UnhideWhenUsed="false" Name="Light Grid Accent 1"/&gt;   &lt;w:LsdException Locked="false" Priority="63" SemiHidden="false"   UnhideWhenUsed="false" Name="Medium Shading 1 Accent 1"/&gt;   &lt;w:LsdException Locked="false" Priority="64" SemiHidden="false"   UnhideWhenUsed="false" Name="Medium Shading 2 Accent 1"/&gt;   &lt;w:LsdException Locked="false" Priority="65" SemiHidden="false"   UnhideWhenUsed="false" Name="Medium List 1 Accent 1"/&gt;   &lt;w:LsdException Locked="false" UnhideWhenUsed="false" Name="Revision"/&gt;   &lt;w:LsdException Locked="false" Priority="34" SemiHidden="false"   UnhideWhenUsed="false" QFormat="true" Name="List Paragraph"/&gt;   &lt;w:LsdException Locked="false" Priority="29" SemiHidden="false"   UnhideWhenUsed="false" QFormat="true" Name="Quote"/&gt;   &lt;w:LsdException Locked="false" Priority="30" SemiHidden="false"   UnhideWhenUsed="false" QFormat="true" Name="Intense Quote"/&gt;   &lt;w:LsdException Locked="false" Priority="66" SemiHidden="false"   UnhideWhenUsed="false" Name="Medium List 2 Accent 1"/&gt;   &lt;w:LsdException Locked="false" 
Priority="67" SemiHidden="false"   UnhideWhenUsed="false" Name="Medium Grid 1 Accent 1"/&gt;   &lt;w:LsdException Locked="false" Priority="68" SemiHidden="false"   UnhideWhenUsed="false" Name="Medium Grid 2 Accent 1"/&gt;   &lt;w:LsdException Locked="false" Priority="69" SemiHidden="false"   UnhideWhenUsed="false" Name="Medium Grid 3 Accent 1"/&gt;   &lt;w:LsdException Locked="false" Priority="70" SemiHidden="false"   UnhideWhenUsed="false" Name="Dark List Accent 1"/&gt;   &lt;w:LsdException Locked="false" Priority="71" SemiHidden="false"   UnhideWhenUsed="false" Name="Colorful Shading Accent 1"/&gt;   &lt;w:LsdException Locked="false" Priority="72" SemiHidden="false"   UnhideWhenUsed="false" Name="Colorful List Accent 1"/&gt;   &lt;w:LsdException Locked="false" Priority="73" SemiHidden="false"   UnhideWhenUsed="false" Name="Colorful Grid Accent 1"/&gt;   &lt;w:LsdException Locked="false" Priority="60" SemiHidden="false"   UnhideWhenUsed="false" Name="Light Shading Accent 2"/&gt;   &lt;w:LsdException Locked="false" Priority="61" SemiHidden="false"   UnhideWhenUsed="false" Name="Light List Accent 2"/&gt;   &lt;w:LsdException Locked="false" Priority="62" SemiHidden="false"   UnhideWhenUsed="false" Name="Light Grid Accent 2"/&gt;   &lt;w:LsdException Locked="false" Priority="63" SemiHidden="false"   UnhideWhenUsed="false" Name="Medium Shading 1 Accent 2"/&gt;   &lt;w:LsdException Locked="false" Priority="64" SemiHidden="false"   UnhideWhenUsed="false" Name="Medium Shading 2 Accent 2"/&gt;   &lt;w:LsdException Locked="false" Priority="65" SemiHidden="false"   UnhideWhenUsed="false" Name="Medium List 1 Accent 2"/&gt;   &lt;w:LsdException Locked="false" Priority="66" SemiHidden="false"   UnhideWhenUsed="false" Name="Medium List 2 Accent 2"/&gt;   &lt;w:LsdException Locked="false" Priority="67" SemiHidden="false"   UnhideWhenUsed="false" Name="Medium Grid 1 Accent 2"/&gt;   &lt;w:LsdException Locked="false" Priority="68" SemiHidden="false"   UnhideWhenUsed="false" 
Name="Medium Grid 2 Accent 2"/&gt;  &lt;/w:LatentStyles&gt; &lt;/xml&gt;&lt;![endif]--&gt;  &lt;br /&gt;&lt;ul&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/212269981320258846-4282436942065511673?l=bimport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bimport.blogspot.com/feeds/4282436942065511673/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bimport.blogspot.com/2011/06/memcache-responsibility.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/4282436942065511673'/><link rel='self' 
type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/4282436942065511673'/><link rel='alternate' type='text/html' href='http://bimport.blogspot.com/2011/06/memcache-responsibility.html' title='Memcache responsibility'/><author><name>Rob Staveley (Tom)</name><uri>http://www.blogger.com/profile/12685264964545576535</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://2.bp.blogspot.com/_mmq4zdFTiOI/Sp-aftOmhZI/AAAAAAAAPJM/GSiGOc7pRsc/S220/Look-8.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-212269981320258846.post-2888519967418657750</id><published>2011-03-26T06:22:00.000-07:00</published><updated>2011-03-26T07:00:01.341-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='crl'/><category scheme='http://www.blogger.com/atom/ns#' term='ssl'/><category scheme='http://www.blogger.com/atom/ns#' term='ocsp'/><title type='text'>SSL Certificate Revocation - Need to clean up our act</title><content type='html'>With &lt;a href="http://blogs.comodo.com/it-security/data-security/the-recent-ra-compromise/"&gt;Comodo&lt;/a&gt; issuing a batch of dodgy SSL certificates, Certificate Revocation came into focus. Certificate Revocation is the way a certificate is withdrawn from trust; you can't force a signed certificate to be removed from a web site, but you can black-list it centrally.&lt;br /&gt;&lt;br /&gt;Certificate Revocation Lists (&lt;a href="http://en.wikipedia.org/wiki/Revocation_list"&gt;CRLs&lt;/a&gt;) used to be the way certificate revocation was handled, with Distribution Points publicised in the Certificate Authorities' (CA) certificates. "&lt;i&gt;Browser&lt;/i&gt;, go to this Distribution Point for this CA, and you'll see which current certificates are black-listed." 
This doesn't scale well, however, and you'll find an empty list of CRL Distribution Points in Firefox (and Operating Systems for the other browsers). Nowadays &lt;a href="http://en.wikipedia.org/wiki/Online_Certificate_Status_Protocol"&gt;OCSP&lt;/a&gt; is used to distribute certificate revocations rather like the way DNS works, with &lt;b&gt;responders&lt;/b&gt; acting for certificate revocation as local DNS servers act for domain name look-ups.&lt;br /&gt;&lt;br /&gt;Firefox and Internet Explorer 7+ support OCSP, the latter only via the Windows Vista or Windows 7 operating system. On Macs, I understand that Safari users need to activate OCSP in their Keychain preferences in OS X (Do it!). Windows XP users should stick with Firefox.&lt;br /&gt;&lt;br /&gt;Alas, having a browser that supports OCSP doesn't mean that you know that it is supported by the CA. Remarkably, Extended Validation certificates are &lt;a href="http://en.wikipedia.org/wiki/Extended_Validation_Certificate#Online_Certificate_Status_Protocol"&gt;not necessarily protected by OCSP&lt;/a&gt;. So that Green Bar on your browser doesn't tell you that the site you are looking at has a certificate that hasn't been revoked! So you &lt;b&gt;don't&lt;/b&gt; have a visual indication that your site certificate is valid; you benefit only from seeing that some are invalid. That is not good enough.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;div class="MsoPlainText"&gt;Firefox has an extension called &lt;a href="https://calomel.org/firefox_ssl_validation.html"&gt;Calomel SSL Validation&lt;/a&gt;, which only green-flags sites that are OCSP-verified and use strong ciphers. It flags interesting things like the fact that &lt;a href="https://www.nwolb.com/"&gt;NatWest online bank&lt;/a&gt; is on a weak cipher and key, which seems very sloppy to me. If you accept anything less than a green-flagged site, it isn't explicit whether the CA provides OCSP or whether the failure was only due to a weak cipher. 
So, if you take the Draconian approach that only sites green-flagged by the Calomel Validation should be trusted, you are likely to lose a few sites, but less likely to be subjected to fraud. I'd like to see OCSP mandated as a main-stream requirement for secure sites, but in the interim I'll stick to Firefox+Calomel for online security.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/212269981320258846-2888519967418657750?l=bimport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bimport.blogspot.com/feeds/2888519967418657750/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bimport.blogspot.com/2011/03/ssl-certificate-revocation-need-to.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/2888519967418657750'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/2888519967418657750'/><link rel='alternate' type='text/html' href='http://bimport.blogspot.com/2011/03/ssl-certificate-revocation-need-to.html' title='SSL Certificate Revocation - Need to clean up our act'/><author><name>Rob Staveley (Tom)</name><uri>http://www.blogger.com/profile/12685264964545576535</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://2.bp.blogspot.com/_mmq4zdFTiOI/Sp-aftOmhZI/AAAAAAAAPJM/GSiGOc7pRsc/S220/Look-8.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-212269981320258846.post-760171313940252867</id><published>2011-03-16T02:58:00.000-07:00</published><updated>2011-03-16T02:58:26.347-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='windows 7'/><category scheme='http://www.blogger.com/atom/ns#' 
term='search'/><category scheme='http://www.blogger.com/atom/ns#' term='php'/><title type='text'>Windows 7 and PHP development</title><content type='html'>I am told that I ought to be using Aptana for PHP development to benefit from source code indexing, but I'm a creature of habit and was wondering if I could stick with my usual text editors and make better use of Windows Explorer for search.&lt;br /&gt;&lt;br /&gt;I haven't invested any time in the Windows&amp;nbsp; search beyond looking at my Office documents and wondered why it was that when I looked for an HTML ID in a PHP project, I'd see relevant stuff in JavaScript, CSS and, funnily enough, subversion entries, but nothing in the PHP source.&lt;br /&gt;&lt;br /&gt;Time to look at Windows Explorer's &lt;b&gt;Indexing Options&lt;/b&gt;, which in Windows 7 means hitting the start button and entering &lt;i&gt;Indexing Options&lt;/i&gt;. Clicking the advanced button and tabbing to &lt;b&gt;File Types&lt;/b&gt; shows how the various files are being indexed. Having some Lucene experience, it should have occurred to me that Windows assigns different indexing filters to different file types, with HTML and CSS getting a special HTML filter and JavaScript, .txt etc. being handled by a plain text filter. Office files of course get a special filter of their own. In a default set-up, PHP files get the same treatment as images and anything else which is deemed to be uninteresting for internal search... and that is a search on &lt;i&gt;File properties&lt;/i&gt; only.&lt;br /&gt;&lt;br /&gt;Well, since I'm earning a crust working with PHP, it makes sense to assign the Plain Text filter to &lt;b&gt;*.php&lt;/b&gt; so I can have Windows index them too. Doing that is simply a matter of hitting a radio button to &lt;i&gt;Index Properties &lt;b&gt;and&lt;/b&gt; File Contents&lt;/i&gt;. 
Job done.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/212269981320258846-760171313940252867?l=bimport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bimport.blogspot.com/feeds/760171313940252867/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bimport.blogspot.com/2011/03/windows-7-and-php-development.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/760171313940252867'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/760171313940252867'/><link rel='alternate' type='text/html' href='http://bimport.blogspot.com/2011/03/windows-7-and-php-development.html' title='Windows 7 and PHP development'/><author><name>Rob Staveley (Tom)</name><uri>http://www.blogger.com/profile/12685264964545576535</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://2.bp.blogspot.com/_mmq4zdFTiOI/Sp-aftOmhZI/AAAAAAAAPJM/GSiGOc7pRsc/S220/Look-8.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-212269981320258846.post-5046803019527538991</id><published>2011-03-11T04:17:00.000-08:00</published><updated>2011-03-11T04:18:47.429-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='rewrite'/><category scheme='http://www.blogger.com/atom/ns#' term='apache'/><title type='text'>Handling fall-through in a rewrite map</title><content type='html'>If you have to deal with a lot of URLRewrites, you want to use a &lt;a href="http://httpd.apache.org/docs/2.0/mod/mod_rewrite.html#mapfunc"&gt;map&lt;/a&gt;, but what about the URLs that do not get matched by the map? 
Let's say there are a load of entries in your &lt;b&gt;/2011/March/&lt;/b&gt; folder in your blog, which you need to map over to some external domain.&lt;br /&gt;&lt;br /&gt;Under normal circumstances you can use something like this in your virtual host configuration in Apache:&lt;br /&gt;&lt;br /&gt;&lt;div style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&lt;span style="font-size: small;"&gt;RewriteEngine on&lt;/span&gt;&lt;/div&gt;&lt;span style="font-size: small;"&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;RewriteMap march-2011-map txt:/somepath/march-2011-map.txt&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;div style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&lt;span style="font-size: x-small;"&gt;&lt;span style="font-size: small;"&gt;RewriteRule ^/2011/March/(.+) ${march-2011-map:$1|/2011/March/$1}&lt;/span&gt; &lt;/span&gt;&lt;/div&gt;&lt;br /&gt;The &lt;b&gt;march-2011-map.txt&lt;/b&gt; file (in directory &lt;b&gt;/somepath&lt;/b&gt;) contains lines like this:&lt;br /&gt;&lt;span style="font-size: x-small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;div style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&lt;span style="font-size: small;"&gt;sausages http://example.com/sausages&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;Hits on anything in the &lt;b&gt;/2011/March/&lt;/b&gt; directory get the map treatment. 
A hit on &lt;b&gt;/2011/March/sausages&lt;/b&gt; gets redirected to&amp;nbsp; http://example.com/sausages and a hit on (say) &lt;b&gt;/2011/March/eggs&lt;/b&gt;, if &lt;b&gt;eggs&lt;/b&gt; is not in the map, gets handled locally.&lt;br /&gt;&lt;br /&gt;This works because of the parameter encoded after the '|' in the &lt;a href="http://httpd.apache.org/docs/2.0/mod/mod_rewrite.html#rewriterule"&gt;RewriteRule&lt;/a&gt;, which replaces non-matches in the look-up.&lt;br /&gt;&lt;br /&gt;The problem with this approach, though, is that having rewritten the lookup in such a way as to do nothing to the non-match, the URL is handled there and then, and any other rewrite rules etc. do not get a look-in.&lt;br /&gt;&lt;br /&gt;There are situations where you want to redirect URLs matched by the map, but to continue to handle other URLs. The solution is to use a &lt;a href="http://httpd.apache.org/docs/2.0/mod/mod_rewrite.html#rewritecond"&gt;RewriteCond&lt;/a&gt;, which is a look-ahead to make the Rewrite only apply to map matches.&lt;br /&gt;&lt;br /&gt;&lt;div style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&lt;span style="font-size: small;"&gt;RewriteEngine on&lt;/span&gt;&lt;/div&gt;&lt;span style="font-size: small;"&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;RewriteMap march-2011-map txt:/somepath/march-2011-map.txt&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;div style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&lt;span style="font-size: small;"&gt;RewriteCond ${march-2011-map:$1} &amp;gt;""&lt;/span&gt;&lt;/div&gt;&lt;div style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&lt;span style="font-size: small;"&gt;RewriteRule ^/2011/March/(.+) ${march-2011-map:$1} &lt;/span&gt;&lt;/div&gt;&lt;br /&gt;We have lost the '|' in the RewriteRule now, which means that keys not matched return as an empty string. 
The RewriteCond, which precedes this, is a look-ahead, which is unintuitive if you've not played with RewriteCond before. It looks at the lookup in the subsequent RewriteRule and tests that it is something greater than the empty string. That means that RewriteRule handles &lt;b&gt;/2011/March/sausages&lt;/b&gt; because it is a match and therefore returns something greater than "", but the RewriteRule is not applied to &lt;b&gt;/2011/March/eggs&lt;/b&gt;, because that look-up returned "".&lt;br /&gt;&lt;br /&gt;You can then go on to handle &lt;b&gt;/2011/March/eggs&lt;/b&gt; with subsequent tests and rules in your configuration.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/212269981320258846-5046803019527538991?l=bimport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bimport.blogspot.com/feeds/5046803019527538991/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bimport.blogspot.com/2011/03/handling-fall-through-in-rewrite-map.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/5046803019527538991'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/5046803019527538991'/><link rel='alternate' type='text/html' href='http://bimport.blogspot.com/2011/03/handling-fall-through-in-rewrite-map.html' title='Handling fall-through in a rewrite map'/><author><name>Rob Staveley (Tom)</name><uri>http://www.blogger.com/profile/12685264964545576535</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' 
src='http://2.bp.blogspot.com/_mmq4zdFTiOI/Sp-aftOmhZI/AAAAAAAAPJM/GSiGOc7pRsc/S220/Look-8.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-212269981320258846.post-392488739761891720</id><published>2011-02-07T03:28:00.000-08:00</published><updated>2011-02-11T04:04:08.066-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='sockets'/><category scheme='http://www.blogger.com/atom/ns#' term='pack'/><category scheme='http://www.blogger.com/atom/ns#' term='php'/><title type='text'>PHP and Binary Network Protocol</title><content type='html'>A friend of mine consults for a company that has a Fingerprint identification module. This lets people into and out of some premises. He asked me if I'd be able to write some software to control the device, given an SDK for C++ and C# and a specification document. What I knew he &lt;b&gt;really&lt;/b&gt; wanted was to be able to do this work himself in a language which he is comfortable with: PHP. What he wanted from me was a leg-up to be able to do this.&lt;br /&gt;&lt;br /&gt;I flirted with the idea of digging through the C++ and/or C# source, but it soon became clear that most of what was there was GUI-related, and &lt;i&gt;understanding&lt;/i&gt; the applications required understanding invocations to library files, and the invocations were undocumented and poorly commented. 
By contrast, the specification document described with great clarity the TCP/IP protocol to interrogate and control the module.&lt;br /&gt;&lt;br /&gt;With some bravado, I said I'd knock up a PHP interface to the module, recalling a project 6 years ago, where I worked with a socket client in a PHP management page, which controlled a proprietary &lt;a href="http://en.wikipedia.org/wiki/Message_transfer_agent"&gt;mail transfer agent&lt;/a&gt;, for &lt;a href="http://www.emailfiltering.com/"&gt;Email Systems&lt;/a&gt;; PHP has a relatively little-used low-level &lt;a href="http://php.net/manual/en/book.sockets.php"&gt;sockets library&lt;/a&gt;, which was just the ticket for customer management pages that needed to manipulate the MTA.&lt;br /&gt;&lt;br /&gt;The Fingerprint device presented some challenges that hadn't occurred to me in my moment of hubris. The protocol has all the hallmarks of something derived from a serial protocol: fixed lengths, signature bytes and checksums - application checksums are superfluous in TCP, which has a checksum in its header. This sort of thing is all well and good when you are writing assembler or C code, but it is a culture shock in PHP.&lt;br /&gt;&lt;br /&gt;Fortunately, I've had to deal with a similar binary protocol (also with its roots in serial) for tests in Perl, and at the time I muddled through with &lt;a href="http://www.perlmonks.org/?node_id=224666"&gt;pack/unpack&lt;/a&gt;, taking advantage of the fact that strings are &lt;a href="http://en.wikipedia.org/wiki/Binary-safe"&gt;binary safe&lt;/a&gt;. 
&lt;a href="http://php.net/manual/en/function.pack.php"&gt;Pack/unpack&lt;/a&gt; are also available in PHP - I expect used even less than the sockets functions - and its strings are also binary safe.&lt;br /&gt;&lt;br /&gt;To a C/C++ programmer it seems very alien, but initialising a variable with &lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;$my_variable = ''&lt;/span&gt; in PHP is equivalent to creating an empty buffer, and, to the chagrin of UTF-8 users, using &lt;a href="http://php.net/manual/en/function.strlen.php"&gt;strlen()&lt;/a&gt; on that variable tells you how many bytes there are in the buffer. You can use &lt;a href="http://php.net/manual/en/function.substr.php"&gt;substr()&lt;/a&gt; to extract portions of the buffer, as you would with &lt;a href="http://www.cplusplus.com/reference/clibrary/cstring/memcpy/"&gt;memcpy()&lt;/a&gt; in C/C++. Using these core functions along with pack/unpack, we can pick through binary data packets.&lt;br /&gt;&lt;br /&gt;It doesn't all smell of roses, though.&lt;br /&gt;&lt;br /&gt;Like Perl, PHP's pack/unpack functions are quirky. The module is - we have to guess - &lt;a href="http://en.wikipedia.org/wiki/Endianness"&gt;little endian&lt;/a&gt;; the protocol is written for MS Windows clients, there is no mention of network byte order, and a little-endian implementation works. I'd like to think that we could have a PHP library that works on any architecture, including pre-Intel Macs, which are big endian. 
However, that would require extra work, because the 16-bit "short" format specifiers for pack/unpack are as follows:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;s &amp;nbsp;&amp;nbsp; &amp;nbsp;signed short (always 16 bit, machine byte order)&lt;/li&gt;&lt;li&gt;S &amp;nbsp;&amp;nbsp; &amp;nbsp;unsigned short (always 16 bit, machine byte order)&lt;/li&gt;&lt;li&gt;n &amp;nbsp;&amp;nbsp; &amp;nbsp;unsigned short (always 16 bit, big endian byte order)&lt;/li&gt;&lt;li&gt;v &amp;nbsp;&amp;nbsp; &amp;nbsp;unsigned short (always 16 bit, little endian byte order)&lt;/li&gt;&lt;/ul&gt;If you know the source is a signed 16-bit integer on a little endian system and you don't want to have to know the endianness of the server your PHP is running on, it seems silly that you have to work with &lt;b&gt;unsigned&lt;/b&gt; shorts, because &lt;b&gt;signed&lt;/b&gt; shorts are assumed to have local endianness.&lt;br /&gt;&lt;br /&gt;The &lt;a href="http://php.net/manual/en/function.unpack.php"&gt;PHP unpack function&lt;/a&gt; differs from its Perl counterpart in that it returns data in an associative array. When you've been brought up with C/C++, it feels very inefficient working with hashes in a protocol handler, but that's OK for my friend's application, because it isn't something that has to scale. The device will only ever have one client at a time, and indeed is designed only to allow one at a time.&lt;br /&gt;&lt;br /&gt;So what was the outcome of my evening's fiddling? I think and hope I've given him a PHP class that he can use as the basis of his controller. It seems to work OK with a simulator that I knocked up in C++. 
I quite liked the quirkiness of building something that can go into a PHP page (or CLI) that can do something you'd expect to have to use a lower-level language for.&lt;br /&gt;&amp;nbsp;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/212269981320258846-392488739761891720?l=bimport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bimport.blogspot.com/feeds/392488739761891720/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bimport.blogspot.com/2011/02/php-and-binary-network-protocol.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/392488739761891720'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/392488739761891720'/><link rel='alternate' type='text/html' href='http://bimport.blogspot.com/2011/02/php-and-binary-network-protocol.html' title='PHP and Binary Network Protocol'/><author><name>Rob Staveley (Tom)</name><uri>http://www.blogger.com/profile/12685264964545576535</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://2.bp.blogspot.com/_mmq4zdFTiOI/Sp-aftOmhZI/AAAAAAAAPJM/GSiGOc7pRsc/S220/Look-8.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-212269981320258846.post-72838257389233920</id><published>2011-02-02T04:12:00.000-08:00</published><updated>2011-02-02T04:12:19.539-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='c++'/><category scheme='http://www.blogger.com/atom/ns#' term='dbi'/><category scheme='http://www.blogger.com/atom/ns#' term='utf-8'/><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><category scheme='http://www.blogger.com/atom/ns#' term='xml'/><category 
scheme='http://www.blogger.com/atom/ns#' term='static publishing'/><category scheme='http://www.blogger.com/atom/ns#' term='perl'/><title type='text'>MySQL laziness with UTF-8</title><content type='html'>You can get away with putting UTF-8 into a Latin1 table in MySQL, if your client connects &lt;i&gt;without&lt;/i&gt; using &lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;SET NAMES utf8&lt;/span&gt; (i.e. you leave the client connecting for Latin1).&lt;br /&gt;&lt;br /&gt;Certainly you will expect some things to go wrong, if you get too clever. You wouldn't expect the MySQL &lt;a href="http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_char-length"&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;CHAR_LENGTH()&lt;/span&gt;&lt;/a&gt; function to play ball with multi-byte characters, and likewise functions like &lt;a href="http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_substr"&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;SUBSTR()&lt;/span&gt;&lt;/a&gt; will get their character boundaries wrong, but a bog standard &lt;a href="http://en.wikipedia.org/wiki/Create,_read,_update_and_delete"&gt;CRUD&lt;/a&gt; application seldom uses string functions that are likely to trip up portability. So you can get away with not telling the MySQL server or its client code that it is dealing with UTF-8, and other than the odd hiccup with collation in sorted results you can slap a UTF-8 encoding declaration on your XML and otherwise remember to put UTF-8 in and expect to get UTF-8 out, and have a happy lazy coding experience.&lt;br /&gt;&lt;br /&gt;I have an odd application running that uses a MySQL database as a temporary file on steroids. 
I have a heap of UTF-8 data coming in from a daily feed that needs relationships to be set up and what better than an &lt;a href="http://en.wikipedia.org/wiki/Relational_database_management_system"&gt;RDBMS&lt;/a&gt; for this, and what could be easier than MySQL or PostgreSQL for the relationships. I'm amongst &lt;a href="http://en.wikipedia.org/wiki/LAMP_%28software_bundle%29"&gt;LAMP&lt;/a&gt; flag-waving company so MySQL is what I'm using. My application imports this data using MySQL's excellent &lt;a href="http://dev.mysql.com/doc/refman/5.1/en/load-data.html"&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;LOAD DATA INFILE&lt;/span&gt;&lt;/a&gt; syntax, which sets up the relational database in short order. Somehow, I'd forgotten to set up the database for UTF-8 and yet was stuffing UTF-8 data into it without a care. It worked for my purposes. My client code - a C++ application that is responsible for static publishing - likewise &lt;i&gt;forgot&lt;/i&gt; to send a &lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;SET NAMES utf8&lt;/span&gt; query, so both ends were blithely handling the UTF-8 data as if it was ISO-8859-1 (Latin1) with no penalty. Where order was being used in my client application, it didn't matter that it was erroneously working on UTF-8 data with Latin1 collation.&lt;br /&gt;&lt;br /&gt;Something went wrong in the data feed (invalid UTF-8 characters coming from the source) and I took another look at the application and spotted the Latin1 encoding.&lt;br /&gt;&lt;br /&gt;Time for some floundering (confession time):&lt;br /&gt;&lt;ol&gt;&lt;li&gt;I changed the database to UTF-8 encoding, so that my &lt;a href="http://www.navicat.com/"&gt;Navicat&lt;/a&gt; MySQL client would show nice UTF-8 data in my &lt;i&gt;temporary database&lt;/i&gt;. My diacritics now looked good with the exception of the invalid UTF-8 data, which created the fuss in the first place. 
However, this shot me in the foot in my MySQL client, which was now rendering the diacritics as &lt;b&gt;ISO-8859-1&lt;/b&gt; (Latin1) and slapping &lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;lt;?xml version="1.0" encoding="UTF-8"?&amp;gt;&lt;/span&gt; on its XML. My client was connecting to the UTF-8 database with a default Latin1 connection, and the MySQL drivers in good faith converted it to Latin1. Things were made worse.&lt;/li&gt;&lt;li&gt;Having made my error in an environment which unexpectedly was being used for production purposes (oops) and having discovered that an awful lot of XML was no longer well-formed because of ISO-8859-1 characters posing as UTF-8, I needed to run a hasty &lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;find . -type f | xargs perl -i.old -p -e 's/UTF-8/ISO-8859-1/'&lt;/span&gt; on my published XML files, and then to edit my C++ sources to declare ISO-8859-1 as the encoding in the XML declaration for future publishes.&lt;/li&gt;&lt;/ol&gt;Panic over, it is time to take stock.&lt;br /&gt;&lt;ul&gt;&lt;li&gt;I now have a UTF-8 database with UTF-8 data in it, which improves its appearance in Navicat (big deal) and also has UTF-8 collation (well OK there may be a subtle benefit from this).&lt;/li&gt;&lt;li&gt;My client code is now ISO-8859-1 (Latin1), and says as much in its XML declaration.&lt;/li&gt;&lt;/ul&gt;Unless we start seeing characters in the feed that cannot be converted to ISO-8859-1, we are in good shape, but it would be good to go the extra yard and have the client handle UTF-8 properly.&lt;br /&gt;&lt;br /&gt;So, what's needed?&lt;br /&gt;&lt;br /&gt;My C++ client needs to connect to the database for the "utf8" charset. 
That is, if it is going to render in UTF-8.&lt;br /&gt;&lt;br /&gt;Here's how:&lt;br /&gt;&lt;br /&gt;&lt;div style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&lt;span style="font-size: x-small;"&gt;void init()&lt;br /&gt;{&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; mysql_init(&amp;amp;mysql);&lt;br /&gt;&lt;span style="background-color: white; color: orange;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; const char *charset = "utf8";&lt;/span&gt;&lt;br style="background-color: white; color: orange;" /&gt;&lt;span style="background-color: white; color: orange;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; if (mysql_options(&amp;amp;mysql,MYSQL_SET_CHARSET_NAME,charset) != 0)&lt;/span&gt;&lt;br style="background-color: white; color: orange;" /&gt;&lt;span style="background-color: white; color: orange;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; cerr &amp;lt;&amp;lt; "Warning: Unable to set CHARSET '" &amp;lt;&amp;lt; charset &amp;lt;&amp;lt; "' " &amp;lt;&amp;lt; mysql_errno(&amp;amp;mysql) &amp;lt;&amp;lt; ": " &amp;lt;&amp;lt; mysql_error(&amp;amp;mysql) &amp;lt;&amp;lt; endl;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&lt;span style="font-size: x-small;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; if ((con = mysql_real_connect(&amp;amp;mysql,host,user,password,db,0,0,0)) == 0) {&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; cerr &amp;lt;&amp;lt; "Error: Unable to connect code " &amp;lt;&amp;lt; mysql_errno(&amp;amp;mysql) &amp;lt;&amp;lt; ": " &amp;lt;&amp;lt; mysql_error(&amp;amp;mysql) &amp;lt;&amp;lt; endl;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; exit(1);&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; atexit(done);&lt;br /&gt;}&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;C++ strings are more robust than C's ASCIIZ strings, so there is no need to &lt;a 
href="http://en.wikipedia.org/wiki/UTF-8#Modified_UTF-8"&gt;worry&lt;/a&gt; about '\0' characters if we treat the multibyte strings no differently from regular single-byte characters.&lt;br /&gt;&lt;br /&gt;Some incremental updates are applied with a Perl DBI client, which has yet to be made UTF-8 aware. Since that writes to the database, I need to clean up its act:&lt;br /&gt;&lt;br /&gt;It needs to read its update data in utf8 mode:&lt;br /&gt;&lt;br /&gt;&lt;div style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&lt;span style="font-size: x-small;"&gt;open FILE, "&amp;lt;", $filename or die "Couldn't open $filename: $!";&lt;br /&gt;binmode FILE, ":utf8";&lt;br /&gt;my $content = &amp;lt;FILE&amp;gt;;&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;Strings to be inserted need to be UTF-8 encoded:&lt;br /&gt;&lt;br /&gt;&lt;div style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&lt;span style="font-size: x-small;"&gt;utf8::encode($mystring);&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;The database connection needs to be in UTF-8 mode:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: x-small;"&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;our $dbh = DBI-&amp;gt;connect("DBI:mysql:$mydatabase;host=$myhost",$myuser, $mypassword, { RaiseError =&amp;gt; 1 });&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;$dbh-&amp;gt;{'mysql_enable_utf8'} = 1;&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;$dbh-&amp;gt;do('SET NAMES utf8');&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Sometimes laziness is king.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' 
src='https://blogger.googleusercontent.com/tracker/212269981320258846-72838257389233920?l=bimport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bimport.blogspot.com/feeds/72838257389233920/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bimport.blogspot.com/2011/02/mysql-laziness-with-utf-8.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/72838257389233920'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/72838257389233920'/><link rel='alternate' type='text/html' href='http://bimport.blogspot.com/2011/02/mysql-laziness-with-utf-8.html' title='MySQL laziness with UTF-8'/><author><name>Rob Staveley (Tom)</name><uri>http://www.blogger.com/profile/12685264964545576535</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://2.bp.blogspot.com/_mmq4zdFTiOI/Sp-aftOmhZI/AAAAAAAAPJM/GSiGOc7pRsc/S220/Look-8.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-212269981320258846.post-7357231131497936297</id><published>2010-11-16T04:53:00.000-08:00</published><updated>2010-11-16T04:53:26.894-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='catalog'/><category scheme='http://www.blogger.com/atom/ns#' term='dtd'/><category scheme='http://www.blogger.com/atom/ns#' term='xml'/><title type='text'>DTD go local</title><content type='html'>I meant to look into this a long time ago, and regret I didn't. &lt;a href="http://www.sagehill.net/docbookxsl/Catalogs.html"&gt;Here&lt;/a&gt; is an excellent walk through for the use of an XML catalogue file to allow DTDs to be used in the absence of Internet connectivity. 
This will stop my XSLT scripts from grinding to an embarrassing halt when the resolver isn't resolving at customer sites.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/212269981320258846-7357231131497936297?l=bimport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bimport.blogspot.com/feeds/7357231131497936297/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bimport.blogspot.com/2010/11/dtd-go-local.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/7357231131497936297'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/7357231131497936297'/><link rel='alternate' type='text/html' href='http://bimport.blogspot.com/2010/11/dtd-go-local.html' title='DTD go local'/><author><name>Rob Staveley (Tom)</name><uri>http://www.blogger.com/profile/12685264964545576535</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://2.bp.blogspot.com/_mmq4zdFTiOI/Sp-aftOmhZI/AAAAAAAAPJM/GSiGOc7pRsc/S220/Look-8.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-212269981320258846.post-3502328793401398222</id><published>2010-11-09T03:56:00.000-08:00</published><updated>2010-11-09T03:56:19.546-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='elb'/><category scheme='http://www.blogger.com/atom/ns#' term='ssl'/><category scheme='http://www.blogger.com/atom/ns#' term='aws'/><title type='text'>Elastic Load Balancing for HTTPS</title><content type='html'>Amazon Web Services has added support for HTTPS termination to its Elastic Load Balancer, according to &lt;a 
href="http://aws.typepad.com/aws/2010/10/elastic-load-balancer-support-for-ssl-termination.html"&gt;their blog&lt;/a&gt;, which means that the final piece for scalable E-commerce solutions is in place. The trick is putting the certificate onto the load balancer, and leaving it to the ELB to handle the SSL handshake.&lt;br /&gt;&lt;br /&gt;As with vanilla HTTP, there is support for sticky sessions over the HTTPS Elastic Load Balancer.&lt;br /&gt;&lt;br /&gt;Is it enough to trust vanilla HTTP between the ELB and your application instances? I guess it depends on the application, but if you do not, you can always forward to port 443 on the application server and use a second encrypted channel for that hop. If the AWS &lt;a href="http://s3.amazonaws.com/aws_blog/AWS_Security_Whitepaper_2008_09.pdf"&gt;security white paper&lt;/a&gt; cuts the mustard, settle for the promised X-Forwarded-Proto header to indicate that the communication arrived via a secure channel upstream, and you have adequate encryption for most privacy purposes.&lt;br /&gt;&lt;br /&gt;I think this opens up all sorts of possibilities.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/212269981320258846-3502328793401398222?l=bimport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bimport.blogspot.com/feeds/3502328793401398222/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bimport.blogspot.com/2010/11/elastic-load-balancing-for-https.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/3502328793401398222'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/3502328793401398222'/><link rel='alternate' type='text/html' 
href='http://bimport.blogspot.com/2010/11/elastic-load-balancing-for-https.html' title='Elastic Load Balancing for HTTPS'/><author><name>Rob Staveley (Tom)</name><uri>http://www.blogger.com/profile/12685264964545576535</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://2.bp.blogspot.com/_mmq4zdFTiOI/Sp-aftOmhZI/AAAAAAAAPJM/GSiGOc7pRsc/S220/Look-8.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-212269981320258846.post-1294904235559838248</id><published>2010-08-06T04:14:00.000-07:00</published><updated>2010-08-06T04:14:06.917-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='cloudfront'/><category scheme='http://www.blogger.com/atom/ns#' term='static publishing'/><category scheme='http://www.blogger.com/atom/ns#' term='s3'/><category scheme='http://www.blogger.com/atom/ns#' term='jsonp'/><category scheme='http://www.blogger.com/atom/ns#' term='aws'/><title type='text'>Going static in AWS</title><content type='html'>Amazon has announced default root objects in &lt;a href="http://docs.amazonwebservices.com/AmazonCloudFront/latest/DeveloperGuide/"&gt;Cloudfront&lt;/a&gt;, which makes me think of static publishing again. I really like the idea of throwing everything at S3 and hence Cloudfront and reducing my application to something that works with JSONP to get around cross-domain scripting issues with JSON. 
The default object is the final piece needed for this, because now I can have a home page and use www or default DNS entry for the CDN and something less catchy for the dynamic stuff that goes through JSONP.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/212269981320258846-1294904235559838248?l=bimport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bimport.blogspot.com/feeds/1294904235559838248/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bimport.blogspot.com/2010/08/going-static-in-aws.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/1294904235559838248'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/1294904235559838248'/><link rel='alternate' type='text/html' href='http://bimport.blogspot.com/2010/08/going-static-in-aws.html' title='Going static in AWS'/><author><name>Rob Staveley (Tom)</name><uri>http://www.blogger.com/profile/12685264964545576535</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://2.bp.blogspot.com/_mmq4zdFTiOI/Sp-aftOmhZI/AAAAAAAAPJM/GSiGOc7pRsc/S220/Look-8.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-212269981320258846.post-5110965981196200822</id><published>2010-01-16T04:45:00.000-08:00</published><updated>2010-01-16T04:45:35.606-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='php'/><category scheme='http://www.blogger.com/atom/ns#' term='perl'/><title type='text'>No parentheses in PHP language constructs</title><content type='html'>I list PHP amongst the languages, which I'm not great at, but which I occasionally earn a crust from.&lt;br /&gt;&lt;br 
/&gt;PHP appears to be a simple language, because it spurns Perl's proclivity for hieroglyphics and is somewhat less "write-only" than Perl. Notably arrays and hashes are unified in PHP. PHP occasionally falls over itself, attempting to improve upon Perl with cleaner syntax - e.g. for regular expressions - but mostly it does a good job of keeping its syntax consistent.&lt;br /&gt;&lt;br /&gt;I am therefore puzzled to see that the architects of PHP found the need to have &lt;i&gt;&lt;b&gt;language constructs&lt;/b&gt;&lt;/i&gt; which are distinguished from functions in that they do not need to have parentheses around their arguments. They are listed amongst the PHP &lt;a href="http://php.net/manual/en/reserved.keywords.php"&gt;keywords&lt;/a&gt;. Typically, you only come upon them when you see that (say) an echo in a PHP page "forgot" to have its parentheses, but... seems to work anyhow.&lt;br /&gt;&lt;br /&gt;Once discovered, the PHP coder delights in having two fewer characters to type, just as the Perl coder makes a point of avoiding superfluous brackets to remind himself that he's not writing C code. Poor code reviewer. 
Lucky coder, eh?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/212269981320258846-5110965981196200822?l=bimport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bimport.blogspot.com/feeds/5110965981196200822/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bimport.blogspot.com/2010/01/no-parentheses-in-php-language.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/5110965981196200822'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/5110965981196200822'/><link rel='alternate' type='text/html' href='http://bimport.blogspot.com/2010/01/no-parentheses-in-php-language.html' title='No parentheses in PHP language constructs'/><author><name>Rob Staveley (Tom)</name><uri>http://www.blogger.com/profile/12685264964545576535</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://2.bp.blogspot.com/_mmq4zdFTiOI/Sp-aftOmhZI/AAAAAAAAPJM/GSiGOc7pRsc/S220/Look-8.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-212269981320258846.post-509547507011299575</id><published>2010-01-10T07:28:00.000-08:00</published><updated>2010-01-10T08:21:05.686-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='sun'/><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><category scheme='http://www.blogger.com/atom/ns#' term='postgresql'/><category scheme='http://www.blogger.com/atom/ns#' term='oracle'/><title type='text'>Database of choice - reacting to Oracle</title><content type='html'>I have some loyalty to &lt;a href="http://www.postgresql.org/"&gt;PostgreSQL&lt;/a&gt;, because at a time when I couldn't fathom the MySQL 
AB &lt;a href="http://www.mysql.com/about/legal/"&gt;license&lt;/a&gt;, it provided an excellent affordable &lt;a href="http://en.wikipedia.org/wiki/Relational_database_management_system"&gt;RDBMS&lt;/a&gt;. In a project where some messy Oracle code needed to be ported, I was pleased with the ease of porting into PostgreSQL. It would have been tougher with MySQL. I would love to say that I was guided by the principles of &lt;a href="http://en.wikipedia.org/wiki/Free_and_open_source_software"&gt;FOSS&lt;/a&gt;, but hand on heart &lt;b&gt;Free = $0&lt;/b&gt; was my main motive.&lt;br /&gt;&lt;br /&gt;Times have moved on, and MySQL has been a choice of convenience in a lot of my recent projects, because it is by far the market leader and it is very good at what it does. People know it. Like all software developers, I've become wiser about licenses, and I'm OK with what I get from the package managers.&lt;br /&gt;&lt;br /&gt;An &lt;a href="http://developers.slashdot.org/story/10/01/09/136244/Why-Oracle-Cant-Easily-Kill-PostgreSQ"&gt;article&lt;/a&gt; on Slashdot alerted me to &lt;a href="http://askmonty.org/wiki/index.php/MariaDB_versus_MySQL"&gt;MariaDB&lt;/a&gt;, which is a FOSS branch of MySQL 5.1 by &lt;a href="http://en.wikipedia.org/wiki/Michael_Widenius"&gt;Monty Widenius&lt;/a&gt;, the original MySQL creator. Monty &lt;a href="http://monty-says.blogspot.com/2009/12/help-keep-internet-free.html"&gt;expresses&lt;/a&gt; the opinion that PostgreSQL is dominated by closed source &lt;a href="http://www.enterprisedb.com/products/postgres_plus_as/features.do"&gt;Enterprise DB&lt;/a&gt;, which exclusively has many of the features that you come to expect, like replication (Slony). He also says that Oracle's buy-out of Sun is likely to be a way to recover the perceived sales loss of $1 billion per annum caused by MySQL. 
They are going to want to make some money out of it, and are unlikely to depend on Sun's preference for community building etc., which is a bewildering way to make money anyhow. If Oracle are also likely to buy out talent from PostgreSQL, all FOSS RDBMSs look threatened by the buy-out.&lt;br /&gt;&lt;br /&gt;Should we care, and do selfish individuals like me need to help? I guess we should sign the &lt;a href="http://helpmysql.org/en/theissue/customerspaythebill"&gt;Save MySQL bill&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;I guess MariaDB means going back to compiling from source for a while, but I expect it won't be long before we see it in our package managers, if Oracle does the dirty on MySQL. There are &lt;a href="http://blog.endpoint.com/2010/01/state-of-postgres-project.html"&gt;good reasons&lt;/a&gt; to doubt whether Oracle could buy out the top 20 PostgreSQL developers, though, bearing in mind that they are all typically employed and valued by other companies. Probably the best thing to do is to stick to standards and avoid vendor lock-in. 
Don't get too comfortable with the idiosyncrasies of your RDBMS, because you may need to change.&amp;nbsp;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/212269981320258846-509547507011299575?l=bimport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bimport.blogspot.com/feeds/509547507011299575/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bimport.blogspot.com/2010/01/database-of-choice-reacting-to-oracle.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/509547507011299575'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/509547507011299575'/><link rel='alternate' type='text/html' href='http://bimport.blogspot.com/2010/01/database-of-choice-reacting-to-oracle.html' title='Database of choice - reacting to Oracle'/><author><name>Rob Staveley (Tom)</name><uri>http://www.blogger.com/profile/12685264964545576535</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://2.bp.blogspot.com/_mmq4zdFTiOI/Sp-aftOmhZI/AAAAAAAAPJM/GSiGOc7pRsc/S220/Look-8.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-212269981320258846.post-6051871071288157676</id><published>2010-01-04T05:03:00.000-08:00</published><updated>2010-01-04T05:03:33.937-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='email'/><title type='text'>E-mail the old killer app</title><content type='html'>E-mail is the old &lt;a href="http://en.wikipedia.org/wiki/Killer_application"&gt;killer app&lt;/a&gt; of the Internet. 
In the early-mid '90s you were more likely to subscribe to ritual humiliation by ISPs because of e-mail than WWW; nobody could find anything in those days on the Web anyhow.&lt;br /&gt;&lt;br /&gt;Familiarity with an e-mail client remains a good reason to avoid changing operating systems nowadays. A Microsoft user is likely to spend more time in Outlook than Word. So it ought to be a well-oiled machine, oughtn't it? But it isn't.&lt;br /&gt;&lt;br /&gt;I spent a chunk of time working for &lt;a href="http://web.archive.org/web/*/http://www.emailsystems.com"&gt;a company that provided managed e-mail services&lt;/a&gt;, but I struggle with day-to-day e-mail issues, caused by the paradoxes of&lt;br /&gt;&lt;ol&gt;&lt;li&gt;wanting to make myself accessible and yet remaining impervious to spam&lt;/li&gt;&lt;li&gt;being intolerant of false-positive spam filtering in my mailbox and yet preferring my family not to be subjected to the freakish correspondence that's become the norm in unsolicited mail&lt;/li&gt;&lt;li&gt;attaching low value to something that's a life-line in business&lt;/li&gt;&lt;/ol&gt;Increasingly, I've worked with people whose e-mail is so broken that they've come to depend on social networking or centralised project management applications like &lt;a href="http://basecamphq.com/"&gt;Basecamp&lt;/a&gt; or bug tracking software like &lt;a href="http://www.mantisbt.org/"&gt;Mantis&lt;/a&gt;/&lt;a href="http://www.bugzilla.org/"&gt;Bugzilla&lt;/a&gt; to handle correspondence. These all need to alert message recipients via &lt;a href="http://en.wikipedia.org/wiki/Instant_messaging"&gt;IM&lt;/a&gt;, text messages, telephone calls or - yes - e-mail, if they are not in the habit of polling the system, so none of them really cuts the mustard for general messaging. 
What we all really want is e-mail that works.&lt;br /&gt;&lt;br /&gt;My current imperfect solution is as follows:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;I don't publicise my e-mail address much, but it is &lt;i&gt;out there&lt;/i&gt;, which means that not much effort with Google will locate my address. This is security through obscurity, which is ultimately doomed as the spammers put more and more effort into harvesting e-mail addresses, but it may prevent some nutters from sending unsolicited mail.&lt;/li&gt;&lt;li&gt;I use &lt;a href="http://spamassassin.apache.org/"&gt;SpamAssassin&lt;/a&gt; for filtering my mail on a low-end &lt;a href="http://www.memset.com/dedicated-servers/virtual.php"&gt;miniserver&lt;/a&gt;, where I have my &lt;a href="http://en.wikipedia.org/wiki/MX_record#Load_distribution_among_an_array_of_mail_servers"&gt;MX server&lt;/a&gt;. I use &lt;a href="http://www.postfix.org/"&gt;Postfix&lt;/a&gt; as my inbound and outbound &lt;a href="http://en.wikipedia.org/wiki/Message_transfer_agent"&gt;MTA&lt;/a&gt;. I use &lt;a href="http://partmaps.org/era/procmail/mini-faq.html"&gt;Procmail&lt;/a&gt; as my &lt;a href="http://en.wikipedia.org/wiki/Mail_delivery_agent"&gt;MDA&lt;/a&gt;, putting mail judged to be spam into a spam folder in the &lt;a href="http://en.wikipedia.org/wiki/Maildir"&gt;Maildir&lt;/a&gt;, as specified in /home/&lt;b&gt;{user}&lt;/b&gt;/.procmailrc. I use &lt;a href="http://en.wikipedia.org/wiki/Post_Office_Protocol"&gt;POP3S&lt;/a&gt; over &lt;a href="http://www.dovecot.org/"&gt;Dovecot&lt;/a&gt; to fetch my mail, which I manage on my desktop PC with Outlook, because I like to be able to search through several years of e-mail, and Outlook's &lt;a href="http://en.wikipedia.org/wiki/Personal_Storage_Table"&gt;PST&lt;/a&gt;s seemed to work best. 
The rest of the family/small-business use IMAP over Dovecot, because they move around more and expect e-mail to be accessible from any client.&lt;/li&gt;&lt;li&gt;I use Outlook as a second line of defense against spam, because I've found that it identifies a lot of spam that my SpamAssassin setup overlooks.&lt;/li&gt;&lt;/ol&gt;This isn't perfect for various reasons.&lt;br /&gt;&lt;br /&gt;The worst reason is that it is over-engineered. I run a local &lt;a href="http://en.wikipedia.org/wiki/Certificate_authority"&gt;certificate authority&lt;/a&gt; for the &lt;a href="http://en.wikipedia.org/wiki/Transport_Layer_Security"&gt;TLS&lt;/a&gt;, to save money, but that means installing the CA as a trusted authority on all PCs that use POP3S or IMAP/TLS to fetch mail. It &lt;i&gt;just works&lt;/i&gt; for a few years and then I need to refamiliarise myself with the server set-up for an upgrade.&lt;br /&gt;&lt;br /&gt;I upgraded the miniserver from Debian Etch to Lenny to take advantage - at long last - of improvements to SpamAssassin, which need Lenny's package manager. The upgrade looked smooth enough, but mail delivery stopped dead, and I realised that I needed to revisit the server set-up, because I'd forgotten too much and the log files didn't say enough to identify what was going wrong.&lt;br /&gt;&lt;br /&gt;I knew the mail was going somewhere, but where could it be? A little bit of fiddling with &lt;b&gt;du&lt;/b&gt; showed me that /var/spool/postfix/hold was holding onto everything that ought to have been going through the MTA. A lot of head-scratching and floundering with Google identified the following as the cause of the problem in /etc/postfix/main.cf:&lt;br /&gt;&lt;blockquote&gt;header_checks = regexp:/etc/postfix/header_checks&lt;br /&gt;&lt;/blockquote&gt;It looks like everything was getting identified by the header check regex and getting thrown into the hold queue. 
The regex?&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;root@mini:/var/spool# cat /etc/postfix/header_checks&lt;br /&gt;/^Received:/ HOLD&lt;br /&gt;&lt;/blockquote&gt;Hmmm. So that pattern matches every message carrying a &lt;b&gt;Received:&lt;/b&gt; header, which is practically all mail, and the &lt;b&gt;HOLD&lt;/b&gt; action puts each match in the hold queue.&lt;br /&gt;&lt;br /&gt;I simply commented out the &lt;b&gt;header_checks&lt;/b&gt; from Postfix's &lt;b&gt;main.cf&lt;/b&gt;, and now I'm back in business with somewhat improved spam-filtering. If I was putting more energy into my e-mail handling, I might get some value out of using a hold queue, but life is too short to fix this properly and to get to the bottom of why this Received header rule was in place.&lt;br /&gt;&lt;br /&gt;My New Year's resolution ought to be to value e-mail more highly and get a better solution for all this, but it is tempting to close the book on this for a few more months and "make do".&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/212269981320258846-6051871071288157676?l=bimport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bimport.blogspot.com/feeds/6051871071288157676/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bimport.blogspot.com/2010/01/e-mail-old-killer-app.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/6051871071288157676'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/6051871071288157676'/><link rel='alternate' type='text/html' href='http://bimport.blogspot.com/2010/01/e-mail-old-killer-app.html' title='E-mail the old killer app'/><author><name>Rob Staveley (Tom)</name><uri>http://www.blogger.com/profile/12685264964545576535</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://2.bp.blogspot.com/_mmq4zdFTiOI/Sp-aftOmhZI/AAAAAAAAPJM/GSiGOc7pRsc/S220/Look-8.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-212269981320258846.post-6325463063862874033</id><published>2009-12-12T03:38:00.000-08:00</published><updated>2009-12-12T03:38:10.919-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='lucene'/><title type='text'>Obsoleting deprecated detritus in Lucene 3</title><content type='html'>Lucene went up a major version number in November, &lt;a href="http://lucene.apache.org/java/3_0_0/changes/Changes.html"&gt;purging some of the deprecated detritus&lt;/a&gt;, which expediency has left littered in a lot of my code.&lt;br /&gt;&lt;br /&gt;This is what I needed to address to get with it:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Nowadays Lucene refers to fields as analyzed rather than &lt;a href="http://lucene.apache.org/java/2_9_1/api/all/org/apache/lucene/document/Field.Index.html#TOKENIZED"&gt;tokenized&lt;/a&gt;. This is much clearer, and it's terminology which I've pushed all the way back to my query interface. I haven't obsoleted "tokenize" in my query interface - I have too many JavaScript, PHP and Perl clients scattered about to do that - but I have deprecated it, favouring "analyze" instead. My server code now compiles against Lucene 3 with "analyze".&lt;/li&gt;&lt;li&gt;&lt;b&gt;Shock horror:&lt;/b&gt; Lucene has made &lt;a href="http://lucene.apache.org/java/2_9_1/api/all/org/apache/lucene/document/Field.Store.html#COMPRESS"&gt;compressed fields&lt;/a&gt; obsolete. 
My index writers now ignore requests to compress, requiring instead that data is requested to be zipped with a &lt;a href="http://java.sun.com/javase/6/docs/api/constant-values.html#java.util.zip.Deflater.BEST_COMPRESSION"&gt;specified compression&lt;/a&gt;, and clients wishing to retrieve this content are required to indicate explicitly that they are fetching zipped binary data, which my reader &lt;a href="http://lucene.apache.org/java/3_0_0/api/all/org/apache/lucene/document/CompressionTools.html#decompressString%28byte[]%29"&gt;should decompress&lt;/a&gt;. With some help from &lt;b&gt;java-user@lucene.apache.org&lt;/b&gt;&lt;span id="goog_1260614786034"&gt;&lt;/span&gt;&lt;span id="goog_1260614786035"&gt;&lt;/span&gt;, I found that I had been abusing field compression anyhow by applying it to fields of less than 1K. It may wind up being inappropriate to zip the synopsis fields which I had been compressing, because upgrading indexes with compressed fields and decompressing the compressed fields resulted in an unexpected &lt;i&gt;&lt;b&gt;decrease&lt;/b&gt;&lt;/i&gt; in the index size. I should get a good performance improvement avoiding the spurious compression in my indexes, and it looks as though I shan't see an increase in index sizes in my application. It took the upgrade to force me to do the right thing.&lt;/li&gt;&lt;li&gt;&lt;a href="http://lucene.apache.org/java/2_9_1/api/all/org/apache/lucene/search/Hits.html"&gt;Hits&lt;/a&gt; are out and it is time for me to embrace the more efficient &lt;a href="http://lucene.apache.org/java/3_0_0/api/all/org/apache/lucene/search/Searcher.html#search%28org.apache.lucene.search.Query,%20int%29"&gt;TopDocs mechanism&lt;/a&gt; for getting hits. 
Changing over to TopDocs wasn't very painful in the event; my server code was sufficiently modular to make it one edit and - yes - I can see a performance improvement in my average query times.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;I noticed that scoring is more discerning now, which is a good thing for relevancy sort order. I am seeing fewer score='1.0' results, which I believe is a consequence of &lt;a href="http://lucene.apache.org/java/3_0_0/api/all/org/apache/lucene/analysis/StopFilter.html#StopFilter%28boolean,%20org.apache.lucene.analysis.TokenStream,%20java.util.Set%29"&gt;better handling&lt;/a&gt; of proximity with stop words. There may be some repercussions from this in relevancy orders in the change-over to Lucene 3.0, where my Lucene 2.3 indexes have completely ignored stop words, and I need to deal with an evil script I have come to depend on, which uses absolute scores rather than the percentage scores that would &lt;a href="http://wiki.apache.org/lucene-java/LuceneFAQ#Can_I_filter_by_score.3F"&gt;always have been more reliable&lt;/a&gt;.&lt;/li&gt;&lt;/ol&gt;I was initially tempted to upgrade to 2.9.1 and leave all of my deprecated foibles in production, but I'm glad that I went the whole hog and implemented my software with Lucene 3; it forced me to address issues which I've sat on for too long.&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/212269981320258846-6325463063862874033?l=bimport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bimport.blogspot.com/feeds/6325463063862874033/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bimport.blogspot.com/2009/12/obsoleting-deprecated-detritus-in.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' 
href='http://www.blogger.com/feeds/212269981320258846/posts/default/6325463063862874033'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/6325463063862874033'/><link rel='alternate' type='text/html' href='http://bimport.blogspot.com/2009/12/obsoleting-deprecated-detritus-in.html' title='Obsoleting deprecated detritus in Lucene 3'/><author><name>Rob Staveley (Tom)</name><uri>http://www.blogger.com/profile/12685264964545576535</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://2.bp.blogspot.com/_mmq4zdFTiOI/Sp-aftOmhZI/AAAAAAAAPJM/GSiGOc7pRsc/S220/Look-8.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-212269981320258846.post-182934757372511332</id><published>2009-12-01T09:01:00.000-08:00</published><updated>2009-12-01T09:01:57.789-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='dom'/><category scheme='http://www.blogger.com/atom/ns#' term='dtd'/><category scheme='http://www.blogger.com/atom/ns#' term='xml'/><title type='text'>Wrong Content-type with an HTTP/404</title><content type='html'>This really threw me:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;2009-12-01 16:07:22,875 INFO  [STDOUT] 16:07:22,874 ERROR [Rss2ListIndexHandler]&lt;br /&gt;Unable to add http://SOMEPATHTOANXML.xml: Server returned HTTP response code: &lt;br /&gt;503 for URL: http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;My crawler was handling RSS 2.0 XML published on a blog site. 
Some of the RSS feeds did not exist, and the server returned the site's standard HTTP/404 ErrorDocument, which was of course Content-type text/html, served up with the following DOCTYPE:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;lt;!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"&lt;br /&gt;    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"&amp;gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;My XML DOM handler naturally assumed it was getting XML, because it didn't spot the unexpected Content-type, and attempted to fetch &lt;em&gt;http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd&lt;/em&gt; to validate it. W3 returned an HTTP/503 to reduce its bandwidth, because it reasonably enough viewed my fetch as &lt;a href="http://www.w3.org/blog/systeam/2008/02/08/w3c_s_excessive_dtd_traffic" title="Why W3 returns HTTP/503 for DTDs"&gt;abusive&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/212269981320258846-182934757372511332?l=bimport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bimport.blogspot.com/feeds/182934757372511332/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bimport.blogspot.com/2009/12/wrong-content-type-with-http404.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/182934757372511332'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/182934757372511332'/><link rel='alternate' type='text/html' href='http://bimport.blogspot.com/2009/12/wrong-content-type-with-http404.html' title='Wrong Content-type with an HTTP/404'/><author><name>Rob Staveley (Tom)</name><uri>http://www.blogger.com/profile/12685264964545576535</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://2.bp.blogspot.com/_mmq4zdFTiOI/Sp-aftOmhZI/AAAAAAAAPJM/GSiGOc7pRsc/S220/Look-8.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-212269981320258846.post-7773305005438353279</id><published>2009-11-19T02:15:00.000-08:00</published><updated>2009-11-19T02:22:27.159-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='shell scripting'/><category scheme='http://www.blogger.com/atom/ns#' term='java'/><category scheme='http://www.blogger.com/atom/ns#' term='lucene'/><category scheme='http://www.blogger.com/atom/ns#' term='standards'/><category scheme='http://www.blogger.com/atom/ns#' term='xml'/><category scheme='http://www.blogger.com/atom/ns#' term='php'/><title type='text'>BeanShell for a sanity-check</title><content type='html'>There are some development tools that come to be real friends in a crisis. One of these for me is &lt;a href="http://www.beanshell.org/" title="Lightweight scripting for Java"&gt;BeanShell&lt;/a&gt; for quick Java tests and experiments.&lt;br /&gt;&lt;br /&gt;I have a process in production that scans the published &lt;a href="http://en.wikipedia.org/wiki/Site_map#XML_Sitemaps" title="Google Sitemaps"&gt;XML Sitemaps&lt;/a&gt; for various domains owned by a client. When changes have been made to pages, or new pages come into existence or pages are added, it needs to make changes to corresponding &lt;a href="http://lucene.apache.org/" title="Apache Lucene"&gt;Lucene&lt;/a&gt; indexes. To stop it from working on pages unnecessarily, it looks at the &lt;code&gt;lastmod&lt;/code&gt; elements, which are &lt;a href="http://en.wikipedia.org/wiki/ISO_8601" title="What Wikipedia says about ISO 8601"&gt;ISO 8601&lt;/a&gt; date+time stamps. 
My utility is only interested in the date part of the &lt;code&gt;lastmod&lt;/code&gt;, and it parses it with the &lt;a href="http://java.sun.com/javase/6/docs/api/java/text/SimpleDateFormat.html" title="Java docs about SimpleDateFormat"&gt;Java SimpleDateFormat&lt;/a&gt; formatter &lt;code&gt;DateFormat fmt = new SimpleDateFormat("yyyy-MM-dd");&lt;/code&gt;.&lt;br /&gt;&lt;br /&gt;In one of the sites an area had become marooned from local linkage, and it was missing from the index. It was only referenced from another domain, and its XML sitemap generation used a quick'n'dirty approach with a point-it-at-the-root &lt;a href="http://www.xml-sitemaps.com/" title="I think it used this one"&gt;sitemap generation tool&lt;/a&gt;. These generators are fine, if you cannot publish your own sitemap with your content management application, with caveats:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Server-side scripts will typically always appear to have new &lt;code&gt;lastmod&lt;/code&gt; values. Typically PHP developers do not &lt;a href="http://php.net/manual/en/function.filemtime.php" title="How to get a good Last-modified header value in PHP"&gt;fix&lt;/a&gt; the &lt;code&gt;Last-modified&lt;/code&gt; header to reflect the last file edit, which the generators depend upon, because it is not always appropriate; the dynamic content may have changed for other reasons (e.g. data or an include).&amp;nbsp;&lt;/li&gt;&lt;li&gt;Content accessible only via search or JavaScript won't appear in one of these sitemaps.&lt;/li&gt;&lt;/ul&gt;To add to the confusion, a sitemap generator had been producing files in the form "2009-11-13 22:23:38", but had changed or been replaced with another that generated the more &lt;acronym title="The way date+time stamps really ought to be"&gt;ISO 8601&lt;/acronym&gt;-ish "2009-11-13T22:23:38+00:00". 
I needed a quick sanity-check that a format change wasn't responsible for the missing references in the index.&lt;br /&gt;&lt;br /&gt;Enter BeanShell, with the following BeanShell script: &lt;br /&gt;&lt;br /&gt;&lt;pre&gt;import java.text.SimpleDateFormat;&lt;br /&gt;import java.text.DateFormat;&lt;br /&gt;import java.util.Date;&lt;br /&gt;&lt;br /&gt;DateFormat fmt = new SimpleDateFormat("yyyy-MM-dd");&lt;br /&gt;String[] ss = "2009-11-13,2009-11-13 22:23:38,2009-11-13T22:23:38+00:00".split(",");&lt;br /&gt;for (String s : ss) {&lt;br /&gt; Date date = fmt.parse(s);&lt;br /&gt; print(s+": "+fmt.format(date)+"\n");&lt;br /&gt;}&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Sure enough, my SimpleDateFormat was OK for my purposes. It could cope with all three &lt;code&gt;lastmod&lt;/code&gt; formats.&lt;br /&gt;&lt;br /&gt;What I really &lt;i&gt;like&lt;/i&gt; about this quick test is that I didn't have to rub sticks together to boot-strap Eclipse to perform a simple Java test. I have the following simple Win32 bsh.cmd file in my path on my Windows work-horse:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;@echo off&lt;br /&gt;setlocal&lt;br /&gt;set classpath=%classpath%;/lib/BeanShell/bsh-2.0b4.jar&lt;br /&gt;java bsh.Interpreter %*&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;All I have to do to run the test from my command prompt is type &lt;code&gt;bsh test.bsh&lt;/code&gt;.&lt;br /&gt;&lt;br /&gt;I have a similar &lt;code&gt;BASH&lt;/code&gt; script on my Linux laptop, but that wasn't at my finger-tips at the time.&lt;br /&gt;&lt;br /&gt;In the event, we realised we needed to fix the sitemap generation process and pasted a chunk into the generated sitemap to index the marooned content as a stop-gap. 
Likewise Google will benefit from that sitemap.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/212269981320258846-7773305005438353279?l=bimport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bimport.blogspot.com/feeds/7773305005438353279/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bimport.blogspot.com/2009/11/beanshell-for-sanity-check.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/7773305005438353279'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/7773305005438353279'/><link rel='alternate' type='text/html' href='http://bimport.blogspot.com/2009/11/beanshell-for-sanity-check.html' title='BeanShell for a sanity-check'/><author><name>Rob Staveley (Tom)</name><uri>http://www.blogger.com/profile/12685264964545576535</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://2.bp.blogspot.com/_mmq4zdFTiOI/Sp-aftOmhZI/AAAAAAAAPJM/GSiGOc7pRsc/S220/Look-8.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-212269981320258846.post-7889784210353983806</id><published>2009-11-18T02:42:00.000-08:00</published><updated>2009-11-18T02:42:44.202-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mql'/><category scheme='http://www.blogger.com/atom/ns#' term='lucene'/><category scheme='http://www.blogger.com/atom/ns#' term='freebase'/><category scheme='http://www.blogger.com/atom/ns#' term='json'/><category scheme='http://www.blogger.com/atom/ns#' term='jsonp'/><title type='text'>Querying Freebase</title><content type='html'>&lt;a href="http://en.wikipedia.org/wiki/Freebase_%28database%29" title="Metaweb 
Freebase"&gt;Freebase&lt;/a&gt; is a rich resource of free (Creative Commons) information in a structured form, which you can access using a sophisticated query language, called the &lt;a href="http://www.freebase.com/view/en/documentation" title="Metaweb Query Language documentation"&gt;Metaweb Query Language&lt;/a&gt; (&lt;acronym title="Metaweb Query Language"&gt;MQL&lt;/acronym&gt;), which you can express in &lt;acronym title="JavaScript Object Notation"&gt;JSON&lt;/acronym&gt;. &lt;br /&gt;&lt;br /&gt;Pulling information from Freebase in a server-side or desktop application realistically requires you to use one of the &lt;a href="http://www.freebase.com/docs/client_libraries" title="Language libraries for Freebase"&gt;client libraries&lt;/a&gt; or to work with a &lt;a href="http://download.freebase.com/datadumps/" title="Download TSV data dumps etc from Freebase"&gt;data dump&lt;/a&gt;, if you don't have time to familiarise yourself with &lt;acronym title="Metaweb Query Language"&gt;MQL&lt;/acronym&gt;. &lt;br /&gt;&lt;br /&gt;I've been playing with the data dumps and &lt;acronym title="Metaweb Query Language"&gt;MQL&lt;/acronym&gt; and I'm finding that it makes sense to work with &lt;acronym title="Metaweb Query Language"&gt;MQL&lt;/acronym&gt;, because it models the data nicely, though I do find myself missing the functionality of &lt;a href="http://lucene.apache.org/" title="Lucene main page"&gt;Lucene&lt;/a&gt; for querying, especially &lt;a href="http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/FuzzyQuery.html" title="Fuzzy Query Java class"&gt;fuzzy&lt;/a&gt; matching and full text search. 
But that's another story.&lt;br /&gt;&lt;br /&gt;It is best to start playing with &lt;acronym title="Metaweb Query Language"&gt;MQL&lt;/acronym&gt; using the &lt;a href="http://www.freebase.com/app/queryeditor" title="Play with the Query Editor"&gt;MQL Query Editor&lt;/a&gt; application, which is a superb bit of JavaScript, allowing you to construct queries using your tab key to explore the database schema. This gets you up to speed with constructing empty maps with &lt;code&gt;{}&lt;/code&gt;, empty arrays with &lt;code&gt;[]&lt;/code&gt; and empty values with &lt;code&gt;null&lt;/code&gt; in your &lt;acronym title="JavaScript Object Notation"&gt;JSON&lt;/acronym&gt; query, using key/value pairs for the selection criteria.&lt;br /&gt;&lt;br /&gt;Initially, I assumed that the fill-the-blanks &lt;acronym title="JavaScript Object Notation"&gt;JSON&lt;/acronym&gt; approach would restrict &lt;acronym title="Metaweb Query Language"&gt;MQL&lt;/acronym&gt; to exact matches, but &lt;a href="http://mql.freebaseapps.com/ch03.html#matchsyntax" title="MQL Pattern Matching Syntax"&gt;special characters in the keys&lt;/a&gt; allow you to achieve range queries and regular expression matching. &lt;acronym title="Metaweb Query Language"&gt;MQL&lt;/acronym&gt; is very powerful.&lt;br /&gt;&lt;br /&gt;So buoyed up by my experimentation, I needed to bite the bullet with implementation in a project. My project has a lot of client-side JavaScript in it, using the &lt;a href="http://www.prototypejs.org/" title="JavaScript framework for Object Orientated class-style DHTML, AJAX etc"&gt;prototype.js&lt;/a&gt; and &lt;a href="http://script.aculo.us/" title="GUI framework built on top of prototype.js"&gt;script.aculo.us&lt;/a&gt; frameworks. 
I haven't made the move to &lt;a href="http://jquery.com/" title="The JavaScript framework that seems to be gaining the most popularity"&gt;jQuery&lt;/a&gt;, because my existing combination works very nicely and I do not like to change things just for the sake of it. &lt;a href="http://www.mjtemplate.org/doc/freebaseapi.html" title="Language libraries listed at Freebase"&gt;Neon signs&lt;/a&gt; in Freebase point you towards using &lt;a href="http://www.mjtemplate.org/doc/freebaseapi.html" title="A Javascript application framework that includes wrappers for Freebase API services as well as other useful pieces for building browser-side Freebase applications"&gt;Mjt&lt;/a&gt; (pronounced &lt;i&gt;midget&lt;/i&gt;), which gets you into &lt;a href="http://www.mjtemplate.org/examples/" title="Mjt template examples"&gt;elegant JavaScript templating&lt;/a&gt;, but I have a nasty feeling that mixing this with prototype.js is going to lead to debugging headaches. Also, &lt;a href="http://basecamphq.com/" title="A collaboration system with Textile mark up language support"&gt;Basecamp&lt;/a&gt;'s &lt;i&gt;polite milestone&lt;/i&gt; reminders keep telling me that my &lt;a href="http://www.quotationspage.com/quote/723.html" title="Douglas Adams RIP"&gt;deadline&lt;/a&gt; is overdue.&lt;br /&gt;&lt;br /&gt;So I needed to get &lt;acronym title="Metaweb Query Language"&gt;MQL&lt;/acronym&gt; working with my JavaScript toolset, and leave Mjt for a rainy day, and that means getting a &lt;acronym title="JSON with padding or prefix"&gt;JSONP&lt;/acronym&gt; implementation working with prototype.js, which unlike &lt;a href="http://docs.jquery.com/Ajax/jQuery.getJSON" title="JQuery JSON-P support"&gt;jQuery&lt;/a&gt;, doesn't appear to support &lt;acronym title="JSON with padding or prefix"&gt;JSONP&lt;/acronym&gt; out of the box.&lt;br /&gt;&lt;br /&gt;With &lt;a href="http://en.wikipedia.org/wiki/JSON#JSONP" title="JSON with Padding"&gt;JSONP&lt;/a&gt;, a calling 
parameter specifies the name of the callback function. The callback function is invoked with a JavaScript object, expressed as &lt;acronym title="JavaScript Object Notation"&gt;JSON&lt;/acronym&gt;, by a script element that is added dynamically to the document's HEAD element. The query is &lt;acronym title="JavaScript Object Notation"&gt;JSON&lt;/acronym&gt; encoded into a single line as a query string parameter.&lt;br /&gt;&lt;br /&gt;I've implemented &lt;a href="http://bimport.blogspot.com/2009/11/json-for-cross-domain-scripting-with.html" title="My JSONP-like handling for XML querying and responses for a Lucene query service"&gt;something in the same vein as JSONP&lt;/a&gt; before. So I wanted to implement something similar as a &lt;a href="http://www.prototypejs.org/learn/class-inheritance" title="Class inheritance in prototype.js"&gt;prototype.js Class&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Here's what I wound up with for a base class for JSONP:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;// Base class for JSONP - odd name, because of its similarity&lt;br /&gt;// to my previous Uxml class.&lt;br /&gt;var Ujson = Class.create({&lt;br /&gt;&lt;br /&gt; // Ctor&lt;br /&gt; initialize: function(url, queryObject) {&lt;br /&gt;  this.response = null;&lt;br /&gt;  this.transport = new Element('script',{&lt;br /&gt;   type: 'text/javascript',&lt;br /&gt;   src: url+(typeof queryObject === 'object' ?&lt;br /&gt;    Object.toJSON(queryObject) : queryObject),&lt;br /&gt;  });&lt;br /&gt;  this.head.appendChild(this.transport);&lt;br /&gt; },&lt;br /&gt;&lt;br /&gt; head: document.getElementsByTagName('head')[0],&lt;br /&gt;&lt;br /&gt; // Handle the response - expect to override this&lt;br /&gt; handleResponse: function() {&lt;br /&gt;  alert('Object response is: ' + &lt;br /&gt;   Object.toJSON(this.response));&lt;br /&gt; },&lt;br /&gt;&lt;br /&gt; callback: function(obj) {&lt;br /&gt;&lt;br /&gt;  // It is already a bona fide object, passed by JSON.&lt;br /&gt;  // No need to 
parse it.&lt;br /&gt;  this.response = obj;&lt;br /&gt;&lt;br /&gt;  if (this.transport &amp;amp;&amp;amp; Object.isElement(this.transport)) {&lt;br /&gt;   var transport = this.transport;&lt;br /&gt;   setTimeout(function() {&lt;br /&gt;    if (transport) transport.remove();&lt;br /&gt;   }, 20);&lt;br /&gt;  }&lt;br /&gt;&lt;br /&gt;  // Handle the response with an override&lt;br /&gt;  this.handleResponse();&lt;br /&gt; },&lt;br /&gt;});&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;You can see that the &lt;code&gt;script src=""&lt;/code&gt; attribute gets the concatenation of url plus the stringified JavaScript object, so the url parameter is for Freebase &lt;b&gt;http://www.freebase.com/api/service/mqlread?callback=freebase_callback&amp;amp;query=&lt;/b&gt;, where freebase_callback is the name of the callback function and the stringified JS object is passed to the query parameter. There is probably a prettier way of doing this in the base class, but this looks OK to me.&lt;br /&gt;&lt;br /&gt;I didn't initially use prototype.js's &lt;code&gt;Object.toJSON(queryObject)&lt;/code&gt; to stringify the JS object, preferring instead the more general &lt;a href="http://www.json.org/js.html" title="JSON in JavaScript"&gt;JSON.stringify&lt;/a&gt;:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;// This technique appears to get clobbered by prototype.js&lt;br /&gt;JSON.stringify(queryObject, function(key, value) { // Replacer&lt;br /&gt; if (typeof value === 'number' &amp;amp;&amp;amp; !isFinite(value)) {&lt;br /&gt;  return String(value);&lt;br /&gt; }&lt;br /&gt; return value;&lt;br /&gt;})&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;However, I found that I was getting quotes URI entity-encoded with a spurious leading '\' escape, and my &lt;acronym title="Metaweb Query Language"&gt;MQL&lt;/acronym&gt; keys were made illegible. Presumably it converted it into a string and then URI entity-encoded the string, complete with the '\' escape characters for double quotes. 
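&lt;br /&gt;&lt;br /&gt;One plausible mechanism for this, offered as an assumption rather than something confirmed against prototype.js itself, is a library-installed &lt;code&gt;Array.prototype.toJSON&lt;/code&gt; that returns an already-serialised string, which &lt;code&gt;JSON.stringify&lt;/code&gt; then quotes and escapes a second time. The sketch below simulates that with a hypothetical helper, &lt;code&gt;stringifyWithLegacyArrayToJSON&lt;/code&gt;, and does not load prototype.js at all:

```javascript
// Sketch only. Assumption: a library has installed Array.prototype.toJSON
// that returns an already-serialised string. Nothing here comes from
// prototype.js; the helper name is hypothetical.
function stringifyWithLegacyArrayToJSON(value) {
  var hadOwn = Object.prototype.hasOwnProperty.call(Array.prototype, 'toJSON');
  var saved = Array.prototype.toJSON;

  // Simulated legacy behaviour: hand back a string, not a plain value.
  Array.prototype.toJSON = function () {
    var parts = this.map(function (item) {
      return JSON.stringify(item);
    });
    return '[' + parts.join(', ') + ']';
  };

  try {
    // JSON.stringify calls toJSON on the array, receives a string,
    // and serialises that string again, escaping its double quotes.
    return JSON.stringify(value);
  } finally {
    // Put Array.prototype back the way it was found.
    if (hadOwn) {
      Array.prototype.toJSON = saved;
    } else {
      delete Array.prototype.toJSON;
    }
  }
}

var query = { query: [{ type: '/film/film', name: null }] };
var clean = JSON.stringify(query);
var clobbered = stringifyWithLegacyArrayToJSON(query);

console.log(clean);
console.log(clobbered);
```

The second line printed carries the backslash-escaped quotes around what should have been an array; with the extra &lt;code&gt;toJSON&lt;/code&gt; removed, the output returns to the first form.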
&lt;br /&gt;&lt;br /&gt;I &lt;i&gt;believe&lt;/i&gt; that this was a conflict between prototype.js and the JSON object, but the correct solution was to use prototype.js's static method &lt;a href="http://www.prototypejs.org/learn/json" title="How prototype.js stringifies"&gt;Object.toJSON(obj)&lt;/a&gt;. It took me a long time to figure that out, because of the distractions of modern life, paucity of debugging in JavaScript and a good dollop of stupidity on my part!&lt;br /&gt;&lt;br /&gt;So here's an inherited class for Freebase, getting a list of stars in movies and images:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;var UjsonFreebase = Class.create(Ujson, {&lt;br /&gt;&lt;br /&gt; // Ctor&lt;br /&gt; initialize: function($super, name) {&lt;br /&gt;  $super(&lt;br /&gt;   'http://www.freebase.com/api/service/mqlread?callback=freebase_callback&amp;amp;query=',&lt;br /&gt;   {"query":&lt;br /&gt;// Query built at http://www.freebase.com/app/queryeditor/ - using the tab key :-)&lt;br /&gt;[{&lt;br /&gt;  "type": "/film/film",&lt;br /&gt;  "name": name, // Here's the movie name variable&lt;br /&gt;  "guid": null,&lt;br /&gt;  "id":   null,&lt;br /&gt;  "starring": [{&lt;br /&gt;    "actor":     null,&lt;br /&gt;    "character": null&lt;br /&gt;  }],&lt;br /&gt;  "/common/topic/image": [{&lt;br /&gt;    "id":       null,&lt;br /&gt;    "optional": true,&lt;br /&gt;    "limit":    3&lt;br /&gt;  }],&lt;br /&gt;  "genre": [{&lt;br /&gt;    "name": null&lt;br /&gt;  }],&lt;br /&gt;  "initial_release_date" : null&lt;br /&gt;}]&lt;br /&gt;   }&lt;br /&gt;  );&lt;br /&gt; },&lt;br /&gt;&lt;br /&gt; handleResponse: function() {&lt;br /&gt;  // Response object passed to a dialogue&lt;br /&gt;  // which does application-specific stuff &lt;br /&gt;  // with the data&lt;br /&gt;  new MyDialog(this.response);&lt;br /&gt; },&lt;br /&gt;});&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;For this to work, I needed a function called &lt;code&gt;freebase_callback&lt;/code&gt; in the global namespace. 
I tried using an object, but the web service didn't seem to like to use that as the &lt;acronym title="The callback function seems to be referred to as prefix by some, giving the P in JSONP. Wikipedia calls the P padding so perhaps this is a modern myth."&gt;prefix&lt;/acronym&gt; for the JSONP callback. &lt;br /&gt;&lt;br /&gt;So, assuming the &lt;code&gt;UjsonFreebase&lt;/code&gt; object was created as &lt;code&gt;window.freebase = new UjsonFreebase("Gone With The Wind")&lt;/code&gt;, we need a global &lt;code&gt;freebase_callback&lt;/code&gt; function as follows:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;function freebase_callback(obj) {&lt;br /&gt; if (window.freebase)&lt;br /&gt;  window.freebase.callback(obj);&lt;br /&gt; else&lt;br /&gt;  alert("No freebase object");&lt;br /&gt;}&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;It works.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/212269981320258846-7889784210353983806?l=bimport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bimport.blogspot.com/feeds/7889784210353983806/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bimport.blogspot.com/2009/11/querying-freebase.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/7889784210353983806'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/7889784210353983806'/><link rel='alternate' type='text/html' href='http://bimport.blogspot.com/2009/11/querying-freebase.html' title='Querying Freebase'/><author><name>Rob Staveley (Tom)</name><uri>http://www.blogger.com/profile/12685264964545576535</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' 
src='http://2.bp.blogspot.com/_mmq4zdFTiOI/Sp-aftOmhZI/AAAAAAAAPJM/GSiGOc7pRsc/S220/Look-8.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-212269981320258846.post-4516003968438994397</id><published>2009-11-12T07:56:00.000-08:00</published><updated>2009-11-18T03:00:01.280-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='javascript'/><category scheme='http://www.blogger.com/atom/ns#' term='lucene'/><category scheme='http://www.blogger.com/atom/ns#' term='jsp'/><category scheme='http://www.blogger.com/atom/ns#' term='xml'/><category scheme='http://www.blogger.com/atom/ns#' term='json'/><category scheme='http://www.blogger.com/atom/ns#' term='jsonp'/><title type='text'>JSON for cross-domain scripting with XML</title><content type='html'>There is &lt;a href="http://www.theurer.cc/blog/2005/12/15/web-services-json-dump-your-proxy/"&gt;a hack to allow cross domain scripting&lt;/a&gt; using a dynamic JavaScript file, which typically returns data as &lt;acronym title="JavaScript Object Notation"&gt;JSON&lt;/acronym&gt;. The idea is that you use JavaScript to generate a SCRIPT tag dynamically, which invokes a server-side script that generates JavaScript in response to the query string which depends on JavaScript variables in the client. It makes sense to have this server-side script return a JavaScript associative array, which is described by JSON, but it could be a bunch of variable assignments, if that was easier to work with.&lt;br /&gt;&lt;br /&gt;I hate the fact that this is necessary, but can understand why. This approach is more efficient than using a proxy for cross-domain scripting, because it avoids the extra network hop. What I don't like about the approach, though, is that it tempts you to put client-side business logic into the web services.&lt;br /&gt;&lt;br /&gt;I have a web service for querying &lt;a href="http://lucene.apache.org/" title="Apache Lucene"&gt;Lucene&lt;/a&gt; indexes. 
Queries are very expressive, encapsulating Lucene's query objects, and XML allows that expression nicely. Results are also expressive. &lt;br /&gt;&lt;br /&gt;It is tempting to move the logic responsible for setting up the query into my web service and add JSPs for each client requirement, but that means making my web service know more about its clients than I want and delocalising my concerns. I don't like that.&lt;br /&gt;&lt;br /&gt;So how about this approach? I generate the query XML and &lt;code&gt;encodeURIComponent(xml)&lt;/code&gt; it to make it a parameter passed to my JSP. My JSP, which is local to the web service, simply URIDecodes the query parameter and passes the query on to my usual web service. This I invoke in much the same way as JSON, via a dynamic SCRIPT tag from the client. My JSP outputs one line: &lt;code&gt;xml = decodeURIComponent('XXX');&lt;/code&gt;, where the 'XXX' is a URIEncoded XML response from my web service.&lt;br /&gt;&lt;br /&gt;It should work, provided my query string &lt;a href="http://classicasp.aspfaq.com/forms/what-is-the-limit-on-querystring/get/url-parameters.html"&gt;doesn't get too long&lt;/a&gt;. There ought to be a way to avoid the extra server-side processing, but this is the best I can come up with for now.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;It works&lt;/h3&gt;&lt;br /&gt;My servlet decodes the query string and gets XML:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;// Read the query from the query string&lt;br /&gt; String xmlQuery = java.net.URLDecoder.decode(&lt;br /&gt;  // This is the XML document encoded by http://www.w3schools.com/jsref/jsref_encodeuricomponent.asp&lt;br /&gt;  request.getQueryString() == null ? 
"" : request.getQueryString()&lt;br /&gt;  ,"UTF-8"&lt;br /&gt; );&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;It returns the XML in URLEncoded form to a JavaScript object in the calling code:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;String xmlResponseURI = java.net.URLEncoder.encode(xmlResponse, "ISO-8859-1");&lt;br /&gt; ServletOutputStream outStream = response.getOutputStream();&lt;br /&gt; outStream.println("if (window.uxml) window.uxml.callback('"+xmlResponseURI.replace('+', ' ')+"');");&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;The JavaScript object needs to have a callback function:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;// prototype.js example&lt;br /&gt;&lt;br /&gt;var Uxml = Class.create({&lt;br /&gt;&lt;br /&gt; // Ctor&lt;br /&gt; initialize: function(query) {&lt;br /&gt;  this.response = null;&lt;br /&gt;  this.transport = new Element('script',{&lt;br /&gt;   type: 'text/javascript',&lt;br /&gt;   src: 'http://example.com/servlet?'+encodeURIComponent(query),&lt;br /&gt;  });&lt;br /&gt;  this.head.appendChild(this.transport);&lt;br /&gt; },&lt;br /&gt;&lt;br /&gt; head: document.getElementsByTagName('head')[0],&lt;br /&gt;&lt;br /&gt; // Handle the response - expect to override this&lt;br /&gt; handleResponse: function() {&lt;br /&gt;  alert(this.response);&lt;br /&gt; },&lt;br /&gt;&lt;br /&gt; callback: function(encoded_response) {&lt;br /&gt;  this.response = decodeURIComponent(encoded_response);&lt;br /&gt;  if (this.transport &amp;amp;&amp;amp; Object.isElement(this.transport)) {&lt;br /&gt;   var transport = this.transport;&lt;br /&gt;   setTimeout(function() {transport.remove();}, 20);&lt;br /&gt;  }&lt;br /&gt;&lt;br /&gt;  // Handle the response with an override&lt;br /&gt;  this.handleResponse();&lt;br /&gt; },&lt;br /&gt;});&lt;br /&gt;&lt;br /&gt;// Object instance must be called uxml for the callback invocation&lt;br /&gt;window.uxml = new Uxml(xmlQuery);&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;The use of the object is similar to the use of a callback function 
in &lt;acronym title="JSON with padding or prefix"&gt;JSONP&lt;/acronym&gt;, as discussed in &lt;a href="http://en.wikipedia.org/wiki/JSON#JSONP" title="JSON with Padding"&gt;Wikipedia&lt;/a&gt;, in which a calling parameter specifies the name of the callback function. The callback function is invoked with a JavaScript object, using &lt;acronym title="JavaScript Object Notation"&gt;JSON&lt;/acronym&gt;. If you look at my subsequent &lt;a href="http://bimport.blogspot.com/2009/11/querying-freebase.html" title="Querying Freebase"&gt;Freebase post&lt;/a&gt;, you'll see a &lt;em&gt;proper&lt;/em&gt; JSONP implementation with prototype.js.&lt;br /&gt;&lt;br /&gt;Incidentally, since implementing the XML solution, I found that for the sake of &lt;a href="http://en.wikipedia.org/wiki/Internationalization_and_localization" title="Internationalisation"&gt;i18n&lt;/a&gt;, it made better sense to use UTF-8 encoding in the XML response and therefore the URI entity encoding.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;String xmlResponseURI = java.net.URLEncoder.encode(xmlResponse, "UTF-8");&lt;br /&gt; ServletOutputStream outStream = response.getOutputStream();&lt;br /&gt; outStream.println("if (window.uxml) window.uxml.callback('"+xmlResponseURI.replace('+', ' ')+"');");&lt;br /&gt;&lt;/pre&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/212269981320258846-4516003968438994397?l=bimport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bimport.blogspot.com/feeds/4516003968438994397/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bimport.blogspot.com/2009/11/json-for-cross-domain-scripting-with.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/4516003968438994397'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/212269981320258846/posts/default/4516003968438994397'/><link rel='alternate' type='text/html' href='http://bimport.blogspot.com/2009/11/json-for-cross-domain-scripting-with.html' title='JSON for cross-domain scripting with XML'/><author><name>Rob Staveley (Tom)</name><uri>http://www.blogger.com/profile/12685264964545576535</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://2.bp.blogspot.com/_mmq4zdFTiOI/Sp-aftOmhZI/AAAAAAAAPJM/GSiGOc7pRsc/S220/Look-8.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-212269981320258846.post-2649417309140793522</id><published>2009-11-10T02:14:00.000-08:00</published><updated>2009-11-27T05:37:11.902-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='openid'/><category scheme='http://www.blogger.com/atom/ns#' term='ssl'/><category scheme='http://www.blogger.com/atom/ns#' term='shibboleth'/><category scheme='http://www.blogger.com/atom/ns#' term='google'/><category scheme='http://www.blogger.com/atom/ns#' term='verisign'/><title type='text'>OpenID Identity Providers</title><content type='html'>Many moons ago I did some work on a system that accepted federated login from &lt;a href="http://shibboleth.internet2.edu/"&gt;Shibboleth&lt;/a&gt; Identity Providers. Setting up that system as a Service Provider for the Shibboleth trust network was quite painful. It involved a bit of dependency hell, partly because the system administrator had a preference for &lt;a href="http://www.slackware.com/"&gt;Slackware&lt;/a&gt;. Shibboleth's approach for networks of trust is excellent for institutions, which need to trust institutions, but I've been finding increasingly that federated login for individuals is what's needed for my projects. Rather than having a network of trust, I have a list of trusted individuals and some knowledge of identity providers. 
My facility needs to trust the individual and the identity provider used by that individual.&lt;br /&gt;&lt;br /&gt;The whole point of using &lt;a href="http://en.wikipedia.org/wiki/OpenID"&gt;OpenID&lt;/a&gt; for me is to avoid administering passwords, so I don't want to host my own identity provider. My individuals can manage their own identity providers. So what providers should I be looking at? A provider already used by the individual is a good starting point. You probably already have an OpenID if you use any of a list of &lt;a href="http://openid.net/get-an-openid/"&gt;familiar web services&lt;/a&gt;. Well no: those providers may be good enough for the individual, but not necessarily the application that I want to use.&lt;br /&gt;&lt;br /&gt;I've been playing with my neglected &lt;a href="http://www.plaxo.com/"&gt;Plaxo&lt;/a&gt; account, which accepts OpenIDs for federated login. I wanted to see what it feels like using the different IDs.&lt;br /&gt;&lt;ol&gt;&lt;li&gt;My starting point was &lt;a href="https://pip.verisignlabs.com/"&gt;VeriSign's Personal Identity Portal (PIP)&lt;/a&gt;, which I've been using for services like the &lt;a href="http://basecamphq.com/"&gt;Basecamp&lt;/a&gt; collaboration platform, which is favoured by a US client. Putting this into Plaxo reminded me about the irritation of using VeriSign PIP in Basecamp after closing the browser: you get sent to a page, with no link, asking you to log into PIP first. This prevents &lt;a href="http://en.wikipedia.org/wiki/Phishing" title="Definition of Phishing"&gt;phishing&lt;/a&gt; and is a feature VeriSign call &lt;em&gt;OpenID Sign In Security&lt;/em&gt;. The lack of a link is annoying, but VeriSign appears to be secure - provided you don't leave your browser open and computer unattended in an insecure place. 
I bolstered my personal security by installing a certificate on my PC for 2-way SSL during authentication, but who's to know if other PIP identities are protected by &lt;b&gt;Strong Authentication&lt;/b&gt;? PIP has a &lt;a href="http://openid.net/specs/openid-provider-authentication-policy-extension-1_0.html#auth_policies"&gt;phishing-resistant authentication policy&lt;/a&gt; or Provider Authentication Policy Extension (PAPE), which has to be a good thing. The VeriSign PIP OpenID URL is natural-looking &lt;b&gt;https://&lt;i&gt;username&lt;/i&gt;.pip.verisignlabs.com/&lt;/b&gt;, which I like. What I'm not sure about is how PIP deals with browser certificate expiry etc.; there is nothing about this in the FAQ.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;The next obvious choice for me was &lt;a href="https://www.google.com/accounts/"&gt;Google Accounts&lt;/a&gt;. The Google Account OpenID URL is difficult to find out and then pretty complicated-looking when you do find it. You can't find it out from your account pages at Google; you have to get it by &lt;a href="http://code.google.com/apis/accounts/docs/OpenID.html"&gt;OAuth&lt;/a&gt;. Rather than implementing this myself, I used Plaxo to identify it and then read the URL from the OpenID list in Plaxo. It looks like this: &lt;b&gt;https://www.google.com/accounts/o8/id?id=&lt;i&gt;XXXXXXXXXXXX&lt;/i&gt;&lt;/b&gt; and is definitely a cut, copy and paste job with a jumble of letters at the end. Google don't seem to subscribe to the idea of making their OpenID entirely &lt;i&gt;open&lt;/i&gt;. So perhaps, we shouldn't be publicising it? I'm wary about security through obfuscation. 
I would like to see 2-way SSL working with Google Accounts and I'd like to see evidence of anti-phishing policy extensions.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;I had a go with &lt;a href="https://www.myopenid.com/"&gt;myOpenID&lt;/a&gt;,&amp;nbsp; which is also a free identity provider, giving you an OpenID URL of the form &lt;b&gt;https://&lt;i&gt;username&lt;/i&gt;.myopenid.com/&lt;/b&gt;. There is a nice way to manage personas with myOpenID. However, I found the SSL certificate installation didn't work for me, so I can't vouch for its security. Also it doesn't report any &lt;a href="http://openid.net/specs/openid-provider-authentication-policy-extension-1_0-01.html"&gt;policy extensions&lt;/a&gt; like anti-phishing, which is a disappointment. VeriSign PIP seems like a better option.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;My other experiments really came back to Google Accounts again. Being the owner of &lt;b&gt;http://bimport.blogspot.com&lt;/b&gt;, I can use that same URL as an OpenID URL and use my Google Account to authenticate. Likewise, some bright spark has put a wrapper/manager onto Google Accounts and has therefore handled OAuth and you can use a natural looking but 2-week maximum OpenID URL &lt;b&gt;http://openid-provider.appspot.com/&lt;i&gt;emailaddress&lt;/i&gt;&lt;/b&gt; generated by the &lt;a href="http://openid-provider.appspot.com/"&gt;OpenID Provider application at appspot&lt;/a&gt;. If anything, these providers must lessen the Google Accounts security and can only really be seen as a low security convenience.&lt;br /&gt;&lt;/li&gt;&lt;/ol&gt;OpenID identity providers are not born equal and I need to get more familiar with VeriSign PIP's browser certificate handling before I get too dependent on it. 
I use a &lt;a href="http://www.verisign.com/authentication/individual-authentication/digital-id/index.html"&gt;VeriSign digital ID&lt;/a&gt; for e-mail signing and, seldom, encryption, for which I pay $20 per annum in renewals; I was disappointed not to be able to use this for OpenID login. Initially, I didn't want to go the whole hog and carry around &lt;a href="https://idprotect.verisign.com/learnmore.v"&gt;VIP hardware&lt;/a&gt; for authentication, but it makes sense if you live in more than one browser and access security-sensitive applications.&lt;br /&gt;&lt;br /&gt;Implementing a PHP site that uses OpenID federated authentication is a snap using the &lt;a href="http://openidenabled.com/php-openid/"&gt;OpenID library for PHP&lt;/a&gt;. I've yet to bite the bullet with Java, but I expect it is a similar deal.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/212269981320258846-2649417309140793522?l=bimport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bimport.blogspot.com/feeds/2649417309140793522/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bimport.blogspot.com/2009/11/openid-identity-providers.html#comment-form' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/2649417309140793522'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/2649417309140793522'/><link rel='alternate' type='text/html' href='http://bimport.blogspot.com/2009/11/openid-identity-providers.html' title='OpenID Identity Providers'/><author><name>Rob Staveley (Tom)</name><uri>http://www.blogger.com/profile/12685264964545576535</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' 
src='http://2.bp.blogspot.com/_mmq4zdFTiOI/Sp-aftOmhZI/AAAAAAAAPJM/GSiGOc7pRsc/S220/Look-8.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-212269981320258846.post-2598092953218070453</id><published>2009-10-31T09:42:00.000-07:00</published><updated>2009-10-31T09:42:35.020-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='seo'/><category scheme='http://www.blogger.com/atom/ns#' term='css'/><title type='text'>Hidden DIVs do trick Google</title><content type='html'>I thought Google was going to great efforts not to index content on hidden DIVs. Indeed, I was told that you are liable to be put into Purgatory by Google if hidden DIVs appeared in your page.&lt;br /&gt;&lt;br /&gt;The thinking was presumably that purveyors of adult entertainment, who now get irrelevant keywords via (say) &lt;code&gt;&amp;lt;meta content="meaning, life" name="keywords"/&amp;gt;&lt;/code&gt; ignored, should not start doing things like this:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;lt;div style="visibility: hidden"&amp;gt;&lt;br /&gt;Meaning of life &lt;br /&gt;&amp;lt;/div&amp;gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;...and thus mislead innocent searchers.&lt;br /&gt;&lt;br /&gt;So I was surprised to find a client whose competitor was putting my client's name into a hidden DIV and benefiting from searches on that name. The trick was simply to use an external CSS rather than inlining the style.&lt;br /&gt;&lt;br /&gt;This is what a C++ person would call depending on &lt;i&gt;undefined behaviour&lt;/i&gt;. I wonder if Google will one day take that competitor to task, or if my expectations are unrealistic? 
S-E-oh-ho-ho-ho.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/212269981320258846-2598092953218070453?l=bimport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bimport.blogspot.com/feeds/2598092953218070453/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bimport.blogspot.com/2009/10/hidden-divs-do-trick-google.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/2598092953218070453'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/2598092953218070453'/><link rel='alternate' type='text/html' href='http://bimport.blogspot.com/2009/10/hidden-divs-do-trick-google.html' title='Hidden DIVs do trick Google'/><author><name>Rob Staveley (Tom)</name><uri>http://www.blogger.com/profile/12685264964545576535</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://2.bp.blogspot.com/_mmq4zdFTiOI/Sp-aftOmhZI/AAAAAAAAPJM/GSiGOc7pRsc/S220/Look-8.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-212269981320258846.post-7246292214054038590</id><published>2009-10-20T11:44:00.000-07:00</published><updated>2009-10-20T11:44:02.355-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='windows 7'/><category scheme='http://www.blogger.com/atom/ns#' term='longhorn'/><category scheme='http://www.blogger.com/atom/ns#' term='tse'/><title type='text'>Windows  Server Core desktop equals MS-DOS</title><content type='html'>I think I've found a &lt;a href="http://en.wikipedia.org/wiki/File:Windows_2008_Server_Core.png"&gt;desktop&lt;/a&gt; I'm comfortable with at last. 
I thought I was going to wait for Sam Mitchell to iron out the bugs in &lt;a href="http://linux.semware.com/"&gt;TSE for Linux&lt;/a&gt; and get comfortable with runlevel 3 with TSE, but perhaps I should hang in with &lt;a href="http://www.semware.com/"&gt;TSE 4.40a&lt;/a&gt; and cmd.exe on the default Server Core UI for Longhorn.&lt;br /&gt;&lt;br /&gt;Gnome doesn't float my boat. KDE has never really felt right. OSX brings out the inverted snob in me. Perhaps I never really grew out of MS-DOS and QEdit. I'm using Vista most of the time now but with a cmd.exe box open most of the time and TSE with everything that's important apart from drafting invoices, which doesn't seem right without the latest incarnation of MS Word.&lt;br /&gt;&lt;br /&gt;I'll probably sign up for Windows 7 next week, for the sake of the invoices, but I'd be better off with Longhorn and TSE. I'm a &lt;a href="http://en.wikipedia.org/wiki/Luddite"&gt;Luddite&lt;/a&gt;, aren't I?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/212269981320258846-7246292214054038590?l=bimport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bimport.blogspot.com/feeds/7246292214054038590/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bimport.blogspot.com/2009/10/windows-server-core-desktop-equals-ms.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/7246292214054038590'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/7246292214054038590'/><link rel='alternate' type='text/html' href='http://bimport.blogspot.com/2009/10/windows-server-core-desktop-equals-ms.html' title='Windows  Server Core desktop equals MS-DOS'/><author><name>Rob Staveley 
(Tom)</name><uri>http://www.blogger.com/profile/12685264964545576535</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://2.bp.blogspot.com/_mmq4zdFTiOI/Sp-aftOmhZI/AAAAAAAAPJM/GSiGOc7pRsc/S220/Look-8.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-212269981320258846.post-4267016415355863225</id><published>2009-10-19T10:20:00.000-07:00</published><updated>2009-10-19T10:26:25.605-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mt'/><category scheme='http://www.blogger.com/atom/ns#' term='schwartz'/><category scheme='http://www.blogger.com/atom/ns#' term='static publishing'/><category scheme='http://www.blogger.com/atom/ns#' term='fastcgi'/><title type='text'>If something is timing out in a CGI script, something is wrong</title><content type='html'>A colleague has been setting up FastCGI in MovableType to improve performance in a production site and there have been a number of problems to overcome.&lt;br /&gt;&lt;br /&gt;The first one was related to leaving the FastCGI script idle for a long period of time. I've discussed that problem &lt;a href="http://bimport.blogspot.com/2009/10/mysql-server-has-gone-away.html" title="MySQL has gone away"&gt;here&lt;/a&gt; already.&lt;br /&gt;&lt;br /&gt;The problem we were left with related to publishing relatively complicated index templates in a blog. We had FastCGI set up thus:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;FastCgiIpcDir /tmp/fcgi_ipc/&lt;br /&gt;AddHandler fastcgi-script .fcgi&lt;br /&gt;FastCGIConfig -singleThreshold 4 -idle-timeout 30 -killInterval 300 -maxProcesses 50 -appConnTimeout 0&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;We were finding that publishing the index files was taking longer than the 30 seconds allowed by the &lt;code&gt;idle-timeout&lt;/code&gt; parameter, and our FastCGI script was being unceremoniously killed before it had finished its job. 
&lt;br /&gt;&lt;br /&gt;If you're wondering, the &lt;code&gt;singleThreshold&lt;/code&gt; parameter handles the MySQL "gone away problem" I reported &lt;a href="http://bimport.blogspot.com/2009/10/mysql-server-has-gone-away.html" title="MySQL has gone away"&gt;previously&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;So, increase &lt;code&gt;idle-timeout&lt;/code&gt;, duh? &lt;br /&gt;&lt;br /&gt;Well no. 30 seconds is a reasonable timeout for an HTTP request; if a request takes longer than that, there is something wrong. Increase this timeout and you'll move the problem elsewhere; it could be a proxy server returning an HTTP/408 or the browser reporting an error or the user's patience snapping (hitting the F5 refresh in anger).&lt;br /&gt;&lt;br /&gt;In this case the problem was that MT by default uses &lt;i&gt;static publishing&lt;/i&gt;, which is synonymous with &lt;i&gt;manual publishing&lt;/i&gt; (I &lt;a href="http://www.movabletype.org/documentation/administrator/managing-blogs/settings/publishing-settings.html" title="What MT says about publish settings"&gt;think&lt;/a&gt;). With &lt;i&gt;static publishing&lt;/i&gt;, the publishing process is part of the CGI request and the CGI request returns HTML to the browser when the static content has been rendered to the files. This is convenient in that you know immediately when the publishing is complete, but it is inconvenient in that you have to wait for the publishing to complete and - indeed - that process could take more than 30 seconds.&lt;br /&gt;&lt;br /&gt;In MT 4, &lt;a href="http://www.movabletype.org/documentation/administrator/publishing/publish-queue.html" title="Publish Queue in MovableType"&gt;background publishing was introduced&lt;/a&gt;. When you click your publish button, a request goes into &lt;a href="http://search.cpan.org/%7Ebradfitz/TheSchwartz-1.07/" title="The Schwartz"&gt;TheSchwartz&lt;/a&gt; job queue to be serviced by a background process launched by cron on a Publisher. 
The Publisher runs run-periodic-tasks every couple of minutes. This way the CGI script does not have to hang around while the heavy lifting of publishing is processed.&lt;br /&gt;&lt;br /&gt;By making our complex index templates publish &lt;b&gt;via the Publish Queue&lt;/b&gt;, FastCGI manages just fine and the user experience is as it ought to be. Desktop user interfaces are very comfortable with UI threads, making windowing systems responsive. We need to do the same thing with our CGIs with or without AJAX.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/212269981320258846-4267016415355863225?l=bimport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bimport.blogspot.com/feeds/4267016415355863225/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bimport.blogspot.com/2009/10/if-something-is-timing-out-in-cgi.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/4267016415355863225'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/4267016415355863225'/><link rel='alternate' type='text/html' href='http://bimport.blogspot.com/2009/10/if-something-is-timing-out-in-cgi.html' title='If something is timing out in a CGI script, something is wrong'/><author><name>Rob Staveley (Tom)</name><uri>http://www.blogger.com/profile/12685264964545576535</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' 
src='http://2.bp.blogspot.com/_mmq4zdFTiOI/Sp-aftOmhZI/AAAAAAAAPJM/GSiGOc7pRsc/S220/Look-8.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-212269981320258846.post-851598180311688841</id><published>2009-10-12T05:16:00.000-07:00</published><updated>2009-10-12T08:44:19.714-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='shell scripting'/><title type='text'>Aliases with parameters in C-shell</title><content type='html'>A colleague was doing battle with an alias on his Mac. It was a relatively complicated alias, which needed a command line parameter.&lt;br /&gt;&lt;br /&gt;As a C-shell ignoramus, I proposed a solution using a BASH function. The Bourne Again mantra is that aliases are out and shell functions are in.&lt;br /&gt;&lt;br /&gt;Unfortunately, Mac users expect to use tcsh (a flavour of C-shell) as their shell and therefore do not have support for shell functions. I wanted to tell him to use a script, but that seemed like defeat, so I dug around with Google for a bit and came across &lt;a href="http://my.brandeis.edu/bboard/q-and-a-fetch-msg?msg_id=0001iO"&gt;this post&lt;/a&gt;. Lo and behold C-shell aliases do have support for command line parameters. The first parameter is: &lt;code&gt;\!^&lt;/code&gt;. Obvious, huh? I don't think so.&lt;br /&gt;&lt;br /&gt;I'm going to stick to BASH. 
C-shell &lt;a href="http://www.faqs.org/faqs/unix-faq/shell/csh-whynot/"&gt;looks like a can of worms&lt;/a&gt; for the uninitiated.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/212269981320258846-851598180311688841?l=bimport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bimport.blogspot.com/feeds/851598180311688841/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bimport.blogspot.com/2009/10/aliases-with-parameters-in-c-shell.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/851598180311688841'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/851598180311688841'/><link rel='alternate' type='text/html' href='http://bimport.blogspot.com/2009/10/aliases-with-parameters-in-c-shell.html' title='Aliases with parameters in C-shell'/><author><name>Rob Staveley (Tom)</name><uri>http://www.blogger.com/profile/12685264964545576535</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://2.bp.blogspot.com/_mmq4zdFTiOI/Sp-aftOmhZI/AAAAAAAAPJM/GSiGOc7pRsc/S220/Look-8.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-212269981320258846.post-4761320303229275641</id><published>2009-10-12T04:10:00.000-07:00</published><updated>2009-10-12T04:37:06.247-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='c++'/><category scheme='http://www.blogger.com/atom/ns#' term='iostreams'/><category scheme='http://www.blogger.com/atom/ns#' term='exceptions'/><title type='text'>Exceptions in IOStreams</title><content type='html'>A former colleague and good friend was using the following "fast copy" in C++:&lt;br /&gt;&lt;br 
/&gt;&lt;pre&gt;#include &amp;lt;algorithm&amp;gt;&lt;br /&gt;#include &amp;lt;fstream&amp;gt;&lt;br /&gt;#include &amp;lt;iterator&amp;gt;&lt;br /&gt;#include &amp;lt;iostream&amp;gt;&lt;br /&gt;&lt;br /&gt;int main(int argc, char* argv[]) {&lt;br /&gt;&lt;br /&gt;    std::ifstream fin(argv[1]);&lt;br /&gt;    if (!fin) &lt;br /&gt;        return 1;&lt;br /&gt;&lt;br /&gt;    std::ofstream fout(argv[2]);&lt;br /&gt;    if (!fout) &lt;br /&gt;        return 2;&lt;br /&gt;&lt;br /&gt;    std::istreambuf_iterator&amp;lt;char&amp;gt; iitr(fin), end;&lt;br /&gt;    std::ostreambuf_iterator&amp;lt;char&amp;gt; oitr(fout);&lt;br /&gt;&lt;br /&gt;    std::copy(iitr, end, oitr);&lt;br /&gt;}&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;He was using this to copy a file to a memory stick and was irked that when the stick was full, the copy loop carried on failing to copy right the way to the end of the input; it continues blithely along.&lt;br /&gt;&lt;br /&gt;It is only when you think of the likely copy algorithm implementation that you can see the problem with this approach.&lt;br /&gt;&lt;br /&gt;The implementation is likely to expand to something like the following:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;while (iitr != end)&lt;br /&gt;           *oitr++ = *iitr++;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;There is no provision for the &lt;b&gt;&lt;i&gt;exceptional circumstance&lt;/i&gt;&lt;/b&gt; that the destination iterator fails, because the loop condition simply depends on the source iterator.&lt;br /&gt;&lt;br /&gt;So, why doesn't it break out of the copy implementation loop with an &lt;b&gt;exception&lt;/b&gt;?&lt;br /&gt;&lt;br /&gt;By default IOStreams do not throw exceptions, so the first thing you need to do is to enable them if you want them. 
&lt;br /&gt;&lt;br /&gt;Here we enable exception-throwing on the whole gamut of IO states, reading from an input file:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;#include &amp;lt;iostream&amp;gt;&lt;br /&gt;#include &amp;lt;fstream&amp;gt;&lt;br /&gt;#include &amp;lt;iterator&amp;gt;&lt;br /&gt;using namespace std;&lt;br /&gt;&lt;br /&gt;int main()&lt;br /&gt;{&lt;br /&gt;&lt;br /&gt;     try {&lt;br /&gt;           ifstream file;&lt;br /&gt;           //ios_base::iostate old_flags = file.exceptions();&lt;br /&gt;           file.exceptions(ios_base::eofbit | ios_base::badbit | ios_base::failbit | ios_base::goodbit);&lt;br /&gt;           file.open("test.txt");&lt;br /&gt;           cerr &amp;lt;&amp;lt; "Getting... ";&lt;br /&gt;           istream&lt;span style="color: red;"&gt;buf&lt;/span&gt;_iterator&amp;lt;char&amp;gt; iitr(file), end;&lt;br /&gt;           while (iitr != end)&lt;br /&gt;                   cerr &amp;lt;&amp;lt; *iitr++ &amp;lt;&amp;lt; ',';&lt;br /&gt;           file.close();&lt;br /&gt;           cerr &amp;lt;&amp;lt; "\nExpect to see this, because no exception is thrown\n";&lt;br /&gt;     }&lt;br /&gt;     catch(ios_base::failure&amp;amp; exc) {&lt;br /&gt;           cerr &amp;lt;&amp;lt; '\n' &amp;lt;&amp;lt; exc.what() &amp;lt;&amp;lt; endl;&lt;br /&gt;     }&lt;br /&gt;}&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;The &lt;acronym title="End of File"&gt;EOF&lt;/acronym&gt; condition does &lt;b&gt;not&lt;/b&gt; throw, however, because we've been using the stream buffer directly. When you work directly with the stream buffers, you get fast and crude performance. The iterator, works at a lower level than the istream object, and as a consequence no exceptions get thrown when it hits EOF.&lt;br /&gt;&lt;br /&gt;You can iterate with the istream object, however, if you use the higher level &lt;b&gt;istream_iterator&amp;lt;char&amp;gt;&lt;/b&gt; instead of its more fashionable (?) 
lower level stream buffer counterpart &lt;b&gt;istreambuf_iterator&amp;lt;char&amp;gt;&lt;/b&gt;.&lt;br /&gt;&lt;br /&gt;This throws:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;#include &amp;lt;iostream&amp;gt;&lt;br /&gt;#include &amp;lt;fstream&amp;gt;&lt;br /&gt;#include &amp;lt;iterator&amp;gt;&lt;br /&gt;using namespace std;&lt;br /&gt;&lt;br /&gt;int main()&lt;br /&gt;{&lt;br /&gt;     try {&lt;br /&gt;           ifstream file;&lt;br /&gt;           //ios_base::iostate old_flags = file.exceptions();&lt;br /&gt;           file.exceptions(ios_base::eofbit | ios_base::badbit | ios_base::failbit | ios_base::goodbit);&lt;br /&gt;           file.open("test.txt");&lt;br /&gt;           cerr &amp;lt;&amp;lt; "Getting... ";&lt;br /&gt;           istream_iterator&amp;lt;char&amp;gt; iitr(file), end;&lt;br /&gt;           while (iitr != end)&lt;br /&gt;                   cerr &amp;lt;&amp;lt; *iitr++ &amp;lt;&amp;lt; ',';&lt;br /&gt;           cerr &amp;lt;&amp;lt; "\nDon't expect to see this, because an EOF exception will be thrown\n";&lt;br /&gt;           file.close();&lt;br /&gt;     }&lt;br /&gt;     catch(ios_base::failure&amp;amp; exc) {&lt;br /&gt;           cerr &amp;lt;&amp;lt; '\n' &amp;lt;&amp;lt; exc.what() &amp;lt;&amp;lt; endl;&lt;br /&gt;     }&lt;br /&gt;}&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;So, what my colleague needs to have an IO failure in the output throw is the following:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;#include &amp;lt;algorithm&amp;gt;&lt;br /&gt;#include &amp;lt;fstream&amp;gt;&lt;br /&gt;#include &amp;lt;iterator&amp;gt;&lt;br /&gt;#include &amp;lt;iostream&amp;gt;&lt;br /&gt;&lt;br /&gt;int main(int argc, char* argv[]) {&lt;br /&gt;    std::ifstream fin(argv[1]);&lt;br /&gt;    if (!fin) &lt;br /&gt;        return 1;&lt;br /&gt;&lt;br /&gt;    std::ofstream fout(argv[2]);&lt;br /&gt;    if (!fout) &lt;br /&gt;        return 2;&lt;br /&gt;&lt;br /&gt;    &lt;span style="color: green"&gt;// Throw as soon as it is unable to write to the memory 
stick&lt;/span&gt;&lt;br /&gt;    fout.exceptions(std::ios_base::badbit);&lt;br /&gt;&lt;br /&gt;    &lt;span style="color: green"&gt;// Work with stream iterators (not buffer iterators) when you want to catch exceptions.&lt;br /&gt;    // Where you won't want to catch exceptions, use the faster stream buffer iterators.&lt;/span&gt;&lt;br /&gt;    std::istream&lt;span style="color: red"&gt;buf&lt;/span&gt;_iterator&amp;lt;char&amp;gt; iitr(fin), end;&lt;br /&gt;    std::ostream_iterator&amp;lt;char&amp;gt; oitr(fout);&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;    try {&lt;br /&gt;        std::copy(iitr, end, oitr);&lt;br /&gt;    }&lt;br /&gt;    catch (std::ios_base::failure&amp;amp; exc) {&lt;br /&gt;        std::cerr &amp;lt;&amp;lt; '\n' &amp;lt;&amp;lt; exc.what() &amp;lt;&amp;lt; std::endl;&lt;br /&gt;    }&lt;br /&gt;}&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Going at a lower level than istream is not a panacea for performance.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/212269981320258846-4761320303229275641?l=bimport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bimport.blogspot.com/feeds/4761320303229275641/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bimport.blogspot.com/2009/10/exceptions-in-iostreams.html#comment-form' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/4761320303229275641'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/4761320303229275641'/><link rel='alternate' type='text/html' href='http://bimport.blogspot.com/2009/10/exceptions-in-iostreams.html' title='Exceptions in IOStreams'/><author><name>Rob Staveley 
(Tom)</name><uri>http://www.blogger.com/profile/12685264964545576535</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://2.bp.blogspot.com/_mmq4zdFTiOI/Sp-aftOmhZI/AAAAAAAAPJM/GSiGOc7pRsc/S220/Look-8.jpg'/></author><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-212269981320258846.post-634176984145630737</id><published>2009-10-09T11:04:00.000-07:00</published><updated>2009-10-09T11:04:52.448-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mt'/><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><category scheme='http://www.blogger.com/atom/ns#' term='fastcgi'/><category scheme='http://www.blogger.com/atom/ns#' term='php'/><category scheme='http://www.blogger.com/atom/ns#' term='persistent connection'/><category scheme='http://www.blogger.com/atom/ns#' term='jdbc'/><category scheme='http://www.blogger.com/atom/ns#' term='connection pool'/><title type='text'>MySQL "server has gone away"</title><content type='html'>I've been bitten by &lt;a href="http://dev.mysql.com/doc/refman/5.0/en/gone-away.html"&gt;MySQL connection has gone away&lt;/a&gt; problems before. Last time it was a &lt;acronym title="Java Database Connectivity"&gt;JDBC&lt;/acronym&gt; connection pool in a Java daemon application that maintained connections to a MySQL server and/or a persistent connection in a PHP script in the same project. 
This time it was a &lt;a href="http://www.fastcgi.com/drupal/"&gt;FastCGI&lt;/a&gt; script for &lt;a href="http://www.movabletype.org/"&gt;MovableType&lt;/a&gt; that was reported to be bombing out with an HTTP/500 server error every so often.&lt;br /&gt;&lt;br /&gt;The problem in every case was that I was working with a system that was primed for high load, but it was encountering periods of sustained inactivity beyond - incredible though it seems - the 8 hour timeout, after which MySQL rejects idle connections.&lt;br /&gt;&lt;br /&gt;The fix for a Perl script was not obvious. If you look through the &lt;a href="http://www.fastcgi.com/mod_fastcgi/docs/mod_fastcgi.html#FastCgiConfig"&gt;FastCGI reference&lt;/a&gt;, there isn't an obvious parameter to ensure that the FastCGI is killed after a period of inactivity. The trick is to get the FastCGI script to restart regardless of the fact that there has been no apparent failure. That counts out the &lt;b&gt;-restart&lt;/b&gt; parameter. It looks like we need to set &lt;b&gt;-singleThreshold&lt;/b&gt; to be something more than 0, which keeps an idle instance running ad infinitum, and less than 100, which presumably causes the FastCGI processes to be dropped immediately.&lt;br /&gt;&lt;br /&gt;If that is right it is much easier than the heart-beat fix we needed to apply to the Java application to keep the connection alive. 
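&lt;br /&gt;&lt;br /&gt;As an untested sketch of that idea, and assuming the script runs as a dynamic FastCGI application (FastCgiConfig only affects those), the Apache configuration might contain something like the following. The value 50 is an arbitrary assumption, picked only because it sits between the two extremes:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;# Hypothetical value: anything strictly between 0 and 100&lt;br /&gt;FastCgiConfig -singleThreshold 50&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;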
&lt;i&gt;To be on the safe side, we should time out before the &lt;a href="http://www.unixguide.net/network/socketfaq/4.7.shtml"&gt;SO_KEEPALIVE&lt;/a&gt; interval, which is likely to be 2 hours for a TCP socket; though we are dealing with a Unix domain socket, so that shouldn't apply to our FastCGI Perl script.&lt;/i&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/212269981320258846-634176984145630737?l=bimport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bimport.blogspot.com/feeds/634176984145630737/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bimport.blogspot.com/2009/10/mysql-server-has-gone-away.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/634176984145630737'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/634176984145630737'/><link rel='alternate' type='text/html' href='http://bimport.blogspot.com/2009/10/mysql-server-has-gone-away.html' title='MySQL &quot;server has gone away&quot;'/><author><name>Rob Staveley (Tom)</name><uri>http://www.blogger.com/profile/12685264964545576535</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://2.bp.blogspot.com/_mmq4zdFTiOI/Sp-aftOmhZI/AAAAAAAAPJM/GSiGOc7pRsc/S220/Look-8.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-212269981320258846.post-114252213015139405</id><published>2009-10-09T08:32:00.000-07:00</published><updated>2009-10-12T05:02:24.663-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='firefox'/><category scheme='http://www.blogger.com/atom/ns#' term='firewall'/><category scheme='http://www.blogger.com/atom/ns#' term='gzip'/><category 
scheme='http://www.blogger.com/atom/ns#' term='yslow'/><category scheme='http://www.blogger.com/atom/ns#' term='cloudfront'/><category scheme='http://www.blogger.com/atom/ns#' term='s3'/><category scheme='http://www.blogger.com/atom/ns#' term='cdn'/><title type='text'>YSlow compromise</title><content type='html'>Firefox has a great add-on for web developers by Yahoo called &lt;a href="http://developer.yahoo.com/yslow/"&gt;YSlow&lt;/a&gt;. By grading your web page, it makes you confront all of the bad things which, hand on heart, you know you were doing wrong. No one likes an E/F grade, do they?&lt;br /&gt;&lt;br /&gt;The trouble with YSlow is that you soon come to realise that there are real world compromises you need to make - e.g. putting your static content onto an affordable &lt;acronym title="Content Delivery Network"&gt;CDN&lt;/acronym&gt; and having it served gzip-compressed to those who can accept it.&lt;br /&gt;&lt;br /&gt;If your CDN is Amazon Cloudfront, which puts &lt;a href="http://aws.amazon.com/s3/"&gt;S3&lt;/a&gt; onto edge servers, you can do &lt;a href="http://devblog.famundo.com/articles/2007/03/02/serving-compressed-content-from-amazons-s3"&gt;clever things with individual files&lt;/a&gt; to get them delivered gzip-compressed, but you have to get very clever if you are going to handle files that need to be served to clients that do not &lt;a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.3"&gt;accept gzip encoding&lt;/a&gt;. 
Since &lt;a href="http://schroepl.net/projekte/mod_gzip/browser.htm"&gt;most modern browsers accept gzip encoding&lt;/a&gt;, I wonder if it is &lt;b&gt;really &lt;/b&gt;worth making special provision for those that don't?&lt;br /&gt;&lt;br /&gt;I'd find it hard to argue the case for going down that loathsome site &lt;a href="http://www.anybrowser.org/campaign/"&gt;&lt;i&gt;best viewed with&lt;/i&gt;&lt;/a&gt; path just for the sake of gzip optimisation, but aren't we all demanding modern standards-compliant browsers anyhow, so can we ignore the old ones? Can I stick my neck out and safely compress all the static content without consulting the Accept-Encoding header?&lt;br /&gt;&lt;br /&gt;Probably &lt;a href="http://developer.yahoo.com/performance/rules.html#gzip"&gt;yes&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Personal firewall software like &lt;a href="http://www.zonealarm.com/"&gt;Zone Alarm&lt;/a&gt; &lt;a href="http://schroepl.net/projekte/mod_gzip/firewalls.htm"&gt;clobbers the Accept-Encoding header&lt;/a&gt;. However, I &lt;i&gt;believe&lt;/i&gt; &lt;i&gt;[Warning: Not tested!]&lt;/i&gt; that it relays gzip-compressed content nevertheless. I'm inclined to think that nowadays we can gzip and improve the experience of the masses at the expense of the few, delivering it on cheap edge servers via a CDN.&lt;br /&gt;&lt;br /&gt;I wonder if we'll be penalised by search engines for adopting this approach? Googlebot requests with &lt;code&gt;Accept-Encoding: gzip,deflate&lt;/code&gt;, so no problems there. It looks like &lt;a href="http://www.bing.com/community/blogs/webmaster/archive/2008/02/12/announcing-crawling-improvements-for-live-search.aspx"&gt;Bing does too&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;I put a post on &lt;a href="http://www.experts-exchange.com/Web_Development/Miscellaneous/Q_24799648.html"&gt;Experts Exchange&lt;/a&gt;, and Simple CDN was recommended as an affordable alternative to Cloudfront. 
I probably ought to shop around more, but I want to try to stick with Amazon AWS, if I can, to keep my accounts uncomplicated. &lt;br /&gt;&lt;br /&gt;Sticking with Cloudfront/S3, there is a simple and elegant solution provided in a post &lt;a href="http://developer.amazonwebservices.com/connect/message.jspa?messageID=108103#107834"&gt;here&lt;/a&gt;. Bearing in mind a &lt;a href="http://bimport.blogspot.com/2009/09/what-no-index-page-on-s3.html"&gt;root index page can't be contrived in S3&lt;/a&gt;, we have to host a site index page in &lt;a href="http://aws.amazon.com/ec2/"&gt;EC2&lt;/a&gt; or equivalent. A server-side script or web application, can inspect the "accept encoding" header and redirect to (say) zip.mysite.com if gzip is accepted and flab.mysite.com, if it isn't. The static content is uploaded to each, gzipped on zip and uncompressed on flab. Hopefully, Google will not penalise for duplicate content. &lt;i&gt;I hate having to worry about SEO.&lt;/i&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/212269981320258846-114252213015139405?l=bimport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bimport.blogspot.com/feeds/114252213015139405/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bimport.blogspot.com/2009/10/yslow-compromise.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/114252213015139405'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/114252213015139405'/><link rel='alternate' type='text/html' href='http://bimport.blogspot.com/2009/10/yslow-compromise.html' title='YSlow compromise'/><author><name>Rob Staveley 
(Tom)</name><uri>http://www.blogger.com/profile/12685264964545576535</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://2.bp.blogspot.com/_mmq4zdFTiOI/Sp-aftOmhZI/AAAAAAAAPJM/GSiGOc7pRsc/S220/Look-8.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-212269981320258846.post-4998061689822737480</id><published>2009-10-01T05:07:00.000-07:00</published><updated>2009-10-01T06:55:52.434-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='standards'/><category scheme='http://www.blogger.com/atom/ns#' term='psp'/><title type='text'>Standards for PSPs</title><content type='html'>I'm pitching for a job where a web site provides the front for a product that is going to have partner-distributors in a load of different countries. When someone buys a product from the site, they do so from their local distributor in a local currency with the distributor's shipping fees and price.&lt;br /&gt;&lt;br /&gt;As each new distributor comes on board, he takes a region from the global distributor and then handles orders for all countries within his allocated region. Credit card payments go directly to the distributor's bank account. The global distributor is not in the loop.&lt;br /&gt;&lt;br /&gt;To achieve this, we need to handle whatever &lt;a href="http://en.wikipedia.org/wiki/Payment_service_provider"&gt;&lt;acronym title="Payment Service Provider"&gt;PSP&lt;/acronym&gt;&lt;/a&gt; the partner is able to work with. The partner needs to have an acquiring bank account, which the PSP is willing and able to play ball with in a cost-effective way. 
Because this has to do with money movements, there is a lot of haggling in the set-up of a merchant account, and what might look like a promising, nearly globally applicable PSP can turn out to be an expensive option.&lt;br /&gt;&lt;br /&gt;Here in the UK, if (say) you have a &lt;a href="http://www.natwest.com/"&gt;NatWest&lt;/a&gt;/&lt;a href="http://www.rbs.co.uk/"&gt;RBS&lt;/a&gt; &lt;a href="http://www.streamline.com/"&gt;Streamline&lt;/a&gt; acquiring account, your main PSP options are &lt;a href="http://www.sagepay.com/developers.asp"&gt;Sage Pay&lt;/a&gt; (formerly known as Protx), which is developer-friendly, or the RBS group's own &lt;a href="http://www.rbsworldpay.com/support/index.php"&gt;WorldPay&lt;/a&gt;, which is less so. PSPs naturally prefer to work with off-the-shelf shopping basket software, presumably to avoid the hassle of dealing with developers for bespoke solutions. Each off-the-shelf shopping basket package appears to be restricted to certain PSPs, so the lazy developer is tempted into PSP lock-in. You soon find, though, that Sage Pay can only be used for UK acquiring accounts and WorldPay is really only &lt;a href="http://www.tecknashop.com/worldpay-online-merchant-account.htm"&gt;competitive&lt;/a&gt; for UK acquiring accounts. I suspect that rates change and it is wise to be flexible, because partners may change their PSP when they change their acquiring bank. I'm faced with the prospect of dealing with more PSPs as partners come on board. I expect they are going to be local offerings in different countries.&lt;br /&gt;&lt;br /&gt;It would be all well and good if there were a &lt;b&gt;standard &lt;/b&gt;for interfacing with PSPs. &lt;br /&gt;&lt;br /&gt;There are a number of different ways to work with PSPs:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;You can do as much as possible on your site, including collecting the credit card details. 
The PSP is completely white-labeled, because you pass all the details it needs to work with via an API from your web site. I've done this sort of thing in the past with Verisign/PayPal's &lt;a href="https://www.paypal.com/cgi-bin/webscr?cmd=_payflow-pro-overview-outside"&gt;PayFlow Pro&lt;/a&gt;, and it isn't much fun; you need to have a secure site for handling those credit card numbers and you also have to comply with &lt;a href="https://www.pcisecuritystandards.org/"&gt;PCI DSS&lt;/a&gt;.&lt;/li&gt;&lt;li&gt;You can set up the order, billing address and shipment address on your site and then pass those details and the buyer to the PSP to handle credit card input and payment. I like this approach, because the buyer needs only to trust the PSP with his credit card details. I am expecting to go this route, if the PSPs all play ball.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;You can have the PSP handle the billing address and delivery address too. I am less keen on this approach, because there is less flexibility.&lt;/li&gt;&lt;/ul&gt;I'd like to see these approaches standardised, so that the same calling parameters can be used for all PSPs and the results are formatted in a standard way. Without standardisation and with the prickliness of dealing with financial institutions, PSP integration is really irksome. 
I'd like to be able to visit the PSP site and see that it supports a standard interface for the second approach and not have to go through the rigmarole of applying for a test account with that PSP.&lt;br /&gt;&lt;br /&gt;It should be possible, shouldn't it?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/212269981320258846-4998061689822737480?l=bimport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bimport.blogspot.com/feeds/4998061689822737480/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bimport.blogspot.com/2009/10/standards-for-psps.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/4998061689822737480'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/4998061689822737480'/><link rel='alternate' type='text/html' href='http://bimport.blogspot.com/2009/10/standards-for-psps.html' title='Standards for PSPs'/><author><name>Rob Staveley (Tom)</name><uri>http://www.blogger.com/profile/12685264964545576535</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://2.bp.blogspot.com/_mmq4zdFTiOI/Sp-aftOmhZI/AAAAAAAAPJM/GSiGOc7pRsc/S220/Look-8.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-212269981320258846.post-8287812761140128569</id><published>2009-09-20T02:59:00.000-07:00</published><updated>2009-09-20T02:59:53.450-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='c++0x'/><category scheme='http://www.blogger.com/atom/ns#' term='c++'/><title type='text'>Uniform initialisation in C++</title><content type='html'>&lt;a href="http://en.wikipedia.org/wiki/C%2B%2B0x#Uniform_initialization"&gt;Uniform 
initialization&lt;/a&gt; (to use the American spelling) is sure to make you start noticing initializer lists when your bleeding-edge colleagues start adopting some of the goodies in the next incarnation of the C++ standard.&lt;br /&gt;&lt;br /&gt;The C++0x standard will have us initialising a vector of strings with an initializer list, using one of the following constructs:&lt;br /&gt;&lt;br /&gt;&lt;pre class="de1"&gt;std::vector&amp;lt;std::string&amp;gt; v = { "alpha", "beta", "gamma" };&lt;br /&gt;std::vector&amp;lt;std::string&amp;gt; v({ "alpha", "beta", "gamma" }); &lt;br /&gt;std::vector&amp;lt;std::string&amp;gt; v{ "alpha", "beta", "gamma" };&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;I &lt;b&gt;like&lt;/b&gt; the first form, because it looks like array initialisation and {} already looks like a &lt;i&gt;std::initializer_list&amp;lt;&amp;gt;&lt;/i&gt; to me. I'm comfortable with the &lt;a href="http://en.wikipedia.org/wiki/Return_value_optimization#Other_forms_of_copy_elision"&gt;expectation&lt;/a&gt; that my compiler will not construct a temporary vector and copy construct a vector from it or, worse still, default construct a vector and assign to it from that temporary. I expect elision.&lt;br /&gt;&lt;br /&gt;The second form is more explicit: you &lt;i&gt;know&lt;/i&gt; that it is constructing a &lt;i&gt;vector&amp;lt;&amp;gt;&lt;/i&gt; from a &lt;i&gt;std::initializer_list&amp;lt;&amp;gt;&lt;/i&gt;, but the first is more legible, because there is less punctuation.&lt;br /&gt;&lt;br /&gt;It is the third form that is tough for a change-averse C++ veteran to swallow, because it looks like a class declaration or a scope gone wrong. 
Thankfully regular functions, which take initializer list parameters do not have an equivalent of the third form, so why have it for an object construction?&lt;br /&gt;&lt;br /&gt;Presumably because we should not expect the compiler to elide the assignment / copy construction, the second form is preferred over the first by the followers of Scott Meyers, who famously said we should &lt;i&gt;"Prefer initialization to assignment in constructors"&lt;/i&gt;, and the third form was therefore there to stop the next generation of C++ from coming to look like punctuation gibber.&lt;br /&gt;&lt;br /&gt;Sorry Scott, but I &lt;i&gt;prefer assignment in constructors&lt;/i&gt;. Nevertheless, I'll come round to the third form, as long as I'm not task-switching between C++ and Java projects. Undeniably, initializer lists are a good thing.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/212269981320258846-8287812761140128569?l=bimport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bimport.blogspot.com/feeds/8287812761140128569/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bimport.blogspot.com/2009/09/uniform-initialisation-in-c.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/8287812761140128569'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/8287812761140128569'/><link rel='alternate' type='text/html' href='http://bimport.blogspot.com/2009/09/uniform-initialisation-in-c.html' title='Uniform initialisation in C++'/><author><name>Rob Staveley (Tom)</name><uri>http://www.blogger.com/profile/12685264964545576535</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' 
src='http://2.bp.blogspot.com/_mmq4zdFTiOI/Sp-aftOmhZI/AAAAAAAAPJM/GSiGOc7pRsc/S220/Look-8.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-212269981320258846.post-367302085945807226</id><published>2009-09-18T12:40:00.000-07:00</published><updated>2009-09-18T12:40:18.493-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='cloudfront'/><category scheme='http://www.blogger.com/atom/ns#' term='s3'/><title type='text'>What no index page on S3?</title><content type='html'>If you look at my drab company home page in Cloudfront, you have to look at &lt;a href="http://cdn.seseit.co.uk/index.html"&gt;http://cdn.seseit.co.uk/index.html&lt;/a&gt; because &lt;a href="http://cdn.seseit.co.uk/"&gt;http://cdn.seseit.co.uk&lt;/a&gt;/ gives you an S3 bucket listing, when you make its ACL public read.&lt;br /&gt;&lt;br /&gt;S3 users have been &lt;a href="http://go.joemoreno.com/esw2"&gt;crying out&lt;/a&gt; for a solution for this for some time now. I should add my voice.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/212269981320258846-367302085945807226?l=bimport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bimport.blogspot.com/feeds/367302085945807226/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bimport.blogspot.com/2009/09/what-no-index-page-on-s3.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/367302085945807226'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/367302085945807226'/><link rel='alternate' type='text/html' href='http://bimport.blogspot.com/2009/09/what-no-index-page-on-s3.html' title='What no index page on S3?'/><author><name>Rob Staveley 
(Tom)</name><uri>http://www.blogger.com/profile/12685264964545576535</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://2.bp.blogspot.com/_mmq4zdFTiOI/Sp-aftOmhZI/AAAAAAAAPJM/GSiGOc7pRsc/S220/Look-8.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-212269981320258846.post-4931650388789923105</id><published>2009-09-18T07:51:00.000-07:00</published><updated>2009-09-18T07:52:37.784-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='ebs'/><category scheme='http://www.blogger.com/atom/ns#' term='static publishing'/><category scheme='http://www.blogger.com/atom/ns#' term='s3'/><category scheme='http://www.blogger.com/atom/ns#' term='ec2'/><category scheme='http://www.blogger.com/atom/ns#' term='ext3'/><title type='text'>Scaling out copying ephemeral storage</title><content type='html'>I'm still on my static publishing theme.&lt;br /&gt;&lt;br /&gt;The situation is this:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;I have a Page Server in &lt;a href="http://aws.amazon.com/ec2/"&gt;EC2&lt;/a&gt; which is reasonably up to date and is updating itself all the time from S3.&lt;/li&gt;&lt;li&gt;It may be one of several Page Servers in varying states of up-to-date-ness, using the trick I outlined in my &lt;a href="http://bimport.blogspot.com/2009/09/my-simpledb-binlog-for-static.html"&gt;last post&lt;/a&gt; to update their content from pages published by Publishers.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Page Servers serve static-ish content, by which I mean HTML with SSI and PHP. That's why the content isn't simply in &lt;a href="http://aws.amazon.com/s3/"&gt;S3&lt;/a&gt;.&lt;/li&gt;&lt;li&gt;I want to instantiate another Page Server, starting with a snapshot taken of the block storage used for the docroot. 
That way the new Page Server will be quick to catch up with the others, lagging behind only by the length of time it takes to instantiate and initialise the new EC2 instance.&lt;/li&gt;&lt;li&gt;I'm not using &lt;a href="http://aws.amazon.com/ebs/"&gt;EBS&lt;/a&gt; for the docroot. I am saving money, using ephemeral storage, accessed via /mnt.&lt;/li&gt;&lt;/ul&gt;So how do I get a copy of [most of] an existing /mnt onto a new instance &lt;i&gt;without taking the Page Server off line for reads by Apache&lt;/i&gt;?&lt;br /&gt;&lt;br /&gt;Here's how:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;First find out how much data there is to copy over, e.g.:&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;pre class="bbcodeblock" dir="ltr" style="border: 1px inset; margin: 0px -99999px 0px 0px; overflow: auto; padding: 3px; text-align: left; width: 98%;"&gt;# On the running instance&lt;br /&gt;rainbow1-ec2:~# df -h /mnt&lt;br /&gt;Filesystem            Size  Used Avail Use% Mounted on&lt;br /&gt;/dev/sda2             147G   48G   92G  35% /mnt&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;With (say) &lt;a href="http://developer.amazonwebservices.com/connect/entry.jspa?externalID=609"&gt;Elasticfox&lt;/a&gt; (or &lt;a href="http://developer.amazonwebservices.com/connect/entry.jspa?externalID=351"&gt;EC2 command line tools&lt;/a&gt;), you will need to create an EBS volume which is a bit bigger than this to allow a backup/restore file to be copied onto it. This file will include some additional metadata. 
We might use a 50G volume in the example shown.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Attach the EBS volume as (say) /dev/sdp, create a file system on it and mount it on /mnt2:&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;pre class="bbcodeblock" dir="ltr" style="border: 1px inset; margin: 0px -99999px 0px 0px; overflow: auto; padding: 3px; text-align: left; width: 98%;"&gt;# On the running instance&lt;br /&gt;rainbow1-ec2:~# mkfs.ext3 /dev/sdp&lt;br /&gt;rainbow1-ec2:~# mkdir -m 000 /mnt2&lt;br /&gt;rainbow1-ec2:~# mount -t ext3 /dev/sdp /mnt2&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Stop data from being written to /mnt and flush writes. In my static publishing design, the writers are replication processes, running on the Page Servers and copying data from S3. Having stopped the writers, you call &lt;a href="http://linux.die.net/man/8/sync"&gt;sync&lt;/a&gt;. Then any cached writes will be flushed to disk and the disk will be ready for a low-level copy or backup:&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;pre class="bbcodeblock" dir="ltr" style="border: 1px inset; margin: 0px -99999px 0px 0px; overflow: auto; padding: 3px; text-align: left; width: 98%;"&gt;# On the running instance&lt;br /&gt;rainbow1-ec2:~# killall myreplicationprocesses&lt;br /&gt;rainbow1-ec2:~# sync&lt;/pre&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;You can stop accidental writes to /mnt by making it temporarily read-only: &lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;pre class="bbcodeblock" dir="ltr" style="border: 1px inset; margin: 0px -99999px 0px 0px; overflow: auto; padding: 3px; text-align: left; width: 98%;"&gt;rainbow1-ec2:~# mount -o remount -o ro /dev/sda2&lt;/pre&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;At this point, Apache should be able to read from the file system, but nothing should be writing to it. 
Any temporary directories should be kept out of /mnt for this approach to work.&lt;/li&gt;&lt;li&gt;We can now create a "backup" of /mnt onto the EBS volume: &lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;pre class="bbcodeblock" dir="ltr" style="border: 1px inset; margin: 0px -99999px 0px 0px; overflow: auto; padding: 3px; text-align: left; width: 98%;"&gt;# On the running instance&lt;br /&gt;rainbow1-ec2:~# dump -0 -f /mnt2/dump.dat /dev/sda2&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;The running instance should be returned to read+write and the replication processes can be started again now: &lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;pre class="bbcodeblock" dir="ltr" style="border: 1px inset; margin: 0px -99999px 0px 0px; overflow: auto; padding: 3px; text-align: left; width: 98%;"&gt;# On the running instance&lt;br /&gt;rainbow1-ec2:~# mount -o remount -o rw /dev/sda2&lt;br /&gt;rainbow1-ec2:~# start.myreplicationprocesses &lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;The EBS volume should be unmounted before being detached, using Elasticfox or equivalent: &lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;pre class="bbcodeblock" dir="ltr" style="border: 1px inset; margin: 0px -99999px 0px 0px; overflow: auto; padding: 3px; text-align: left; width: 98%;"&gt;# On the running instance&lt;br /&gt;rainbow1-ec2:~# umount /mnt2&lt;/pre&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;The EBS volume should be attached now to the new instance. This is done with Elasticfox or equivalent. 
Let's say we use /dev/sdp again: &lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;pre class="bbcodeblock" dir="ltr" style="border: 1px inset; margin: 0px -99999px 0px 0px; overflow: auto; padding: 3px; text-align: left; width: 98%;"&gt;# On the new instance&lt;br /&gt;rainbow2-ec2:~# mkdir -m 000 /mnt2&lt;br /&gt;rainbow2-ec2:~# mount -t ext3 /dev/sdp /mnt2&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;The directory structure can then be "restored" onto the new instance before un-mounting, detaching and deleting the EBS volume: &lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;pre class="bbcodeblock" dir="ltr" style="border: 1px inset; margin: 0px -99999px 0px 0px; overflow: auto; padding: 3px; text-align: left; width: 98%;"&gt;# On the new instance&lt;br /&gt;rainbow2-ec2:~# cd /mnt&lt;br /&gt;rainbow2-ec2:~# restore -rf /mnt2/dump.dat&lt;br /&gt;rainbow2-ec2:~# umount /mnt2&lt;/pre&gt;&lt;br /&gt;&lt;/li&gt;&lt;/ol&gt;If we are trying to save money, you may wonder why we don't go via S3 rather than the temporary EBS volume. 
The answer is that we may not have enough space on the root file system.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/212269981320258846-4931650388789923105?l=bimport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bimport.blogspot.com/feeds/4931650388789923105/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bimport.blogspot.com/2009/09/scaling-out-copying-ephemeral-storage.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/4931650388789923105'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/4931650388789923105'/><link rel='alternate' type='text/html' href='http://bimport.blogspot.com/2009/09/scaling-out-copying-ephemeral-storage.html' title='Scaling out copying ephemeral storage'/><author><name>Rob Staveley (Tom)</name><uri>http://www.blogger.com/profile/12685264964545576535</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://2.bp.blogspot.com/_mmq4zdFTiOI/Sp-aftOmhZI/AAAAAAAAPJM/GSiGOc7pRsc/S220/Look-8.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-212269981320258846.post-7854269520586850568</id><published>2009-09-18T03:16:00.000-07:00</published><updated>2009-09-18T03:31:00.451-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='simpledb'/><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><category scheme='http://www.blogger.com/atom/ns#' term='static publishing'/><category scheme='http://www.blogger.com/atom/ns#' term='ec2'/><category scheme='http://www.blogger.com/atom/ns#' term='sqs'/><category scheme='http://www.blogger.com/atom/ns#' term='binlog'/><title type='text'>My 
SimpleDB binlog for static publishing</title><content type='html'>If you've come here expecting some sort of &lt;a href="http://aws.amazon.com/simpledb/"&gt;SimpleDB&lt;/a&gt; &lt;i&gt;binlog&lt;/i&gt; to fulfill a MySQL-ish need like replication or rollback, you've come to the wrong place.&lt;br /&gt;&lt;br /&gt;What I'm trying to achieve was outlined in my &lt;a href="http://bimport.blogspot.com/2009/09/no-nas-in-ec2.html"&gt;previous post&lt;/a&gt;. I want to publish PHP or HTML-that-uses-SSI in S3 and have that pulled onto Page Servers into &lt;a href="http://aws.amazon.com/ec2/"&gt;EC2&lt;/a&gt; ephemeral storage, by which I mean the /mnt directory, so that pages can be served from the local file system. I've not junked the idea of using an &lt;a href="http://aws.amazon.com/ebs/"&gt;EBS&lt;/a&gt; volume, but /mnt makes sense as long as I can initialise new instances in a timely manner when I scale out.&lt;br /&gt;&lt;br /&gt;I want Page Servers to pull the content from S3 that they need to catch up with. I'm calling the replication list a &lt;i&gt;binlog&lt;/i&gt;, because it is analogous to MySQL replication, where slaves need to catch up with the master by going through the master's &lt;i&gt;binlog&lt;/i&gt; from the last known master status. 
No doubt this approach is used in a zillion other scenarios in CS and no doubt there is a better way to refer to this mechanism, but MySQL is a common shared experience, so let's use that as the analogy and call the list a &lt;i&gt;binlog&lt;/i&gt;.&lt;br /&gt;&lt;br /&gt;Here's what I need to achieve:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;A Publisher (one of several masters) publishes a file and copies it to the &lt;i&gt;safe storage&lt;/i&gt; of S3.&lt;/li&gt;&lt;li&gt;It then updates the &lt;i&gt;binlog&lt;/i&gt; so that a Page Server can pull the content off S3 and store it in local ephemeral storage in its docroot in /mnt.&lt;/li&gt;&lt;li&gt;Each Page Server has a concept of something like a &lt;i&gt;master status &lt;/i&gt;in the &lt;i&gt;binlog&lt;/i&gt; and knows where to pick up from in its &lt;i&gt;replication process&lt;/i&gt;.&lt;/li&gt;&lt;/ol&gt;There is an improvement I'd like to achieve: a file - synonymous with an S3 object key - which appears twice in the &lt;i&gt;binlog&lt;/i&gt; should only be fetched once. Confused? Consider a page which is published and republished before a Page Server can fetch it. You only want to fetch it once, but Publisher(s) will have put it into the &lt;i&gt;binlog&lt;/i&gt; twice in the interim.&lt;br /&gt;&lt;br /&gt;You might be wondering why SimpleDB and why not &lt;a href="http://aws.amazon.com/sqs/"&gt;SQS&lt;/a&gt;. 
SQS is fine for &lt;a href="http://en.wikipedia.org/wiki/FIFO_%28computing%29"&gt;FIFO&lt;/a&gt; messaging, but doesn't handle different readers having different positions in the queue - to do that the &lt;a href="http://docs.amazonwebservices.com/AWSSimpleQueueService/2009-02-01/SQSGettingStartedGuide/index.html?ReceiveMessage.html"&gt;ReceiveMessage&lt;/a&gt; action would have to support a request indicating where to pick up from, which it does not as of the 2009-02-01 WSDL.&lt;br /&gt;&lt;br /&gt;So time for another back-of-an-envelope design:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;We create a domain for the application's &lt;i&gt;binlog&lt;/i&gt;.&lt;/li&gt;&lt;li&gt;We implement a not-quite-unique auto-increment in a unique &lt;i&gt;item+attribute&lt;/i&gt;, with keyname called &lt;b&gt;index&lt;/b&gt; and attribute called &lt;b&gt;index&lt;/b&gt;. Every time this gets incremented the attribute is &lt;i&gt;replaced&lt;/i&gt;.&lt;/li&gt;&lt;li&gt;Typically there will be one &lt;i&gt;binlog&lt;/i&gt; entry per index value stored in an &lt;i&gt;item&lt;/i&gt;, whose name corresponds to the &lt;b&gt;index&lt;/b&gt; value at the time the entry was added.&lt;/li&gt;&lt;li&gt;We expect, however, that there will be &lt;i&gt;items&lt;/i&gt; with multiple &lt;i&gt;binlog&lt;/i&gt; entries, because SimpleDB's &lt;a href="http://docs.amazonwebservices.com/AmazonSimpleDB/latest/DeveloperGuide/index.html?EventualConsistencySummary.html"&gt;eventual consistency&lt;/a&gt; dictates that a unique auto-increment implementation equivalent to a primary index is impracticable.&lt;/li&gt;&lt;li&gt;&lt;i&gt;Binlog&lt;/i&gt; entries have an &lt;i&gt;item&lt;/i&gt; name that corresponds to the &lt;b&gt;index&lt;/b&gt; and have a non-unique 
&lt;i&gt;attribute&lt;/i&gt; named "s3-key", corresponding to the S3 bucket object key for the file to be replicated. That keyname corresponds to the docroot path of the published file.&lt;/li&gt;&lt;li&gt;Page Servers must be able to handle &lt;i&gt;items&lt;/i&gt; with multiple "s3-key" &lt;i&gt;attributes&lt;/i&gt;.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;This falls within SimpleDB &lt;a href="http://docs.amazonwebservices.com/AmazonSimpleDB/latest/DeveloperGuide/index.html?SDBLimits.html"&gt;limits&lt;/a&gt;, as long as we do not wind up with greater than 1 billion files in the queue, which would be a sick publishing system! We are using a large number of items per domain and only one attribute per item, unless we want to add diagnostics to our &lt;i&gt;binlog&lt;/i&gt; entries.&lt;br /&gt;&lt;br /&gt;The SimpleDB items will need to be cleaned up, when they are older than (say) 48 hours (TBA). To accomplish this, the SimpleDB domain ought to have some additional house-keeping items for clean-up.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/212269981320258846-7854269520586850568?l=bimport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bimport.blogspot.com/feeds/7854269520586850568/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bimport.blogspot.com/2009/09/my-simpledb-binlog-for-static.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/7854269520586850568'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/7854269520586850568'/><link rel='alternate' type='text/html' href='http://bimport.blogspot.com/2009/09/my-simpledb-binlog-for-static.html' title='My SimpleDB binlog for static publishing'/><author><name>Rob Staveley 
(Tom)</name><uri>http://www.blogger.com/profile/12685264964545576535</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://2.bp.blogspot.com/_mmq4zdFTiOI/Sp-aftOmhZI/AAAAAAAAPJM/GSiGOc7pRsc/S220/Look-8.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-212269981320258846.post-1298550596101717927</id><published>2009-09-17T09:04:00.000-07:00</published><updated>2009-09-18T11:02:30.508-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mt'/><category scheme='http://www.blogger.com/atom/ns#' term='persistentfs'/><category scheme='http://www.blogger.com/atom/ns#' term='ssi'/><category scheme='http://www.blogger.com/atom/ns#' term='cloudfront'/><category scheme='http://www.blogger.com/atom/ns#' term='nas'/><category scheme='http://www.blogger.com/atom/ns#' term='nfs'/><category scheme='http://www.blogger.com/atom/ns#' term='aws'/><category scheme='http://www.blogger.com/atom/ns#' term='xfs'/><category scheme='http://www.blogger.com/atom/ns#' term='s3'/><category scheme='http://www.blogger.com/atom/ns#' term='static publishing'/><category scheme='http://www.blogger.com/atom/ns#' term='ec2'/><category scheme='http://www.blogger.com/atom/ns#' term='php'/><category scheme='http://www.blogger.com/atom/ns#' term='apache'/><title type='text'>No NAS in EC2</title><content type='html'>My current application is a content publisher. I need to publish static content in Amazon's &lt;a href="http://aws.amazon.com/ec2/"&gt;EC2&lt;/a&gt; cloud. 
I need to do so in a manner that scales out well.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://aws.amazon.com/"&gt;Amazon&lt;/a&gt;'s preferred handling for static content is &lt;a href="http://aws.amazon.com/s3/"&gt;S3&lt;/a&gt; and, where content is served world-wide, its &lt;a href="http://en.wikipedia.org/wiki/Content_delivery_network"&gt;CDN&lt;/a&gt; &lt;a href="http://aws.amazon.com/cloudfront/"&gt;CloudFront&lt;/a&gt;. I'd be very happy with that approach if I could wean my template designer colleagues from &lt;a href="http://en.wikipedia.org/wiki/Php"&gt;PHP&lt;/a&gt;. As long as the static content isn't &lt;i&gt;really&lt;/i&gt; static, we cannot use S3 for page-serving.&lt;br /&gt;&lt;br /&gt;Let's assume that PHP and &lt;a href="http://en.wikipedia.org/wiki/Server_Side_Includes"&gt;SSI&lt;/a&gt; &lt;b&gt;are&lt;/b&gt; the order of the day, and we are therefore obliged to serve not-so-static content from an elastic farm of Apache "Page Servers" (Apache+PHP) in &lt;a href="http://aws.amazon.com/ec2/"&gt;EC2&lt;/a&gt;. &lt;i&gt;Note: S3, and therefore CloudFront, does not support &lt;a href="http://www.w3.org/TR/esi-lang" title="Edge Side Includes"&gt;ESI&lt;/a&gt;, unlike Akamai, which is expensive; ESI is a CDN's equivalent of SSI.&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;The data centre precedent for this requirement is publishing to a shared directory on a fault-tolerant &lt;a href="http://en.wikipedia.org/wiki/Network-attached_storage"&gt;NAS&lt;/a&gt;. That works for PHP or HTML+SSI, but NFS exports aren't easy to contrive in the cloud. In EC2, we don't have a good NAS implementation. You could roll your own NFS export and implement your own fault tolerance, but &lt;i&gt;it is a square peg for a round hole&lt;/i&gt;.&lt;br /&gt;&lt;br /&gt;Let's do some soul-searching about the need for the NAS. 
Let's look at our demands, and see if we can wean ourselves off those &lt;a href="http://www.blogger.com/2009/09/nfs-always-bites-you.html"&gt;beloved&lt;/a&gt; &lt;acronym title="Network File System"&gt;NFS&lt;/acronym&gt; shares.&lt;br /&gt;&lt;br /&gt;When you are publishing static content in a site that predominantly serves blog content, you can build the following argument against using an NFS share in favour of duplicating content on all Page Servers: &lt;br /&gt;&lt;ol&gt;&lt;li&gt;Writes are relatively infrequent.&lt;/li&gt;&lt;li&gt;We are like a broadcaster. Our authors have relatively little to say, but we say the same things to everyone. That means that &lt;i&gt;all&lt;/i&gt; content can be expected to be read often from &lt;i&gt;all&lt;/i&gt; page servers.&lt;/li&gt;&lt;li&gt;Local storage is cheap and gives us the most efficient file I/O.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Having copies of a file on all instances of clients is &lt;i&gt;wise&lt;/i&gt; because clients in the cloud are relatively volatile. This gives us redundancy, which is a &lt;i&gt;Good Thing&lt;/i&gt;.&lt;/li&gt;&lt;li&gt;Repeatedly reading the same content from an NFS export (e.g. a &lt;a href="http://en.wikipedia.org/wiki/Network-attached_storage"&gt;NAS&lt;/a&gt;) is an inefficient use of bandwidth, when the content could be local.&lt;br /&gt;&lt;/li&gt;&lt;/ol&gt;What if all content servers could have their own copy of the content?&lt;br /&gt;&lt;br /&gt;I'd first like to consider using the EC2 ephemeral storage, which EC2 Linux users affectionately know as /mnt. For many purposes it has been supplanted by the relatively recent launch of &lt;a href="http://aws.amazon.com/ebs/"&gt;EBS&lt;/a&gt;. I'd then like to consider EBS. &lt;br /&gt;&lt;br /&gt;These scale-out thoughts are not limited to MovableType architecture, but since that's what I have in my text editor for now, let's consider content publishing in MT to give us a context. 
I want to instantiate as many &lt;a href="http://www.movabletype.org/documentation/enterprise/system-architecture.html"&gt;Page Servers in an MT Advanced Configuration&lt;/a&gt; as my not-so-static (by which I mean PHP) elastic demand requires. Since I am going to scale out with EC2 instances running Apache+PHP, I can sensibly use and initialise the ephemeral storage which I get with each of these instances. If an instance goes bad, the server and its locally mounted docroot are terminated together. I can also do likewise with a locally attached EBS volume (i.e. in the same availability zone as the EC2 instance).&lt;br /&gt;&lt;br /&gt;Instantiating a Page Server that has its docroot on ephemeral storage (/mnt) goes as follows:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;I instantiate the new EC2 Page Server instance.&lt;/li&gt;&lt;li&gt;I use rsync to synchronise its /mnt docroot with a &lt;i&gt;known working&lt;/i&gt; Page Server instance. &lt;i&gt;&lt;b&gt;This is not all that quick, because it is done at the file system level, synchronising individual files.&lt;/b&gt;&lt;/i&gt;&lt;/li&gt;&lt;li&gt;I add the new Page Server to the load balancer when it is up to date.&lt;br /&gt;&lt;/li&gt;&lt;/ol&gt;My alternative is to instantiate new instances with the docroot in EBS as follows:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Keep the docroot on an EBS volume formatted with XFS so it can be conveniently frozen, when need be. There needs to be one EBS volume per Page Server instance.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Freeze, snapshot and unfreeze an EBS volume attached and mounted on a &lt;i&gt;known working&lt;/i&gt; Page Server instance. &lt;i&gt;&lt;b&gt;This is quick, being done at the block device level.&lt;/b&gt;&lt;/i&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Create a copy of that volume in another instance of an EBS volume.&lt;/li&gt;&lt;li&gt;Instantiate a new Page Server.&lt;/li&gt;&lt;li&gt;Attach the new EBS volume to the new Page Server instance. 
&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Mount the EBS volume and start Apache on the new instance.&lt;/li&gt;&lt;li&gt;I add the new Page Server to the load balancer when Apache is started.&lt;/li&gt;&lt;/ol&gt;There are virtues in both approaches. EBS seems &lt;i&gt;cooler&lt;/i&gt;, but there is a cost implication going that route. The virtue of block copying EBS volumes versus working with the file system in ephemeral storage depends on how quickly rsync is able to copy the docroot from a &lt;i&gt;known working&lt;/i&gt; Page Server instance and how resource-hungry that process is. The ephemeral storage also needs to be large enough in the required EC2 &lt;a href="http://aws.amazon.com/ec2/instance-types/"&gt;instance type&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;In either case, this gives us a Page Server with a &lt;i&gt;snapshot&lt;/i&gt; of the docroot which we expect to be &lt;i&gt;somewhat out of date&lt;/i&gt; by the time the server goes on-line. How can we keep Page Servers' docroots synchronised?&lt;br /&gt;&lt;br /&gt;MovableType handles this in the application tier, using the Schwartz publishing queue to send files to Page Servers. At the time of writing there are &lt;a href="http://www.majordojo.com/2009/01/movable-type-system-architectures.php"&gt;some gremlins&lt;/a&gt; in that procedure, but the idea is that a Publisher has the responsibility for copying its published file to all Page Servers.&lt;br /&gt;&lt;br /&gt;There are obstacles that need to be overcome:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;When a Page Server grabs a snapshot, it should register itself as a &lt;a href="http://www.movabletype.org/documentation/appendices/config-directives/synctarget.html"&gt;recipient&lt;/a&gt; for the Publishers, so it can subscribe to new updates. This creates a problem for Publishers if the Page Server is not immediately available for file copy (a problem when the docroot is in an EBS mount rather than ephemeral storage). 
The registration process itself isn't part of an MT installation - this is a configuration setting requiring - &lt;i&gt;(does it?)&lt;/i&gt; - a restart to take effect. It would need some proper engineering.&lt;/li&gt;&lt;li&gt;When a Page Server becomes unavailable, the source Publisher has to handle timeouts etc. before skipping it. Again, in MT this would require some work.&lt;/li&gt;&lt;/ol&gt;I prefer the idea that the Page Server is responsible for fetching its content. This scales better. &lt;br /&gt;&lt;br /&gt;Let's come up with a &lt;i&gt;back-of-an-envelope&lt;/i&gt; design:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;The Publisher has pulled a publish request off the Schwartz &lt;acronym title="Schwartz Publish Queue"&gt;PQ&lt;/acronym&gt;. It has published content to a local file system and is just about to mark the job as &lt;i&gt;done&lt;/i&gt;.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Before disposing of the job, MT currently puts another request into the PQ to get the file RSync'ed, but we know that approach is fragile. Either we need it to put the file somewhere safe and to send a message to all Page Servers to get them to collect it, or we need to make it practicable for the publish request to be reinstated in the Schwartz queue in the event that the Publisher becomes unavailable when a Page Server wants to fetch the page...&lt;/li&gt;&lt;li&gt;Page Servers fetch the file, when they know it has been created or modified.&lt;br /&gt;&lt;/li&gt;&lt;/ol&gt;I prefer the former approach in 2, using safe storage, because I can see race conditions if things get unstable and re-publish requests are the order of the day. In the &lt;acronym title="Amazon Web Services"&gt;AWS&lt;/acronym&gt; world, safe storage means &lt;acronym title="The Simple Storage Service"&gt;S3&lt;/acronym&gt;. 
If our published content is in S3, it will not need republishing in the event of a Publisher outage.&lt;br /&gt;&lt;br /&gt;Here's a bum steer, but bear with me: &lt;br /&gt;&lt;blockquote&gt;We can mount a &lt;a href="http://www.persistentfs.com/"&gt;PersistentFS&lt;/a&gt; to turn an S3 bucket's objects into a block storage device. This allows one EC2 instance to write to the file system and many to read from it and the I/O is efficient. A Publisher can publish &lt;i&gt;directly&lt;/i&gt; to the PersistentFS. These file systems will come &lt;a href="http://www.persistentfs.com/documentation/Release_Notes"&gt;at a price&lt;/a&gt; in the near future (the only cost is S3's at the time of writing), but this should still be competitive with taking an NFS approach with EBS for persistence.&lt;br /&gt;&lt;/blockquote&gt;&lt;blockquote&gt;In the &lt;acronym title="Amazon Web Services"&gt;AWS&lt;/acronym&gt; world, "messaging" typically means using &lt;acronym title="The Simple Queue Service"&gt;SQS&lt;/acronym&gt;, but that really only works for 1-to-1 messaging. We need something more like a MySQL &lt;i&gt;binlog&lt;/i&gt;, which Page Servers can slave to, where each Publisher is a master with a master status and each Page Server is a slave. Using an approach which is analogous to &lt;a href="http://dev.mysql.com/doc/refman/5.0/en/replication.html"&gt;MySQL replication&lt;/a&gt;, we can have Page Servers act as PersistentFS slaves. Every time a Publisher publishes a file, its path is put into the &lt;i&gt;binlog&lt;/i&gt;, which is also on the PersistentFS filesystem. Page Servers keep track of entry number and &lt;i&gt;binlog&lt;/i&gt; file number, just like MySQL slaves keep track of offset and binlog file. Polling the mtime of the binlog is a simple way to see that it has been appended to.&lt;br /&gt;&lt;/blockquote&gt;&lt;blockquote&gt;So, each Publisher should have a PersistentFS to write to. 
When the file is written, it is reliably written in S3 and a "binlog" is updated. As with MySQL, binlogs can only be XX days old at most and Publishers are responsible for purging old ones. Each Page Server is a read-only client, copying files to its ephemeral storage, which it uses as docroot.&lt;br /&gt;&lt;/blockquote&gt;Bah! I didn't see this in the small print:&lt;br /&gt;&lt;blockquote&gt;PersistentFS does not support changing or writing to a file system from one computer while it is mounted read-only on another computer.&lt;br /&gt;&lt;/blockquote&gt;What a shame. It isn't going to be as simple as I hoped. It looks like we cannot use S3 as a block device for this after all. We need to copy published files individually into it and we need to refine the &lt;i&gt;binlog&lt;/i&gt; idea, because that's lost its home now. &lt;br /&gt;Here's my proposed approach (revised):&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Use local ephemeral storage (/mnt) for publishing files. When the file is written, it is written into persistent storage in S3 and a &lt;i&gt;binlog&lt;/i&gt; is updated. Only then is the publishing job marked as &lt;i&gt;done&lt;/i&gt;.&lt;/li&gt;&lt;li&gt;The &lt;i&gt;binlog&lt;/i&gt; is not a file. Rather it is a log record in Amazon's &lt;a href="http://aws.amazon.com/simpledb/"&gt;SimpleDB&lt;/a&gt;. &lt;i&gt;Not sure about the details yet.&lt;/i&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;As with MySQL, &lt;i&gt;binlog files&lt;/i&gt; (or SimpleDB &lt;i&gt;values&lt;/i&gt;) can only be XX days old at most and Publishers are responsible for purging old &lt;i&gt;binlogs&lt;/i&gt;.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Page Servers poll SimpleDB for &lt;i&gt;binlog&lt;/i&gt; updates. 
They then walk through the new entries, copying the files from S3 to the local ephemeral file system on /mnt.&lt;/li&gt;&lt;/ol&gt;Let's re-visit the argument in favour of NFS: &lt;br /&gt;&lt;ol&gt;&lt;li&gt;NFS is easy to set up. You mount the same docroot in Publishers and Page Servers.&lt;/li&gt;&lt;li&gt;Scaling out is simply a matter of adding read-only Page Servers or read+write Publishers, which point to the same shared directory.&lt;br /&gt;&lt;/li&gt;&lt;/ol&gt;Now that the PersistentFS idea has been scrapped, neither the Page Servers nor the Publishers need to mount anything more than the usual ephemeral storage. So this should be easier for system administration.&lt;br /&gt;&lt;br /&gt;The only thing that needs to be worked on is the SimpleDB. Conceptually a &lt;i&gt;binlog&lt;/i&gt; does not need to be synonymous with an individual publisher, because all Publishers write to the same S3 bucket and SimpleDB &lt;i&gt;binlog&lt;/i&gt;. We just need to see this working with key-value pairs.&lt;br /&gt;&lt;br /&gt;Ideally the static content that can remain in S3 - content that doesn't use SSI and is not PHP - should remain there. Mixing that content with the content served by the EC2 Page Servers requires some discipline (e.g. a name like static.page-server.website.com mapping to the CloudFront CNAME), but it will result in the best content delivery.&lt;br /&gt;&lt;br /&gt;This is sketchy, I know... 
but it looks like the start of a plan.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/212269981320258846-1298550596101717927?l=bimport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bimport.blogspot.com/feeds/1298550596101717927/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bimport.blogspot.com/2009/09/no-nas-in-ec2.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/1298550596101717927'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/1298550596101717927'/><link rel='alternate' type='text/html' href='http://bimport.blogspot.com/2009/09/no-nas-in-ec2.html' title='No NAS in EC2'/><author><name>Rob Staveley (Tom)</name><uri>http://www.blogger.com/profile/12685264964545576535</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://2.bp.blogspot.com/_mmq4zdFTiOI/Sp-aftOmhZI/AAAAAAAAPJM/GSiGOc7pRsc/S220/Look-8.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-212269981320258846.post-1294667641605285033</id><published>2009-09-12T05:35:00.000-07:00</published><updated>2009-09-18T05:40:09.528-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='java'/><category scheme='http://www.blogger.com/atom/ns#' term='beanshell'/><category scheme='http://www.blogger.com/atom/ns#' term='upgrade'/><category scheme='http://www.blogger.com/atom/ns#' term='sqs'/><category scheme='http://www.blogger.com/atom/ns#' term='perl'/><category scheme='http://www.blogger.com/atom/ns#' term='wsdl'/><category scheme='http://www.blogger.com/atom/ns#' term='aws'/><title type='text'>SQS WSDL upgrade</title><content type='html'>I was 
wrongly alarmed that support for the &lt;a href="http://en.wikipedia.org/wiki/Web_Services_Description_Language"&gt;WSDL&lt;/a&gt; implementation, which I currently depend upon in a web service in a distributed environment, was at its end of life. The alarm was raised by an e-mail from &lt;acronym title="Amazon Web Services"&gt;AWS&lt;/acronym&gt; reminding me about the end of life of an old version of the WSDL for its &lt;a href="http://aws.amazon.com/sqs/"&gt;SQS&lt;/a&gt; web service, which provides robust message queues. &lt;br /&gt;&lt;blockquote&gt;...we'd like to provide an update on the "end-of-life" schedule for WSDL versions 2006-04-01 and 2007-05-01. As previously communicated, Amazon SQS users will have until November 6, 2009 to complete their migration to WSDL version 2009-02-01 or 2008-01-01, after which the old WSDL versions will no longer be available.&lt;br /&gt;&lt;/blockquote&gt;Foolishly I read that November was the end of the line for the &lt;b&gt;2008-01-01&lt;/b&gt; WSDL, so I panicked. Yes I know, they were perfectly clear in that notice, but I'm human. One day I may have to confront the upgrade though, so perhaps there is a lesson to learn.&lt;br /&gt;&lt;br /&gt;I use two clients for SQS. My Java applications use &lt;a href="http://code.google.com/p/typica/"&gt;typica&lt;/a&gt; 1.3 and my Perl scripts use an old version of the &lt;a href="http://developer.amazonwebservices.com/connect/entry.jspa?externalID=1286"&gt;Perl Library for Amazon SQS&lt;/a&gt;. These both date back to the 2008-01-01 WSDL version of the service and create queues that require a client that respects that version of the WSDL.&lt;br /&gt;&lt;br /&gt;Amazon have written up &lt;a href="http://developer.amazonwebservices.com/connect/entry.jspa?externalID=1148"&gt;some notes&lt;/a&gt; about migrating to the new version of the WSDL. 
We cannot expect a 2009-02-01 client to play ball with a 2008-01-01 queue, so migrating to 2009-02-01 needs to be a concerted process, which isn't easy in my application's production environments.&lt;br /&gt;&lt;br /&gt;I have a number of 2008-01-01 queues in production, which are kept fairly busy. I was curious to see how an upgrade to the 2009-02-01 Perl library would cope with one of these queues. I was annoyed that the API had changed slightly. The change seems unnecessary. With a couple of small edits to a Perl script to send a message to the queue, I was ready to go, and ready to get a &lt;i&gt;status 400 internal error&lt;/i&gt;. That was no great surprise, because Amazon &lt;a href="http://developer.amazonwebservices.com/connect/entry.jspa?externalID=1148"&gt;say&lt;/a&gt;:&lt;br /&gt;&lt;blockquote&gt;Queues created with the previous WSDL versions cannot be accessed with version 2008-01-01 or 2009-02-01, and queues created with version 2008-01-01 or 2009-02-01 cannot be accessed with the previous versions. During the migration period, you can use the previous WSDL versions to send requests to queues created with previous WSDL versions, and you can send requests using the new versions to queues created with the new versions.&lt;br /&gt;&lt;/blockquote&gt;So, I wanted to know what would happen with typica 1.6 in my Java application. I was pleasantly &lt;a href="http://developer.amazonwebservices.com/connect/thread.jspa?messageID=124028&amp;amp;#124028"&gt;surprised to find&lt;/a&gt; that the typica upgrade works fine with a 2008-01-01 queue. I'll keep testing, but as long as I don't create any queues with typica 1.6 during the migration, I may be able to upgrade all of my Java WARs and SiteSuite libraries and continue to plod along with 2008-01-01 and then upgrade my Perl scripts at a leisurely pace. 
Or perhaps I should replace my Perl scripts with &lt;a href="http://www.beanshell.org/"&gt;BeanShell&lt;/a&gt; scripts and standardise on typica.&lt;br /&gt;&lt;br /&gt;Coordinating distributed environments with different SysAdmins in each environment is awkward and something to be avoided if possible. Wherever you can depend on backward compatibility, your problems go away. 1 : nil to the Java library this time.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;September 18th update:&lt;/b&gt; It transpires that typica 1.6 needs you to use a different namespace for the 2009-02-01 WSDL; with the existing namespace you invoke the 2008-01-01 WSDL. So it is no cleverer than the Perl API. It would be nice if someone could write a high level wrapper for these libraries to get the right WSDL for the right queue. Having said that, Amazon ought perhaps to be doing that themselves.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/212269981320258846-1294667641605285033?l=bimport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bimport.blogspot.com/feeds/1294667641605285033/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bimport.blogspot.com/2009/09/sqs-wsdl-upgrade.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/1294667641605285033'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/1294667641605285033'/><link rel='alternate' type='text/html' href='http://bimport.blogspot.com/2009/09/sqs-wsdl-upgrade.html' title='SQS WSDL upgrade'/><author><name>Rob Staveley (Tom)</name><uri>http://www.blogger.com/profile/12685264964545576535</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' 
src='http://2.bp.blogspot.com/_mmq4zdFTiOI/Sp-aftOmhZI/AAAAAAAAPJM/GSiGOc7pRsc/S220/Look-8.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-212269981320258846.post-7900534527719272797</id><published>2009-09-11T05:54:00.000-07:00</published><updated>2009-09-18T03:29:47.601-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='ha'/><category scheme='http://www.blogger.com/atom/ns#' term='mt'/><category scheme='http://www.blogger.com/atom/ns#' term='schwartz'/><category scheme='http://www.blogger.com/atom/ns#' term='nas'/><category scheme='http://www.blogger.com/atom/ns#' term='static publishing'/><category scheme='http://www.blogger.com/atom/ns#' term='s3'/><category scheme='http://www.blogger.com/atom/ns#' term='rsync'/><category scheme='http://www.blogger.com/atom/ns#' term='nfs'/><category scheme='http://www.blogger.com/atom/ns#' term='ec2'/><title type='text'>MovableType publishing in EC2 with dedicated Publishers and RSync</title><content type='html'>I've been thinking about scaling out &lt;acronym title="MovableType"&gt;MT&lt;/acronym&gt; publishing again.&lt;br /&gt;&lt;br /&gt;A really satisfying way to publish static content would be to put it out onto &lt;a href="http://aws.amazon.com/s3/"&gt;S3&lt;/a&gt;, and I've posted &lt;a href="http://forums.movabletype.org/2009/08/synctarget-to-amazon-s3.html#c28583"&gt;thoughts about an approach which would do that&lt;/a&gt; in the MT forums. That's an ambitious goal and ultimately a good one, but let's walk before we run and first consider publishing to MT &lt;acronym title="These are the servers that host static HTML and PHP - if PHP is needed"&gt;Page Servers&lt;/acronym&gt; in &lt;a href="http://aws.amazon.com/ec2/"&gt;EC2&lt;/a&gt;. It is something we have to do anyhow, unless we can convince MT template authors to avoid resorting to generating &lt;acronym title="Server-side script that needs a web server with PHP support and CPU and database(?) 
resources"&gt;PHP&lt;/acronym&gt;, where client-side JavaScript would suffice.&lt;br /&gt;&lt;br /&gt;Rather than publish on a &lt;a href="http://bimport.blogspot.com/2009/09/nfs-always-bites-you.html"&gt;NAS&lt;/a&gt; which creates a &lt;a href="http://en.wikipedia.org/wiki/Single_Point_of_Failure"&gt;single point of failure&lt;/a&gt; headache in Amazon's cloud, you &lt;i&gt;might&lt;/i&gt; choose to &lt;a href="http://bimport.blogspot.com/2009/09/using-unison-for-ha-docroot-instead-of.html"&gt;synchronise published folders on page servers with Unison&lt;/a&gt;. That approach is fine for a system which does a limited amount of publishing, but feels wrong for a busy MT system and I expect will prove to benchmark poorly. I haven't tested this, but &lt;a href="http://www.cis.upenn.edu/%7Ebcpierce/unison/"&gt;Unison&lt;/a&gt; crawls through the entire directory tree, which has got to be expensive if the tree is substantial. Let's take it as read that Unison is &lt;i&gt;expensive&lt;/i&gt; for a large blog server deployment.&lt;br /&gt;&lt;br /&gt;It is much better to use a message queue which has been tipped off about published pages, and what better queue to use for that purpose than one that already exists? MT queues jobs in the &lt;a href="http://www.movabletype.org/documentation/developer/schwartz-workers.html"&gt;Schwartz&lt;/a&gt; message queue, and there is a hook already in place to get it to do so to get &lt;a href="http://www.movabletype.org/documentation/appendices/config-directives/synctarget.html"&gt;rsync to broadcast published&lt;/a&gt; files. See Byrne Reese's &lt;a href="http://www.majordojo.com/2009/03/an-open-source-movable-type-operations-manual.php"&gt;MT Operations Manual&lt;/a&gt; about this (look for RSync).&lt;br /&gt;&lt;br /&gt;Let's consider this in a &lt;i&gt;scaled out&lt;/i&gt; &lt;acronym title="High Availability"&gt;HA&lt;/acronym&gt; deployment. 
MT describe an &lt;a href="http://www.movabletype.org/documentation/enterprise/system-architecture.html%20"&gt;advanced configuration&lt;/a&gt; where Publishers are separate from Page Servers. That is a good model to work from for a deployment with elastic scaling.&lt;br /&gt;&lt;br /&gt;A Publisher publishes a file to its local file system. A Publisher may be doing this because it picked up the publishing job from the Schwartz message queue in the first place. It could be a dedicated host handling publish requests from the queue. Alternatively the Publisher may be local to the &lt;acronym title="This is the possibly dedicated server(s) that blog authors use for content management"&gt;App Server&lt;/acronym&gt;, because the blog has been set up to use static (i.e. immediate) publishing. It is good to support static publishing as well as queued publishing, so let's assume we have to deal with either. Let's assume that comments are always published via the queue, though. Otherwise, we would also need to consider a Publisher that is synonymous with a &lt;acronym title="This is the possibly dedicated server(s) that field(s) CGI requests from users commenting to pages or entries in the blogs"&gt;Comment Server&lt;/acronym&gt;.&lt;br /&gt;&lt;br /&gt;So does the rsync request need to be fielded by the server that has done the publishing, since the file is going to be local to its file system? &lt;i&gt;That would mean that run-periodic-tasks needs to be run on all potential Publishers and MT would need to be designed to have Rsync requests picked up by all Publisher servers. That would be an elaborate and expensive way to defer a job on the local machine. &lt;/i&gt;Happily, I believe that the design is cleverer than that. 
&lt;i&gt;But I'm &lt;a href="http://www.majordojo.com/2009/01/movable-type-system-architectures.php"&gt;seeking some clarification&lt;/a&gt; from Byrne Reese on this to confirm.&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;The job processor - a Publisher that subscribes to run-periodic-tasks to pull job requests from the Schwartz queue (as opposed to a static Publisher) - is [** I believe **] able to use an rsync source from a server other than itself. The rsync destinations (&lt;a href="http://www.movabletype.org/documentation/mt41/rsync.html"&gt;SyncTarget&lt;/a&gt;) are the Page Servers. We only really care that content gets to Page Servers. Publishers in an Advanced Configuration do not need to have a complete set of the published content. Page Servers do.&lt;br /&gt;&lt;br /&gt;So what steps are involved in &lt;i&gt;scaling out&lt;/i&gt;? How do we add Page Servers to the EC2 configuration?&lt;br /&gt;&lt;ol&gt;&lt;li&gt;The first thing to do is to set up the new Page Server as a SyncTarget in all units which are configured to run-periodic-tasks in their crontabs. The Publishers that run-periodic-tasks subscribe to the Schwartz message queue and publish static content that has been queued. Similarly, but not as the same job (because content may be published immediately), these units may pick up rsync requests for specific files. We need to know that newly published content is being moved onto the new Page Server before it goes on line.&lt;/li&gt;&lt;li&gt;The next thing to do is to rsync the entire content of the docroot from another Page Server.&lt;/li&gt;&lt;li&gt;Then add the new Page Server to the &lt;a href="http://aws.amazon.com/elasticloadbalancing/"&gt;load balancer&lt;/a&gt;.&lt;/li&gt;&lt;/ol&gt;Page Server AMIs should be set up with the rsync package pre-installed along with MT. 
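&lt;br /&gt;&lt;br /&gt;As a rough sketch of step 2 in the list above, the initial copy of the docroot could be pulled onto the new Page Server with something like the following. The host name and the docroot path are placeholders of mine, not taken from a real deployment:&lt;br /&gt;&lt;pre&gt;# Run on the new Page Server.&lt;br /&gt;# pageserver1.example.com and /var/www are hypothetical.&lt;br /&gt;rsync -az --delete -e ssh pageserver1.example.com:/var/www/ /var/www/&lt;br /&gt;&lt;/pre&gt;The trailing slashes make rsync copy the contents of the directory rather than the directory itself, and --delete removes anything on the new server that the source does not have.&lt;br /&gt;&lt;br /&gt;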
Publishers also need rsync to fetch files from Publisher sources.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Update 12th September:&lt;/b&gt; It looks like there is a design flaw in MovableType 4.31 with respect to the RSync implementation in that it assumes that the file has been published locally. For a scaled out EC2 system with no NAS, this is a problem. I'm going to see if the RSync request in the Schwartz job queue has (or can be easily made to have) the host name of the host which published the file and then... perhaps... work on a patch to fetch the file from that unit, if an EC2 scale-out is what my client wants.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/212269981320258846-7900534527719272797?l=bimport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bimport.blogspot.com/feeds/7900534527719272797/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bimport.blogspot.com/2009/09/movabletype-publishing-in-ec2-with.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/7900534527719272797'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/7900534527719272797'/><link rel='alternate' type='text/html' href='http://bimport.blogspot.com/2009/09/movabletype-publishing-in-ec2-with.html' title='MovableType publishing in EC2 with dedicated Publishers and RSync'/><author><name>Rob Staveley (Tom)</name><uri>http://www.blogger.com/profile/12685264964545576535</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' 
src='http://2.bp.blogspot.com/_mmq4zdFTiOI/Sp-aftOmhZI/AAAAAAAAPJM/GSiGOc7pRsc/S220/Look-8.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-212269981320258846.post-208114226866250357</id><published>2009-09-10T10:31:00.000-07:00</published><updated>2009-10-12T04:47:46.358-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><category scheme='http://www.blogger.com/atom/ns#' term='ebs'/><category scheme='http://www.blogger.com/atom/ns#' term='bind'/><category scheme='http://www.blogger.com/atom/ns#' term='mount'/><category scheme='http://www.blogger.com/atom/ns#' term='ec2'/><category scheme='http://www.blogger.com/atom/ns#' term='ami'/><category scheme='http://www.blogger.com/atom/ns#' term='aws'/><title type='text'>Linux bind mounts to EBS in EC2 and MySQL</title><content type='html'>I've been making extensive use of &lt;a href="http://aplawrence.com/Linux/mount_bind.html"&gt;bind mounts&lt;/a&gt; lately for setting up a persistent database in &lt;a href="http://aws.amazon.com/ec2/"&gt;EC2&lt;/a&gt;, which appears to be in the root file system. Following a &lt;a href="http://developer.amazonwebservices.com/connect/entry.jspa?externalID=1663"&gt;recommendation &lt;/a&gt;by the brilliant &lt;a href="http://www.anvilon.com/"&gt;Eric Hammond&lt;/a&gt;, I put my MySQL data and config files onto an &lt;a href="http://en.wikipedia.org/wiki/XFS"&gt;XFS&lt;/a&gt;-formatted &lt;a href="http://aws.amazon.com/ebs/"&gt;EBS &lt;/a&gt;volume so that it persists.&lt;br /&gt;&lt;br /&gt;To minimise changes to set-up files, I moved the /etc/mysql, /var/lib/mysql and /var/log/mysql directories to {EBS-MOUNT}/etc/mysql, {EBS-MOUNT}/var/lib/mysql and {EBS-MOUNT}/var/log/mysql and re-created /var/lib/mysql and /var/log/mysql as empty directories, which I bind-mounted to the directories in the EBS volume. 
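&lt;br /&gt;&lt;br /&gt;A minimal sketch of that arrangement, assuming the EBS volume is mounted at /vol (my placeholder for {EBS-MOUNT}), that MySQL is stopped while the directories are moved, and that the commands are run as root:&lt;br /&gt;&lt;pre&gt;EBS_MOUNT=/vol   # hypothetical mount point of the EBS volume&lt;br /&gt;for dir in /etc/mysql /var/lib/mysql /var/log/mysql; do&lt;br /&gt;  mkdir -p "$EBS_MOUNT$(dirname "$dir")"   # parent directory on the EBS volume&lt;br /&gt;  mv "$dir" "$EBS_MOUNT$dir"               # move the real data onto the EBS volume&lt;br /&gt;  mkdir "$dir"                             # empty directory to mount over&lt;br /&gt;  mount --bind "$EBS_MOUNT$dir" "$dir"     # expose the EBS copy at the original path&lt;br /&gt;done&lt;br /&gt;&lt;/pre&gt;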
This is the same approach that Shlomo Swidler takes in his excellent &lt;a href="http://clouddevelopertips.blogspot.com/2009/08/mount-ebs-volume-created-from-snapshot.html"&gt;article&lt;/a&gt; about attaching an EBS volume at start-up. &lt;i&gt;However, I made the mistake of putting my mounts in the ephemeral storage on /mnt, which was a bad plan with hindsight, because /mnt itself is mounted separately on /dev/sdb and isn't something you bundle with an AMI.&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;I noted my set up procedure, but it occurred to me that unlike soft symlinks to directories, directories which are bind-mounted are not all that obvious when you are trying to remember which you handled in that way, and you don't want to dig through notes. In a wiki by Chris Siebenmann, I found an &lt;a href="http://utcc.utoronto.ca/%7Ecks/space/blog/linux/BindMounts"&gt;article&lt;/a&gt; explaining that /etc/mtab shows the bind mounts, but it isn't main-stream system administration.&lt;br /&gt;&lt;br /&gt;I was curious to see what would happen if /etc/mtab was bundled in an AMI, which was booted without the benefit of Shlomo's recipe for attaching an EBS on start-up. What would retro-fitting the volume entail?&lt;br /&gt;&lt;br /&gt;My MySQL EBS volume was mounted in /mnt/mysql. &lt;i&gt;I don't recommend this, because the /mnt directory is itself mounted in ephemeral storage.&lt;/i&gt; The volume concerned was attached to a &lt;a href="http://aws.amazon.com/ec2/instance-types/"&gt;High CPU extra large&lt;/a&gt; 64-bit &lt;i&gt;c1.xlarge&lt;/i&gt; &lt;acronym title="alestic-64/debian-5.0-lenny-base-64-20090804.manifest.xml"&gt;Debian Lenny&lt;/acronym&gt; EC2 instance with a MySQL slave set up with &lt;a href="http://dev.mysql.com/doc/refman/5.0/en/replication-howto-slavebaseconfig.html"&gt;server-id = 2&lt;/a&gt;. To instantiate another MySQL slave, I was going to have to create a snapshot and change that ID in a fresh EBS volume created from that snapshot. 
But first, I needed to create an &lt;acronym title="Amazon Machine Image"&gt;AMI&lt;/acronym&gt; for the MySQL master/slave, because I hadn't already done this. &lt;i&gt;The initial set-up had been a quick response to a hosting emergency.&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;I followed Eric Hammond's familiar &lt;a href="http://alestic.com/2009/06/ec2-ami-bundle"&gt;procedure&lt;/a&gt; for creating an &lt;acronym title="High Availability"&gt;HA&lt;/acronym&gt; AMI for a dedicated master/slave 64-bit instance. With the MySQL config in my master or slave EBS volume and the root filesystem simply pointing to these, there ought to be no differences between a master and a slave AMI.&lt;br /&gt;&lt;br /&gt;It occurred to me, however, that when I was bundling my AMI, I needed to exclude the very directories that I was concerned about. Otherwise, the bundling process would bundle /etc/mysql, /var/lib/mysql and /var/log/mysql into the AMI unaware of the fact that this data lives in the EBS volume.&lt;br /&gt;&lt;br /&gt;So here's my bundle volume command: &lt;br /&gt;&lt;pre&gt;sudo -E ec2-bundle-vol           \&lt;br /&gt;-r $arch                       \&lt;br /&gt;-d /mnt                        \&lt;br /&gt;-p $prefix                     \&lt;br /&gt;-u $AWS_USER_ID                \&lt;br /&gt;-k /root/.ec2/pk-*.pem         \&lt;br /&gt;-c /root/.ec2/cert-*.pem       \&lt;br /&gt;-s 10240                       \&lt;br /&gt;-e /mnt,/tmp,/root/.ssh,/root/.ec2,/var/log/mysql,/var/lib/mysql,/etc/mysql&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Unlike Eric Hammond, I've been keeping &lt;acronym title="Amazon Web Services"&gt;AWS&lt;/acronym&gt; authentication keys in /root/.ec2, but otherwise it is only /etc/mysql, /var/lib/mysql and /var/log/mysql that need special handling.&lt;br /&gt;&lt;br /&gt;I hate having special handling, though. It made me think again about my use of bind-mounts. Had I been less driven by fashion, I'd simply have used soft symlinks for these directories. 
Directories have to use soft symlinks and links between mounts have to be soft anyhow.&lt;br /&gt;&lt;br /&gt;Had I used soft symlinks:&lt;br /&gt;&lt;ul&gt;&lt;li&gt; I wouldn't have obfuscated things and ls -l&amp;nbsp; or find . -type l would have shown the symlinks and made it clear what I was doing. AN Other would have had a better time with that than having to rummage around in /etc/mtab to see bind mounts.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;I wouldn't have needed a special procedure for the bundle volume, cleverly excluding my bind mounts, because soft symlinks wouldn't have been followed and the MySQL data and configuration files on my EBS volume wouldn't have been bundled into the AMI, if I omitted the exclude.&lt;/li&gt;&lt;/ul&gt;So I ask myself, &lt;i&gt;What good reason is there to use bind mounts, when good old soft symlinks will do?&lt;/i&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/212269981320258846-208114226866250357?l=bimport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bimport.blogspot.com/feeds/208114226866250357/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bimport.blogspot.com/2009/09/linux-bind-mounts-to-ebs-in-ec2-and.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/208114226866250357'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/208114226866250357'/><link rel='alternate' type='text/html' href='http://bimport.blogspot.com/2009/09/linux-bind-mounts-to-ebs-in-ec2-and.html' title='Linux bind mounts to EBS in EC2 and MySQL'/><author><name>Rob Staveley (Tom)</name><uri>http://www.blogger.com/profile/12685264964545576535</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://2.bp.blogspot.com/_mmq4zdFTiOI/Sp-aftOmhZI/AAAAAAAAPJM/GSiGOc7pRsc/S220/Look-8.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-212269981320258846.post-5037248758806547789</id><published>2009-09-09T06:40:00.000-07:00</published><updated>2009-09-18T03:18:10.500-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='static publishing'/><category scheme='http://www.blogger.com/atom/ns#' term='rsync'/><category scheme='http://www.blogger.com/atom/ns#' term='nfs'/><category scheme='http://www.blogger.com/atom/ns#' term='unison'/><title type='text'>Using Unison for a HA docroot instead of NFS</title><content type='html'>I thought that synchronising a docroot between two "masters" in an &lt;a href="http://en.wikipedia.org/wiki/High_availability" title="High Availability"&gt;HA&lt;/a&gt; web site, would require running &lt;a href="http://samba.anu.edu.au/rsync/"&gt;rsync &lt;/a&gt;at both ends. There is something ugly about that, and like master-master replication in MySQL it is bound to be inefficient.&lt;br /&gt;&lt;br /&gt;I'm not quite sure why I overlooked &lt;a href="http://www.cis.upenn.edu/%7Ebcpierce/unison/"&gt;unison&lt;/a&gt;. There is a &lt;a href="http://blog.melimato.com/keeping-directories-in-sync-with-unison/"&gt;succinct article&lt;/a&gt; written by Pablo, which put me straight and I like this approach. Unison is genuinely bidirectional and easy to install from package managers.&lt;br /&gt;&lt;br /&gt;My mindset is this:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;I have a database-driven site, and only the database really needs to be bang up to date. 
I've scaled my database "up" rather than out, because I can get away with avoiding clustering and my database can cope with lots of web page server connections.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Static content published in the docroot of my web page servers ought to be current, but it can lag by 5 minutes on different nodes.&lt;/li&gt;&lt;li&gt;I'm really not keen on using NFS, because you are stuck with something expensive if you aren't going to wind up with a single point of failure. I'd rather publish locally and use synchronisation to keep all of my nodes up to date with published content.&lt;/li&gt;&lt;li&gt;Content that is published can have a crude conflict resolution. If a file is published simultaneously on two nodes, the nodes should &lt;i&gt;prefer&lt;/i&gt; their local copy. It isn't a big problem that one node may be out of date, because the next update is bound to propagate anyhow.&lt;/li&gt;&lt;/ul&gt;With this mindset, I have a quick and simple Unison profile, which prefers local copies when there are conflicts and otherwise propagates writes to all nodes. 
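&lt;br /&gt;&lt;br /&gt;Something along these lines would do, as a sketch only: the profile name, host name and paths are invented for illustration, and the options should be checked against the Unison manual for your version.&lt;br /&gt;&lt;pre&gt;# ~/.unison/docroot.prf (hypothetical profile)&lt;br /&gt;root = /var/www&lt;br /&gt;root = ssh://node2.example.com//var/www&lt;br /&gt;batch = true&lt;br /&gt;prefer = /var/www&lt;br /&gt;times = true&lt;br /&gt;&lt;/pre&gt;Run from cron every few minutes as unison docroot, a profile like this resolves conflicts in favour of the local root and otherwise propagates changes in both directions without prompting.&lt;br /&gt;&lt;br /&gt;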
The strategy isn't good enough for a database, but it is fine for published content.&lt;br /&gt;&lt;br /&gt;So unlike my database, I scale "out" my web page servers and avoid the misery of NFS.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/212269981320258846-5037248758806547789?l=bimport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bimport.blogspot.com/feeds/5037248758806547789/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bimport.blogspot.com/2009/09/using-unison-for-ha-docroot-instead-of.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/5037248758806547789'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/5037248758806547789'/><link rel='alternate' type='text/html' href='http://bimport.blogspot.com/2009/09/using-unison-for-ha-docroot-instead-of.html' title='Using Unison for a HA docroot instead of NFS'/><author><name>Rob Staveley (Tom)</name><uri>http://www.blogger.com/profile/12685264964545576535</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://2.bp.blogspot.com/_mmq4zdFTiOI/Sp-aftOmhZI/AAAAAAAAPJM/GSiGOc7pRsc/S220/Look-8.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-212269981320258846.post-7243026315802247099</id><published>2009-09-07T10:56:00.000-07:00</published><updated>2009-09-24T06:13:26.164-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='dd'/><category scheme='http://www.blogger.com/atom/ns#' term='persistentfs'/><category scheme='http://www.blogger.com/atom/ns#' term='debian'/><category scheme='http://www.blogger.com/atom/ns#' term='acl'/><category 
scheme='http://www.blogger.com/atom/ns#' term='ami'/><category scheme='http://www.blogger.com/atom/ns#' term='aws'/><category scheme='http://www.blogger.com/atom/ns#' term='pivot'/><category scheme='http://www.blogger.com/atom/ns#' term='xfs'/><category scheme='http://www.blogger.com/atom/ns#' term='ebs'/><category scheme='http://www.blogger.com/atom/ns#' term='s3'/><category scheme='http://www.blogger.com/atom/ns#' term='ec2'/><category scheme='http://www.blogger.com/atom/ns#' term='ext3'/><category scheme='http://www.blogger.com/atom/ns#' term='snapshot'/><title type='text'>Sharing a DEV environment in AWS</title><content type='html'>If you are familiar with &lt;a href="http://aws.amazon.com/"&gt;Amazon's Web Services&lt;/a&gt; and its flagship service &lt;a href="http://aws.amazon.com/ec2/"&gt;EC2&lt;/a&gt;, you'll be familiar with the plethora of &lt;a href="http://developer.amazonwebservices.com/connect/kbcategory.jspa?categoryID=171"&gt;public AMIs&lt;/a&gt; which can get you kicked off with a pre-installed system in the cloud. &lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.anvilon.com/"&gt;Someone very clever&lt;/a&gt; has hopefully gone to the trouble of creating a base installation of your favourite Linux distro – say &lt;a href="http://alestic.com/"&gt;Debian Lenny 5.0&lt;/a&gt;. That gives you the bare bones of a system, including a package manager, which lets you add the packages which your application depends upon. Someone else may have gone to the trouble of taking that AMI and adding packages to it so that it is now (say) a basic Debian-based web server with PHP support or with Tomcat for Java. They then bundle and register another AMI. If they make their AMI public or make it accessible to you through an &lt;a href="http://docs.amazonwebservices.com/AmazonS3/latest/index.html?S3_ACLs.html%60"&gt;ACL&lt;/a&gt;, you have something nicer to start with for your project. 
&lt;br /&gt;&lt;br /&gt;Something you very soon come to realise, though, is that your system is volatile. While your instance is up and running, you may build upon your application and its data, but when it is terminated, the fruit of your efforts is lost. Your instance runs on commodity hardware, and you should not depend on the hardware not to crash.&lt;br /&gt;&lt;br /&gt;So, if your application is worth its salt, you should &lt;a href="http://alestic.com/2009/06/ec2-ami-bundle"&gt;re-bundle and register your own AMI&lt;/a&gt; from it. It is a relatively big deal doing this and you don't really want to do this too frequently. The AMI is stored in Amazon's robust &lt;a href="http://aws.amazon.com/s3/"&gt;S3&lt;/a&gt; simple storage service. You can set up an ACL to make it accessible to as many or as few people as you want and distribute it to your clients, colleagues or collaborators in the general public in that way. That way you don't have to re-install all of your dependencies when you re-launch your AMI. This is fine but the process can take ~30 minutes and it is tempting to fail to save changes when things take that long.&lt;br /&gt;&lt;br /&gt;If your data is of value, you can save it independently in an S3 bucket. And make it accessible to as many or as few as you like, using ACLs.&lt;br /&gt;&lt;br /&gt;If you have a lot of data files, which you want to preserve, you should consider mounting a directory with persistent storage. &lt;br /&gt;&lt;br /&gt;There are two alternative approaches:&lt;br /&gt;&lt;br /&gt;&lt;ol&gt;&lt;li&gt;You can set up a &lt;a href="http://www.persistentfs.com/"&gt;PersistentFS&lt;/a&gt; persistent file system in S3. This is fairly quick to &lt;a href="http://www.persistentfs.com/documentation/Installation"&gt;set up&lt;/a&gt;. Data persists in S3 objects as tarballs representing the block device - i.e. you don't see individual files in S3, so this is not a convenient way to publish static data. 
These S3 objects are &lt;a href="http://www.persistentfs.com/documentation/AppNotes/Data_Archiving"&gt;efficiently written to&lt;/a&gt; from a cache file which is local to the EC2 instance. This is efficient for most purposes but inefficient for I/O intensive data - e.g. a database.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;You can set up an &lt;a href="http://aws.amazon.com/ebs/"&gt;EBS&lt;/a&gt; volume for data, which is local to the EC2 instance's availability zone, but which has a lifetime which is independent of any EC2 instance. EBS volumes are block devices, which may be attached to any one EC2 instance in the same availability zone, and can be high level formatted with (say) an ext3 or XFS file system. When the attached EC2 instance is terminated, the EBS block persists, but it exists as a single instance and should therefore be backed up onto S3 if the data is valuable. There is a convenient mechanism for creating "snapshots" of EBS volumes into S3. These can be converted back into new EBS volumes in any availability zone of the region.&lt;br /&gt;&lt;/li&gt;&lt;/ol&gt;The relative merits of each approach are covered by PersistentFS.com in &lt;a href="http://developer.amazonwebservices.com/connect/message.jspa?messageID=98786"&gt;a forum post&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;For a development environment, EBS seems like a natural mount to work with, because things change around a lot and creating snapshots of EBS volumes is rather like committing changes to a repository, when things are stable. You can create snapshots of versions.&lt;br /&gt;&lt;br /&gt;A very clever extension to this approach is presented in an &lt;a href="http://clouddevelopertips.blogspot.com/2009/07/boot-ec2-instances-from-ebs.html"&gt;excellent article&lt;/a&gt; by &lt;a href="http://www.blogger.com/profile/10469902663120418195"&gt;Shlomo Swidler&lt;/a&gt;, where he proposes mounting an EBS volume with a root file system on it. 
Linux has the &lt;a href="http://linux.die.net/man/8/pivot_root"&gt;ability to change its root file system&lt;/a&gt; after booting. Shlomo's article was written after this idea was mooted and expanded upon in a &lt;a href="http://developer.amazonwebservices.com/connect/thread.jspa?threadID=24091"&gt;developer discussion&lt;/a&gt;, which he credits in his article. Shlomo describes how to make a self-contained EBS volume, with your complete development system, which is "pivot-booted" from a "bare bones" AMI. Shlomo's preferred poison is the Alestic Ubuntu AMI, and he creates the bootable EBS volume and a pivot-boot AMI from it. I created a bootable EBS volume from an Alestic Debian AMI following his instructions, but struggled to get my pivot-boot working. Rather than persevering with my Debian pivot-boot, though, I've been using &lt;a href="http://developer.amazonwebservices.com/connect/profile.jspa?userID=66042"&gt;N Martin&lt;/a&gt;'s public &lt;a href="http://friendfeed.com/wizardofcrowds/f04ccd65/amazon-web-services-developer-community"&gt;pivot-boot AMI from nimlabs&lt;/a&gt;, which is &lt;a href="http://developer.amazonwebservices.com/connect/message.jspa?messageID=99009"&gt;described&lt;/a&gt; in the developer discussion. I don't care which distro N Martin uses for the pivot boot AMI (actually it is Ubuntu Hardy), because my distro prepared using Shlomo's instructions applied to Debian Lenny is boot-strapped from my EBS volume. I do care that N Martin set up his pivot-boot to expect a root file system in /dev/sdj, because that's where I need to attach my bootable EBS volume.&lt;br /&gt;&lt;br /&gt;So I now have my entire DEV environment on an EBS volume, which Amazon charges $0.10 per GB month for. I minimise my EC2 usage by terminating my EC2 instance when I go home. When I go on holiday or want to put a project on ice, I create a snapshot and kill my EBS volume, because data is more durable in S3 than on an EBS volume. 
It also costs less in S3, because despite being charged $0.15 per GB month, the volume is compressed and S3 snapshots store only incremental changes.&lt;br /&gt;&lt;br /&gt;So I want my EBS volume to be as small as possible and I have the lack of foresight to start with a 1G volume. My project dependencies increase - my package manager has been busy - and I outgrow that volume. What do I do now?&lt;br /&gt;&lt;br /&gt;I do the following in Elasticfox (a Firefox extension):&lt;br /&gt;&lt;ol&gt;&lt;li&gt;I shut down my EC2 instance to create a snapshot of my 1G EBS volume. Why shut it down to create a snapshot? Because it is a root file system and I don't want to snapshot it while it is busy or the snapshot will be unbootable. If it is XFS, xfs_freeze cannot be used on the root file system, and if it is ext3 freezing is complicated and is bound not to work on a root file system anyhow. My EBS volume is high level formatted with an ext3 file system. That means I need to un-mount the volume before creating a snapshot, and that means terminating the EC2 instance.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;I then create the snapshot.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;I create a 2G EBS volume from the snapshot. &lt;br /&gt;&lt;/li&gt;&lt;li&gt;I start up an EC2 instance. This doesn't need to be a pivot boot instance, because we are going to use it to grow the ext3 file system; it should not boot from the 2G EBS volume at this point. Rather than choosing my usual Debian Lenny 5 base AMI, I pivot-boot off the 1G EBS volume, which I attach as /dev/sdj (N Martin's preference).&lt;br /&gt;&lt;/li&gt;&lt;li&gt;When the instance is up, I attach the new 2G volume to it as /dev/sdk – i.e. something other than /dev/sdj.&lt;br /&gt;&lt;/li&gt;&lt;li&gt; &lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;div&gt;I don't mount /dev/sdk. 
I work with the EBS block device as follows:&lt;br /&gt;&lt;/div&gt;&lt;ol&gt;&lt;li&gt;I check the ext3 file system on the block device: e2fsck -f /dev/sdk&lt;br /&gt;&lt;/li&gt;&lt;li&gt;I grow the file system to fill the 2G volume: resize2fs /dev/sdk&lt;br /&gt;&lt;/li&gt;&lt;/ol&gt;&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;I can terminate my EC2 instance now, though I don't &lt;i&gt;need&lt;/i&gt; to do this to create a snapshot of the "grown" 2G volume to supersede the snapshot of the 1G volume, because the EBS volume is not mounted and therefore is not busy.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;I detach the 2G volume from /dev/sdk on the EC2 instance, if it is still running.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;I use EBS volumes created from a 2G snapshot henceforth and attach it to /dev/sdj of N Martin's pivot boot AMI instance. &lt;br /&gt;&lt;/li&gt;&lt;/ol&gt;Now I have a 2G development system for my project, a procedure for growing it if needs be and a procedure for creating robust snapshots in S3.&lt;br /&gt;&lt;br /&gt;I need to share my development system and I hit an irritating omission in AWS's current system. Unlike the manifests and parts of AMIs, EBS snapshots are not regular S3 objects that appear in buckets which you specify. That means you cannot set up ACLs to allow specific users or the general public to access your snapshots. This seems arbitrary and I'm sure Amazon will address this oversight, but in the interim you have to do the following to share your EBS snapshot with others so that you can share your development system, using Elasticfox to work EC2 and S3Fox to work with the S3 buckets and object ACLs:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Important... Make sure that your root file system doesn't contain anything that you do not want to share. If you have your authentication details in /mnt, you are OK. 
However, if they are in /root/.ec2 on the booted EBS, you need to beware that this will be distributed, and should be moved away before creating a snapshot. You will be fine if you followed Shlomo's advice: "The certificate and private key credentials should be copied to the instance in the &lt;span style="font-family: Courier New; font-size: 10pt;"&gt;/mnt&lt;/span&gt; partition so they don't get bundled into the image and copied to the bootable EBS volume."&lt;br /&gt;&lt;/li&gt;&lt;li&gt;You need to start an EC2 instance, if you do not already have one running. The instance does not need to be a pivot boot, but you will want to install some S3 tools onto it to upload the file to S3: a great choice is &lt;a href="http://s3tools.org/s3cmd"&gt;s3cmd&lt;/a&gt;. If you have this on your development system, you can launch that with the pivot boot as usual – you will be using a 2&lt;sup&gt;nd&lt;/sup&gt; EBS volume if you do.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;You should create an EBS volume from your latest snapshot. &lt;br /&gt;&lt;/li&gt;&lt;li&gt;Attach the EBS volume to (say) /dev/sdk&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Copy the contents of that block device into a file in /mnt, using dd. 
If it is a 2G volume, you will create a 2G file:  &lt;span style="font-family: Courier New; font-size: 10pt;"&gt;dd if=/dev/sdk of=/mnt/x.dat&lt;/span&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Compress the file - this compressed to 90% of its size for me:  &lt;span style="font-family: Courier New; font-size: 10pt;"&gt;gzip /mnt/x.dat&lt;/span&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Upload the compressed file into an S3 bucket.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Set up the ACL in the S3 bucket so that the file may be read by everyone you want to read it.&lt;br /&gt;&lt;/li&gt;&lt;/ol&gt;The person wishing to use your snapshot should:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Instantiate a bare bones EC2 instance&lt;span style="font-family: Courier New; font-size: 10pt;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;Attach a new 2G EBS volume on (say) /dev/sdk&lt;span style="font-family: Courier New; font-size: 10pt;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;Fetch the x.dat.gz from your bucket, using (say) s3cmd. 
A good destination for this is /mnt.&lt;span style="font-family: Courier New; font-size: 10pt;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;Uncompress:  &lt;span style="font-family: Courier New;"&gt;gunzip /mnt/x.dat.gz&lt;span style="font-size: 10pt;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;Copy the contents of the uncompressed file onto the block device: &lt;span style="font-family: Courier New; font-size: 10pt;"&gt;dd if=/mnt/x.dat of=/dev/sdk&lt;br /&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;Terminate the EC2 instance or detach the EBS volume from it&lt;span style="font-family: Courier New; font-size: 10pt;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;Start an instance of N Martin's public pivot boot AMI ami-2feb0f46 nimlabs/pivot-sdj-20080824.manifest.xml&lt;span style="font-family: Courier New; font-size: 10pt;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;Attach the EBS volume as /dev/sdj &lt;span style="font-family: Courier New; font-size: 10pt;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;Wait for it to boot – just like you &lt;span style="font-family: Courier New; font-size: 10pt;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ol&gt;All of this is, I expect, blindingly obvious to Shlomo et al, and I hope the EBS sharing requirement will soon become obsolete (I know that Shlomo is working on the Amazon folk to that effect), but that's my recipe for now for sharing a nifty development system without sharing your AWS account.&lt;br /&gt;&lt;br /&gt;A production system is an altogether different kettle of fish.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Update: 24th September&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;EBS volumes may now be shared, so no need to mess around directly with S3.&lt;br /&gt;&lt;br /&gt;Here's what they now say:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;Snapshots can be shared using AWS Management Console or using API calls. 
You have full control over what accounts you share each snapshot with, including the option to share it with the entire AWS community.    &lt;br /&gt;&lt;h3&gt;AWS Management Console:&lt;/h3&gt;&lt;ol&gt;&lt;li&gt;Log in to the AWS Management Console, click the &lt;b&gt;Amazon EC2&lt;/b&gt; tab and click &lt;i&gt;Snapshots&lt;/i&gt; on the left navigation pane &lt;/li&gt;&lt;li&gt;Right-click on the snapshot you wish to share and select &lt;i&gt;Snapshot permissions&lt;/i&gt;&lt;/li&gt;&lt;li&gt;Add the AWS account numbers of the developers who you want to grant access or share it publicly &lt;/li&gt;&lt;li&gt;Hit &lt;b&gt;Save&lt;/b&gt; to apply the permissions&lt;/li&gt;&lt;/ol&gt;&lt;/blockquote&gt;More on this &lt;a href="http://clouddevelopertips.blogspot.com/2009/09/cool-things-you-can-do-with-shared-ebs.html"&gt;here&lt;/a&gt;. &lt;br /&gt;&lt;br /&gt;That makes a shared DEV environment more of a snap, doesn't it?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/212269981320258846-7243026315802247099?l=bimport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bimport.blogspot.com/feeds/7243026315802247099/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bimport.blogspot.com/2009/09/sharing-dev-environment-in-aws.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/7243026315802247099'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/7243026315802247099'/><link rel='alternate' type='text/html' href='http://bimport.blogspot.com/2009/09/sharing-dev-environment-in-aws.html' title='Sharing a DEV environment in AWS'/><author><name>Rob Staveley 
(Tom)</name><uri>http://www.blogger.com/profile/12685264964545576535</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://2.bp.blogspot.com/_mmq4zdFTiOI/Sp-aftOmhZI/AAAAAAAAPJM/GSiGOc7pRsc/S220/Look-8.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-212269981320258846.post-8874692517921366145</id><published>2009-09-03T15:00:00.001-07:00</published><updated>2009-09-09T07:32:59.400-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='nas'/><category scheme='http://www.blogger.com/atom/ns#' term='nfs'/><title type='text'>NFS always bites you</title><content type='html'>When I was working for the forerunner of &lt;a href="http://www.emailsystems.com/"&gt;a company&lt;/a&gt; that runs managed services for e-mail filtering, the system administrators hated NFS with a vengeance. No doubt that was because they literally lost sleep when alert systems indicated that NFS had gone down in one data centre or the other.&lt;br /&gt;&lt;br /&gt;Once again, I've found myself trying to figure out why three servers in a production cluster have terrible update times. Once again, it was NFS that was letting us down. If my understanding is right, the NetApp cluster that's used as a NAS was rebooted at some time and no one noticed that rpc.statd had died on those hosts. 
My guess is that rpc.lockd gets its knickers in a twist about lock recovery and that somehow makes the NFS client go slowly for writes.&lt;br /&gt;&lt;br /&gt;I'm certainly not a SysAdmin, and no doubt my read of the situation is na&lt;span style="font-family: Times New Roman; font-size: 12pt;"&gt;ï&lt;/span&gt;ve, but one way or another I reckon that we are paying the price of the &lt;i&gt;simplicity &lt;/i&gt;of NFS.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/212269981320258846-8874692517921366145?l=bimport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bimport.blogspot.com/feeds/8874692517921366145/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bimport.blogspot.com/2009/09/nfs-always-bites-you.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/8874692517921366145'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/8874692517921366145'/><link rel='alternate' type='text/html' href='http://bimport.blogspot.com/2009/09/nfs-always-bites-you.html' title='NFS always bites you'/><author><name>Rob Staveley (Tom)</name><uri>http://www.blogger.com/profile/12685264964545576535</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://2.bp.blogspot.com/_mmq4zdFTiOI/Sp-aftOmhZI/AAAAAAAAPJM/GSiGOc7pRsc/S220/Look-8.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-212269981320258846.post-1438842564841873729</id><published>2009-09-03T03:49:00.001-07:00</published><updated>2009-09-03T10:30:21.260-07:00</updated><title type='text'>Why isn’t this MovableType?</title><content type='html'>&lt;span xmlns=''&gt;&lt;p&gt;I've been working 
extensively with MovableType, which ticks a lot of boxes for me.&lt;br /&gt;&lt;/p&gt;&lt;p&gt;These are the main points for me:&lt;br /&gt;&lt;/p&gt;&lt;ol&gt;&lt;li&gt;My primary &lt;a href='http://www.rainbow-media.com/'&gt;client&lt;/a&gt; uses MT as its primary CMS. [That is a good enough reason on its own.]&lt;br /&gt;&lt;/li&gt;&lt;li&gt;It scales nicely, publishing static content.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;It allows a software developer to get his hands dirty, using Perl, which I like for irrational reasons.&lt;br /&gt;&lt;/li&gt;&lt;/ol&gt;&lt;p&gt;So why aren't I using this now?&lt;br /&gt;&lt;/p&gt;&lt;p&gt;Well... Google provides free hosting and it is good to see other technologies, if I'm going to improve what I do with MT.&lt;br /&gt;&lt;/p&gt;&lt;p&gt;I am, for example, trying to get AtomPub to work properly, and was curious to try out the Microsoft Word client. This post comes from a Word 2007 blog post. The integration with Blogger looks to be mature, and I've had &lt;a href='http://forums.movabletype.org/2009/09/atompub-client-update-fails.html'&gt;teething problems&lt;/a&gt; with MT, using a &lt;a href='http://search.cpan.org/~miyagawa/XML-Atom/'&gt;Perl client&lt;/a&gt;. This post should be proof of concept for the Word 2007 AtomPub implementation – not that I can see a compelling reason to use that as a UI under normal circumstances!&lt;br /&gt;&lt;/p&gt;&lt;p&gt;As part of my experiment, I want to see if adding this paragraph in Word - having previously published the entry - makes the post update rather than appending a new entry. 
If this gets repetitive, we have a thumbs down!&lt;/p&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/212269981320258846-1438842564841873729?l=bimport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bimport.blogspot.com/feeds/1438842564841873729/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bimport.blogspot.com/2009/09/why-isnt-this-movabletype.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/1438842564841873729'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/1438842564841873729'/><link rel='alternate' type='text/html' href='http://bimport.blogspot.com/2009/09/why-isnt-this-movabletype.html' title='Why isn’t this MovableType?'/><author><name>Rob Staveley (Tom)</name><uri>http://www.blogger.com/profile/12685264964545576535</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://2.bp.blogspot.com/_mmq4zdFTiOI/Sp-aftOmhZI/AAAAAAAAPJM/GSiGOc7pRsc/S220/Look-8.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-212269981320258846.post-698874480389957897</id><published>2009-09-03T03:25:00.000-07:00</published><updated>2009-09-03T03:28:46.749-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='kiss'/><title type='text'>My apprenticeship</title><content type='html'>My first IT employer described what he did as &lt;i&gt;the trailing edge&lt;/i&gt; of technology. 
Trail blazers had beaten down a path and made massive profits from being the first to provide the world with the dubious benefits of &lt;a href="http://en.wikipedia.org/wiki/Interactive_voice_response"&gt;IVR &lt;/a&gt;technology for &lt;a href="http://en.wikipedia.org/wiki/Premium-rate_telephone_number"&gt;premium rate&lt;/a&gt; telephone services. His aim was to mop up the more complicated work, which they didn't have the time or inclination to tackle, and perhaps do simple things better. He did very well at that, and my apprenticeship was a privileged one. He was conservative about change and a strong believer in &lt;acronym title="Keep It Simple Stupid"&gt;KISS&lt;/acronym&gt;.&lt;br /&gt;&lt;br /&gt;He was still using the &lt;a href="http://en.wikipedia.org/wiki/Turbo_C"&gt;1987 TurboC compiler&lt;/a&gt; in the mid '90s, because he had such an in-depth understanding of its foibles. You could fit the compiler, an editor, Norton Utilities and a copy of LANtastic on a 720K diskette, which fit nicely into your pocket (unlike the 360K ones, which corrupted if you sneezed), and you were set up to trouble-shoot any problems in the early '90s, as long as you were dealing with DOS.&lt;br /&gt;&lt;br /&gt;I failed utterly to convince him about the virtues of C++ and the benefits of &lt;acronym title="Object Oriented Programming"&gt;OOP&lt;/acronym&gt;. 
While I have subsequently prospered adopting new technologies and trusting new libraries, which he would have steered clear of, I still hear echoes of his disdainful wisdom, when adopting a new technology for the sake of being fashionable or short-sighted expediency causes a project to collapse.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/212269981320258846-698874480389957897?l=bimport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bimport.blogspot.com/feeds/698874480389957897/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bimport.blogspot.com/2009/09/my-apprenticeship.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/698874480389957897'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/212269981320258846/posts/default/698874480389957897'/><link rel='alternate' type='text/html' href='http://bimport.blogspot.com/2009/09/my-apprenticeship.html' title='My apprenticeship'/><author><name>Rob Staveley (Tom)</name><uri>http://www.blogger.com/profile/12685264964545576535</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://2.bp.blogspot.com/_mmq4zdFTiOI/Sp-aftOmhZI/AAAAAAAAPJM/GSiGOc7pRsc/S220/Look-8.jpg'/></author><thr:total>0</thr:total></entry></feed>
