Blog Operations

In an earlier post, I wrote,

The whole point of a blog like this is to share everything on the site with literally anyone who wants to see it. In fact, I’ve gone through rather a lot of trouble to make sure that happens.

That got me thinking about all the bits of technology that it takes to put a blog like this on the web. I decided to make a list, and it turns out there are a lot of pieces. To start with, there’s the main WordPress installation:

  • WordPress — the blogging engine that powers it all.

The look and feel comes from the WordPress theme I use, which is made up of about four big pieces:

I’ve customized Skeleton to produce the theme I use for Windypundit. I use the SCSS preprocessor to do some of the math to make the look and feel more tunable.
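To give a flavor of that math, here's a small Python sketch of fluid-grid arithmetic, with Python standing in for SCSS. It assumes a 12-column grid with 4% gutters, roughly the shape of Skeleton's grid, and the `.span-N` class names are made up for illustration. Every width derives from two numbers, so tuning the layout means changing one parameter instead of recomputing a dozen percentages by hand.

```python
def column_width(span: int, columns: int = 12, gutter_pct: float = 4.0) -> float:
    """Percentage width of an element that spans `span` grid columns.

    Assumes a fluid grid in which `columns` columns plus the (columns - 1)
    gutters between them exactly fill 100% of the container.
    """
    if not 1 <= span <= columns:
        raise ValueError("span must be between 1 and the column count")
    single = (100.0 - (columns - 1) * gutter_pct) / columns
    # A span covers `span` columns plus the (span - 1) gutters inside it.
    return span * single + (span - 1) * gutter_pct


def grid_css(columns: int = 12, gutter_pct: float = 4.0) -> str:
    """Emit one width rule per span, the way an SCSS @for loop might."""
    rules = []
    for span in range(1, columns + 1):
        width = column_width(span, columns, gutter_pct)
        rules.append(f".span-{span} {{ width: {width:.4f}%; }}")
    return "\n".join(rules)
```

Calling `grid_css(12, 3.0)` regenerates every rule for a tighter gutter without touching anything else.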

Naturally, Windypundit uses a number of WordPress plugins:

  • Cookies for Comments — A simple anti-spam plugin that rejects some less intelligent spambots.
  • Google XML Sitemaps — Adds a sitemap for Google.
  • Growmap Anti Spambot Plugin — Another simple anti-spam plugin. This generates the “Confirm you are NOT a spammer” checkbox.
  • NextGEN — An image gallery management tool. I don’t use all the gallery features.
  • Search Regex — A potentially dangerous tool that allows me to search-and-replace regular expressions across all blog posts.
  • W3 Total Cache — Caching software, so pages can be served quickly without as many hits to the database.
  • WordPress HTTPS — Improves HTTPS support.
  • WordPress SEO — Makes pages a little more search-engine friendly. I only use a fraction of the features.
  • wp-jquery-lightbox — Displays photos bigger when you click on them. I use this instead of NextGEN’s viewer.
  • WP Widget Cache — Caches the widgets on pages so they can be served without re-querying the database.

I also wrote two small custom plugins:

  • Custom Dynamic Photo Resizer — A WordPress shortcode to generate resized photos for blog posts. If I want to change the size of all the photos in my layout, I just change the resizer implementation and all the images will be generated and served at the new size.
  • Custom Shortcodes — A variety of custom WordPress shortcodes to replace features I had in custom tags on Movable Type. Hardly ever used anymore.
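The idea behind the resizer is that no post hard-codes an image size. Here's a hedged Python sketch of that idea; it is not the plugin's actual PHP. A hypothetical `[photo src="..."]` shortcode expands to an `<img>` tag pointing at a resized copy, and the width lives in a single setting. The shortcode syntax, the `/resized/` URL scheme, and the 600-pixel width are all assumptions, and generating the resized files themselves is left out.

```python
import re
from typing import Optional

# Hypothetical layout setting: the one place the photo size is defined.
PHOTO_WIDTH = 600

# Matches a hypothetical shortcode such as [photo src="cats/tabby.jpg"].
SHORTCODE = re.compile(r'\[photo\s+src="([^"]+)"\s*\]')


def resized_url(src: str, width: int) -> str:
    """Map an original image path to the URL of its resized copy (assumed scheme)."""
    stem, dot, ext = src.rpartition(".")
    if not dot or "/" in ext:  # no file extension on the last path segment
        return f"/resized/{src}-{width}w"
    return f"/resized/{stem}-{width}w.{ext}"


def expand_photos(post_html: str, width: Optional[int] = None) -> str:
    """Replace every photo shortcode with an <img> tag at the current width."""
    if width is None:
        width = PHOTO_WIDTH  # read at call time, so changing the setting takes effect
    return SHORTCODE.sub(
        lambda m: f'<img src="{resized_url(m.group(1), width)}" width="{width}">',
        post_html,
    )
```

Because the shortcode is expanded when the page is rendered, changing `PHOTO_WIDTH` changes every photo on the site without editing a single old post.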

I also make use of a bunch of web-based services which are integrated via WordPress plugins:

All advertising content on the site comes from

Windypundit is hosted on a smallish Virtual Private Server from the folks at A Small Orange. I use a VPS instead of shared hosting because I host a number of other websites and because I wanted to learn about running a website on a VPS.

The server is running a standard LAMP stack:

  • Linux — The operating system. It’s the CentOS distro.
  • Apache — The web server.
  • MySQL — The relational database that holds the WordPress content.
  • PHP — The scripting language that ties it all together. WordPress is written in PHP, along with all of its plugins.

A few other bits run on the server as well:

  • nginx — Running as a reverse proxy for improved caching.
  • cPanel/WHM — The web-based hosting management software.

As I said, it’s actually a virtual server, meaning that A Small Orange has carved out a couple of processor cores and some memory from a larger server. There’s also a SAN array for disk storage. All hosted sites are backed up nightly to Amazon S3 and then rolled over to Glacier.

The servers and disk arrays are located in (I think) TierPoint’s 68,000-square-foot data center in Dallas, Texas. The data center is SSAE 16 SOC 1 certified, with fully redundant HVAC, 2N redundant power, and six backup diesel generators fed from a 50,000-gallon fuel reserve. The facility is multi-homed and carrier neutral with connections to Level3, Abovenet, TimeWarner, Suddenlink, Cogent, Global Crossing, and IP Transit. I’m sure there are Fortune 500 companies hosted in the same data center.

Only the main HTML page of Windypundit comes from that server. All of the embedded static components — images, style sheets, Javascript — are served from cached copies in 31 data centers all over the world in the Cloudflare content distribution network so that pages load more quickly.
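Which requests go to the edge and which go back to my server is typically decided by simple rules. This Python sketch assumes an extension-based rule, which is a common default for CDNs like Cloudflare; the extension set below is a short illustrative subset, not Cloudflare's actual list.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# Illustrative subset of "static" extensions; not any CDN's real configuration.
STATIC_EXTENSIONS = {".css", ".js", ".jpg", ".jpeg", ".png", ".gif", ".svg", ".ico", ".woff"}


def served_from_edge(url: str) -> bool:
    """True if an extension-based CDN rule would cache this URL at the edge."""
    path = urlsplit(url).path  # drops the query string, e.g. ?ver=3.9
    return PurePosixPath(path).suffix.lower() in STATIC_EXTENSIONS
```

Under this rule a post permalink like `/2014/08/some-post/` has no extension, so it goes back to the server, while the stylesheet and photos on the same page come from the nearest data center.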

This is, beyond a doubt, an insane amount of technology for a humble little blog full of libertarian ranting and occasional cat pictures.

The thing is, none of this is particularly special. All of you who have WordPress blogs are using mostly the same technology. You might not have Cloudflare or some of the other layers of caching, and you’re using a different theme and a different selection of plugins, but the rest is basically the same. You might not be as interested in the technology as I am, but we’re all using WordPress, we’re all running on some variation of the LAMP stack, and we’re all hosted out of a ridiculously over-spec’d data center.

On a personal bloggy note, this past week I’ve started seeing signs that Windypundit is once again beginning to draw a bit of attention. The tweet for my post about the attempt to shut down the Bronx Defenders because of a rap video was retweeted by a relatively large number of people, starting with Gideon and Scott, and that post has received about 25 times the traffic one of my posts usually gets. I’ve also received a couple of private “Thank you” emails from Bronx Defenders staff, and David Feige was nice enough to drop by in the comments.

And now I see that Radley Balko at the Washington Post has published an article on the Bronx Defenders’ troubles, and it includes a link to my post and an extended quote from it. That’s pretty cool — getting that kind of attention from a WaPo columnist. Strangely, however, the part of my post that Radley decided to use was my attempt to explore what the lyrics of Uncle Murda’s “Hands Up” really mean. Because, when you’re looking for insights into rap music, aren’t I the first person you think of?

Just call me MC Big Windz. And join me next week when I disclose yet another version of Big Sean’s Blessings, speculate whether Meek Mill’s second studio album can possibly repeat the rambunctious mayhem of his debut, and discuss rumors of an emerging 100th problem for Jay-Z.

According to WordPress, 2014 was a slow year, with only 114 posts, almost a quarter of which were on a Saturday. And for the first time in years, the #1 post was not my anti-Sprint rant. Instead it was my post about the protests in Ferguson. In addition to the usual social media outlets, I owe Popehat and Gamso for much of my traffic.

My most active commenter was Allison Williams, or rather, the Indian outsourcing firm to which Allison Williams’ internet marketing agency gave her link building subcontract. The second and third most active commenters were Scott Greenfield and Jack Marshall, which probably annoys both of them. Matt Haiduk and nidefatt round out the top five. The rest of you didn’t make WordPress’s list, but I thank you all…or at least all of you who are real people. It would get lonely here without you.

Furthermore, here at Windypundit, 2014 was the year in which:

That was most of it. See you all in 2015!

 

WordPress’s Jetpack plugin is a nice collection of features for bloggers. I host my blog on a server I pay for instead of on the big WordPress.com cluster because I appreciate the extra flexibility, but by using Jetpack, I can also get some of the more powerful cluster-based features, like improved search and uptime monitoring. I have also apparently been making use of a feature called Jetpack Comments, which provides a more elegant comment interface and allows users to authenticate through WordPress, Facebook, and Twitter.

Not that I get a lot of comments. Windypundit doesn’t have the readership it used to have, and I never really had an active commenter community. Lately, in fact, it seems I hardly get any comments at all, which has been kind of disappointing. I assumed people just weren’t finding my posts interesting enough to engage with.

Over the last couple of weeks, however, I’ve been writing about the events in Ferguson, Missouri, and traffic to my site has roughly doubled because of it. And still there were no comments, even though this was a highly controversial subject. That was suspicious. Could there be something wrong with comments on my blog? Would that explain why I haven’t received any comments in a while?

Yes, yes it would.

It turns out Jetpack comments work by replacing the entire comment entry section with an iframe, which is an HTML element for embedding one web page inside another. So the comment form displayed at the bottom of my posts wasn’t generated by code running on my server; it was fetched from WordPress.com. And when the user types in a comment and submits it, the form is sent back to WordPress.com. I assume it’s then authenticated appropriately, submitted back to my server, and displayed.

At least that was the theory. But when I launched an incognito browser window and used it to submit a test comment, for some reason the iframe filled with a cropped-down duplicate copy of the Windypundit web page, complete with animated banner, but all trapped in a box where the comments used to be. I don’t know where the comment went, but it never made it to my blog’s database.

So…maybe the lack of comments wasn’t due to my being boring after all. I wonder how long it’s been that way…

I assumed the problem was either that my theme design was missing some crucial element that makes Jetpack comments work, or that some other plugin was interfering with Jetpack, so I switched to the WordPress-provided Twenty Fourteen theme and I disabled every plugin except Jetpack. Essentially, I was running WordPress fresh out of the box. And still the problem didn’t go away. I don’t know, maybe it’s some weird Cloudflare thing.

I finally gave up. I put everything back the way it was and then disabled Jetpack comments, which seems to have fixed the problem.

I still wanted the social media connection, so I installed the Social Login plugin, which provides alternate authentication through lots of different social networks using the rather amazing protocol translation services provided by oneall. I almost immediately started getting spam comments, so I also dropped in the Growmap Anti Spambot Plugin, which supposedly checks for humanity by requiring you to check a box. I’m not sure why that can’t be automated, but I’ll give it a try.

I need to test this, so if you’ve read this far, please help me out by leaving a comment. Just say hi. Let me know that my blog software is no longer turning readers away.

Have you been seeing pop-up ads on my blog?

You see, a couple of days ago I was fiddling with some darned thing here in WordPress (I can’t remember what it was anymore), and I wanted to take a look at how the page lays out to ordinary people. As a logged-in WordPress user, I get extra features like a command bar at the top and special links I can click to edit content, and I wanted to get a look at the site without any of that. I could just log out, but then I’d have to log in again to tweak it some more, and that’s a pain.

So instead, I launched the Chrome browser in “incognito” mode, which runs the browser with no cookies of any kind, as if it had never been launched before, which means my blog will treat it like just another random first-time visitor.

And the darnedest thing happened. Some kind of full-page ad popped up and completely covered the contents of my blog. At the top it had the words “A Message From Our Sponsors” with a link in the upper right corner labeled “Click to continue to site” or something like that. It was just like one of those full-page ads you sometimes get when you follow a link to a big media property like Forbes.

What the fuck?

Just to be clear, I don’t put ads like that on my blog. I have an Amazon banner on the right side, and another one at the bottom of every article, and that’s it. Whatever this was, I didn’t do it.

I opened the browser’s developer tools and took a look at another page. I wanted to see where it was getting content from. This page didn’t show the ad, but when I looked at the Network tab, I was in for another shock. The first file was the main HTML page, and the next dozen or so were the usual files from the WordPress folder: CSS, Javascript, and a handful of images. An awful lot of the rest were files I didn’t recognize.

There’s always some of that on a page. If you use widgets or Javascript code to link to Twitter or Facebook or Amazon or Gravatar, the tiny stub of code you use to do that pulls down more code and other assets that it needs to function. But this went beyond any of that. It was hitting a horrifying array of unfamiliar sites:

  • specificclick.net
  • vindicosuite.com
  • yashi.com
  • demdex.net
  • nexac.com
  • bluekai.com
  • mookie1.com
  • spotxchange.com
  • turn.com
  • adadvisor.net
  • ib-ibi.com
  • doubleclick.net
  • scorecardresearch.com
  • adnxs.com
  • specificmedia.com
  • rubiconproject.com
  • invitemedia.com
  • btrll.com
  • collective-media.net
  • tidaltv.com
  • tubemogul.com
  • exelator.com
  • mathtag.com
  • dotomi.com
  • casalemedia.com
  • pubmatic.com
  • cpmaxads.com
  • advertising.com
  • criteo.com
  • adsrvr.org
  • veruta.com
  • wtp101.com
  • connexity.net
  • openadserve.com
  • insightexpressai.com
  • doubleverify.com
  • serving-sys.com
  • betrad.com
  • vizu.com
  • 2mdn.com

As near as I can tell, every single one of those is in some way associated with web advertising. Someone was using my blog to get credit for distributing ad content. At the very least, they were probably littering my visitors’ browsers with tracking cookies.
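If you want to run the same check on your own blog, Chrome’s Network tab can export its request list as a HAR file. This Python sketch reads the request URLs out of such an export and reduces them to the outside domains. Keeping the last two labels of each hostname is a crude stand-in for a real registrable-domain lookup; it gets suffixes like .co.uk wrong, and a serious tool would use the Public Suffix List. Expect some legitimate entries too, like Amazon or Gravatar.

```python
import json
from typing import Iterable
from urllib.parse import urlsplit


def base_domain(host: str) -> str:
    """Crude registrable domain: keep the last two labels (wrong for .co.uk etc.)."""
    labels = host.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def third_party_domains(urls: Iterable[str], first_party: str) -> list[str]:
    """Sorted, de-duplicated base domains that differ from the site's own."""
    own = base_domain(first_party)
    found = set()
    for url in urls:
        host = urlsplit(url).hostname
        if host and base_domain(host) != own:
            found.add(base_domain(host))
    return sorted(found)


def urls_from_har(path: str) -> list[str]:
    """Request URLs from a HAR export (log.entries[].request.url)."""
    with open(path, encoding="utf-8") as f:
        har = json.load(f)
    return [entry["request"]["url"] for entry in har["log"]["entries"]]
```

Usage would be something like `third_party_domains(urls_from_har("page.har"), "www.windypundit.com")`, where the file name is just an example.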

I’m truly sorry about that. This is an embarrassing discovery, and I apologize.

On discovering this, I got a little panicked. My mind went to a bad place: It was possible I had been hacked via malware in a WordPress plugin. WordPress and all its plugins and themes are built on PHP and PHP is wide open: Every WordPress plugin, every theme, has complete access to all the files in a web server account. If any one of the plugins is malicious, it can infiltrate itself into a WordPress installation in ways that are hard to remove. It’s like a virus.

You should never install WordPress plugins or themes from an untrustworthy source. The plugins and themes that you can find from within WordPress via the Add New button have been somewhat vetted by the community and are probably safe, but with the exception of a few reputable vendors, you should never download and install a theme or plugin from another web site. One study on a small sample of random free WordPress themes found that 100% of them had some kind of hidden code to insert tracking cookies or place hidden links on the site. Every single one.

I had been careful about adding plugins, but maybe I had made a mistake, or maybe one of them had slipped past the guards on the WordPress repositories. I began disabling plugins, starting with the most recent ones, and refreshing the page to see if the invading websites were still there. Eventually I got down to just a handful of trusted plugins — Google, Amazon, stuff everyone installs — and the ads were still there.
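Disabling plugins one at a time works, but with a long plugin list, halving is faster: enable half, check, and keep narrowing whichever half still shows the problem. Here's a Python sketch of that bisection. It assumes exactly one plugin is to blame, and `shows_problem(enabled)` is a hypothetical stand-in for “activate exactly these plugins, reload the page, and see whether the ad domains appear.” The first check matters: if the problem survives with every plugin off, the cause is somewhere else entirely.

```python
from typing import Callable, Optional, Sequence


def find_culprit(
    plugins: Sequence[str],
    shows_problem: Callable[[Sequence[str]], bool],
) -> Optional[str]:
    """Binary-search for the single plugin whose activation causes a problem.

    Returns None if the problem appears with no plugins enabled (the cause
    lies outside the plugins) or never appears at all.
    """
    if shows_problem([]):
        return None  # not a plugin: look at the theme or pasted-in scripts
    if not shows_problem(plugins):
        return None  # the problem does not reproduce
    candidates = list(plugins)
    while len(candidates) > 1:
        mid = len(candidates) // 2
        half = candidates[:mid]
        # With a single culprit, it is in whichever half reproduces the problem.
        candidates = half if shows_problem(half) else candidates[mid:]
    return candidates[0]
```

With 30 plugins that's about five page reloads instead of up to 30, though plugins that only misbehave in combination will defeat it.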

The culprit turned out to be one of the oldest things I had ever put on my site: The Site Meter badge. I’ve had that thing on my website since before WordPress. I think I even had it before Movable Type, back when I was hosted on Blogger. At the time, every blogger in the world used Site Meter to track their stats, and I was no exception.

Site Meter didn’t press their advantage, however, and they didn’t keep up with the times. Their simple counter and statistics are no match for a modern powerhouse like Google Analytics or Woopra. Apparently, at some point they just gave up trying to monetize stats and started just pushing out all kinds of advertising crap. Given how many advertising companies they’ve sold my site to, I assume they’re trying to squeeze out as many tiny fractions of a penny as they can.

(This was actually good news, since Site Meter was just a <script> tag I had embedded in the footer. It didn’t run any PHP code on my site, so it couldn’t have corrupted anything else. It all ran in the relatively safe sandbox of my visitors’ browsers.)
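The Network tab shows everything a page ends up loading, including whatever the embedded stubs pull in after the fact. The page source is where the stub itself lives. Here's a Python sketch, using only the standard library’s HTML parser, that lists every `<script src>` in a page pointing at a host other than the page’s own. It's a simplification: exact hostname comparison treats `example.com` and `www.example.com` as different, and scripts injected by other scripts won’t appear in the source at all.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class ScriptSourceCollector(HTMLParser):
    """Collects the src attribute of every <script> tag in a page."""

    def __init__(self, page_url: str):
        super().__init__()
        self.page_url = page_url
        self.sources: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                # Resolve relative and protocol-relative URLs against the page.
                self.sources.append(urljoin(self.page_url, src))


def external_scripts(html: str, page_url: str) -> list[str]:
    """Script URLs, in page order, whose host differs from the page's host."""
    parser = ScriptSourceCollector(page_url)
    parser.feed(html)
    parser.close()
    own_host = urlsplit(page_url).hostname
    return [s for s in parser.sources if urlsplit(s).hostname != own_host]
```

Run against a saved copy of the page, an old counter badge in the footer would show up in this list even when it hasn’t served an ad that day.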

I Googled around about this, and apparently everyone else noticed the problem a few years ago. I wasn’t paying attention, and I missed it. I guess that’s because I don’t really use Site Meter any more, and I would have abandoned it, but…it was the oldest counter I had on the site, and I was enjoying watching the numbers climb. It used to go up pretty fast, and I would have reached my first million several years ago, but my site statistics took a dive about 6 years back when I was busy taking care of my parents.

According to Site Meter, I still haven’t quite reached my first million visits. The counter currently sits at 999315.

And there it will stay.

WordPress likes to send out a year-end review to all their Jetpack users. Most of it is routine statistics I can find anywhere, but a few items are amusing.

On my busiest day, my most popular post was “Yet Another Tale of the Awful, Awful People at ICE”. I think this line captures the essence of the post:

The sad thing is that if she had freaked out and, say, gouged out one of his eyeballs with a pen so she could make her escape, some prosecutor would have tried to make it seem like she was the bad guy.

On the other hand, the post with the most views all year was the perennial winner, “Fucking Sprint!!!” which continues to receive comments 8 years later, as does my 4th most popular post, “Sallie Mae is a Nightmare”. The 2nd most popular post was Never Get Busted Again, Volume 1: Traffic Stops, my review of Barry Cooper’s DVD, the 3rd most popular was “Yet Another Tale of the Awful, Awful People at ICE” again, and the 5th was my advice piece, “I’m Sorry, But I Simply Can’t”.

Then there are the top search terms:

  1. sw 500
  2. fuck sprint
  3. michelle malkin bikini
  4. michelle malkin nude
  5. sw500

“Fuck Sprint” is self-explanatory (I’ve been at or near the top result for that for years), and the Michelle Malkin stuff is probably because of this, which has attracted some vitriolic comments from people who didn’t get the joke. I think the SW 500 stuff is probably because of this post by J-Dog.

In any case, here at Windypundit, 2013 was the year in which:

See you next year.

In keeping with tradition, I wanted my blog to have the most severe and over-the-top Terms & Conditions possible, with robust protection against spammers, lawsuits, and intellectual property theft.

I could have engaged a lawyer to write them for me, or I could have researched it on the web and tried to write my own, but those options seemed to be too expensive and too much work, respectively. So I took the easy route and stole my T’s & C’s from intellectual property lawyer and First Amendment Badass Marc J. Randazza.

Just change a few names and…done. I’m sure that’s good enough for my purposes. Probably the biggest change in content was to the blacklist.

It’s mostly boring legalese, but if you haven’t seen Randazza’s original, here are the Terms and Conditions For This Website…which you really should have read and assented to before getting this far anyway.

The Popehat folks are far better at this sort of thing, but let me give it a try. I got this email with the subject “Content Partnership”:

Hey there Mark,

My name is Sladen West and I wanted to discuss the possibility of some sort of content partnership with you and windypundit.com. I do a lot of writing in the automotive space and thought my articles could be a great fit for your readers. I don’t expect any kind of payment for articles. I just want to get my name out there. Anything I write for windypundit.com will be original of course.

If you’d like, you can take a look at some articles I have just recently published here:

Why Women Are Probably Better Drivers
Gas Saving Myths Debunked

7 Tips To Ignore If You Want To Be A Better Driver

If you’re interested, I would love to contribute something like an article a month on topics such as driving safety, car maintenance, and various drivers tips and advice. Thanks so much for your time, Mark!

– Sladen West

My response:

Hi Sladen,

First of all, the second link is broken, or rather, it goes to the same place as the first. Actually, it looks like the two lines are combined into a single link.

Second, do you really think your articles would be a great fit for my readers? Because I don’t think you’ve actually read my blog.

I don’t really write much about cars. I mostly do rants about civil liberties and criminal law, usually from a libertarian perspective, which means that I want to legalize not only marijuana and gay marriage, but also prostitution and heroin. I also swear a lot, and I once wrote that Sheriff Joe Arpaio is like Hitler.

As for my readers, at least one of them is a former prostitute and madam, and several of them earn their living by defending accused murderers and child molesters. (And God only knows what drugs my readers use.) It’s not that they’d be against defensive driving…but that’s not really why they’re here.

Yours is one of the nicest and least slimy can-I-write-for-your-blog emails I have received, but I don’t think your topics are really a good fit here, and I don’t think you really want to be associated with the likes of us either.

It would probably be best for both of us if I just said “No, thank you.”

— Mark Draughn

For some reason, I just didn’t feel like ignoring this one. I hope that is sufficient explanation.

I’ve decided that running April Fools Day prank posts is not a good idea for me. I let Eric Turkewitz talk me into it last year, and the only people I fooled were my loyal readers. That’s not the relationship I want to have with them. I don’t expect readers to agree with me or even like me, but (except for obvious sarcasm or hyperbole) I want them to know I believe what I say here.

Even Eric, who has pulled off some legendary pranks, seems to be giving it up this year, although this could be part of some super-subtle meta-prank I just don’t get.

Here at Windypundit, 2012 was the year in which:

See you in the New Year!

This is it, the new WordPress version of Windypundit. There are lots of tweaks and changes still to come, but I think I can go ahead and make the switch now and worry about the details later.

Some of you may have noticed that Windypundit was off the air for a while. Yeah, well, it turns out the switchover didn’t go quite the way I planned.

About a week and a half ago, the hosting company I’ve been using for Windypundit sent out a message saying they would be moving some of my other domains to a new server in December. The message included this warning:

On the new servers PHP 5.3.x is going to be running, so please take the measures to ensure that you have your software up to date to avoid any conflict with the new PHP version, take in mind that we can’t keep PHP 5.2.x running in our servers since is already EOL.

Then last weekend they sent out another message about the server that I was using for Windypundit:

Emergency Server Migration – Server 34 – Do not ignore this email.

Server 34 is using a OS version that’s not longer supported by Cpanel, for that reason some package are not being updated and is causing troubles on the DNS and making the server fail. We are going to make an emergency migration tonight to a new server.

A little later, the Windypundit site failed, displaying a variety of PHP errors. My guess is that the emergency server migration put Windypundit on one of the new servers with the new version of PHP. As I’ve mentioned, I lost the ability to upgrade Movable Type long ago, and I guess the version I’ve been using isn’t compatible with the new version of PHP.

Despite all the nice things I’ve said about my hosting provider, I’d been having some problems in the last couple of weeks, and I had been exploring the possibility of switching to a new provider. Having the blog go down like this was the last straw. I’ve moved Windypundit and all my other sites to a hosting company called A Small Orange, about which I’ve heard good things. (Low-cost shared hosting is something of a crapshoot, so we’ll see.)

I’ve spent most of my spare time in the last week on a crash program to finish the port to WordPress. I had to make a few more changes to the conversion tool I wrote to handle some problems with implied paragraphs, and I had to make a bunch more changes to the CSS. I spent a lot of time just paging through Windypundit posts for the first four years and the last two years, one after another, looking for problems.
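For anyone wondering what an “implied paragraph” is: Movable Type can store a post as plain text where a blank line means a paragraph break, and its Convert Line Breaks formatting option turns that into HTML at publish time. My converter isn’t shown here, but the core transformation looks roughly like this Python sketch. The set of block-level tags it leaves alone is an illustrative subset, and real posts have edge cases (lists, embedded HTML spanning blank lines) that it ignores.

```python
import re

# Block-level tags assumed not to need a <p> wrapper (illustrative subset).
BLOCK_START = re.compile(r"^\s*<(p|div|blockquote|ul|ol|li|h[1-6]|pre|table)\b", re.I)


def explicit_paragraphs(body: str) -> str:
    """Turn blank-line-separated text into explicit <p> elements.

    Single newlines inside a paragraph become <br />; chunks that already
    start with a block-level tag are passed through unchanged.
    """
    chunks = re.split(r"\n\s*\n", body.replace("\r\n", "\n").strip())
    out = []
    for chunk in chunks:
        chunk = chunk.strip()
        if not chunk:
            continue
        if BLOCK_START.match(chunk):
            out.append(chunk)
        else:
            out.append("<p>" + chunk.replace("\n", "<br />\n") + "</p>")
    return "\n\n".join(out)
```

Once the paragraphs are explicit, the post no longer depends on any particular engine’s line-break rules, which is what makes it safe to move.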

I discovered a few pages that are messed up. Many of them can be fixed with more CSS changes or by editing individual posts. I haven’t found any new systematic problems caused by the port process in the last few hundred posts I checked, and the site passes all of the automated tests I ran, so I decided it was time to put Windypundit back online.

I still have a lot of changes to make — enabling comments, configuring statistics, tweaking the theme — but I think I’m ready to go live. Here’s hoping it all works.

Update: I want to make sure comments are working, so it will help if a few of you will leave a comment.

I’ve been blogging about the experience of moving Windypundit from Movable Type to WordPress. In Part 1, I described the process I developed for moving over all the posts. In Part 2, I talked about the development of the new Windypundit site.

The one thing I still had to do was to re-create the blogroll in WordPress.

I suppose I could figure out a way to port it over like I did the posts, but I think creating it by hand is probably faster. Besides, I need to do some blogroll maintenance. I think I’m going to simplify the categories. I also need to update it to reflect a few changes in the neighborhood.

I’ll start with the new additions, which will appear in the blogroll after the switchover.

appellatesquawk

A hilarious blog about appellate law. No, really.

Cafe Hayek

An occasional stop of mine for random economics quotes and other bits in the Hayekian tradition.

Centives

Links to fantastic stories with economic angles. Worth a read even if you don’t think you’re interested in economics.

Accursed Farms

Ross Scott’s amazing Half-Life themed machinima.

The Ancient Gaming Noob

“Wilhelm Arturus” writing about multi-player online gaming.

Booker Rising

Shay Riley, black libertarian. No, really.

Charlie’s Diary

Science fiction author Charles Stross. Lots of thought-provoking stuff here.

That’s all the new people for now, but I’ve got some tidying up to do, and a few removals. Among other things, I’m going to make some of the descriptions a little more, er, descriptive.

Virginia Postrel

Virginia used to be the editor-in-chief of Reason magazine and her editorials were a huge influence on my thinking. She’s one of the reasons I started blogging. She’s not terribly prolific these days, but always worth reading. She actually calls her blog “Dynamist Blog” so maybe I should use that name too.

StrategyPage

An interesting source of information about warfare and national defense, but not a read-every-article blog. It moves to the Resources section.

StoptheDrugWar.org

The blog on that site is actually called “The Speakeasy Blog” so I should probably change it.

The D’Alliance

Seems to be down. I’m striking it from the list.

Vigil for Lost Promise

This is Pete Guither’s site listing victims of the War on Drugs. It’s his response to a DEA advertising campaign. It used to be a separate site, but it’s now part of his main blog, and it looks like no one has been added to the list lately. I’m going to remove it from the list.

Not Guilty

That’s Mirriam Seddiq’s blog. She writes well about legal and social issues but not nearly enough. I didn’t tag her blog “A lawyer in search of a clue” because she’s stupid but because for a while she was always blogging about how she couldn’t figure out what to do with her life and career. Now that she seems to have settled in as both a solo lawyer and a mother, I should probably change it.

Marc Randazza

Randazza also gets a new tag. I believe his official title is now “First Amendment Badass.” And I should use the real name of his blog, which is Legal Satyricon.

Blonde Justice

Sigh. The Blonde One has been missing in action since March. But if Gotham can wait 7 years for Batman to return, I can wait a little longer for Blonde Justice.

South Carolina Criminal Defense Blog

This one’s apparently named “Trial Theory” now.

Google Blogoscoped

Philipp Lenssen has been off the air for over a year now, but he’s a friend of the blog and he helped me with some ideas in the early years, so he stays a little longer.

Steve Landsburg

Landsburg’s blog is actually called The Big Questions, and I should link to the blog, not the main page.

Megan McArdle

Megan has moved to Asymmetrical Information at the Daily Beast, so I’ll link to her there.

That’s it. The blogroll is done.

And that’s the last major hurdle for porting the blog. Just a few more tweaks, and I’ll run the port process one more time and switch it over. Maybe over Thanksgiving.

If you’ve been following along, you know I’ve given up on Movable Type as a blogging platform, and I’m moving this blog to WordPress. In Part 1 I explained why moving 10 years of blog posts with minimal link breakage was a lot of work.

Of course, I’ll have to have some place to move to. Which means I’ll have to design and set up another WordPress blog. I managed to teach myself a bit about WordPress when I set up the Nobody’s Business group blog, and I liked what I saw of the technology. WordPress is easy to use and has a very active developer community, so I think it’s safe to assume it will be supported for a long time.

If you want to set up a WordPress blog, you have a choice of two broad solutions: You can let WordPress host your blog for you at wordpress.com, or you can download a free copy of the WordPress software from wordpress.org and install it on your own web host. I prefer the latter route because it gives me more flexibility. It does mean, however, that I need a place to host a blog.

Obviously, I already have hosting, but I decided to get a new hosting account so I could more easily set up the new website without interfering with the old one. And it’s just cleaner to start from scratch, so I can move over only the files I’m using and leave the junk behind. Windypundit had accumulated a lot of junk in its directory tree over the years.

The kind of hosting I need is a basic LAMP stack. That means Linux operating system, Apache web server, MySQL database, and PHP programming language engine. This is far and away the most common hosting environment, and it will run WordPress and tons of other software I might want to install. I also want cPanel management tools so I can administer my account over the web without having to use a command line.

For now, I’ve decided to use a new shared hosting account from the provider I currently use, Downtown Host. I’ve had a few problems with stability over the years, but I see that as the natural flipside of their flexible configuration policies. (I.e. I’m paying the price in stability for flexible configurations of other websites on the same server.) The folks at Downtown Host have usually responded to service tickets fairly quickly, day or night, and they’ve been helpful when I had some minor special requests. I’ve tried using larger (and presumably more stable) hosting companies in the past, but they’ve been rigid and uncooperative.

I could probably get more flexibility with a virtual private server (VPS), but that’s more expensive. And more work. I’d have to install software updates regularly and clean up messes. There are tools to make that easier, but I’d have to learn what they are and how to use them. (And the tools change depending on which Linux distribution you install — CentOS, Debian, Ubuntu, Gentoo, Fedora — and I haven’t got a clue how to choose a distro wisely.) Ultimately, that’s just not a learning curve I want to follow. I guess I could avoid all that with a fully managed VPS plan, but that’s even more expensive. Windypundit just isn’t big enough to need all that.

Since my account allows me to host as many sites as I want for one monthly fee, subject only to storage and bandwidth limits, I’ve been consolidating some of my other sites onto the new server, in what I’m calling the Windypundit Media Empire. As I write this, Windypundit is still on the old server, and it will stay there until I finish the port to WordPress. The old server also has a test WordPress installation that I’ve been using to develop the migration process I described earlier.

After choosing WordPress itself, probably the most important decision I had to make was which WordPress theme I was going to use. I don’t want to just use a pre-built theme because I want Windypundit to have a unique appearance. On the other hand, I really didn’t want to build a theme completely from scratch. It’s a lot of work, and I’m not familiar with how to do it for WordPress. So what I really needed was a highly customizable theme. Or even better, a theme framework.

All WordPress themes are built from an HTML page layout with snippets of embedded PHP code that call into WordPress to fill in content such as posts, comments, and widgets. There’s also some CSS to style the HTML and a bunch of assets such as header images and custom artwork.

As far as WordPress is concerned, a theme framework is just another theme, but to a blog owner it’s a kind of construction kit for building themes out of template page layouts, template CSS, and a library of PHP code that can be used to implement certain common features. Often these frameworks come with substantial user interface tools that make it easier to design themes without a lot of code.

Many theme frameworks are commercial products developed by professional WordPress design companies. Probably the most well-known of these is Thesis by DIYthemes, which is used by a lot of professional web designers to quickly create blogs and websites without a lot of coding and with results that are, frankly, pretty damned good. Jamison Koehler’s law firm web site is one of the most attractive blogging sites I know, and it’s built on Thesis.

Another big-name framework is Genesis by StudioPress. Genesis is oriented more toward using WordPress as a Content Management System for commercial web sites than for blogging. Unlike Thesis, Genesis isn’t an out-of-the-box theme; it’s intended as a base from which designers will build usable child themes. For example, Rick Horowitz used a Genesis child theme when he built his law firm website.

There are lots of other commercial theme frameworks out there, such as PageLines, Carrington, Woo, Elegant Themes, and such drag-and-drop wonders as Ultimatum and Headway. They differ in how much HTML and CSS you have to write, whether SEO is built in or left to a plugin, security, support for designer and developer workflows, and dozens of other criteria. It’s a fascinating software niche, and if I were a professional web designer I’d probably license a few of them for building websites for clients.

But I’m not a professional web designer; I’m an amateur blogger who is also a professional web programmer. The difference is that I work with HTML and CSS and program code all the time. I’m used to it, and I like tinkering with code. So I prefer a framework that makes theme programming easier rather than one that eliminates theme programming entirely.

When I built Nobody’s Business, I used the Thematic framework because it was a popular free framework. It worked fine and I have no regrets. However, when I started thinking about porting Windypundit, I investigated a whole bunch of frameworks, and this time I settled on Hybrid because it seemed to have better documentation, a more active user community, and more recent releases (although I see that the folks at Thematic have been busy lately).

Hybrid actually comes in three layers. First, there’s Hybrid Core, which is essentially a PHP library for implementing WordPress themes. It can’t be used as a theme, but if you want to build a theme from scratch, Hybrid Core should make it easier.

The second layer is the Hybrid theme, which is a full WordPress parent theme built using Hybrid Core. It’s a relatively simple theme, but you can use WordPress’s support for child themes to create a more interesting theme that overrides parts of the Hybrid theme with versions of your own. Hybrid’s creator, Justin Tadlock, has also released a bunch of other themes built on the Hybrid framework, several of which also come with child themes.

Finally, there’s Skeleton, which is a child theme of the parent Hybrid theme. It has all the CSS selectors laid out for you to fill in. You can just grab a copy of it, rename it, and start customizing it for your site. Which is exactly what I did. Unimaginative as I am, I call the result WindySkeleton.

Of course, I couldn’t resist the urge to make a few small modifications. I’m not overly fond of maintaining raw CSS, so I added a PHP version of the SCSS preprocessor to the site, copied all the Hybrid CSS theme files and renamed them to end in “.scss”, and wrote a small PHP front end page to compile and render CSS from the SCSS files. This makes modifying Skeleton even easier.

As for the actual site design, I’m not very good at that part, so I like to keep it simple, sticking to a very traditional blog layout. And my approach to choosing colors is as simple-minded as picking out a nice banner photo and reusing some of its colors in the site design. Thus, the current design uses a lot of blue because the banner photo of the Chicago skyline has a lot of blue sky.

Originally, I wanted the new design to have even more of an old-school journalism feel to it than my current design, but I managed to derail that plan when I got it into my head that I wanted to use a night photo in the banner. That ended up taking the design in a very different direction. When I make the switch, you won’t recognize the place.

I’m also in the process of teaching myself JavaScript and jQuery, so I decided to add a little animated flourish when the page loads. At the moment it looks kind of cool, but I expect it will get tiring. It’s basically the modern web version of the <blink> tag that made GeoCities into such a hellscape.

I’ve still got a bunch of details to attend to before I cut over to the new site. For example, I almost forgot that Movable Type keeps its feeds in different locations than everybody is used to with WordPress. This meant I had to add redirection rules to send feed readers seeking the old feeds to the new locations. I’ll probably still put up a final post on the old site that warns people that they’re subscribed to an old feed. The site will be down so they won’t be able to read it, but it will be the last thing in the dead feed.
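Feed redirects like these can be generated rather than typed by hand. The Python sketch below assumes the old feeds were Movable Type’s usual default files (index.xml, index.rdf, and atom.xml) and that WordPress serves the matching feeds at /feed/, /feed/rdf/, and /feed/atom/; that mapping is an assumption, so check the old templates before relying on it. It emits anchored Apache mod_alias `RedirectMatch` rules suitable for an .htaccess file.

```python
import re

# Assumed mapping from Movable Type's usual default feed files to WordPress feeds.
FEED_MAP = {
    "/index.xml": "/feed/",       # RSS 2.0
    "/index.rdf": "/feed/rdf/",   # RSS 1.0
    "/atom.xml": "/feed/atom/",   # Atom
}


def redirect_lines(feed_map):
    """Emit one permanent, fully anchored RedirectMatch rule per old feed path."""
    lines = []
    for old, new in sorted(feed_map.items()):
        pattern = "^" + re.escape(old) + "$"  # anchors avoid matching longer paths
        lines.append(f"RedirectMatch 301 {pattern} {new}")
    return lines


for rule in redirect_lines(FEED_MAP):
    print(rule)
```

A 301 tells well-behaved feed readers the move is permanent, so many of them will update the subscription on their own.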

In addition, I’ve still got to add a bunch of other sidebar items to the blog (recent posts, archives, search, etc.), create a robots.txt file (maybe), and tweak the design a bit more. Then, once I switch over, I’ll have to re-run the verification programs one more time, turn on comments, and enable all the statistics tools. I plan to use the same ones I have here: Sitemeter (which I consider to have the official authoritative visitor count), Google Analytics (the most useful), and Woopra (the cool new thing). I’ll also add WordPress Jetpack, but I don’t want to turn any of them on until the site is actually live.

And somewhere along the way, I’ll have to re-create my blogroll. That will be the subject of my next post about the move to WordPress.

As I mentioned in a previous post, I am done with Movable Type, and I’m in the process of moving this blog to WordPress. That turns out to be a lot easier said than done.

(Warning: Much technical computer geekery ahead.)

WordPress has a Movable Type import tool, but it doesn’t solve what I consider to be the most important problem: It doesn’t preserve permalinks. Over the years, many people have linked to posts on Windypundit (Google Webmaster Tools reports over 90,000 inbound links), and I want to make sure that as many as possible continue to work. Not only is it basic etiquette not to break links, but having working links to my site is also important to maintaining what passes for my search engine rankings.

I searched for other tools and methods but I couldn’t find anything that would solve the permalinks problem without introducing other problems of some kind. Eventually, out of frustration and a desire to learn new tricks, I decided on a rather crazy course of action: I decided to port my blog to WordPress by writing my own program.

I chose to write it in C#, mostly because I’d just been hired for a job programming in C# and I figured writing the importer would be a good way to learn C#. Also, I already owned Microsoft Visual Studio 2010 Professional and I’m comfortable with the development environment.

That was a year ago. I got in enough C# practice that I had no trouble when I started the job, but I kind of lost interest in writing the blog migration tool. Development slowed to a crawl.

A few weeks ago, however, I started getting a lot of spam comments that weren’t being caught by my anti-spam tools. Knowing that I’d have much better tools if I switched to WordPress, I decided it was in my best interest to get the job done.

The main tool I wrote is the BlogMigrator, which pulls all of my posts out of Movable Type and generates a WordPress extended RSS (WXR) file suitable for import into WordPress.

It starts by downloading all the posts from the current Windypundit website, which it does by connecting directly to the MySQL database that Movable Type uses to store them. It queries for all the authors, categories, posts, and comments in the blog, including all pages (such as the About pages) and all draft posts, loads the results into an ADO.NET DataSet, and saves that to a file. From then on, the migrator just loads blog posts from the file, avoiding the time-consuming download from the server. If I want the latest stuff instead of my locally cached copy, I just delete the DataSet file and the BlogMigrator downloads a new one the next time I run it.
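The download-once, reuse-the-file behavior is a simple caching pattern. The real tool is C# with an ADO.NET DataSet; as an illustration only, here is a minimal Python sketch that uses a JSON file instead. The cache file name and the `fetch_from_server` callable (standing in for the MySQL queries) are hypothetical.

```python
import json
from pathlib import Path

# Hypothetical cache file; the real tool saves an ADO.NET DataSet instead.
CACHE_FILE = Path("blog_dataset.json")


def load_blog_data(fetch_from_server, cache_file=CACHE_FILE):
    """Return blog data from the local cache, hitting the server only when the cache is missing."""
    if cache_file.exists():
        return json.loads(cache_file.read_text(encoding="utf-8"))
    data = fetch_from_server()  # the slow part: querying the live database
    cache_file.write_text(json.dumps(data), encoding="utf-8")
    return data
```

Deleting the cache file is the whole "refresh" mechanism: the next call finds nothing on disk and goes back to the server.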

Next the program builds an in-memory model of the Movable Type blog, including all authors, categories, blog entries, template maps, and comments. This is a fairly mechanical process. After that’s done, it iterates over all the blog entries and generates a report of the location and publishing status of every post. I use these reports (and others) as input to the iterative development process.

The next step is to traverse the Movable Type blog model and build a matching WordPress blog model. All of the basic concepts are the same, but there are a lot of little details that change, including the names of the data fields, and I try to follow the naming conventions of each blog technology as much as I can. (E.g. Movable Type authors have a single name field, while WordPress authors have first and last names.) Among the steps of the conversion are splitting the author name into first and last names, generating unique IDs for each post, merging main and extended post text, converting the URL format for the post, and converting from local time to universal time.
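Three of those conversion steps are small enough to sketch. This Python is illustrative only: it assumes the old timestamps are Chicago local time, splits names naively on the last word, and marks the main/extended boundary with WordPress’s `<!--more-->` tag, which is one reasonable way to merge the two fields but not necessarily what the BlogMigrator does.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Assumption: the old blog stored naive timestamps in Chicago local time.
LOCAL_TZ = ZoneInfo("America/Chicago")


@dataclass
class WpAuthor:
    first_name: str
    last_name: str


def split_author_name(full_name):
    """Split a single name field naively: the last word becomes the last name."""
    parts = full_name.strip().split()
    if len(parts) < 2:
        return WpAuthor(full_name.strip(), "")
    return WpAuthor(" ".join(parts[:-1]), parts[-1])


def merge_post_text(main_text, extended_text):
    """Join main and extended text, marking the split with WordPress's <!--more--> tag."""
    if not extended_text:
        return main_text
    return main_text + "\n<!--more-->\n" + extended_text


def local_to_utc(naive_local: datetime) -> datetime:
    """Interpret a naive local time in LOCAL_TZ and convert it to UTC.

    Ambiguous fall-back hours resolve to the first occurrence (fold=0).
    """
    return naive_local.replace(tzinfo=LOCAL_TZ).astimezone(timezone.utc)
```

For example, noon on January 15 (Central Standard Time, UTC-6) comes out as 18:00 UTC, while noon on July 15 (daylight time, UTC-5) comes out as 17:00 UTC.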

The next pass looks into the actual content of each post and uses the HTML Agility Pack to analyze the HTML and catalog every element and all the class and style attributes. It also generates reports of which posts each of those items is used in. I’ve been using those reports to make iterative modifications to the program. For example, some posts have embedded HTML class junk that was introduced when blog authors cut-and-pasted from Microsoft Word. By finding these classes in the reports, I was able to modify the program to strip them out. I have a whole collection of whitelists, blacklists, and replacement tables.
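The cataloging pass boils down to recording which posts use which elements, classes, and inline styles. This sketch swaps Python’s standard-library `html.parser` in for the HTML Agility Pack, so it is an analogue of the technique rather than the actual code; the `tag:`/`class:`/`style:` key scheme is made up for illustration.

```python
from collections import defaultdict
from html.parser import HTMLParser


class _UsageCollector(HTMLParser):
    """Record, for one post, every element name, CSS class, and inline style it uses."""

    def __init__(self, post_id, usage):
        super().__init__()
        self.post_id = post_id
        self.usage = usage  # shared dict: item key -> set of post ids

    def handle_starttag(self, tag, attrs):
        self.usage["tag:" + tag].add(self.post_id)
        for name, value in attrs:
            if not value:
                continue
            if name == "class":
                for cls in value.split():
                    self.usage["class:" + cls].add(self.post_id)
            elif name == "style":
                self.usage["style:" + value.strip()].add(self.post_id)


def catalog_posts(posts):
    """posts: mapping of post id -> HTML. Returns item key -> set of post ids."""
    usage = defaultdict(set)
    for post_id, html in posts.items():
        collector = _UsageCollector(post_id, usage)
        collector.feed(html)
        collector.close()
    return usage


def print_report(usage):
    """Print each item with how many posts use it, rarest first."""
    for key, post_ids in sorted(usage.items(), key=lambda kv: (len(kv[1]), kv[0])):
        print(f"{len(post_ids):5d}  {key}  {sorted(post_ids)}")
```

Sorting rarest-first puts one-off oddities, like a stray Word class, at the top of the report, where they are easiest to spot.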

In other cases, where the reports showed that a strange class or misspelled element was used in only a handful of posts, I’ve just gone back into the blog on Movable Type to fix the problem at the source, which is easier than adding code to fix the problem. Then I re-import the database and re-run the BlogMigrator to confirm the problem is fixed. (For problems fixed by the tool, there are before- and after-cleanup reports so I can verify the problem is gone.)

Another thing I had to handle was custom tags. Movable Type allows you to create custom HTML-like tags for your blog by writing a little PHP and/or Perl code, and I had built a few of them over the years. I had to modify the HTML Agility Pack to recognize them as legitimate tags, and then I had the BlogMigrator replace them either by generating raw HTML or by rewriting them into custom WordPress shortcodes (a concept similar to MT’s custom tags). Fortunately, I was mostly able to implement the shortcodes by reusing the PHP code I had already written.
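Rewriting a simple custom tag into a shortcode can be as small as a regular-expression substitution. In this Python sketch the tag name `windyphoto` and the shortcode name `photo` are hypothetical stand-ins for the real custom tags. It handles only empty or self-closing tags with double-quoted attributes; anything it doesn’t recognize is left untouched so the reports can catch it.

```python
import re

# Hypothetical mapping from custom tag names to WordPress shortcode names.
TAG_TO_SHORTCODE = {"windyphoto": "photo"}

# An empty or self-closing tag whose attributes are all name="value" pairs.
TAG_RE = re.compile(
    r'<(?P<name>[a-z][a-z0-9]*)(?P<attrs>(?:\s+[a-z_]+="[^"]*")*)\s*/?>',
    re.IGNORECASE,
)


def to_shortcodes(html):
    """Replace known custom tags with shortcodes, keeping their attributes."""
    def replace(match):
        shortcode = TAG_TO_SHORTCODE.get(match.group("name").lower())
        if shortcode is None:
            return match.group(0)  # ordinary HTML tag: leave it alone
        return "[" + shortcode + match.group("attrs") + "]"

    return TAG_RE.sub(replace, html)
```

Because WordPress shortcode attributes use the same `name="value"` syntax, the attribute text can be carried over verbatim.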

The BlogMigrator also catalogs all the links and images in each post, generating CSV files that I can view in Excel. Each link is identified as internal — back to Windypundit — or external depending on the hostname. Internal links are further classified by checking whether they point to a known Windypundit post URL or something else, such as an image or sound file.
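The internal/external split is a hostname check, followed by a lookup against the set of known post paths. Here is a minimal Python sketch, assuming the old blog answered on `windypundit.com` and `www.windypundit.com`; the category labels are invented for illustration.

```python
from urllib.parse import urlparse

# Assumed hostnames for the old blog; adjust for the real site.
INTERNAL_HOSTS = {"windypundit.com", "www.windypundit.com"}


def classify_link(url, known_post_paths):
    """Label a link as external, internal-post, internal-other, or other."""
    parsed = urlparse(url)
    if parsed.scheme not in ("", "http", "https"):
        return "other"  # mailto:, ftp:, javascript:, etc.
    host = (parsed.hostname or "").lower()
    if host and host not in INTERNAL_HOSTS:
        return "external"
    # No hostname means a relative link, which points back at the same site.
    if parsed.path in known_post_paths:
        return "internal-post"
    return "internal-other"  # images, sound files, archives, and so on
```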

I then have a separate AssetDownloader program that reads in these report files and downloads all the assets on the site and builds a directory structure for them. It filters out file types that are not static assets, such as links to .php files. I can upload all the files in that directory to the new website so the internal links work, although the program rewrites them with a new top-level subdirectory so they won’t collide with the new blog’s native assets. It also cleans up problems like replacing spaces in the URLs with underscores.
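The AssetDownloader’s path handling (skip non-static files, move everything under a new top-level directory, replace spaces with underscores) might look roughly like this in Python. The extension list and the `legacy` prefix are assumptions, not the values the real tool uses.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlparse

# Assumed set of static file types worth copying; .php and similar are skipped.
STATIC_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".mp3", ".pdf"}
# Hypothetical top-level directory that keeps old assets apart from new ones.
LEGACY_PREFIX = "legacy"


def local_asset_path(url):
    """Map an internal asset URL to its new site path, or None if it isn't static."""
    path = unquote(urlparse(url).path)  # turn %20 back into a space first
    if PurePosixPath(path).suffix.lower() not in STATIC_EXTENSIONS:
        return None
    cleaned = path.lstrip("/").replace(" ", "_")
    return f"/{LEGACY_PREFIX}/{cleaned}"
```

The same function can serve both sides of the job: choosing where to save each download and computing the rewritten link in the post.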

I then have a third program, the Probulator, that reads the link report, rewrites every link to point at the new blog, and tests the link to make sure it works. The first time, it found 61 broken internal links, due to basename shortening, badly formed URLs, embedded spaces, and so on. I went back to fix (or remove) those links.

It also found a couple of dozen links that were broken because my program was using the date of publication of a post to generate the URL and Movable Type was using the date of creation (or vice versa, I can’t remember).

The Probulator also tries to download a copy of every blog post by using its original URL — except for the hostname, which it rewrites to refer to my test server. This serves as a test for broken links, and it also provides local copies for further analysis by two more programs.
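Both Probulator steps, pointing a URL at another server and then fetching it, fit in a few lines of Python’s standard library. `test.example.com` is a placeholder for the test server. Note that `urlopen` follows redirects on its own, so a 200 here can mean the link worked only after a redirect.

```python
from urllib.error import HTTPError
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlopen

# Placeholder for the test server's hostname.
TEST_HOST = "test.example.com"


def rewrite_host(url, new_host=TEST_HOST):
    """Keep the original path and query but point the URL at another server.

    The fragment is dropped because browsers never send it to the server.
    """
    parts = urlsplit(url)
    return urlunsplit((parts.scheme or "http", new_host, parts.path, parts.query, ""))


def probe(url, timeout=10):
    """Fetch a URL. Returns (status, body); status is None if no HTTP response came back."""
    try:
        with urlopen(url, timeout=timeout) as resp:  # follows redirects automatically
            return resp.status, resp.read()
    except HTTPError as err:  # the server answered with 404, 500, etc.
        return err.code, None
    except OSError:  # DNS failure, refused connection, timeout
        return None, None
```

The body returned by `probe` is what would be written to disk as the local copy for later analysis.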

The SanityChecker program examines each blog post for odd bits of HTML that might not format correctly. For example, all post content should be a list of a limited set of tags — <p>, <ul>, <ol>, <h5>, <h6> — or <blockquote>, which should contain a list of the same set of tags. The program reported anything that did not match that pattern. This uncovered a flaw in my implementation of one of the shortcode replacements for custom tags. It also found a bunch of posts which had been authored using mangled HTML. (God bless Joel Rosenberg’s memory, but he didn’t know a damned thing about HTML.) I had to go back to the Movable Type version of the blog to clean those up before importing.
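The SanityChecker’s rule is easy to express as a walk over the tag structure. This Python sketch (again using `html.parser` rather than the HTML Agility Pack) reads “the same set of tags” inside a blockquote as p, ul, ol, h5, and h6, which is an interpretation. It is deliberately strict: HTML that relies on implicitly closed tags gets reported even if browsers render it fine.

```python
from html.parser import HTMLParser

# Tags allowed at the top level of a post and directly inside a <blockquote>.
BLOCK_TAGS = {"p", "ul", "ol", "h5", "h6"}
# Void elements never get a closing tag, so they are not tracked on the stack.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link", "wbr"}


class StructureChecker(HTMLParser):
    """Flag post HTML that strays from the expected block structure."""

    def __init__(self):
        super().__init__()
        self.stack = []  # currently open (non-void) tags
        self.problems = []

    def _allowed_here(self):
        if not self.stack:
            return BLOCK_TAGS | {"blockquote"}
        if self.stack == ["blockquote"]:
            return BLOCK_TAGS
        return None  # content nested deeper is not restricted

    def handle_starttag(self, tag, attrs):
        allowed = self._allowed_here()
        if allowed is not None and tag not in allowed:
            where = "at top level" if not self.stack else "inside <blockquote>"
            self.problems.append(f"<{tag}> {where}")
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if tag in self.stack:
            # Pop back to the matching open tag, reporting anything left unclosed.
            while self.stack[-1] != tag:
                self.problems.append(f"unclosed <{self.stack.pop()}>")
            self.stack.pop()
        else:
            self.problems.append(f"stray </{tag}>")

    def handle_data(self, data):
        if self._allowed_here() is not None and data.strip():
            self.problems.append(f"bare text: {data.strip()[:40]!r}")


def check_post(html):
    """Return a list of structural problems found in one post's HTML."""
    checker = StructureChecker()
    checker.feed(html)
    checker.close()
    checker.problems.extend(f"unclosed <{tag}>" for tag in checker.stack)
    return checker.problems
```

An empty list means the post passed; comments such as `<!--more-->` are ignored because the parser’s default comment handler does nothing.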

The last program is the LinkVerifier, which finds all the links in every post and makes sure that all the internal ones still work. A few of the links are to blog-engine-specific resources, such as category archives and author about pages, that can’t be easily mapped. I’ll keep a list of those so I can go in and fix them later.

At this point, my process for a full import goes something like this:

  • Delete the database cache file (if I’ve changed something on the Windypundit site and I want to re-download everything).
  • Run the BlogMigrator.
  • Run the AssetDownloader.
  • Zip up the downloaded assets, upload them to the new blog host, and extract them into the proper directory.
  • On the new blog host, restore a backup of the WordPress database that has all the configuration items set but doesn’t have any posts in it.
  • Use the WordPress importer to upload the WXR file from the BlogMigrator that contains all the posts.
  • Hit the blog homepage to verify that it’s working.
  • Run the Probulator, check the reports for missing items.
  • Run the SanityChecker, check the reports for problems.
  • Run the LinkVerifier, check the reports for problems.
  • Go fix some problems and try the process again.

I’ve pretty much been doing that in my spare time for the past couple of weeks, and I think I’m almost done. I’ll probably roll out the live site in the next few days.