WordPress – Automatically publish a pending post each weekday morning from a PHP script

In an earlier post, Piwik Web Analytics, and some unfortunate stats about my blog, I mentioned that the Feedburner stats for this blog show a relatively high subscribe/unsubscribe rate. I think a large part of that is my tendency to blog in spurts and, even worse, my tendency to write drafts and not publish them. In an effort to combat this, I’ve been trying to finish blog posts, set them to “Pending” status, and then go back and publish one every day (well, every day that I still have some sitting unpublished). Of course, that counts on me logging in to WordPress every day, which isn’t something I do. The following script is, at least for now, the answer for me.

This script (a standalone PHP script) uses wp-load.php to load the WordPress environment, then finds the oldest post with a given status (“pending” in my case) and attempts to publish it. It only does this if no other post has been published in the last 24 hours. The script can be found in Subversion at http://svn.jasonantman.com/misc-scripts/wordpress_daily_post.php:

#!/usr/bin/php
<?php
/**
 * wordpress_daily_post.php
 * Script to publish the oldest post with a given status, if no
 * other post has been published in 24 hours. Intended to be run
 * via cron on weekdays.
 *
 * Copyright 2012 Jason Antman 
 *
 * Licensed under the Apache License, Version 2.0 
 *
 * use it anywhere you want, however you want, provided that this header is left intact,
 * and that if redistributed, credit is given to me.
 *
 * It is strongly requested, but not technically required, that any changes/improvements
 * be emailed to the above address.
 *
 * The latest version of this script will always be available at:
 * $HeadURL: http://svn.jasonantman.com/misc-scripts/wordpress_daily_post.php $
 * $LastChangedRevision: 40 $
 *
 * Changelog:
 * 2012-09-03 Jason Antman  - 1.0
 *  - first version
 */
 
# BEGIN CONFIGURATION
define('WP_LOAD_LOC', '/var/www/vhosts/blog.jasonantman.com/wp-load.php'); // Configure this to the full path of your Wordpress wp-load.php
define('SOURCE_POST_STATUS', 'pending'); // post status to publish
# END CONFIGURATION

$VERBOSE = false;
$DRY_RUN = false;
array_shift($argv);
while(count($argv) > 0) {
  if(isset($argv[0]) && ($argv[0] == "-d" || $argv[0] == "--dry-run")){
    $DRY_RUN = true;
    fwrite(STDERR, "DRY RUN ONLY - NOT ACTUALLY PUBLISHING.\n");
  }
  if(isset($argv[0]) && ($argv[0] == "-v" || $argv[0] == "--verbose")){
    $VERBOSE = true;
    fwrite(STDERR, "WP_LOAD_LOC=".WP_LOAD_LOC."\n");
    fwrite(STDERR, "SOURCE_POST_STATUS=".SOURCE_POST_STATUS."\n");
  }
  array_shift($argv);
}
 
$_SERVER['HTTP_HOST'] = 'localhost'; // needed for wp-includes/ms-settings.php:100
require_once(WP_LOAD_LOC);
 
# check that we're running on a weekday
if(date('N') >= 6) {
#  if($VERBOSE){ fwrite(STDERR, "today is a saturday or sunday, dieing.\n"); }
#  exit(1);
}
 
# find the publish date/time of the last published post
$published = get_posts(array('numberposts' => 1, 'orderby' => 'post_date', 'order' => 'DESC', 'post_status' => 'publish'));
$post = $published[0];
$pub_date = $post->post_date;
$pub_id = $post->ID;
 
if(strtotime($pub_date) >= (time() - 86400)) {
  if($VERBOSE){ fwrite(STDERR, "last post (ID $pub_id) within last day ($pub_date). Nothing to do. Exiting.\n"); }
  exit(0);
} else {
  if($VERBOSE){ fwrite(STDERR, "Found last post (ID $pub_id) with post date $pub_date.\n"); }
}
 
 
# find the earliest post of status SOURCE_POST_STATUS, if there is one.
$to_post = get_posts(array('numberposts' => 1, 'orderby' => 'post_date', 'order' => 'ASC', 'post_status' => SOURCE_POST_STATUS));
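if(count($to_post) < 1) {
  if($VERBOSE){ fwrite(STDERR, "No posts found with status ".SOURCE_POST_STATUS.". Nothing to do. Exiting.\n"); }
  exit(0);
}
$post = $to_post[0];
$to_pub_id = $post->ID;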
$to_pub_date = $post->post_date;
$to_pub_title = $post->post_title;
$now = time();
$new_date = date("Y-m-d H:i:s", $now);
$new_date_gmt = gmdate("Y-m-d H:i:s", $now);
 
if($VERBOSE){ fwrite(STDERR, "Post to publish: ID=$to_pub_id DATE=$to_pub_date NEW_DATE=$new_date TITLE=$to_pub_title\n"); }
 
# actually publish it
if(! $DRY_RUN){
  $arr = array('ID' => $to_pub_id, 'post_status' => 'publish', 'post_date' => $new_date, 'post_date_gmt' => $new_date_gmt);
  $ret = wp_update_post($arr); // publish the post
  if($ret == 0) {
    fwrite(STDERR, "ERROR: Post $to_pub_id was not successfully published.\n");
    exit(1);
  }
  if($VERBOSE){ fwrite(STDERR, "Published post. New ID: $ret\n"); }
}
else {
  fwrite(STDERR, "Dry run only, not publishing post.\n");
  exit(0); // nothing was published, so skip the verification below
}
 
# check that the post really was published
$published = get_posts(array('numberposts' => 1, 'orderby' => 'post_date', 'order' => 'DESC', 'post_status' => 'publish'));
$post = $published[0];
$pub_date = $post->post_date;
$pub_id = $post->ID;
$pub_title = $post->post_title;
$pub_guid = $post->guid;
 
if($pub_title != $to_pub_title) {
  fwrite(STDERR, "ERROR: title of most recent post does not match title of what we wanted to post.\n");
  exit(1);
}
 
fwrite(STDOUT, "Published post $pub_id at $pub_date\n");
fwrite(STDOUT, "Title: $pub_title\n");
fwrite(STDOUT, "\n\n\n GUID/Link: $pub_guid\n");
fwrite(STDOUT, "\n\n".__FILE__." on ".trim(shell_exec('hostname --fqdn'))." running as ".get_current_user()."\n");
 
?>

You’ll need to set WP_LOAD_LOC (line 29) to the full path of your WordPress installation’s wp-load.php (it should be in the top-level directory of your WordPress installation). I run this script from cron like:

0 6 * * 1-5 /home/jantman/bin/wordpress_daily_post.php --verbose # publish WP pending posts daily

so that it runs at 6AM (local time) each weekday. Assuming you have cron set up to send you mail, you’ll get a daily message saying what was (or wasn’t) done.
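
The decision the script makes before publishing can be pulled out into a small function, which makes it testable without loading WordPress. This is only a minimal sketch, not part of the script above: should_publish_now() is a hypothetical helper, and it includes a weekday check that the script itself leaves to cron.

```php
<?php
/**
 * Hypothetical helper (not part of wordpress_daily_post.php): decide whether
 * a pending post should be published at $now.
 *
 * $lastPublished is the post_date of the newest published post, or null if
 * nothing has been published yet. $minGapSeconds defaults to 24 hours.
 */
function should_publish_now(DateTimeImmutable $now, ?DateTimeImmutable $lastPublished, int $minGapSeconds = 86400): bool
{
    // 'N' is the ISO-8601 day of the week: 1 (Monday) through 7 (Sunday)
    if ((int) $now->format('N') >= 6) {
        return false; // weekend
    }
    if ($lastPublished === null) {
        return true; // nothing published yet
    }
    // publish only if the newest post is older than the minimum gap
    return ($now->getTimestamp() - $lastPublished->getTimestamp()) > $minGapSeconds;
}

$now = new DateTimeImmutable('2012-09-03 06:00:00'); // a Monday
var_dump(should_publish_now($now, new DateTimeImmutable('2012-09-01 12:00:00'))); // 42 hours earlier
var_dump(should_publish_now($now, new DateTimeImmutable('2012-09-02 18:00:00'))); // 12 hours earlier
```

Passing the current time in as a parameter, rather than calling time() inside the function, is what makes the weekend and 24-hour cases easy to exercise.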

New Blog Theme

In a follow-up to my Some Thoughts on Choosing a New WordPress Theme post from a few days ago, I decided on the Admired theme by Brad Thomas. It’s amazingly full-featured and has a good set of options. I had to manually change a few things in the CSS (I wanted to tweak the top bar colors a bit in a way that’s not supported in the options), but overall it was a very simple transition. While it’s unfortunately very far from valid HTML or CSS, it seems quite nice.

If you happen to read this post and see anything wrong with the theme, or it doesn’t display properly for you, please leave a comment below (with browser version and OS, if you please).

My next project, continuing on from my Inaccuracies in Google Analytics for Website Stats post, is to compare the two self-hosted JavaScript-based open source Google Analytics alternatives I’ve identified (Piwik and Open Web Analytics) and try one out on my site (keeping in mind that my server is pretty heavily loaded, and I don’t want to push it over the edge). Once I come to some sort of conclusion on that, I’ll get back to some useful posts.

Some Thoughts on Choosing a New WordPress Theme

I think I’m going to choose a new theme for my blog. The current theme is iNove (albeit an older version with some custom modifications), and I feel like it looks a bit messy and has gotten a bit cluttered, so it’s time to find something new. I like the 2-column layout, and have a few other things I’m looking for – specifically, aside from something with advanced features like lots of widget support and hooks, something that has good visual separation between different posts and widgets. I also really want something, if possible, with relative column widths. My current home and work desktops both have dual monitors, and the minimum resolution I have on one screen is 1920×1080. When I look at my blog in a maximized window, about half the screen width is wasted with empty space. So, ideally, I’d like a theme that’s based on relative widths, probably with a “min-width” property so it wouldn’t get compressed to an absurdly narrow width on small screens.

I use Google Analytics (as noted in the privacy policy) for visitor statistics on this blog (more about that in a moment). So, I took a peek at the breakdown of visitors by screen resolution, and saw that for the past year, 94% of the 27,500 visits had a screen width of 1024px or more (and the majority of the others looked like mobile device resolutions, so they’d probably zoom the page correctly). So, my first gut reaction was to assume that I could use a theme approximately 1000px wide. Unfortunately, there are two main problems with that: first, as mentioned by Chris Coyier on CSS-Tricks.com, just because someone has a given screen resolution doesn’t mean their browser window (let alone the viewport) is that size. As a matter of fact, I usually have my main browser window set at about 80% of the width of one of my monitors, with my instant messaging client Pidgin taking up the rest of the space. So there’s one inaccuracy. There’s a potentially much greater inaccuracy in my stats as well, which I’m going to discuss in a separate post.

Nagios / Icinga Configuration Highlighting with GeSHi

As you may know from former posts, this blog (WordPress-powered) and a few MediaWiki sites that I have use the excellent PHP-based GeSHi syntax highlighter. Today I was writing a post that includes some Icinga (Nagios) configuration snippets. After a quick search, I found a Nagios language file for GeSHi on GitHub. Thanks very much to Albéric de Pertat (adepertat) for writing this and providing it to the public.

WP-Syntax Plugin GeSHi Path Fix

The WP-Syntax plugin for WordPress provides syntax highlighting for WordPress blogs via the GeSHi PHP syntax highlighter. Unfortunately, the plugin includes a built-in version of GeSHi (currently 1.0.8.9) in geshi/. As a result, not only are users of the plugin not instructed to use the latest version of GeSHi, but it won’t use a host-wide GeSHi installation that’s already in the PHP include path (i.e. /usr/share/php/), like the many php-geshi packages offered by repositories including EPEL (for Fedora, CentOS and RHEL).

The fix is quite simple. Just open wp-syntax.php in the wp-syntax/ plugin directory in your favorite text editor and change the GeSHi include line (for WP-Syntax 0.9.12, this is line 53) from:

include_once("geshi/geshi.php");

to:

include_once("geshi.php");

If you already have GeSHi installed in the PHP include path, just remove the geshi directory in your wp-syntax/ plugin directory, flush the WordPress caches (if any), and load a page which uses GeSHi – it should now use the host-wide version. If you want to still use a local version for wp-syntax, you can move things around to where they should be in the wp-syntax/ plugin directory:

mv geshi/geshi.php . && mv geshi/geshi/* geshi/ && rmdir geshi/geshi

Note – if you’re in a shared hosting environment, or are otherwise not able to upgrade the php-geshi package on your server yourself, you might not want to do this.
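
Before removing the bundled copy, it can help to confirm that PHP can actually find a host-wide geshi.php. The following is a minimal sketch under that assumption; find_on_include_path() is a hypothetical helper name, and it relies only on PHP’s stream_resolve_include_path() and get_include_path().

```php
<?php
/**
 * Hypothetical helper: report where (if anywhere) PHP resolves a file that
 * is included by a bare name such as "geshi.php".
 *
 * Returns the resolved path as a string, or false if it cannot be resolved.
 */
function find_on_include_path(string $file)
{
    // resolves the name against the configured include_path
    return stream_resolve_include_path($file);
}

$path = find_on_include_path('geshi.php');
if ($path === false) {
    echo "geshi.php was not found; include_path is " . get_include_path() . "\n";
} else {
    echo "geshi.php would be loaded from $path\n";
}
```

Run it with the same PHP configuration the web server uses, since the command-line php.ini may set a different include_path.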

I also posted about this in the WordPress support forums. Hopefully the WP-Syntax devs will include this change in the next version…

Puppet Syntax Highlighting with GeSHi

This blog is run on WordPress, and I also do quite a bit in PHP, so I’m familiar with the GeSHi syntax highlighter. It’s PHP-based, and can run both as a WordPress plugin (WP-Syntax) and as a PHP module. It also works quite well with the MediaWiki SyntaxHighlight GeSHi extension.

Today I was documenting some Puppet code in a wiki, and realized that I didn’t have syntax highlighting. Well, fellow Linux sysadmin and puppetmaster Jason Hancock was nice enough to post on his blog (Puppet Syntax Highlighting with GeSHi) that he’s developed a GeSHi language file for Puppet, available from GitHub. Many thanks!

New web server, WP optimization

Tonight, more or less on a whim, I moved my blog from my older (dual 1GHz Pentium III Coppermine, 1GB RAM, 10k RPM SCSI disks, Compaq Proliant DL360 G1, OpenSuSE 10.2 32-bit) web server to my newer one (dual 1.4GHz Pentium III, 2GB RAM, 10k RPM SCSI disks, HP Proliant DL360 G2, CentOS 5.3 32-bit). I did some profiling with ab (ApacheBench), and just moving from one server to the other got some serious performance gains (I was profiling with runs of 1000 requests total, 10 concurrent requests). I also added the W3 Total Cache WordPress plugin, which got the numbers to look even better!

As a side note, this was all done pretty quickly (moving the database and tarball for the vhost, installing the plugin, changing DNS), so please give me a heads-up if you experience any problems.

The numbers are rather impressive:

                      Total Time (s)   Requests per 1000 s   Avg. Connection Time (ms)
Old Server                  1192.252                838.75                      11,893
New Server                   569.121               1757.09                       5,667
Default W3tc Config           23.754             42,098.44                         237
Tuned W3tc                    12.281             81,428.76                         122

All tests were performed on my workstation, a Dell Precision 470, two dual-core Xeons at 2.8 GHz, 2GB RAM, 16GB swap, OpenSuSE 11.1 64-bit. This was on the same LAN and subnet as the servers, with the workstation connected via a 1Gbps copper Ethernet link and the web-serving interfaces of the servers connected via 100Mbps (There’s a trunk in between, from the gigabit aggregation switch to the 100Mbps distribution switch).
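
As a cross-check on the figures, the throughput and per-request time follow directly from the total time of each run. This is a small sketch of that arithmetic, assuming the 1000-request, 10-concurrent runs described above; the function names are made up for illustration.

```php
<?php
/** Requests per second for a run of $requests requests taking $totalSeconds. */
function requests_per_second(int $requests, float $totalSeconds): float
{
    return $requests / $totalSeconds;
}

/** Mean time per request in milliseconds, as seen by each of $concurrency clients. */
function mean_ms_per_request(int $requests, int $concurrency, float $totalSeconds): float
{
    return $concurrency * $totalSeconds * 1000 / $requests;
}

// total times in seconds, taken from the table; 1000 requests, 10 concurrent
$runs = array(
    'Old Server'          => 1192.252,
    'New Server'          => 569.121,
    'Default W3tc Config' => 23.754,
    'Tuned W3tc'          => 12.281,
);

foreach ($runs as $name => $seconds) {
    printf(
        "%-20s %8.2f req/s %10.0f ms/request\n",
        $name,
        requests_per_second(1000, $seconds),
        mean_ms_per_request(1000, 10, $seconds)
    );
}
```

The per-request times computed this way land close to the measured average connection times, which is a reasonable sanity check that the runs really were 1000 requests at a concurrency of 10.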

WordPress Installation, Finished

Found this from a month and a half ago, waiting as a draft:

So, I mostly finished the WordPress installation. I got everything for WordPress up and running, tested my Blogger URL redirection script and then switched over my subdomain redirection.

The Blogger redirection has two parts, but is in fact quite simple. First, I went into the directory where the Blogger content had lived (/srv/www/htdocs/blog) and moved everything in there into another directory, out of the way. I then created an .htaccess in the directory like:

<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /blog/
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule .* index.php [L]
</IfModule>

All this does is use mod_rewrite to serve blog/index.php for every page request. In index.php, I handle the important URL forms for Blogger (archives, tags, feeds, and posts) and redirect to the appropriate place. For archives, I just parse out the year and month from the Blogger URL and redirect to the proper page for WP. The feed is straight redirection. The tags (“labels” in Blogger parlance) are pulled out of the URL, have spaces (after urldecode()) replaced with dashes, and are redirected to the right tag for WP.

The posts, on the other hand, were a bit more difficult. My solution ended up being parsing the post name out of the URL. When I used the import tool, WP kept the original Blogger URLs in the wp_postmeta table with a meta_key of “blogger_permalink”. I just looked for a Blogger permalink matching the title from the Blogger URL, found the corresponding post ID and redirected to the proper new WP URL.

The code for index.php, for me, looks something like:

<?php
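// redirect old Blogger URLs in /blog to new WordPress in /wp
// note: the mysql_* functions used here were removed in PHP 7; use mysqli or PDO on current PHP
// strip only the leading /blog prefix, so paths that contain "/blog" elsewhere are left intact
$request = mysql_real_escape_string(preg_replace('#^/blog#', '', $_SERVER['REQUEST_URI']));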
 
// handle constant stuff like feeds and top-level pages
// TODO
if(strpos($request, "_archive.html") !== false)
{
    // redirect to an archive
    $request = substr($request, strpos($request, "/", 1)+1);
    $ary = explode("_", $request);
    $redirect_to = "http://blog.jasonantman.com/".$ary[0]."/".$ary[1]."/";
    header("Location: ".$redirect_to);
    die();
}
elseif(strpos($request, "labels/") !== false)
{
    // redirect to a tag page
    $redirect_to = substr($request, strpos($request, "labels/")+7);
    $redirect_to = str_replace(".html", "", $redirect_to);
    $redirect_to = urldecode($redirect_to);
    $redirect_to = str_replace(" ", "-", $redirect_to);
    $redirect_to = "http://blog.jasonantman.com/tags/".strtolower($redirect_to)."/";
    header("Location: ".$redirect_to);
    die();
}
elseif(strpos($request, "/blogger.html") !== false) // strpos() returns 0 for a match at the start, so compare against false
{
    // redirect to main blog
    header("Location: http://blog.jasonantman.com/");
    die();
}
elseif(strpos($request, "/atom.xml") !== false) // strpos() returns 0 for a match at the start, so compare against false
{
    // redirect to new feed
    header("Location: http://blog.jasonantman.com/feed/");
    die();
}
 
// handle the posts, months, tags, etc.
$fail = false;
$redirect_to = "";
$conn = mysql_connect()   or die("Error. MySQL connection failed at mysql_connect");
if(! $conn)
{
    error_log("SCRIPT ".$_SERVER['PHP_SELF'].": "."Unable to connect to MySQL.");
    $fail = true;
}
$select = mysql_select_db('wordpress');
if(! $select)
{
    error_log("SCRIPT ".$_SERVER['PHP_SELF'].": "."Unable to select DB wordpress.");
    $fail = true;
}
$query = "SELECT m.meta_key,m.meta_value,p.post_name,p.post_date FROM wp_postmeta AS m LEFT JOIN wp_posts AS p ON m.post_id=p.ID WHERE m.meta_key='blogger_permalink' AND m.meta_value='".$request."';";
$result = mysql_query($query);
if(! $result)
{
    error_log("SCRIPT ".$_SERVER['PHP_SELF'].": "."Error in query: ".$query." ERROR: ".mysql_error());
    $fail = true;
}
if(mysql_num_rows($result) < 1)
{
    // couldn't find an appropriate page
    // TODO: find a better way... for now just redirect to the month page
    $ary = explode("/", $request);
    if(count($ary) > 3)
    {
        $redirect_to = "http://blog.jasonantman.com/".$ary[1]."/".$ary[2]."/";
    }
    else
    {
        $redirect_to = "http://blog.jasonantman.com/";
    }
}
else
{
    $row = mysql_fetch_assoc($result);
    $redirect_to = "http://blog.jasonantman.com/".date("Y", strtotime($row['post_date']))."/".date("m", strtotime($row['post_date']))."/".$row['post_name'];
}
 
if($fail)
{
    // redirect to main page with 302
    Header( "Location: http://blog.jasonantman.com/" ); // implicit 302
}
else
{
    // redirect to the post or month
    Header( "HTTP/1.1 301 Moved Permanently" );
    Header( "Location: ".$redirect_to );
}
 
?>
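
The URL handling that doesn’t need the database (archives, labels, and the feed) can also be written as a pure function, which makes it easy to test without a web server. This is a hedged sketch rather than the code I actually ran: map_blogger_path() is a hypothetical name, it returns a path instead of sending a header, and the archive URL shape (YYYY_MM_DD_archive.html) is an assumption.

```php
<?php
/**
 * Hypothetical sketch: map an old Blogger path to a new WordPress path, or
 * return null when the path is not one of the simple cases (for example a
 * post, which needs the blogger_permalink lookup in the database).
 */
function map_blogger_path(string $path): ?string
{
    // monthly archive, e.g. /2008_01_01_archive.html -> /2008/01/
    if (preg_match('#/(\d{4})_(\d{2})_\d{2}_archive\.html$#', $path, $m)) {
        return "/{$m[1]}/{$m[2]}/";
    }
    // label page, e.g. /labels/Some%20Tag.html -> /tags/some-tag/
    if (preg_match('#/labels/([^/]+)\.html$#', $path, $m)) {
        $tag = str_replace(' ', '-', urldecode($m[1]));
        return '/tags/' . strtolower($tag) . '/';
    }
    // old Atom feed -> new feed
    if (substr($path, -9) === '/atom.xml') {
        return '/feed/';
    }
    return null;
}

foreach (array('/2008_01_01_archive.html', '/labels/Some%20Tag.html', '/atom.xml', '/2008/01/some-post.html') as $old) {
    $new = map_blogger_path($old);
    echo $old, ' => ', ($new === null ? '(needs database lookup)' : $new), "\n";
}
```

Anchoring the patterns with regular expressions sidesteps the strpos() pitfall where a match at position 0 evaluates as false.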

So, it now looks like I’m pretty much done with setup, and even get to keep my links. The one interesting problem that will crop up is due to the fact that, at the moment, I’m hosting off of a dynamically IPed residential internet connection, so I’m at http://jantman.dyndns.org:10011. The problem lies in the fact that Blogger used this for its URIs and permalinks, and it seems that (though http://blog.jasonantman.com uses a 302, not a 301, to redirect) Google, Technorati, etc. have indexed my site with this hostname and port, instead of the redirecting subdomain. Normally this wouldn’t be a problem, but I plan on soon moving to a business hosting account with 5 static IPs and port 80 open. Which means that soon the subdomain will become “real”… and all of those pesky dyndns.org:10011 links will be obsolete. The only way I can think of fixing this is, once I make the switch to static IP and port 80 (which will also include moving all of my subdomains to name-based virtual hosts), to craft RewriteRules or redirect rules to replace http://jantman.dyndns.org:10011/wp/ with http://blog.jasonantman.com/, update DynDNS with my new static IP, and keep a default vhost listening on 10011 to provide rule-based redirection to the new subdomain. Eek.

WordPress… Finally!

Well, I finally bit the bullet. I woke up this morning, got in to work (slow day) and said to myself, “My blog’s finally moving to WordPress. Today.” So, I read through the docs (really psyched about the Blogger import feature), downloaded 2.7 (latest), set up the DB and installed away. Once I did the preliminary things like changing the admin password and setting up some categories, I set about importing from Blogger.

Here was the first of the major issues. Though WordPress has really hyped their Blogger importer, they failed to mention that it is totally useless if you are self-hosting Blogger and publishing via FTP or SFTP, as I was. So, after a lengthy consultation with the great oracle of Google, I decided I had two options: switch Blogger to publish to BlogSpot.com and use the WP import tool, or custom hack a script. I must admit, it felt a bit pointless… all of my old posts were in a directory at the same level as WP, and it seemed quite stupid to have to jump through hoops to get that data. Though I’m not quite sure whether the GData API really doesn’t give access to posts if self-hosted.

I migrated publishing of Blogger to BlogSpot.com (foobarthudgrunt.blogspot.com), despite worries about confusing Google or messing up the little bit of ranking I’ve been able to gain. It successfully imported all 128 of my entries, and the few comments. But, as I navigated to the “Edit” page to have a look, an even bigger problem was apparent. All of the tags from Blogger had ended up as categories in WP. So I had no tags for any of my posts, and a gazillion categories with only one post in them. Luckily, the DB schema is pretty sane, and I quickly figured out that both categories and tags are stored in the wp_term_taxonomy table, and the difference between a category and a tag is simply that the taxonomy field is either “category” or “post_tag”, respectively. So, since I hadn’t added any posts in WP yet, I just changed taxonomy to “post_tag” for anything with an ID past the categories I’d added. And it seems to have worked beautifully.
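
For anyone in the same spot, the change amounts to a single UPDATE. The sketch below only builds the statement and its parameter; it assumes the default “wp_” table prefix and that the ID in question is term_taxonomy_id, and both the function name and the cutoff value are made up for illustration. Back up the database before running anything like it.

```php
<?php
/**
 * Hypothetical sketch: build the UPDATE that turns imported categories into
 * tags. Assumes the default "wp_" prefix and that every category row with a
 * term_taxonomy_id above $lastRealCategoryId came from the Blogger import.
 *
 * Returns array(sql, params) for use with a prepared statement.
 */
function categories_to_tags_query(int $lastRealCategoryId): array
{
    $sql = "UPDATE wp_term_taxonomy SET taxonomy = 'post_tag' "
         . "WHERE taxonomy = 'category' AND term_taxonomy_id > ?";
    return array($sql, array($lastRealCategoryId));
}

list($sql, $params) = categories_to_tags_query(5); // 5 is a made-up cutoff
echo $sql, "\n";
// With a PDO connection in $pdo, this could be run as:
//   $pdo->prepare($sql)->execute($params);
```

Restricting the WHERE clause to rows that are currently categories keeps the statement from touching any other taxonomy, such as link categories.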

Up next, however, was the hard part: sitting down with my list of categories and sifting through 128 posts to categorize them. The biggest pain is the default edit posts table in the admin interface lists 15 posts per page, and with a cursory glance at the source I couldn’t for the life of me figure out where that limit is set.

Next up, I spent some time looking over the configuration options in WP, updating my About page, and listing things to do in the future (more static pages, blogroll and links, etc.).

I was finally ready to set up pretty URLs. And… sure enough… I clicked “submit” to apply some changes to the default blog URL (my real domain, as opposed to the DynDNS domain) and subtitle, and I got kicked out of the admin interface. Try as I might, I had no luck logging in. I then found that WordPress doesn’t log anything anywhere: no MySQL log and no error messages in the Apache error log. Wonderful. I spent about an hour looking through the source, figuring out how the auth method works, and trying to set an MD5 password. It was apparent that the password in the DB for my one user (administrative user) was generated by PHPass, not MD5, but the auth function was evaluating anything >= 32 characters as PHPass, and the MD5 of my password was 32 characters, so that wouldn’t work. I tried the “forgot password” link with both username and email, but no mail was being sent. When I reached the 2-1/2 hour mark, I started instrumenting the login code with some error_log() lines to see what was up. I narrowed the problem down to the block of code starting at line 48 in wp-includes/user.php. Specifically, according to the comments, no credentials were being passed but the cookie wasn’t set. And it’s silent in that case. So, at this point, I’m totally lost. I decided to clear all of my browser’s cookies and auth info, and try again. No dice. After nearly 3 hours, I decided to do a packet capture, and found the horribly simple reason. Somewhere in the code, WP is evaluating the “siteurl” or “home” values from wp_options, and using them instead of just doing relative links. As a result, when the form submits, Apache keeps returning a 302, and the form submission never makes it there. Hopefully this won’t create a problem when I transition from DynDNS to a real static IP and domain name.

Next I enabled pretty permalinks, enabled mod_rewrite in the vhost, and selected a new template for my blog (though I’m planning on doing some heavy customization). I didn’t want to leave the default template up for too long, in case Google’s heavy crawling of my site picked up all of the new content somehow… I ended up narrowing it down to three themes, coincidentally all designed by mg12: Blocks2, iNove or ElegantBox. I installed them all on my box and looked at iNove first, and it was love at first sight.

So, that’s where it stands right now. I finally have my blog on WordPress and running, and have a theme. The action plan for tonight includes:

  • Using RewriteRule and a PHP script to point all of the old Blogger URLs to the new WordPress installation.
  • Adding some new static content, and top-bar links to my other sites.
  • Refining the theme a bit, possibly?
  • Adding links to my sites, and to the blogroll.
  • Adding the buttons/plugins for del.icio.us, Digg, Slashdot, reddit, etc.
  • Redirecting my main blog page to WordPress.
  • Redirecting or linking my old feed locations to WordPress.

New Project – Blog Migration to WordPress

For quite some time, I’ve been frustrated with Blogger. First of all, its publishing system is horribly inefficient. As everything is static HTML, at this point, writing this blog entry alone will require it to re-publish approximately 6 MB to my server. Seems sort’a pointless. Not to mention, it doesn’t allow any of the stuff that I really want, such as multiple categories with per-category RSS, or good searching. It also means that, though this blog is hosted on my own server, I’m dependent on Blogger to add posts.

I’m still horribly busy dealing with insurance companies and the police in relation to my stolen truck, as well as looking around and trying to figure out what my next vehicle will be, and how much I can spend on it.

Anyway, I’ve decided that at some point in the future, I’ll be migrating to WordPress for the blog. It will, of course, be hosted on my own machine, and will hopefully also include a migration of everything from this Blogger account. And, somehow, will include some sort of redirection from old posts to the relevant new ones. Most importantly, though, I plan on deferring the project until I get my multiple static IP service from Optimum Online, as the new blog (and the rest of my subdomains) will be moved from GoDaddy forwarding to their own subdomains set up as Apache name-based VHosts.

Stay tuned for progress updates…