PracticalWeb Ltd

Websites that work for you.

Creating New Vagrant Base Boxes With Veewee

Creating your own base box for vagrant is a great thing to do: you get to understand exactly what is on that box and to choose exactly which base OS you use.

After all one of the big draws of vagrant is keeping your dev environment close to what production looks like - and for that you need to know what is in the base box.

When I first started using vagrant I wrote a post How to build a Centos 6 base Box for vagrant which details the manual steps needed. At the time I was busy learning puppet, vagrant and related tools - veewee was just one tool too many, and a manual build seemed the best way to get my head around what a base box was.

While building a base box isn’t something I have needed to repeat much, it is beneficial to update the base image from time to time, and here veewee is brilliant. It also makes sharing responsibility for the base box within a team easy.

Here’s what I did:

veewee vbox templates | grep -i centos

This told me that the template I wanted was CentOS-6.5-x86_64-minimal

veewee vbox define centos65 CentOS-6.5-x86_64-minimal

Now I made some edits to customise the base box the way I wanted:

vim definitions/centos65/definition.rb # comment out the chef line
vim definitions/centos65/cleanup.sh # remove build packages like gcc
vim definitions/centos65/vagrant.sh # remove insecure vagrant key and add your own public key

Build

veewee vbox build centos65
veewee vbox export centos65

Then I add the box to my local vagrant install so I can test it:

vagrant box add centos65 ./centos65.box

Now I can copy the base box to a team-accessible server, and push my custom definition to my veewee fork so anyone on the team can build an updated box:

vim .gitignore # remove definitions/*
git add .gitignore
git add definitions/
git commit -m "added my definition"
git push

Insert Line to All Files When Missing

In this case I wanted to turn on comments in all posts for this blog.

All files that don’t have a line starting with "comments" get "comments: true" inserted as the 5th line:

for f in * ; do grep -q "^comments" "$f" || sed -i '5 i\
comments: true' "$f" ; done

Maybe Octopress has a way to globally enable comments - but I didn’t find it, and this was quick.
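The loop can be tried safely on a scratch directory first; the file names and contents below are made up for illustration.

```shell
# Demo of the insert-if-missing loop on made-up files (GNU sed assumed).
mkdir -p demo_posts
printf 'layout: post\ntitle: a\ndate: 2014-01-01\ncategories: misc\nbody text\n' > demo_posts/one.markdown
printf 'comments: false\nbody text\n' > demo_posts/two.markdown
cd demo_posts
# insert "comments: true" as line 5 in any file lacking a "comments" line
for f in * ; do grep -q "^comments" "$f" || sed -i '5 i\
comments: true' "$f" ; done
cd ..
sed -n '5p' demo_posts/one.markdown   # -> comments: true
```

two.markdown already starts with a comments line, so it is left untouched.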

Migrating From Drupal 5 to Octopress

I’ve been running this blog (or some version of it) for almost 10 years now.

I write to help clarify my own thoughts, or to note down technical details of a task that I have struggled to figure out. I often found myself coming back and have saved many hours of trying to figure out the same thing again a year or more later.

For a long time this site was running Drupal 5. I set it up at a time when I was getting to know Drupal, starting out as an independent, and had plenty of time to spend on it. At the time this was a very useful exercise: installing lots of modules and writing some code was good experience. But when Drupal 6 came out I was busy and it wasn’t worth upgrading; then when Drupal 7 was released and Drupal 5 was no longer supported, upgrading was even more difficult as I would have had to upgrade in two steps. Besides, Drupal didn’t seem like such a good fit for my blog any more.

I don’t want to have to apply security updates on a site I’m not getting paid for: so a static html site is a great fit for me.

I lose integrated comments, but spam had already killed those for me - I’ll try disqus and see how that goes (the need to enable comments in the yaml for each post threw me at first).

Search was useful - but I can grep the source files myself.

I had all sorts of Drupal plugins before - but really I don’t think they were very important.

Jekyll seems great, especially because with github’s patronage it seems unlikely to become unsupported; and at the end of the day it is just a bunch of simple files so importing to another system should be easy.

Exporting from Drupal 5 needed a small patch on the importer; without it the categories were seen as some kind of binary object in yaml. The import reads direct from the database, so it doesn’t run all of Drupal’s filters, and I suspect an export module running inside Drupal would do a much better job. I still need to pull over some old comments, and the formatting could do with a tidy-up, but I need to move to a system that gets me writing new content, and not worry too much about the old.

Jekyll itself didn’t use tags in the way I wanted - I find the ability to cross-link from one post to similar ones very useful - so I am using Octopress, which seems to do what I want out of the box.

To get the content into Octopress I just did:

cp jekyll/mysite/_posts/* octopress/source/_posts

I have switched from the pygments highlighter to linguist (this seems to be what github uses, and it supports code highlighting well).

I added a twitter aside, for which I just copy-pasted the twitter widget into source/_includes/asides/twitter.html and enabled it in _config.yml.

I’m not a ruby coder, so installing all the required ruby gems and figuring out how to run a modified version of the jekyll importer took a little while; in the end I think it was just a case of getting all the gems I needed installed. I didn’t blog soon enough!

Static Export of Drupal Site

I’ve exported this site from Drupal using wget to create a static html version, like so:

wget  --mirror -p  -e robots=off --base=./ -k -P ./ http://localhost/

Then I rsync the files to the server and use mod_rewrite to retain the paged links like frontpage?page=4.

I’ve had some trouble getting mod_rewrite to work; it seems that getting apache to serve content from filenames containing question marks is tricky.

In apache 2.4 this worked:

<VirtualHost *:80>
   ServerName practicalweb.localhost
   DocumentRoot /home/sean/rescue/localhost
   RewriteEngine on
   LogLevel alert rewrite:trace3
     <Directory /home/sean/rescue/localhost>
       RewriteCond %{QUERY_STRING} !=""
       RewriteRule ^(.*) /home/sean/rescue/localhost/$1\%3F%{QUERY_STRING}? [L]

       Options Indexes FollowSymLinks MultiViews
      AllowOverride All
      Require all granted
   </Directory>
</VirtualHost>

But on apache 2.2 (which my server runs) I needed an external redirect:

<VirtualHost *:80>

  DocumentRoot /var/www/practicalweb
  RewriteEngine on
  RewriteCond %{QUERY_STRING} !=""
  RewriteRule ^(.*) $1\%3F%{QUERY_STRING}? [last,noescape,R]
  <Directory />
      Options FollowSymLinks
      AllowOverride None
  </Directory>
  <Directory /var/www/practicalweb>

      Options None
      AllowOverride None
      Order allow,deny
      allow from all
  </Directory>
</VirtualHost>

It still basically works - but users see a slightly changed URL which isn’t quite what I wanted.

Developing for Ops

We often work on large websites with strict change control practices and scheduled release cycles. Sometimes we also hand over the systems to the client for production and don’t have direct access ourselves.

Some bugs have a nasty habit of only occurring in production, this may be due to high load, odd/old browsers, changes in data, or just because test scenarios don’t cover every eventuality.

What this means is that when we have a bug in production we can only understand it through the error logging we have already built into the system. If we need to put in place additional logging we usually lose the chance of actually fixing the bug for another release cycle.

One of the real arts in this flow of development is to be pessimistic enough to assume that somehow something is going to go wrong, to remember that the people who see the bug will not be the developers who know the code, and that at this point (unlike during dev) we will have very limited access to the systems we might want to debug.

One temptation is to log everything - but you soon find that doesn’t scale.

The art of error messages is a bit like the art of commenting - especially for those errors that should never happen. You often don’t need to say exactly what went wrong, hopefully your compiler or runtime engine will do this along with a stacktrace or at least a line number. You need to say what it means to have this error - especially if it indicates a breakdown in business logic. It also helps to raise errors as early in the code flow as possible.

When working on a large project with multiple teams it is especially helpful for errors to make clear whenever possible which team the bug belongs to. Clear error data like this can really cut down on the politics that can accompany a production bug and radically reduce the time to fix.

For example let’s imagine that we are developing a website, we are responsible for the shopfront but we obtain product data from a feed. As well as all the bugs that can occur in our code there are likely to be a whole host of possible problems with the incoming data.

What if we receive a null instead of the agreed object, what if the price is non numeric (or zero, or negative), what if an expected field is missing?

You might display a product with zero price, fail to display it, or perhaps you do catch an error but the log just says something like “Notice: Trying to get property of non-object”. The bug gets reported to the front end team - because that’s where the error appears. The front end team can’t see the production data.

The politics here is that teams often blame each other; developers are generally optimistic that their own code is good and pessimistic about other teams’ code. Therefore bugs get thrown over the wall too quickly and then get thrown back - leaving bugs bouncing about with no fix.

Now imagine you have explicit validation at the point you load third party data (or any place you have a boundary of responsibility like this). You log an error that points directly to the data feed, hopefully with the actual data that is wrong, perhaps even logging the request/response pair that led to it. This time you can give the data team enough info that they can quickly identify the problem. Conversely if you have missing products and no data errors - the impossible has happened - there is a bug in your own code. Now you get to be the hero and fix it.

Remember, in development (where developers spend 95% of their time) this bug would be trivial: the developer would see the error, know which bit of code it related to, be able to view the data feed, and report the bug back to the data team. The trick is to remember it won’t be like this in production.

In my experience it is well worth spending a chunk of time up front writing good generic error routines that capture as much detail as possible, and 5 or 10 minutes every day looking at stubbed-out error routines.

This may add up to a non-trivial time investment - but you only have to save a few minor production bugs to get paid back plenty.

Merge Google Plus Accounts

If you have a gmail account you get a google plus account with it, and many of us end up with several of each. For mail this is usually what you want, but for plus with its circles you usually just want one account and to split posts by circle. Fortunately there is a way to redirect one account to another.

This way anyone looking for you at the account you don’t really use or looking you up via an alternate email gets redirected to your active account.

First make sure you are only logged in to the google account you want to transfer your connections away from.

Go to Google takeout; this is usually used to download backups of your data.

On the google circles line is a little folder icon; the mouseover text tells you this is to "Transfer your google+ connections to another account".

Follow this link and you’ll have to log into your destination account.

That should set up your transfer.

How to Use Google Comments on Your Blog

To use google plus comments on your site all you need to do is insert the following code on the page where you want the comments to appear.

url should be the full url to the page; any comments in google plus with this url will appear to any user who can see them (ie public comments or those from people in the viewer’s circles).

width is just how wide you want the comment box

BLOGGER: this method is, as far as I can see, only officially supported for Blogger comments, and only this value seems to work.

FILTERED_POSTMOD: similarly this seems to be just what people use - I didn’t find docs for it so I’m unsure whether other values work.

<script src="https://apis.google.com/js/plusone.js">
</script>

<g:comments
    href="The url of the blog post"
    width="500"
    first_party_property="BLOGGER"
    view_type="FILTERED_POSTMOD">
</g:comments>

Hacked Server (Now Restored)

A few days ago I was notified by my ISP that my server was "emitting a UDP-based denial of service attack"; as a result the VM had been rebooted and taken off the network.

With console access I was able to verify there was a problem, and the ISP was able to give me a clean VM, with my old system available as a read-only mount.

I had thought that the server was fairly well locked down with minimal users, secure passwords and so on, but to be honest with other commitments I hadn’t been keeping on top of patching.

Reviewing logs etc didn’t show any clear point of entry, and since the server was rebooted I didn’t see any ongoing activity. I decided the best thing was just a clean rebuild, with better security, better monitoring, and keeping on top of those patches.

The site is back up - but the code is based on Drupal 5 which isn’t supported and I don’t have time for an upgrade right now. What I’ve done is to run the site locally (after audit) and put online a static html mirror. As well as leaving less avenue for attack this should make the site a lot faster to load. I keep meaning to revamp the site properly but just never seem to have time, and there are always paid tasks that take priority!

Comments are the main loss - but they were overloaded with spam anyway - so I’m trying out google plus comments instead, which I hope will avoid the spam and give a better experience.

Using Varnish to Cache Authenticated Drupal Pages

I have a site which requires users to be logged in, but the pages are not customised. I was playing with a way to cache the content in varnish while still doing an access check. This method uses an access-check page (test.php below) which then uses ESI to load the real, cacheable content.

I’ve tried it in a dev env, I’m not yet sure if we’ll use this in production.

Varnish config

probe checkslash {
    .url = "/robots.txt";
    .interval = 500s;
    .timeout = 10s;
}

include "backends.vcl";

/** generic config from here down */
sub vcl_recv{

  /* if the drupals are down, this is how long we cache for */
  set req.grace = 6h;

  /* Make sure we direct 443 traffic to the secure drupal */
  if (server.port == 443 ) {
    set req.backend = drpau_ssl_director;
  } else {
    /* port 80 traffic goes to the correct LB */
    set req.backend = drpau_director;
  }
  # just pass through non-page files, and the login page
  if (req.url ~ "(?i)\.(pdf|asc|dat|txt|doc|xls|ppt|tgz|csv|png|gif|jpeg|jpg|ico|swf|css|js|htc|ejs)(\?.*)?$") {
  } else if (req.url ~ "(?i)(sites/default/files)|(js/)|(/login)" ) {
  } else if (req.esi_level == 0 ) {
    # pass regular pages to a special url
    set req.url = "/esi" + req.url;
  }
  return (lookup);
}



sub vcl_fetch {

  if (req.url ~ "/esi/" && req.esi_level == 0 ) {
    set beresp.do_esi = true; /* Do ESI processing               */
   }

}

Then in apache I rewrite all requests for pages that come via the esi prefix:

RewriteRule ^esi/(.*)$ test.php [L]

and test.php is:

<?php
define('DRUPAL_ROOT', getcwd());
// We prepare only a minimal bootstrap.
require_once DRUPAL_ROOT . '/includes/bootstrap.inc';
drupal_bootstrap(DRUPAL_BOOTSTRAP_SESSION);
global $user;
$roles = user_roles();

if (in_array('anonymous user', $user->roles)) {
  $uri = preg_replace('#^/esi#', '', $_SERVER['REQUEST_URI']);
  echo "<esi:include src=\"http://$_SERVER[SERVER_NAME]$uri\"/>";
} else {
     header("Location: https://$_SERVER[SERVER_NAME]/login");
}

Search Skype History Across Chats

I find a lot of clients use skype

For me the biggest frustration is the limited search

But it turns out the data is stored using sqlite - and so you can search using sql directly on the sqlite db files.
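For example, to search every chat for a term: Skype clients typically keep history in an SQLite file called main.db under ~/.Skype/&lt;username&gt;/, with a Messages table whose columns include author, timestamp and body_xml. A minimal sketch follows - it builds a tiny stand-in database so it can be run anywhere; on a real machine point sqlite3 at the actual main.db instead.

```shell
# Stand-in for Skype's main.db so the query is runnable anywhere;
# in practice use DB=~/.Skype/<username>/main.db instead.
DB=demo_main.db
sqlite3 "$DB" "CREATE TABLE Messages (author TEXT, timestamp INTEGER, body_xml TEXT);"
sqlite3 "$DB" "INSERT INTO Messages VALUES ('alice', 1388577600, 'see the attached invoice');"
# search every chat for a term, oldest message first
sqlite3 "$DB" "SELECT author, datetime(timestamp, 'unixepoch'), body_xml
               FROM Messages
               WHERE body_xml LIKE '%invoice%'
               ORDER BY timestamp;"
```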

or for all your chats from a given day