:HIPSTER_DEV_BLOG

Another Octopress blog about programming and infrastructure.

Writing Functions for AWS Lambda Using NPM and Grunt

AWS recently announced a new compute product called Lambda, which allows users to run Node.js functions on fully managed infrastructure while paying only for the actual compute time used.

NPM is the primary package manager for Node.js, and while Lambda does not provide explicit NPM support, it is possible to bundle NPM packages with your function to leverage third-party modules.

Grunt is a task runner for JavaScript, allowing easy automation of project tasks such as building and packaging. Recently I released a Grunt plugin to assist in testing Lambda functions and packaging functions including NPM dependencies.

This blog post provides an example of how to use NPM, Grunt and the grunt-aws-lambda plugin to create a Lambda function which will scrape a web page and save a list of links within that page to S3 using the cheerio NPM package.
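As a rough illustration of the link-extraction step only, here is a minimal sketch in Python using the standard library's html.parser. It is not the cheerio-based Node.js function from the post; the names LinkCollector and extract_links are made up for this example, and the S3 upload is left out.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag seen while parsing (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Only anchor tags are of interest; skip anchors without an href.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                self.links.append(value)


def extract_links(html):
    """Return the href values of all links in an HTML string, in document order."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


sample = '<p><a href="https://example.com/a">A</a> <a name="x">no href</a> <a href="/b">B</a></p>'
print(extract_links(sample))
```

The resulting list is what would then be serialized and written to S3.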

Part 1: Integrating OpsWorks and CodeDeploy

Amazon recently announced a new deployment service called CodeDeploy. OpsWorks is another application management product which provides excellent configuration management via Chef; however, it lacks the advanced deployment functionality of CodeDeploy. It therefore makes sense to integrate these two products, delegating the configuration management to OpsWorks and the deployment functionality to CodeDeploy.

This is part 1 of integrating OpsWorks and CodeDeploy. It provides an introduction to both services and the basic configuration required to get started.

NY Taxi Data Visualized

Recently a massive dataset of NYC Taxi Data was made public. There are torrents available, but at 19 GB the data can be quite unwieldy to manage on a home machine. The /r/BigQuery community has uploaded the dataset to Google’s BigQuery service.

BigQuery provides a simple way to get insights out of this dataset without tearing through your internet usage or waiting for your home machine to query 173 million records. For example, Reddit users have already discovered some anonymization issues.

I’ve taken some of the popular queries and charted them.

PDF’s in Ruby

Avoid using PDFs in your application. There are no great solutions to PDF generation in general, and Ruby does not have any perfect options. If you really need PDFs, this is the landscape of options, and my suggestion.

Automatic DNS Records Using Route53 on OpsWorks

Let’s say you have a load-balanced web application managed with OpsWorks – your application traffic will be addressed to the load balancer, but sometimes it’s still handy to address your application nodes directly for testing purposes, or perhaps so each node has a unique SNS endpoint for HTTP notifications. You could just use their IP addresses, but unless you use an EIP, an instance’s IP address may change. You could create a DNS record, which would be easier to remember and allows the IP to change – but managing this manually would be a pain.

Fortunately, this process of managing DNS records can be automated using Chef, Route53 and the EC2 instance metadata functionality to obtain the public IP. On setup, each instance will automatically create an A record for [instance name].example.com using its OpsWorks instance name.
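To make the idea concrete, here is a minimal Python sketch of the two pieces of data involved: the public IP read from the instance metadata endpoint, and the change batch that Route53's ChangeResourceRecordSets call expects. This is not the Chef recipe from the post. The function names, the 60 second TTL and the choice of UPSERT are assumptions made for illustration, and actually submitting the change (for example via an AWS SDK) is left out.

```python
import urllib.request

# Plain (IMDSv1-style) metadata endpoint. Instances that enforce IMDSv2
# additionally require a session token, which this sketch does not handle.
METADATA_URL = "http://169.254.169.254/latest/meta-data/public-ipv4"


def fetch_public_ip(timeout=2.0):
    """Read the instance's public IPv4 address; only works on an EC2 instance."""
    with urllib.request.urlopen(METADATA_URL, timeout=timeout) as response:
        return response.read().decode("ascii").strip()


def build_a_record_change(instance_name, domain, public_ip, ttl=60):
    """Build a Route53 change batch that creates or updates one A record.

    The TTL default and the UPSERT action are assumptions, not values
    taken from the original post.
    """
    return {
        "Comment": "Automatic record for " + instance_name,
        "Changes": [
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": instance_name + "." + domain + ".",
                    "Type": "A",
                    "TTL": ttl,
                    "ResourceRecords": [{"Value": public_ip}],
                },
            }
        ],
    }


# Example with a documentation-range IP instead of a live metadata lookup.
change = build_a_record_change("web1", "example.com", "203.0.113.10")
print(change["Changes"][0]["ResourceRecordSet"]["Name"])
```

Using UPSERT means the same code can run on every setup event: the record is created the first time and simply overwritten if the IP has changed.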

Lazy Processing Images Using S3 and Redirection Rules

In a system dealing with user generated images it’s common to have to resize images before they can be served to the web. Storing and serving large quantities of user generated images can also be a challenge – that is unless you’re using AWS S3. A typical implementation using S3 to store and serve images requires images to be resized into every required size and saved to S3 upon being uploaded. An unfortunate limitation of this technique is that you must know all required sizes at the time the image is uploaded – something that may not be constant, consistent or known in some (particularly legacy) applications.

One solution is to automatically resize images the first time they’re requested, using dimensions provided in the image URL. This way the application requesting the image can choose an appropriate size. While S3 doesn’t provide functionality to transparently proxy image misses to your image processor, it is possible to use S3 routing rules to achieve a similar function.
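A minimal Python sketch of what such a setup could look like follows, under stated assumptions. The rule dictionary uses the field names of S3's website routing rule configuration, which only takes effect on the bucket's static website endpoint. The key layout images/WIDTHxHEIGHT/name, the resize/ prefix, the 302 status and the host resizer.example.com are all hypothetical, not taken from the post.

```python
import re


def build_routing_rule(key_prefix, processor_host):
    """Build one S3 website routing rule that redirects missing keys
    under key_prefix to an image processor (hypothetical host)."""
    return {
        "Condition": {
            "HttpErrorCodeReturnedEquals": "404",
            "KeyPrefixEquals": key_prefix,
        },
        "Redirect": {
            "HostName": processor_host,
            "Protocol": "https",
            # The rest of the key is preserved after the replaced prefix.
            "ReplaceKeyPrefixWith": "resize/" + key_prefix,
            "HttpRedirectCode": "302",
        },
    }


# Assumed URL layout on the processor side: /resize/images/200x150/cat.jpg
KEY_PATTERN = re.compile(r"^resize/images/(\d+)x(\d+)/(.+)$")


def parse_resize_request(path):
    """Return (width, height, original name) or None if the path does not match."""
    match = KEY_PATTERN.match(path.lstrip("/"))
    if match is None:
        return None
    width, height, name = match.groups()
    return int(width), int(height), name


rule = build_routing_rule("images/", "resizer.example.com")
print(rule["Redirect"]["ReplaceKeyPrefixWith"])
print(parse_resize_request("/resize/images/200x150/cat.jpg"))
```

In this sketch the processor would resize the original, save the result to S3 under the originally requested key, and redirect the client back, so later requests for the same size are served by S3 directly.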

Retrieving Files From S3 Using Chef on OpsWorks

You may also be interested in my new post Revisited: Retrieving Files From S3 Using Chef on OpsWorks which includes support for IAM instance roles.

Say you wanted to manage some configuration file in your OpsWorks stack – typically you’d create a custom Chef recipe, make your configuration file a template and store it within your custom cookbook repository. This approach works well in most instances, but what if the file is something not suited to version control such as a large binary file or perhaps a programmatically generated artifact of your system?

In these cases you may prefer to store the file in an S3 bucket and automatically download a copy of the file as part of a custom recipe. In my case I wanted to have a dynamically generated (by a separate system) vhost configuration file which could be deployed to a stack using a simple recipe.