Using regular expressions with your web site and Ruby on Rails application

September 25, 2011

Regular expressions, also known as regex or regexp, are a useful and powerful way to match strings of text. I first ran into regular expressions when reading chapter 6 of Michael Hartl’s Rails Tutorial. After testing some examples and doing some research, I realized that regex is also helpful for filtering Google Analytics reports and improving your organic search results in Google. (I don’t know whether the same applies to Bing, but I would assume so.) So it is definitely worth learning a little regex, especially if your site is similar to Twitter or Groupon, i.e. your web application is tightly tied to your public www site.

A brief overview

Regular expressions are used to match strings of text. This is done by using characters and metacharacters.

  • Characters are literal and case sensitive. The letter A matches the letter A, the letter b matches the letter b, the number 1 matches the number 1, etc.
  • Metacharacters are not treated literally. Here are the most common ones:
    • Dot . is a wildcard for any single character (other than a line break, by default). .ind will match the first four characters of Windows, windows, or lindows.
    • Backslashes \ are used to escape metacharacters so that they are matched literally, as in 192.168.1.1. Since the dot (.) is a metacharacter, matching 192.168.1.1 needs to be done with 192\.168\.1\.1, adding the backslash before each metacharacter.
    • Brackets [], similar to lists in most programming languages, are used to group a set of characters, any one of which may match. If you need to match a character regardless of case, you could use [Dd]og to match both Dog and dog.
    • Repetition ? + * {} lets you match a character (or group) that repeats.
      • The ? makes the preceding character optional: it matches zero or one occurrence. For example, 123? would match 12 and 123, but as a whole it would not match 1233.
      • The + matches one or more occurrences of the preceding character. For example, 123+ would match 123, 1233 or 1233333, but not 12.
      • The * combines ? and +; in other words, it matches zero or more occurrences of the preceding character. For example, 123* matches 12, 123, 123333.
      • The {} matches a specific number of repetitions. [0-9]{2} will match any two digits, [a-z]{6,8} will match between six and eight lower case letters.
    • Grouping () |. Parentheses are used to group characters together. If you wanted to match results with your short name and your full name, for example (in my case it would be Greg or Gregory), you could use Greg(ory)?, where the ? makes the whole group optional. Pipes (|) are used for the OR operator. For example you could use (Greg|greg)ory for Gregory and gregory.
    • Anchors ^ $ tie a match to the beginning or the end of the text, which is handy for URL matching. The caret anchors the beginning: ^services/ would match services/myservices and services/myservices/theseservices. The $ works the other way around, that is, /services$ would match myservices/services but not services/theseservices.
By the way, kudos to http://www.seomoz.org/ for helping clarify some of the above definitions.
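
The definitions above are easy to try out in plain Ruby. What follows is only a sketch: whole_match? is a hypothetical helper written for this post (it is not part of Ruby or Rails) that wraps a pattern in \A and \z so the entire string has to fit, and the sample strings are the ones used in the list.

```ruby
# Hypothetical helper: true only when the whole string fits the pattern,
# so "contains a match somewhere" does not count.
def whole_match?(pattern, text)
  /\A(?:#{pattern.source})\z/.match?(text)
end

# Dot and backslash
puts "Windows".match?(/.ind/)                        # true, via "Wind"
puts whole_match?(/192.168.1.1/, "192x168y1z1")      # true, dots are wildcards
puts whole_match?(/192\.168\.1\.1/, "192x168y1z1")   # false, dots are literal

# Brackets
puts whole_match?(/[Dd]og/, "dog")                   # true
puts whole_match?(/[Dd]og/, "DOG")                   # false

# Repetition
puts whole_match?(/123?/, "12")                      # true, the 3 is optional
puts whole_match?(/123?/, "1233")                    # false, at most one 3
puts whole_match?(/123+/, "12")                      # false, needs at least one 3
puts whole_match?(/123*/, "123333")                  # true
puts whole_match?(/[a-z]{6,8}/, "gregory")           # true, seven letters

# Grouping and alternation
puts whole_match?(/Greg(ory)?/, "Greg")              # true, (ory) is optional
puts whole_match?(/(Greg|greg)ory/, "gregory")       # true

# Anchors (in Ruby, ^ and $ work per line; \A and \z cover the whole string)
puts "services/myservices".match?(%r{^services/})    # true
puts "myservices/services".match?(%r{^services/})    # false
puts "myservices/services".match?(%r{/services$})    # true
```

The helper is there because an unanchored pattern only has to match somewhere inside the string, so 123? would happily find 123 inside 1233.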

Using Regular Expressions to Improve Organic Search Results

Gone are the days of focusing on plastering your logo on a big billboard over your local interstate, particularly if you are a new company with limited marketing resources. And if you sell your products and services online, having your site positioned well in organic search results is extremely important for web site traffic and sales. Regular expressions can help you with this task too.

Filter Internal IP Addresses from Web Site Traffic

If you are serious about SEO you are probably using Google Analytics to monitor the quantity and quality of your web site traffic. Your first step, therefore, is to make sure your data within Google Analytics reflects external traffic only, not traffic generated by your internal network. You can use a regular expression with Google Analytics to filter out your internal network IP addresses. Below is a screenshot showing how to use User Defined filters with regex:

Here the regular expression ^201\.159\.133\.([1-9]|[1-9][0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-4]))$ looks for any IP address from 201.159.133.1 through 201.159.133.254.
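
Before pasting a range pattern like this into a filter, it is worth checking which addresses it really accepts. A minimal sketch in Ruby, assuming nothing beyond the pattern quoted above; the constant name INTERNAL_IP is made up for this example.

```ruby
# The pattern from the filter above, copied unchanged.
INTERNAL_IP = /^201\.159\.133\.([1-9]|[1-9][0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-4]))$/

# Try every possible last octet and keep the ones the pattern accepts.
accepted = (0..255).select { |n| INTERNAL_IP.match?("201.159.133.#{n}") }

puts accepted.first                          # 1
puts accepted.last                           # 254
puts accepted.size                           # 254
puts INTERNAL_IP.match?("201.159.134.10")    # false, different network
```

The four alternatives inside the parentheses cover 1-9, 10-99, 100-199 and 200-254 in turn, which is why .0 and .255 are left out.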

If you don’t want to write the regex from scratch, Google Analytics has a handy feature to create it for you at http://www.google.com/support/analytics/bin/answer.py?answer=55572. Their new custom filter feature allows you to create all kinds of filters without necessarily having to know regular expressions, but understanding them will help you tweak how the filters work.

Rewrite URLs

As you have probably heard, having your keywords within your URL helps your organic placement in search results. Most content managers these days, particularly WordPress, allow you to edit how your URL displays within your visitors’ web browsers without knowing regex. Lots of web sites still use Apache though, so knowing a little regex will help when editing your .htaccess files to rewrite URLs (make sure that RewriteEngine On is set).
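
A rewrite rule boils down to a regular expression plus a replacement, so the pattern can be tried in Ruby before it goes anywhere near .htaccess. This is only a sketch under assumptions: the keyword-rich path services/web-design and the script services.php?name=... are hypothetical examples, not taken from a real site.

```ruby
# Hypothetical mapping: keyword-rich URL -> the script that serves it.
# The parentheses capture the keyword part; the trailing slash is optional.
PRETTY_URL = %r{\Aservices/([a-z0-9-]+)/?\z}

def rewrite(path)
  # \1 inserts whatever the first pair of parentheses captured.
  path.sub(PRETTY_URL, 'services.php?name=\1')
end

puts rewrite("services/web-design")    # services.php?name=web-design
puts rewrite("services/web-design/")   # services.php?name=web-design
puts rewrite("about")                  # about (no match, left unchanged)
```

In an .htaccess file the same kind of pattern would go in a RewriteRule line, where the captured group is written $1 instead of \1.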

Using regular expressions within your RoR application

One of the cool topics in the Rails Tutorial by Michael Hartl is the explanation of how to simplify field validations. The following example shows how the code looks in your *.rb file, for example in <your application>/app/models/user.rb:

class User < ActiveRecord::Base
  attr_accessible :name, :email

  # Something like name@domain.tld, ignoring case
  email_regex = /\A[\w+\-.]+@[a-z\d\-.]+\.[a-z]+\z/i

  validates :name,  :presence => true,
                    :length   => { :maximum => 50 }
  validates :email, :presence => true,
                    :format   => { :with => email_regex }
end

The above example looks for a specific pattern of characters in order to determine whether a value complies with an email format. At first, regular expressions look like gibberish, but if you learn how to use regex it will make your life a lot easier.
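
You can see what the validation accepts without starting Rails at all. A minimal sketch in plain Ruby, using the same pattern as the model above; the sample addresses are made up for illustration.

```ruby
# Same pattern as in the User model above.
email_regex = /\A[\w+\-.]+@[a-z\d\-.]+\.[a-z]+\z/i

# Made-up sample addresses: the first three should pass, the rest should not.
samples = ["user@example.com", "THE_USER@foo.bar.org", "first.last@foo.jp",
           "user@example,com", "user_at_foo.org", "user@example."]

samples.each do |address|
  verdict = email_regex.match?(address) ? "valid" : "invalid"
  puts "#{address}: #{verdict}"
end
```

Reading the pattern left to right: \A marks the start, [\w+\-.]+ is the part before the @, [a-z\d\-.]+ is the domain, \.[a-z]+ is the final dot and letters, \z marks the end, and the trailing i ignores case.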

For Ruby, there is a site named Rubular (www.rubular.com) that you can use as a reference guide for Ruby-based regex. It also allows you to test your regex before you add it to your code.

So next time you are adding validations or other string matching tasks to your Ruby on Rails site, consider using regex to simplify your life. Just practice a little and in no time you’ll be able to use regex within your RoR application for field validations and the like, use it to filter Google Analytics reports, and improve your organic search results by rewriting your web server URLs.

From → Ruby on Rails, SEO
