Thursday, October 01, 2009

Base 36 shortener

How does a typical URL shortening service work? One of the most commonly used approaches relies on Base 36 encoding.

Base 36 is a positional numeral system using 36 as the radix. So, for example, decimal 10 is A in Base 36, decimal 100 is 2S, decimal 1000 is RS, etc.

The choice of 36 is convenient in that the digits can be represented using the Arabic numerals 0-9 and the Latin letters A-Z. Base 36 is therefore the most compact case-insensitive alphanumeric numeral system using ASCII characters.
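The conversions mentioned above can be checked with the JDK's built-in radix support. This is just a minimal sketch: the class name Base36Demo and its encode/decode helpers are made up for illustration, and upper-casing the result is a choice made here (Long.toString produces lowercase letters for radix 36).

```java
class Base36Demo {
    // Convert a decimal number to its Base 36 representation.
    // Long.toString(value, 36) returns lowercase digits, so we upper-case them.
    static String encode(long value) {
        return Long.toString(value, 36).toUpperCase();
    }

    // Convert a Base 36 string back to a decimal number.
    // Long.parseLong accepts both upper- and lowercase letters.
    static long decode(String text) {
        return Long.parseLong(text, 36);
    }

    public static void main(String[] args) {
        System.out.println(encode(10));    // A
        System.out.println(encode(100));   // 2S
        System.out.println(encode(1000));  // RS
        System.out.println(decode("rs"));  // 1000
    }
}
```

Since decoding ignores case, a link typed as rs or RS leads to the same number.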

So, as a first step, a shortening service replaces an original URL:

http://something_long_here

with a new one:

http://my_host/ID

where ID is a unique value from some sequence, e.g. an auto-incremented key from a database. As a second step, we can encode this ID in Base 36 to make it much shorter.
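The two steps could be sketched in Java roughly as follows. Everything here is an assumption made for illustration: the class name ShortenerSketch, the in-memory map standing in for a database table, and the counter standing in for an auto-incremented key. A real service would persist the mapping.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

class ShortenerSketch {
    private static final String HOST = "http://my_host/";

    // Stand-in for an auto-incremented database key (hypothetical).
    private final AtomicLong sequence;
    // Stand-in for the table that maps an ID to the original URL.
    private final Map<Long, String> urlsById = new ConcurrentHashMap<>();

    ShortenerSketch(long lastUsedId) {
        this.sequence = new AtomicLong(lastUsedId);
    }

    // Step 1: assign the next ID to the URL. Step 2: encode the ID in Base 36.
    String shorten(String longUrl) {
        long id = sequence.incrementAndGet();
        urlsById.put(id, longUrl);
        return HOST + Long.toString(id, 36).toUpperCase();
    }

    // Reverse lookup: decode the Base 36 part and fetch the original URL.
    // Returns null if the ID is unknown.
    String resolve(String shortUrl) {
        String code = shortUrl.substring(HOST.length());
        long id = Long.parseLong(code, 36);
        return urlsById.get(id);
    }

    public static void main(String[] args) {
        // Pretend 999 IDs are already taken, so the next one is 1000.
        ShortenerSketch shortener = new ShortenerSketch(999);
        String shortUrl = shortener.shorten("http://something_long_here");
        System.out.println(shortUrl);                    // http://my_host/RS
        System.out.println(shortener.resolve(shortUrl)); // http://something_long_here
    }
}
```

The saving grows with the ID: six Base 36 digits are enough for more than two billion URLs (36^6 = 2,176,782,336), where decimal would need ten digits.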

And here is a custom JSP taglib that lets you perform the shortening: Base 36 taglib.
