Isolating File Extensions Using Regular Expressions In SQL Server 2000
One of our our older applications is still running on SQL Server
2000 -- Yes, I know, it's three revisions and nine years behind but
I am told there is a valid reason why it hasn't been upgraded yet.
And no. I don't know what that reason is -- While the application
has it's own dedicated support team I am sometimes called upon to
lend assistance when an issue requires a more experienced skill
set. Today was one of those days.
One of the database tables is used to store virtual references
to files that end users have uploaded. Unfortunately there is no
validation applied to the filenames and users are free to upload
whatever they want. Although this is fine for files being uploaded
it can present problems for subsequent GETs. If a file has a badly
named extension then no appropriate mimetype can be determined.
With no appropriate mimetype a web browser will simply present a
download dialog box instead of opening the file with a registered
handler or plug-in. It should be noted that this is not the fault
of the end user. To them it makes perfect sense to name a video
file named mymovie.mov as mymovie.mov copy.
My main task was to parse hundreds of thousands of table rows to
get a count of all the unique file extensions in use. With that
information we could determine the scope of the problem and
prioritize the work effort required to fix it. My first thought of
course was to simply use the CLR to create a wrapper around a
regular expression. Facepalm! This application is running on SQL
Server 2000, ergo no .NET, ergo no CLR, ergo "now what?"
I had a faint recollection of doing something almost a decade
ago with SQL Server 2000 and COM Automation. After some digging
through Google as well as my own memory I came across sp_OACreate() and
the companion sp_OASetProperty(),
sp_OAMethod() and
sp_OADestroy().
These stored procedures allow communication with COM and
subsequently the VBScript Regular Expression object. Creating an
instance of RegExp in a stored procedure using dynamic SQL is as
simple as
1:
DECLARE @ObjectId Int
2:
DECLARE @ReturnVal Int
3:
EXEC @ReturnVal = sp_OACreate 'VBScript.RegExp', @ObjectId OUT
There exists many examples on the web for implementing Regular
Expressions this way. I rolled my own proof of concept so that I
made sure I fully understood how the sp_OA* stored procedures work
then resorted to Google to locate a finished solution. See
http://www.simple-talk.com/sql/t-sql-programming/tsql-regular-expression-workbench/
for a great implementation of User Defined Functions (UDF)
supporting matching, finding and replacing.
Once the UDFs were added to the SQL Server 2000 database it was
time to work on the query that would return all the unique
extensions. That was accomplished with a simple cursor that called
RegExFind on each iteration. The RegEx pattern was \.[\w\s\d]*.$
and it was applied to the file path obtained from the table
containing the file references. The query is reproduced below for
anyone interested.
The pattern \.[\w\s\d]*.$ means match all characters between the
last period in the path (assumed to be the extension delimiter) and
the end of the line. Referring to the example earlier this would
match the .mov in mymovie.mov as well as the .mov copy in
mymovie.mov copy.
1:
DROP
TABLE
2: #Extensions
3:
4:
CREATE
TABLE
5: #Extensions (
6: Extension VARCHAR(255)
7: )
8:
9:
DECLARE MY_CURSOR CURSORFOR
10:
SELECT
11: filePath
12:
FROM
13: FilesTable
14:
WHERE
15: FilesTable.[Type] = 10
16:
AND filePath ISNOTNULL
17:
18:
OPEN My_Cursor
19:
DECLARE @VAR1 VARCHAR(255)
20:
DECLARE @Ext VARCHAR(50)
21:
22:
FETCH
NEXT
FROM
23: MY_Cursor
24:
INTO
25: @VAR1
26:
WHILE(@@FETCH_STATUS <> -1)
27:
BEGIN
28:
IF(@@FETCH_STATUS <> -2)
29:
SET @Ext = (
30:
SELECT
31:
TOP 1 value
32:
FROM
33: dbo.regexfind('\.[\w\s\d]*.$', @Var1, 1, 1)
34:
WHERE (
35: LENGTH > 0
36: )
37: )
38:
IF (
39:
SELECT
40: Extension
41:
FROM
42: #Extensions
43:
WHERE (
44: Extension = @Ext
45: )
46: ) ISNULLBEGIN
47: INSERT INTO
48: #Extensions (
49: Extension
50: )
51:
VALUES (
52: @Ext
53: )
54:
END
55:
FETCH
NEXT
FROM
56: MY_CURSOR
57:
INTO
58: @VAR1
59:
END
60:
CLOSE MY_CURSOR
61:
DEALLOCATE MY_CURSOR
62:
63:
SELECT
64:
DISTINCT Extension
65:
FROM
66: #Extensions
67:
Now that the analysis query has been run we are evaluating the
results to determine the best remediation solution. It will
probably come down to a replacement of obvious malformed extensions
with the correct extension. I will use a variation of the query
above substituting RegExFind with RegExReplace.
One important note when using this method is that is quite
costly in terms of performance. In our case it is being applied as
part of a one time data maintenance and cleanup operation. I would
not recommend using COM Automation like this in a high throughput
stored procedure without careful thought first. At this stage it
is probably well worth upgrading to SQL Server 2008 and using the
CLR.
I'm In Love With jQuery
Me: I'm in love with jQuery!
Ali: What will Prototype say?
Me: We won't tell Prototype.
Ali: She'll get pretty upset after what you did to Dojo!
The first javascript library I used was Dojo 0.4.1 which
I thought was awesome at the time. Then I discovered the more pure
syntax of Prototype and script.aculo.us and dumped Dojo. I was really
enjoying Proto-taculous when I decided to check-up on jQuery and was blown
away by it's simplicity and power.
At the moment jQuery seems to be the leading javascript library
but it's not without competition. I have to remember to keep an
eye on the other leading libraries because they're all innovating
at a fast rate.
Idea A Day #5 -- The Ultimate Travel Mug And Friend Of The Environment
Fast food coffee shop cups such as Tim Hortens account for a
significant amount of annual landfill content. Even if they were
recyclable the cradle to grave energy cost to manufacture,
transport, collect, transport and recycle is still high. We could
reduce that cost dramatically if everyone used their own cup as
much as possible. While you do see some people doing this they're
often inappropriately associated with a sub-culture such as
starving students or weird hippies. Carrying a personal cup should
be socially acceptable and encouraged. To achieve this social
acceptance we need to make using a personal cup easier and provide
incentives.
A prime incentive, that of money, already exists but it's not
openly promoted. I know that Second Cup, Tim Hortens and Starbucks
all provide discounts to customers who bring their own cup. The
discount is small, about the equivalent of sales tax. It may not
seem much on a single purchase but it could save $25 a year -- 10c
per day * 5 days per week * 50 weeks (coffee
drinker was on holiday for 2 weeks). The discount is probably not
promoted because it means a customer is walking around town without
displaying the shops logo.
Enter the Ultimate Travel Mug!
- It's double-walled stainless steel for durability and heat
retention.
- It has an industrial strength stay closed lid with a short
plastic strap that can never be separated from the cup and
lost.
- A small groove between the handle and the body conceals a
reusable stir stick because millions of disposable stir sticks also
have an environmental impact!
- The handle incorporates a carabineer style snap closure so the
mug can be attached to a backpack, briefcase, etc.
- An inset is provided so that coffee shops can easily attach
their own branding or logo. Buy the cup from Starbucks and it says
Starbucks to everyone you pass in the street.
- A second smaller inset has an indelible writing surface for
personalization thus preventing "Hey, get yer hands off my cup!"
syndrome .
- An integrated RFID allows the user
to "charge" their cup with pre-paid coffee credits (similar to
Second Cup's Coffee Card) so that subsequent visits don't require
digging for change.
- It has no restrictive patent or license meaning that anyone can
manufacture and sell it.
- It is unveiled to the world as part of a cooperative campaign
shared by all the major fast food coffee shops who pledge to
eliminate 99% of their disposable cups within one year.
- The campaign also pledges to use 100% fair trade, shade grown
beans using sustainable practices.
- It's dishwasher safe.
Idea A Day #4 – WKRP In Twitterville - All Request Streaming Radio
I had this idea a few weeks ago and it prompted me to start
exploring Google's App Engine as a result (which I like a lot, but
that's another post).
I've been using Last.FM for quite some time now and really like
it as a radio station for discovering new music. I start by adding
an artist or tag that I already like then just sit back and
listen. The problem with using the "like an artist" and "like a
tag" method is that the radio stream is still going to be somewhat
constrained to a narrow and vertical sound. There could be all
sorts of related music that is just beyond the edge described by
the meta-data of the current station. This is were people come
in. People can offer suggestions that computers can't (ooh, I'll
regret that statement).
Now I'm thinking of a small app probably with an invisible
embedded Flash player that consumes a playlist from Last.FM. The
app has a number of genres, or stations, that the listener can tune
into. Without outside influence the app will simply play the
Last.FM stream as the API defines it. However, the app server is
also wired to the Twitter API and listening for tweets sent to a
specific user created for this purpose, maybe @DJ although I
suspect that's been long taken. @DJ starts receiving requests for
different artists and singles which it looks up in the Last.FM API
to ensure broadly matches the genre. After all, we don't want a
Rhianna song following ACDC and Kiss. The app simply keeps a count
of the requests favouring the most requested over a given period of
time injecting the most requested into the playlist.
As an exercise in API's, Google App Engine, etc this is a fun
project (yes, it's gone a bit beyond the idea stage already).
However, it does have one possible flaw, namely that of Twitter
reliance. After all, if the listener has the app then they also
have an interface capable of logging music requests to be sent back
to the server. So why use Twitter? Well, I'm working on that
one. Perhaps this can be extended into a service that can be
resold to terrestrial radio stations? But that's another idea
Idea A Day #3 -- The Reflecting Clock
Simply put this is a clock whose hands travel anti-clockwise
past reversed numbers.
Have you ever sat in front of a mirror while getting your hair
cut and saw a clock on the opposite wall? It's backwards because
you're looking into the mirror.
A clock like this would make a great conversation starter. It
would also have great practical use in live theatre and show
business in general. Actors sitting in front of their dressing
room mirror can simply look at the Reflecting Clock to see how long
till curtain call.
Idea A Day #2 -- Use A Juicer To Make Vegetable Stock
It all started with a craving for French Onion Soup. Most
recipes call for beef stock and as a vegetarian for the past 20
years this isn't an option. I've made French Onion Soup with store
bought vegetable stock and it works quite well apart from the
inhumanely high sodium levels. I have recently bought some
cheesecloth and I plan on simmering all my vegetable cutoffs to
make and freeze homemade stock. Unfortunately I haven't got around
to that yet and I wanted the onion soup tonight. So how could I
make a low(er) sodium vegetable stock in a hurry?
Use my juicer! Anyone that has had fresh vegetable juice knows
that it is packed with flavour. The key, I thought, would be in
balancing the flavours to get a rich stock-like essence without
tasting like something from a juice bar. Carrot juice is very
sweet so I used one small one. Celery has a milder flavour so 4
stalks went in. It was going to be an onion soup with caramelized
onions but throwing another in the juicer couldn't hurt. And for
good measure I also juiced the bottom halves of 3 broccoli stems
(heads to be steamed for dinner tomorrow).
The resulting juice was flavourful and not too sweet. I started
simmering it with a pinch of salt, pepper and a couple of bay
leaves. Meanwhile I caramelized 4 medium thinly sliced onions with
some olive oil, eventually deglazing with a cup of left-over red
wine. Some Soy sauce added a little more salinity and helped
darken the hue to French Onion-ish brown. I transferred the
caramelized onions to the simmering stock and continued to cook for
about another 20 minutes seasoning a bit more along the way.
The final result was not French Onion soup. But it wasn't
unpleasant either. In fact it was quite good. Some of the
flavours were definitely interesting but that can corrected by
playing with the types and quantities of vegetables. I'm going to
keep working on this until I can nail the right recipe.
Now I just have to think of what to do with the pulp that is
left over in the juicer. Perhaps mixing with tofu and seasonings
can be the basis for a homemade veggie burger? Or is this is
another idea to be saved for another day?
Idea A Day #1 -- iPhone Web Stack Scratchpad
Actually, idea
#1 should be this silly idea
of posting an idea
every day
:) It seems that every waking hour (and some not so waking hours)
has my brain in overdrive coming up with all sorts of ideas. Ideas
for a new recipe to make for dinner, a novel piece of software, a
fiction novel, a nifty bicycle drive train, inaugurating a hiking
trail in downtown Toronto and so, so much more. In the spirit of
mathematics and programming I declare my silly idea
to be Idea
A Day
#0. So, without further ado, my second published idea,
Idea
A Day
#1.
I love my iPhone. I use it all the time although surprisingly
enough I make very few actual telephone calls on it. It works
exceptionally well as a tiny internet tablet device and the number
of decent applications is impressive. I find that I'm turning to
it more and more frequently for utilitarian task. So much so that
I have been wondering if I can use it for prototyping development
ideas targeting the standard web stack of HTML, javascript and
CSS. So far I haven't come across any applications that do this
but I don't think that means it can't be done.
There are some obvious hurdles that would have to be overcome
right away. The primary one being that of the UI. The iPhone, for
all it's glory, has pretty poor text editing. You can move the
cursor around with difficulty, there is no select, cut, copy and
paste. As well, punctuation characters always require a switch to
a different keyboard layout. Typing all those angle brackets,
braces and semi-colons will get tedious very quickly.
My solution for the text entry woes would be to place a sliding
panel that is normally hidden, save for a subtle grabber. A touch
of the grabber slides the panel out exposing an accordion menu of
individual characters as well as code snippets. The menu would
take advantage of context and predictive patterns when presenting a
dynamic list of suggested options. Another hidden but readily
available panel would display 4-way arrows and a couple of modifier
button to make it easier to cursor up, down, left, right, start of
line, end of line, page up, page down, cut, copy and paste.
Double-tapping a word would select it. Double-tap and swipe would
select adjacent words. Triple-tap would select a code block. Tap
and hold would popup a menu of characters after a short delay. A
menu that would contain characters such as <>(){}#;'".
Rolling left to right on the menu chooses the appropriate
character. A double-space would place a smart period but unlike
the usual iPhone behaviour it would not leave a trailing space.
Instead the cursor would be flush against the period anticipating a
dot-notation requirement.
I've also thought about how the actual edit/run would be
implemented. The entire development environment would live inside
an html file that already includes the latest jQuery and Firebug
Lite javascript libraries as well as a code syntax highlighter
supporting, of course, javascript. The majority of the html markup
would implement an editor plugin and the sliding panels at each
window edge. When the run/debug button is tapped (hmm, I think we
need another sliding panel, maybe on the top now) the current code
in the editor is saved to a cookie. This limits the editor to a 4K
file. Now I don't want to be accused of 640K
ought to be enough for anybody but I know for me right now 4000
characters is probably more than I want to type into the iPhone in
its current incarnation. For reference, my word counter for this
article says I'm at 593. I'd have to type a little less than 8x
what is on this page. Yeah, 4K may not be enough for anybody but
it's enough for me. Sorry, going off on a tangent there -- The
code being edited is saved to a cookie and then an iFrame is opened
full screen over the current viewport. The iFrame uses a small
boot loader to read the code from the cookie and inject it into the
DOM. The iFrame would already contain references to jQuery and
Firebug Lite. It would also have sliding panel accessible via a
tap on a grabber that allows the user to return to the editor.
In principal I don't see anything wrong with this idea
(yet). Practice, unfortunately, sometimes proves principal to be a
waste of time. I doubt that I will take this idea
from the drawing board and into Visual Studio but it has been a
great distraction for my brain on the walk home from the office
tonight. I'm sure I'll refine the idea
within my own grey-matter based VM a bit more. In fact, typing
this out has already got me thinking that the iPhone Web Stack
Scratchpad could even be socialized with cloud based storage for
the code block, snippet sharing and programmer collaboration. Oh,
and built-in Twitter SDK support that can be programmed against as
well as used to tweet out snippets to fellow developers
Safari On The iPhone Suppots Bookmarklets
I didn't even think about it when I first started exploring my
new iPhone but in hindsight it makes perfect sense. The browser is
Safari after all.
Unfortunately the only way I have found so far to get
bookmarklets on my iPhone is to install Safari on my desktop,
delete all pre-installed bookmarks, add my favourite bookmarklets
and synchronize. It's pretty straightforward but it would be much
nicer if I could synch directly with FireFox, my default
browser.
Using jQuery to GET an ASP.NET webservice
After an hour or two of sleuthing I've discovered that jQuery
1.2.6 does not set the Content-Type header for HTTP GETs even if
you explicitly use the contentType parameter. My understanding of
the jQuery rationale is that GETs don't contain data and therefore
a Content-Type header is not required. Unfortunately, Microsoft
reckons that GETs returning JSON provide a security risk and that a
Content-Type header must be specified. Scott Guthrie explains it
here http://weblogs.asp.net/scottgu/archive/2007/04/04/json-hijacking-and-how-asp-net-ajax-1-0-mitigates-these-attacks.aspx.
After reading documentation from both camps I can fully
understand their opposing views. Ultimately it is yet another
indicator that the responsibility of user protection lies in the
browser is blatantly wrong! If the browser raised a message asking
the user if they are ok with a cross domain request and acted
according to the response it would render both jQuery's and
Microsoft's approaches obsolete.
So now I am caught between a rock and a hard place. I love
jQuery and I love Microsoft ASP.NET webservices and I am going to
use both in my current projects. But, if I can not explicitly set
the Content-Type header for an .ajax() GET then I have just one
choice and that is to use POSTs instead. Unfortunately this
contradicts the notion of using the correct HTTP verbs and removes
the possibility of a RESTful API.
It's always about compromise.
A Bicycle Lock That Tags The Thief
This is awesome. I have to get one of these locks although it
doesn't appear that the lock is in manufacture yet which is a
shame.
The lock contains a pressurized dye that is both visible and
detectable with forensic equipment. The designer's home page can
be found at
http://www.mikelambourn.com/smartlock.html
and I've also linked a YouTube video of the lock in action.