Nervous!

October 23rd, 2009

OK, we’re almost there. After 5 months of effort, we are about to go live. The first 3 months were layered on top of a VERY demanding 60-hour-a-week consultancy, and the last 2 months were a flat-out, fully focused, I-am-going-to-burn-out-or-get-this-done-even-if-it-kills-me rollercoaster ride.

Almost there. Almost.

The code works in testing. Everything is perfect so far.

The crawlers perform as per spec, the custom heuristics we’ve created to analyse blogs test out fine (they give VERY sane results), and the machine learning components give us over 96% accuracy. We can tell a ton about bloggers just by what (and how) they write, and by the structure of their blogs (which are automagically reverse-engineered.. eat your heart out, Harry Potter).

The entire flow is rock-solid, and I’m grateful that I chose the more robust option of Java EE to express the logic in, rather than a ‘quicker’ language like Perl, PHP or Ruby.

There are some ‘routine’ elements (e.g. subscription, registration etc.) to take care of. Nothing associated with any risk.. stuff that we’ve done dozens of times before. It’s all about polish right now.

There may be some important pieces missing or, more likely, still in their infancy (there are some amazing heuristics we can put in, but that’s for Q3 now), but then, this is a prototype. For those who would appreciate an analogy, it’s like building the airplane while riding in it. You have a high probability of experiencing crashes (which thankfully are not fatal in this scenario! .. yet).

I am feeling really giddy as I take the system through the last steps. Got to make sure that everything works.. have to keep an eye on security, batten down the hatches, set up the corporate presence (yes, emails, accounting systems etc), set the analytics to capture every nuance of user-interaction data on the site.

Perhaps once I get the site up, open it up to the world and let slip the dogs (sorry, Shakespeare’s Julius Caesar always gets me), I’ll feel better. Meanwhile, things to do… things to do!!!

I’m nervous! Have to look at patents and funding grants now.

We’re almost past the prototype stage (stage 2)! Now to delight the customers and scale up (got my eye on you, Amazon EC2… spare a smile for me, and a server or a few!).

Wish me luck guys!

Seminar on Open Source Search Tools

February 25th, 2009

I was the speaker at the Ottawa LAMP meetup for February. I must confess that I really enjoyed giving this presentation, as it touched on three of my passions: (1) open source tools, (2) search theory and technology, and (3) using technology to help people and organizations work more effectively and efficiently.

The video of my presentation, titled ‘Scalable techniques for large-text repositories: Google in a box with LAMP tools’, is available at http://vimeo.com/3228028.

Special thanks to Andrew Ross for recording such a high quality video.