Decomposing Relevance

Recently my research has taken a bit of a deviation from the norm. I'm quite a practical programmer, not one for getting down into mathematical theory or philosophical questions. That is taking a back seat at the moment, however, as my research is really trying to answer the question "given an object (anything!), what is it implicitly related to?". My approach to answering this, given the way my application works, is to use the very core of web 2.0: tags! 🙂

Now, my program explicitly knows a lot of relationships, but it doesn't always know the best relationships. After all, it only has data from databases – we humans have the smarts to know that things are related 'just because' – try explaining that to a machine. Of course, just because I claim a relationship exists doesn't mean it does: I can insist all I want that pigs should be tagged with the terms 'fluorescent green', 'fly', 'canFly', 'oink', 'yum' and so on, but hopefully you don't all agree with me all of the time.

So, given the assumption that any user can tag any object with any word (creating what we geeks call a folksonomy), how do we trust that user x isn't a jerk (as my research supervisor so nicely puts it)? Is it possible to trust other people whom you have never met? Well, now we're getting philosophical… perhaps we can trust them just enough to let their thoughts on a matter influence what information we get provided?

So, given my tags about my friend the pig, you could agree with me, disagree with me, or be indifferent. We could then form a trust network based on who agrees and disagrees with whom on particular tags. I won't go into the details, but basically everybody ends up with a trust rating between 0 and 1.
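To give a rough flavour of what such a rating could look like (and I stress this is a simplification with invented names, not the actual model from my write-up), the agree/disagree history between two users might be collapsed into a single score like this:

```java
// A deliberately simplified trust calculation, for illustration only.
public final class TrustScore {

    /**
     * Returns a trust rating in [0, 1] for another user, based on how often
     * they have agreed with our tags. The +1/+2 terms (Laplace smoothing)
     * stop a single vote from producing an extreme rating of exactly 0 or 1.
     */
    public static double rate(int agreements, int disagreements) {
        return (agreements + 1.0) / (agreements + disagreements + 2.0);
    }

    public static void main(String[] args) {
        System.out.println(rate(0, 0));  // 0.5   -> unknown users start out neutral
        System.out.println(rate(9, 1));  // ~0.83 -> mostly agrees, fairly trusted
        System.out.println(rate(1, 9));  // ~0.17 -> mostly disagrees (a jerk?)
    }
}
```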

Righty, that's cool – my program knows who to trust and who not to trust (not that the user should ever see this). Now, skipping some important details again, we can calculate which other objects are relevant to a particular object, based on the tags applied throughout the application.
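Again skipping the real details, a toy version of that calculation might represent each object by its tags, weight each tag by the trust rating of the user who applied it, and compare the two weighted tag sets. The weighted Jaccard measure below is my choice for illustration, not necessarily the formula in my write-up:

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// A toy trust-weighted relevance measure between two objects. Each object is
// a map from tag to the trust rating of the (most trusted) user who applied it.
public final class Relevance {

    /** Returns a relevance score in [0, 1] between two tagged objects. */
    public static double score(Map<String, Double> a, Map<String, Double> b) {
        Set<String> allTags = new HashSet<>(a.keySet());
        allTags.addAll(b.keySet());

        double shared = 0.0;
        double total = 0.0;
        for (String tag : allTags) {
            double wa = a.getOrDefault(tag, 0.0);
            double wb = b.getOrDefault(tag, 0.0);
            shared += Math.min(wa, wb);  // credit only the trust both objects share
            total  += Math.max(wa, wb);
        }
        return total == 0.0 ? 0.0 : shared / total;  // weighted Jaccard
    }
}
```

Two objects sharing tags applied by well-trusted users score close to 1; objects sharing only tags from untrusted users score near 0.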

Of course there are a lot of details missing from this post, but believe me, they have been considered. I have the math to prove it :-).

So, for now, I'm writing up all this theory. Eventually it should all be published in a paper later this year (but this takes a long time – a paper I was involved with for last year's research into software plugin contracts is only about to be published in July, at the Twelfth IEEE International Conference on Engineering of Complex Computer Systems… *whew*).

Interesting times… I wonder where next week will take me.

Making the web work for you (or, How to be a lazy-ass)

The semantic web is all about making it possible for people to get lazier, while software agents take over our tedious tasks. Things like taking that email about the conference you're attending and putting it into your diary, and temporarily putting the contacts into your address book (presuming they aren't already in there). It's about tentatively booking your flights so that you can get to the hotel nearest the conference (which the agent, once again, tentatively booked for you). Coincidentally, that hotel is where all the other conference guests are being directed as well (imagine the fun!).

Sounds like a pipe dream, and right now it is. To be honest, it's not my dream at all – it's Tim Berners-Lee's, that guy behind the WWW, among other irrelevant inventions. Sooner or later this will become a reality ("not an if but a when", yada yada).

How far along this pipe dream are we? Surprisingly far, actually – not that we are all going to have semi-autonomous agents doing our virtual bidding anytime soon, however. That's still a while off (alas). What we do have is a lot of the plumbing coming into place. RDF and OWL are the HTML of the semantic web, and both are W3C standards. They already exist and are slowly getting embedded all over the place (just waiting for these smart agents to pop up). Inferencing engines are getting increasingly smart – an example being Pellet. These engines can infer new knowledge from whatever knowledge they are given – just give them the rules (using another W3C standard called SWRL). And you can query RDF data stores using the SPARQL query language (which is about as analogous to SQL as you can get, and once again is nearly a W3C standard).
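To make that a little more concrete, here is a minimal sketch using the Jena toolkit (a common Java RDF library – Pellet can be plugged into it as the reasoner, though this sketch settles for Jena's simple built-in rule reasoner). The farm vocabulary is made up for illustration:

```java
import org.apache.jena.ontology.OntModel;
import org.apache.jena.ontology.OntModelSpec;
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.QueryFactory;
import org.apache.jena.query.ResultSet;
import org.apache.jena.rdf.model.ModelFactory;

public class SemanticWebSketch {
    public static void main(String[] args) {
        String ns = "http://example.org/farm#";  // made-up vocabulary

        // An ontology model with a simple rule reasoner attached; a serious
        // setup would plug a full OWL reasoner such as Pellet in here instead.
        OntModel model = ModelFactory.createOntologyModel(OntModelSpec.OWL_MEM_MICRO_RULE_INF);
        model.createClass(ns + "Animal");
        model.createClass(ns + "Pig").addSuperClass(model.getOntClass(ns + "Animal"));
        model.createIndividual(ns + "wilbur", model.getOntClass(ns + "Pig"));

        // The reasoner infers that wilbur is an Animal, even though we only
        // ever said he was a Pig; SPARQL then queries the inferred model.
        String sparql = "SELECT ?thing WHERE { ?thing a <" + ns + "Animal> }";
        try (QueryExecution qe = QueryExecutionFactory.create(QueryFactory.create(sparql), model)) {
            ResultSet results = qe.execSelect();
            while (results.hasNext()) {
                System.out.println(results.next().get("thing"));  // .../farm#wilbur
            }
        }
    }
}
```

The nice part is that the query never mentions pigs at all – the reasoner bridges that gap.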

Don't be too surprised that these are all standards – dirty old Tim Berners-Lee is at it again: he founded the W3C, and is in charge of it. The web doesn't stand a chance against this kind of bias!

Surprisingly, all the data currently sitting inside relational databases (RDBMSs) can already be exposed on the semantic web – so we don't have to start again in any regard.
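Tools such as D2RQ handle this mapping properly; boiled down, though, the trick is just that every row can become a set of triples. A hand-rolled sketch (the table, columns and JDBC URL are invented for illustration):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Resource;

// The idea behind exposing RDBMS data as RDF: the primary key gives each row
// a URI, and each column becomes a property of that resource.
public class RowsToTriples {
    public static void main(String[] args) throws SQLException {
        String ns = "http://example.org/staff#";  // made-up namespace
        Model model = ModelFactory.createDefaultModel();

        try (Connection con = DriverManager.getConnection("jdbc:hsqldb:mem:demo");
             Statement st = con.createStatement();
             java.sql.ResultSet rows = st.executeQuery("SELECT id, name, email FROM employees")) {
            while (rows.next()) {
                Resource person = model.createResource(ns + "employee/" + rows.getInt("id"));
                person.addProperty(model.createProperty(ns, "name"), rows.getString("name"));
                person.addProperty(model.createProperty(ns, "email"), rows.getString("email"));
            }
        }
        model.write(System.out, "TURTLE");  // the same data, now ready for the semantic web
    }
}
```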

What we still need to work on is mapping databases together easily (i.e. 'smushing' databases), and then walking up the semantic web layer cake to sort out how we trust the information we have…

What am I doing to help? I'm working with these technologies – in particular, on mapping database tables together across the net, and on methods to trust information (based on user trust). An interesting side-effect of my research, and really one of its core goals, is that by joining data together we get explicit links forming a graph – but users hold many implicit links in their heads as well. My goal is to unlock those links by letting users tag and rate the information they come across (once again tying into the whole trust issue). This gives us relevance and similarity suggestions that our agents can calculate.

Then….world domination….(but I’m not sure if it’s for ourselves or our smart agents)…..

By the way, is it just me, or does everyone picture a smart agent as a 10-pixel-high game character with dark glasses? In particular, I am reminded of an old Apogee game I used to play…

Introduction

So, to introduce myself… My name is Jonathan Giles, and I'm a resident of the mega-happening Palmerston North (where I have lived for pretty much all of my 22 years). I have been married since January 2007 (yes, my wife and I have been friends since we were young – we had known each other for a number of years before getting hitched). I figured, why wait to get married if it was inevitable anyway…

I've just completed a Bachelor of Engineering with Honours degree at Massey, where I studied software engineering. I managed to get first class honours, be placed on the Massey University Merit List, and be given the title of Massey Scholar. I tossed up between doing a PhD and a master's, but finally settled on the master's: given the choice between being theoretical or practical, I definitely fall on the side of practical.

My research last year was in the area of plugin-based development. What this means is writing a program composed entirely of plugins – think Eclipse if you're a Java person. Do not think Winamp or Firefox – they of course offer plugins, but they are functionally complete without them; their plugins are really there to 'smooth the edges'. Programs I have written, and applications like Eclipse, on the other hand, are entirely based on plugins – even the core functionality. The benefit of such technology is that development can proceed far more agilely. Customer pressuring you for a piece of functionality? Stop work on your current plugin and begin a new one. Drop it into their 'plugins' folder, and it all works nicely (see the sketch below) – and this is from my own practice, not just academic theory.
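To show roughly what I mean, below is a stripped-down version of that 'plugins folder' mechanic using plain Java's ServiceLoader. The Plugin interface is invented for illustration – real frameworks like Eclipse/OSGi layer lifecycles, contracts and isolation on top of this:

```java
import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ServiceLoader;

// A stripped-down version of the "drop a jar in the plugins folder" mechanic.
public class PluginHost {

    /** The contract every plugin jar must provide an implementation of. */
    public interface Plugin {
        String name();
        void start();
    }

    public static void main(String[] args) throws Exception {
        // Collect every jar sitting in the plugins directory.
        File[] jars = new File("plugins").listFiles((dir, n) -> n.endsWith(".jar"));
        if (jars == null) return;

        URL[] urls = new URL[jars.length];
        for (int i = 0; i < jars.length; i++) {
            urls[i] = jars[i].toURI().toURL();
        }

        // ServiceLoader finds implementations declared in each jar's
        // META-INF/services/PluginHost$Plugin file and instantiates them.
        ClassLoader loader = new URLClassLoader(urls, PluginHost.class.getClassLoader());
        for (Plugin plugin : ServiceLoader.load(Plugin.class, loader)) {
            System.out.println("Starting plugin: " + plugin.name());
            plugin.start();
        }
    }
}
```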

Of course, this doesn't sound very web 2.0, and it isn't. I haven't jumped on that bandwagon yet. I still think that in some circumstances you just can't beat a desktop application, so none of my research is in the web 2.0 area.

My new area of research (which, coincidentally, still builds atop my plugin research) is in the area of semantic web technologies. This is all about getting data off intranets and the internet and making it far more comprehensible for people. The main push for the semantic web here is standards compliance, so that in the future, as more systems become able to interact with this kind of data, my research becomes more valuable to the end user. At the same time, there is a lot of data inside businesses that can be accessed using the results of my research, and this data can once again be used to help people comprehend their environment.

The point of my semantic web research may sound rather odd – I will make sure to clarify it more in the future – but there is an actual 'real-life' product attached to this work; it isn't just vapourware. I'll post about that as soon as I am able to!

Righty, I’ll leave it there. I’ll make sure I keep you all updated about my work as it progresses over the next year.

Hello Planet NZTech

I've just been informed that I have been added to Planet NZTech, which should hopefully result in the ratio of European visits to NZ visits correcting itself. The reason for this imbalance is that I'm a postgrad student at Massey University in Palmerston North researching semantic web technologies, in particular the linking of such technologies into business applications, and semantic web research is something the Europeans seem to be heavily pursuing these days. I'm really excited about the possibilities it opens up, and I'll try my best to outline them in the future (my research is only just beginning).

This post is really to say hello to anyone who reads Planet NZTech, so…..hello!