Decomposing Relevance

Recently my research has taken a bit of a deviation from the norm. I’m quite a practical programmer, and I don’t much like getting down into mathematical theory or philosophical questions. That preference is taking a back seat at the moment, however, as my research is really trying to answer the question “given an object (anything!), what is it implicitly related to?”. My approach to answering this, given the way my application works, is to use the very core of web 2.0: tags! 🙂

Now, my program explicitly knows a lot of relationships, but it doesn’t always know the best relationships. After all, it only has data from databases – we humans have the smarts to know that things are related just because – try explaining that to a machine. Of course, just because I claim a relationship exists doesn’t mean it does: I can claim all I want that pigs should be tagged with the terms ‘fluorescent green’, ‘fly’, ‘canFly’, ‘oink’, ‘yum’ and so on, but hopefully you don’t all agree with me all of the time.

So, given the assumption that any user can tag any object with any word (creating what we geeks call a folksonomy), how do we trust that user x isn’t a jerk (as my research supervisor so nicely puts it)? Is it possible to trust other people whom you have never met? Well, now we’re getting philosophical… perhaps we can trust them just enough to let their thoughts on a matter influence what information we are shown?

So, given my tags about my friend the pig – you could agree with me, disagree with me, or really be indifferent. We could then form a trust network based on who agrees and disagrees with whom on particular tags. I won’t go into the details, but basically everybody ends up with a trust rating between 0 and 1.
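To give a flavour of what I mean (the actual math is one of the details I’m leaving out, and it differs from this), here’s a toy sketch in Java. Everything here – the counts, the smoothing – is invented purely for illustration:

```java
// Toy sketch only: a naive agreement-ratio trust score in [0, 1].
// The +1/+2 (Laplace) smoothing stops a single vote from pinning a
// user to exactly 0.0 or 1.0. My real model is different.
public final class ToyTrust {

    public static double trustScore(int agreements, int disagreements) {
        return (agreements + 1.0) / (agreements + disagreements + 2.0);
    }

    public static void main(String[] args) {
        System.out.println(trustScore(0, 0)); // 0.5   - no evidence yet
        System.out.println(trustScore(9, 1)); // ~0.83 - mostly agrees with you
        System.out.println(trustScore(1, 9)); // ~0.17 - the 'jerk' case
    }
}
```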

Righty, that’s cool – my program now knows who to trust and who not to trust (not that the user should ever see this). Skipping some important details again, we can then calculate which other objects are relevant to a particular object, based on the tags applied throughout the application.
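Again, purely to give the shape of the thing (and not my actual calculation), you can picture relevance as trust-weighted tag overlap between two objects – a weighted Jaccard measure over their tags:

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy sketch: relevance between two objects as trust-weighted tag
// overlap. Each map goes tag -> summed trust of the users who applied
// that tag to the object. Invented for illustration only.
public final class ToyRelevance {

    public static double relevance(Map<String, Double> a, Map<String, Double> b) {
        Set<String> tags = new HashSet<String>(a.keySet());
        tags.addAll(b.keySet());

        double intersection = 0.0;
        double union = 0.0;
        for (String tag : tags) {
            double wa = a.containsKey(tag) ? a.get(tag) : 0.0;
            double wb = b.containsKey(tag) ? b.get(tag) : 0.0;
            intersection += Math.min(wa, wb);
            union += Math.max(wa, wb);
        }
        return union == 0.0 ? 0.0 : intersection / union;
    }
}
```

Two objects sharing many highly-trusted tags score near 1; objects with no tags in common score 0.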

Of course there are a lot of details missing from this post, but believe me, they have been considered. I have the math to prove it :-).

So, for now, I’m writing up all this theory. Eventually this should all be published in a paper later this year (but that takes a long time – a paper I was involved with for last year’s research, on software plugin contracts, is only about to be published in July at the Twelfth IEEE International Conference on Engineering of Complex Computer Systems… *whew*).

Interesting times… I wonder where next week will take me.

Making the web work for you (or, How to be a lazy-ass)

The semantic web is all about making it possible for people to get lazier, while software agents take over our tedious tasks. Things like taking that email about the conference you’re attending and putting it into your diary, and temporarily putting the contacts into your address book (presuming they aren’t already in there). It’s about tentatively booking your flights so that you can get to the hotel nearest the conference (which the agent, once again, tentatively booked for you). Coincidentally, the hotel is where all the other conference guests are being told to go as well (imagine the fun!).

Sounds like a pipe dream, and right now it is. To be honest, it’s not my dream at all – it’s Tim Berners-Lee’s – that guy behind the WWW, among other irrelevant inventions. Sooner or later this will become a reality (“not an if but a when, yada yada”).

How far along this pipe dream are we? Surprisingly far, actually – not that we are all going to have semi-autonomous agents doing our virtual bidding anytime soon, however. That’s still a while off (alas). What we do have is a lot of the plumbing coming into place. RDF and OWL are the HTML of the semantic web, and both are W3C standards. They already exist and are slowly getting embedded all over the place (just waiting for these smart agents to pop up). Inferencing engines are getting increasingly smart – Pellet being one example. These engines can infer new knowledge from whatever knowledge they are given – just give them the rules (using another W3C standard called SWRL). RDF data stores can be queried using the SPARQL query language (which is about as analogous to SQL as you can get, and once again is nearly a W3C standard).
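To make that less abstract, here’s roughly what the plumbing looks like from Java – a sketch using Jena’s ARQ API (the data URL is made up, and the package names are per the Jena 2.x releases I’ve been using, so your mileage may vary):

```java
import com.hp.hpl.jena.query.*;
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.ModelFactory;

// Sketch: load some RDF and run a SPARQL SELECT over it with Jena/ARQ.
public class SparqlTaste {
    public static void main(String[] args) {
        Model model = ModelFactory.createDefaultModel();
        model.read("http://example.org/people.rdf"); // hypothetical data

        String queryString =
            "PREFIX foaf: <http://xmlns.com/foaf/0.1/> " +
            "SELECT ?name WHERE { ?person foaf:name ?name }";

        QueryExecution qe = QueryExecutionFactory.create(
                QueryFactory.create(queryString), model);
        try {
            ResultSet results = qe.execSelect();
            while (results.hasNext()) {
                QuerySolution row = results.nextSolution();
                System.out.println(row.getLiteral("name").getString());
            }
        } finally {
            qe.close(); // always release the execution
        }
    }
}
```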

Don’t be too surprised that these are all standards; dirty old Tim Berners-Lee is at it again – he founded the W3C, and is in charge of it. The web doesn’t stand a chance against this kind of bias!

Surprisingly, all the data currently sitting inside relational databases can already be exposed on the semantic web – so we don’t have to start again in any regard.
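For instance, with a D2R Server sitting in front of a legacy database, its rows simply become another SPARQL endpoint. A hedged sketch – the endpoint URL and the vocabulary below are invented:

```java
import com.hp.hpl.jena.query.*;

// Sketch: query a (hypothetical) D2R Server endpoint that exposes a
// relational database as RDF. ARQ's sparqlService makes the remote call.
public class QueryLegacyDb {
    public static void main(String[] args) {
        String query =
            "PREFIX vocab: <http://localhost:2020/vocab/resource/> " +
            "SELECT ?name WHERE { ?employee vocab:employees_name ?name }";

        QueryExecution qe = QueryExecutionFactory.sparqlService(
                "http://localhost:2020/sparql", query);
        try {
            ResultSet rs = qe.execSelect();
            while (rs.hasNext()) {
                System.out.println(rs.nextSolution().getLiteral("name"));
            }
        } finally {
            qe.close();
        }
    }
}
```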

What we still need to work on is mapping databases together easily (i.e. ‘smushing’ databases), and then walking up the semantic web layer cake to sort out how we trust the information we have…

What am I doing to help? I’m working with these technologies – in particular on mapping database tables together across the net, and on working out methods to trust information (based on user trust). An interesting side-effect of my research, and really one of its core goals, is that by joining data together we get explicit links forming in a graph – but users hold many implicit links in their heads as well – and it is my goal to unlock those links by letting users tag and rate information they come across (once again linking into the whole trust issue). This gives us relevance and similarity suggestions that our agents can calculate.

Then….world domination….(but I’m not sure if it’s for ourselves or our smart agents)…..

By the way, is it just me, or do you also picture a 10-pixel-high game character with dark glasses whenever you think of a smart agent? In particular, I am reminded of an old Apogee game I used to play…

Introduction

So, to introduce myself… My name is Jonathan Giles. I’m a resident of the mega-happening Palmerston North (and have been here for pretty much all of the 22 years I’ve been alive). I’ve been married since January 2007 (yes, my wife and I have been friends since we were young – we had known each other for a number of years before getting hitched). I figured: why wait to get married if it was inevitable anyway?

I’ve just completed a Bachelor of Engineering with Honours degree at Massey, where I studied software engineering. I managed to get first class honours, be placed on the Massey University Merit List, and be given the title of Massey Scholar. I tossed up between doing a PhD and a masters, but finally settled on the masters: given the choice between theoretical and practical, I definitely fall on the side of practical.

My research last year was in the area of plugin-based development. What this means is writing a program composed entirely of plugins – think Eclipse if you’re a Java person. Do not think Winamp or Firefox – they of course offer plugins, but they are functionally complete without them; their plugins really just ‘smooth the edges’. Programs I have written, and applications like Eclipse, are on the other hand entirely based on plugins – even the core functionality. The benefit of such technology is that development can proceed far more agilely – customer pressuring you for a piece of functionality? Stop work on your current plugin and begin a new one. Drop it into their ‘plugins’ folder, and it all works nicely – and this is from my own practice, not academic theory.
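To make the idea concrete, here’s a stripped-down sketch of the pattern – not my actual framework, and the interface and folder layout are invented for illustration. The host defines a tiny contract, and anything dropped into the plugins folder that honours it gets loaded:

```java
import java.io.File;
import java.io.FilenameFilter;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

// Stripped-down plugin host. Real frameworks (Eclipse, OSGi) layer
// versioning, dependency resolution and lifecycle on top of this idea.
public class PluginHost {

    // The entire contract the host knows about. Plugin jars advertise
    // their implementation via META-INF/services.
    public interface Plugin {
        String name();
        void start();
    }

    public static void main(String[] args) throws Exception {
        // Collect every jar sitting in the 'plugins' folder.
        File[] files = new File("plugins").listFiles(new FilenameFilter() {
            public boolean accept(File dir, String name) {
                return name.endsWith(".jar");
            }
        });

        List<URL> jars = new ArrayList<URL>();
        if (files != null) {
            for (File jar : files) {
                jars.add(jar.toURI().toURL());
            }
        }

        ClassLoader loader = new URLClassLoader(
                jars.toArray(new URL[0]), PluginHost.class.getClassLoader());

        // Discover and start every plugin found in the folder.
        for (Plugin p : ServiceLoader.load(Plugin.class, loader)) {
            System.out.println("Starting plugin: " + p.name());
            p.start();
        }
    }
}
```

Dropping a new jar into `plugins` and restarting is all it takes for new functionality to appear – which is exactly the agility I was describing above.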

Of course, this doesn’t sound very web 2.0, and it isn’t. I haven’t jumped on that platform yet. I still think in some circumstances you just can’t beat an application, therefore none of my research is in the web 2.0 area.

My new area of research (which coincidentally still builds atop my plugin research) is in the area of semantic web technologies. This is all about getting data off intranets and the internet, and making it far more comprehensible for people. The main push for the semantic web here is standards compliance, so that in the future, as more systems become able to interact with this kind of data, my research becomes more valuable to the end user. At the same time, there is a huge amount of data inside businesses that can be accessed using the results of my research, and this data can once again be used to help people comprehend their environment.

The point of my semantic web research may sound rather odd, but I will make sure to clarify it more in the future – there is an actual ‘real-life’ product attached to this work; it isn’t just vapourware. I’ll post about that as soon as I am able!

Righty, I’ll leave it there. I’ll make sure I keep you all updated about my work as it progresses over the next year.

Boca

Boca is part of IBM’s Semantic Layered Research Platform (see the project’s Boca overview).

It is “the foundation of many of our components. It is an enterprise-featured RDF store that provides support for multiple users, distributed clients, offline work, real-time notification, named-graph modularization, versioning, access controls, and transactions with preconditions.”

To be honest, I am still trying to fully understand what that paragraph means. I am in communication with the IBM people involved with this project, and it may turn out that this technology becomes a part of my research this year.

I think my options right now are to either:

  • Use Jena to store the graph data by querying a database that has a D2R server sitting in front of it (D2R allows for relational databases to be queried using SPARQL).
  • Use Sesame to store the graph data by querying Boca, which wraps around a database (mind you, the database is a custom Boca one).

Whichever option I take, I will likely need to write custom SPARQL queries to populate the graph.
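Either way, the population step might look something like the following – a SPARQL CONSTRUCT that copies just the triples I care about out of the backing store and into my own graph (the vocabulary and endpoint are invented; I’m using Jena here simply because it’s option one above):

```java
import com.hp.hpl.jena.query.*;
import com.hp.hpl.jena.rdf.model.Model;

// Sketch: pull a purpose-built graph out of whichever store ends up
// underneath (D2R or Boca) using a SPARQL CONSTRUCT query.
public class PopulateGraph {
    public static void main(String[] args) {
        String construct =
            "PREFIX ex: <http://example.org/vocab#> " +
            "CONSTRUCT { ?a ex:relatedTo ?b } " +
            "WHERE { ?a ex:memberOf ?g . ?b ex:memberOf ?g . FILTER(?a != ?b) }";

        QueryExecution qe = QueryExecutionFactory.sparqlService(
                "http://localhost:2020/sparql", construct);
        Model graph = qe.execConstruct(); // the graph my application works with
        qe.close();

        graph.write(System.out, "N-TRIPLE");
    }
}
```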

Having written this, I’m torn. Boca offers some cool features (particularly real-time notification, versioning, access controls and the ability to update data), but so does D2R Server (the ability to map multiple databases together). Ideally, it would be possible to get both sets of functionality. Maybe the Semantic Web Client Library can help there?

Introduction to the Semantic Web

My brain is presently in overload as I try to wrap my head around all the concepts of the semantic web. I thought I might as well get my thoughts out here, to help other people, and to selfishly help myself (by either clarifying my own thinking, or by being beaten with a stick by those in the know).

So, why should one care about the semantic web – isn’t web 2.0 doing just fine by being oblivious to this semantic web thing? I guess so, but then, web 2.0 is a bandwagon anyone can jump on (and as an aside, my blog is now largely web 2.0 :p). Web 2.0 is all about, in my opinion, making the web a more user-focused place. The semantic web is all about (once again, in my opinion) making the web a more usable place.

Web 2.0 == User-focused?
All web 2.0 sites are about making the web do things it has never done before, using technologies like Ajax. We all marvel at Google Maps, del.icio.us, and so on. It’s amazing – the web makes my life easier, or at least more convenient. It also makes me more social, in a geeky kind of way: I can now see what other people bookmark when I tag my bookmarks with a particular keyword. The opportunities really explode.

But Google Maps, del.icio.us and the rest all have a common theme (if you don’t dig too deep) – they use metadata to make the experience more valuable. What would Google Maps be without the additional information about where places are? What about the whole basis of del.icio.us – tags? Without them, and what can be done when they are computed over and commonalities found, del.icio.us would be nothing more than a static HTML page where I post my favourite links.

Semantic Web… Usable for whom?
So… metadata, huh? That’s pretty web 2.0, I hear you all saying. The other thing that can use metadata is, of course, the computer you are sitting in front of right now (don’t worry – I’m not watching you). Imagine if there were a way for me to describe to the computer my details and my friends. Imagine if my friends could do the same. Essentially, we’d be building up a big web of friends, and in a way making a dynamic address book (insofar as I don’t need to update my friends’ details – I simply link to their profiles remotely). Well, it exists – the FOAF project is all about this. It builds on top of technologies such as RDF to allow people to make their own ‘FOAF files’. I have one here. Mine is in a pretty pitiful state at the moment, but that doesn’t matter – I can update it as time progresses. Imagine if OldFriends gave each user account a FOAF file (or alternatively let them link in their own). I would love the opportunity to add all these old friends of mine into my FOAF file.
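If you’re curious what goes into such a file, here’s roughly how a minimal one could be generated with Jena – all the URIs below are placeholders, not my real profile:

```java
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.ModelFactory;
import com.hp.hpl.jena.rdf.model.Property;
import com.hp.hpl.jena.rdf.model.Resource;
import com.hp.hpl.jena.vocabulary.RDF;

// Sketch: build a minimal FOAF profile with Jena and print it as RDF/XML.
public class FoafSketch {
    static final String FOAF = "http://xmlns.com/foaf/0.1/";

    public static void main(String[] args) {
        Model m = ModelFactory.createDefaultModel();
        m.setNsPrefix("foaf", FOAF);

        Property name  = m.createProperty(FOAF + "name");
        Property knows = m.createProperty(FOAF + "knows");
        Resource person = m.createResource(FOAF + "Person");

        Resource me = m.createResource("http://example.org/me#jonathan")
                       .addProperty(RDF.type, person)
                       .addProperty(name, "Jonathan Giles");

        // Pointing at a friend is what makes the address book
        // 'distributed': their details live in *their* FOAF file.
        Resource friend = m.createResource("http://example.org/friend#sam")
                           .addProperty(RDF.type, person)
                           .addProperty(name, "Sam Example");
        me.addProperty(knows, friend);

        m.write(System.out, "RDF/XML-ABBREV");
    }
}
```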

You’re probably all wondering, why bother doing this? Well, for me, two reasons:

  1. I can visualise these people in a program like Centruflow. Imagine being able to see all your friends in what is essentially a constantly updated, fully distributed address book. I could discover lost friends by navigating to the friends of my friends, and further out (‘To Infinity… and Beyond!’).
  2. I never maintain address books, as they are always my responsibility to update. FOAF is all about making each individual responsible only for keeping their own details up to date.

So….FOAF is the semantic web….seems kinda sucky?
No, FOAF is just one application built on top of the semantic web. It offers a small glimpse into what is possible when we give our computers a little structure around our information. You’ve probably noticed my site lists my contact details in the sidebar – that is of no real use to a computer. My FOAF profile gives the computer everything it needs to help you.

So what are you doing again?
My work right now is on making it easier to get data from all over the semantic web (or straight from the web, in the case of databases), and making it possible for a computer to make far more sense of it. Want to say “I’m interested in things related to this widget here”? How would you do that now – go to Google and trawl through the results? My research will automatically give you suggestions of this kind, based not on a computer-decided result but on how people tag and inform the system using metadata. My work won’t be on the internet; it will initially focus on individual business intranets.

This explains why my head is exploding – I need to learn a lot of technologies to make this work possible.