Don't follow *me*, follow me/tags/interesting-to-you

This post assumes some knowledge of the technologies coming to define the social web, including WebFinger, FOAF and XFN. For an introduction to these technologies, check out my Primer on the Social Web and The Advent of Following.

The Problem

In The Advent of Following, I concluded with a problem: it has become all too easy to follow someone on the Web, but harder than ever to follow just the content you want. Between FriendFeed and Buzz, following someone and everything they do online is now quite possible, and the experience will only get easier and richer. More and more people will sign up and more and more services will be aggregated, either because they publish content in a (possibly de facto) standard format, like Atom, or by brute force (manual integration of some kind). However, this is already making it difficult to sift through the noise and find just the content I am interested in.

Recently, Robert Scoble has been writing about a similar topic. Scoble talks about individual tweets, Flickr images and blog posts as info atoms and the need for curators (bloggers, journalists, etc.) such as himself to create info molecules out of these atoms. Scoble's needs as a blogger are different from mine as a media consumer, but our problems are much the same. Whether a blogger or a media consumer, we need a way to point to a group of related content.

Pointing Towards a Solution

This problem is largely solved on a site-by-site basis. The solution, or a big part of it, is tags. Here are all of the bookmarks I've saved to Delicious having to do with photography. Here are all of the photos I've taken at Mass MoCA. If you want to, today, you can subscribe to feeds of either of those curated groups of content, but this is a site-specific solution. If I post a Mass MoCA image to Smugmug or Facebook, you won't see that.

What we really want is to be able to follow the people we like but automatically filtered to the content we're interested in. If I, as a media producer, am consistent with the tag names I choose from site to site, then this should be possible.

Currently, with WebFinger, you can follow me@gmail.com on Buzz (and anything else that can make sense of the FOAF/XFN/WebFinger web (de facto) standards), but you will see everything. The problem is, no one wants to follow everything I do online (even my wife; I'm sure she has no interest in Clojure-related blog posts, for instance). People don't really want to follow me@gmail.com, they want to follow me@gmail.com/tags/what-they-find-interesting. As long as I, as content creator, tag content consistently across sites, this could be powerful.

The Wavy-Hands Proposal (where we get a little technical)

These technologies are too new to me for me to suggest anything near fully-baked. However, I think I've got enough of the details down that this isn't too far off. Either way, I'm going to spew out what's been floating around in my head and will just consider it a jumping off point for the ensuing conversation.

Currently, WebFinger uses a site's host-meta file (http://example.com/.well-known/host-meta) to find a URL template that allows one to turn acct:me@gmail.com into a URL to an XRD document describing me. If this XRD document points to a FOAF document listing my accounts online, then my WebFinger address, acct:me@gmail.com, essentially becomes a pointer to everything I do online (wavy hands flailing here).
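To make the template step concrete, here is a minimal sketch in Clojure of how a client might expand such a template. The template URL and the {uri} placeholder name are assumptions for illustration; a real client would read the template from the host-meta file rather than hard-coding it.

```clojure
(require '[clojure.string :as str])

(defn expand-template
  "Substitutes the URL-encoded uri for the {uri} placeholder in template."
  [template uri]
  (str/replace template "{uri}" (java.net.URLEncoder/encode uri "UTF-8")))

;; Hypothetical template, as it might appear in a host-meta file.
(def example-template "http://example.com/describe?uri={uri}")

(expand-template example-template "acct:me@gmail.com")
;; => "http://example.com/describe?uri=acct%3Ame%40gmail.com"
```

The resulting URL is where the client would go looking for the XRD document describing the account.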

Now, just as a host-meta file can support WebFinger by providing a template to turn a WebFinger URI into a URL to a profile page, what if it also provided a template for looking up an account holder's tags? E.g., it could look something like:

<XRD xmlns:hm="http://host-meta.net/xrd/1.0"
     xmlns="http://docs.oasis-open.org/ns/xri/xrd-1.0">
  <hm:Host>example.com</hm:Host>

  <Link rel="http://somestandardsbody.org/rel/tags"
        template="http://example.com/users/{user}/tags" />
</XRD>

This would provide a standard way for services to look up filtered data about their users. If all publishing services provided this, it could be possible for someone to look me up on Buzz and follow anything I tag "Clojure", for instance, whether it be a blog post solving some Project Euler problem, a Flickr image of the class hierarchy of Clojure's code base or a tweet of some Clojure golf fun. They wouldn't have to sift through all of the non-Clojure-related content I produce all over, assuming I take a few seconds to tag everything appropriately.
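As a rough sketch of what a consumer could do once it has pulled tagged items from each service, here is the filtering step in Clojure. The items, their shape and the lower-cased tag sets are all made up for illustration; nothing here reflects a real feed format.

```clojure
(require '[clojure.string :as str])

;; Hypothetical items, as they might look after being fetched from each service.
;; Tags are assumed to be stored lower-cased.
(def items
  [{:service :blog    :title "Project Euler Problem 2" :tags #{"clojure" "euler"}}
   {:service :flickr  :title "Clojure class hierarchy" :tags #{"clojure" "diagram"}}
   {:service :flickr  :title "Mass MoCA gallery"       :tags #{"massmoca"}}
   {:service :twitter :title "Clojure golf"            :tags #{"clojure"}}])

(defn tagged
  "Returns the items carrying tag, compared case-insensitively."
  [tag items]
  (let [wanted (str/lower-case tag)]
    (filter #(contains? (:tags %) wanted) items)))

(map :title (tagged "Clojure" items))
;; => ("Project Euler Problem 2" "Clojure class hierarchy" "Clojure golf")
```

The point is only that consistent tag names across sites turn the cross-site filter into a one-liner.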

It may even be possible to extend WebFinger to provide the same kinds of pointers to these curated groups of content as we do people. If acct:me@gmail.com points to me, could acct:me@gmail.com/tags/Clojure point to everything I've produced tagged Clojure?
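A client would first need to split such an address into its account and tag parts. A minimal sketch, assuming the acct:user@host/tags/name shape floated above (which is purely hypothetical and not part of WebFinger):

```clojure
(defn parse-tagged-acct
  "Splits a hypothetical acct:user@host/tags/name address into its parts.
  Returns nil when the address does not have that shape."
  [address]
  (when-let [[_ account tag] (re-matches #"acct:([^/]+)/tags/([^/]+)" address)]
    {:account (str "acct:" account) :tag tag}))

(parse-tagged-acct "acct:me@gmail.com/tags/Clojure")
;; => {:account "acct:me@gmail.com", :tag "Clojure"}

(parse-tagged-acct "acct:me@gmail.com")
;; => nil
```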

Is there any work like this going on? I haven't seen anything, but I'm new to this space and would be happy to be pointed in the direction of ongoing discussion. If not, is there interest in pursuing something like this? The benefits in terms of media consumption, curation (a la Scoble) and discussion seem huge. We all need a way to sift signal from noise and this could be a big step towards that goal, Web-wide.

The Advent of Following

On the Road to Following

At least since the birth of RSS, it has been possible to pull in all of the latest content from the friends, luminaries and journalists you most wanted to track online. This move made the Web much more user-centric than it had been. It was the Information Age of the Web, if you will. Instead of browsing to 100 different destinations to read the content you were interested in, you went to one destination and read your content there.

The next step on the road to following was the blooming of the social networks. From Friendster to MySpace to Facebook to the penultimate Classmates.com, it was finally possible to keep up with the day-to-day trials and tribulations of your closest friends and family (and the guy who pants'ed you in fourth grade and has since become enlightened, or so he says).

Still, the notion of following, and the idea that the moniker @foo could uniquely identify an individual, didn't really take hold until the birth of Twitter. Twitter opened up the possibility of finding out not only what your buddy John had for lunch, but what Paris Hilton and Oprah had for lunch! But it was more than that. Users began tweeting their latest blog posts, their latest Flickr images and their latest and favorite YouTube videos. Twitter turned into a place to follow what someone was doing online. Instead of finding the RSS feeds for your friends' and idols' blogs, images, videos, etc., you can just follow all of their activity in one place, on Twitter.

The Rise of Aggregation

But there was a problem. A few, really. First, Twitter is not a great place for conversations. There are exceptions, but for the most part everything is one-way on Twitter. I can post this blog entry there and my followers can see it, they may even @-reply back to me, but none of my other followers will see that comment. Second, with Twitter the onus is on me, the author, to push all of my content there. If I remember to publish my Flickr images but not my Picasa images, then my followers will miss stuff. This emphasizes the fact that on Twitter, you are not following a friend, you are following a friend's Twitter account. What doesn't exist there might as well not exist. Lastly, publishing everything I do online to Twitter makes my tweet stream very noisy. It's much better to self-curate what content you publish to Twitter so that only the best stuff gets through and your followers are not overwhelmed.

These are the problems the aggregators were trying to solve. The most successful of these was FriendFeed. It allowed you to specify all the accounts you had across the Web, from not only Facebook, Flickr and Twitter, but Yelp! and Laconica and Delicious and anything that published an RSS feed of your content, and it would aggregate all of that content in one place. You set it up once and all of a sudden you have a single location for people to follow what you do online. As for the three problems Twitter had in this role, 1) FriendFeed allowed conversations around each post, 2) all of my content gets pulled in automatically once I set it up, 3) my followers on FriendFeed could "hide" content from services they weren't interested in, so they could see all of my blog posts and tweets but ignore reviews of my small town restaurants they'll never be in the vicinity of.

Social Web Technologies

Despite appearances, and coming extremely close, FriendFeed was not a panacea. The biggest problem is that it is a closed system. Yes, they have an API and let all of their data out. They went way further than just about anybody out there. However, if I cannot run software on a server of my choice and participate in your system, then you are a closed system. This is antithetical to the ideals of the Web and no method of following an individual on the Web can be built from these foundations.

Thankfully, this was not the end of the road. A new group of technologies is evolving to fill the role currently best served by FriendFeed. As mentioned in my Primer on the Social Web, FOAF and XFN provide a way for me to specify all of my accounts and build a little mini-web where those interested can follow what I do online. WebFinger allows me to name my mini-web. Sites such as Google Buzz act as a view of this web, not as silos like the old social networks. Sure, the comments are mostly owned by Buzz at the moment, but that will change when Salmon is implemented, which is the plan. With Salmon, if someone comments on a blog post of yours in Buzz, that comment will route its way back to your blog.

Once these technologies catch hold, we will be in the next era of following, where you can actually follow a person, not a feed or a service or a series of tweets, on the Web. If I give you my email address, assuming it corresponds to a WebFinger account, like my GMail address does, everything should work automagically.

There is just one problem with all of this that I have not seen a solution for yet. There was something great, almost empowering, about the old, original method of following, where you subscribe to whatever RSS feeds you find interesting. It meant I could read all of the great writing an expert in my field published without 1,000 pictures of his cat being pushed in my face. Similarly, I could view all the pictures of my friends' baby without having to comb through all the parenting-related bookmarks they saved to Delicious. With all of this great Social Web technology, how can I follow the people I want while still maintaining some control over the content I see?

That will be the subject of my next post, along with a possible solution (or, at least the steps towards one).

Short Primer on the Social Web (as defined by FOAF, XFN and WebFinger)

Since its creation, the Web has proven itself as a great social platform. Though we've come a long way from talk, BBS's and chat rooms, there still remains a big problem with social software on the Web, that of identity.

As it stands today, my online persona is split in a hundred different ways. Am I http://facebook.com/jmcconnell or http://twitter.com/jdot? What about http://del.icio.us/jdot and http://flickr.com/j-dot? Of course, I am all of these, but if you only follow me on Twitter or friend me on Facebook, you'll only see a sliver of my online identity (unless I do a bunch of work to cross-post all of my content to all of my accounts, which just annoys anyone following me on more than one of them).

So what is the solution? The solution is to build the fabric of this social landscape and content into the Web itself. Just as the original Web was built on the idea of linking documents to one another, the Social Web is built on the idea of linking:

  1. People to one another
  2. People to the content they author and sites they own

The two main technologies involved in enabling this linking are FOAF and XFN. What FOAF allows me to do is create one or more files that list my online accounts and my friends. XFN allows me to do the same thing on a case by case basis when I link to something. For example, to link to my Flickr account, instead of the usual:

<a href="http://flickr.com/j-dot">My Flickr Account</a>

I could use:

<a href="http://flickr.com/j-dot" rel="me">My Flickr Account</a>

That rel="me" is part of XFN and allows web crawlers, like search engines, to see that I consider that link to go to a page of mine. If that page also has a rel="me" link back to this page, then the search engine can deduce "Well, the author of this blog post claims that this page of photos is his and the photographer that created these photos claims that he also wrote this blog post, so they must be the same person."
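The deduction a crawler makes can be sketched as a check for links in both directions. The map below stands in for whatever a crawler would have extracted from each page; the URLs other than my Flickr one are placeholders, and the function name is my own.

```clojure
;; Hypothetical crawl result: page URL -> set of URLs it links to with rel="me".
(def rel-me-links
  {"http://example.com/blog"     #{"http://flickr.com/j-dot"}
   "http://flickr.com/j-dot"     #{"http://example.com/blog"}
   "http://example.com/impostor" #{"http://flickr.com/j-dot"}})

(defn same-person?
  "True only when page a and page b each claim the other via rel=\"me\"."
  [links a b]
  (and (contains? (get links a #{}) b)
       (contains? (get links b #{}) a)))

(same-person? rel-me-links "http://example.com/blog" "http://flickr.com/j-dot")
;; => true

(same-person? rel-me-links "http://example.com/impostor" "http://flickr.com/j-dot")
;; => false
```

Requiring the link back is what keeps someone else from claiming my photos just by linking to them.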

As FOAF and XFN get incorporated into all of the sites I have accounts on, a web of all of these rel="me" links will start to grow, as each of my pages links to the others. This is a big step forward as we will have a way of aggregating all of someone's publicly available content. However, a big question arises: how do we refer to this little "web", a subset of the Web itself?
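One way to picture this little "web" is as the set of pages reachable by following rel="me" links from any one of them. A minimal sketch, using a made-up link map (a real crawler would also want to verify the links back, which this skips):

```clojure
;; Hypothetical crawl result: page URL -> set of URLs it links to with rel="me".
(def rel-me
  {"http://example.com/blog" #{"http://flickr.com/j-dot" "http://twitter.com/jdot"}
   "http://flickr.com/j-dot" #{"http://example.com/blog"}
   "http://twitter.com/jdot" #{"http://del.icio.us/jdot"}
   "http://del.icio.us/jdot" #{}})

(defn identity-web
  "Returns the set of pages reachable from start by following rel=\"me\" links."
  [links start]
  (loop [seen #{}
         todo [start]]
    (if-let [page (first todo)]
      (if (seen page)
        (recur seen (rest todo))
        (recur (conj seen page) (into (vec (rest todo)) (get links page))))
      seen)))

(identity-web rel-me "http://example.com/blog")
;; => a set containing all four URLs above
```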

Another way of asking this is, how do we identify this web of pages? The natural desire is to identify it as me, but there has never been a good way of identifying an individual on the Web. This is where our last technology comes in, WebFinger. (For the geeks, WebFinger is so named because it acts as a Webified version of the old finger protocol that was ubiquitous on Unix systems.) Basically, WebFinger is a way of taking something that looks like an email address (and may very well be one) and using it to uniquely identify an individual on the Web. It also allows for associating some metadata to that individual. One possibility would be to associate a FOAF document with the individual.

So, what does this buy you? Well, recall that a FOAF document can contain a record of all of the various accounts you hold online and who all of your friends are. In other words, it can identify all of the nodes in this "web" that we've associated with you. Now, with WebFinger, we have a name for this web.

This isn't as pie in the sky as it sounds, either. Google has taken the step of enabling WebFinger on all GMail accounts. So, if you have a [yourname]@gmail.com email address, then that can now act as a WebFinger identifier. If you spend some time filling out your Google Profile, Google will build you your own little web and name it [yourname]@gmail.com. In fact, this acts as the underpinnings of Google Buzz.

Now, there are a lot of details I've left out and doubtless some that I've gotten wrong. My purpose here is not to be a definitive reference for all things FOAF, XFN and WebFinger. Instead, I just wanted to open some eyes to the possibilities and lay some groundwork for an idea on how we can better get ourselves unhitched from the silos that are Twitter, Facebook and every other social network out there.

Clojure Ant Tasks

While doing some recent work on a Facebook app I've been playing around with as a testbed for familiarizing myself with Facebook's API, Google App Engine and Clojure, I ran into the problem that I was copying and pasting my Ant setup for compiling and testing my Clojure code. This I did not feel very good about. The reason was that, though it was easy enough to just call out to <java></java> to run clojure.lang.Compile and execute the tests, there was always extra work to get the build failing when it should, especially in the case of the tests. It made me really yearn for a native Ant task that could handle the compilation and testing of Clojure code.

Thus was born Clojure Ant Tasks.

There isn't much to say about them. They pretty much do what they claim to: compile and test Clojure code. One nice thing about them is that they are written in Clojure, using gen-class. I imagine this will make it much easier to keep in sync and take advantage of future Clojure updates. Currently, they support Clojure 1.0 and the Clojure 1.0-compatible Clojure-Contrib tag. The reason is that I had to choose between supporting clojure.contrib.test-is or clojure.test and I felt it was better to stick to 1.0 for now, since that is more likely to be used in production. If there is demand for it, I can look at supporting both via a "version" attribute or something.

Both the compilation and test tasks support <namespace></namespace> elements and <fileset></fileset>'s. One benefit to supporting both is that it provides the ability to specify compilation order while at the same time providing the flexibility and convenience <fileset></fileset>'s offer. For instance, if all of your code lives in "src" and has no dependencies on the order in which it is compiled other than a single gen-class'ed namespace that must be compiled first, you can do this:

<clojure-compile>
  <classpath>
    <path refid="sources.and.classes"></path>
  </classpath>
  <namespace>com.foo.genclassed.ns</namespace>
  <fileset dir="src" includes="**/*.clj"></fileset>
</clojure-compile>

Here, com.foo.genclassed.ns will be compiled first and, subsequently, every other *.clj file in src will be compiled.

For more information and to see the format of the tasks, check out the README. Feature requests, bug reports and all other feedback are welcome over on GitHub. Feel free to either message me or open a ticket directly.

Enjoy!

Project Euler in Clojure - Problem 2

In our last post, we solved Problem 1 from Project Euler and learned a bit about sequences and filtering while we were at it. Now we'll tackle Problem 2, which reads:

Each new term in the Fibonacci sequence is generated by adding the previous two terms. By starting with 1 and 2, the first 10 terms will be:

1, 2, 3, 5, 8, 13, 21, 34, 55, 89, ...

Find the sum of all the even-valued terms in the sequence which do not exceed four million.

So, it seems pretty clear that the first step is some method of generating the Fibonacci sequence. There are a few ways of thinking about this, but the most straightforward is simply to code up the function fib(0) = 0, fib(1) = 1, fib(n) = fib(n-1) + fib(n-2):

(defn fib [n]
  (cond
    (= n 0) 0
    (= n 1) 1
    :else (+ (fib (- n 1)) (fib (- n 2)))))

Now, we want a sequence of the values of fib for increasing values of n, and we'll pick the even values out of that. Since we don't know how many values we are going to need up-front, we will use a lazy sequence, which is potentially infinite. The basic idea behind lazy sequences is that the sequence is represented by a cons cell which contains the first element of the sequence in its left half and a function that can generate the rest of the sequence in its right half. When this function is evaluated, it returns a new cons cell containing the second element of the list and a new function that can, again, generate the rest of the list and on and on, ad infinitum. For instance, a lazy sequence that can generate the natural numbers might look something like this pseudo-code:

;; Pseudo-code: cons-cell and lazy-cons-cell are not real Clojure functions.
(def natural-numbers
  (cons-cell 0
             ((fn next [n]
                (fn []
                  (lazy-cons-cell (+ n 1)
                                  (next (+ n 1)))))
              0)))

A consumer of the sequence would pull the 0 and then evaluate the next function, which produces a cons cell that looks like (1 . (next 1)). Now, (next 1) returns a function, which, when evaluated, returns a cons cell looking like (2 . (next 2)), etc. So, the effect is that you get a sequence of cons cells where the first element in each successive cons cell is the next natural number. The "lazy-cons-cell" function (macro, really) just delays execution of (next (+ n 1)) until the consumer requests its value. That's what makes this a lazy sequence: computation happens on demand, which is awesome for large, infinite and compute-intensive sequences.
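For comparison, here is a runnable sketch of the same idea using lazy-seq, the laziness macro in Clojure 1.0 and later. The name naturals-from is my own choice for illustration.

```clojure
(defn naturals-from
  "Returns a lazy, infinite sequence of integers starting at n."
  [n]
  ;; lazy-seq delays the body until a consumer asks for the next element.
  (lazy-seq (cons n (naturals-from (inc n)))))

(def natural-numbers (naturals-from 0))

(take 5 natural-numbers)
;; => (0 1 2 3 4)
```

Nothing past the fifth element is ever computed here, even though the sequence has no end.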

Okay, so now that we have the notion of lazy sequences down, we can try to come up with a lazy sequence based on our fib function. In Clojure, lazy sequences are born out of the use of lazy-cons in place of cons when building up a list. So, a lazy sequence of the Fibonacci numbers would be:

(def fib-seq
  ((fn f [i] (lazy-cons (fib i) (f (inc i)))) 0))

The magic here is in the inner lazy-cons expression, (lazy-cons (fib i) (f (inc i))). This builds our cons cell with the current Fibonacci value on the left and a function that can generate the rest of the sequence on the right. What is going on in the entire expression is that we are defining a global Var fib-seq to be (f 0), where f is a function whose body is our lazy-cons expression. So, when we pull the first value out of fib-seq, we should get the left side of the cons cell, which is (fib 0), or 0. The next value will be (fib 1), which is 1, and so on.
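A note for anyone following along on Clojure 1.0 or later: lazy-cons is not available there, and lazy-seq wrapped around an ordinary cons plays the same role. A sketch of the equivalent definition, reusing the fib function from above:

```clojure
(defn fib [n]
  (cond
    (= n 0) 0
    (= n 1) 1
    :else (+ (fib (- n 1)) (fib (- n 2)))))

(def fib-seq
  ;; Same shape as the lazy-cons version: (fib i) up front, the rest delayed.
  ((fn f [i] (lazy-seq (cons (fib i) (f (inc i))))) 0))

(take 10 fib-seq)
;; => (0 1 1 2 3 5 8 13 21 34)
```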

The rest of the problem is easy and will be much like our solution to Problem 1. The only new thing here will be a function called take-while, which takes a predicate and a sequence and will take values out of the sequence as long as the predicate returns true for the current value.

(apply + (take-while #(< % 4000000)
           (filter even? fib-seq)))

So, we just pull even values out of fib-seq, thanks to our even? filter, as long as the value is less than 4,000,000 and sum the results. Tada!

There are better ways of generating the Fibonacci sequence; this just seemed the most straightforward to start with. A few good options are listed on the Clojure wiki here. In particular, check out the last one, which builds on a solution Rich proposed and is quite nice.
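As one example of a cheaper approach (a common idiom, not necessarily one of the wiki's versions), the sequence can be built by iterating on a pair of adjacent values, so each term costs a single addition instead of a recursive recomputation:

```clojure
(def fib-seq
  ;; Each step turns [a b] into [b (+ a b)]; the first of each pair is the term.
  (map first (iterate (fn [[a b]] [b (+ a b)]) [0 1])))

(take 10 fib-seq)
;; => (0 1 1 2 3 5 8 13 21 34)

(apply + (take-while #(< % 4000000)
           (filter even? fib-seq)))
;; => 4613732
```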

That's it for this problem. The code is available on GitHub. As always, comments, questions, problems and suggestions are welcome.

Project Euler in Clojure - Problem 1

I've been keenly interested in Clojure for a while now and have been slowly adding to my skill set. In order to continue my progress and, hopefully, help others that are new to the language, I thought it would be worthwhile to have a series of posts where I tackle the Project Euler problems in Clojure, comparing and contrasting various approaches where appropriate. I'll start out pretty introductory and will hopefully build up to more advanced concepts quickly. While I pursue this I'll be committing the solutions to a GitHub repository.

This is the first post in this series and I'll tackle Problem 1. From the Project Euler site:

If we list all the natural numbers below 10 that are multiples of 3 or 5, we get 3, 5, 6 and 9. The sum of these multiples is 23.

Find the sum of all the multiples of 3 or 5 below 1000.

So, the first effort will be the straightforward route. We need to generate the natural numbers below 1000 and keep those that are divisible by 3 and/or 5. Then, we'll sum all of the numbers that make it through the filter.

In Clojure, a series of values like this, a subset of the natural numbers, is represented by a sequence. A sequence is a view over a collection, not a concrete data structure in its own right. The sequence of natural numbers from 0 up to, but not including, 1000 is created using the range function:

user=> (range 1000)
(0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 ...)

Now we need to pull out the numbers that are divisible by 3 and/or 5. In other words, we need to "filter" the sequence to create a new sequence (which is just a different view of the current sequence) that only contains the numbers we want:

user=> (filter #(or (zero? (rem % 3))
                    (zero? (rem % 5)))
               (range 1000))
(0 3 5 6 9 10 12 15 18 20 21 24 25 27 30 33 35 36 39 40 ...)

The #(...) syntax is just sugar to stand in for (fn [args] (...)) to ease the creation of anonymous functions, or lambdas. The first argument passed to the lambda can be accessed by % or %1 and, in general, the nth argument passed to the lambda can be accessed by %n. So, our lambda is just getting passed the next item from the sequence to filter and it will return true if that item is divisible by 3 or 5.
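To see the equivalence, here is the same predicate written both ways, plus a two-argument lambda using %1 and %2. The names divisible-sugar and divisible-fn exist only for this illustration.

```clojure
;; The reader-macro form and the explicit fn form of the same predicate.
(def divisible-sugar #(or (zero? (rem % 3)) (zero? (rem % 5))))
(def divisible-fn    (fn [n] (or (zero? (rem n 3)) (zero? (rem n 5)))))

(= (filter divisible-sugar (range 20))
   (filter divisible-fn (range 20)))
;; => true

;; %1 and %2 refer to the first and second arguments.
(#(- %1 %2) 10 3)
;; => 7
```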

Now, all that's left is to sum these up, so we'll apply the + function to all of the items in our filtered sequence, which will keep a running total until it runs out of items:

user=> (apply + (filter #(or (zero? (rem % 3))
                             (zero? (rem % 5)))
                        (range 1000)))
233168

Voilà!
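If the running total is easier to picture explicitly, reduce and reductions make it visible. This is just an alternative sketch of the same sum, with the predicate pulled out into a named function of my own choosing:

```clojure
(defn multiple-of-3-or-5? [n]
  (or (zero? (rem n 3))
      (zero? (rem n 5))))

;; reductions shows each intermediate total for the numbers below 10.
(reductions + (filter multiple-of-3-or-5? (range 10)))
;; => (0 3 8 14 23)

;; reduce keeps only the final total.
(reduce + (filter multiple-of-3-or-5? (range 1000)))
;; => 233168
```

The last running total for the numbers below 10 is 23, matching the example in the problem statement.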

Certainly there are other ways to attack this problem and perhaps I'll revisit it in the future if any are particularly interesting.

Questions, comments, suggestions and hints are all welcome in the comments!