10.2 Web Aggregators: Introducing Meerkat
Meerkat was the first truly open-to-all, searchable, categorized aggregator.
Developed by Rael Dornfest, Chair of the RSS 1.0 Working
Group and researcher for O'Reilly (the publisher of
this book), Meerkat (http://www.oreillynet.com/meerkat/) takes the
RSS feeds from a whole range of sites, throws all of the
items into a big pot, and makes the resulting mix
searchable. Aggregating feeds in this way allows for custom feeds to
be made—"The latest news on Java," say, or "Everything
containing the keyword `sausages'
written in the past 30 minutes."
Meerkat is primarily usable via its web
interface, shown in Figure 10-2. This introduces two
main concepts for the application: profiles
and mobs. To use either of these features, you
must first sign up for a user ID with the O'Reilly
Network (http://www.oreillynet.com).
Profiles are named sets of query parameters. There are
global profiles already loaded into
Meerkat, based on subject matter (Apache, Perl, P2P, etc.), and you
can save your own from the web interface.
Mobs, on the
other hand, are collections of stories. You use a mob like a
universally retrievable bookmark list. You can add stories to it via
the web interface, send its URL to other people, or even query and
display the mob via the Meerkat API.
The what? Well, let me tell you . . .
10.2.1 The Meerkat API
Like Syndic8, Meerkat offers an API
to allow other applications access to its database. Unlike Syndic8,
however, Meerkat is not a web service in the XML-RPC/SOAP mold.
Rather, it follows the REST architectural style, in which the
entire query is encoded in a URL. This is very useful in the RSS
world, because people can swap URLs of custom feeds within their
existing habits: not just by emailing or instant messaging them to
friends, but also by including them in blogrolls and
mySubscription lists. As such, the REST-based web aggregator provides
a different kind of extended service from that of the directory or
the desktop reader.
To query the Meerkat API, you pass it a URL, built up
from http://meerkat.oreillynet.com/?, and then a
query made out of the following parameters (any spaces in URLs are
replaced by %20):
- s= (Search For)
-
Instructs Meerkat to search for something in the
item's title
or description. This can be either a list of
keywords separated with a plus sign (+) or a
regular expression enclosed in //.
Example: http://meerkat.oreillynet.com/?s=eggs+ham
returns "any stories whose title or description
contains either `eggs' or
`ham'."
- sw= (Search What)
-
Ordinarily, Meerkat's s=
parameter will only search through title and
description elements. The sw=
option instructs Meerkat to search other specified fields. Currently,
Meerkat supports only the simpler Dublin Core elements, so
sw= can be either blank (hence: title,
description) or a combination of
dc_title, dc_creator,
dc_subject, dc_description,
dc_publisher, dc_contributor,
dc_date, dc_type,
dc_format, dc_identifier,
dc_source, dc_language,
dc_relation, dc_coverage, or
dc_rights.
Example: http://meerkat.oreillynet.com/?s=bod@exampleurl.com&sw=dc_contributor
returns all the item elements for which
bod@exampleurl.com is listed as contributor.
- c= (Channel)
-
This tells Meerkat to display only the requested channel. It takes
the numerical channel ID.
Example: http://meerkat.oreillynet.com/?c=1243 returns
only "stories from the
`oreillynet.python'
newsgroup."
- t= (Time Period)
-
This controls the maximum age of stories that Meerkat
displays. It takes a number, followed by: MINUTE,
HOUR, DAY, or
ALL. The number is optional (and meaningless) when
choosing ALL. The default setting is
1HOUR, so you must set this parameter to get
anything older.
Example: http://meerkat.oreillynet.com/?t=7DAY means
"show me stories from the past seven
days."
- p= (Profile)
-
This displays the stories selected by a profile saved
within Meerkat. You only
need to pass it the numerical ID of an existing profile.
Example: http://meerkat.oreillynet.com/?p=563 shows
"all stories caught by profile number 563 (the
O'Reilly Network)."
- m= (Mob)
-
Very similar to the p= parameter, but displaying
stories associated with a particular mob. You pass it the numerical
ID of the mob in question.
Example: http://meerkat.oreillynet.com/?m=123 gets you
"stories grouped under mob number
123."
- i= (ID)
-
This parameter displays a particular story. Each
item in the Meerkat database is assigned a
numerical ID. If you know the number, you can point directly at the
story. (To find it, go to the web interface and hold your mouse over
the mob icon, a ring of dots, to see the story's ID.)
Example: http://meerkat.oreillynet.com/?i=456 will
display only story number 456.
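The filter parameters above can be assembled programmatically. The following is a minimal Python sketch, not part of Meerkat itself: the names build_query, FILTER_KEYS, and TIME_PERIOD are invented for illustration, and the time-period check is only one reading of the t= description above.

```python
import re
from urllib.parse import quote, urlencode

BASE_URL = "http://meerkat.oreillynet.com/?"

# Filter parameters described in the list above.
FILTER_KEYS = ("s", "sw", "c", "t", "p", "m", "i")

# t= takes a number followed by MINUTE, HOUR, or DAY; the number is
# optional for ALL. (Assumed reading of the description of t=.)
TIME_PERIOD = re.compile(r"^(\d+(MINUTE|HOUR|DAY)|\d*ALL)$")


def build_query(**filters):
    """Return a Meerkat query URL for the given filter parameters."""
    for key, value in filters.items():
        if key not in FILTER_KEYS:
            raise ValueError(f"unknown filter parameter: {key}")
        if key == "t" and not TIME_PERIOD.match(str(value)):
            raise ValueError(f"bad time period: {value}")
    # quote() writes spaces as %20; '+' is kept literal because it
    # separates keywords in s=.
    return BASE_URL + urlencode(filters, quote_via=quote, safe="+")
```

Under these assumptions, build_query(s="eggs+ham") reproduces the first example URL, and build_query(s="Java", t="7DAY") yields http://meerkat.oreillynet.com/?s=Java&t=7DAY.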
So far, we've seen all the parameters needed to filter
exactly which stories to display. If you've been
adventurous, you will have found that retrieving the URL query in a
browser gives you a fancy HTML result with lots of Meerkat logos and
links to the rest of the O'Reilly Network site.
These are pretty, and provide much good reading, but they are of
little use to someone wanting to grab an RSS feed or another type of
output.
Happily, Meerkat introduces the concept of
flavors. By setting the parameter
_fl to a certain string, you get results back in
various ways:
- _fl=meerkat
-
The default setting, providing the full bells and whistles of the
Meerkat page.
- _fl=tofeerkat
-
A lighter version of the Meerkat page.
- _fl=minimal
-
A very light version of the Meerkat page.
- _fl=rss
-
Provides the results as a simple
RSS 0.91 feed.
- _fl=rss10
-
Provides the results as an RSS 1.0 feed.
- _fl=xml
-
Provides the results in a bespoke
XML format.
- _fl=js
-
Provides the results in a
JavaScript file, which, when
parsed, displays the results in an XHTML format.
- _fl=php
-
Provides the results in a PHP-serialized string.
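Once a query returns the rss flavor, any XML parser can pull the stories out of it. The sketch below uses Python's standard xml.etree.ElementTree module. SAMPLE_RSS is a hand-written stand-in for a response, not real Meerkat output, and list_stories is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hand-written placeholder shaped like an RSS 0.91 document;
# this is NOT captured Meerkat output.
SAMPLE_RSS = """<rss version="0.91">
  <channel>
    <title>Example results</title>
    <link>http://example.com/</link>
    <description>Placeholder channel</description>
    <item>
      <title>First story</title>
      <link>http://example.com/1</link>
    </item>
    <item>
      <title>Second story</title>
      <link>http://example.com/2</link>
    </item>
  </channel>
</rss>"""


def list_stories(rss_text):
    """Return (title, link) pairs for every item in an RSS 0.91 document."""
    root = ET.fromstring(rss_text)
    return [
        (item.findtext("title"), item.findtext("link"))
        for item in root.iter("item")
    ]
```

The same function should work on any RSS 0.91 text, since that format puts its elements in no namespace. An RSS 1.0 feed (the rss10 flavor) is namespaced RDF and would need namespace-aware lookups instead.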
So, now not only can we query the Meerkat database of feeds, but we
can also get feeds back out again. It gets better. Meerkat offers finer
control over exactly what it produces with some Boolean switches
(0 = off, 1 = on) that turn
various output features on or off:
- _de= (Descriptions)
-
Turns on or off story descriptions or blurbs. You lose some of the
story detail but gain a compact display for easy scanning.
Example: http://meerkat.oreillynet.com/?_de=0 means
"without descriptions."
- _ca= (Categories)
-
Meerkat places each feed into a category hierarchy of its own, and
certain flavors display this. If you don't want to
use these for anything, you can turn them off.
Example: http://meerkat.oreillynet.com/?_ca=0 means
"no categorization."
- _ch= (Channels)
-
Turns the channel display on or off for the flavors that care about
it.
Example: http://meerkat.oreillynet.com/?_ch=0 means
"turn off the display of channels."
- _da= (Dates)
-
Turns on or off the display of the date Meerkat first saw the story.
Example: http://meerkat.oreillynet.com/?_da=0 means
"dates? I don't need no
dates."
- _dc= (Dublin Core Metadata)
-
The RSS 1.0 flavor contains mod_dc information.
You can remove this information with this parameter.
Example: http://meerkat.oreillynet.com/?_dc=0 means
"plain and simple, DC-free is for
me."
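Because each switch is just a 0 or a 1, it maps naturally onto a Boolean in code. Here is a small Python sketch of that mapping. The readable names on the left of SWITCHES and the helper switch_params are invented for this example; only the underscore-prefixed parameter names come from the list above.

```python
from urllib.parse import urlencode

# Readable names (hypothetical) mapped to Meerkat's switch parameters.
SWITCHES = {
    "descriptions": "_de",
    "categories": "_ca",
    "channels": "_ch",
    "dates": "_da",
    "dublin_core": "_dc",
}


def switch_params(**settings):
    """Translate True/False settings into Meerkat's 1/0 switch values."""
    params = {}
    for name, enabled in settings.items():
        if name not in SWITCHES:
            raise ValueError(f"unknown switch: {name}")
        params[SWITCHES[name]] = 1 if enabled else 0
    return params
```

For instance, urlencode(switch_params(dublin_core=False, categories=False)) gives the string _dc=0&_ca=0, ready to be appended to a query.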
So, you're asking, how do we stick all these together to
make something cool? Well, we separate the parameters with an
& character.
For example, a query that produces an RSS 1.0 feed of the keyword
search for "Ben" for stories up to
a week old looks like this:
- http://meerkat.oreillynet.com/?s=Ben&_fl=rss10&t=7DAY
Whereas a query for anything on Java in the past hour, in RSS 1.0 but
without Dublin Core Metadata or Categories, looks like
this:
- http://meerkat.oreillynet.com/?s=Java&_fl=rss10&_dc=0&_ca=0
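Since the whole query lives in the URL, a URL someone sends you can be taken apart just as easily as it was put together. The following Python sketch uses only the standard library; explain_query is a hypothetical helper name, not part of any Meerkat toolkit.

```python
from urllib.parse import parse_qsl, urlsplit


def explain_query(url):
    """Split a Meerkat query URL into a {parameter: value} dictionary.

    Note that parse_qsl decodes '+' as a space, so s=eggs+ham comes
    back as 'eggs ham'.
    """
    return dict(parse_qsl(urlsplit(url).query, keep_blank_values=True))


url = "http://meerkat.oreillynet.com/?s=Java&_fl=rss10&_dc=0&_ca=0"
params = explain_query(url)
# params == {'s': 'Java', '_fl': 'rss10', '_dc': '0', '_ca': '0'}
```

Reading the dictionary back gives the same description as the prose: a search for Java, in the rss10 flavor, with Dublin Core metadata and categories switched off.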