[Its.an.education.project] Fwd: Its.an.education.project curiosity
Gary C Martin
gary at garycmartin.com
Sun May 18 19:53:03 CEST 2008
Hi List,
I sent this to Marco, off-list, as a curiosity (I'm not currently
subscribed to its.an.education.project so please cc me if needed).
It's basically a self-organising map (SOM) visualisation of the new
list activity up to a day or two ago (more description of the
visualisation technique in my email below). Here's a link to the image:
http://garycmartin.com/som/2008-May-its-an-education-project-list-som.jpg
I was trying to get an idea of the topics being covered here without
burning my time by reading all of the archive, but if folks find it of
some use I could automate the process. Perhaps monthly/weekly list
views, wiki content, etc?
--Gary
Begin forwarded message:
> This is very nice! I'd actually post it on the list, you don't need to
> be subscribed to do so...
>
> Cheers,
> Marco
>
> On Sat, May 17, 2008 at 5:42 AM, Gary C Martin
> <gary at garycmartin.com> wrote:
>> Hi Marco,
>>
>> Just a random aside, didn't want to post formally as this is just a
>> rough, but thought you might find it vaguely interesting/curious.
>>
>> I recently noticed the its.an.education.project list, but I'm
>> reluctant to join yet another list that is potentially another set of
>> distracting rants or a talking shop – be they good or bad, my
>> opinion-reading bandwidth is pretty saturated now with the existing
>> lists... Anyway, I've been working on a Self Organising Map (SOM) that
>> uses a geographic-like landscape metaphor for visualisation***, and
>> recently hooked up a text front end for extracting word distance
>> metrics from bodies of text – I've been testing it on works of
>> literature from Project Gutenberg up to now, but have wanted to try it
>> on bulk mail feeds for a while.
>>
>> *** Code is all Python/PIL but too CPU intensive for an XO, though it
>> might make nice visual index/content pages for wiki content and such,
>> with URL links...
>>
>> The SOM acts as a kind of spatial summariser visualisation of the
>> content, where height indicates strong connections between terms,
>> proximity represents term association, and label size is a rough guide
>> to basic frequency of the term. Now there are many "correct" maps for
>> the same set of data: each generation will usually settle into a
>> slightly different set of local minima, but the associations are no
>> less valid for each.
>>
>> It's currently picking the top ~200 terms by frequency, after removing
>> linguistic junk words. Here's the map that it generated for the
>> Its.an.education.project May archives, as of yesterday. Note that the
>> map is continuous (it wraps around North/South and East/West, so it is
>> actually the surface of a torus).
>>
>> Probably just a curiosity, but might be more useful on your disk than
>> mine.
>>
>> --Gary
>
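
For anyone curious how the steps described in the forwarded message could
fit together, here is a minimal Python sketch. It is not the code behind
the image (that code is not included in this thread), and several parts are
assumptions made purely for illustration: the tiny STOP_WORDS set stands in
for the "linguistic junk words" list, the function names top_terms,
torus_distance and train_som are hypothetical, and the input vectors are
taken as given because the message does not say how the word distance
metrics are computed. It shows frequency-based term selection, a grid
distance that wraps North/South and East/West, and a basic SOM update that
uses that distance.

```python
import math
import random
import re
from collections import Counter

# Assumption: a stand-in stop-word list; the real "junk word" list is unknown.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "that"}


def top_terms(text, n=200, stop_words=STOP_WORDS):
    """Return the n most frequent terms as (term, count) pairs."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in stop_words)
    return counts.most_common(n)


def torus_distance(a, b, width, height):
    """Distance between grid cells a and b when both axes wrap around."""
    dx = abs(a[0] - b[0])
    dy = abs(a[1] - b[1])
    # On a torus the shorter way round may cross the map edge.
    dx = min(dx, width - dx)
    dy = min(dy, height - dy)
    return math.hypot(dx, dy)


def train_som(vectors, width, height, epochs=20, seed=0):
    """Train a small SOM on a wrap-around grid; returns {cell: weights}."""
    rng = random.Random(seed)  # a different seed gives a different map
    dim = len(vectors[0])
    grid = {
        (x, y): [rng.random() for _ in range(dim)]
        for x in range(width)
        for y in range(height)
    }
    for epoch in range(epochs):
        # Learning rate and neighbourhood radius both shrink over time.
        frac = 1.0 - epoch / epochs
        rate = 0.5 * frac
        radius = max(1.0, (max(width, height) / 2) * frac)
        for vec in vectors:
            # Best matching unit: the cell whose weights are closest to vec.
            bmu = min(
                grid,
                key=lambda c: sum((g - v) ** 2 for g, v in zip(grid[c], vec)),
            )
            for cell, weights in grid.items():
                d = torus_distance(cell, bmu, width, height)
                if d <= radius:
                    influence = math.exp(-(d * d) / (2 * radius * radius))
                    for i in range(dim):
                        weights[i] += rate * influence * (vec[i] - weights[i])
    return grid
```

With something like this, top_terms(archive_text) would give the candidate
labels and their frequencies, and train_som(vectors, 40, 30) would place one
vector per term on a 40 by 30 wrap-around grid. Changing the seed changes
the random starting weights, which is one simple reason two runs over the
same data can settle into different but equally valid layouts.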