[Its.an.education.project] Fwd: Its.an.education.project curiosity

Sun May 18 19:53:03 CEST 2008

Hi List,

I sent this to Marco, off-list, as a curiosity (I'm not currently  
subscribed to its.an.education.project so please cc me if needed).  
It's basically a self-organising map (SOM) visualisation of the new
list's activity up to a day or two ago (the visualisation technique is
described in more detail in my email below). Here's a link to the image:

	http://garycmartin.com/som/2008-May-its-an-education-project-list-som.jpg

I was trying to get an idea for the topics being covered here without  
burning my time by reading all of the archive, but if folks find it of  
some use I could automate the process. Perhaps monthly/weekly list  
views, wiki content, etc?

--Gary

Begin forwarded message:

> This is very nice! I'd actually post it on the list, you don't need to
> be subscribed to do so...
>
> Cheers,
> Marco
>
> On Sat, May 17, 2008 at 5:42 AM, Gary C Martin  
> <gary at garycmartin.com> wrote:
>> Hi Marco,
>>
>> Just a random aside, didn't want to post formally as this is just a rough,
>> but thought you might find it vaguely interesting/curious.
>>
>> I recently noticed the its.an.education.project list, but I'm reluctant to
>> join yet another list that is potentially another set of distracting rants
>> or a talking shop – be they good or bad, my opinion-reading bandwidth is
>> pretty saturated now with the existing lists... Anyway, I've been working
>> on a Self Organising Map (SOM) that uses a geographic-like landscape
>> metaphor for visualisation***, and recently hooked up a text front end for
>> extracting word distance metrics from bodies of text – I've been testing it
>> on works of literature from Project Gutenberg up to now, but have wanted to
>> try it on bulk mail feeds for a while.
>>
>> *** Code is all Python/PIL but too CPU intensive for an XO, though it might
>> make nice visual index/content pages for wiki content and such, with URL
>> links...
>>
>> The SOM acts as a kind of spatial summariser visualisation of the content,
>> where height indicates strong connections between terms, proximity
>> represents term association, and label size is a rough guide to basic
>> frequency of the term. Note that there are many "correct" maps for the same
>> set of data: each generation will usually settle into a slightly different
>> set of local minima, but the associations are no less valid for each.
>>
>> It's currently picking the top ~200 terms by frequency, after removing
>> linguistic junk words. Here's the map that it generated for the
>> Its.an.education.project May archives, as of yesterday. Note that the map
>> is continuous (it wraps around North/South and East/West, so it is actually
>> the surface of a torus).
>>
>> Probably just a curiosity, but might be more useful on your disk than mine.
>>
>> --Gary
>
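
Two of the steps described in the forwarded message, picking the most
frequent terms after dropping junk words and treating the map as a torus,
can be sketched roughly as below. This is only an illustration under stated
assumptions, not the code that produced the image: the function names, the
tiny stop-word list, the tokenising regex and the grid coordinates are all
hypothetical, and the word distance metric and SOM training themselves are
not shown.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; the list actually used
# for the map is not given in the message.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "on", "for"}


def top_terms(text, n=200, stop_words=STOP_WORDS):
    """Return the n most frequent (term, count) pairs, ignoring stop words."""
    # Assumed tokenisation: lowercase runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in stop_words)
    return counts.most_common(n)


def torus_distance(a, b, width, height):
    """Euclidean distance between grid cells a and b on a wrap-around map.

    a and b are (x, y) cell coordinates. On each axis the shorter of the
    direct route and the route across the edge is used, which is what makes
    the map continuous East/West and North/South.
    """
    dx = abs(a[0] - b[0])
    dy = abs(a[1] - b[1])
    dx = min(dx, width - dx)
    dy = min(dy, height - dy)
    return (dx * dx + dy * dy) ** 0.5
```

With a wrap-around distance like this, cells on opposite edges of the image
count as neighbours, so a cluster of terms can legitimately straddle the
border of the picture.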