[IAEP] Feedback request for new SOM variant

Gary C Martin gary at garycmartin.com
Sun Feb 1 20:44:14 EST 2009


On 1 Feb 2009, at 21:00, Frederick Grose wrote:

> Gary, I enjoy this feature.  Do you have a narrative of the map  
> assembly to reference?

I think I've written a fair narrative at the top of http://sugarlabs.org/go/Sugar_Labs/SOM
but I can expand if that's not enough detail describing the process
(ping if I've glossed over some detail you're interested in). I will
read through it again to make sure it's up to date if I move ahead
with these changes.

> By density, do you mean the strength of association of the word  
> among all others in the set?

By density I'm trying to describe the mean proximity of term use, after
correcting for term frequency (e.g. 100 'sugar' next to 10 'activity'
is a link equal in weight to 50 'developer' next to 5 'activity').
This is how the current (and proposed) data set is generated (as per
the last month's worth of maps) prior to clustering by the SOM. I think
this allows the SOM to better cluster the fine details of usage, rather
than having large areas drowned out by some very high frequency term
(e.g. sugar, think, use, work, activities...).
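
A minimal sketch of that frequency correction, assuming the link weight
is simply the co-occurrence count divided by the term's own frequency.
The function name and that exact rule are illustrative assumptions, not
taken from the SOM code:

```python
def link_density(cooccurrences, term_frequency):
    """Frequency-corrected link weight between a term and a neighbour.

    Assumption for illustration: density = co-occurrence count divided
    by how often the term itself was used.
    """
    if term_frequency <= 0:
        raise ValueError("term_frequency must be positive")
    return cooccurrences / term_frequency


# 'sugar' used 100 times, 10 of them next to 'activity'
sugar_activity = link_density(10, 100)
# 'developer' used 50 times, 5 of them next to 'activity'
developer_activity = link_density(5, 50)

# Both links end up with the same weight, so neither term drowns out the other.
print(sugar_activity == developer_activity)
```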

The new map rendering pass takes that clustered data unchanged, but
uses the density described above * term frequency, so that usage
frequency can pull the link weight topography up or down (i.e. in the
example above, 100 'sugar' vs 50 'developer' should indicate 'sugar' as
a stronger term than 'developer').
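
As a rough illustration of the rendering step, assuming the height is the
plain product of the two values (the function name and numbers are
hypothetical, picked to match the 'sugar' vs 'developer' example):

```python
def topographic_height(density, term_frequency):
    """Height used for rendering: link density scaled by term frequency.

    Assumption for illustration: a plain product, with no further
    scaling or smoothing applied.
    """
    return density * term_frequency


# Both terms have the same frequency-corrected density in the example.
density = 0.1

sugar_height = topographic_height(density, 100)     # 'sugar' used 100 times
developer_height = topographic_height(density, 50)  # 'developer' used 50 times

# The more frequent term now sits higher on the map.
print(sugar_height > developer_height)
```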

> The topography of the two images certainly changes my sequence of  
> examination for the same data.  From the first map, if 'irc' &  
> 'freenode' are most associated with the week's words

The first map highlights a tightly bunched (but relatively low  
frequency) set of common terms in communications about the first  
ActivityTeam IRC meet up on Friday. For comparison, the term 'sugar'
was used _way_ more, in communications related to all manner of misc
terms, but not with enough consistency to pull 'sugar' up onto a
topographic peak for that first map. The second map adjusts the
topographic peaks for link weights * frequency, so infrequent but
strongly linked terms are dialled down (somewhat), and high frequency
but diversely linked terms are dialled up (somewhat).

Regards,
--Gary

> perhaps words from the post headers should be considered among the  
> junk words.
>
> Thank you for your contributions!
>        --Frederick
>
> On Sat, Jan 31, 2009 at 9:12 PM, Gary C Martin  
> <gary at garycmartin.com> wrote:
> I've been putting in some more time on the SOM code the last couple  
> of weeks (high resolution contour lines being one improvement). I  
> wanted to bounce 2 (rather low-res) maps of this week's IAEP content
> off the list for feedback. The first (2009-January-24-30-3-mean)
> uses the same fully normalised procedure as per the last month of
> maps (i.e. height == term link density); the second
> (2009-January-24-30-3-corrected) has an identical cluster layout, but has
> its topographic height re-adjusted to bring back in the term
> frequency (i.e. height == term link density * term frequency). I
> think this 2nd map gives us the undistorted term layout as seen in  
> maps of the past month, while allowing term frequency to have some  
> influence again over the topography (maps last year all had term  
> frequency frequency).
>
> Best of both worlds?
>
> I'd like to roll with the new version
> (2009-January-24-30-3-corrected), any strong opinions/objections? I'm
> not sure who is looking at or using these (the main goal for them was as
> an overview and/or to allow reflection on the previous week's discussion).
>
> Regards,
> --Gary
>
> P.S. Two maps below are identical other than the topographic  
> colorisation and a different auto-selected centre point.
>
> P.P.S Past SOMs and information at http://sugarlabs.org/go/Sugar_Labs/SOM
>
> _______________________________________________
> IAEP -- It's An Education Project (not a laptop project!)
> IAEP at lists.sugarlabs.org
> http://lists.sugarlabs.org/listinfo/iaep
>


