[IAEP] Feedback request for new SOM variant
Gary C Martin
gary at garycmartin.com
Sun Feb 1 20:44:14 EST 2009
On 1 Feb 2009, at 21:00, Frederick Grose wrote:
> Gary, I enjoy this feature. Do you have a narrative of the map
> assembly to reference?
I think I've written a fair narrative at the top of http://sugarlabs.org/go/Sugar_Labs/SOM
but I can expand if that's not enough detail describing the process
(ping me if I've fluffed over some detail you're interested in). I'll
read through it again just to make sure it's up to date, given that I'm
moving ahead with these changes.
> By density, do you mean the strength of association of the word
> among all others in the set?
By density I'm trying to describe mean term use proximity, after
correcting for term frequency (e.g. 100 'sugar' next to 10 'activity'
is a link equal in weight to 50 'developer' next to 5 'activity').
This is how the current (and proposed) data set is generated (as per
the last month's worth of maps) prior to clustering by the SOM. This
allows the SOM to better cluster (I think) the fine details of usage,
rather than having large areas drowned out by a few very high frequency
terms (e.g. sugar, think, use, work, activities...).
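
A minimal sketch of that frequency correction, assuming the example
means a term used 100 times, 10 of them next to 'activity'. The
function name and arguments are hypothetical, for illustration only,
and are not taken from the actual SOM code:

```python
def link_density(cooccurrences, term_frequency):
    """Link weight between two terms, corrected for term frequency.

    cooccurrences: how often the term appears next to the other term.
    term_frequency: how often the term appears overall.
    Both names are assumptions made for this sketch.
    """
    if term_frequency <= 0:
        raise ValueError("term_frequency must be positive")
    return cooccurrences / term_frequency


# The two links from the example above.
sugar_activity = link_density(10, 100)
developer_activity = link_density(5, 50)

# After the correction both links carry the same weight.
print(sugar_activity == developer_activity)
```

Dividing by the term's own frequency is what stops a very common term
from dominating the data handed to the SOM.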
The new map rendering pass takes that clustered data, unchanged, but
uses the density described above * term frequency, so that usage
frequency can pull the link weight topography up or down (i.e. in the
example above, 100 'sugar' vs 50 'developer' should show 'sugar' as a
stronger term than 'developer').
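
As a rough sketch of that rendering step, assuming height is simply
density multiplied by term frequency as stated above (the function
name and the density value of 0.1 are illustrative assumptions, not
values from the real maps):

```python
def topographic_height(density, term_frequency):
    """Height used for rendering: link density scaled by term frequency."""
    return density * term_frequency


# Assume both terms have the same frequency-corrected density.
density = 0.1

sugar_height = topographic_height(density, 100)      # 'sugar' used 100 times
developer_height = topographic_height(density, 50)   # 'developer' used 50 times

# With equal density, the more frequent term ends up higher on the map.
print(sugar_height > developer_height)
```

The cluster layout is untouched by this step; only the height assigned
to each term changes.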
> The topography of the two images certainly changes my sequence of
> examination for the same data. From the first map, if 'irc' &
> 'freenode' are most associated with the week's words
The first map highlights a tightly bunched (but relatively low
frequency) set of common terms in communications about the first
ActivityTeam IRC meet up on Friday. For comparison, the term 'sugar'
was used _way_ more in communications related to all manner of misc
terms, but not with enough consistency to pull 'sugar' up onto a
topographic peak for that first map. The second map adjusts the
topographic peaks for link weights * frequency, so infrequent but
strongly linked terms are dialled down (somewhat), and high frequency
but diversely linked terms are dialled up (somewhat).
Regards,
--Gary
> perhaps words from the post headers should be considered among the
> junk words.
>
> Thank you for your contributions!
> --Frederick
>
> On Sat, Jan 31, 2009 at 9:12 PM, Gary C Martin
> <gary at garycmartin.com> wrote:
> I've been putting in some more time on the SOM code the last couple
> of weeks (high resolution contour lines being one improvement). I
> wanted to bounce 2 (rather low-res) maps of this week's IAEP content
> off the list for feedback. The first (2009-January-24-30-3-mean)
> uses the same fully normalised procedure as per the last month of
> maps (i.e. height == term link density); the second
> (2009-January-24-30-3-corrected) has an identical cluster layout, but
> has its topographic height re-adjusted to bring back in the term
> frequency (i.e. height == term link density * term frequency). I
> think this 2nd map gives us the undistorted term layout as seen in
> maps of the past month, while allowing term frequency to have some
> influence again over the topography (maps last year all had term
> frequency).
>
> Best of both worlds?
>
> I'd like to roll with the new version
> (2009-January-24-30-3-corrected), any strong opinions/objections? I'm
> not sure who is looking at/using these (the main goal for them was as
> an overview and/or to allow reflection on the previous week's
> discussion).
>
> Regards,
> --Gary
>
> P.S. Two maps below are identical other than the topographic
> colorisation and a different auto-selected centre point.
>
> P.P.S Past SOMs and information at http://sugarlabs.org/go/Sugar_Labs/SOM
>
> _______________________________________________
> IAEP -- It's An Education Project (not a laptop project!)
> IAEP at lists.sugarlabs.org
> http://lists.sugarlabs.org/listinfo/iaep
>