[sugar] [Wikireader] english wikireaders and 0.7

Samuel Klein sj
Sun Sep 7 19:11:19 EDT 2008


Where is the code for this?  Lede-detection code is a priority for me,
and I'd like to work on it.  It should be easy to detect the start of
the first H2 section and drop the rest of the article.
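A minimal sketch of that idea, assuming we work on raw wikitext where an H2 is a line like `== History ==`. The function name `extract_lede` and the regex are hypothetical, not taken from the existing snapshot scripts, and the regex ignores edge cases such as headings inside comments or templates:

```python
import re

# Matches a level-2 wikitext heading such as "== History ==" on its own line,
# but not level-3+ headings like "=== Early life ===".
H2_RE = re.compile(r"^==(?!=).*?(?<!=)==[ \t]*$", re.MULTILINE)


def extract_lede(wikitext: str) -> str:
    """Return the text before the first level-2 heading (the lede)."""
    match = H2_RE.search(wikitext)
    if match is None:
        # No H2 sections at all: treat the whole article as the lede.
        return wikitext.strip()
    # Keep everything before the heading line and drop the rest.
    return wikitext[: match.start()].rstrip()
```

Cutting at the first H2 rather than at any heading keeps articles whose intro happens to contain a stray H3.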

Is there some way to estimate how much adding one template would grow
the whole snapshot (given how often it is referenced)?  If we could rank
templates by their footprint, it would be easier to "fill up" a space
allocation for them, as we do for images.
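One way this could work, as a rough sketch: assume each transclusion is stored pre-expanded, so a template's footprint is roughly its expanded size times the number of articles referencing it. Then sort by footprint and greedily fill a byte budget, smallest first, so as many templates as possible fit. The `Template` fields and `fill_allocation` are hypothetical names, and smallest-first is only one possible policy (ranking by reference count is another):

```python
from dataclasses import dataclass


@dataclass
class Template:
    name: str
    expanded_bytes: int  # hypothetical: size of one expansion of the template
    references: int      # number of articles in the snapshot that transclude it

    @property
    def footprint(self) -> int:
        # Assumes every transclusion is stored pre-expanded, so the total
        # cost grows linearly with how often the template is referenced.
        return self.expanded_bytes * self.references


def fill_allocation(templates, budget_bytes):
    """Greedily pick templates, smallest footprint first, within the budget."""
    chosen, used = [], 0
    for t in sorted(templates, key=lambda t: t.footprint):
        if used + t.footprint > budget_bytes:
            # Sorted ascending, so nothing after this one can fit either.
            break
        chosen.append(t.name)
        used += t.footprint
    return chosen, used
```

For example, with a 110,000-byte allocation and templates of footprint 1,000, 100,000 and 120,000 bytes, the first two are kept and the third is dropped.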

SJ

On Sun, Sep 7, 2008 at 7:02 PM, Chris Ball <cjb at laptop.org> wrote:
> Hi SJ,
>
>   > To Andrew -- thank you.  The 2% vandalism stat is very valuable!
>   > CJB, would it be possible to grab revision ids from this page,
>   > wherever there is a simple newline/title/oldid= ?
>
> Possible, yeah, but I'm not sure it'll be the best use of the time I
> have remaining to work on this once the work-week starts up again and I
> get back to blockers for the release.  We'd have to switch over from the
> "current versions" archive to the "all versions" archive, and then write
> scripts to create a new archive with the versions we want.
>
>   > Other replies inline: I am working on an article list here:
>
>   >   http://en.wikipedia.org/wiki/User:Sj/en-g1g1#D
>
>   > Agreed.  It seems that removing extraneous references to Harry
>   > Potter frees up another thousand articles or so...
>
> Can't tell whether this was humor.  ;-)
>
>   > en:wp articles tend to grow without shrinking.  Like you, I'm
>   > worried about not having enough articles to make a valuable
>   > reference work, especially in the sense of having a solid network
>   > of internal links.  I also see in this snapshot a lot of articles
>   > that are interesting but don't need to be nearly so detailed for
>   > our audience (and may simply bore).
>
>   > Can we try 6000 articles + 21000 ledes, to include every article in
>   > Martin's list?
>
> In principle, yeah, but like the revisions work it requires new work
> for detecting leads and putting them into their own articles.  My
> gut feeling is that this work just isn't important enough for this
> particular snapshot where our users have access to the net if they
> need it.  (Given time constraints.)
>
>   > I'm also happy with making this larger than 100MB for g1g1, perhaps
>   > even 150MB.  In the future our goal can be to expand coverage while
>   > reducing size... with less time pressure.
>
> Absolutely.
>
>   > We definitely need a template blacklist again.  How about the top
>   > 5000, excluding certain template categories?
>
> Another 5000 (small) articles is going to have a big impact on disk
> space, I think.  We'll see how it looks.
>
> Oh, Mad reminded me that you wanted to see a list of the 2k articles
> that are in the 10k slice and not the 8k slice.  Here it is:
>
>   http://dev.laptop.org/~cjb/enwiki/8k-10k-diff
>
> - Chris.
> --
> Chris Ball   <cjb at laptop.org>
>
