[Bugs] #1348 UNSP: infoSlicer not able to download new articles

Sugar Labs Bugs bugtracker-noreply at sugarlabs.org
Wed Oct 28 19:22:42 EDT 2009


#1348: infoSlicer not able to download new articles
------------------------------------------+---------------------------------
    Reporter:  walter                     |          Owner:  walter                     
        Type:  defect                     |         Status:  new                        
    Priority:  Unspecified by Maintainer  |      Milestone:  Unspecified by Release Team
   Component:  InfoSlicer                 |        Version:  Unspecified                
    Severity:  Blocker                    |       Keywords:                             
Distribution:  Unspecified                |   Status_field:  Unconfirmed                
------------------------------------------+---------------------------------
Changes (by jpichon):

 * cc: jpichon (added)


Comment:

 I'm attaching a patch that fixes the article retrieval issue. Afterwards,
 I noticed that most headings were missing from articles downloaded from
 the English Wikipedia, and that a few headings were missing from the other
 Wikipedias as well. The second patch fixes this by treating more tags as
 having relevant content.
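
 To illustrate the idea behind the second patch, here is a minimal sketch
 using only the Python standard library. It is not the InfoSlicer code or
 the attached patch: the class RelevantTextExtractor, the function
 extract() and the sample markup are all hypothetical, and the real parser
 in HTML_Parser.py may be organised quite differently. The sketch only
 shows how widening the set of "relevant" tags lets heading text through.

```python
from html.parser import HTMLParser


class RelevantTextExtractor(HTMLParser):
    """Hypothetical helper: keep text only from tags in `relevant_tags`."""

    def __init__(self, relevant_tags):
        super().__init__()
        self.relevant_tags = set(relevant_tags)
        self._open = []   # stack of currently open relevant tags
        self.chunks = []  # (tag, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        if tag in self.relevant_tags:
            self._open.append(tag)

    def handle_endtag(self, tag):
        # Only pop when the closing tag matches the innermost relevant tag.
        if self._open and self._open[-1] == tag:
            self._open.pop()

    def handle_data(self, data):
        text = data.strip()
        # Text is kept only while we are inside at least one relevant tag.
        if self._open and text:
            self.chunks.append((self._open[-1], text))


def extract(html, relevant_tags):
    """Return the (tag, text) pairs found inside the given tags."""
    parser = RelevantTextExtractor(relevant_tags)
    parser.feed(html)
    parser.close()
    return parser.chunks


# Made-up markup, loosely shaped like a wiki article with nested heading text.
SAMPLE = (
    "<h2><span>History</span></h2><p>First paragraph.</p>"
    "<h3>Early years</h3><p>Second.</p>"
)

narrow = extract(SAMPLE, ["p"])              # headings are dropped
wide = extract(SAMPLE, ["p", "h2", "h3"])    # headings are kept
```

 With only "p" treated as relevant, the two headings in the sample are
 lost; adding "h2" and "h3" to the set brings them back in document order.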

 There is still a cosmetic problem: a few blank lines appear at the top
 of newly downloaded articles. I haven't been able to fix that yet; I only
 know that it is related to the pre_parse function in HTML_Parser.py. The
 only workaround I have for now is to reinitialise self.input with
 BeautifulSoup after calling pre_parse(), and I'm not sure that would be
 appropriate for a patch. I still hope to figure out what the real
 problem is.

-- 
Ticket URL: <http://bugs.sugarlabs.org/ticket/1348#comment:1>
Sugar Labs <http://sugarlabs.org/>
Sugar Labs bug tracking system

