[Systems] Excessive scraper queries

Bernie Innocenti bernie at sugarlabs.org
Wed Apr 2 14:15:46 EDT 2014


On 04/01/2014 06:36 PM, Ignacio Rodríguez wrote:
> Hi Bernie, 
> 
> Thanks, I forgot to stop the script. Oops!
> (I'll stop it)
> By the way, the script only checks the ASLO pages; it doesn't download anything.
> (Look: http://people.sugarlabs.org/ignacio/.aslo_scrap/)

Maybe it didn't save the files anywhere, but it did send GET requests
for _all_ the activities, as the logs below show.
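A minimal sketch of how this kind of traffic can be spotted in the access log: count requests per day whose User-Agent starts with a given prefix. It assumes the vhost_combined-style line layout shown in the quoted excerpt; the names `LOG_RE` and `requests_per_day` are hypothetical, and this is not necessarily how the logs were actually examined on sunjammer.

```python
import re
from collections import Counter

# Assumed layout (as in the quoted excerpt):
# vhost:port client ident user [day:time zone] "METHOD path proto" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<vhost>\S+) (?P<client>\S+) \S+ \S+ '
    r'\[(?P<day>[^:]+):[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

def requests_per_day(lines, agent_prefix="Python-urllib"):
    """Count requests per day whose User-Agent starts with agent_prefix."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if m and m.group("agent").startswith(agent_prefix):
            counts[m.group("day")] += 1
    return counts

# The two log lines quoted in this thread, used as sample input.
sample = [
    'activities.sugarlabs.org:80 2001:4830:134:7::11 - - [23/Mar/2014:01:59:36 -0400] '
    '"GET /es-ES/sugar/addon/4461 HTTP/1.1" 200 37522 "-" "Python-urllib/2.6"',
    'activities.sugarlabs.org:80 2001:4830:134:7::11 - - [23/Mar/2014:01:59:38 -0400] '
    '"GET /es-ES/sugar/addon/4467 HTTP/1.1" 200 855 "-" "Python-urllib/2.6"',
]
print(requests_per_day(sample))  # Counter({'23/Mar/2014': 2})
```

Against a real log, the same function can be fed an open file object line by line, so the whole file never has to fit in memory.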


> 2014-04-01 17:01 GMT-03:00 Bernie Innocenti <bernie at sugarlabs.org>:
> 
>     Hello Ignacio,
> 
>     while I was investigating a log bloat issue on sunjammer, I noticed
>     that Apache is being hit by about 1 million requests per day from a
>     Python script running locally:
> 
>     activities.sugarlabs.org:80 2001:4830:134:7::11 - - [23/Mar/2014:01:59:36 -0400] "GET /es-ES/sugar/addon/4461 HTTP/1.1" 200 37522 "-" "Python-urllib/2.6"
>     activities.sugarlabs.org:80 2001:4830:134:7::11 - - [23/Mar/2014:01:59:38 -0400] "GET /es-ES/sugar/addon/4467 HTTP/1.1" 200 855 "-" "Python-urllib/2.6"
>     [...]
> 
>     This is not necessarily forbidden, but I'd like to understand what this
>     script does and whether it *really* needs to run so aggressively.
> 
>     Note that the files are available on the local filesystem, so there's no
>     need to read them periodically over HTTP.
> 
>     --
>     Bernie Innocenti
>     Sugar Labs Infrastructure Team
>     http://wiki.sugarlabs.org/go/Infrastructure_Team
> 
> 
> 
> 
> -- 
> Ignacio Rodríguez

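Reading the files from the local filesystem avoids HTTP entirely. If a check really must go over HTTP, a minimal sketch of a throttled version follows. The URL pattern is taken from the log excerpt; the helper names `fetch_status` and `check_addons`, the 5-second `delay_seconds` default, and the idea of passing in an explicit list of addon IDs are all assumptions for illustration, not how the original script works.

```python
import time
import urllib.request

# URL pattern copied from the quoted log lines; the es-ES locale is just the one seen there.
ADDON_URL = "http://activities.sugarlabs.org/es-ES/sugar/addon/{}"

def fetch_status(addon_id):
    """Return the HTTP status code for one addon page (raises HTTPError on 4xx/5xx)."""
    with urllib.request.urlopen(ADDON_URL.format(addon_id), timeout=30) as resp:
        return resp.status

def check_addons(addon_ids, fetch=fetch_status, delay_seconds=5.0, sleep=time.sleep):
    """Check each addon once, waiting at least delay_seconds between consecutive requests."""
    results = {}
    for i, addon_id in enumerate(addon_ids):
        if i:
            sleep(delay_seconds)  # throttle before every request except the first
        results[addon_id] = fetch(addon_id)
    return results
```

With a 5-second pause, a script running nonstop is capped at 86400 / 5 = 17280 requests per day, far below the roughly 1 million per day seen in the logs. The `fetch` and `sleep` parameters are injectable so the loop can be exercised without touching the network.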

-- 
Bernie Innocenti
Sugar Labs Infrastructure Team
http://wiki.sugarlabs.org/go/Infrastructure_Team

