[Systems] Excessive scraper queries
Bernie Innocenti
bernie at sugarlabs.org
Wed Apr 2 14:15:46 EDT 2014
On 04/01/2014 06:36 PM, Ignacio Rodríguez wrote:
> Hi Bernie,
>
> Thanks, I forgot to stop the script (oops).
> I'll stop it.
> By the way, this checks the ASLO page; it doesn't download anything.
> (Look: http://people.sugarlabs.org/ignacio/.aslo_scrap/)
Maybe it didn't save the files anywhere, but it did send GET requests
for _all_ the activities, as the logs below show.
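
A rough way to tally requests like these per client and user agent is
sketched here. It is a minimal sketch, not the tool actually used on
sunjammer: the field layout is assumed from the two quoted log entries
(an Apache "vhost_combined"-style format, with the wrapped lines
rejoined), and the names LOG_LINE, tally_clients and SAMPLE are
hypothetical.

```python
import re
from collections import Counter

# Assumed layout, inferred from the quoted entries:
#   vhost:port client ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<vhost>\S+) (?P<client>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def tally_clients(lines):
    """Count requests per (client address, user agent) pair."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match:  # silently skip lines that do not fit the assumed layout
            counts[(match["client"], match["agent"])] += 1
    return counts


# The two entries quoted in this thread, each rejoined onto a single line.
SAMPLE = [
    'activities.sugarlabs.org:80 2001:4830:134:7::11 - - '
    '[23/Mar/2014:01:59:36 -0400] "GET /es-ES/sugar/addon/4461 HTTP/1.1" '
    '200 37522 "-" "Python-urllib/2.6"',
    'activities.sugarlabs.org:80 2001:4830:134:7::11 - - '
    '[23/Mar/2014:01:59:38 -0400] "GET /es-ES/sugar/addon/4467 HTTP/1.1" '
    '200 855 "-" "Python-urllib/2.6"',
]

if __name__ == "__main__":
    # Print the busiest (client, user agent) pairs first.
    for (client, agent), hits in tally_clients(SAMPLE).most_common():
        print(hits, client, agent)
```

To run it against a real log, pass an open file object to
tally_clients() instead of SAMPLE; the log path is site-specific and is
not given in this thread.
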
> 2014-04-01 17:01 GMT-03:00 Bernie Innocenti <bernie at sugarlabs.org>:
>
> Hello Ignacio,
>
> while I was investigating a log bloat issue on sunjammer, I noticed
> that Apache is being hit by about 1 million requests per day from a
> Python script running locally:
>
> activities.sugarlabs.org:80 2001:4830:134:7::11 - - [23/Mar/2014:01:59:36 -0400] "GET /es-ES/sugar/addon/4461 HTTP/1.1" 200 37522 "-" "Python-urllib/2.6"
> activities.sugarlabs.org:80 2001:4830:134:7::11 - - [23/Mar/2014:01:59:38 -0400] "GET /es-ES/sugar/addon/4467 HTTP/1.1" 200 855 "-" "Python-urllib/2.6"
> [...]
>
> This is not necessarily forbidden, but I'd like to understand what this
> script does and whether it *really* needs to run so aggressively.
>
> Note that the files are available on the local filesystem, so there's
> no need to read them periodically over HTTP.
>
> --
> Bernie Innocenti
> Sugar Labs Infrastructure Team
> http://wiki.sugarlabs.org/go/Infrastructure_Team
>
>
>
>
> --
> Ignacio Rodríguez
--
Bernie Innocenti
Sugar Labs Infrastructure Team
http://wiki.sugarlabs.org/go/Infrastructure_Team