[IAEP] Get Internet Archive Books Activity available soon

Sayamindu Dasgupta sayamindu at gmail.com
Tue Jun 30 08:18:00 EDT 2009


On Mon, Jun 29, 2009 at 4:51 AM, Jim Simmons<nicestep at gmail.com> wrote:
> Sayamindu,
>
> On the OPDS issue with only linking to PDF the Internet Archive uses
> some pretty rigid file naming conventions, so if you want a DJVU and
> the URL for PDF is given it could be as simple as changing the
> filename suffix from .pdf to .djvu.
>

Yes, indeed - it looks that way :-)
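
A rough sketch of that suffix swap, in Python. It assumes the DJVU file sits next to the PDF under the same base name, which is only what the naming convention suggests and is not something I have verified for every item. The function name and the example URL are made up for illustration.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def swap_suffix(url, new_suffix=".djvu", old_suffix=".pdf"):
    """Return url with old_suffix at the end of its path replaced by new_suffix."""
    parts = urlsplit(url)
    root, ext = posixpath.splitext(parts.path)
    if ext.lower() != old_suffix:
        raise ValueError("URL path does not end in %s: %s" % (old_suffix, url))
    # Only the path changes; scheme, host, query and fragment are kept.
    return urlunsplit(parts._replace(path=root + new_suffix))


# Hypothetical URL, for illustration only.
print(swap_suffix("http://example.org/download/someitem/someitem.pdf"))
```

The result is only a guess at where the DJVU lives, so a caller would still need to cope with a 404 when it is not there.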

> I especially hope that you will have time to give my Activity a try.
> I'm also interested to know what you think would be possible with OPDS
> that I'm not already doing.
>

I was dogfooding Sugar yesterday, and did try out your activity. I
think it can be used nicely as a basis for what we are trying to build
eventually. Some of the problems I have been thinking about include:

* Searching by various metadata elements: e.g. Search by Genre, Search
by Author, etc.
* Combining feeds from various sources: e.g. merging the data from
Project Gutenberg and IA and then providing a single data set to the
user
* Caching and updating (crucial for limited bandwidth situations)
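
To make the first two points concrete, here is a minimal Python sketch of merging entries from two sources and then searching the result by author or genre. The entry layout (dicts with "title", "author", "genres" and "links"), the function names, and the sample records are all assumptions for illustration; neither PG nor IA hands out data in this shape.

```python
def normalise(text):
    """Lower-case and collapse whitespace so near-identical strings compare equal."""
    return " ".join(text.lower().split())


def merge_catalogues(*sources):
    """Merge lists of entry dicts, treating the same (title, author) as one book."""
    merged = {}
    for source in sources:
        for entry in source:
            key = (normalise(entry["title"]), normalise(entry["author"]))
            book = merged.setdefault(key, {
                "title": entry["title"],
                "author": entry["author"],
                "genres": set(),
                "links": [],
            })
            book["genres"].update(entry.get("genres", ()))
            book["links"].extend(entry.get("links", ()))
    return list(merged.values())


def search(books, author=None, genre=None):
    """Return the books matching every criterion that was given."""
    results = []
    for book in books:
        if author and normalise(author) not in normalise(book["author"]):
            continue
        if genre and genre.lower() not in {g.lower() for g in book["genres"]}:
            continue
        results.append(book)
    return results


# Invented sample records, not real catalogue data.
pg = [{"title": "Around the World in Eighty Days", "author": "Jules Verne",
       "genres": ["Adventure"], "links": ["http://example.org/pg/book.epub"]}]
ia = [{"title": "Around the world in eighty days", "author": "Jules  Verne",
       "genres": ["Fiction"], "links": ["http://example.org/ia/book.djvu"]}]

for book in search(merge_catalogues(pg, ia), author="verne"):
    print(book["title"], sorted(book["genres"]), len(book["links"]))
```

Matching on a normalised (title, author) pair is the crudest possible way to spot duplicates; real catalogues would need something smarter for editions and translations.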


> Project Gutenberg has a huge XML file in "Dublin Core" format that
> tells you everything about their books except the URL to download them
> from, which makes their far simpler offline catalog a better deal for
> what I'm trying to do.  I'm a lot better pleased with the IA Advanced
> Search.  It seems to give you everything they have, though at times I
> wish that was more.  For instance, they have a field "publication
> date".  But it isn't the *book's* publication date, it's the date the
> *ebook* became available.  And some of the books have decent
> descriptions but most just say who scanned and uploaded it and where
> they got the original book.
>

http://ia331315.us.archive.org/3/items/artofcaricaturin006061mbp/OMETA.xml
seems to provide a digitalpublicationdate element and a date element.
Dublin Core (as used in OPF metadata) normally handles this in the form:

<dc:date opf:event="original-publication">1869</dc:date>

<dc:date opf:event="ops-publication">Thu Jan 11 14:59:08 +0100 2007</dc:date>

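For what it is worth, pulling the two dates apart is easy once they are tagged this way. The Python sketch below wraps the two elements above in a metadata element of my own so that they parse on their own, and uses the usual Dublin Core and OPF namespace URIs; the function name is just for illustration.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
OPF = "http://www.idpf.org/2007/opf"

# The two dc:date elements from above, wrapped so they form a complete document.
SAMPLE = """<metadata xmlns:dc="http://purl.org/dc/elements/1.1/"
          xmlns:opf="http://www.idpf.org/2007/opf">
  <dc:date opf:event="original-publication">1869</dc:date>
  <dc:date opf:event="ops-publication">Thu Jan 11 14:59:08 +0100 2007</dc:date>
</metadata>"""


def dates_by_event(xml_text):
    """Map each opf:event value to the text of its dc:date element."""
    root = ET.fromstring(xml_text)
    dates = {}
    for element in root.iter("{%s}date" % DC):
        # A dc:date without an opf:event attribute ends up under the key None.
        event = element.get("{%s}event" % OPF)
        dates[event] = (element.text or "").strip()
    return dates


dates = dates_by_event(SAMPLE)
print(dates["original-publication"])
print(dates["ops-publication"])
```

A search front end could then show the original-publication value as the book's date and keep the other one for "recently added" listings.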


> PG's contents are also available through the Internet Archive, so it
> might be possible to use my new Activity to download PG books in epub
> format, when you have that working.
>
> I read a book last week about the MIT Media Lab written by Stewart
> Brand back in the 1980s and back then the buzzword was convergence.
> That's how I feel now: lots of stuff *that close* to converging.  And
> when it does, look out.  We'll bury those kids in books.
>

Yes - and that is what I'm slightly worried about :-) Having a way to
easily search through and categorise these books is important, and as
part of Sugar/OLPC we have the additional challenge of doing this with
low-bandwidth, zero-bandwidth, or sneaker-net type situations in
mind.
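
One simple way to approach the low-bandwidth case is to keep a local copy of every catalogue response and only go back to the network when that copy is old, falling back to the stale copy if the network is not there at all. The Python sketch below shows the idea. The class name, the one-week default and the JSON-on-disk layout are my own assumptions, and the fetch function is passed in by the caller so that nothing here is tied to a particular site.

```python
import hashlib
import json
import os
import time


class CatalogueCache(object):
    """Keep fetched catalogue data on disk and reuse it while it is fresh."""

    def __init__(self, directory, max_age=7 * 24 * 3600):
        self.directory = directory
        self.max_age = max_age  # seconds before a cached copy counts as stale
        os.makedirs(directory, exist_ok=True)

    def _path(self, key):
        name = hashlib.sha1(key.encode("utf-8")).hexdigest() + ".json"
        return os.path.join(self.directory, name)

    def get(self, key, fetch):
        """Return data for key, calling fetch(key) only when the cache is stale."""
        path = self._path(key)
        cached = None
        if os.path.exists(path):
            with open(path) as handle:
                cached = json.load(handle)
            if time.time() - cached["saved"] < self.max_age:
                return cached["data"]
        try:
            data = fetch(key)
        except OSError:
            # No network: a stale copy is better than nothing.
            if cached is not None:
                return cached["data"]
            raise
        with open(path, "w") as handle:
            json.dump({"saved": time.time(), "data": data}, handle)
        return data
```

For sneaker-net use the cache directory itself could be copied from one machine to another, since each entry is just a small JSON file.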


Thanks,
Sayamindu

> James Simmons
>
>
> On Sun, Jun 28, 2009 at 4:08 PM, Sayamindu Dasgupta<sayamindu at gmail.com> wrote:
>> On Mon, Jun 29, 2009 at 2:31 AM, Jim Simmons<nicestep at gmail.com> wrote:
>>> I uploaded the first version of Get Internet Archive Books to ASLO
>>> about an hour ago, so perhaps by the time you read this it will be
>>> available to try out there.  I'm sending this email to IAEP because
>>> I'd like some feedback from those who might use the Activity.  What
>>> this Activity does is to provide a front end to the Advanced Search
>>> function of the Internet Archive website.  In essence it gives you a
>>> nice GUI to search through the archive, get information about books,
>>> then download the books you choose to the Journal.  It's very similar
>>> to the offline catalog feature of Read Etexts, but better, because it
>>> has much more information on the books.  The screenshots at ASLO tell
>>> the story so I won't give more details here.  Suffice it to say if you
>>> are looking for books with pictures, or books in languages other than
>>> English, then this Activity will be of interest.  If you've ever
>>> dreamed of reading the works of Jules Verne in Yiddish then this
>>> Activity will make those dreams come true.
>>>
>>> Currently the Activity can only download the DJVU format.  This format
>>> is an alternative to PDF for documents consisting of scanned-in book
>>> pages.  It gives better results than PDF in less than half the disk
>>> space.  You can use Read to view these files.  Unfortunately, Read's
>>> support for DJVU is flaky, at least in .82 on the XO.  I'm pretty sure
>>> I'm downloading the books correctly, but it's possible I'm to blame
>>> for this.  I'll need to do some more testing to know for sure.  Future
>>> versions will support downloading PDFs and other formats offered by
>>> this website.
>>>
>>
>> http://dev.laptop.org/~sayamindu/Read-56.xo will give much better
>> performance in 8.2.x OLPC OS releases.
>>
>> On a related note - you will probably be interested to know that the
>> Internet Archive has started work on experimental OPDS support:
>> http://bookserver.archive.org/ (unfortunately they only link to the
>> PDF variants from that catalogue)
>>
>> Cheers,
>> Sayamindu
>>
>>
>> --
>> Sayamindu Dasgupta
>> [http://sayamindu.randomink.org/ramblings]
>>
>



-- 
Sayamindu Dasgupta
[http://sayamindu.randomink.org/ramblings]

