PDF to EPUB (or any other eBook format)

So I’ve come across a large collection of ebooks in PDF format (I won’t say from where), which is awesome, except for one thing. There is hardly an ebook reader in the world (besides Adobe Digital Editions, but meh…) that reads PDFs well. So I set out on a journey to find a decent way to convert PDF to the popular EPUB format. Why epub and not something else, like Amazon Kindle’s mobi format? Because I have an iPhone, the iPhone has Stanza (the best ebook reader on the platform), and Stanza uses epub.

Long story short, there’s a lot of software out there that claims to do this all for you, but a lot of it just doesn’t cut it when it comes to images. The books I’m interested in are mostly of a technical nature, so there will be diagrams.

After testing out all of the free alternatives, I came across Calibre. Calibre is an open source ebook management program that reads just about every format you can imagine, and can convert from one ebook format to another. It’s also free, so woot!

Calibre CAN do PDF to epub, but you lose a lot of formatting and all of your bookmarks.  In other words, no table of contents.  That sucks.

I tried a few combinations of software to get this working, and here is the most reliable way I have found to convert PDF to epub while retaining the table of contents, images, and MOST of your original formatting (some will still be lost, but if you’re a freak about stuff like that you can always fix it before the last step).

First, the requirements:

  • Adobe Acrobat – $300.00 – No, not Acrobat READER, that’s something else. You need to have Adobe Acrobat, or some other program that can export PDFs as compliant HTML 3.2. I’m sure there’s something free out there, but this is what I used as I had it at work :)
  • Calibre – FREE
  • Notepad++ (optional) – FREE – this is used for some code cleanup if needed (for you formatting freaks)

Process couldn’t be simpler really:

  1. Load the PDF in Acrobat
  2. FILE > EXPORT > HTML > HTML 3.2
  3. Click on “SETTINGS”
  4. Uncheck “Generate Bookmarks”
  5. Check “Generate tags for untagged files”
  6. Check “generate images”
  7. Uncheck “use sub-folder”
  8. Click OK
  9. Save
  10. OPTIONAL – Edit the exported HTML file in Notepad++. Get rid of any style tags in the body element, and any font colors, as they will mess with some readers (like Stanza)
  11. Open Calibre
  12. Click “Add Book”
  13. Select the exported HTML file
  14. Right click on the imported book and select CONVERT E-BOOKS > CONVERT INDIVIDUALLY
  15. Input format will be ZIP, output will be epub
  16. Edit meta info as needed
  17. On the “Structure Detection” tab, set “detect chapters…” to //h:h1
  18. On the table of contents page, set level 1 to //h:h1, level 2 to //h:h2, and level 3 to //h:h3
  19. Click OK
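
If you’d rather not do the optional cleanup in step 10 by hand, something like the following Python sketch might do the job. It is only a rough illustration, not what I actually ran: it assumes “style tags in the body element” means a style attribute on the <body> tag, and “font colors” means color attributes inside tags. The function name strip_inline_styling is made up for this example, and the regex approach will trip over attribute values that contain a “>”.

```python
import re

# Matches one whole tag, e.g. <font color="#FF0000" size="2">.
# Assumption: attribute values never contain "<" or ">".
TAG = re.compile(r'<[^<>]+>')
# A color=... attribute (quoted or unquoted). The leading whitespace keeps
# attributes such as bgcolor from matching.
COLOR_ATTR = re.compile(r'\s+color\s*=\s*("[^"]*"|\'[^\']*\'|[^\s>]+)', re.IGNORECASE)
# A style=... attribute (quoted or unquoted).
STYLE_ATTR = re.compile(r'\s+style\s*=\s*("[^"]*"|\'[^\']*\'|[^\s>]+)', re.IGNORECASE)
BODY_OPEN = re.compile(r'<body\b', re.IGNORECASE)


def _clean_tag(match):
    """Strip color attributes from any tag, and style from the <body> tag."""
    tag = match.group(0)
    tag = COLOR_ATTR.sub('', tag)
    if BODY_OPEN.match(tag):
        tag = STYLE_ATTR.sub('', tag)
    return tag


def strip_inline_styling(html):
    """Return the HTML with the styling described in step 10 removed."""
    # Only text inside tags is touched, so prose in the book is left alone.
    return TAG.sub(_clean_tag, html)
```

You would read the exported file into a string, pass it through strip_inline_styling, and write the result back out before adding it to Calibre. Keep a copy of the original export in case the cleanup removes something you wanted.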

That’s it. The PDF will be converted to an epub that you can view in Calibre, save to another location and read in another reader, or send to a device.
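
The Calibre half of the process (steps 11 to 19) can probably be scripted too, since Calibre ships with a command-line tool called ebook-convert. The sketch below is not something I used for this post; it assumes ebook-convert is on your PATH and that its --chapter and --level1-toc / --level2-toc / --level3-toc options correspond to the “detect chapters” and table of contents fields in the GUI. The function names and the file names in the usage note are just placeholders.

```python
import shutil
import subprocess


def build_convert_command(html_path, epub_path):
    """Build an ebook-convert command using the XPath settings from the steps above."""
    return [
        "ebook-convert", html_path, epub_path,
        "--chapter", "//h:h1",      # step 17: detect chapters at <h1>
        "--level1-toc", "//h:h1",   # step 18: table of contents level 1
        "--level2-toc", "//h:h2",   # step 18: table of contents level 2
        "--level3-toc", "//h:h3",   # step 18: table of contents level 3
    ]


def convert(html_path, epub_path):
    """Run the conversion, failing early if Calibre's CLI is not installed."""
    if shutil.which("ebook-convert") is None:
        raise FileNotFoundError("ebook-convert is not on the PATH; is Calibre installed?")
    # check=True raises CalledProcessError if the conversion fails.
    subprocess.run(build_convert_command(html_path, epub_path), check=True)
```

Calling convert("book.html", "book.epub") should then produce the epub next to the exported HTML. Meta info (step 16) is not set here, so you would still edit that in Calibre afterwards.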

Note that not all PDFs are created equal, so some of these settings will need to be tweaked for different books, but this is what worked for me, with a quality I was happy with.
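
One way to figure out what to tweak is to check which heading tags the exported HTML actually contains before choosing the XPath expressions for chapters and the table of contents. Here is a small Python sketch of that idea using only the standard library; count_headings and HeadingCounter are names invented for this example.

```python
from collections import Counter
from html.parser import HTMLParser


class HeadingCounter(HTMLParser):
    """Counts how many <h1> to <h6> opening tags appear in a document."""

    HEADINGS = ("h1", "h2", "h3", "h4", "h5", "h6")

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <H1> and <h1> count the same.
        if tag in self.HEADINGS:
            self.counts[tag] += 1


def count_headings(html):
    """Return a dict such as {"h1": 12, "h2": 40} for the given HTML text."""
    parser = HeadingCounter()
    parser.feed(html)
    parser.close()
    return dict(parser.counts)
```

If a book turns out to have no h1 tags at all and dozens of h2 tags, for example, it would make sense to point “detect chapters” and level 1 of the table of contents at //h:h2 instead.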

Comments (4)

  • Raf says:

    FANTASTIC!!!!!! Your post was very very very helpful! Thank you so much. I will see how much pdf will be used at the iPad, otherwise I can just convert the way you showed here!

  • Sune says:

    Thank you – this makes for some wonderful PDF conversions on the Sony Reader!

    In the process of exporting to HTML, leading tabs/spaces seem to be nuked. I mainly read books on Python, where indentation is of the essence. Would you know how to keep these?

  • thatjoshguy says:

    It really depends on the source. I’ve since discovered that this process loses a lot of formatting along the way. Calibre uses regular expressions to match content and formatting, so if you view the source of the HTML that is output and find what tag is used for indenting, you may be able to get it to look right. Personally, I have yet to get a technical manual to export with proper formatting (mainly PHP and Asterisk manuals, where formatting is key as well). :(

  • David says:

    I used Adobe’s online conversion site where you just email your pdf to pdf2html@adobe.com and they email you back the html file. Followed your guidelines on the table of contents with Calibre and it works like a champ! Just what I was looking for. Thanks
