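How to convert a PDF with embedded audio to MP3 on Linux
After a lot of googling, I finally found a way to convert a PDF with embedded audio to MP3. Unfortunately, this method only works on Linux. When I have time I may write a version that runs on Windows. The original post is here: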
http://yhager.com/content/extracting-embedded-audio-pdf
Extracting embedded audio from a PDF
Submitted by yhager on Mon, 12/13/2010 - 19:30
I recently purchased a PDF with embedded sound files from Piano For All. The PDF also contains videos, but they were provided as separate files.
For the life of me, I was unable to find an application that can play those media files from the PDF. I tried Adobe Reader, which just sent me to a web page with a “no plug-ins to your OS exist” message. Then I tried Okular, my favourite PDF reader, and it didn’t work either. Same results with evince, xpdf and gv.
I also tried to convert the PDF to PS, but that just ignored the embedded audio files, so it didn’t help either.
So I set out to find a solution myself. Analysing the PDF structure, I was able to find the stream objects and their “file names” within the PDF rather easily. I quickly wrote a C program that allowed me to extract those and save them to the file system as files (attached here).
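For readers curious what "finding the stream objects" can look like, here is a minimal Python sketch. It is not the attached C program. It assumes each stream body sits between the `stream` and `endstream` keywords, and the name `find_streams` is made up for illustration. It ignores the /Length entry, so binary data that happens to contain `endstream` would confuse it.

```python
import re

# Matches "stream<EOL> ... <EOL>endstream". The lookbehind keeps the
# "stream" inside "endstream" from being taken as a new stream start.
STREAM_RE = re.compile(rb"(?<!end)stream\r?\n(.*?)\r?\n?endstream", re.DOTALL)


def find_streams(pdf_bytes):
    """Return the raw (possibly still compressed) body of every stream."""
    return [m.group(1) for m in STREAM_RE.finditer(pdf_bytes)]


if __name__ == "__main__":
    # Tiny hand-made fragment standing in for a real PDF file.
    sample = (
        b"1 0 obj\n<< /Length 5 >>\nstream\nhello\nendstream\nendobj\n"
        b"2 0 obj\n<< /Length 3 >>\nstream\r\nabc\r\nendstream\nendobj\n"
    )
    for index, body in enumerate(find_streams(sample)):
        print(index, len(body), "bytes")
```

On a real file you would read the PDF with `open(path, "rb").read()` and pass the bytes to `find_streams`.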
The problem with that was that the files within the PDF were compressed in some way I was unable to detect.
After reading a bit about PDF structure and compression algorithms, and looking at the PDF file itself (I just opened it with a text editor), I found that the compression filter used is called “FlateDecode”. I then found a Ghostscript utility that can take those compressed streams and replace them with their uncompressed versions.
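FlateDecode is the PDF name for zlib/deflate compression, so a single stream can also be inflated directly. Below is a minimal Python sketch, assuming the raw bytes of one FlateDecode stream are already in hand and that no predictor (/DecodeParms) is applied. `inflate_stream` is a hypothetical helper, not part of Ghostscript or the attached code.

```python
import zlib


def inflate_stream(raw):
    """Inflate the body of one FlateDecode stream.

    A decompress object is used instead of zlib.decompress() so that
    stray bytes after the compressed data (such as a trailing newline)
    are ignored rather than raising an error.
    """
    inflater = zlib.decompressobj()
    return inflater.decompress(raw) + inflater.flush()


if __name__ == "__main__":
    # Round trip on made-up data standing in for an embedded audio file.
    payload = b"pretend this is MP3 data" * 4
    compressed = zlib.compress(payload) + b"\r\n"
    print(inflate_stream(compressed) == payload)
```

The Ghostscript route described in the post does this for every stream in the file at once, which is why it is the more convenient option.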
To use it, save it in the same directory as your PDF file and run:
$ gs -- pdfinflt.ps original.pdf uncompressed.pdf
Assuming your original PDF file is named original.pdf, you now have the uncompressed media files embedded within uncompressed.pdf.
Now all I need is to run the stream extraction code again:
$ ./pdf_extract_embedded uncompressed.pdf
The files will be written to the current directory, maintaining the name the original PDF author gave to them when she embedded them in the PDF.
You can find pdfinflt.ps online in the Ghostscript distribution, but I attached it here, for convenience.
The C code for extracting the streams is also attached. It is not the best piece of software around if you want to learn how to code, but it does the job for me. You will need to compile it yourself though. Use something like:
$ gcc -o pdf_extract_embedded pdf_extract_embedded.c
Too bad I can't use it, but thanks for the effort.
I can't follow this; I still have a lot to learn.
I can't make sense of it at all~~