As part of a project where a PDF or XPS document is searched for specific content without opening it in a viewer, I need to know how this might be done in LiveCode.
Has anyone got a clue how to solve this problem?
How can you read alphanumeric text in pdf or xps files
Re: How can you read alphanumeric text in pdf or xps files
a-revuser,
A quick search using my fav search engine resulted in the following:
http://qurl.tk/ka
http://qurl.tk/kb
Let me know if this works for you.
Best,
Mark
The biggest LiveCode group on Facebook: https://www.facebook.com/groups/livecode.developers
The book "Programming LiveCode for the Real Beginner"! Get it here! http://tinyurl.com/book-livecode
Re: How can you read alphanumeric text in pdf or xps files
Hi a-revuser,
Don't know if you're still interested in this (was some time ago).
As you know (from your previous post: Can this be done & are you interested? Thu Nov 11, 2010 11:16 am), PDFs are encrypted, so opening them in a text editor reveals unusable gobbledygook.
I'm not sure if recent versions of rev/livecode can do more with PDFs or not (I'm thinking not or you would've received more replies). However, if you can get the text to the clipboard then you can use rev's text manipulation features--starting with all the 'offset' functions--to do whatever you want with plain text, and presumably at the end save back into the pdf format.
Another approach would be to put a button on the PDF to export user-entered fields as an .fdf (the data in a pdf form). An fdf is plain text with the form field names followed by the entered data in a consistent format. You can then write some script to extract the entered info from this format.
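To make the .fdf idea a bit more concrete, here is a rough sketch of the extraction step. It is written in Python rather than LiveCode, purely to show the logic, and it assumes the simple case where each field is a small dictionary holding a `/T (name)` and a `/V (value)` entry as literal strings. The function names are made up for this example. Checkbox and radio values written as names (such as `/V /Yes`), hex strings and nested field hierarchies are not handled.

```python
import re

# A PDF literal string such as (Ada Lovelace); backslash escapes are allowed inside.
_STRING = r"\(((?:\\.|[^\\()])*)\)"
_T = re.compile(r"/T\s*" + _STRING, re.S)   # field name entry
_V = re.compile(r"/V\s*" + _STRING, re.S)   # field value entry
# Innermost << ... >> dictionaries only (no nested dictionary inside).
_DICT = re.compile(r"<<((?:(?!<<|>>).)*)>>", re.S)


def unescape(s):
    """Undo the \\( \\) and \\\\ escapes used inside PDF literal strings."""
    return re.sub(r"\\([()\\])", r"\1", s)


def parse_fdf_fields(fdf_text):
    """Return {field name: value} for every dictionary that has both /T and /V."""
    fields = {}
    for m in _DICT.finditer(fdf_text):
        body = m.group(1)
        name = _T.search(body)
        value = _V.search(body)
        if name and value:
            fields[unescape(name.group(1))] = unescape(value.group(1))
    return fields


# Hypothetical sample data, shaped like a small exported form.
sample = """%FDF-1.2
1 0 obj
<< /FDF << /Fields [
<< /T (name) /V (Ada Lovelace) >>
<< /T (email) /V (ada@example.com) >>
] >> >>
endobj
trailer
<< /Root 1 0 R >>
%%EOF
"""

for field, value in parse_fdf_fields(sample).items():
    print(field, "=", value)
```

Because `/T` and `/V` are looked up separately inside each dictionary, the order in which the exporting tool writes them should not matter.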
From your earlier post I'm thinking that what you ultimately want is a big ask, though.
Good luck
Steve
Re: How can you read alphanumeric text in pdf or xps files
Some PDF files have text objects in them, unencrypted. Check out the Wiki entry on Portable Document Format: http://en.wikipedia.org/wiki/Portable_Document_Format. I have successfully parsed some PDFs as binaries this way.
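For anyone wanting to try the binary route, the idea can be sketched roughly as follows. This is in Python rather than LiveCode, just to show the steps, and it rests on several assumptions: the file is not encrypted, content streams are either plain or Flate (zlib) compressed, and the text is stored as literal strings between `BT` and `ET`. The function name is made up for the example. Plenty of real PDFs use hex strings or custom font encodings, in which case this finds nothing readable.

```python
import re
import zlib

# Everything between "stream" and "endstream" in the raw file bytes.
STREAM = re.compile(rb"stream\r?\n(.*?)endstream", re.S)
# A text object: the operators between BT (begin text) and ET (end text).
TEXT_BLOCK = re.compile(rb"\bBT\b(.*?)\bET\b", re.S)
# A literal string such as (Hello), allowing backslash escapes.
LITERAL = re.compile(rb"\(((?:\\.|[^\\()])*)\)")


def pdf_text_chunks(data):
    """Return the literal strings found inside text objects of a PDF's streams."""
    chunks = []
    for m in STREAM.finditer(data):
        raw = m.group(1)
        try:
            # decompressobj ignores the end-of-line bytes left after the zlib data.
            content = zlib.decompressobj().decompress(raw)
        except zlib.error:
            content = raw  # assume the stream was stored uncompressed
        for block in TEXT_BLOCK.finditer(content):
            for s in LITERAL.finditer(block.group(1)):
                text = re.sub(rb"\\([()\\])", rb"\1", s.group(1))
                chunks.append(text.decode("latin-1"))
    return chunks
```

Usage would be something like `pdf_text_chunks(open("some.pdf", "rb").read())`, where `some.pdf` is a placeholder file name. Searching the returned strings for the content you want is then ordinary text handling.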
There is also a good command line toolkit, called, oddly enough, PDFTK, at http://www.pdflabs.com/tools/pdftk-the-pdf-toolkit/.
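One way PDFTK can help here is its `uncompress` operation, which rewrites a PDF with its page streams uncompressed so they are easier to inspect or search as text. A minimal sketch of calling it from a script, again in Python and assuming `pdftk` is installed and on the path. The helper names and file names are placeholders.

```python
import subprocess


def pdftk_uncompress_args(src, dst):
    """Build the pdftk command that writes an uncompressed copy of src to dst."""
    return ["pdftk", src, "output", dst, "uncompress"]


def uncompress_pdf(src, dst):
    """Run pdftk; raises if pdftk is missing or reports a failure."""
    subprocess.run(pdftk_uncompress_args(src, dst), check=True)
```

Calling `uncompress_pdf("in.pdf", "out.pdf")` is the same as typing `pdftk in.pdf output out.pdf uncompress` at a command prompt, which is roughly what a LiveCode script would hand to `shell()`.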
Walt
Walt Brown
Omnis traductor traditor