How can you read alphanumeric text in PDF or XPS files

a-revuser
Posts: 23
Joined: Fri Dec 04, 2009 10:23 am

How can you read alphanumeric text in PDF or XPS files

Post by a-revuser » Tue Nov 09, 2010 7:05 pm

As part of a project, I need to search a PDF or XPS document for specific content without opening it in a viewer, and I need to know how this might be done in LiveCode.
Does anyone have a clue how to solve this problem?

Mark
Livecode Opensource Backer
Posts: 5150
Joined: Thu Feb 23, 2006 9:24 pm

Re: How can you read alphanumeric text in PDF or XPS files

Post by Mark » Wed Nov 10, 2010 12:59 am

a-revuser,

A quick search using my fav search engine resulted in the following:
http://qurl.tk/ka
http://qurl.tk/kb

Let me know if this works for you.

Best,

Mark

Steve Denney
Posts: 101
Joined: Wed Dec 22, 2010 8:17 pm

Re: How can you read alphanumeric text in PDF or XPS files

Post by Steve Denney » Fri Dec 24, 2010 10:22 pm

Hi a-revuser,
Don't know if you're still interested in this (it was some time ago).
As you know (from your previous post, "Can this be done & are you interested?", Thu Nov 11, 2010 11:16 am), PDFs are encrypted, so opening them in a text editor reveals unusable gobbledygook.
I'm not sure whether recent versions of Rev/LiveCode can do more with PDFs (I'm thinking not, or you would've received more replies). However, if you can get the text onto the clipboard, you can use Rev's text manipulation features, starting with all the 'offset' functions, to do whatever you want with the plain text, and presumably save it back into PDF format at the end.
Another approach would be to put a button on the PDF that exports the user-entered fields as an .fdf file (the data in a PDF form). An FDF file is plain text with the form field names followed by the entered data in a consistent format. You can then write a script to extract the entered info from this format.
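
To make the FDF idea concrete, here is a minimal sketch in Python rather than LiveCode. It assumes the exported file stores each field as a small dictionary with a /T (field name) entry and a /V (value) entry, both written as parenthesised literal strings. The function name parse_fdf_fields and the sample data are made up for illustration, and hex strings, octal escapes, nested parentheses and hierarchical field names are not handled.

```python
import re

# Innermost "<< ... >>" dictionaries only (no nested dictionaries inside).
FIELD_DICT = re.compile(r"<<((?:(?!<<|>>).)*)>>", re.DOTALL)
# /T (name) and /V (value) as literal strings; backslash escapes are allowed.
NAME = re.compile(r"/T\s*\(((?:\\.|[^\\()])*)\)", re.DOTALL)
VALUE = re.compile(r"/V\s*\(((?:\\.|[^\\()])*)\)", re.DOTALL)

SIMPLE_ESCAPES = {"n": "\n", "r": "\r", "t": "\t"}


def unescape(raw):
    """Resolve simple backslash escapes such as \\( and \\n (octal escapes are ignored)."""
    return re.sub(
        r"\\(.)",
        lambda m: SIMPLE_ESCAPES.get(m.group(1), m.group(1)),
        raw,
        flags=re.DOTALL,
    )


def parse_fdf_fields(fdf_text):
    """Return {field name: value} for every dictionary that has both /T and /V."""
    fields = {}
    for body in FIELD_DICT.findall(fdf_text):
        name = NAME.search(body)
        value = VALUE.search(body)
        if name and value:
            fields[unescape(name.group(1))] = unescape(value.group(1))
    return fields


# Hypothetical FDF content, written by hand for this example.
SAMPLE = """%FDF-1.2
1 0 obj
<< /FDF << /Fields [
<< /T (customer) /V (Jane Doe) >>
<< /V (INV-0042) /T (invoice) >>
<< /T (note) /V (paid \\(cash\\)) >>
] >> >>
endobj
trailer
<< /Root 1 0 R >>
%%EOF
"""

if __name__ == "__main__":
    for field_name, field_value in parse_fdf_fields(SAMPLE).items():
        print(field_name, "=", field_value)
```

The same pattern matching should carry over to LiveCode's own text functions; the regular expressions are only one way to pick out the name/value pairs.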
From your earlier post I'm thinking that what you ultimately want is a big ask, though.
Good luck
Steve

WaltBrown
Posts: 466
Joined: Mon May 11, 2009 9:12 pm

Re: How can you read alphanumeric text in PDF or XPS files

Post by WaltBrown » Sat Dec 25, 2010 8:02 am

Some PDF files have text objects in them, unencrypted. Check out the Wikipedia entry on Portable Document Format: http://en.wikipedia.org/wiki/Portable_Document_Format. I have successfully parsed some PDFs as binaries this way.

There is also a good command-line toolkit called, oddly enough, PDFTK, at http://www.pdflabs.com/tools/pdftk-the-pdf-toolkit/.
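
As a rough illustration of parsing a PDF as a binary, here is a sketch in Python rather than LiveCode. It assumes the file is not encrypted, that content streams are either uncompressed or Flate (zlib) compressed, and that the text is shown with the Tj or TJ operators using plain literal strings in a single-byte encoding. The names extract_text and make_stream_object are invented for this example. PDFs that use hex strings, nested parentheses, other filters or custom font encodings will not come out readable.

```python
import re
import zlib

# Everything between the "stream" and "endstream" keywords.
STREAM = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)
# A literal string "( ... )" with backslash escapes, no nested parentheses.
LITERAL = rb"\((?:\\.|[^\\()])*\)"
# "(text) Tj" or "[ (te) 20 (xt) ] TJ".
TEXT_OP = re.compile(
    rb"(" + LITERAL + rb")\s*Tj"
    rb"|\[((?:" + LITERAL + rb"|[^\]()])*)\]\s*TJ",
    re.DOTALL,
)
SIMPLE_ESCAPES = {b"n": b"\n", b"r": b"\r", b"t": b"\t"}


def unescape(raw):
    """Resolve simple backslash escapes (octal escapes are ignored)."""
    return re.sub(
        rb"\\(.)",
        lambda m: SIMPLE_ESCAPES.get(m.group(1), m.group(1)),
        raw,
        flags=re.DOTALL,
    )


def stream_bodies(pdf_bytes):
    """Yield each stream body, inflated when it looks like zlib data (a heuristic)."""
    for match in STREAM.finditer(pdf_bytes):
        body = match.group(1)
        try:
            # decompressobj tolerates the end-of-line bytes left after the data.
            yield zlib.decompressobj().decompress(body)
        except zlib.error:
            yield body  # assume the stream was stored uncompressed


def extract_text(pdf_bytes):
    """Return the strings shown by Tj/TJ operators, in file order."""
    chunks = []
    for body in stream_bodies(pdf_bytes):
        for op in TEXT_OP.finditer(body):
            if op.group(1) is not None:
                strings = [op.group(1)]
            else:
                strings = re.findall(LITERAL, op.group(2), re.DOTALL)
            text = b"".join(unescape(s[1:-1]) for s in strings)
            chunks.append(text.decode("latin-1"))
    return chunks


def make_stream_object(number, body, filter_entry=b""):
    """Build a hand-made stream object for the demo below (not a full PDF)."""
    header = b"%d 0 obj\n<< /Length %d %s>>\nstream\n" % (number, len(body), filter_entry)
    return header + body + b"\nendstream\nendobj\n"


# Hypothetical test data: one plain stream and one Flate-compressed stream.
SAMPLE_PDF = (
    b"%PDF-1.4\n"
    + make_stream_object(4, b"BT /F1 12 Tf 72 720 Td (Invoice 0042) Tj ET")
    + make_stream_object(
        5,
        zlib.compress(b"BT /F1 12 Tf [(To) 20 (tal: \\(EUR\\) 99)] TJ ET"),
        b"/Filter /FlateDecode ",
    )
)

if __name__ == "__main__":
    found = extract_text(SAMPLE_PDF)
    print(found)
    print(any("Invoice" in chunk for chunk in found))
```

For a real document you would read the file in binary mode and pass its bytes to extract_text, then search the returned strings for the content you are after.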

Walt
Walt Brown
Omnis traductor traditor
