Stripping text from Word documents
Is it possible to read the contents of .DOC and .PDF files into text fields, stripping out the formatting on import? I just want the actual text with CR/LF and none of the other garbage that comes with the read anyfile demo.
Jeff G potts
Re: Stripping text from Word documents
Hi Jeff,
No, this is not possible without extreme effort!
DOC and PDF are NOT plain text files, as you have seen, so this is not possible right "out of the box".
If you are on OS X you could use SHELL and "textutil" to convert a DOC to plain text or RTF and work with that. As far as I know, textutil does not read PDF files, so a PDF would need a different tool.
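A minimal sketch of the textutil approach for a Word file, assuming macOS (textutil is not available on other platforms). The helper name doc_to_text is made up for illustration. From LiveCode, the same textutil command line could be passed to the shell() function.

```shell
#!/bin/sh
# Sketch: extract plain text from a Word document with macOS textutil.
# doc_to_text is a hypothetical helper name, not part of any library.
doc_to_text() {
    src="$1"
    # Fail early if the input file is missing.
    if [ ! -f "$src" ]; then
        echo "doc_to_text: no such file: $src" >&2
        return 1
    fi
    # textutil ships with macOS only.
    if ! command -v textutil >/dev/null 2>&1; then
        echo "doc_to_text: textutil not found (macOS only)" >&2
        return 2
    fi
    # -convert txt: produce plain text.
    # -stdout: write to standard output instead of creating a .txt file
    # next to the source document.
    textutil -convert txt -stdout "$src"
}
```

Something like `doc_to_text "$HOME/Documents/report.doc" > report.txt` (the path is only an example) would then leave the plain text in report.txt, ready to be read into a field.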
Best
Klaus
Re: Stripping text from Word documents
The WordLib library can import Word files. Its forte is the newer DOCX format (Word 2007) and OpenOffice, but it does provide basic support for legacy Word DOC files, and it seems that's just what you're after. It does a pretty good job of stripping out the text.
To get the plain text with no styles, just import and then "put field 1 into field 1" for example, to clear any formatting.
(I hope to provide full formatting support for the legacy DOC files in a future version, and the more registered users I have, the more I will be able to develop the library!)
Best wishes,
Curry Kenworthy
LiveCode Development, Training & Consulting
http://livecodeconsulting.com/
WordLib: Conquer MS Word & OpenOffice
SpreadLib: "Excel-lent" spreadsheet import/export
http://livecodeaddons.com/