Extracting data from Word documents
Posted: Fri Oct 19, 2007 9:56 am
by Andycal
Bit left-field this, but does anyone know whether it's possible to use RunRev to extract data from Word documents?
Specifically, I've noticed that some recruitment sites will take a CV and then automatically extract the address, phone number and other information.
Posted: Fri Oct 19, 2007 12:56 pm
by Mark
On Windows, use the following VBScript:
Code:
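Option Explicit
' Converts a Word document to RTF by automating Word itself.
Dim objWord
Dim strFile
' Expect the path of the document as the only argument.
If WScript.Arguments.Count < 1 Then
    WScript.Echo "Usage: doc2txt.vbs C:\file.doc"
    WScript.Quit
End If
strFile = WScript.Arguments(0)
Set objWord = WScript.CreateObject("Word.Application")
objWord.Documents.Open strFile
' 6 = wdFormatRTF; the copy is written next to the input as <file>.rtf
objWord.ActiveDocument.SaveAs strFile & ".rtf", 6
objWord.ActiveDocument.Close
objWord.Quit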
and call it using the shell function.
On Mac OS X, you can use TextUtil. Look here for more info:
<http://www.hmug.org/man/1/textutil.php>
or type "man textutil" in the terminal.
Best,
Mark
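For anyone scripting this outside Rev, here is a rough Python sketch of the two command lines described in the post above. It only builds and runs the command; the function names and the script location ("doc2txt.vbs" in the current directory) are assumptions made for illustration, not part of Mark's post.

```python
import subprocess
import sys


def build_convert_command(doc_path, platform=sys.platform, script="doc2txt.vbs"):
    """Return the argument list that would convert doc_path on this platform."""
    if platform == "darwin":
        # textutil ships with Mac OS X; "-convert txt" writes a .txt file
        # with the same base name beside the input document.
        return ["textutil", "-convert", "txt", doc_path]
    if platform.startswith("win"):
        # cscript is the console Windows Script Host; the VBScript above
        # writes <doc_path>.rtf next to the input document.
        return ["cscript", "//nologo", script, doc_path]
    raise RuntimeError("no converter known for platform: " + platform)


def convert(doc_path):
    """Run the converter; raises CalledProcessError if it exits with an error."""
    subprocess.run(build_convert_command(doc_path), check=True)
```

Passing an absolute path is probably the safer choice on Windows, since Word may resolve a relative path against its own working directory rather than the script's.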
Posted: Fri Oct 19, 2007 1:18 pm
by Andycal
Gotcha, nice one!
Posted: Fri Oct 19, 2007 7:04 pm
by FourthWorld
This post on the Rev discussion list describes a new library for pulling text from Word 2007 documents:
http://lists.runrev.com/pipermail/use-r ... 03526.html
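As background on why Word 2007 files are easier to read directly: a .docx file is a zip archive, and the body text lives in the part named word/document.xml. The Python sketch below illustrates that idea only; it is not the library from the linked post, and the function name docx_text is made up for this example. It ignores headers, footers and footnotes, which are stored in separate parts.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the main document part
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_text(path):
    """Return the body text of a .docx file, one line per paragraph."""
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W + "p"):
        # Each w:t element carries one run of literal text.
        paragraphs.append("".join(t.text or "" for t in para.iter(W + "t")))
    return "\n".join(paragraphs)
```

The resulting plain text could then be searched for phone numbers or addresses in whatever way suits the CVs being processed.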