
Extracting data from Word documents

Posted: Fri Oct 19, 2007 9:56 am
by Andycal
Bit left-field, this, but does anyone have any idea whether it's possible to use RunRev to extract data from Word documents?

Specifically, I've noticed that some recruitment sites will take a CV and automatically extract the address, phone number and other details.
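Once a CV is available as plain text, pulling out contact details is mostly pattern matching. The sketch below is written in Python for brevity and is not a claim about how recruitment sites actually do it; the function name `extract_contact_details` and both regular expressions are illustrative assumptions. The patterns are deliberately loose, so expect false positives (a date range such as 2001-2005 can look like a phone number) and misses on unusual formats.

```python
import re

# Hypothetical, deliberately loose patterns: a starting point, not a robust parser.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# Optional leading "+", then digits separated by spaces, dots, dashes or
# parentheses (no newlines, so a match never spans two lines of the CV).
PHONE_RE = re.compile(r"\+?\d[\d ().-]{7,}\d")


def extract_contact_details(text):
    """Return the first email address and phone number found in plain text."""
    email = EMAIL_RE.search(text)
    phone = PHONE_RE.search(text)
    return {
        "email": email.group(0) if email else None,
        "phone": phone.group(0) if phone else None,
    }


if __name__ == "__main__":
    sample = "Jane Doe\n12 High Street, Leeds\nTel: +44 113 496 0000\njane.doe@example.com\n"
    print(extract_contact_details(sample))
```

Postal addresses are much harder to match with a single pattern; in practice they usually need heuristics such as looking for a postcode and taking the lines above it.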

Posted: Fri Oct 19, 2007 12:56 pm
by Mark
On Windows, use the following VBScript:

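Option Explicit

' Word's SaveAs format code for Rich Text Format (wdFormatRTF = 6).
' Use 2 (wdFormatText) instead if you want plain text output.
Const wdFormatRTF = 6

Dim objWord, objDoc, strFile

If WScript.Arguments.Count < 1 Then
	WScript.Echo "Usage: doc2txt.vbs C:\file.doc"
	WScript.Quit 1
End If

' Word needs a full path, so resolve the argument against the current directory
strFile = CreateObject("Scripting.FileSystemObject").GetAbsolutePathName(WScript.Arguments(0))

' Start a hidden instance of Word via COM automation
Set objWord = WScript.CreateObject("Word.Application")

' Open read-only (FileName, ConfirmConversions, ReadOnly) and save a copy as file.doc.rtf
Set objDoc = objWord.Documents.Open(strFile, False, True)
objDoc.SaveAs strFile & ".rtf", wdFormatRTF
objDoc.Close False

objWord.Quit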
Then call it from RunRev using the shell function.

On Mac OS X, you can use the textutil command-line tool. Look here for more info:
<http://www.hmug.org/man/1/textutil.php>
or type "man textutil" in the terminal.
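Either route amounts to shelling out to a converter. As a hedged sketch in Python (RunRev's shell function would play the same role), the hypothetical helper `conversion_command` below picks `cscript` on Windows, assuming the VBScript above is saved as `doc2txt.vbs` in the working directory, and `textutil -convert txt` on Mac OS X, which by default writes the result next to the input with a `.txt` extension.

```python
import subprocess
import sys


def conversion_command(path, platform=sys.platform):
    """Build the converter command line for a Word file (hypothetical helper)."""
    if platform.startswith("win"):
        # cscript runs the VBScript in console mode; //nologo hides the banner.
        # Assumes doc2txt.vbs sits in the current working directory.
        return ["cscript", "//nologo", "doc2txt.vbs", path]
    if platform == "darwin":
        # textutil writes file.txt alongside the input file.
        return ["textutil", "-convert", "txt", path]
    raise OSError("no converter known for platform " + platform)


def convert(path):
    # check=True raises CalledProcessError if the converter exits with an error.
    subprocess.run(conversion_command(path), check=True)
```

Keeping the command construction separate from `convert` makes it easy to inspect or log the exact command before anything is run.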

Best,

Mark

Posted: Fri Oct 19, 2007 1:18 pm
by Andycal
Gotcha, nice one!

Posted: Fri Oct 19, 2007 7:04 pm
by FourthWorld
This post on the Rev discussion list describes a new library for pulling text from Word 2007 documents:

http://lists.runrev.com/pipermail/use-r ... 03526.html
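For context, a Word 2007 .docx file is a ZIP archive whose main body lives in `word/document.xml`, with paragraphs in `w:p` elements and text runs in `w:t` elements. The Python sketch below is independent of the library in the linked post (whose API isn't shown here); the function name `docx_text` is hypothetical, and it ignores headers, footers and footnotes, which are stored in separate parts of the archive.

```python
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML namespace used by w:p (paragraph) and w:t (text) elements.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_text(path):
    """Return the body text of a .docx file, one line per paragraph."""
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # A paragraph's text is split across runs; join all its w:t elements.
        paragraphs.append("".join(t.text or "" for t in para.iter(W_NS + "t")))
    return "\n".join(paragraphs)
```

The resulting plain text can then go through the same kind of pattern matching as text produced by textutil or the VBScript route.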