MatchText Issue
Posted: Mon Aug 04, 2014 12:03 pm
I must have slept through the RegExp on LC class but...
I read in a file as binary, then want to parse it manually. In one test I read in an XML file. So the first line of the file is:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
which dumps into a field correctly. So I first want to find any printable character strings, which should be the whole line in this case. I use this call with the regexp set for the entire set of ASCII printable characters via their Unicode value:
matchChunk(gFileContents, "([\x{0020}-\x{007e}]*)", tStart, tEnd)
The incomplete string I get back is:
1.0" encoding="UTF-8" standalone="yes
Pilot error somewhere? Note that I get the same result with non-Unicode "([\x20-\x7e]*)" regexp as well.
If I use the simpler "([A-Za-z]*)" it is even more interesting - I get back "yes". I would have expected to get "xml" instead.
Thanks, Walt
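As a cross-check, the same three patterns can be run against the same line in a different regex engine. The sketch below uses Python's `re` module rather than LiveCode, so it says nothing about what matchChunk does internally; it only shows what a conventional leftmost-match engine returns for the first capture group. The helper name `first_group_span` is made up for illustration, and positions are converted to 1-based inclusive values on the assumption that this is how tStart and tEnd are meant to be read.

```python
import re

# The first line of the XML file, as quoted in the post.
line = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'


def first_group_span(pattern, text):
    """Return (start, end, matched_text) for capture group 1 of the
    leftmost match, using 1-based inclusive positions, or None if the
    pattern does not match at all. Hypothetical helper, not a LiveCode API."""
    m = re.search(pattern, text)
    if m is None:
        return None
    start, end = m.span(1)  # 0-based, end-exclusive
    return start + 1, end, m.group(1)


# Printable ASCII run: every character in the line is in \x20-\x7e,
# so the leftmost match covers the whole line.
printable = first_group_span(r'([\x20-\x7e]*)', line)

# Letters with '*': the line starts with '<', which is not a letter,
# so the leftmost match is the empty string at the very start.
letters_star = first_group_span(r'([A-Za-z]*)', line)

# Letters with '+': at least one letter is required, so the engine
# skips '<?' and the first match is "xml".
letters_plus = first_group_span(r'([A-Za-z]+)', line)

print(printable)
print(letters_star)
print(letters_plus)
```

In this engine the printable-character pattern returns the whole line, and "xml" only comes back from the letters pattern when `*` is changed to `+`, because `*` is allowed to match zero characters at the first position. Neither result is the quote-to-quote string or the "yes" reported above, so the difference may lie in how the LiveCode call or its position variables are being used rather than in the patterns themselves.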