I read in a file as binary, then want to parse it manually. In one test I read in an XML file. So the first line of the file is:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
which dumps into a field correctly. So I first want to find any printable character strings, which should be the whole line in this case. I use this call with the regexp set for the entire set of ASCII printable characters via their Unicode value:
matchChunk(gFileContents, "([\x{0020}-\x{007e}]*)", tStart, tEnd)
But what I get back is:
1.0" encoding="UTF-8" standalone="yes
Pilot error somewhere? Note that I get the same result with the non-Unicode form of the regexp, "([\x20-\x7e]*)", as well.
If I use the simpler "([A-Za-z]*)" it is even more interesting - I get back "yes". I would have expected to get "xml" instead.
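As a cross-check outside LiveCode, here is a minimal Python sketch of what the same patterns match on that first line. Python's re module is not the regex engine LiveCode uses, so this is only a reference point and not a statement of what matchChunk should return. The helper first_match and the variable names are hypothetical, and I use the \x20-\x7e form because Python's re does not accept \x{0020}. Positions are converted to 1-based inclusive values to resemble tStart and tEnd.

```python
import re

# First line of the XML file from the post.
line = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'


def first_match(pattern, text):
    """Return (start, end, matched_text) for group 1 of the first match.

    start and end are 1-based and inclusive, to resemble the tStart/tEnd
    values in the post. Returns None when nothing matches.
    """
    m = re.search(pattern, text)
    if m is None:
        return None
    return (m.start(1) + 1, m.end(1), m.group(1))


# All printable ASCII characters, zero or more.
printable = first_match(r"([\x20-\x7e]*)", line)

# Letters only, zero or more: '*' is allowed to match nothing at all.
letters_star = first_match(r"([A-Za-z]*)", line)

# Letters only, one or more.
letters_plus = first_match(r"([A-Za-z]+)", line)

print(printable)
print(letters_star)
print(letters_plus)
```

In Python the printable-ASCII pattern covers the whole line. The "([A-Za-z]*)" pattern gives an empty match at the very first position, because "<" is not a letter and "*" accepts zero characters; it is the "+" variant that returns "xml". Neither run produces "yes".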
Thanks, Walt