Parsing scripts for comments

Posted: Mon Apr 05, 2021 12:02 am
by kdjanz
I'm doing a stack that analyses text files of scripts from other stacks.

What is the simplest way to detect a comment?
I don't want to have to compare for "#" or "//" or "--" or "/*" separately, so I'm sure that there must be a better way.

Any hints?

I'm sure Thierry could make a regex - but I'm not sure I could understand it :D 8-)

Kelly

Re: Parsing scripts for comments

Posted: Mon Apr 05, 2021 2:11 am
by dunbarx
Hi.

Regex can do anything. I never use it.
kdjanz wrote:
I don't want to have to compare for "#" or "//" or "--" or "/*" separately,
Why not? I assume you know that something like this is required (pseudo)

Code: Select all

repeat with y = 1 to the number of lines of yourScript
  if "--" or "#" or "/" are in the first few non-space chars in line y of yourScript then
  put y & return after accum
But regex notwithstanding, what turns you off the above sort of ordinary LC gadgetry?

Craig

Re: Parsing scripts for comments

Posted: Mon Apr 05, 2021 2:43 am
by kdjanz
dunbarx wrote:
repeat with y = 1 to the number of lines of yourScript
if "--" or "#" or "/" are in the first few non-space chars in line y of yourScript then
put y & return after accum
I don't think that sort of pseudo code would work. What I don't like the look of is

Code: Select all

if ("--" is in line1) OR \
 ("#" is in line1) OR \
etc., ad nauseam ...

Re: Parsing scripts for comments

Posted: Mon Apr 05, 2021 3:40 am
by dunbarx
Still not sure what the issue is. In a field 1 with:
aasd
-- abcdefg
kjkjhj
// typo
xxxx
# liveCode
dgdg
And this in a button script:

Code: Select all

on mouseup
   get fld 1
   repeat with y = 1 to the number of lines of it
      if char 1 of line y of it is in "/#-" then put y & return after accum
   end repeat
   answer accum
end mouseup
One gets 2, 4 and 6, the numbers of the lines that are comments.

In a sense, this is very mundane, not nearly as sexy as some regex outrage. And there may be some tweaking needed, for example if spaces precede the comment chars. Also, in any LiveCode session, one must choose a single comment character string, so the fact that all three are present in the example above is overkill.
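
A minimal sketch of that tweak, assuming the goal is still only to flag lines that start with a comment. The handler name commentLineNumbers and its variable names are made up for illustration. It uses "word 1 to -1" to drop leading and trailing whitespace, and "begins with" so that the two-character delimiters are tested whole rather than by their first character.

```livecode
-- Hypothetical helper: returns the numbers of the lines in pScript whose
-- first non-blank characters are a comment delimiter, one number per line.
function commentLineNumbers pScript
   local tResult, tLineNum, tLine, tTrimmed
   put 0 into tLineNum
   repeat for each line tLine in pScript
      add 1 to tLineNum
      -- word 1 to -1 strips leading and trailing spaces and tabs
      put word 1 to -1 of tLine into tTrimmed
      if tTrimmed begins with "--" or tTrimmed begins with "#" \
            or tTrimmed begins with "//" or tTrimmed begins with "/*" then
         put tLineNum & return after tResult
      end if
   end repeat
   delete the last char of tResult -- drop the trailing return
   return tResult
end commentLineNumbers
```

Called with the seven lines of field 1 above, this should give 2, 4 and 6 again, and it should keep doing so if the comment lines are indented. Unlike the one-character test, it does not flag a line that merely starts with a single "/" or "-". It still misses comments that follow code on the same line.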

But I find this simple and ordinary, if unexciting, and ask again what disturbs you about doing something similar to this.

Craig

Re: Parsing scripts for comments

Posted: Mon Apr 05, 2021 9:09 am
by stam
Or just use regex;)
Literally this was discussed a couple of days ago here: viewtopic.php?f=7&t=35679&sid=3e7c85626 ... a61#p20405

The limitation in Craig’s solution is that you can’t detect multi-line comments delimited by /* and */, keeping in mind these can be used to also comment in the middle of a line. So just parsing the start of a line won’t do. Also, even for single line comments I tend to add these to the end of a line...

You can of course do all of this with “normal” LC script but that rapidly produces a more complex/lengthy script, where regex would be a lot simpler... in the example in the link above your delim1 would be “//”, “#”, “--” or “/*”, and your delim2 would be return or “*/” respectively.
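
For comparison, here is a rough sketch of what the “normal” LC script route could look like. It is not taken from the linked thread; the handler name commentRanges and all variable names are hypothetical. It walks the script one character at a time, skips quoted strings so that delimiters inside them are ignored, and records each comment as a "startChar,endChar" pair. It assumes that a LiveCode string literal never contains a quote and never spans lines, and it knows only the four delimiters named in this thread.

```livecode
-- Hypothetical helper: returns one "startChar,endChar" pair per line
-- for every comment found in pScript. A sketch, not a full parser.
function commentRanges pScript
   local tRanges, tPos, tLen, tChar, tTwo, tSkip, tEOL, tEnd
   put 1 into tPos
   put the number of chars of pScript into tLen
   repeat while tPos <= tLen
      put char tPos of pScript into tChar
      put char tPos to tPos + 1 of pScript into tTwo
      put 0 into tEnd
      if tChar is quote then
         -- String literal: jump to its closing quote, or to the end of the
         -- line if it is unterminated, so delimiters inside it are ignored.
         put offset(quote, pScript, tPos) into tSkip
         put offset(return, pScript, tPos) into tEOL
         if tSkip = 0 or (tEOL > 0 and tEOL < tSkip) then put tEOL into tSkip
         if tSkip = 0 then exit repeat
         add tSkip to tPos
      else if tTwo is "/*" then
         -- Block comment: runs to the next "*/", possibly several lines on.
         put offset("*/", pScript, tPos + 1) into tSkip
         if tSkip = 0 then
            put tLen into tEnd
         else
            put tPos + tSkip + 2 into tEnd
         end if
      else if tTwo is "--" or tTwo is "//" or tChar is "#" then
         -- Line comment: runs to the end of the current line.
         put offset(return, pScript, tPos) into tEOL
         if tEOL = 0 then
            put tLen into tEnd
         else
            put tPos + tEOL - 1 into tEnd
         end if
      end if
      if tEnd > 0 then
         put tPos & comma & tEnd & return after tRanges
         put tEnd into tPos -- continue after the comment
      end if
      add 1 to tPos
   end repeat
   delete the last char of tRanges -- drop the trailing return
   return tRanges
end commentRanges
```

Each returned line can then be used as a chunk reference, for example "char (item 1 of tRange) to (item 2 of tRange) of tScript", to pull out or colour the comment text. The length of the handler is a fair illustration of the point about script size.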

Re: Parsing scripts for comments

Posted: Mon Apr 05, 2021 12:05 pm
by Thierry
kdjanz wrote:
I'm sure Thierry could make a regex - but I'm not sure I could understand it :D 8-)
Then what's the use of making one? :roll:
dunbarx wrote:
Regex can do anything. I never use it.
Almost, my friend, almost... :wink:


Happy Easter,

Thierry

Re: Parsing scripts for comments

Posted: Mon Apr 05, 2021 3:30 pm
by dunbarx
What Stam says about limitations is all true, as is the extra work needed to handle all such variants of "ordinary" commenting.

But I never knew, or forgot, that the tags /* and */ can be used inside a working line of code. I would never do this, but cool to know...

Craig

Re: Parsing scripts for comments

Posted: Mon Apr 05, 2021 7:42 pm
by kdjanz
The script collection I am working with uses all four variants, with and without a space between the delimiter and the text, as well as having extensive multiline comments - ALL within ONE script. The script is 25K, so there are lots of things to comment on, but the stylistic variation really shows how flexible the LiveCode parser is. Conceptually, it may be simple, Craig, but the devil is in the details as always!

In the end, I lifted some code from the GXL2 editor and made it work for my situation, and it seems to be good enough for what I need.

Thanks for the suggestions.

PS Thierry - don't give up. On another project I actually used a regex wildcard on a filter and it worked a treat in less than 10 characters. I just have to keep practicing so that I don't forget it before I need it again. :D

Re: Parsing scripts for comments

Posted: Thu Apr 08, 2021 10:14 am
by Thierry
kdjanz wrote:
Mon Apr 05, 2021 12:02 am
I'm doing a stack that analyses text files of scripts from other stacks.
What is the simplest way to detect a comment?
Here is one way with regex.

screenshot 2021-04-08.jpg

The regex is on top, the text to parse is on the left, and the result is on the right (black background).

Code: Select all

   if sunnYmatchAll( T, rex, A, N, "both") then
      repeat for each key K in A
         if A[ K][ 1][ 2] is not empty then
            get  colorString
         else if A[ K][ 2][ 2] is not empty then
            get  colorComment
         end if
         put A[ K][ 0] into Z
         set the forecolor of char Z[ 0] to Z[ 1] of fld "fOUT" to IT
      end repeat
   end if
   
Regards,

Thierry


PS: there are techniques for writing regexes so that they are easier to read....

Re: Parsing scripts for comments

Posted: Thu Apr 08, 2021 4:31 pm
by stam
Thierry wrote:
Thu Apr 08, 2021 10:14 am
(?m)(")(?:(?:[^"\\]|\\(?s).)*")|(?|(\#|//|--)(?:.*)|(/\*)(?s).*?\*/)
Dear Thierry, that is one very fine and mind-bending piece of regex :shock: :shock: :shock:

-------------------
edit: This works really well and picks out all comments; but it also picks up any text within quotation marks, was this the intent?
regex.jpg

Re: Parsing scripts for comments

Posted: Thu Apr 08, 2021 5:30 pm
by Thierry
stam wrote:
Thu Apr 08, 2021 4:31 pm
This works really well and picks out all comments;
but it also picks up any text within quotation marks, was this the intent?
Hi Stam,

Yes, that's how it works without using my sunnYrex library.
This can be avoided, but only if you can manage back references
in the replacement text.

The reason is that I need to parse strings to avoid false positives,
e.g. " xxxxx -- # // not a comment "

In my demo, I use this to colorize the strings and the comments with 2 different colors;
thus the proof that we know if it is a comment or a string.

It's possible to filter the resulting array for comments only.
I'll see if I can find some free time to make a variant of this code
to achieve what you would like to have.
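
One way to sketch that filtering without sunnYrex, assuming only the built-in matchChunk function. The pattern below is a simplified stand-in, not the regex shown in the screenshot: group 1 matches a quoted string on one line and group 2 matches a block or line comment. Strings are matched only so that they can be stepped over; ranges are recorded only when group 2 took part in the match. The handler name commentRangesRegex and the variable names are hypothetical.

```livecode
-- Hypothetical helper: returns one "startChar,endChar" pair per line
-- for every comment in pScript, using matchChunk in a loop.
function commentRangesRegex pScript
   local tRegex, tRanges, tOffset, tS1, tE1, tS2, tE2
   -- Group 1: quoted string. Group 2: /* ... */ or a line comment.
   put "(" & quote & "[^" & quote & "\n]*" & quote & ")" into tRegex
   put "|(/\*[\s\S]*?\*/|(?:--|//|#).*)" after tRegex
   put 0 into tOffset
   repeat forever
      put empty into tS2
      -- Match against the part of the script not yet consumed; the
      -- positions returned are relative to that part.
      if not matchChunk(char (tOffset + 1) to -1 of pScript, tRegex, \
            tS1, tE1, tS2, tE2) then
         exit repeat
      end if
      if tS2 is an integer and tS2 > 0 then
         -- Group 2 matched: a comment, so record its absolute range.
         put (tOffset + tS2) & comma & (tOffset + tE2) & return after tRanges
         add tE2 to tOffset
      else
         -- Group 1 matched: a string, so step over it and keep nothing.
         add tE1 to tOffset
      end if
   end repeat
   delete the last char of tRanges -- drop the trailing return
   return tRanges
end commentRangesRegex
```

Keeping the string alternative in the pattern is what prevents the false positive in the example above: the quoted text is consumed as a string before the comment alternative gets a chance to match inside it.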


And finally, this was an interesting exercise
and at the same time a response to Kelly's OP :)

Be well,

Thierry

Re: Parsing scripts for comments

Posted: Thu Apr 08, 2021 5:38 pm
by stam
Thierry wrote:
Thu Apr 08, 2021 5:30 pm
Yes, that's how it works without using my sunnYrex library.
Ah, that makes sense...
Interesting exercise indeed!

Re: Parsing scripts for comments

Posted: Thu Apr 08, 2021 5:49 pm
by FourthWorld
kdjanz wrote:
Mon Apr 05, 2021 12:02 am
I'm doing a stack that analyses text files of scripts from other stacks.
An intriguing challenge, Kelly. What are you going to do with the output?

Re: Parsing scripts for comments

Posted: Thu Apr 08, 2021 10:23 pm
by kdjanz
I'm trying to make charts like this to understand the program flow after parsing the text files generated by Brian Milby's invaluable ScriptTracker:
FlowChart2.png
Do you know of any algorithmic way of sorting out the lines? Right now I manually adjust the nodes and then save their positions so that they will be "pretty" when the script is reopened subsequently.

Re: Parsing scripts for comments

Posted: Fri Apr 09, 2021 3:46 am
by dunbarx
Hi.

When you say "sorting out the lines", what do you mean? I posted a small stack last week in another thread that allows one to move the various nodes and keep the connecting lines intact:
LineDragger.livecode.zip
But I am not sure this is what you meant.

Craig