Redefine keyword item (word, line, ...)

Posted: Tue Nov 04, 2014 6:00 pm
by [-hh]
The following is written in a rather formal style. Please do not take this style as "condescending". I tried to write as simply as possible without becoming imprecise.

This is NOT a discussion about the number of items in current LC.
This post is about a new definition of items, one that avoids the current inconsistencies with "the number of items" and "is among the items". What the number of items is, and what "is among the items" returns, *follows* from it.

Below I write "item" and use comma as delimiter.
This is a simplification for the sake of example. Everything holds equally for words, lines or other substring chunks instead of "item", and for any fixed string as delimiter instead of ",".


The current definition of item in the docs is incomplete.
Nothing in the definition there explains why "1" is an item of "1"!

Yes, the item keyword ignores a comma at the end of a string. But surely it doesn't add a comma to the end of a string and then ignore it?

The special case of empty: Currently this is evaluated as
empty is among the items of empty -- false
empty is among the items of "," -- true

Because the last char of a string is ignored if it is the delimiter, the string "," should be handled like "". Is true=false?

There is now one more problem:
Read or write ",a" once from left to right and once from right to left.
Left-to-right there are currently exactly two items: empty and "a".
Right-to-left there is currently exactly one item: "a".
The same problem comes with "a,".
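
The asymmetry can be reproduced with a small model. This is only a sketch in Python, not LiveCode: `current_items` is a hypothetical helper that mimics the item counts reported in this thread (one trailing delimiter is dropped, the empty string has no items). It is not the engine's implementation.

```python
def current_items(s, delim=","):
    """Model of the reported counts: one trailing delimiter is ignored."""
    if s == "":
        return []                      # the empty string has no items
    parts = s.split(delim)
    if s.endswith(delim):
        parts.pop()                    # drop the empty piece after the trailing delimiter
    return parts

for s in (",a", "a,"):
    mirrored = s[::-1]                 # the same characters read right to left
    print(repr(s), len(current_items(s)), "|", repr(mirrored), len(current_items(mirrored)))
```

In this model ",a" has two items while its mirror image "a," has one, which is the left/right difference described above.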

Now a definition that would solve all these problems:
+++++++ DEFINITION +++++++
An item of a string is every contiguous substring that contains no comma and is delimited at left AND right by one of comma, the beginOfString or the endOfString.
From this it follows immediately for any string S (pseudocode):
The number of items of S = 1 + the number of commas in S
This is a mathematical conclusion from the above definition of "item". Every definition of "the number of items" that doesn't fulfil this equation yields a contradiction!
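
The counting rule can be checked mechanically. A minimal sketch in Python (not LiveCode), assuming that the proposed items are exactly the pieces obtained by cutting the string at every delimiter; `proposed_items` is a hypothetical name used only for illustration.

```python
def proposed_items(s, delim=","):
    """Pieces between delimiters, bounded by a delimiter or by a string end."""
    return s.split(delim)

for s in ("", "1", ",", ",a", "a,", "a,,b"):
    items = proposed_items(s)
    # the number of items is always one more than the number of delimiters
    assert len(items) == 1 + s.count(",")
    print(repr(s), items)
```

Note that the empty string yields exactly one item (the empty one) and "1" yields the single item "1", without any special case.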

Summary.
Let S be any string. If

1. T is a contiguous substring of S
2. T contains no comma

then there are four cases for T to be an item of S
(beginOfString and endOfString refer to the string S):

3a. beginOfString & T & endOfString is in S (that is, T=S),
3b. beginOfString & T & comma is in S,
3c. comma & T & endOfString is in S,
3d. comma & T & comma is in S.

Any T not fulfilling (1. + 2. + one of 3a-3d) is not an item of S.

4. The number of items of S = 1 + the number of commas in S
[No exception to that].

Remarks.
  • T may be empty, that is, because empty fulfills 1. and 2.:
    T=empty is an item of S if and only if T=empty fulfills one of the four cases 3a-3d above.
  • Empty contains one item that is empty (case 3a above).
  • No new problem with right-to-left; the reading/writing direction yields the same number of items (of course the order changes for more than one item).
  • "the items of S" is a counted set (some say list), that contains as element every item of S.
  • The set "the items of S" has at least one element. If exactly one element, then this one element is non-empty, as with S="a", or it is empty as with S="".
    Thus "empty is among the items of empty" holds true.
  • The check "T is among the items of S" is an ELEMENT-check, not a substring-check like "is in".
    The number of items of S = the number of elements of the counted set "the items of S".
Example 1 [Applied to items]
(no matter what's your reading/writing direction):
The three strings ",a,b" and "a,,b" and "a,b," all have the three items: "a" and "b" and empty. Certainly the order of the items changes.
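
Example 1 and the element-check remark can be illustrated with a short sketch. It is written in Python rather than LiveCode and assumes the proposed definition; `proposed_items` is a hypothetical helper, and `Counter` stands in for the "counted set".

```python
from collections import Counter

def proposed_items(s, delim=","):
    """Pieces between delimiters, as in the proposed definition."""
    return s.split(delim)

# the three strings of Example 1 hold the same counted set of items
strings = (",a,b", "a,,b", "a,b,")
bags = [Counter(proposed_items(s)) for s in strings]
print(all(bag == bags[0] for bag in bags))

# "is among the items" is an element check, "is in" a substring check
print("a" in proposed_items("ab,c"))   # element check: "a" is not an item
print("a" in "ab,c")                   # substring check: "a" occurs in the string
```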

Example 2 [Applied to words]
Empty is the only word of S=empty.
S=tab contains two words: empty and empty.
S=CR contains two words: empty and empty.
S=space contains two words: empty and empty.
S=quote contains one word: quote.
S=quote&quote contains one word: empty.

Example 3 [Applied to lines]
Empty is the only line of S=empty.
S=CR contains two lines: empty and empty.

Everybody has to rewrite code anyway for using the great new features of LC 6.7 and LC 7.0 and later. This is now a golden chance to redefine chunks in a logically consistent way.

"Einmal ist keinmal" (German, roughly: "once is never"): Of course one may individually decide to accept 0<>1 and false<>true in general, except in some cases where one needs 0=1 and false=true. But such things shouldn't be part of the core of a language engine.

[Edit after 20 views. Corrected definition: One of the "delimiters" comma, beginOfString, endOfString must be at left AND at right of an item. Sorry.]

Re: Redefine keyword item (word, line, ...)

Posted: Wed Nov 05, 2014 5:08 am
by dunbarx
"Einmal ist keinmal"
My favorite, from "The Unbearable Lightness of Being"

There has been discussion for decades, literally, about those trailing commas, that "a," and ",a" do not contain the same number of items. I personally think they ought to, but that ship has sailed long ago, and I make sure I know how this works when I code.

But the language is at least consistent, if not symmetrical and therefore "beautiful". It may only be academic that reading a string backwards does not produce the same number of delimited terms. And I can even see a scenario when such a reading might yield different, and therefore erroneous results. But that ship...

So please make sure that Mark Waddingham, at least, sees your post. He will appreciate your style and effort, and also your point. The problem, though, with changing such a basic part of the structure would be that lots of existing code would break. Nobody will entertain this.

Craig

Re: Redefine keyword item (word, line, ...)

Posted: Wed Nov 05, 2014 9:15 am
by [-hh]
Hi Craig.

You cannot (generally) code around a logical contradiction. This is impossible.

The dictionary says in its "item" (keyword) entry:
(*) Note: In LiveCode, if the last character of a string is the itemDelimiter, then this character is ignored by the item keyword.

Now follow this rule:
What is the number of items of ",,"? It is the number of items of ",". OK?
What is the number of items of ","? It is the number of items of "". OK?
In general:
The number of items of N commata = the number of items of N-1 commata for every fixed natural number N. Use this to prove by mathematical induction:
If LC's rule (*) above is applicable, then every natural number N is equal to zero.

LC returns
the number of items of ",," is 2
the number of items of "," is 1
the number of items of "" is 0

This is obviously computed the opposite way, by starting from zero and setting:
(**) The number of items of S = the number of commata in S
Rule (**) is close to the conclusion about the number of items in my first post, but it yields a logical contradiction to rule (*).
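
The clash between the two rules can be made concrete. A minimal sketch in Python (not LiveCode), assuming rule (*) is read literally as "a delimiter at the end of the string is ignored"; `ignore_trailing` and `reduce_fully` are hypothetical names, and `str.count` plays the role of rule (**).

```python
def ignore_trailing(s, delim=","):
    """Rule (*) read literally: a delimiter at the end of the string is ignored."""
    return s[:-len(delim)] if s.endswith(delim) else s

def reduce_fully(s, delim=","):
    """Apply rule (*) again and again until it no longer changes the string."""
    while s.endswith(delim):
        s = ignore_trailing(s, delim)
    return s

for n in range(4):
    s = "," * n
    # left: what rule (*) reduces N commas to; right: rule (**), the comma count
    print(n, repr(reduce_fully(s)), s.count(","))
```

Under the literal reading every string of commas collapses to the empty string, so all of them would have the same number of items, whereas counting the commas gives N.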

How do you (for general N) code around "0=1=2=3= ... =N" ?
Explain that to a beginner who is able to count down to zero from any natural number N.

We can see from other posts about new versions of LC that hardly anybody uses these to recompile old code; most other motherships ("opinion leaders") of LC/MC/HC advise in chorus against that. [Sorry, have you ever been called a mothership? :-)]

Moreover, the number of cases where old code could break because counting is now correct is certainly smaller than the number of cases where old code has to be recoded because numToChar() changed.

Hermann

p.s. One could also introduce the new attribute "old" for the current definitions:
"old item", "old word", "old line", "old number".

Then everybody who wants to use his old code with a new LC version simply has to replace "item/word/line/number" with "old item/old word/old line/old number" in his stack file and its stack files. The wonderful LC "Find and Replace" (menu Edit) does this with one click, in at most 500 millisecs on average, even for large projects.
Yes, "add 1 to the old number of old items of last old line of myString" reads weird, if anybody ever reads that old part of a script. But using the attribute "effective" for the new code would require even more weirdness.

Re: Redefine keyword item (word, line, ...)

Posted: Wed Nov 05, 2014 10:55 am
by livecodeali
Hi Hermann,

Firstly what I would say in general is that definitions in natural language are determined by use. LiveCode as a programming language obviously has slightly different aims to those of other languages, one of which is that it should reflect in some sense the way certain words are generally understood. Also the language has been around for a while. This means people have got used to the way certain things behave - the behaviour is more important than the definition. That is why I don't think we can make these wholesale changes. Nevertheless, let me present a sort of heuristic treatment of the item chunk, based on trying to satisfy the following desiderata for it (and this applies equally to the line chunk):

1) There is a string such that the number of items of that string is 0
2) A trailing delimiter is optional if possible
3) x is among the items of y holds for exactly n values of x, where n = the number of items of y

These three things are I believe uniquely satisfied by LC's item, except in a few cases (which I consider bugs) which I will list in a moment.

Note that the string referred to in 1) can only sensibly be the empty string. Also it is a consequence of 1) that the trailing delimiter for the empty item must always be present. Thus the character is not ignored in the case that the last item is the empty item. In all other cases, the number of items should be the same whether the last character of the string is the itemDelimiter or not.

So you correctly identify the first bug (a documentation bug) - The dictionary entry for the item keyword should say: if the last item of a string is not empty, then a trailing itemDelimiter is ignored by the item keyword.

2) is pretty fundamental to LiveCode - I'll wager a great many scripts rely on it. And since it is incompatible with the maxim

"The number of items of S = 1 + the number of commas in S"

we are forced to discard the latter.

From 3), we have that empty is not among the items of empty, since the number of items of empty is 0.

------------------------------------------------------------------------------------------------------------------------------------------------------

Now, onto the bugs.

1) empty should not be among the items of "a,"

The number of items of "a," is 1, and the result of "x is among the items of y" should be identical to the result of the following function

function isAmongItems x,y
   -- element check: compare x with each item of y in turn
   repeat with i = 1 to the number of items of y
      if x is item i of y then return true
   end repeat
   -- no item of y equals x
   return false
end isAmongItems
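
For readers who want to try the intended behaviour outside LiveCode, here is a hedged Python port of the function above. It assumes the corrected rule from this post (a trailing delimiter is ignored only if the last item is not empty); `items_terminated` is a hypothetical helper modelling that rule, not the engine.

```python
def items_terminated(y, delim=","):
    """Each item ends at a delimiter; the final delimiter is optional
    unless the last item is empty."""
    pieces = y.split(delim)
    if pieces[-1] == "":
        pieces.pop()                   # nothing follows the final delimiter
    return pieces

def is_among_items(x, y, delim=","):
    items = items_terminated(y, delim)
    for i in range(1, len(items) + 1):  # i runs from 1 to the number of items
        if x == items[i - 1]:
            return True
    return False

print(is_among_items("", "a,"))    # "a," holds the single item "a"
print(is_among_items("", "a,,"))   # "a,," holds "a" and the empty item
```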

2) put x into item n of y should ensure a trailing delimiter if x is empty

This is an issue when n > the number of items of y, or n = the number of items of y and y does not already contain a trailing delimiter.

put empty into item 3 of "a,b" -- results in "a,b," where it should result in "a,b,,"
put empty into item 2 of "a,b" -- results in "a," where it should result in "a,,"

Otherwise the number of items does not reflect the fact that one of the items is the empty item.

3) delete item x of y should ensure a trailing delimiter if (item x - 1) is empty

Similar to 2). Deleting an item should always result in the number of items decreasing by one.

currently,
delete item 3 of "a,,b" -- results in "a,"
which is a change from 3 items to 1 item
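
The results asked for in 2) and 3) can be sketched as follows. This is a Python model (not LiveCode) of one possible reading of the proposed fix, in which an empty last item is always written with its delimiter; all function names are hypothetical, and values of x that contain the delimiter are not handled.

```python
def items_terminated(y, delim=","):
    """Items under the rule: a trailing delimiter is optional unless the last item is empty."""
    pieces = y.split(delim)
    if pieces[-1] == "":
        pieces.pop()
    return pieces

def serialize(items, delim=","):
    """Write the items back, keeping the delimiter after an empty last item."""
    text = delim.join(items)
    if items and items[-1] == "":
        text += delim
    return text

def put_into_item(x, n, y, delim=","):
    items = items_terminated(y, delim)
    items += [""] * (n - len(items))   # pad with empty items up to item n
    items[n - 1] = x
    return serialize(items, delim)

def delete_item(n, y, delim=","):
    items = items_terminated(y, delim)
    del items[n - 1]                   # assumes 1 <= n <= the number of items
    return serialize(items, delim)

print(repr(put_into_item("", 3, "a,b")))
print(repr(put_into_item("", 2, "a,b")))
print(repr(delete_item(3, "a,,b")))
```

In this model the item count after `put_into_item` equals n whenever n exceeds the old count, and `delete_item` always lowers the count by exactly one.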

4) revDBQueryList should append a trailing delimiter if the last element is empty

In order to be consistent with the rest of the above, empty items must always be followed by a delimiter.

5) The dictionary entry for the item keyword should say: if the last item of a string is not empty, then a trailing itemDelimiter is ignored by the item keyword.

I have filed a bug report here:
http://quality.runrev.com/show_bug.cgi?id=13936

I believe it might be possible to correct these inconsistencies without breaking too much existing code, but I could do with some feedback on that issue.

------------------------------------------------------------------------------------------------------------------------------------------------------

As an aside, the way LiveCode 7 deals with right to left languages is to distinguish between logical order and display order. The items of a given string are calculated using the logical order, and so will be determined correctly, namely the number of items of ",דגכגכת" is 1 (PHPBB seems to want to make that LTR, but if you type it in LiveCode 7 you will get what is expected). Beyond right-to-left languages, I don't see a use case for the number of items being the same when reading right to left. This is also not true of codepoints. If I have a string consisting of (U+0301) and "a" then there are 2 chars read left to right, but only one read right to left.
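
The codepoint remark can be approximated in a few lines. This is a rough sketch in Python, assuming a simplified notion of "char" in which a combining mark joins the character before it; it is not full Unicode grapheme segmentation and not LiveCode's implementation, and `approx_char_count` is a hypothetical name.

```python
import unicodedata

def approx_char_count(s):
    """Rough count: a combining mark merges with the preceding character, if there is one."""
    count = 0
    for index, ch in enumerate(s):
        if index == 0 or unicodedata.combining(ch) == 0:
            count += 1                 # starts a new visible character
    return count

s = "\u0301" + "a"                     # combining acute accent first, then "a"
print(approx_char_count(s), approx_char_count(s[::-1]))
```

Read left to right the accent has no base character and counts on its own; read right to left it attaches to the "a".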

Ali

Re: Redefine keyword item (word, line, ...)

Posted: Wed Nov 05, 2014 3:48 pm
by [-hh]
Hello Ali,

it's good that you come in here and especially list a large collection of 'bugs' resulting from the current definition of "item".
__________

You say a trailing delimiter is optional "if possible". This is not unique as long as you can't say what's "possible", but you certainly mean at least this special case:

"a," is handled like "a". OK?

But what about "a"? This is not defined to contain an item. We can't say "a," is handled like "a" is handled like "a," ...

Because empty is not an item of empty you have additionally to define (and add to the dictionary?):
4) a *non-empty* string S that doesn't contain the delimiter has one item: S (the string itself).
__________

Regarding your current efforts, let me give just two examples that show what kind of 'problems' are still there *after* applying your remedies.

These examples apply to your new, not yet implemented rule (which is a necessary condition to be able to resolve the 'bugs'):

(+) Empty items must be followed by the delimiter.

[Ex. 1]
Start with S=empty.
Set itemdelimiter to "I" and put empty into item 1 of S -- S="I"
Set itemdelimiter to "L" and put empty into item 1 of S -- S="LI"
Set itemdelimiter to "A" and put empty into item 1 of S -- S="ALI"

That is, with your new rule you can create any string S simply by switching the itemdelimiter and then putting *empty* into item 1 of an initially empty string. (Starting with 7.0, even in one step, by setting the itemdelimiter to S.)

With my definition S remains empty, containing 1 item: empty.

[Ex. 2]
Start again with S=empty.
Set itemdelimiter to "A" and put empty into item 2 of S -- S="AA"
Set itemdelimiter to "L" and put empty into item 2 of S -- S="AALL"
Set itemdelimiter to "I" and put empty into item 2 of S -- S="AALLII"

S="AALLII" has now 3 items with the itemdelimiter "A", 3 items with the itemdelimiter "L" and 2 items with the itemdelimiter "I" although you always worked on item 2.

With my definition S becomes "ALI" and has now two items with each one of the itemdelimiters "A","L","I".
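
Ex. 2 can be replayed with a small model. A sketch in Python (not LiveCode) covering Ex. 2 only: `put_empty_terminated` models rule (+), `put_empty_split` models the definition from the first post, and both names as well as the two counting helpers are hypothetical.

```python
def count_terminated(s, delim):
    """Item count when a trailing delimiter is ignored unless the last item is empty."""
    pieces = s.split(delim)
    return len(pieces) - 1 if pieces[-1] == "" else len(pieces)

def count_split(s, delim):
    """Item count under the proposed definition: 1 + number of delimiters."""
    return 1 + s.count(delim)

def put_empty_terminated(n, s, delim):
    """Rule (+): an empty item must be followed by the delimiter."""
    items = s.split(delim)
    if items[-1] == "":
        items.pop()
    items += [""] * (n - len(items))
    items[n - 1] = ""
    text = delim.join(items)
    return text + delim if items[-1] == "" else text

def put_empty_split(n, s, delim):
    """Proposed definition: items are exactly the pieces between delimiters."""
    items = s.split(delim)
    items += [""] * (n - len(items))
    items[n - 1] = ""
    return delim.join(items)

s_plus, s_split = "", ""
for d in "ALI":                        # put empty into item 2, switching the delimiter each time
    s_plus = put_empty_terminated(2, s_plus, d)
    s_split = put_empty_split(2, s_split, d)

print(s_plus, [count_terminated(s_plus, d) for d in "ALI"])
print(s_split, [count_split(s_split, d) for d in "ALI"])
```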
__________
livecodeali wrote:As an aside, the way LiveCode 7 deals with right to left languages is to distinguish between logical order and display order.
Your argument is good for the writer, but I wonder how you tell a reader, any reader, the display order of ",1"?
__________

Before I wrote my opening post above, I spent a whole day in total trying to "rescue" the current state, essentially following your current arguments, as you do now. I still think it's impossible and that the old behaviour should disappear.

Kind regards, Hermann

Re: Redefine keyword item (word, line, ...)

Posted: Wed Nov 05, 2014 3:51 pm
by Mikey
See my longstanding rant: ALL ITEMS ARE CREATED EQUAL, EVEN THE LAST ONE

I don't bloody care about somebody's legacy code. Deprecate the old functionality as of 7.0 and say that in 8.0 it's gonna change. End of story. Hell, 7 makes certain keywords different.

Re: Redefine keyword item (word, line, ...)

Posted: Wed Nov 05, 2014 5:30 pm
by FourthWorld
Mikey wrote:...7 makes certain keywords different.
Very few, and mostly in either ways that continue to support older code, or in the case of charToNum/numToChar being augmented with the addition of byteToNum/numToByte, in ways where the user base was given many years' advance notice to rework any affected code.

Re: Redefine keyword item (word, line, ...)

Posted: Wed Nov 05, 2014 5:57 pm
by Mikey
8 won't be coming out in 6 months, and this is not a new issue. At all. It's been a complaint for a long time, and there are ways to support the legacy code when 8 comes out, via a property that the developer can set, if they choose the goofy way.

All items were created equal. Including the last one.

Re: Redefine keyword item (word, line, ...)

Posted: Wed Nov 05, 2014 7:22 pm
by FourthWorld
Mikey wrote:8 won't be coming out in 6 months, and this is not a new issue. At all. It's been a complaint for a long time, and there are ways to support the legacy code when 8 comes out, via a property that the developer can set, if they choose the goofy way.

All items were created equal. Including the last one.
It may be helpful to note that I don't have a strong opinion on this either way. But I know many who do, on both sides. 27 years of legacy code is not a small consideration. And it's not like this is the only oddity in the language, or that LiveCode is the only language with historical oddities.

If Mark Waddingham chooses to make the change that's fine with me. But it's not a change that can be taken lightly.

Re: Redefine keyword item (word, line, ...)

Posted: Wed Nov 05, 2014 8:02 pm
by jacque
Mark Waddingham has addressed the problems here: http://quality.runrev.com/show_bug.cgi?id=10727 Any changes would need to account for the issues he lists.

The current behavior isn't difficult to understand if you stop thinking of delimiters as dividers, and understand that the engine treats them as the final character in a chunk. A trailing carriage return is the final character of the line. A trailing comma is the final character of an item. When asking for "the number of items" or the value of a particular item, the engine strips out the delimiting character, if it exists, before returning the value.

Re: Redefine keyword item (word, line, ...)

Posted: Wed Nov 05, 2014 8:10 pm
by dunbarx
Jacque.
jacque wrote:The current behavior isn't difficult to understand if you stop thinking of delimiters as dividers, and understand that the engine treats them as the final character in a chunk
That was well said. And ought to be the last word on the subject, like it or not. The issue of course arose with such strings as ",,," and angst created by the emptiness in between. But if you close your eyes and apply what I will now call "Gay's Axiom", then even that can be accepted, like fate itself must be.

Hermann and Mikey may not be mollified, Ali's documentation fixes notwithstanding, and I will always harbor a wistful longing for the "readable" number of items as opposed to the "engine's" number of items. But so be it.

Craig

Re: Redefine keyword item (word, line, ...)

Posted: Wed Nov 05, 2014 8:25 pm
by Mikey
I think you should use the database calls more often, and try to constantly remember that the last column may or may not be empty. Now apply that to every single time you have to use chunks. Then after you write a wrapper to overcome this behavior, try to remember the thing you kluged to overcome this.

You know how, when you are building a mobile app, when LC encounters some error, it just silently stops executing the script, and the only hint you have that something bad just happened is because you get some bizarre message from some user that your app just did or didn't do something that it should never do or not do?

All items are created equal, including the last one.

Re: Redefine keyword item (word, line, ...)

Posted: Wed Nov 05, 2014 8:48 pm
by jacque
Mikey wrote:I think you should use the database calls more often, and try to constantly remember that the last column may or may not be empty. Now apply that to every single time you have to use chunks. Then after you write a wrapper to overcome this behavior, try to remember the thing you kluged to overcome this.
Yes, understood. In the QCC report I linked to, Mark Waddingham agrees that the database commands should probably deal with trailing delimiters for you. If that's the primary issue with item delimiters, then his suggestion would be a good compromise and would save you some trouble. I think if I were affected more by this, I'd push him to implement those fixes.

Re: Redefine keyword item (word, line, ...)

Posted: Wed Nov 05, 2014 9:16 pm
by Mikey
Been there, repeatedly done that, and...here we are. Check the dates.

Re: Redefine keyword item (word, line, ...)

Posted: Thu Nov 06, 2014 12:48 am
by [-hh]
@jacque and Craig:

I would prefer to discuss facts here, not esoterica:
Is 0=1? The current engine behaviour tries to prove that (see above) and, Mikey is right, abuses the last char of a string for that.

And all developers have to work around it. Funny game.
But is 0=1, or is it not?
Collecting contradictions doesn't remove them; this will be Ali's dilemma (see his post above).