Redefine keyword item (word, line, ...)
Posted: Tue Nov 04, 2014 6:00 pm
The following is written in a rather formal style. Please do not take this style as "condescending". I tried to write it as simply as possible without becoming imprecise.
This is NOT a discussion about the number of items in current LC.
The post is about a new definition of items, one that avoids the current inconsistencies with "the number of items" and "is among the items". What the number of items is, and what "is among the items" returns, *follows* from it.
Below I write "item" and use comma as delimiter.
This is a simplification for the sake of example. Everything applies equally to words, lines or other substring chunks in place of "item", and to any fixed string as delimiter in place of ",".
The current definition of item in the docs is incomplete.
Nothing in the definition there explains why "1" is an item of "1"!
Yes, the item keyword ignores a comma at the end of a string. But surely it doesn't add a comma to the end of a string and then ignore it?
The special case of empty: Currently this is evaluated as
empty is among the items of empty -- false
empty is among the items of "," -- true
Because the last char of a string is ignored if it is the delimiter, the string "," should be handled like "". Is true=false?
Meanwhile there is one more problem:
Read or write ",a" once from left-to-right and once from right-to-left.
Left-to-right there are currently exactly two items: empty and "a".
Right-to-left there is currently exactly one item: "a".
The same problem comes with "a,".
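The inconsistencies above can be reproduced with a small model. The following Python sketch is not LiveCode and not the engine's actual implementation; it only encodes the rule as described in this post (a delimiter at the very end of the string is ignored). The function name legacy_items is hypothetical.

```python
def legacy_items(s, delim=","):
    """Model of the behaviour described above (an assumption, not engine code):
    split on the delimiter, then ignore one trailing delimiter."""
    if s == "":
        return []                 # the empty string has no items at all
    parts = s.split(delim)
    if parts[-1] == "":
        parts.pop()               # a delimiter at the very end is ignored
    return parts

# "empty is among the items of ..." modelled as an element check
print("" in legacy_items(""))     # empty string
print("" in legacy_items(","))    # single comma

# ",a" read left-to-right versus right-to-left (the reversed string is "a,")
print(len(legacy_items(",a")))
print(len(legacy_items(",a"[::-1])))
```

Under this model the two element checks disagree although "," should be handled like "", and the item count of ",a" depends on the reading direction.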
Now a definition that would solve all these problems:
+++++++ DEFINITION +++++++
An item of a string is every contiguous substring that does not contain a comma and is delimited at left AND right by one of comma or the beginOfString or the endOfString.
From this it follows immediately for any string S (pseudocode):
The number of items of S = 1 + the number of commas in S
This is a mathematical conclusion from the above definition of "item". Every definition of "the number of items" that doesn't fulfil this equation yields a contradiction!
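The definition and the counting rule can be checked mechanically. A minimal Python sketch follows (Python only as neutral notation, not LiveCode). The names items and number_of_items are hypothetical. Python's str.split with an explicit separator happens to cut at every delimiter and keep empty pieces, which is the proposed rule.

```python
def items(s, delim=","):
    """Proposed rule: every delimiter separates two items; nothing is ignored."""
    return s.split(delim)

def number_of_items(s, delim=","):
    return len(items(s, delim))

for s in ["", "1", ",", ",a", "a,", "a,,b"]:
    # the count always equals 1 + the number of delimiters
    assert number_of_items(s) == 1 + s.count(",")
    print(repr(s), items(s))
```

In particular "1" has exactly one item, "1", without any special rule about a trailing comma.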
Summary.
Let S be any string. If
1. T is a contiguous substring of S
2. T does not contain a comma
then there are four cases for T to be an item of S:
(beginOfString and endOfString refer to the string S)
3a. beginOfString & T & endOfString is in S (that is T=S),
3b. beginOfString & T & comma is in S,
3c. comma & T & endOfString is in S,
3d. comma & T & comma is in S.
Any T not fulfilling (1.+ 2.+ one of 3a-3d) is not an item of S.
4. The number of items of S = 1 + the number of commas in S
[No exception for that].
- T may be empty, because empty fulfills 1. and 2.: T=empty is an item of S if and only if T=empty fulfills one of the four cases 3a-3d above.
- Empty contains one item that is empty (case 3a above).
- No new problem with right-to-left; the reading/writing direction yields the same number of items (of course the order changes for more than one item).
- "the items of S" is a counted set (some say list) that contains every item of S as an element.
- The set "the items of S" has at least one element. If exactly one element, then this one element is non-empty, as with S="a", or it is empty as with S="". Thus "empty is among the items of empty" holds true.
- The check "T is among the items of S" is an ELEMENT-check, not a substring-check like "is in".
- The number of items of S = the number of elements of the counted set "the items of S".
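Cases 3a-3d can be folded into a single test by treating beginOfString and endOfString as if they were commas. The following Python sketch does that; it is an illustration under the assumption of a single-character delimiter, and the name is_item is hypothetical.

```python
def is_item(t, s, delim=","):
    """Cases 3a-3d in one test: pad S with a delimiter on both sides,
    then look for delim & T & delim (single-character delimiter assumed)."""
    if delim in t:                      # condition 2: T contains no delimiter
        return False
    return (delim + t + delim) in (delim + s + delim)

# element check versus substring check
print(is_item("a", "ab,c"))   # element check: "a" is not an item of "ab,c"
print("a" in "ab,c")          # substring check on the same strings
print(is_item("", ""))        # case 3a: T = S = empty
print(is_item("", "a,b"))     # no two delimiters are adjacent here
```

The first two lines show the difference between an element check and a substring check: "a" is in "ab,c" but is not among its items.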
Example 1 [Applied to items]
(no matter what's your reading/writing direction):
The three strings ",a,b" and "a,,b" and "a,b," all have the three items: "a" and "b" and empty. Of course the order of the items changes.
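The direction claim can be tested by reversing the string and comparing the item lists. A short Python sketch under the proposed rule follows; the helper name items is hypothetical, and reading right-to-left is modelled simply as reversing the string.

```python
def items(s, delim=","):
    """Proposed rule: split at every delimiter and keep empty items."""
    return s.split(delim)

for s in [",a,b", "a,,b", "a,b,"]:
    forward = items(s)
    backward = items(s[::-1])          # the same string read right-to-left
    # same items in both directions, only their order is reversed
    assert backward == [t[::-1] for t in reversed(forward)]
    print(repr(s), sorted(forward), len(forward))
```

All three strings yield the same three items, and the count does not depend on the direction.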
Example 2 [Applied to words]
Empty is the only word of S=empty.
S=tab contains two words: empty and empty.
S=CR contains two words: empty and empty.
S=space contains two words: empty and empty.
S=quote contains one word: quote.
S=quote&quote contains one word: empty.
Example 3 [Applied to lines]
Empty is the only line of S=empty.
S=CR contains two lines: empty and empty.
Everybody has to rewrite code anyway to use the great new features of LC 6.7 and LC 7.0 and later. This is now a golden chance to redefine chunks in a logically consistent way.
"Einmal ist keinmal" (German, roughly: "once is never"): Of course anyone may individually decide to accept 0<>1 and false<>true in general, except in some cases where one needs 0=1 and false=true. But such things shouldn't be part of the core of a language engine.
[Edit after 20 views. Corrected definition: One of the "delimiters" comma, beginOfString, endOfString must be at left AND at right of an item. Sorry.]