Page MenuHomePhabricator

SIMPLEXpress: Resolve "greedy" problem and negative multiple logic
Closed, CompletedPublic5 Energy Points (d+f*r)


As discussed with @jcmcdonald in the Ratscript chat:

8:59 AM

Hmm, maybe I need to think more about how the negative is supposed to work. I'm guessing the tests aren't very good for it, either. If you can think of a few model/input examples for pass/fail situations it might help?

Also realized the single character literal match test doesn't test for a few things it probably should, going to update that test a bit (much easier to put multiple possible models in a test now that I have a static match function ;D )

11:02 AM

Model: ^d!+/ and ^d+/

ABC and 123
... and 5
 One and 2
A and 42

The model ^d+!/ and ^d+/ should be functionally identical.

....eep, that is , ^d!+/ means the same thing as ^d+!/. I just realized that and up there reads weird. Sorry.

11:59 AM

Hmm, I'll think about it. The way multiples are set up, if I wrote the negative multiple the same, ^d+!/ would match ABC and , ... and , One and , A and , then the match would probably fail because it would already consume everything not a digit before the digit unit.

I'm not quite sure how to resolve the problem of multiples over-matching when they overlap with the next piece of the model without doing any kind of look-ahead, to be honest

12:07 PM

Ahh, yes, it's the "greedy" problem.

What you *could* do to resolve that is check if the character being inspected is the next LITERAL.

That is, if you encounter a space, it might still be in the unit, or it might be in the literal. Let that remain ambiguous somehow until it can be resolved, ergo hitting 1, which fails both the first unit and the literal.


Task Type
Proposed Urgency
g4: Significant
f2: Street
r2: Low
Volatility (Caught At)
Not a Bug
Not a Bug/Unknown

Revisions and Commits

Event Timeline

ardunster triaged this task as p3: Next priority.Dec 2 2020, 1:17 PM
ardunster created this task.

thinking 'out loud':

if attr.multiple on model @ model index:
   while matched > 1 and next():
   do something to compare, next++
   evaluate comparison vs original matched and make a decision

I wonder if each individual unit could be marked with a trilean as "satisfied" or not. (In this example, I'm thinking of a chain of literal characters as one unit.) Then, you could have a simple set of rules:

  1. Each unit starts as maybe.
  2. If the current focus unit n is maybe, check current character against n.
  3. If the current character does not match against n, check against n+1.
    • If it does NOT match n+1 , mark n as false instead.
    • If it DOES match n+1, change n to true and advance current focus unit to n+1.

In order to do this, I imagine the following is needed:

  • Each Unit needs a trilean for "satisfied".
    • true indicates it is FULLY satisfied.
    • maybe that it is NOT fully satisfied (either CAN or MUST have more characters passed.).
    • false means that it FAILED, and therefore the entire model fails.
  • Each Unit must have a check_satisfied() function, which can check if a satisfied=maybe Unit can be converting to satisified=true in its current state. ^d+/ is an example of this: no matter how many numbers it receives, it remains satisfied=maybe, but if it receives even one number, it can be converted to satisfied=true.

To put that another way, given this model:

^l+/ & ^d+/

That breaks down to three units:

  1. '^l+/'
  2. ' & '
  3. '^d+/'

While checking A & 1...

  • (1), (2), and (3) are maybe.
  • Check A against leftmost maybe — (1)...
    • (1) Matches. Unit (2) is still maybe (not fully satisified), leave as maybe.
  • Check (space) against leftmost maybe — (1)...
    • (1): Does not match. Check next unit.
    • (2): Matches. Unit (2) is still maybe (not fully satisified), leave as maybe.
    • Because character matches (2), check if (1) can now be marked true (fully satisfied). (It can.)
  • Check & against leftmost maybe — (2)...
    • (2): Matches. Unit (2) is still maybe (not fully satisfied), leave as maybe.
  • Check (space) against leftmost maybe — (2)...
    • (2): Matches. Unit (2) is fully satisifed, mark as true.
  • Check 1 against leftmost maybe — (3)...
    • (3): Matches. Unit (3) is still maybe (not fully satisfied), leave as maybe.
  • End of string. Check if remaining maybe units can be marked fully satisfied.
    • (3) can now be marked true (fully satisfied).

While checking A+ 1...

  • (1), (2), and (3) are maybe.
  • Check A against leftmost maybe — (1)...
    • (1) Matches. Unit (2) is still maybe (not fully satisified), leave as maybe.
  • Check + against leftmost maybe — (1)...
    • (1): Does not match. Check next unit.
    • (2): Does not match. Mark leftmost maybe — (1) — as false, failing the unit.

Overall it sounds good, but would require rewriting the entire logic of match() I think, and seems like it would be hard to keep track of inputs longer than matches.

Also would require to be reset after every match, because you'd be changing the Units in the model.

(Also, for whatever it's worth, ^l+/ & ^d+/ is parsed as 5 units, including 3 literals in the middle)

Mm, true.

Also, this might be an interesting corner case:


That would match XaX3, but the first X would leave both ^l+/ and X as maybe. It wouldn't even process the second, since the first would match.

Meanwhile, with X1, processing the X would leave both ^l+/ and X as maybe, but processing 1 would only satisfy the third unit, so you'd have to have some way of recognizing that X matched the first two units, but satisified the second one, because the first could be empty.

So you'd need some clear rule for when a character satisifes both n and n+1, and how optionality fits into that.

I'm going to experiment with my current line of thinking for now, but if I can't get it to behave the way I expect then I'll take a second look at rewriting all of match() ;)

my current thought process as written in terrible comments in a code block:

  • temp model index, starts at current index+1
  • while (copy of) matched > 0 and next(temp index):
    • check model on slice of buffer not including first character:
      • if matched to next model, proceed: ++ temp index and move to next item in buffer, store matches; store index of match to next unit
        • if matches longer than the matched section, subtract quantity of matches from index of the match to the next unit, then edit matched variable for the unit pass at the original model index to only include matches before the next literal
    • if its not matched to the next model, iterate without changing temp model index, until we run out of nexts or run out of matches on the multiple unit

Also added those interesting cases to my test.

Let me know how that goes! I'm curious to see how this turns out.

P.S. Flowcharts are often helpful here, especially on whiteboard or paper for maximum flexibility.

Ah, good idea. About out of time for today but I'll look at putting one together later. Maybe start with all the existing logic 😬

ardunster edited projects, added Unknown Object (Project); removed Unknown Object (Project).

@jcmcdonald If we want to be able to assign a unit model in any order (as in, "^d!+/ means the same thing as ^d+!"), unit parsing will have to be substantially rewritten. As is, neither is correct, and only ^!d+/ will produce the intended, negated model. If you want to consider this a bug, not a feature, then we'll make a separate task for rewriting UnitParser::parse() accordingly.

Hmm...I rather like the negation being at the front, actually.

That's the way it's been since I started messing with it, and it makes more sense to me also.

jcmcdonald raised the priority of this task from p3: Next to p4: Now.Dec 15 2020, 8:17 AM

This should be p4 since it's your current task. Your "next" task(s) should be p3.

oh, yeah. I had originally prioritized it lower but then realized that match() really needs to work right for the stuff that supposedly already works before I move on from it, and didn't update accordingly.

ardunster set the point value for this task to 5.Dec 16 2020, 12:27 PM
jcmcdonald edited projects, added SIMPLEXpress [Project]; removed Unknown Object (Project).Jun 19 2021, 10:25 AM