Page MenuHomePhabricator

Replace ICU U_Char and UnicodeString with PawLIB onechar and onestring
ClosedPublic

Authored by bdlovy on Jan 16 2020, 4:54 PM.

Details

Summary

Differential Revision to tackle T1125 and part of T993.

Test Plan

No tests have yet been written for this project. The tester CLI builds, however.

Revert Plan

Reset to commit f0a4d2381f7d | D267.

Diff Detail

Repository
rS SIMPLEXpress
Lint
Lint Not Applicable

Event Timeline

bdlovy created this revision.
Restricted Application completed remote builds in Restricted Buildable.Jan 16 2020, 4:54 PM

Just a repo-wide sweep, no real change yet.

bdlovy updated the revert plan for this revision. (Show Details)EditedJan 16 2020, 4:56 PM

simplexpress-source/src/simplex.cpp:78 is my entry point for the translation.

StringPiece docs.

  • Merge remote-tracking branch 'origin/master' into remove-icu
  • Merge remote-tracking branch 'origin/master' into remove-icu
  • Successfully build with onechar/onestring
Restricted Application completed remote builds in Restricted Buildable.Jan 18 2020, 11:59 AM

Successfuly build with "make ready" and "make tester". Tester parses latin basic strings, breaks on extended Unicode as expected "simplexpress-source/src/utf_tools.cpp:138" with a segmentation fault. Valgrind shows memory leaks, haven't started hunting.

@jcmcdonald one option could be to move forward with only ASCII for now and come back to full Unicode as PawLIB evolves? ASCII is sufficient to begin building a rR Ratscript lexer.

bdlovy edited the test plan for this revision. (Show Details)

That sounds smart. You have my go-ahead to only use ASCII right now.

bdlovy retitled this revision from Rip out ICU in favor of PawLIB to Replace ICU U_Char and UnicodeString with PawLIB onechar and onestring.Jan 18 2020, 12:40 PM
  • Separate utf_tools from specifier

SIMPLExpress builds, but the last link stage of make tester fails. Unfortuntely I'm mildly stumped and it's bedtime.

Restricted Application completed remote builds in Restricted Buildable.Jan 21 2020, 7:12 PM

It's darn close...

  • D269 successful build

Found the relevant part of the CMakeLists config...

Restricted Application completed remote builds in Restricted Buildable.Jan 25 2020, 9:39 AM
bdlovy edited the summary of this revision. (Show Details)
bdlovy edited the test plan for this revision. (Show Details)
bdlovy added a project: Unknown Object (Project).

@jcmcdonald I might need some guidance on how to get this revision ready to land. Am I close? The biggest issue I see is the commented out sections that I'm leaving for reference when we come back to T1278. Is there a better way to handle that?

jcmcdonald added a subscriber: mduminer.

This is really just one of those commits that I expect to be pretty fairly broken. You're not so much adding a feature as you are preparing for a refactor; the kitchen looks like a mess because you're demolishing, removing, and replacing some structural elements. Because of that, it's pretty reasonable to suspend P1 on this Differential.

The commented-out code is acceptable here for the same reason. This isn't production code. You're just getting the code into a state where you and @mduminer can work on it.

README.md
11–13

Go ahead and invert these. I prefer Authors to be in alphabetical order by surname, just to ensure there's no "politics" surrounding who's first.

simplexpress-source/include/simplexpress/utf_tools.hpp
52

I suspect StringPiece will still just be a onestring.

simplexpress-source/src/char_sets.cpp
3

These namespaces turned out to be a mistake on my part. This just doesn't look nice...

simplexpress::char_sets::arr_greek_lower

...and this isn't particularly convenient...

using simplexpress::char_sets

char_sets::arr_greek_lower

Let's drop the simplexpress namespace from the whole project, and then just come up with good class names (or specific namespaces) for things that need to be grouped, like char_sets.

139

Isn't ^[ 0-9 <AEBTXaebtx> ]/ inherently Latin anyway?

simplexpress-source/src/unit.cpp
161–162

...at least it shouldn't be. If it is, something needs to change in onechar.

As it stands, this will only look at the first part of a multibyte Unicode character.

This revision is now accepted and ready to land.Jan 30 2020, 8:24 AM

Hrm - @jcmcdonald I'm having trouble landing this, and was unable to clone a new copy. Permission denied. Did anything change with the poermissions for this repo, or is this problem likely local on my end?

 TARGET  Landing onto "master", the default target under git.
 REMOTE  Using remote "origin", the default remote under git.
 FETCH  Fetching origin/master...
git@phabricator.mousepawmedia.net: Permission denied (publickey,keyboard-interactive).
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
Usage Exception: Fetch failed! Fix the error and run "arc land" again.

You have push permissions; I double-checked on that just now.

Notice this line...

Permission denied (publickey,keyboard-interactive).

That means your SSH key is not being accepted. That almost alwas means you don't have your SSH key added to your Phabricator profile. Make sure you've created an SSH key on your computer (you can use an existing one if you prefer), and then upload your public key to your Phabricator profile.

Hm, that's odd - I've successfuly pushed before! I'll have to take a look after this pair session, thank you.

There you go! What'd you do different?

I just had to run ssh-add again! Moshe mentioned he needs to re-run it after reboots sometimes.

That's very odd. You might put that in a Ponder question. I doubt you'll be the last person to have that issue.

jcmcdonald edited projects, added SIMPLEXpress [Project]; removed Unknown Object (Project).Jun 19 2021, 10:26 AM