Page MenuHomePhabricator

Storage
Updated 750 Days AgoPublic

This needs to be reevaluated and rewritten. Several major changes to Ratscript's syntax and behavior will necessitate one or more name tables, with references (either immutable or mutable) to values.

This system is designed to accomplish the following goals:

  1. Diminish space wasted due to packing.
  2. Prevent fragmentation.
  3. Reduce dynamic allocation calls.

Catalog Headers

Every variable has a header, which stores its name, type, and data address (see Data Pools). Entries are stored alphabetically by name, to permit binary search. These are allocated as large sectors of memory, referred to as "pages".

There are three sizes:

  • Large Catalog: These are allocated as 65536 B (64 KB) pages, with a maximum of 2048 full-sized entries. These are intended for general use.
  • Medium Catalog: These are allocated as 32768B (32 KB) pages, with a maximum of 1024 full-sized entries. These are intended as the default for classes.
  • Small Catalog: These are allocated as 4096 B (4KB) pages, with a maximum of 128 full-sized entries. They are intended for use in items that rarely have very many items, such as functions. (Thus, simultaneous functions are going to tend to slow down more than single functions and/or straight linear code.)

Each time a new scope is declared, it will request a dedicated catalog. If an inactive one of an appropriate size exists, it will be marked for use. One large and one small catalog is created initially and kept alive for normal use. The initial large catalog would be the global catalog, and the initial small catalog would be used and cleared frequently for nearly all functions encountered in standard use, thus potentially avoiding dynamic allocation completely.

Each entry has the following layout. Sizes listed are maximums.

Name (25B)0x1FType (2B)0x1FData/Address (2B)0x1E
  • Name (25 bytes)
    • The variable name is limited to 25 characters.
    • No null terminator is required - we terminate with US.
  • Type (2 bytes)
    • The first byte of the type is based on the Ratscript data type enumerations.
    • The second byte of the type is used for class IDs on user-defined classes. (Thus, an upper limit of 255 classes can exist at once.)
    • Constants are indicated by the two's complement notation on the data type number.
  • Data/Address (2 bytes)
    • Data types with a capacity 16b and below can be stored directly in this spot.
    • Memory slots (see below) are numbered using a 2-byte hexadecimal address.
  • Separators
    • We use the unit separator US between segments, and record separator RS at the end of each entry.
    • Anything after the last entry should be 0-set.

Each entry has the following layout. Sizes listed are maximums.

Name (25B)0x1FType (2B)0x1FData/Address (2B)0x1E

Creating: We binary search for the new name, and assuming it doesn't turn up, we create the new entry at the last search location...

  1. Last byte should not equal 0x1F, otherwise the page is full. (See below.) If not full...
  2. All values at right >> 32
  3. Write new entry.

Searching: We search for a variable by name, using a binary search, starting on the middle US.
Deleting: We delete a variable by searching for it, and then...

  1. Last byte should not equal 0x1F, otherwise we may need to work across pages. (See below.) If not full...
  2. 0-write entry to be deleted.
  3. All values at right << 32

If we run out of room on a page, we can dynamically allocate another one. We will have to now factor in these additional pages.
Creating:

  1. If the current page [N] is full, >> 32 last page [L].
  2. Copy last entry of [L-1] to front of [L].
  3. Repeat the last two steps for each page approaching current page.
  4. Write last byte of [N] to front of [N+1]
  5. >> 32 [N]

Searching: Binary search pages by their first byte. If current page's first byte is >= target, and next page's first byte < target, work in current page.
Deleting:

  1. 0-write entry to be deleted.
  2. All values at right << 32 on current page [N].
  3. Copy first entry of [N+1] to last entry of [N].
  4. Repeat last two steps for each page approaching [L].

NO PADDING! Use defragment function instead.

  1. Find first unprocessed RS (use iterator).
  2. If there is a 0-fill byte, remove and << 32 everything at right.
  3. Append 0-fill to end.
  4. Repeat until reached end of page.
  5. If there is less than 32 empty bytes available, mark page as full. (How?)

Data Pools

We need to allocate a large pool for anything greater than 16 bits. These would likewise consist of 32-bit entries. The main feature of this pool would be to defragment this memory. The initial size should be 64 KB.

Each 32-bit entry would have a hexadecimal address, which would be stored in the header entry (see above).

Technical Notes and Other Questions

Right, as if the rest of that wasn't technical enough already...

  • We're avoiding STL here, so each pool needs to be stored in a dynamic array by type, NOT a vector.
  • Where do strings get parked? Seeing as they're an array of 8- or 16-bit chars, we can blow through our pools entirely too quickly. Perhaps those pools need to be larger?
  • Any variable longer than 64b would need to either be stored in concurrent spots, or dynamically allocated anyway.
  • How are data structures stored?
Last Author
jcmcdonald
Last Edited
Jul 20 2020, 12:17 PM