Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Testbed: 16 bit memory layout for buffer #1855

Closed
wants to merge 25 commits into from

Conversation

jerch
Copy link
Member

@jerch jerch commented Dec 21, 2018

Counterpart to #1854 but with UTF16 codepoint and attr indirection.

Implements:

  • 16 bit buffer layout as discussed in Buffer performance improvements #791
  • utf8 input encoding
  • direct buffer access in InputHandler.print (therefore the JS array buffer will not work here)

Buffer layout:

|              uint16_t               |              uint16_t               |
| attr pointer(13) comb(1) wcwidth(2) |          BMP codepoint(16)          |
|                                  uint32_t                                 |
|                                   cell                                    |

Other than in the test described in #791 (comment) I changed the following things here:

  • access is done by a single 32 bit value cell
  • attr storage is now a simple JS array, matching is done by indexOf instead of a RB tree

The changes show much better runtime behavior than the old test, its still slower than the 32 bit version. Memory usage is the same as in the old test. The attr storage can be further optimized by using another typed array.

More detailed test results will follow.

/cc @Tyriar, @mofux, @bgw

@jerch jerch added the work-in-progress Do not merge label Dec 21, 2018
@jerch
Copy link
Member Author

jerch commented Dec 21, 2018

First results (tested with ls -lR /usr/lib in demo app with default settings):

grafik

grafik

grafik

grafik

@jerch
Copy link
Member Author

jerch commented Dec 21, 2018

The results above are much more promising now regarding the runtime compared to the same test run in the 32 bit version #1854 (comment). Saved memory is ~ 0.5 MB.

Note that my test call only created max 5 attr sets for all rows, thus indexOf does not show the bad O(n) runtime yet. This is likely to change for more colors in a row. Same goes for the char content - since the combined workaround kicks in earlier than in the 32 version, high unicode chars will see a much worse runtime.

So things to address in further tests:

  • replace attr storage JS array with typed array
  • more different attrs per row
  • high unicode + combined char tests

@jerch
Copy link
Member Author

jerch commented Dec 22, 2018

Results in xterm-benchmark:

            Case "#1" : 10 runs - average runtime: 1728.82 ms
            Case "#1" : 10 runs - average throughput: 27.63 MB/s

@jerch
Copy link
Member Author

jerch commented Dec 23, 2018

The attr storage is still flawed as it will hog memory over time for many ongoing cell changes which might even survive a buffer roundtrip due to the recycling 😊. No way around the ref counting I guess.

Also with js array the attr storage is already at 200 bytes for just 2 one single number entries (due to preallocing empty slots), while the line content typed array for 87 cols is just 532 bytes per line.

Much room for improvements still.

@Tyriar
Copy link
Member

Tyriar commented Dec 27, 2018

attr storage is now a simple JS array, matching is done by indexOf instead of a RB tree

Are they packed something like 24 bg, 24 fg, 5 remaining attributes? The fact that number is actually a float64 would probably also slow things down? The webgl renderer needs fast access to bg and fg as they're used as keys for the glyph cache.

@jerch
Copy link
Member Author

jerch commented Dec 28, 2018

@Tyriar Currently the attr storage does not deal yet with RGB values, it just stores and compares the single attr number we currently use. With true color support the storage will have to match against 2 numbers. Imho its not possible to squeeze them into one number (we need around 58 bits, but a number can only store 53 exact bits).
Thus the final attr storage will have to do 2 number comparisons (slightly slower). This would change with Uint64 support, which is in draft (v8 supports this since summer).

@jerch
Copy link
Member Author

jerch commented Dec 29, 2018

Closing for now and marking as reference as we might want to address this in the future again.

Lessons learned from the 16bit vs. 32 bit version:

  • saves ~1 MB memory for 1k lines with typical terminal input
  • memory footprint depends highly on needed indirections (worst case: all cells contain high unicode + different colors - memory will almost double compared to the 32 bit version)
  • UTF16 handling is ~5 MB/s more costly than UTF32 during input
  • UTF16 is slightly faster for translateToString due to being directly mappable to JS string charCodes
  • tracing and matching of attr usage is a big deal memory- and runtime-wise, something like ref counting will impact runtime further (not tested here)
  • attr storage/indirection access is further optimizable by holding/comparing the ref in renderers (needs changes in access logic)
  • no further reliable predictions possible without a fully working attr storage with true color support

Currently the 32 bit version wins due to being faster and much easier to implement with true color in mind.

@jerch jerch closed this Dec 29, 2018
@jerch jerch added reference A closed issue/pr that is expected to be useful later as a reference and removed work-in-progress Do not merge labels Dec 29, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
reference A closed issue/pr that is expected to be useful later as a reference
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants