Serialization Design

Ilya Sher edited this page May 31, 2022 · 8 revisions

NGS Serialization Design (WIP)

Requirements

  • Forward and backward compatibility
  • Easily processable (as much as practical) by existing text-processing tools, with the help of tools that convert to and from the serialized format
  • Support graph of objects
  • Support transient fields/values
  • Support streaming read/write
  • Support enum (when added to the language)

Design

The Format

All values except for strings are little endian.

Overall layout:

  • The 16-byte string NGS-SERIALIZED-- (the two trailing characters are padding to 16 bytes)

  • Format version - 2-byte int with value 1

  • Any number of type-meta-length-value records

    • Format
      • Type - 2-byte int
      • Meta - 2-byte int
        • Bit 0 - action for a filter program that doesn't recognize the type (only if bit 1 is 0)
          • 0 - keep
          • 1 - remove
        • Bit 1 - it's an error if the reader doesn't recognize the type
          • 0 - not an error
          • 1 - an error
        • Bits 2 to 15 - reserved, must be set to 0
      • Length - 4-byte int, the length of the value in bytes; can be zero
      • Value - Length bytes of data
    • Types (type & meta)
      • 1 & 2 - end marker
      • 2 & 0 - type definition chunk
      • 3 & 1 - cryptographic algorithm and parameters (TBD, at beginning of the stream)
      • 4 & 1 - cryptographic signature (TBD, at end of stream)
      • 16 & 2 - object start
      • 17 & 2 - object end
      • Remaining types from 3 to 255 - reserved types
      • 256 to 32767 - predefined types
  • End marker: type=1, meta=2, len=0
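The overall layout can be sketched in Python as below. This is a minimal sketch, not the NGS implementation. The function names (`write_stream`, `read_stream`, `filter_records`) are hypothetical. Record values are treated as opaque bytes. The magic string is assumed to be exactly `NGS-SERIALIZED--`. `filter_records` shows how the two meta bits could combine for a program that only recognizes some types.

```python
import struct

MAGIC = b"NGS-SERIALIZED--"        # assumed 16-byte magic: 14 characters + 2 padding
FORMAT_VERSION = 1

HEADER = struct.Struct("<16sH")    # magic, format version (little endian)
RECORD = struct.Struct("<HHI")     # type, meta, length (little endian)

END_MARKER = (1, 2)                # type=1, meta=2, len=0
META_REMOVE_IF_UNKNOWN = 0b01      # bit 0: a filter drops an unknown type (only if bit 1 is 0)
META_ERROR_IF_UNKNOWN = 0b10       # bit 1: an unknown type is an error
META_RESERVED = 0xFFFC             # bits 2 to 15 must be 0


def write_stream(records):
    """Serialize (type, meta, value_bytes) records, then append the end marker."""
    chunks = [HEADER.pack(MAGIC, FORMAT_VERSION)]
    for rtype, meta, value in records:
        if meta & META_RESERVED:
            raise ValueError("meta bits 2 to 15 are reserved and must be 0")
        chunks.append(RECORD.pack(rtype, meta, len(value)))
        chunks.append(value)
    chunks.append(RECORD.pack(*END_MARKER, 0))
    return b"".join(chunks)


def read_stream(data):
    """Parse a stream into (type, meta, value_bytes) records, stopping at the end marker."""
    magic, version = HEADER.unpack_from(data, 0)
    if magic != MAGIC:
        raise ValueError("not an NGS serialized stream")
    if version != FORMAT_VERSION:
        raise ValueError(f"unsupported format version {version}")
    offset = HEADER.size
    records = []
    while True:
        # struct.error is raised here if the stream ends before the end marker
        rtype, meta, length = RECORD.unpack_from(data, offset)
        offset += RECORD.size
        if (rtype, meta) == END_MARKER:
            return records
        value = bytes(data[offset:offset + length])
        if len(value) != length:
            raise ValueError("truncated record value")
        offset += length
        records.append((rtype, meta, value))


def filter_records(records, known_types):
    """Apply the meta bits as a filter program that only knows known_types would."""
    kept = []
    for rtype, meta, value in records:
        if rtype not in known_types:
            if meta & META_ERROR_IF_UNKNOWN:
                raise ValueError(f"unrecognized type {rtype}")
            if meta & META_REMOVE_IF_UNKNOWN:
                continue  # bit 1 is 0 and bit 0 is 1: remove
        kept.append((rtype, meta, value))
    return kept
```

The code checks bit 1 before bit 0. This mirrors the rule that the keep/remove bit only applies when bit 1 is 0.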

Type definition chunk (data section)

  • type - 2-byte int - the new integer code being assigned to a type
  • length - 4-byte int - the length of data in bytes
  • data - JSON array, specifying
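The following is a sketch of building and parsing the data section of a type definition chunk. It assumes the JSON array is UTF-8 encoded. The page does not yet say what the array specifies, so the sketch accepts any JSON array. `encode_type_definition` and `decode_type_definition` are hypothetical names.

```python
import json
import struct

TYPE_DEF_HEADER = struct.Struct("<HI")  # new type code (2 bytes), data length (4 bytes)


def encode_type_definition(type_code, definition):
    """Build the data section: type code, length, then the JSON array (assumed UTF-8)."""
    if not isinstance(definition, list):
        raise TypeError("definition must be a list (JSON array)")
    data = json.dumps(definition, separators=(",", ":")).encode("utf-8")
    return TYPE_DEF_HEADER.pack(type_code, len(data)) + data


def decode_type_definition(section):
    """Return (type_code, definition) from a data section built as above."""
    type_code, length = TYPE_DEF_HEADER.unpack_from(section, 0)
    start = TYPE_DEF_HEADER.size
    data = section[start:start + length]
    if len(data) != length:
        raise ValueError("truncated type definition")
    return type_code, json.loads(data.decode("utf-8"))
```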

The Tools

  • For compatibility with external tools
    • Line-based format
    • JSON will be used
    • JSON parts will be easily extractable
    • JSON parts will convey information that is of interest to external tools: the main data.
    • Easily extractable JSON parts will not convey information that is of interest mostly to NGS, such as types and view options.
  • For forward and backward compatibility
    • Each metadata item (key-value pair) will be classified into one of the following categories, which specifies the behaviour of an unserializer that doesn't know how to handle the item.
      • "error" - unserializer must know how to process the given item, otherwise it's an error.
      • "keep" - keep the item for further processing down the line
      • "remove" - remove the item
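One way an unserializer could apply the three categories is sketched below in Python. The sketch makes two assumptions:

- Each metadata item arrives as a `(key, value, category)` triple.
- Keys the unserializer knows map to handler functions.

How the category is actually attached to an item is not decided on this page. `process_metadata` and `OnUnknown` are hypothetical names.

```python
from enum import Enum


class OnUnknown(Enum):
    ERROR = "error"    # the unserializer must know how to process the item
    KEEP = "keep"      # pass the item on for further processing down the line
    REMOVE = "remove"  # drop the item


def process_metadata(items, handlers):
    """items: (key, value, category) triples; handlers: key -> function(value).

    Returns (handled, passed_through).
    """
    handled = {}
    passed_through = []
    for key, value, category in items:
        handler = handlers.get(key)
        if handler is not None:
            handled[key] = handler(value)  # known item: category does not matter
            continue
        category = OnUnknown(category)     # ValueError for an unknown category string
        if category is OnUnknown.ERROR:
            raise ValueError(f"don't know how to handle metadata item {key!r}")
        if category is OnUnknown.KEEP:
            passed_through.append((key, value, category.value))
        # OnUnknown.REMOVE: the item is silently dropped
    return handled, passed_through
```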

Open Issues / Unfinished thoughts

  • Support type versions, like Java's serialVersionUID?
  • Consider a place where external tools can store their data, which NGS will ignore but preserve
  • Cryptographically sign locally generated serialized data so it could be more "trusted"?
    • If yes, JWT is probably the best signature format
    • Allow several signatures? This should allow easy certificate rotation, etc.
  • Network friendliness (frames)
  • echo() on non-tty will output serialized data?
  • Track/keep all commands that were involved in the creation of the data?

JSON Serialization for UI

Note: this section is unrelated to the sections above. It is motivated by the urgent need for serialization when communicating with the UI, and is probably not that well thought through.

Top Level

{
  "ngs-serialization": "0.1",
  "data": ...
}
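A reader for this envelope might check the version before touching data. Below is a minimal Python sketch; `wrap` and `unwrap` are hypothetical helpers. Rejecting every version other than "0.1" is an assumption, since no compatibility rules are stated for this format.

```python
import json

SUPPORTED_VERSION = "0.1"


def wrap(data):
    """Put already-serialized data into the top-level envelope."""
    return json.dumps({"ngs-serialization": SUPPORTED_VERSION, "data": data})


def unwrap(text):
    """Check the envelope and return its data part."""
    doc = json.loads(text)
    if not isinstance(doc, dict) or "data" not in doc:
        raise ValueError("not an ngs-serialization document")
    version = doc.get("ngs-serialization")
    if version != SUPPORTED_VERSION:
        raise ValueError(f"unsupported ngs-serialization version {version!r}")
    return doc["data"]
```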

An Object

{
  "type": "UNIQUE-TYPE",
  "id": "UNIQUE-ID",
  "fields": {...},
  "items": [...],
  "value": ...
}
  • type - ngs:type:ngs-lang.org/types/xxx (the resource at the URL does not have to exist)
  • id - ngs:id:1:random-id (version 1 ids are purely random, globally unique ids)
  • At most one of fields, items, or value can be present. For some types, all three can be omitted.
    • fields is used for map/hash-like objects
    • items is used for list/array-like objects
    • value is used for scalars such as numbers, booleans, strings
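The following is a sketch of mapping plain Python values onto this object shape. Several details are assumptions:

- The type names after `ngs-lang.org/types/` (`Hash`, `Arr`, `Int`, `Real`, `Str`, `Bool`, `Null`) are modelled on NGS's built-in type names.
- The random part of each id comes from `uuid.uuid4()`, as one possible source of random, globally unique ids.
- `None` becomes an object with none of fields, items or value.
- `to_ui_object` and `serialize_for_ui` are hypothetical names.

```python
import json
import uuid

TYPE_PREFIX = "ngs:type:ngs-lang.org/types/"  # assumed type names are appended to this
SCALAR_TYPES = [(bool, "Bool"), (int, "Int"), (float, "Real"), (str, "Str")]  # bool before int


def to_ui_object(value):
    """Convert a Python value to a {type, id, fields | items | value} object."""
    obj = {"type": None, "id": "ngs:id:1:" + uuid.uuid4().hex}  # version 1: purely random id
    if value is None:
        obj["type"] = TYPE_PREFIX + "Null"  # fields, items and value all omitted
    elif isinstance(value, dict):
        obj["type"] = TYPE_PREFIX + "Hash"
        obj["fields"] = {str(k): to_ui_object(v) for k, v in value.items()}
    elif isinstance(value, (list, tuple)):
        obj["type"] = TYPE_PREFIX + "Arr"
        obj["items"] = [to_ui_object(v) for v in value]
    else:
        for py_type, name in SCALAR_TYPES:
            if isinstance(value, py_type):
                obj["type"] = TYPE_PREFIX + name
                obj["value"] = value
                break
        else:
            raise TypeError(f"cannot serialize {type(value).__name__}")
    return obj


def serialize_for_ui(value):
    """Wrap the converted value in the top-level envelope."""
    return json.dumps({"ngs-serialization": "0.1", "data": to_ui_object(value)})
```

Each object gets a fresh id here, which is enough for a tree. Shared or cyclic objects would need ids to be reused.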