NNB edited this page Sep 11, 2021 · 18 revisions

NGS Data Model Design - WIP

This page describes the data model that NGS will use to manipulate resources. A resource is any kind of external object. Examples: an EC2 instance, a Terraform logical resource, a file (on the local machine or in an S3 bucket), a line for a specific machine in an ssh "known hosts" file.

Requirements

  • Unified resource manipulation API. The work has been started in the AWS library, see Resource API methods below.
  • Unified view of a resource, combining information about the resource that comes from different sources. This is especially important for the UI. For example, an AWS Instance can have information about it coming from different sources:
    • EC2 (describe-instances API call)
    • EC2 (instance load balancer health information)
    • CloudTrail (created time, created by user, modified time, modified by user fields)
    • CloudWatch logs
    • Consul (and service checks)
    • Monitoring system (instance services status)
    • Graphing system
    • Terraform/CloudFormation (logical resource name)
    • Output (probably processed) of an SSH command executed against the server
    • Billing (Example: "aggregated cost of listed instances last month: xxx" or alike)
    • Reserved instances coverage
    • Spot instances pricing
  • Generating textual representation of resources
    • There should be a way to generate textual representation of both
      • An existing resource
      • Steps taken (using interactive shell for example) to create a resource (these will include more semantic information as opposed to existing resource)
    • Textual representation examples:
      • NGS code
      • AWS CLI code
      • Terraform/CloudFormation definition
    • Handle specific vs abstract. Examples:
      • is it specifically ami-xxxxxx or is it latest Debian 9.X AMI for the region?
      • is it subnet-xxxxxx and subnet-yyyyyy or these are subnets tagged with role=front?
      • is it operation on instance i-xxxxxx and i-yyyyyy or on instances with specific tags or AMIs?
  • Synchronous API adapters for asynchronous APIs: support polling commands after an update
    • Polling commands should be able to report the following situations:
      • Should continue waiting
      • Permanent error
      • Success (expected value is set)
  • Support for asynchronous API
    • Support for "wait" commands, such as SSH availability after EC2 instance creation
  • Change reporting (did converge actually modify the resource; if yes, which properties and how)
  • Modular design. It should be easy to add:
    • New Schema pieces
    • New modules/connectors to work with new APIs
  • Manipulating schema
    • Schema should be easily imported/exported
    • API/CLI to view/edit the schema (the data model)
  • Support API calls quotas.
  • Support resource limits.
  • Ability to make diffs (between states of the system at different points in time). Allows:
    • Generate code (NGS / Terraform)
    • Functionality such as "apply this to other region(s) / account(s)".
  • Support logical aliases which resolve to different resources depending on context. For example per-region AMIs for same logical image. Debian-9 AMI alias should then resolve to different AMIs, based on region.
  • Support modifying resources by means of external programs.
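
The polling requirement above (continue waiting / permanent error / success) could be sketched roughly as follows. This is only an illustration in Python, not NGS code; `PollResult`, `wait_for`, and the fake `instance_is_running` poller are hypothetical names introduced for the example.

```python
import enum
import time
from typing import Callable


class PollResult(enum.Enum):
    CONTINUE = "continue"      # should continue waiting
    PERMANENT_ERROR = "error"  # waiting longer cannot help
    SUCCESS = "success"        # expected value is set


def wait_for(poll: Callable[[], PollResult], attempts: int = 10, delay: float = 0.0) -> bool:
    """Call poll() until it reports SUCCESS; raise on permanent error or timeout."""
    for _ in range(attempts):
        result = poll()
        if result is PollResult.SUCCESS:
            return True
        if result is PollResult.PERMANENT_ERROR:
            raise RuntimeError("permanent error while polling")
        time.sleep(delay)
    raise TimeoutError(f"gave up after {attempts} attempts")


# Hypothetical stand-in for an asynchronous API: the reported state
# changes over successive "describe" calls.
states = iter(["pending", "pending", "running"])


def instance_is_running() -> PollResult:
    state = next(states)
    if state == "running":
        return PollResult.SUCCESS
    if state == "terminated":
        return PollResult.PERMANENT_ERROR
    return PollResult.CONTINUE


became_running = wait_for(instance_is_running)
```

Keeping "permanent error" separate from "continue waiting" lets the synchronous adapter fail fast instead of polling until the timeout.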

Design

  • Each resource type will have schema definition
  • Schema will contain
    • CLI (and other?) commands used to create and delete particular resource type
    • Field names and types
    • Field value formats. This will make it possible, for example, to understand
      • That i-xxxxxx means EC2 instance type, unique id. Command line tools could be greatly improved.
      • That digits with dots are IPs, which can be on instances, load balancers, in DNS records, Elastic IPs, etc.
      • s3://xxxx/yyyy is a unique id of an object in S3 bucket.
      • arn:... formats
    • Fields properties (such as: read-only, auto-generated, field is unique id)
      • These will be NGS types? Will allow something like if f is UniqueId { ... }.
    • CLI (and other?) commands used to retrieve and update any particular field
    • Relations between resources (between types as well as between individual resources)
      • Directed arrows with types. Examples:
        • EC2 instance -> is in VPC -> VPC
        • Subnet -> is in VPC -> VPC
        • EC2 instance -> has disk -> EBS volume
        • ELB -> has instance -> EC2 instance
        • AWS Lambda -> uses artifact -> S3 Object
      • For each relation type:
        • Required? For example EC2 VPC instance is required to have "is in VPC" arrow
        • API methods to establish and tear down relation ???
  • Resource API methods
    • create - creates a resource with given properties
    • delete - deletes given resource
    • converge - create or update a resource to converge to specified properties.
      • have a way to specify properties that will not be converged to but only used during creation

This approach is somewhat similar to configuration management and orchestration systems. Important differences are:

  • I'm viewing it mostly from the scripting and UI angle, where small steps are the focus, as opposed to passing all the information to the system and then running a phase in which all the resources are converged to the desired state. Puppet RAL is somewhat similar.
  • Another difference from "classical" configuration management systems is that most of the resources are expected to be manipulated via APIs. While those systems focus on "providers" which do the work, the proposed model focuses on the data and its semantic meaning.
  • Easy extensibility as all information gathering and manipulation operations are done via running external commands (or maybe calling APIs).
  • Classical configuration tools don't deal with relations between resource types.

Schema (examples based)

Object types schema (example 1)

  • Type name - restype:aws/ec2/instance ???
  • Display name - such as AWS EC2 Instance
  • CLI name - i (because of i-xxxxxx) ???

Object types schema (example 2)

  • Type name - restype:aws/vpc/eip
  • CLI name - eip
  • Id property - PublicIp
  • Commands
    • List - aws ec2 describe-addresses
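
As an illustration only, example 2 could be held as plain data and used to pull resource ids out of the List command's output. The Python names below (`ResourceType`, `resource_ids`, `container_key`) are hypothetical, and the sample JSON is made up to resemble `aws ec2 describe-addresses` output rather than captured from a real account.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class ResourceType:
    type_name: str
    cli_name: str
    id_property: str
    commands: Dict[str, List[str]] = field(default_factory=dict)


EIP = ResourceType(
    type_name="restype:aws/vpc/eip",
    cli_name="eip",
    id_property="PublicIp",
    commands={"list": ["aws", "ec2", "describe-addresses"]},
)


def resource_ids(rtype: ResourceType, list_output: str, container_key: str) -> List[str]:
    """Extract ids from the JSON printed by the type's List command.

    container_key says where the resources sit in the output; the schema
    example does not specify this, so it is an assumed extra piece of schema.
    """
    return [item[rtype.id_property] for item in json.loads(list_output)[container_key]]


# Made-up sample output; the address is from a documentation-only IP range.
SAMPLE_OUTPUT = json.dumps({
    "Addresses": [
        {"PublicIp": "198.51.100.7", "AllocationId": "eipalloc-xxxxxx", "Domain": "vpc"}
    ]
})

ids = resource_ids(EIP, SAMPLE_OUTPUT, "Addresses")
```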

Resource types' relations (example 1):

  • Relation name - reltype:aws/vpc/in-vpc
  • Display name - is in VPC
  • Reverse relation name - reltype:aws/vpc/vpc-contains
  • CLI name - in-vpc ???
  • Left type - restype:aws/ec2/instance
  • Right type - restype:aws/vpc/vpc
  • Commands
    • Connect - n/a

Resource types' relations (example 2):

  • Relation name - reltype:aws/vpc/has-eip
  • Display name - 'has Elastic IP'
  • Reverse relation name - reltype:aws/vpc/eip-is-on
  • CLI name - has-eip ???
  • Left type - restype:aws/ec2/instance
  • Left id property name - InstanceId
  • Right id property name - PublicIp
  • Right type - restype:aws/vpc/eip
  • Commands
    • List - aws ec2 describe-addresses
      • Note that "Domain": "vpc" is a required filter
    • Connect - aws ec2 associate-address --allocation-id ${right.AllocationId} --instance-id ${left.InstanceId}
    • Disconnect - aws ec2 disassociate-address --association-id ${link.AssociationId}
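
The `${left...}`, `${right...}` and `${link...}` placeholders in the Connect and Disconnect commands imply some template expansion step. A rough Python sketch, assuming placeholders always have the form `${role.Property}`; `render_command` is a hypothetical helper name.

```python
import re
import shlex
from typing import Any, List, Mapping

_PLACEHOLDER = re.compile(r"\$\{(\w+)\.(\w+)\}")


def render_command(template: str, context: Mapping[str, Mapping[str, Any]]) -> List[str]:
    """Split the template into argv first, then expand ${role.Property}.

    Expanding after splitting means a value containing spaces cannot
    turn into extra command line arguments.
    """
    def expand(match: "re.Match[str]") -> str:
        role, prop = match.group(1), match.group(2)
        return str(context[role][prop])

    return [_PLACEHOLDER.sub(expand, token) for token in shlex.split(template)]


CONNECT = ("aws ec2 associate-address --allocation-id ${right.AllocationId} "
           "--instance-id ${left.InstanceId}")

argv = render_command(CONNECT, {
    "left": {"InstanceId": "i-xxxxxx"},
    "right": {"AllocationId": "eipalloc-xxxxxx", "PublicIp": "198.51.100.7"},
})
```

A missing role or property raises `KeyError`, which seems preferable to running a command with an empty argument.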

Remarks

  • AWS botocore has JSON files which describe different services and their API.
    • Included
      • Data structures to pass when calling API
      • Data structures returned from API calls
    • Not included
      • The model - what are the fields and types of the objects.
        • The data structures which describe API call results include pagination, idempotency tokens, etc. Then you need to guess where the resources are in the data structure.
        • Not all semantic data is present. For example, InstanceId is just a String. In practice it's (1) auto-generated (2) unique id and (3) has special string format.
      • Filters description. That's also the reason that AWS CLI does not have filters completion.
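
The semantic information that botocore leaves out (an InstanceId is more than a String) is what a schema with value formats could supply. A small Python sketch of format-based recognition follows; the regular expressions are simplified assumptions about the id formats, and every kind label except `restype:aws/ec2/instance` is made up for the example.

```python
import ipaddress
import re
from typing import Optional

# (kind, pattern) pairs; patterns are deliberately simplified assumptions.
_PATTERNS = [
    ("restype:aws/ec2/instance", re.compile(r"i-[0-9a-f]{8}([0-9a-f]{9})?")),
    ("s3-object", re.compile(r"s3://[^/]+/.+")),
    ("arn", re.compile(r"arn:[^:]*:[^:]*:[^:]*:[^:]*:.+")),
]


def classify(value: str) -> Optional[str]:
    """Guess what kind of thing a plain string refers to, or return None."""
    for kind, pattern in _PATTERNS:
        if pattern.fullmatch(value):
            return kind
    try:
        ipaddress.ip_address(value)
        return "ip"
    except ValueError:
        return None
```

With something like this, a command line tool given `i-0123456789abcdef0` could infer the resource type without being told.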

TODO

  • "wait" commands
    • schema
    • methods
  • Design command line tools for manipulating resources, similar in spirit to na but more generic.

Open Issues / Maybe TODO

  • Data driven fields/columns. Example: spot instance pricing is only relevant for spot instances.
  • Maybe: support queries batching (for very low API calls quotas such as route53)
    • Related: Handle API calls that can update/create multiple resources at a time.
  • Bring ideas to make development of na easier.
  • converge is not allowed to delete and then create a replacement resource in order to "modify" read-only properties such as AWS Instance AMI. Maybe this should be allowed under the condition that it will not surprise the caller, maybe allow_delete=true or delete_policy=... or something similar.
  • Servers and containers
  • Ability to create DAG of described resources and parallelize similar to Terraform?
    • Doesn't fit the simple top-to-bottom execution model.
    • Doesn't fit "it's just a library for simpler more powerful coding" model.
    • Great performance boost.
    • Need to think about general DAG facility in NGS, maybe then it will fit well.
  • Ability to list and handle together resources of different types.
    • Examples
      • Resources based on their tags and not type
        • CLI - aws resourcegroupstaggingapi get-resources
        • Web
      • all resources from Terraform
      • all resources in a specific region
      • all resources in a specific VPC
      • all "failing" resources from monitor
  • Should these also be resources? Looks too much like a configuration management system then:
    • A user
    • A service
  • Which standards and tools can help?
    • Would RAML / Swagger be of any use?
  • Maybe Puppet RAL can be used by NGS as yet another API for manipulation of resources? Will there be enough schema information?
  • Completion support
  • If a file is say Nginx configuration file, when the content is set, should it be validated (by running Nginx syntax check nginx -t for example) ?
  • Sometimes change status (whether a resource or a particular property of a resource was updated) is important, sometimes it is not. Maybe different API calls or commands can be used.
  • It would be nice to handle creation/deletion and per-property changes specifically (meaning having different events triggered) as opposed to classical systems where any kind of change triggers update of another resource.
  • Tools to manage the data model?
    • It would be nice to create big chunks of AWS model from the botocore JSON files for example
    • Central exchange?
    • Imagine running an AWS CLI command + wizard to generate additional piece of schema
  • Collaboration?
    • Locking support (Terraform style)?
    • Central logging?
  • Transport support for remote resources
    • A file on remote machine can be created using SSH transport for example
  • How to track which properties are OK to change and which will be overridden by Terraform/CloudFormation? This situation needs at least to cause a warning.
    • A property which is in lifecycle > ignore_changes should be OK
    • Would this require complete understanding of Terraform data model (all providers)? That's quite a task.
    • When used interactively, it should be suggested to modify the definition (Terraform/CloudFormation) when trying to modify an instance of a resource.
    • A good starting point with relatively small amount of work is just to notify that a resource at hand has Terraform/CloudFormation definition.
  • Support import/export of data
    • Examples
      • Input from hosts.csv list of machines to be operated on by ip or InstanceId column.
      • Output to hosts.json when including data (columns) from EC2 and Nagios.
  • Schema for schema?
  • Remember that different values in a property might require different commands to manipulate and not all transitions are possible. Think about stopping and running EC2 instance (changing the State property).
  • Warning: the description of translating a property change into a command may turn into a Turing-complete DSL.
    • Would it be OK to have lambdas associated with some of the properties changes? Hinders interoperability.
  • Note that some commands will require data structures as command line arguments (route53 rrset manipulation for example)
  • Take a good look into https://github.com/GoogleCloudPlatform/magic-modules
  • In case of AWS consider the almost 1:1 relation between API methods and the AWS CLI. Maybe schema can have mapping for both? Because of AWS CLI startup time, it should be much faster to just do the API request. On the other hand AWS CLI commands can be shown, edited, reused, etc.
  • Types inheritance? Elastic IP -> EC2 Elastic IP ; Elastic IP -> VPC Elastic IP
  • Support hooks? Could be used for example for chat notifications when resources are created/destroyed/updated.
  • Locking for teamwork?
  • In-shell chat with team members? If every object will have id it should be easy to use non-integrated chats anyway.
  • Some kind of changelog? (or rely on CloudTrail?)
    • Sync among team members so they know what others are doing?
      • Just work in the shell and all update commands go into the log?
    • Maybe just push the changes description into a chat channel?
      • Which format will be convenient for copy+paste into the shell to get more info about the changed object?
  • Views. Should be configurable. That's how one can define which information and in which detail to show. Examples:
    • "capacity" (or "operational"?) - instance types, ebs sizes, reserved iops, CPU usage, load balancer perspective on instances health, etc.
    • "security" - security groups with detailed rules, etc
    • "billing" - costs
  • Provide resource provenance information - when and how it was created.
  • For CloudFormation - provide changes preview.
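
One of the open issues above notes that different target values of a property may need different commands and that not all transitions are possible (the EC2 instance State example). A purely data-driven version of that idea could be a table keyed by (current, desired) value, which avoids a DSL for the simple cases. The table and `command_for` below are a hypothetical Python sketch, not a settled design, and cover only two transitions.

```python
from typing import Dict, List, Tuple

# (current state, desired state) -> command template; "{id}" is filled in later.
TRANSITIONS: Dict[Tuple[str, str], List[str]] = {
    ("stopped", "running"): ["aws", "ec2", "start-instances", "--instance-ids", "{id}"],
    ("running", "stopped"): ["aws", "ec2", "stop-instances", "--instance-ids", "{id}"],
}


def command_for(instance_id: str, current: str, desired: str) -> List[str]:
    """Return the argv that moves State from current to desired.

    An empty list means nothing needs to be done; an unsupported
    transition raises ValueError.
    """
    if current == desired:
        return []
    try:
        template = TRANSITIONS[(current, desired)]
    except KeyError:
        raise ValueError(f"no transition from {current!r} to {desired!r}") from None
    return [part.format(id=instance_id) for part in template]


start = command_for("i-xxxxxx", "stopped", "running")
```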