[SIP-81] - Chart creation without a dataset #19953

Closed
eschutho opened this issue May 4, 2022 · 3 comments

Labels
preset-io sip Superset Improvement Proposal

Comments

@eschutho
Member

eschutho commented May 4, 2022

[SIP-81] Proposal for Chart creation without a dataset

Motivation

Currently, a user must create a dataset for every chart they want to create. Many of these charts are short-lived: they never make it to a dashboard, or someone just wants a quick view of their data to share for feedback or to gain insight into their own queries, tables, etc. Many new users don’t understand what a dataset is or why they need one. We want to let people move into dataset usage progressively by allowing them to create a chart quickly from a query, saved query, table, or dataset. Only when they save will we prompt them to name a dataset, which is a much lower barrier to visualizing their data quickly.

Proposed Change

Users should be able to create a chart from the chart page, from SQL Lab, or from a dataset. From Explore or SQL Lab, they need to be able to view a chart, apply filters, and see the list of columns in their query or table just as they do now, but without creating a dataset. Users coming from a dataset view should be able to keep using a dataset to back a chart, as they can today.

This solution is based on the recently approved flow in #18584. Per this flow, users will be able to create a chart from any of the data types listed above. When saving the chart, they will be required to create a dataset. We may relax the requirement to save a dataset in the future.

The first PR, for chart creation with a query, is here: https://github.com/apache/superset/pull/19812/files

As part of SIP 68, we will be creating a mixin that contains all of the necessary functionality to power a chart. By extending that mixin to other models that have the necessary relationships (database, schema, columns) those models can also be used to power a chart.
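
As a rough illustration of the mixin idea (not the actual SIP 68 implementation), the sketch below assumes a hypothetical `ChartPoweringMixin` that only relies on a database, a schema, and a list of columns. Every class, attribute, and method name here is invented for illustration and does not come from the Superset codebase.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Column:
    # Hypothetical column metadata; not Superset's real column model.
    name: str
    type: str


class ChartPoweringMixin:
    """Hypothetical mixin: any model exposing database, schema and columns
    can be used to power a chart in this sketch."""

    database: str
    schema: Optional[str]
    columns: List[Column]

    def column_names(self) -> List[str]:
        return [column.name for column in self.columns]

    def datasource_payload(self) -> dict:
        # The minimal information an explore view would need in this sketch.
        return {
            "type": type(self).__name__.lower(),
            "database": self.database,
            "schema": self.schema,
            "columns": self.column_names(),
        }


@dataclass
class Query(ChartPoweringMixin):
    # A query gains chart-powering behaviour just by extending the mixin.
    database: str
    schema: Optional[str]
    sql: str
    columns: List[Column] = field(default_factory=list)


@dataclass
class Dataset(ChartPoweringMixin):
    # A dataset keeps working through the same interface.
    database: str
    schema: Optional[str]
    table_name: str
    columns: List[Column] = field(default_factory=list)
```

In this sketch, a `Query` and a `Dataset` produce the same payload shape, so code that renders a chart would not need to know which of the two it was given.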

We currently have two types of datasources in the config: SqlaTable (Dataset) and the Druid Datasource. The proposal is that anything a chart connects to should be a datasource. This is in line with what we are trying to achieve, doesn’t add any complicated middle layers, and is easily extensible. With SIP 68 and Superset 2.0, we are in the process of removing the Druid NoSQL Datasource and the datasource-as-config approach, and instead limiting datasources to those classes that have the functionality needed to power a chart.

As part of SIP 68, there is also a PR to convert the ConnectorRegistry, which relies on the configs, to a [DatasourceDAO](#19811). This DatasourceDAO will be used to retrieve any type of object that is configured to be a datasource.
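
To make the lookup idea concrete, here is a minimal sketch of a DAO that resolves a datasource by type and id. It is an assumption-laden illustration, not the code from the linked PR: the `sources` mapping, the `get_datasource` method, the error classes, and the in-memory `store` (standing in for a database session) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Type


@dataclass
class SqlaTable:
    id: int


@dataclass
class Query:
    id: int


@dataclass
class SavedQuery:
    id: int


class DatasourceTypeNotSupportedError(Exception):
    """Raised when the requested type is not registered as a datasource."""


class DatasourceNotFoundError(Exception):
    """Raised when no object with the given type and id exists."""


class DatasourceDAO:
    # Hypothetical registry replacing a config-driven list of datasource classes.
    sources: Dict[str, Type] = {
        "table": SqlaTable,
        "query": Query,
        "saved_query": SavedQuery,
    }

    def __init__(self, store: Dict[Tuple[str, int], object]) -> None:
        # In-memory stand-in for a database session (assumption for this sketch).
        self._store = store

    def get_datasource(self, datasource_type: str, datasource_id: int) -> object:
        if datasource_type not in self.sources:
            raise DatasourceTypeNotSupportedError(datasource_type)
        try:
            return self._store[(datasource_type, datasource_id)]
        except KeyError:
            raise DatasourceNotFoundError(
                f"{datasource_type}:{datasource_id}"
            ) from None
```

Under these assumptions, supporting a new kind of datasource means adding one entry to `sources` rather than changing a config value.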

Examples of specific work to be done per datasource type:

  • Charts by Tables:
    • Import/export:
      • n/a for now, since a chart cannot be saved until it has a dataset.
    • Explore/Dashboard view:
      • When a table is selected as a datasource, we would create an sl_table instance and save it to the chart as its datasource. The sl_table would have all the column information needed to power the explore view.
      • On save, we just create the dataset and point it to the already created Table.
    • SQL Lab to explore:
      • n/a; this flow only applies to queries.
  • Charts by Queries:
    • Import/export:
      • n/a for now, since a chart cannot be saved until it has a dataset.
    • Explore/Dashboard view:
      • The Query will store the column information needed to power the explore view as a JSON blob in a new Columns column. Since queries are immutable, we will only need to read this data, which is currently sent all at once to the client in bootstrap-data and then searched/filtered client side.
      • We are evaluating whether to skip the cache for Queries if that saves time/effort.
    • SQL Lab to explore:
      • A chart will be linked to a Query from this flow. This is the only way that someone can create a chart from a Query.
      • On save, we create a dataset and add the query as the expression.
  • Charts by SavedQueries:
    • Import/export:
      • n/a for now, since a chart cannot be saved until it has a dataset.
    • Explore/Dashboard view:
      • When a SavedQuery is selected as a datasource, we would tie that object to the chart. The SavedQuery would have a new relationship to Column (i.e., sl_columns) for all the column information needed to power the explore view.
      • On save, we create a dataset and add the query as the expression.
    • SQL Lab to explore:
      • n/a
  • Charts by Dataset:
    • We need to update the old SqlaTable to a new Sl_dataset as part of SIP 68. Everything else will stay the same.
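
The "Charts by Queries" flow above can be sketched roughly as follows: an immutable query carries its column metadata as a JSON blob, the columns are filtered the way a client-side search might do it, and a dataset is only created at save time with the query as its expression. The field names (`columns_json`, `expression`) and the helper functions are assumptions made for this example, not the real Superset models.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)  # frozen mirrors the "queries are immutable" assumption
class Query:
    id: int
    database_id: int
    schema: Optional[str]
    sql: str
    columns_json: str  # hypothetical JSON blob, e.g. '[{"name": "a", "type": "INT"}]'

    def columns(self) -> List[dict]:
        # Column data is only ever read, never updated.
        return json.loads(self.columns_json)


@dataclass
class Dataset:
    name: str
    database_id: int
    schema: Optional[str]
    expression: str
    columns: List[dict]


def filter_columns(query: Query, search: str) -> List[dict]:
    # Case-insensitive substring match, similar in spirit to a client-side filter.
    needle = search.lower()
    return [c for c in query.columns() if needle in c["name"].lower()]


def save_chart_from_query(query: Query, dataset_name: str) -> Dataset:
    # On save, the user must name a dataset; the query becomes its expression.
    name = dataset_name.strip()
    if not name:
        raise ValueError("a dataset name is required to save the chart")
    return Dataset(
        name=name,
        database_id=query.database_id,
        schema=query.schema,
        expression=query.sql,
        columns=query.columns(),
    )
```

Until `save_chart_from_query` is called, nothing in this sketch creates a dataset, which is the behaviour the proposal aims for.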

New or Changed Public Interfaces

New UI flows are described here:
#18584

New dependencies

None

Migration Plan and Compatibility

We will need to add a relationship to sl_columns for Queries and SavedQueries.
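
One possible shape for such a relationship is a many-to-many association table. The sketch below uses an in-memory SQLite database purely for illustration; apart from `sl_columns`, the table and column names (`queries`, `query_columns`, and so on) are hypothetical, and a real migration would be written with the project's own migration tooling.

```python
import sqlite3

# Hypothetical schema: only the sl_columns name comes from the proposal.
SCHEMA = """
CREATE TABLE sl_columns (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    type TEXT
);
CREATE TABLE queries (
    id INTEGER PRIMARY KEY,
    sql TEXT NOT NULL
);
CREATE TABLE query_columns (
    query_id INTEGER NOT NULL REFERENCES queries(id),
    column_id INTEGER NOT NULL REFERENCES sl_columns(id),
    PRIMARY KEY (query_id, column_id)
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    conn.executescript(SCHEMA)


def columns_for_query(conn: sqlite3.Connection, query_id: int) -> list:
    # Follow the association table from a query to its column names.
    rows = conn.execute(
        """
        SELECT c.name
        FROM sl_columns AS c
        JOIN query_columns AS qc ON qc.column_id = c.id
        WHERE qc.query_id = ?
        ORDER BY c.id
        """,
        (query_id,),
    ).fetchall()
    return [row[0] for row in rows]
```

A SavedQuery relationship could follow the same pattern with its own association table.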

Rejected Alternatives

  1. Create a temporary dataset without explicitly asking the user to do anything

    Pros: Simple for engineering, seamless, not much extra work.

    Cons: Users will see a bloated list of datasets in their dataset crud view and won’t know what they are.

    1b. Mark these datasets as hidden and don’t show them on the CRUD page.

    Pros: Simple, easy to build. Users don’t see extra datasets.

    Cons: Having two different types of datasets gets complicated, especially now that we are cleaning up the virtual vs. physical distinction. We would now also have hidden vs. visible datasets, and we would be saying that the chart is backed by a query or table when in reality it’s not.

  2. Create a dataset just during the request cycle

    Pros: Doesn’t bloat the user’s CRUD list; There aren’t two types of datasets that we have to deal with

    Cons: Creating a dataset on every request is also complicated and could hurt performance, especially if we have to query the user’s database too often.

  3. Request the column data from the db each time we need that information

    Pros: We don’t need to store any extra data except on the client side.

    Cons: Poor performance, and could incur extra cost to the user for db usage.

  4. Make a lightweight dataset by storing just column data in redis

    Pros: We don’t need to deal with any database models and/or database

    Cons: We would be adding a separate middleware to the models when we don’t need to. Plus, we would need to write all of the logic for storing/retrieving the data.

@superset-github-bot superset-github-bot bot added preset-io Superset-Community-Partners Preset community partner program participants labels May 4, 2022
@eschutho eschutho changed the title SIP- Chart creation without a dataset DRAFT SIP- Chart creation without a dataset May 4, 2022
@eschutho eschutho removed the Superset-Community-Partners Preset community partner program participants label May 4, 2022
@eschutho eschutho changed the title DRAFT SIP- Chart creation without a dataset [SIP-81] - Chart creation without a dataset May 10, 2022
@eschutho eschutho added the sip Superset Improvement Proposal label May 10, 2022
@eschutho eschutho closed this as completed Jun 7, 2022
@simonvanderveldt

simonvanderveldt commented Sep 6, 2022

@eschutho This issue is marked as done, I see #19981 is merged (although I don't understand the relation to this SIP tbh) and available in 2.0.0. We're running 2.0.0, but I don't see a way to create a chart without creating a dataset. Just wanted to check if this is really done? Or maybe I am missing a setting somewhere?

@eschutho
Member Author

eschutho commented Sep 7, 2022

Hi @simonvanderveldt, charts by queries will be available in version 2.1 which is in the early stages of the release process now. There were a few breaking changes that went into 2.0 that were necessary in order for the charts by queries feature to be built. The charts by table and saved queries features are currently on hold while we work on some other features.

The SIP is marked as done to indicate that it was approved, not necessarily that the work has been completed. So beginning with 2.1 you should be able to go from SQL Lab to Explore without creating a dataset.

@simonvanderveldt

@eschutho All clear, thanks for the clarifications! I'll keep an eye on the 2.1 release then :)

Projects
Status: Implemented / Done
2 participants