In plain and simple terms, how do I make sure my queries will take less than 1 second to return, taking into account "sharding" considerations? #651
Hello @lancejpollard! Great question, and thank you for including the example query.

When to shard and how to prepare for it

Generally, you won't benefit from sharding until the database is in the range of 250 GB to 1 TB of data. A single (non-sharded) PlanetScale database can scale much higher than that, but around that data size is when you really start to benefit from the advantages of sharding, such as faster backup and restore, faster schema changes, and better resiliency. Even without sharding, there are many benefits to using Vitess, such as how easily it handles huge numbers of connections (there is a built-in proxy layer known as "vtgate").

Preparing for sharding

The most important decision, as you've noticed, is picking the shard key. When Vitess sees a query for a sharded table, it looks at the shard key value and directs the query to the correct shard (there is essentially a mapping between that value and the shard that contains the data). For your application, that means each query would need to include the shard key so Vitess can route it to a single shard instead of scattering it across all of them.

Query performance and getting under 1 second

To be direct with you, 1 second is quite slow for an indexed query. I'd be curious to see the query plan for it. If that query is already efficient (hitting indexes, no table scans), my guess for why it is slow would be that the database does not have enough memory to keep the indexes in RAM and is having to hit disk. If that's the case, I'd expect you'd see quite a performance increase from adding more RAM. I'd recommend looking into that closely before making any other changes; I suspect that by solving it, you can get a lot more headroom for your application on your current database.
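For example (just a rough sketch, not something from our docs), this is roughly what checking the query plan from Node.js could look like with the mysql2 driver. The table and column names here ("search", "string__id") are guesses based on your description, so swap in your real query:

```js
// Rough sketch: check whether the slow query is actually using an index
// before reaching for sharding. Table/column names are guesses from this
// thread; replace the SQL with your real query.
const mysql = require("mysql2/promise");

async function main() {
  const conn = await mysql.createConnection({
    host: process.env.DB_HOST,
    user: process.env.DB_USER,
    password: process.env.DB_PASSWORD,
    database: process.env.DB_NAME,
  });

  // 1. Look at the query plan. A "type" of ALL or a huge "rows" estimate
  //    means a table scan / missing index.
  const [plan] = await conn.query(
    "EXPLAIN SELECT * FROM search WHERE string__id = ?",
    ["some-id"]
  );
  console.table(plan);

  // 2. The same shape matters for sharding later: a query that filters on the
  //    shard key can be routed to one shard; one that doesn't gets scattered
  //    to every shard and merged, which is much slower.
  const [rows] = await conn.query(
    "SELECT * FROM search WHERE string__id = ? LIMIT 50",
    ["some-id"]
  );
  console.log(rows.length);

  await conn.end();
}

main().catch(console.error);
```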
A year ago I decided on Supabase because it is PostgreSQL (which I'm familiar with, and which has amazing fuzzy text search support). Yesterday I realized it is not going to scale beyond something like 500 concurrent users hammering the site. Now I'm debating between Neon and PlanetScale. Neon seems nice because it is also PostgreSQL, so I won't have to fully rewrite my app for MySQL, though there is still some porting involved. But it too is limited in scale and performance, and I am unsure it will consistently give me sub-1-second queries.
So here I am looking further into PlanetScale, but my main concern now (other than the fact that it has no fuzzy text search or foreign key constraints) is the sharding considerations you have to take into account so that your queries stay performant (and hit that sub-1-second mark). The docs have too much text and not enough code and bullet points summarizing things like I am 5 years old. But I love the monospaced font!!!
Can you explain, in simple (ideally Node.js) code snippets and simple bullet points, what I need to do to (a) structure my database tables and (b) manage/control the sharding process and keys? I don't get it yet, and it's my main hesitation about jumping in all the way.
From my conversations with ChatGPT about PlanetScale, what I remember is that sharding happens automatically somehow, yet you need to be careful not to have your data spread across different shards, because (a) read queries across shards are SLOW (2-10x slower), and (b) multi-step transactions are not supported for queries/inserts/updates/deletes that span multiple shards. (I've sketched below how I currently picture this.)
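To make that concrete, here is a rough Node.js sketch of what I think "keeping related data on one shard" would mean in practice. The table and column names are made up from my own schema, and I may be misreading how Vitess actually routes things:

```js
// Hypothetical sketch: if string, transcription, and pronunciation were all
// sharded on string__id, then all the writes for one string land on the same
// shard and can share a normal transaction. (Names are made up from my
// schema; I may have the Vitess details wrong.)
const mysql = require("mysql2/promise");

async function saveString(conn, stringId, text, ipa) {
  await conn.beginTransaction();
  try {
    await conn.query(
      "INSERT INTO string (string__id, text) VALUES (?, ?)",
      [stringId, text]
    );
    await conn.query(
      "INSERT INTO transcription (string__id, ipa) VALUES (?, ?)",
      [stringId, ipa]
    );
    // Every row shares the same string__id, so (as I understand it) they all
    // map to the same shard and this stays a single-shard transaction.
    await conn.commit();
  } catch (err) {
    await conn.rollback();
    throw err;
  }
}
```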
Given that, how do I manage my data? For example, I am working with word/language data, and I have about a dozen tables supporting this JSON structure (much of its structure was left out of this photo):
Using that as a rough base/example, what do I need to learn/read to figure out how to reorganize my SQL tables if I were to bite the bullet and migrate from Supabase to PlanetScale?
One query that Supabase is struggling with is fairly simple:
This table has about 200 million records so far.
I am fetching these "search records", then I use string__id to fetch the string records. From the string records I also fetch the transcription records by string__id, and the pronunciation records, each as separate queries, etc. So I'm not sure how to handle this kind of thing once sharding is taken into account; as a basic example, the fetch chain looks roughly like the sketch below.
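Here is a simplified Node.js version of that fetch chain (hypothetical table/column names; the real queries are more involved):

```js
// Simplified version of my current fetch chain. Each step is a separate
// round trip, and I assume each one would need the shard key to stay on
// one shard after migrating.
const mysql = require("mysql2/promise");

async function search(conn, term) {
  // 1. Find matching "search records" (this is the ~200 million row table).
  const [searchRows] = await conn.query(
    "SELECT string__id FROM search WHERE term = ? LIMIT 20",
    [term]
  );
  const ids = searchRows.map((row) => row.string__id);
  if (ids.length === 0) {
    return { strings: [], transcriptions: [], pronunciations: [] };
  }

  // 2. Fetch the string records by string__id.
  const [strings] = await conn.query(
    "SELECT * FROM string WHERE string__id IN (?)",
    [ids]
  );

  // 3. Fetch transcription and pronunciation records the same way,
  //    each as its own query.
  const [transcriptions] = await conn.query(
    "SELECT * FROM transcription WHERE string__id IN (?)",
    [ids]
  );
  const [pronunciations] = await conn.query(
    "SELECT * FROM pronunciation WHERE string__id IN (?)",
    [ids]
  );

  return { strings, transcriptions, pronunciations };
}
```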