In plain and simple terms, how do I make sure my queries will take less than 1 second to return, taking into account "sharding" considerations? #651
Hello @lancejpollard! Great question, and thank you for including the example query.

When to shard and how to prepare for it

Generally, you won't benefit from sharding until the database is in the range of 250 GB to 1 TB of data. A single (non-sharded) PlanetScale database can scale much higher than that, but around that data size is when you really start to benefit from the advantages of sharding, such as faster backup and restore, faster schema changes, and better resiliency. Even without sharding, there are many benefits to using Vitess, such as how easily it handles huge numbers of connections (there is a built-in proxy layer known as "vtgate").

Preparing for sharding

The most important decision, as you've noticed, is picking the shard key. When Vitess sees a query for a sharded table, it looks at the shard key value and directs the query to the correct shard (there is essentially a mapping between that value and the shard that contains the data). For your application, that means each query would need to include the shard key so Vitess can route it to a single shard instead of scattering it across all of them.

Query performance and getting under 1 second

To be direct with you, 1 second is quite slow for an indexed query. I'd be curious to see the query plan for it. If that query is already efficient (hitting indexes, no table scans), my guess for why it is slow would be that the database does not have enough memory to keep the indexes in RAM and is having to hit disk. If that's the case, I'd expect you'd see quite a performance increase from adding more RAM. I'd recommend looking into that closely before making any other changes; I suspect that by solving it, you can get a lot more headroom for your application on your current database.
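For example (just a rough sketch, not something from our docs), this is roughly what checking the query plan from Node.js could look like with the mysql2 driver. The table and column names here ("search", "string__id") are guesses based on your description, so swap in your real query:

```js
// Rough sketch: check whether the slow query is actually using an index
// before reaching for sharding. Table/column names are guesses from this
// thread; replace the SQL with your real query.
const mysql = require("mysql2/promise");

async function main() {
  const conn = await mysql.createConnection({
    host: process.env.DB_HOST,
    user: process.env.DB_USER,
    password: process.env.DB_PASSWORD,
    database: process.env.DB_NAME,
  });

  // 1. Look at the query plan. A "type" of ALL or a huge "rows" estimate
  //    means a table scan / missing index.
  const [plan] = await conn.query(
    "EXPLAIN SELECT * FROM search WHERE string__id = ?",
    ["some-id"]
  );
  console.table(plan);

  // 2. The same shape matters for sharding later: a query that filters on the
  //    shard key can be routed to one shard; one that doesn't gets scattered
  //    to every shard and merged, which is much slower.
  const [rows] = await conn.query(
    "SELECT * FROM search WHERE string__id = ? LIMIT 50",
    ["some-id"]
  );
  console.log(rows.length);

  await conn.end();
}

main().catch(console.error);
```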
A year ago I decided on Supabase because it is PostgreSQL (which I'm familiar with, and which has amazing fuzzy text search support). Yesterday I realized it is not going to scale beyond something like 500 concurrent users hammering the site. Now I'm debating between Neon and PlanetScale. Neon seems nice because it is also PostgreSQL, so I won't have to fully rewrite my app for MySQL, though there is still some porting involved. But it too is limited in scale and performance, and I am unsure it will consistently give me sub-1-second queries.
So here I am looking further into PlanetScale, but my main concern now (other than the fact that it has no fuzzy text search or foreign key constraints) is the sharding considerations you have to take into account so that your queries stay performant (and hit that sub-1-second mark). The docs have too much text and not enough code and bullet points summarizing things like I am 5 years old. But I love the monospaced font!!!
Can you explain, in simple (ideally Node.js) code snippets and simple bullet points, what I need to do to (a) structure my database tables and (b) manage/control the sharding process and keys? I don't get it yet, and it's my main hesitation about jumping in all the way.
From my conversations with ChatGPT about PlanetScale, what I remember is that sharding happens automatically somehow, yet you need to be careful not to have your data spread across different shards, because (a) read queries across shards are SLOW (2-10x slower), and (b) multi-step transactions are not supported for queries/inserts/updates/deletes that span multiple shards. (I've sketched below how I currently picture this.)
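To make that concrete, here is a rough Node.js sketch of what I think "keeping related data on one shard" would mean in practice. The table and column names are made up from my own schema, and I may be misreading how Vitess actually routes things:

```js
// Hypothetical sketch: if string, transcription, and pronunciation were all
// sharded on string__id, then all the writes for one string land on the same
// shard and can share a normal transaction. (Names are made up from my
// schema; I may have the Vitess details wrong.)
const mysql = require("mysql2/promise");

async function saveString(conn, stringId, text, ipa) {
  await conn.beginTransaction();
  try {
    await conn.query(
      "INSERT INTO string (string__id, text) VALUES (?, ?)",
      [stringId, text]
    );
    await conn.query(
      "INSERT INTO transcription (string__id, ipa) VALUES (?, ?)",
      [stringId, ipa]
    );
    // Every row shares the same string__id, so (as I understand it) they all
    // map to the same shard and this stays a single-shard transaction.
    await conn.commit();
  } catch (err) {
    await conn.rollback();
    throw err;
  }
}
```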
Given that, how do I manage my data? For example, I am working with word/language data, and I have about a dozen tables supporting this JSON structure (much of its structure was left out of this photo):
Using that as a rough base/example, what do I need to learn/read to figure out how to reorganize my SQL tables if I were to bite the bullet and migrate from Supabase to PlanetScale?
One query that Supabase is struggling with is fairly simple:
This table has about 200 million records so far.
I am fetching these "search records", then I use string__id to fetch the string records. From the string records I also fetch the transcription records by string__id, and the pronunciation records, each as separate queries, etc. So I'm not sure how to handle this kind of thing once sharding is taken into account; as a basic example, the fetch chain looks roughly like the sketch below.
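Here is a simplified Node.js version of that fetch chain (hypothetical table/column names; the real queries are more involved):

```js
// Simplified version of my current fetch chain. Each step is a separate
// round trip, and I assume each one would need the shard key to stay on
// one shard after migrating.
const mysql = require("mysql2/promise");

async function search(conn, term) {
  // 1. Find matching "search records" (this is the ~200 million row table).
  const [searchRows] = await conn.query(
    "SELECT string__id FROM search WHERE term = ? LIMIT 20",
    [term]
  );
  const ids = searchRows.map((row) => row.string__id);
  if (ids.length === 0) {
    return { strings: [], transcriptions: [], pronunciations: [] };
  }

  // 2. Fetch the string records by string__id.
  const [strings] = await conn.query(
    "SELECT * FROM string WHERE string__id IN (?)",
    [ids]
  );

  // 3. Fetch transcription and pronunciation records the same way,
  //    each as its own query.
  const [transcriptions] = await conn.query(
    "SELECT * FROM transcription WHERE string__id IN (?)",
    [ids]
  );
  const [pronunciations] = await conn.query(
    "SELECT * FROM pronunciation WHERE string__id IN (?)",
    [ids]
  );

  return { strings, transcriptions, pronunciations };
}
```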