Dealing with large harvest source with high data volatility #5022

Open
FuhuXia opened this issue Dec 26, 2024 · 1 comment
Labels
bug Software defect or bug

Comments

@FuhuXia
Member

FuhuXia commented Dec 26, 2024

The catalog harvester gets overwhelmed by large harvest jobs such as NOAA ioos, where a single job might have to harvest 30k records. A job of this kind can run for days and cause multiple issues, including catalog downtime, as observed here. We need to find a way to cope with such jobs and minimize their impact on the catalog app.

We have set its schedule to monthly, but ideally it should run on weekends or holidays. I suggest we set it to manual and add an O&M schedule to run it every 4 weeks on Friday afternoon.
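
To make the proposed cadence concrete, here is a minimal Python sketch that lists run times every 4 weeks on a Friday afternoon. The function names, the 15:00 start hour, and the start date are assumptions for illustration only; nothing here is part of the harvester or the O&M tooling.

```python
from datetime import date, datetime, time, timedelta


def next_friday(after: date) -> date:
    """Return the first Friday on or after the given date."""
    # date.weekday() numbers Monday as 0, so Friday is 4.
    return after + timedelta(days=(4 - after.weekday()) % 7)


def harvest_run_times(start: date, runs: int, hour: int = 15,
                      weeks_between: int = 4) -> list[datetime]:
    """List `runs` start times, one every `weeks_between` weeks on a Friday.

    `hour` is a hypothetical "afternoon" start time (15:00 by default).
    """
    first = next_friday(start)
    return [
        datetime.combine(first + timedelta(weeks=weeks_between * i), time(hour))
        for i in range(runs)
    ]


if __name__ == "__main__":
    for run in harvest_run_times(date(2024, 12, 26), runs=3):
        print(run.isoformat())
```

A fixed 4-week interval keeps every run on a Friday, which a calendar-monthly schedule does not guarantee.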

Hopefully the new harvester 2.0 will not have the same issue. If it does, we might have to build finer-grained control over how particular harvest sources are scheduled.

@FuhuXia FuhuXia added the bug Software defect or bug label Dec 26, 2024
@nickumia

Just a random question from an outside perspective (mostly related to making the 2.0 harvester better): is the case of iterative harvesting handled any better? For example, if a harvest is 100k records and it fails at 10k, is there a way to keep those 10k and "continue where it left off"?
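
One way to picture "continue where it left off" is a checkpoint file that records each record ID once it has been harvested, so a rerun skips work already done. The following is only a rough sketch under that assumption; `harvest_with_checkpoint`, `harvest_one`, and the checkpoint file are hypothetical names and do not describe how either harvester actually works.

```python
from pathlib import Path
from typing import Callable, Iterable


def load_done(path: Path) -> set[str]:
    """Read the IDs already harvested, one per line, from the checkpoint file."""
    if not path.exists():
        return set()
    return {line.strip() for line in path.read_text().splitlines() if line.strip()}


def harvest_with_checkpoint(record_ids: Iterable[str],
                            harvest_one: Callable[[str], None],
                            checkpoint_path: str) -> int:
    """Harvest records in order, skipping IDs found in the checkpoint file.

    Returns the number of records harvested in this run. If `harvest_one`
    raises, the exception propagates, but every ID completed before the
    failure is already on disk, so the next run resumes after it.
    """
    path = Path(checkpoint_path)
    done = load_done(path)
    processed = 0
    with path.open("a", encoding="utf-8") as log:
        for record_id in record_ids:
            if record_id in done:
                continue  # harvested in an earlier run
            harvest_one(record_id)  # hypothetical per-record harvest step
            log.write(record_id + "\n")
            log.flush()  # persist progress before moving on
            done.add(record_id)
            processed += 1
    return processed
```

Appending one line per record keeps the checkpoint cheap to update even for a 100k-record job; a real harvester would more likely track this state in its database.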

Projects
Status: 🧊 Icebox
Development

No branches or pull requests

2 participants