minimum_running_time_in_minutes #4076
Comments
The scale-down function terminates instances it detects as orphans, meaning instances that are running in AWS but are not registered in GitHub. The variable runner_boot_time_in_minutes lets you configure scale-down to ignore instances that are still booting. The default is 5 minutes. In your case it seems you need to set this variable to 6 minutes (or so).
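To make the distinction concrete, here is a minimal TypeScript sketch of how a scale-down pass could treat a single instance. It assumes the behavior described in this thread plus our reading of the module's variable descriptions: runner_boot_time_in_minutes only applies to instances that are not (yet) registered in GitHub, while minimum_running_time_in_minutes only applies to registered runners that are idle. The names InstanceState, Settings and evaluateInstance are hypothetical, idle pools and other details of the real scale-down lambda are left out, and this is not the module's actual code.

```typescript
// Hypothetical model of one EC2 instance as seen by a scale-down run.
type InstanceState = {
  launchTime: Date;
  registeredInGitHub: boolean; // is there a GitHub runner for this instance?
  busy: boolean; // only meaningful when registered
};

// Hypothetical settings mirroring the two Terraform variables.
type Settings = {
  minimumRunningTimeInMinutes: number; // assumed to apply to registered, idle runners
  runnerBootTimeInMinutes: number; // assumed to apply to unregistered instances
};

type Decision = "keep" | "terminate-idle" | "orphan";

function minutesSince(launchTime: Date, now: Date): number {
  return (now.getTime() - launchTime.getTime()) / 60000;
}

function evaluateInstance(instance: InstanceState, settings: Settings, now: Date): Decision {
  const ageInMinutes = minutesSince(instance.launchTime, now);
  if (!instance.registeredInGitHub) {
    // Not registered: allow runnerBootTimeInMinutes to boot and register;
    // after that the instance is treated as an orphan.
    return ageInMinutes > settings.runnerBootTimeInMinutes ? "orphan" : "keep";
  }
  // Registered: a busy runner is never removed here.
  if (instance.busy) {
    return "keep";
  }
  // Registered and idle: only removable once minimumRunningTimeInMinutes has passed.
  return ageInMinutes > settings.minimumRunningTimeInMinutes ? "terminate-idle" : "keep";
}

// Example: an instance still booting, checked 5 minutes 15 seconds after launch.
const launch = new Date(Date.UTC(2024, 0, 1));
const checkedAt = new Date(launch.getTime() + 5.25 * 60000);
const stillBooting: InstanceState = { launchTime: launch, registeredInGitHub: false, busy: false };
console.log(evaluateInstance(stillBooting, { minimumRunningTimeInMinutes: 15, runnerBootTimeInMinutes: 5 }, checkedAt)); // "orphan"
```

In this sketch, raising minimum_running_time_in_minutes does not protect an instance that has not registered yet; only a larger runner_boot_time_in_minutes does. Because "orphan" and "terminate-idle" are separate outcomes, this is also the point where a log line could record which timer applied.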
Is the purpose of …? The choice of the word "minimum" is funny. Would this not also explain it?
"max" is descriptive. OK, how could it be said with "minimum"?
I have set it to 15 minutes. Actual boot-up times are variable and unknown. The current default is really cutting it close and is actually causing failures. Why not allow instances plenty of time to boot up?
Of course, you may approve or reject either or both of the ideas. :-)

OK. Tangent: if someone has properly configured their environment, an instance is only launched when it is needed. For ephemeral runners it is then really important that the launch succeeds, or else the job will be missing a runner. There is no fallback or recovery if you have auto-scaled ephemeral runners. Maybe this belongs in a separate GitHub issue. What if you want to avoid always-on dedicated servers and depend on autoscaling? If one autoscaled runner fails because of the boot-time issue discussed here, the job will fail for lack of a runner, and it will not recover.
I had not realized! A "job retries" feature was just added. Awesome!

Still on the topic of the variables mentioned above, another thing that might help is a couple of sentences in the documentation unambiguously explaining the difference between minimum_running_time_in_minutes and runner_boot_time_in_minutes. Where is …? If there is an "orphan" runner, is that caused by either of the above variables? Is it possible to distinguish two types of "orphan" runners, or are they the same in the CloudWatch logs? Could the CloudWatch logs indicate which timer killed the instance?
It would be great if you have a bit of time to improve the docs and explain the difference via a PR. You can find the variable … Orphan runners are tagged by scale-down before they are deleted; this change was made in one of the latest releases. You can find logs of runners marked as orphan in the scale-down log.
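The tag-before-delete behavior described above can be sketched as a two-phase check: the first scale-down run that sees an orphan only tags it, and a later run terminates it if it is still an orphan. The tag key, the action names, and the untagging of an instance that registered in the meantime are hypothetical assumptions for illustration, not the module's actual implementation.

```typescript
// Hypothetical view of an EC2 instance and its tags.
type Ec2Instance = { id: string; tags: Map<string, string> };
type OrphanAction = "none" | "tag-orphan" | "terminate" | "untag";

const ORPHAN_TAG_KEY = "orphan"; // hypothetical tag key

// Decide what one scale-down run does with an instance, given whether it
// currently looks like an orphan (running in AWS, not registered in GitHub).
function handleOrphanCandidate(instance: Ec2Instance, isOrphanNow: boolean): OrphanAction {
  const alreadyTagged = instance.tags.get(ORPHAN_TAG_KEY) === "true";
  if (!isOrphanNow) {
    // Assumption: an instance that registered in the meantime loses a stale tag.
    return alreadyTagged ? "untag" : "none";
  }
  // First sighting only marks the instance; termination waits for a later run.
  return alreadyTagged ? "terminate" : "tag-orphan";
}

// Two consecutive runs over the same instance that never registers.
const instance: Ec2Instance = { id: "i-0123456789abcdef0", tags: new Map<string, string>() };
const firstRun = handleOrphanCandidate(instance, true); // "tag-orphan"
if (firstRun === "tag-orphan") {
  instance.tags.set(ORPHAN_TAG_KEY, "true");
}
const secondRun = handleOrphanCandidate(instance, true); // "terminate"
```

The design choice here is that a slow-booting instance gets at least one extra scale-down interval to register before it is terminated.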
Hi,
I just encountered a situation where some GHA jobs failed because they lacked runners.
These are Ubuntu 22.04 machines without much customization. The AMI has standard packages pre-installed.
After debugging, I believe I have found the problem.
The full boot-up takes around 5 minutes, including the "Runner update in progress, do not shutdown runner." phase. However, if it takes 5:15 or so, guess what happens: the scale-down function kills the instance.
I have just set minimum_running_time_in_minutes: 15. Theoretically, this will solve it. What are the reasons not to have a longer minimum_running_time_in_minutes by default? It would avoid this problem. What are the pros and cons? This could happen to others.

What is the meaning of the similarly named variable runner_boot_time_in_minutes? Its description says "The minimum time for an EC2 runner to boot and register as a runner." This definition might make sense to someone who already understands very well what the variable does, but if I don't know, it does not explain it. Consider this analogy: "The minimum time you may be at the Starbucks: 20 minutes."
Then, what happens if I go into Starbucks, order a coffee, and leave within 5 minutes?
I have violated the "minimum time". What are the consequences?
"The minimum time for an EC2 runner to boot": what if the runner boots within 5 minutes? That is less than the "minimum" time of 20 minutes. The explanation should say more, for example: "the minimum time ... before the scale-down function will consider this instance for termination." However, if that is the definition, it sounds like minimum_running_time_in_minutes, so why are there two identical variables? So it must be something else.

Thanks.