
ISO: Add 50-fs-inotify.conf to increase limits #18832

Merged: 2 commits merged May 8, 2024

@nirs (Contributor) commented May 7, 2024

This avoids random failures when starting KubeVirt VMs, such as:

    {"component":"virt-handler","level":"error","msg":"Error starting vhost-net device plugin","pos":"device_controller.go:70","reason":"failed to creating a fsnotify watcher: too many open files","timestamp":"2024-05-06T12:59:30.009620Z"}

Fixes #18831
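For context: the log above is produced when a watcher tries to create a new inotify instance. One way `inotify_init1(2)` fails with `EMFILE` ("too many open files") is when the per-user limit `fs.inotify.max_user_instances` has been reached, which is why raising that sysctl helps. A minimal diagnostic sketch, assuming a Linux host where each inotify instance appears under `/proc/<pid>/fd` as a symlink to `anon_inode:inotify`; `count_inotify_instances` is a hypothetical helper, not part of minikube or KubeVirt:

```shell
#!/bin/sh
# Hypothetical helper: count inotify instances currently open on this host.
# Each inotify instance is a file descriptor whose /proc/<pid>/fd entry is a
# symlink to "anon_inode:inotify". Without root only your own processes are
# visible; permission errors are discarded.
count_inotify_instances() {
    find /proc/[0-9]*/fd -lname 'anon_inode:inotify' 2>/dev/null | wc -l | tr -d ' '
}

# Compare current usage with the configured per-user limit.
echo "inotify instances in use: $(count_inotify_instances)"
echo "fs.inotify.max_user_instances = $(cat /proc/sys/fs/inotify/max_user_instances)"
```

The count is host-wide, while the limit applies per real user ID, so treat the count as an upper bound on any single user's usage.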

@k8s-ci-robot added do-not-merge/invalid-commit-message Indicates that a PR should not merge because it has an invalid commit message. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels May 7, 2024
@k8s-ci-robot requested review from afbjorklund and prezha May 7, 2024 20:31
@k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label May 7, 2024
@k8s-ci-robot (Contributor)

Hi @nirs. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label May 7, 2024
@minikube-bot (Collaborator)

Can one of the admins verify this patch?

This avoids random failures starting kubevirt VMs like:

    failed to creating a fsnotify watcher: too many open files
@nirs force-pushed the iso-inotify-limits branch from 15df03d to a8177b8 May 7, 2024 20:45
@k8s-ci-robot removed the do-not-merge/invalid-commit-message Indicates that a PR should not merge because it has an invalid commit message. label May 7, 2024
@spowelljr (Member)

ok-to-build-iso

@minikube-bot (Collaborator)

Hi @nirs, we have updated your PR with the reference to newly built ISO. Pull the changes locally if you want to test with them or update your PR further.

@k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels May 8, 2024
@nirs (Contributor, Author) commented May 8, 2024

If someone wants to test this, use:
https://storage.googleapis.com/minikube-builds/iso/18832/minikube-v1.33.0-1715127532-18832-amd64.iso

(I wonder why the bot that adds the ISO info cannot add a useful link like this.)

@nirs (Contributor, Author) commented May 8, 2024

Tested starting a new cluster with the ISO:

$ minikube start --driver kvm2 --iso-url https://storage.googleapis.com/minikube-builds/iso/18832/minikube-v1.33.0-1715127532-18832-amd64.iso
😄  minikube v1.33.0 on Fedora 39
    ▪ MINIKUBE_HOME=/data/tmp
✨  Using the kvm2 driver based on user configuration
💿  Downloading VM boot image ...
    > minikube-v1.33.0-1715127532...:  314.16 MiB / 314.16 MiB  100.00% 16.78 M
👍  Starting "minikube" primary control-plane node in "minikube" cluster
🔥  Creating kvm2 VM (CPUs=2, Memory=6000MB, Disk=20000MB) ...
🐳  Preparing Kubernetes v1.30.0 on Docker 26.0.2 ...
    ▪ Generating certificates and keys ...
    ▪ Booting up control plane ...
    ▪ Configuring RBAC rules ...
🔗  Configuring bridge CNI (Container Networking Interface) ...
🔎  Verifying Kubernetes components...
    ▪ Using image gcr.io/k8s-minikube/storage-provisioner:v5
🌟  Enabled addons: default-storageclass, storage-provisioner
🏄  Done! kubectl is now configured to use "minikube" cluster and "default" namespace by default

$ minikube ssh
..

$ cat /etc/sysctl.d/50-fs-inotify.conf 
# Avoid failures with kubevirt vms
# https://github.com/kubernetes/minikube/issues/18831
fs.inotify.max_user_instances = 8192
fs.inotify.max_user_watches = 65536

$ sysctl fs.inotify
fs.inotify.max_queued_events = 16384
fs.inotify.max_user_instances = 8192
fs.inotify.max_user_watches = 65536
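The same verification can be scripted by comparing the live values with the ones set in `50-fs-inotify.conf`. A minimal sketch, assuming a POSIX shell on the node; `check_inotify_limits` and the `SYSCTL_ROOT` override are hypothetical names introduced here so the check can also be run against fake files:

```shell
#!/bin/sh
# Hypothetical helper (not part of minikube): check that the live inotify
# limits are at least the values from /etc/sysctl.d/50-fs-inotify.conf.
# SYSCTL_ROOT defaults to /proc/sys; point it elsewhere to test with fake files.
check_inotify_limits() {
    root="${SYSCTL_ROOT:-/proc/sys}"
    status=0
    for pair in max_user_instances=8192 max_user_watches=65536; do
        name="${pair%%=*}"   # text before "="
        want="${pair#*=}"    # text after "="
        if ! have=$(cat "$root/fs/inotify/$name" 2>/dev/null); then
            echo "fs.inotify.$name: cannot read $root/fs/inotify/$name"
            status=1
        elif [ "$have" -ge "$want" ]; then
            echo "fs.inotify.$name = $have (ok, >= $want)"
        else
            echo "fs.inotify.$name = $have (too low, want >= $want)"
            status=1
        fi
    done
    return "$status"
}
```

Run `check_inotify_limits` on the node: it prints one line per limit and returns non-zero if either value is below the configured one.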

@spowelljr (Member)

/ok-to-test

@k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 8, 2024
@k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 8, 2024
@minikube-pr-bot

kvm2 driver with docker runtime

+----------------+----------+---------------------+
|    COMMAND     | MINIKUBE | MINIKUBE (PR 18832) |
+----------------+----------+---------------------+
| minikube start | 53.0s    | 52.0s               |
| enable ingress | 27.5s    | 26.8s               |
+----------------+----------+---------------------+

Times for minikube start: 53.5s 52.4s 51.8s 54.0s 53.5s
Times for minikube (PR 18832) start: 52.4s 55.3s 51.1s 50.4s 50.7s

Times for minikube ingress: 27.1s 25.0s 28.0s 28.1s 29.1s
Times for minikube (PR 18832) ingress: 28.5s 25.1s 24.1s 28.1s 28.0s

docker driver with docker runtime

+----------------+----------+---------------------+
|    COMMAND     | MINIKUBE | MINIKUBE (PR 18832) |
+----------------+----------+---------------------+
| minikube start | 23.9s    | 23.2s               |
| enable ingress | 21.5s    | 21.9s               |
+----------------+----------+---------------------+

Times for minikube start: 24.9s 23.2s 24.4s 24.5s 22.4s
Times for minikube (PR 18832) start: 24.4s 23.9s 24.7s 21.7s 21.2s

Times for minikube ingress: 21.8s 21.8s 21.3s 21.3s 21.3s
Times for minikube (PR 18832) ingress: 21.3s 21.3s 22.3s 22.8s 21.8s

docker driver with containerd runtime

+----------------+----------+---------------------+
|    COMMAND     | MINIKUBE | MINIKUBE (PR 18832) |
+----------------+----------+---------------------+
| minikube start | 22.3s    | 21.1s               |
| enable ingress | 29.6s    | 30.0s               |
+----------------+----------+---------------------+

Times for minikube start: 22.0s 20.4s 22.9s 22.8s 23.3s
Times for minikube (PR 18832) start: 22.2s 20.4s 23.0s 20.6s 19.1s

Times for minikube ingress: 31.8s 31.3s 32.3s 31.8s 20.8s
Times for minikube (PR 18832) ingress: 32.3s 32.3s 32.3s 21.3s 31.8s
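The values in each table appear to be the arithmetic mean of the five runs listed under it. For example, the kvm2 `minikube start` runs 53.5s, 52.4s, 51.8s, 54.0s and 53.5s sum to 265.2s, and 265.2 / 5 = 53.04, shown as 53.0s. A minimal sketch for recomputing them, assuming `awk` is available; `mean_seconds` is a hypothetical helper, not part of minikube-pr-bot:

```shell
#!/bin/sh
# Hypothetical helper: mean of run times such as "53.5s 52.4s", rounded to one
# decimal place, matching how the tables above display them.
mean_seconds() {
    echo "$*" | tr -d 's' | awk '{ for (i = 1; i <= NF; i++) sum += $i; printf "%.1f\n", sum / NF }'
}

mean_seconds 53.5s 52.4s 51.8s 54.0s 53.5s   # prints 53.0
```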

@minikube-pr-bot

These are the flake rates of all failed tests.

+-------------------------+-------------------------------------------------------------+----------------+
| Environment             | Failed Tests                                                | Flake Rate (%) |
+-------------------------+-------------------------------------------------------------+----------------+
| KVM_Linux_crio          | TestFunctional/parallel/ImageCommands/ImageRemove           | 2.37           |
| Docker_Linux_crio_arm64 | TestMultiControlPlane/serial/RestartCluster                 | 5.95           |
| Hyperkit_macOS          | TestErrorSpam/setup                                         | 20.16          |
| Hyperkit_macOS          | TestFunctional/parallel/CpCmd                               | 23.44          |
| Hyperkit_macOS          | TestFunctional/serial/LogsCmd                               | 23.44          |
| Hyperkit_macOS          | TestFunctional/serial/LogsFileCmd                           | 23.44          |
| Hyperkit_macOS          | TestForceSystemdFlag                                        | 23.81          |
| Hyperkit_macOS          | TestStoppedBinaryUpgrade/MinikubeLogs                       | 24.00          |
| Hyperkit_macOS          | TestJSONOutput/start/parallel/DistinctCurrentSteps          | 24.03          |
| Hyperkit_macOS          | TestJSONOutput/start/parallel/IncreasingCurrentSteps        | 24.03          |
| Hyperkit_macOS          | TestFunctional/parallel/NonActiveRuntimeDisabled            | 24.22          |
| Hyperkit_macOS          | TestRunningBinaryUpgrade                                    | 24.60          |
| Hyperkit_macOS          | TestStoppedBinaryUpgrade/Upgrade                            | 24.80          |
| Hyperkit_macOS          | TestImageBuild/serial/Setup                                 | 24.81          |
| Hyperkit_macOS          | TestJSONOutput/pause/Command                                | 24.81          |
| Hyperkit_macOS          | TestJSONOutput/start/Command                                | 24.81          |
| Hyperkit_macOS          | TestJSONOutput/unpause/Command                              | 24.81          |
| Hyperkit_macOS          | TestMultiNode/serial/MultiNodeLabels                        | 24.81          |
| Hyperkit_macOS          | TestMultiNode/serial/ProfileList                            | 24.81          |
| Hyperkit_macOS          | TestSkaffold                                                | 24.81          |
| Hyperkit_macOS          | TestFunctional/parallel/SSHCmd                              | 25.00          |
| Hyperkit_macOS          | TestFunctional/parallel/TunnelCmd/serial/AccessDirect       | 25.40          |
| Hyperkit_macOS          | TestFunctional/parallel/TunnelCmd/serial/AccessThroughDNS   | 25.40          |
| Hyperkit_macOS          | TestFunctional/parallel/TunnelCmd/serial/DNSResolutionByDig | 25.40          |
| Hyperkit_macOS          | TestFunctional/parallel/TunnelCmd/serial/RunSecondTunnel    | 25.40          |
| Hyperkit_macOS          | TestFunctional/parallel/TunnelCmd/serial/WaitService/Setup  | 25.40          |
| Hyperkit_macOS          | TestNoKubernetes/serial/Start                               | 25.41          |
| Hyperkit_macOS          | TestFunctional/serial/KubeContext                           | 25.58          |
| Hyperkit_macOS          | TestFunctional/serial/StartWithProxy                        | 25.58          |
| Hyperkit_macOS          | TestOffline                                                 | 25.58          |
+-------------------------+-------------------------------------------------------------+----------------+
More tests...

Too many tests failed - See test logs for more details.

To see the flake rates of all tests by environment, click here.

@spowelljr (Member) left a comment

Thanks for the PR!

@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: medyagh, nirs, spowelljr

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@spowelljr merged commit c49fab2 into kubernetes:master May 8, 2024
34 of 47 checks passed
nirs added a commit to nirs/ramen that referenced this pull request Sep 7, 2024
Currently we have:

    $ sysctl fs.inotify
    fs.inotify.max_queued_events = 16384
    fs.inotify.max_user_instances = 128
    fs.inotify.max_user_watches = 45827

And we see errors like this on managed clusters even with trivial
busybox workloads:

    failed to create fsnotify watcher: too many open files

We use OpenShift worker defaults, already used for minikube[1].

[1] kubernetes/minikube#18832

Signed-off-by: Nir Soffer <[email protected]>
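The commit above applies the same limits to other clusters. A minimal sketch of doing that with a sysctl drop-in, assuming root access on the node; `write_inotify_dropin` is a hypothetical helper (not taken from the ramen repository), and the file name simply mirrors minikube's `50-fs-inotify.conf`. With procps-ng, `sysctl --system` reloads all drop-in directories, while `sysctl -p FILE` loads a single file.

```shell
#!/bin/sh
# Hypothetical helper (not taken from the ramen repository): write a sysctl
# drop-in with the inotify limits minikube ships in 50-fs-inotify.conf.
# The target directory defaults to /etc/sysctl.d, which requires root.
write_inotify_dropin() {
    dest_dir="${1:-/etc/sysctl.d}"
    mkdir -p "$dest_dir" || return 1
    cat > "$dest_dir/50-fs-inotify.conf" <<'EOF'
# Avoid "too many open files" when creating inotify watchers
fs.inotify.max_user_instances = 8192
fs.inotify.max_user_watches = 65536
EOF
}

# On a node, as root: write the drop-in, then reload every sysctl.d file.
#   write_inotify_dropin && sysctl --system
```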
nirs added a commit to nirs/ramen that referenced this pull request Sep 8, 2024
nirs added a commit to nirs/ramen that referenced this pull request Sep 9, 2024
nirs added a commit to nirs/ramen that referenced this pull request Sep 9, 2024
nirs added a commit to nirs/ramen that referenced this pull request Sep 9, 2024
nirs added a commit to nirs/ramen that referenced this pull request Sep 10, 2024
nirs added a commit to nirs/ramen that referenced this pull request Sep 10, 2024
nirs added a commit to nirs/ramen that referenced this pull request Sep 11, 2024
nirs added a commit to nirs/ramen that referenced this pull request Sep 13, 2024
nirs added a commit to nirs/ramen that referenced this pull request Sep 13, 2024
nirs added a commit to nirs/ramen that referenced this pull request Sep 13, 2024
nirs added a commit to nirs/ramen that referenced this pull request Sep 13, 2024
@nirs deleted the iso-inotify-limits branch September 15, 2024 18:38
nirs added a commit to nirs/ramen that referenced this pull request Sep 17, 2024
nirs added a commit to nirs/ramen that referenced this pull request Sep 17, 2024
nirs added a commit to nirs/ramen that referenced this pull request Sep 18, 2024
nirs added a commit to RamenDR/ramen that referenced this pull request Sep 18, 2024
asn1809 pushed a commit to asn1809/ramen that referenced this pull request Oct 8, 2024
Labels
approved - Indicates a PR has been approved by an approver from all required OWNERS files.
cncf-cla: yes - Indicates the PR's author has signed the CNCF CLA.
ok-to-test - Indicates a non-member PR verified by an org member that is safe to test.
size/S - Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

fs.inotify limits are too low, can cause failures when starting kubevirt vm
6 participants