assisted installer service getting error - chronyc: error while loading shared libraries: libnettle.so.8 #385
Hi,
The commit you mentioned actually fixes a problem we introduced in
2.4.0, where this mount was deleted by mistake.
From
https://github.com/openshift/assisted-installer-agent/blob/v2.3.1/src/commands/actions/ntp_sync_cmd.go#L44
you can see that we had this mount before; it was removed by mistake in
2.4.0 and restored in 2.4.1.
We have mounted chronyc since this commit,
openshift/assisted-service@7ec8448
which was made in Nov '21.
On Sun, Jun 26, 2022 at 7:23 AM pdfruth ***@***.***> wrote:
In the meantime, I've been able to work around the error by explicitly
setting AGENT_DOCKER_IMAGE:
quay.io/edge-infrastructure/assisted-installer-agent:v2.4.1 when
customizing the sample *okd-configmap.yml* file here
<https://github.com/openshift/assisted-service/blob/master/deploy/podman/okd-configmap.yml>
For example, here is an *okd-configmap.yml* that works for me today:
apiVersion: v1
kind: ConfigMap
metadata:
  name: config
data:
  ASSISTED_SERVICE_HOST: 192.168.10.2:8090
  ASSISTED_SERVICE_SCHEME: http
  AUTH_TYPE: none
  DB_HOST: 127.0.0.1
  DB_NAME: installer
  DB_PASS: admin
  DB_PORT: "5432"
  DB_USER: admin
  DEPLOY_TARGET: onprem
  DISK_ENCRYPTION_SUPPORT: "false"
  DUMMY_IGNITION: "false"
  ENABLE_SINGLE_NODE_DNSMASQ: "false"
  HW_VALIDATOR_REQUIREMENTS: '[{"version":"default","master":{"cpu_cores":4,"ram_mib":16384,"disk_size_gb":100,"installation_disk_speed_threshold_ms":10,"network_latency_threshold_ms":100,"packet_loss_percentage":0},"worker":{"cpu_cores":2,"ram_mib":8192,"disk_size_gb":100,"installation_disk_speed_threshold_ms":10,"network_latency_threshold_ms":1000,"packet_loss_percentage":10},"sno":{"cpu_cores":8,"ram_mib":16384,"disk_size_gb":100,"installation_disk_speed_threshold_ms":10}}]'
  IMAGE_SERVICE_BASE_URL: http://192.168.10.2:8888
  IPV6_SUPPORT: "true"
  LISTEN_PORT: "8888"
  NTP_DEFAULT_SERVER: ""
  POSTGRESQL_DATABASE: installer
  POSTGRESQL_PASSWORD: admin
  POSTGRESQL_USER: admin
  PUBLIC_CONTAINER_REGISTRIES: 'quay.io'
  SERVICE_BASE_URL: http://192.168.10.2:8090
  STORAGE: filesystem
  OS_IMAGES: '[{"openshift_version":"4.10","cpu_architecture":"x86_64","url":"https://builds.coreos.fedoraproject.org/prod/streams/stable/builds/35.20220327.3.0/x86_64/fedora-coreos-35.20220327.3.0-live.x86_64.iso","rootfs_url":"https://builds.coreos.fedoraproject.org/prod/streams/stable/builds/35.20220327.3.0/x86_64/fedora-coreos-35.20220327.3.0-live-rootfs.x86_64.img","version":"35.20220327.3.0"}]'
  RELEASE_IMAGES: '[{"openshift_version":"4.10","cpu_architecture":"x86_64","url":"quay.io/openshift/okd:4.10.0-0.okd-2022-06-10-131327","version":"4.10.0-0.okd-2022-06-10-131327","default":true}]'
  OKD_RPMS_IMAGE: quay.io/vrutkovs/okd-rpms:4.10
  AGENT_DOCKER_IMAGE: quay.io/edge-infrastructure/assisted-installer-agent:v2.4.1
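Several values in this ConfigMap, RELEASE_IMAGES among them, are JSON documents stored as plain strings. A minimal sketch for sanity-checking such a value before applying the file is shown here; `release_versions` is a hypothetical helper written for illustration and is not part of assisted-service.

```python
import json


def release_versions(release_images: str) -> list[str]:
    """Parse a RELEASE_IMAGES-style JSON string and return its version fields."""
    entries = json.loads(release_images)  # raises ValueError on malformed JSON
    return [entry["version"] for entry in entries]


# Value copied from the ConfigMap above.
RELEASE_IMAGES = (
    '[{"openshift_version":"4.10","cpu_architecture":"x86_64",'
    '"url":"quay.io/openshift/okd:4.10.0-0.okd-2022-06-10-131327",'
    '"version":"4.10.0-0.okd-2022-06-10-131327","default":true}]'
)

print(release_versions(RELEASE_IMAGES))
```

If the string is not valid JSON, `json.loads` raises an exception, which is easier to spot here than after the service has started.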
--
Igal Tsoiref
|
It's not that simple. chronyc inside the agent container communicates over a Unix domain socket, mounted from the host, with the host operating system's non-containerized chronyd daemon. Using the container's own chronyc would therefore just move the problem from "host<->container shared library incompatibilities" to "chronyc<->chronyd socket API incompatibilities across versions". Sadly, the former affects OKD users, while the latter affects (or at least used to affect; maybe recent RHCOS versions have solved it) upstream OCP Assisted Installer agent users. I don't think there is a "right" answer between those two options; both are bound to break (and have in the past). We chose to solve the latter after a user complaint a while ago, but we did so in a problematic manner (the bind mount), which created this issue for OKD users. We can do something else, though: ideally, the solution here would be to disable the host's chronyd systemd service and run an equivalent, containerized chronyd service, but that's a big change. We should probably consider it. |
As a temporary workaround, we can skip the bind mount when running on top of FCOS. |
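A minimal sketch of that decision, assuming the host can be identified from the contents of its `/etc/os-release` file and that Fedora CoreOS reports `ID=fedora` together with `VARIANT_ID=coreos`. The function names are hypothetical, and the real agent is written in Go, so this illustrates only the logic, not the actual implementation.

```python
def parse_os_release(text: str) -> dict[str, str]:
    """Parse os-release style KEY=value lines into a dict, dropping quotes."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key] = value.strip().strip('"').strip("'")
    return fields


def should_bind_mount_host_chronyc(os_release_text: str) -> bool:
    """Return False on Fedora CoreOS hosts, True otherwise.

    Assumption: FCOS is recognized by ID=fedora and VARIANT_ID=coreos.
    """
    fields = parse_os_release(os_release_text)
    is_fcos = fields.get("ID") == "fedora" and fields.get("VARIANT_ID") == "coreos"
    return not is_fcos


# Illustrative input; a real caller would pass the host's /etc/os-release contents.
print(should_bind_mount_host_chronyc("ID=fedora\nVARIANT_ID=coreos\n"))
```

On hosts where the bind mount is skipped, the agent would fall back to the chronyc shipped in its own image.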
Created https://issues.redhat.com/browse/MGMT-10937 to track the workaround / solution |
cc @vrutkovs |
Issues go stale after 90d of inactivity. Mark the issue as fresh by commenting. If this issue is safe to close now, please do so. /lifecycle stale |
/lifecycle frozen |
I'm using the self-hosted assisted installer service to install Single Node OKD.
The assisted installer service is running in podman containers, as documented here
This method of doing a single node install of OKD used to work, but it has started to fail recently (within the last 30 days or so).
The host registers with the installer service but gets stuck on an NTP synchronization failure, as seen in the attached screenshot.
Looking into the pod logs of the assisted installer service, I see this message:
level=error msg="Received step reply <ntp-synchronizer-392f0f02> from infra-env <ff4ce4b9-a3cd-4c50-b258-24cfbba8d1e3> host <68b15b04-5cb1-429f-9778-3c8727d0235d> exit-code <-1> stderr <chronyc exited with non-zero exit code 127: \nchronyc: error while loading shared libraries: libnettle.so.8: cannot open shared object file: No such file or directory\n> stdout <>" func=github.com/openshift/assisted-service/internal/bminventory.logReplyReceived file="/go/src/github.com/openshift/origin/internal/bminventory/inventory.go:2992" go-id=9762 host_id=68b15b04-5cb1-429f-9778-3c8727d0235d infra_env_id=ff4ce4b9-a3cd-4c50-b258-24cfbba8d1e3 pkg=Inventory request_id=6a4edac8-f290-4cb2-813e-f6a67ef9c50b
The relevant part of the message is: chronyc: error while loading shared libraries: libnettle.so.8: cannot open shared object file: No such file or directory
I believe the root cause is the change introduced by this commit
That commit bind-mounts the chronyc binary of the underlying OS (the one on which the assisted-installer-agent container runs) into the /usr/bin directory inside the container. In my case, that host OS is Fedora CoreOS 35.20220327.3.0. The problem is that this chronyc is a dynamically linked ELF binary that depends on the libnettle.so.8 shared library, which isn't present in the container. The container does contain libnettle.so.6, though.
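One way to confirm this kind of mismatch is to run `ldd` on the bind-mounted binary inside the container and look for unresolved entries. The sketch below is illustrative only: `missing_libraries` is a hypothetical helper that parses `ldd`-style output, and the sample text is made up to mirror the error above rather than captured from a real host.

```python
def missing_libraries(ldd_output: str) -> list[str]:
    """Return the sonames that ldd-style output reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        name, sep, target = line.strip().partition(" => ")
        if sep and target.strip() == "not found":
            missing.append(name)
    return missing


# Illustrative sample, not real output from the affected host.
SAMPLE = (
    "\tlibnettle.so.8 => not found\n"
    "\tlibc.so.6 => /lib64/libc.so.6 (0x00007f0000000000)\n"
)

print(missing_libraries(SAMPLE))
```

In practice the input would come from something like `ldd /usr/bin/chronyc` run inside the agent container.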
Anyway, IMO this (bind-mounting the chronyc command from the underlying OS) is a containers anti-pattern.
Wouldn't it be a better approach to use the chronyc installed by the
dnf install chrony
in the Dockerfile here, which is used to build the assisted installer agent container image?
@tsorya, could you have a look at the change introduced in that commit? It introduces a significant prerequisite: the shared libraries that the host's chronyc binary is dynamically linked against must also be present in the assisted installer agent container image. Is there a different approach?