KAFKA-18274: Failed to restart controller in testing due to closed socket channel [2/2] #18337

Open
wants to merge 3 commits into
base: trunk

Conversation

peterxcli
Contributor

Enable the controller to restart on the same port in ClusterTest.

Tests

Without the corresponding changes in PreboundSocketFactoryManager, the admin client in the newly added test testKRaftIsolatedControllerRestart keeps retrying its connection to the restarted controller, because the controller's port has changed.

[2024-12-28 06:01:46,521] INFO [broker-0-to-controller-forwarding-channel-manager]: Recorded new KRaft controller, from now on will use node localhost:43227 (id: 3000 rack: null isFenced: false) (kafka.server.NodeToControllerRequestThread:66)
[2024-12-28 06:01:46,546] INFO [RaftManager id=0] Node 3000 disconnected. (org.apache.kafka.clients.NetworkClient:1073)
[2024-12-28 06:01:46,546] WARN [RaftManager id=0] Connection to node 3000 (localhost/127.0.0.1:43227) could not be established. Node may not be available. (org.apache.kafka.clients.NetworkClient:899)
[2024-12-28 06:01:46,547] INFO [controller-3000-to-controller-registration-channel-manager]: Recorded new KRaft controller, from now on will use node localhost:43227 (id: 3000 rack: null isFenced: false) (kafka.server.NodeToControllerRequestThread:66)
[2024-12-28 06:01:46,548] INFO [NodeToControllerChannelManager id=3000 name=registration] Node 3000 disconnected. (org.apache.kafka.clients.NetworkClient:1073)
...
// the same log messages repeat
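
The retry loop in the log above comes from clients holding on to an address whose port is no longer served. As a rough illustration with plain JDK sockets (not the Kafka test harness), the sketch below records the port of a listener bound to port 0, closes it to mimic the shutdown, and then rebinds on the recorded port so that a cached address stays reachable. The class name `StalePortDemo` and its helper methods are hypothetical and exist only for this example.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.StandardSocketOptions;
import java.nio.channels.ServerSocketChannel;

// Hypothetical demo class; it only uses JDK sockets and no Kafka code.
public class StalePortDemo {

    // Opens a listening channel on the loopback address.
    // Port 0 asks the OS to pick any free port.
    public static ServerSocketChannel listen(int port) throws IOException {
        ServerSocketChannel channel = ServerSocketChannel.open();
        // Allow a quick rebind of a recently used port.
        channel.setOption(StandardSocketOptions.SO_REUSEADDR, true);
        channel.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), port));
        return channel;
    }

    // Returns true if a TCP connection to the loopback port succeeds within one second.
    public static boolean canConnect(int port) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(InetAddress.getLoopbackAddress(), port), 1000);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        ServerSocketChannel before = listen(0);
        int cachedPort = before.socket().getLocalPort(); // the port clients remember
        before.close();                                  // stands in for the shutdown

        // Restarting on the remembered port keeps the cached address valid.
        try (ServerSocketChannel after = listen(cachedPort)) {
            System.out.println("reachable on cached port: " + canConnect(cachedPort));
        }
    }
}
```

Had the restart called `listen(0)` again, the OS would be free to hand out a different port, and anything still pointing at `cachedPort` would keep failing to connect.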

github-actions bot added the triage (PRs from the community), tests (Test fixes, including flaky tests), and small (Small PRs) labels on Dec 28, 2024
Comment on lines 52 to 65
if (socketChannel != null && socketChannel.isOpen()) {
    return socketChannel;
} else if (socketChannel != null) {
    // bind the server socket with same port
    socketAddress = new InetSocketAddress(socketAddress.getHostString(), socketChannel.socket().getLocalPort());
    socketChannel = ServerSocketFactory.INSTANCE.openServerSocket(
        listenerName,
        socketAddress,
        listenBacklogSize,
        recvBufferSize);
    return socketChannel;
}
Contributor

Please combine the null checks.
Instead of checking for socketChannel != null twice, we can combine them into a single block that handles the non-null case.
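
One way the combined check could look is sketched below. This is a self-contained approximation rather than the patch itself: `RebindingSocketHolder` is a hypothetical stand-in for the factory under review, and it opens the channel directly with the JDK instead of going through `ServerSocketFactory.INSTANCE.openServerSocket`. It reads the old port from the closed channel's socket in the same way the snippet under review does.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.StandardSocketOptions;
import java.nio.channels.ServerSocketChannel;

// Hypothetical stand-in for the factory discussed in this thread; not Kafka code.
public class RebindingSocketHolder {
    private ServerSocketChannel socketChannel; // null until the first call

    public synchronized ServerSocketChannel getOrCreate() throws IOException {
        int port = 0; // first call: let the OS choose a free port
        if (socketChannel != null) {
            // Single non-null branch instead of two separate null checks.
            if (socketChannel.isOpen()) {
                return socketChannel;
            }
            // The channel was closed (for example by a restart): reuse its old port.
            port = socketChannel.socket().getLocalPort();
        }
        ServerSocketChannel fresh = ServerSocketChannel.open();
        try {
            fresh.setOption(StandardSocketOptions.SO_REUSEADDR, true);
            fresh.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), port));
        } catch (IOException e) {
            fresh.close(); // do not leak the channel if the bind fails
            throw e;
        }
        socketChannel = fresh;
        return fresh;
    }
}
```

Calling `getOrCreate()` twice returns the same open channel. After that channel is closed, the next call returns a new channel bound to the port the old one used.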

Contributor Author

Nice catch. Thanks!

Member

chia7712 commented Jan 8, 2025

@peterxcli please fix the conflicts

@peterxcli peterxcli force-pushed the k18274-controller-use-same-port-after-restarting branch from 89f5780 to 687af35 Compare January 8, 2025 13:09
Contributor

@frankvicky frankvicky left a comment

LGTM

Contributor Author

peterxcli commented Jan 8, 2025

Failed tests are tracked by:

Member

@chia7712 chia7712 left a comment

@peterxcli Thanks for this patch. Two minor comments remain; please take a look.

if (socketChannel.isOpen()) {
    return socketChannel;
}
// bind the server socket with same port
Member

Could you please add comments to explain the use cases of binding the same port?


ControllerServer controller = cluster.controllers().values().iterator().next();
controller.shutdown();
controller.awaitShutdown();
Member

Could you please add a test for the broker as well? The fix should work for the broker too, right?

Contributor Author

Thanks for asking.
The broker won't be fixed in this change. I have also filed another PR to address it, and this is the core change to fix the broker.

Labels
ci-approved, small (Small PRs), tests (Test fixes, including flaky tests)
3 participants