Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a forceIgnoring mechanism and apply it to the plugins (Spring Clo… #689

Merged
merged 11 commits into from
May 17, 2024

Conversation

gzlicanyi
Copy link
Contributor

@gzlicanyi gzlicanyi commented May 13, 2024

Add a forceIgnoring mechanism and apply it to the plugins (Spring Cloud Gateway, HttpClient, JDK-ForkJoinPool, JDK-Threading, JDK-ThreadPool, Toolkit-Trace, Toolkit-WebFlux)

  • Tests(including UT, IT, E2E) are added to verify the new feature.
  • Update the CHANGES log.

proposal

Motivation

context: apache/skywalking#12161.

Generally speaking, if the source Span is ignored, the downstream Span also needs to be ignored, such as in same-thread operations and cross-process operations. However, for Spans created by cross-thread operations, such as in the scenario of Spring Cloud Gateway, there is currently no mechanism to make them ignored along with the parent Span.

The force-ignoring mechanism is a supplement to the ignoring mechanism for cross-thread operations.

Architecture Graph

image

Proposed Changes

The reason why you can only modify within the ContextManager#createLocalSpan method, and not within the ContextManager#continued method, is because the latter cannot change an already created Context and Span.

I haven't found a way to support this feature without disrupting the existing API, because the tracing context is created before the snapshot is continued.

Currently modified plugins are:
httpasyncclient-4.x
httpclient-5.x
jdk-forkjoinpool
jdk-threading
jdk-threadpool
spring-cloud-gateway-2.0.x
spring-cloud-gateway-2.1.x
spring-cloud-gateway-3.x
spring-cloud-gateway-4.x
toolkit-trace-activation
toolkit-webflux-activation

Imported Dependencies libs and their licenses.

No new dependency.

Compatibility

In scenarios where it is necessary to keep the span node of the entire trace intact, such as in Spring Cloud Gateway, the call to the ContextManager#continued method needs to be replaced with the ContextManager#createLocalSpan method that includes a snapshot.

General usage docs

…ud Gateway, HttpClient, JDK-ForkJoinPool, JDK-Threading, JDK-ThreadPool, Toolkit-Trace, Toolkit-WebFlux)
@wu-sheng
Copy link
Member

Please write a proposal about what is a force-ignoring mechanism. The discussion we did was only about the context.

@gzlicanyi
Copy link
Contributor Author

Please write a proposal about what is a force-ignoring mechanism. The discussion we did was only about the context.

Do you have a proposal template?

@wu-sheng
Copy link
Member

For an official proposal, you could refer to SWIP(https://skywalking.apache.org/docs/main/next/en/swip/readme/), which is mandatory for OAP, but not for the agent.
A key part of a proposal is about why you want to change, how you design looks like, what are the impacts. Then we could discuss for more details.

A reminder for you, the way you changes, is making a significant impact for existing APIs, this may not be a good way.

@gzlicanyi
Copy link
Contributor Author

Please write a proposal about what is a force-ignoring mechanism. The discussion we did was only about the context.

Added.

@wu-sheng
Copy link
Member

Add what?

@gzlicanyi
Copy link
Contributor Author

Add what?

here: #689 (comment)

@wu-sheng
Copy link
Member

I think we need to separate the context. Cross thread is different from cross process.
Cross process is defined here, https://skywalking.apache.org/docs/main/latest/en/api/x-process-propagation-headers-v3/. This can't be changed as it crosses all language agents.

About crosee thread, the status should only documented in the snapshot. But I am not clear about how you changed a sampled tracing context into a ignoring context?

@wu-sheng
Copy link
Member

I am feeling you are doing a hijack only, which should not be the official way.

@gzlicanyi
Copy link
Contributor Author

I think we need to separate the context. Cross thread is different from cross process. Cross process is defined here, https://skywalking.apache.org/docs/main/latest/en/api/x-process-propagation-headers-v3/. This can't be changed as it crosses all language agents.

About crosee thread, the status should only documented in the snapshot. But I am not clear about how you changed a sampled tracing context into a ignoring context?

This question can be divided into two parts:

  1. In the scenario of cross-thread, is it necessary to continue the upstream context?
  2. If the first point is necessary, how can we modify it to better align with the official way?

@wu-sheng
Copy link
Member

I don't have direct answer, if I had, I already told you in the last discussion.

The reason it doesn't exist is, the tracing context is created before the snapshot is continued.

You proposal should cover about how.

The cross process is more complex. And sampling is service(process) oriented. There is a flag in the protocol about sampling, but you have to evaluate all agent codes(all languages), about whether there will be an impact.

@wu-sheng
Copy link
Member

You are adding a mechanism for local span creation. This should not happen, if there is a context switching out, it must be controlled by ContextManager.

@wu-sheng
Copy link
Member

When this is about ContextManager, it should not affect TracingContext#CreateLocal span. Only snapshot should be changed, and ContextManager asked TracingContext switching to IgnoringTracingCntext with the same stack. Also, the plugins should not be changed, and all span and context manager APIs should not be changed.

@gzlicanyi
Copy link
Contributor Author

When this is about ContextManager, it should not affect TracingContext#CreateLocal span. Only snapshot should be changed, and ContextManager asked TracingContext switching to IgnoringTracingCntext with the same stack. Also, the plugins should not be changed, and all span and context manager APIs should not be changed.

If the TracingContext switches to IgnoringTracingContext in the continued method, the span reference created by the ContextManager#createLocalSpan method cannot be changed, and the plugin still uses the span before the switch.
Additionally, the already created but useless TracingContext will increment the sampling count of the operationName by one.

@wu-sheng
Copy link
Member

I think you got it wrong. It should be in ContextManager#continued method, which controls the context.

@wu-sheng
Copy link
Member

This is a force sampling(unsampling), we don't need to worry about sampling counter.

@gzlicanyi
Copy link
Contributor Author

I think you got it wrong. It should be in ContextManager#continued method, which controls the context.

Understood. How can the span used by the plugin be changed to a NoopSpan?

@gzlicanyi
Copy link
Contributor Author

gzlicanyi commented May 13, 2024

This is a force sampling(unsampling), we don't need to worry about sampling counter.

ForceSampling might not necessarily be true. I feel that it could still impact the count of the SamplingService.

@wu-sheng
Copy link
Member

This is a force sampling(unsampling), we don't need to worry about sampling counter.

ForceSampling might not necessarily be true. I feel that it could still impact the count of the SamplingService.

It just counts one more, so what? Sampling is never to be accurate.

@wu-sheng
Copy link
Member

wu-sheng commented May 13, 2024

I think you got it wrong. It should be in ContextManager#continued method, which controls the context.

Understood. How can the span used by the plugin be changed to a NoopSpan?

Consider creating AbstractTracerContext#forceIgnoring. TracingContext#forceIgnoring could return a IgnoredTracerContext with the same stack depth, to make sure no tracing context stack leak. IgnoredTracerContext just simply returns this as no change is required.

@gzlicanyi
Copy link
Contributor Author

I think you got it wrong. It should be in ContextManager#continued method, which controls the context.

Understood. How can the span used by the plugin be changed to a NoopSpan?

Consider creating AbstractTracerContext#forceIgnoring. TracingContext#forceIgnoring could return a IgnoredTracerContext with the same stack depth, to make sure no tracing context stack leak. IgnoredTracerContext just simply returns this as no change is required.

IgnoredTracerContext retains the activeSpanStack of TracingContext, do you think there's no problem with it?

@gzlicanyi
Copy link
Contributor Author

I think you got it wrong. It should be in ContextManager#continued method, which controls the context.

Understood. How can the span used by the plugin be changed to a NoopSpan?

Consider creating AbstractTracerContext#forceIgnoring. TracingContext#forceIgnoring could return a IgnoredTracerContext with the same stack depth, to make sure no tracing context stack leak. IgnoredTracerContext just simply returns this as no change is required.

IgnoredTracerContext retains the activeSpanStack of TracingContext, do you think there's no problem with it?

If there is no problem, indeed the 'continued' method can complete the transition of the context. However, for the leftover spans, I am not entirely sure if they might cause other issues.

@wu-sheng
Copy link
Member

Which problem do you mean No matter what spans they are. As they are not reported, they are fine to be GCed.

@gzlicanyi
Copy link
Contributor Author

Which problem do you mean No matter what spans they are. As they are not reported, they are fine to be GCed.

I have made alterations using a new way and submitted another PR. Can you check if there are any problems?

@wu-sheng
Copy link
Member

Please update here.

@@ -34,14 +34,21 @@ public class IgnoredTracerContext implements AbstractTracerContext {
private static final NoopSpan NOOP_SPAN = new NoopSpan();
private static final String IGNORE_TRACE = "Ignored_Trace";

private LinkedList<AbstractSpan> activeSpanStack;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why need this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the old span is still in use, if it's discarded, span#prepareForAsync will throw an error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image
image
image

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not friendly for GC. We could add ingored flag to the span and skip this check.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another option maybe to create a ForceIgnoreTracerContext to deal with created span cases.
And this only happens in very rare case, so we should process that only when we checked all active spans, which have one running in async mode.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you prefer replacing the span's context with IgnoredTracerContext for old spans with old context references, or would you rather ignore any operations involving context within the span?

Copy link
Member

@wu-sheng wu-sheng May 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't dig in so deeply, as I am feeling this change is not easy.

I just provided two possibilities, no preference. One key part is, what you are adding is not happening as always, you should not make the agent kernel costs too much.

I don't know whether this is possible.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was why I said, this could be done easily if this only happens in your private fork. But in general, I don't have a good idea could be done easily.

But this has to be very good, as your changes are going to affect every user for sure.

}

public IgnoredTracerContext(LinkedList<AbstractSpan> activeSpanStack) {
this.activeSpanStack = activeSpanStack;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only need the depth to be initialized based on stack depth, rather than holding all

Comment on lines 44 to 47
this.activeSpanStack = new LinkedList<>();
this.correlationContext = new CorrelationContext();
this.extensionContext = new ExtensionContext();
this.profileStatusContext = ProfileStatusContext.createWithNone();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You just need to call new constructor.

Comment on lines 87 to 94
stackDepth++;
activeSpanStack.addLast(NOOP_SPAN);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All logic here should not be changed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We just need depth, no further.

@wu-sheng wu-sheng added this to the 9.3.0 milestone May 15, 2024
@wu-sheng
Copy link
Member

It seems the UTs failing. Please recheck and fix.

@wu-sheng
Copy link
Member

Please follow the comments to polish the codes. Generally, this PR is good.

polish the codes.

Co-authored-by: 吴晟 Wu Sheng <[email protected]>
@wu-sheng wu-sheng merged commit b608d74 into apache:main May 17, 2024
191 checks passed
@wu-sheng wu-sheng added the enhancement New feature or request label May 17, 2024
@wu-sheng
Copy link
Member

Thanks for adding this.

@gzlicanyi gzlicanyi deleted the forceIgnoring branch May 17, 2024 06:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants