Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ATS 10 vs 9 perf top in production #11749

Open
bryancall opened this issue Aug 29, 2024 · 4 comments
Open

ATS 10 vs 9 perf top in production #11749

bryancall opened this issue Aug 29, 2024 · 4 comments

Comments

@bryancall
Copy link
Contributor

bryancall commented Aug 29, 2024

These are numbers from 2 Ireland hosts, not the busiest time of the day. They are both getting around 850-870 rps. ATS 10 is doing 69 rps per cpu percentage and ATS 9 is doing 130 rps per cpu percentage.

ATS 10

  13.91%  traffic_server                                           [.] EThread::execute_regular
   3.63%  [kernel]                                                 [k] sort
   2.33%  [kernel]                                                 [k] cmppid
   1.98%  [kernel]                                                 [k] __task_pid_nr_ns
   1.70%  [kernel]                                                 [k] __x86_indirect_thunk_r12
   1.54%  [kernel]                                                 [k] u32_swap
   1.53%  libcrypto.so.1.1                                         [.] __ecp_nistz256_mul_montx
   1.07%  [kernel]                                                 [k] css_task_iter_next
   1.04%  libcrypto.so.1.1                                         [.] __ecp_nistz256_sqr_montx
   1.00%  [kernel]                                                 [k] __x86_indirect_thunk_rax
   0.93%  libjose.so.0.0.0                                         [.] jose_b64_dec_buf
   0.76%  libwaflz.so                                              [.] ns_waflz::ac::find
   0.62%  libc-2.28.so                                             [.] tolower
   0.54%  libjemalloc.so.2                                         [.] free
   0.53%  [kernel]                                                 [k] native_queued_spin_lock_slowpath
   0.52%  libc-2.28.so                                             [.] __memmove_avx_unaligned_erms

ATS 9

   2.37%  libcrypto.so.1.1                                         [.] __ecp_nistz256_mul_montx
   1.62%  libcrypto.so.1.1                                         [.] __ecp_nistz256_sqr_montx
   1.39%  libjose.so.0.0.0                                         [.] jose_b64_dec_buf
   1.20%  libwaflz.so                                              [.] ns_waflz::ac::find
   0.95%  libc-2.28.so                                             [.] tolower
   0.93%  libjemalloc.so.2                                         [.] free
   0.91%  libpcre.so.1.2.13                                        [.] pcre_exec
   0.87%  libpcre2-8.so.0.7.1                                      [.] pcre2_match_8
   0.84%  libcrypto.so.1.1                                         [.] bn_mul_mont
   0.82%  libc-2.28.so                                             [.] __memmove_avx_unaligned_erms
   0.65%  libcrypto.so.1.1                                         [.] bn_sqrx8x_internal
   0.62%  [kernel]                                                 [k] cpuidle_enter_state
   0.61%  libpcre.so.1.2.13                                        [.] match
   0.57%  [kernel]                                                 [k] menu_select
   0.53%  libjemalloc.so.2                                         [.] malloc
   0.52%  libc-2.28.so                                             [.] __strncasecmp_l_avx
@bryancall
Copy link
Contributor Author

bryancall commented Aug 29, 2024

Here is a perf top from 2 hosts in Denver that are getting ~1600 rps. They look much more similar. ATS 10 is doing 107 rps per cpu percentage and ATS 9 is doing 101 rps per cpu percentage.

ATS 10

   2.01%  libcrypto.so.1.1                                         [.] bn_mul_mont
   1.74%  libcrypto.so.1.1                                         [.] rsaz_1024_sqr_avx2
   1.20%  [kernel]                                                 [k] do_syscall_64
   1.04%  libjemalloc.so.2                                         [.] free
   1.04%  libwaflz.so                                              [.] ns_waflz::ac::find
   0.96%  libc-2.28.so                                             [.] __memmove_avx_unaligned_erms
   0.81%  traffic_server                                           [.] huffman_decode
   0.69%  libc-2.28.so                                             [.] tolower
   0.54%  libjemalloc.so.2                                         [.] malloc
   0.52%  libpcre2-8.so.0.11.2                                     [.] pcre2_match_8
   0.48%  libcrypto.so.1.1                                         [.] rsaz_1024_mul_avx2
   0.46%  libcrypto.so.1.1                                         [.] sha256_block_data_order_avx2
   0.46%  libcrypto.so.1.1                                         [.] bn_sqr8x_internal
   0.45%  traffic_server                                           [.] mime_hdr_field_find
   0.44%  libwaflz.so                                              [.] ns_waflz::waf::process_rule_part

ATS 9

   1.68%  libcrypto.so.1.1                                         [.] bn_mul_mont
   1.60%  libpcre.so.1.2.13                                        [.] pcre_exec
   1.44%  libcrypto.so.1.1                                         [.] rsaz_1024_sqr_avx2
   1.01%  [kernel]                                                 [k] do_syscall_64
   0.91%  libc-2.28.so                                             [.] __memmove_avx_unaligned_erms
   0.91%  libjemalloc.so.2                                         [.] free
   0.88%  libwaflz.so                                              [.] ns_waflz::ac::find
   0.82%  libpcre.so.1.2.13                                        [.] match
   0.78%  libpcre2-8.so.0.7.1                                      [.] pcre2_match_8
   0.74%  traffic_server                                           [.] huffman_decode
   0.65%  libc-2.28.so                                             [.] __strncasecmp_l_avx
   0.58%  libc-2.28.so                                             [.] tolower
   0.49%  traffic_server                                           [.] _mime_hdr_field_list_search_by_string
   0.44%  libjemalloc.so.2                                         [.] malloc
   0.42%  libcrypto.so.1.1                                         [.] sha256_block_data_order_avx2

@ywkaras
Copy link
Contributor

ywkaras commented Sep 3, 2024

Are all the (likely inlined) calls to ink_get_hrtime() in EThread::execute_regular() in Yahoo ATS 9?

It seems iffy that clock_gettime() is avoiding the kernel call/thunk in RedHat 8. Looks like the COARSE clock is needed ( https://docs.redhat.com/en/documentation/red_hat_enterprise_linux_for_real_time/8/html/optimizing_rhel_8_for_real_time_for_low_latency_operation/assembly_application-timestamping_optimizing-rhel8-for-real-time-for-low-latency-operation#assembly_application-timestamping_optimizing-RHEL8-for-real-time-for-low-latency-operation ). But I don't know why this would affect one YCPI proxy and not another.

Looks like you can select the (potentially not in userspace) clock type with a config, proxy.config.system_clock . (Definitely a contender for most dubiously useful ATS config variable.) So maybe double check it's consistent between the two proxies running ATS 10?

@ywkaras
Copy link
Contributor

ywkaras commented Sep 3, 2024

Recommend not assigning this to me. Unless you want to be lower priority than binge-watching Fear The Walking Dead.

@cmcfarlen
Copy link
Contributor

Here is a recent perf top from Apple:

   8.66%  libcrypto.so.1.0.0                               [.] _aesni_ctr32_ghash_6x
   3.80%  [kernel]                                         [k] copy_user_enhanced_fast_string
   1.85%  [kernel]                                         [k] nft_do_chain
   1.65%  libc.so.6                                        [.] __memmove_avx_unaligned_erms
   1.25%  libcrypto.so.1.0.0                               [.] rsaz_1024_sqr_avx2
   0.98%  libcrypto.so.1.0.0                               [.] fe_mul_impl.llvm.7949921813875997555
   0.90%  [kernel]                                         [k] skb_release_data
   0.88%  libcrypto.so.1.0.0                               [.] sha256_block_data_order_avx
   0.81%  traffic_server                                   [.] _ZN8PluginVC18process_write_sideEv
   0.79%  [kernel]                                         [k] pskb_expand_head
   0.76%  [kernel]                                         [k] _raw_spin_lock
   0.74%  [kernel]                                         [k] __slab_free
   0.67%  [kernel]                                         [k] tcp_sendmsg_locked
   0.60%  [kernel]                                         [k] tcp_clean_rtx_queue.constprop.0
   0.60%  traffic_server                                   [.] _ZN20Http2ConnectionState15restart_streamsEv
   0.59%  [kernel]                                         [k] __list_del_entry_valid
   0.56%  libcrypto.so.1.0.0                               [.] X25519
   0.56%  [kernel]                                         [k] ixgbe_clean_rx_irq
   0.54%  libjemalloc.so.2                                 [.] free
   0.54%  [kernel]                                         [k] csum_partial
   0.53%  [kernel]                                         [k] tcp_write_xmit
   0.51%  [kernel]                                         [k] __alloc_skb
   0.50%  [kernel]                                         [k] __netif_receive_skb_core.constprop.0
   0.49%  traffic_server                                   [.] _Z19mime_hdr_field_findP11MIMEHdrImplPKci
   0.48%  [kernel]                                         [k] __tcp_transmit_skb
   0.47%  [kernel]                                         [k] kfree
   0.47%  [kernel]                                         [k] ___slab_alloc
   0.46%  traffic_server                                   [.] _ZN11Http2Stream15restart_sendingEv
   0.46%  traffic_server                                   [.] _ZN9MutexLockD2Ev

I counted calls to Http2ConnectionState::restart_streams and found that ATS 10 calls this function 20 times more than 9.2. (so that comes out to 6293121 calls in 10 seconds vs 342508)

We think this increase primarily comes from this change in this PR:
https://github.com/apache/trafficserver/pull/11094/files#diff-00a3e1dd2853b5969f857e4e598b30d7e5c8cca51277c2446b9a5bdd62dd8f90R2241-R2243

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants