Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement platform-independent signal lists #1206

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

vidraj
Copy link

@vidraj vidraj commented Mar 9, 2023

This pull request implements platform-independent lists of signals, sorted by the signal number at compilation time, as discussed in bug #1200. The change is implemented for all platforms, but I only tested it on Linux, so more testing is definitely required.

The potentially breaking changes I'm aware of are:

  1. The existing Solaris code does something wacky with #include <signal.h> and I'm not sure how it currently works (it appears not to include signal.h at all), so my change might break that.
  2. NetBSD defines the real-time signals directly in the platform list, while all other platforms only add them when creating the panel. My code removes the special case.
  3. Only signals which differ between platforms are define-protected. It is possible the code will refuse to compile on some platforms. I could define-protect all signals, but didn't, to make sure the compilation fails when the signal.h header is broken or unavailable. It is possible to define-protect all signals save e.g. SIGINT, which should be defined everywhere.
  4. The code relies on signals being actually #defined as numbers, not e.g. as #define SIGINT (SIGHUP+1). If this assumption is incorrect, the code would issue a compile-time warning and produce an unsorted list.

Stylistic and nitpicky issues:

  1. I've never used Autotools before, so while the new code in configure.ac and Makefile.am works on my machine, it may not be entirely correct or idiomatic.
  2. In particular, the test for sort is probably unnecessary, since the test doesn't stop the build if sort is not present, instead setting the SORT variable to just sort instead of the path. The test for sed works the same, but it at least tries to find a version of sed that actually works. It might be better to write it as
    AC_PATH_PROG([SORT], [sort], [no])
    AS_IF([test "x$SORT" = xno], [AC_MSG_ERROR([sort not found in PATH])])
    but I'm not sure what the preferred way is.
  3. The sed scripts are arguably ugly. However, it's a tool that's available everywhere and should just work, unlike e.g. Python or Perl. Perhaps AWK would be a better choice, but I don't speak that language.

vidraj added 2 commits March 9, 2023 17:50
It is possible to get the list of signals with their associated signal
numbers in a platform-agnostic way using just the C preprocessor.
However, the preprocessor is nor able to sort the list by signal
number. Three solutions are viable:
1. Leave the list unsorted, or sort by signal name.
2. Sort the list on program startup.
3. Sort the list at compile-time using gross hacks.

In the discussion of bug htop-dev#1200, a preference for compile-time sorting
emerged. To do that, a two-step approach is needed: A C source with
the signal definitions is preprocessed, the output filtered through
`sed` and `sort` and then included from another, final, C source file.

As of this commit, the created list does not replace signal lists in
from the `Platform.c` files in individual platform subdirectories, and
therefore is not used in the final binary.
Previously, each platform hardcoded its own signal list. However, at
least on Linux, the signal numbers and even the available names vary
between architectures, requiring a more clever approach. A platform
independent code now replaces the individual lists for all platforms.

This may cause minor changes in the UI. For example, Solaris used to
display duplicate signal names as "22 SIGPOLL/IO", but now the panel
will show two entries, "22 SIGIO" and "22 SIGPOLL", matching behavior
on other platforms.

Fixes htop-dev#1200

SUFFIXES = .i
# Preprocessing a list of signals to get their numbers from the headers.
.c.i:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.c.i:
signals/SignalList.in.i: signals/SignalList.in.c

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the suggested form, this won't work with out-of-tree builds. Why not use the suffix rule? I believe that suffix rules are idiomatic for Automake and the recipe is generic.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The suffix rule would affect all files with the same extension in the build tree. Since we have just one file that needs such preprocessing, a suffix rule would be a bad idea.

And yes, the dependency line should be $(srcdir)/signals/SignalList.in.c instead.

signals/ParseSignals.sed Show resolved Hide resolved
@Explorer09
Copy link
Contributor

Explorer09 commented Mar 10, 2023

Firstly, thanks for your work that proves my idea would work. However I think you made your C preprocessor and sed script a little more complicated than necessary.

I think the HTOP_SIGENTRY macro and the first sed script can be simplified to something like this (the code is untested):

// It is unlikely that braces would appear in SIG* macro
// definitions; hence we use braces as field delimiters.
#define HTOP_SIGENTRY(x) __htop_sigentry__ { x }{ #x }
# In sed

/^$/ d;
# Normalize whitespaces
s/[[:space:]][[:space:]]*/ /g;
# Discard "#line" directives
/^ *#/ d;
/^ *%:/ d;

:start
/__htop_sigentry__ *{ *[^}]* *}{ *[^}]* *}/ {
   s/__htop_sigentry__ *{ *\([^}]*\) *}{ *\([^}]*\) *}/\n__htop_sigentry__{\1}{\2}\n/1;
   D;
   P;
   D;
   b start;
}

# GNU sed prints the last line when 'N' command reaches
# end of file (non-POSIX behavior). Need this workaround before 'N':
$ d;
N;
s/\n/ /1;
b start;

@vidraj
Copy link
Author

vidraj commented Mar 12, 2023

Thank you for your swift reaction and the comments!

I think the HTOP_SIGENTRY macro and the first sed script can be simplified to something like this (the code is untested):

// It is unlikely that braces would appear in SIG* macro
// definitions; hence we use braces as field delimiters.
#define HTOP_SIGENTRY(x) __htop_sigentry__ { x }{ #x }

That was my initial implementation as well, however, unlikely is not impossible and once I wrote code to at least detect extraneous braces and report an error, it quickly grew beyond the current implementation.

I'd like the code to be robust, especially since braces can enclose complex expressions with multiple statements inside. I can see that somebody could try to use that to e.g. track uses of deprecated signals, so it is not entirely unlikely for such code to appear. At the very least, the code should not break when that happens. So I still prefer my solution, even though it is longer, I believe the error checking is worth it.

That being said, I can redo the pull request with your suggestions if you want – let me know what your preferences are, you're the maintainer here. :-)

# In sed

/^$/ d;
# Normalize whitespaces
s/[[:space:]][[:space:]]*/ /g;
# Discard "#line" directives
/^ *#/ d;
/^ *%:/ d;

:start
/__htop_sigentry__ *{ *[^}]* *}{ *[^}]* *}/ {
   s/__htop_sigentry__ *{ *\([^}]*\) *}{ *\([^}]*\) *}/\n__htop_sigentry__{\1}{\2}\n/1;
   D;
   P;
   D;
   b start;
}

# GNU sed prints the last line when 'N' command reaches
# end of file (non-POSIX behavior). Need this workaround before 'N':
$ d;
N;
s/\n/ /1;
b start;

As written, this requires quadratic space to reach the first sigentry, though a /__htop_sigentry__/ !d fixes that, and the D command starts a new cycle, it has to be replaced by s/[^\n]*\n//. Quick fixes aside, the output needs further processing – these are the first two entries it prints for me:

__htop_sigentry__{# 13 "signals/SignalList.in.c" 3 4 6  # 13 "signals/SignalList.in.c" }{"SIGABRT" }
__htop_sigentry__{# 15 "signals/SignalList.in.c" 3 4 14  # 15 "signals/SignalList.in.c" }{"SIGALRM" }

A possible fix is to add

s/\n[[:space:]]*#[^\n]*//
s/\n[[:space:]]*%:[^\n]*//
s/\n[[:space:]]*??=[^\n]*//

after the last N.

@Explorer09
Copy link
Contributor

That was my initial implementation as well, however, unlikely is not impossible and once I wrote code to at least detect extraneous braces and report an error, it quickly grew beyond the current implementation.

I'd like the code to be robust, especially since braces can enclose complex expressions with multiple statements inside. I can see that somebody could try to use that to e.g. track uses of deprecated signals, so it is not entirely unlikely for such code to appear. At the very least, the code should not break when that happens. So I still prefer my solution, even though it is longer, I believe the error checking is worth it.

How is it possible for a SIG* macro to contain braces without causing a syntax error upon use?

I see the possibility for a signal name macro to be defined like (SIGHUP + 1) or even (__do_something _internally__() [[attribute_foo]], 2), but unless it's an array or struct definition, it makes no sense to use braces at all. Correct me if I'm wrong.

As written, this requires quadratic space to reach the first sigentry, though a /__htop_sigentry__/ !d fixes
, and the D command starts a new cycle, it has to be replaced by s/[^\n]*\n//.

Okay. Thanks for catching those.

Quick fixes aside, the output needs further processing – these are the first two entries it prints for me:

__htop_sigentry__{# 13 "signals/SignalList.in.c" 3 4 6  # 13 "signals/SignalList.in.c" }{"SIGABRT" }
__htop_sigentry__{# 15 "signals/SignalList.in.c" 3 4 14  # 15 "signals/SignalList.in.c" }{"SIGALRM" }

A possible fix is to add

s/\n[[:space:]]*#[^\n]*//
s/\n[[:space:]]*%:[^\n]*//
s/\n[[:space:]]*??=[^\n]*//

after the last N.

Yes. When I wrote the draft code I forgot to remove the lines beginning with # after joining them with N. But I would write like this:

s/[[:space:]][[:space:]]*/ /g;
s/\n *#[^\n]*//
s/\n *%:[^\n]*//
s/\n *??=[^\n]*//

@vidraj
Copy link
Author

vidraj commented Mar 12, 2023

How is it possible for a SIG* macro to contain braces without causing a syntax error upon use?

I see the possibility for a signal name macro to be defined like (SIGHUP + 1) or even (__do_something _internally__() [[attribute_foo]], 2), but unless it's an array or struct definition, it makes no sense to use braces at all. Correct me if I'm wrong.

There is (as usual :-) ) a GNU extension for this:
https://gcc.gnu.org/onlinedocs/gcc/Statement-Exprs.html

@Explorer09
Copy link
Contributor

How is it possible for a SIG* macro to contain braces without causing a syntax error upon use?
I see the possibility for a signal name macro to be defined like (SIGHUP + 1) or even (__do_something _internally__() [[attribute_foo]], 2), but unless it's an array or struct definition, it makes no sense to use braces at all. Correct me if I'm wrong.

There is (as usual :-) ) a GNU extension for this: https://gcc.gnu.org/onlinedocs/gcc/Statement-Exprs.html

Fine, but I have another idea to make it:

#define HTOP_SIGENTRY(x) __htop_sigentry__ $ x $ #x

`, @, $ are the three characters in the ASCII printable set that are unused in the C language (they might appear in string or character literals, but they have no use outside them).

@cgzones
Copy link
Member

cgzones commented Jun 1, 2023

An alternative might be to test a set of signal IDs for availability, e.g. via sigaddset(3), and then use the ones not returning EINVAL with the name from strsignal(3).

@Explorer09
Copy link
Contributor

@cgzones Checking signal availability via such API calls would add overhead in run time; plus configure script can't check those when cross-compiling.

@BenBE
Copy link
Member

BenBE commented Jun 2, 2023

@cgzones Checking signal availability via such API calls would add overhead in run time;

You only have to do this once at startup and it's not like we are iterating millions of signal names there. I think that cost is negligible.

plus configure script can't check those when cross-compiling.

Which should just trigger a static fallback, we would need anyway in case that API was unavailable for some reason.

Overall, IMO the current version of the patch is a bit over-engineered and a simpler "check for target arch, switch to appropriate list depending on that" would be better.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants