Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use r_str_scanf as a safe alternative to fix gdb reg profile parsing bugs #22562

Closed

Conversation

Crispy-fried-chicken
Copy link
Contributor

scanf("%s") uses are not safe, which may cause OOB, so we need to limit the number of characters read.

@@ -293,4 +293,7 @@ static inline void *untagged_pointer_check(void *x) {
}
#endif

#define RZ_STR_DEF(s) RZ_STR(s)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dont use rizinisms please

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^ those macros don’t really provide much goodness and their name is from rizin so it should be removed

@@ -12,6 +12,13 @@ typedef struct {
#define UNSUPPORTED 0
#define SUPPORTED 1

#define PROC_NAME_SZ 1024
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are those constants actually coming from the kernel? If not i would prefer to have no limits on any of this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, they all come from the kernel which add in the patch of CVE-2023-27590, that is rizinorg/rizin@d619670. So maybe it should be there?

#define PROC_REGION_SZ 100
// PROC_REGION_SZ - 2 (used for `0x`). Due to how RZ_STR_DEF works this can't be
// computed.
#define PROC_REGION_LEFT_SZ 98
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Precomputed values and assumptions like this are prompt to become maintainance problems for later. Comments shouldnt be needed if the code is solid

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, maybe I should delete the comment?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The defines shouldnt exist either. Those constants are arbitrary and the value for that one can be computed from another one. Otherwise its a maintainance complication

@@ -240,8 +247,8 @@ static RList *r_debug_gdb_map_get(RDebug* dbg) { // TODO
// 7ffff7dda000-7ffff7dfd000 r-xp 00000000 08:05 265428 /usr/lib/ld-2.25.so
// Android
// "12c00000-12c40000 rw-p 00000000 00:00 0 [anon:dalvik-main space (region space)]";
ret = sscanf (ptr, "%s %s %"PFMT64x" %*s %*s %[^\n]", &region1[2],
perms, &offset, name);
ret = sscanf(ptr, "%" RZ_STR_DEF(PROC_REGION_LEFT_SZ) "s %" RZ_STR_DEF(PROC_PERM_SZ) "s %" PFMT64x " %*s %*s %" RZ_STR_DEF(PROC_NAME_SZ) "[^\n]",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dont use rizinisms

@@ -22,8 +25,8 @@ static RList *__io_maps(RDebug *dbg) {
}
char *ostr = str;
ut64 map_start, map_end;
char perm[32];
char name[512];
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Theres no need to have a define for a size that is tied to an array. Thats defined in the secure coding guidelines as a bad practice because it makes the code more xlobbered and harder to maintain and error probe

@@ -47,7 +50,7 @@ static RList *__io_maps(RDebug *dbg) {
if (_s_) {
memmove (_s_, _s_ + 2, strlen (_s_));
}
sscanf (str, "0x%"PFMT64x" - 0x%"PFMT64x" %s %s",
sscanf(str, "0x%" PFMT64x " - 0x%" PFMT64x " %" RZ_STR_DEF(IO_MAPS_PERM_SZ) "s %" RZ_STR_DEF(IO_MAPS_NAME_SZ) "s",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sscanf is an awful function and there are actually many other issues than the size of the data stored after parsing. The return value should be checked for example

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to just check for the space and use rnum to parse the string and strdup to catch the name. This way no limits in stack bounds are needed and will make it more flexible and simple

@@ -293,4 +293,7 @@ static inline void *untagged_pointer_check(void *x) {
}
#endif

#define RZ_STR_DEF(s) RZ_STR(s)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^ those macros don’t really provide much goodness and their name is from rizin so it should be removed

@@ -194,7 +195,7 @@ static RList *r_debug_gdb_map_get(RDebug* dbg) { // TODO
int unk = 0, perm, i;
char *ptr, *pos_1;
size_t line_len;
char name[1024], region1[256], region2[100], perms[5];
char name[1025], region1[101], region2[101], perms[6];
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

those extra unaligned bytes will make stack to have holes and wont optimize it properly, also as said those are arbitrary numbers. not tied to gdb definitions

// "12c00000-12c40000 rw-p 00000000 00:00 0 [anon:dalvik-main space (region space)]";
ret = sscanf (ptr, "%s %s %"PFMT64x" %*s %*s %[^\n]", &region1[2],
perms, &offset, name);
ret = sscanf (ptr, "%" RZ_STR_DEF(98) "s %" RZ_STR_DEF(5) "s %" PFMT64x " %*s %*s %" RZ_STR_DEF(1024) "[^\n]", &region1[2], perms, &offset, name);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dont use those ugly RZ_ things, and dnot hardcode the sizes here, the proper way to use that is to use sizeof(varname) instead of using a define or duplicating the value, this way the final code is much easier to maintain. this is specified in https://wiki.sei.cmu.edu/confluence/display/c if you are curious, also makes the code more readable. But again, ideally i would prefer not to use sscanf, but im fine to merge the fix when complete

@@ -22,8 +22,8 @@ static RList *__io_maps(RDebug *dbg) {
}
char *ostr = str;
ut64 map_start, map_end;
char perm[32];
char name[512];
char perm[33];
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

keep the aligned values, 32 and 512

@@ -3,6 +3,10 @@
#include <r_reg.h>
#include <r_lib.h>

#define GDB_NAME_SZ 16
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same, those defines are arbitrary, and duplicate the logic below making the code harder to read.

@trufae
Copy link
Collaborator

trufae commented Feb 6, 2024

I had some time to check your PR in detail. and i think the only sane way to handle sscanf with dynamic string formating in a clean and maintainable way is to not use sscanf or build the format string at runtime, because %*s is not supported by sscanf (this is printf-specific), and its not possible to use sizeof() and concat that result into the formatting string. Therefor this is probably the cleanest way I can think of right now:

#include <stdio.h>
#include <r_util.h>
#include <stdlib.h>

int main() {
	char str[4];
	char str2[4];
	char buf[] = "hello world";
	r_strf_var (format, 64, "%%%ds %%%ds", (int)sizeof (str), (int)sizeof (str2));
	int ret = sscanf (buf, format, str, str2);
	printf ("FMT %s\n", format);
	printf ("RET %d == 2\n", ret);
	printf ("-> %s\n", str);
	return 0;
}

i dont like the fact that the sizeof and str are decoupled into separate statements, and that could be probably wrapped up with amacro that constructs the string and runs sscanf, also, considering modern compilers can do printf at compile time the final code should be as simple as the original, and the construction of the format string should be done outside any loop.

But again, if its possible i would prefer to avoid the use of sscanf, and just strchr or use r_str to split the string by spaces and such instead of depending on it.

Also this PR raises concern for other bad usecases of sscanf:

0$ git grep sscanf| grep '%s'
libr/debug/p/debug_gdb.c:		ret = sscanf (ptr, "%s %s %"PFMT64x" %*s %*s %[^\n]", &region1[2],
libr/debug/p/debug_io.c:			sscanf (str, "0x%"PFMT64x" - 0x%"PFMT64x" %s %s",
libr/debug/p/debug_native.c:		if (sscanf (line, "%s %s %d %d 0x%s %3s %d %d",
libr/debug/p/debug_native.c:		i = sscanf (line, "%s %s %08"PFMT64x" %*s %*s %[^\n]", &region[2], perms, &offset, name);
libr/debug/p/native/linux/linux_coredump.c:		sscanf (buff,  "%d %s %c %d %d %d %d %d %u %lu %lu %lu %lu"
libr/debug/p/native/linux/linux_coredump.c:		sscanf (buff, "%d %s %c %d %d %d %d %d %u %lu %lu %lu %lu"
libr/io/p/io_self.c:		sscanf (line, "%s %s %*s %*s %*s %[^\n]", region + 2, perms, path);
libr/reg/profile.c:		ret = sscanf (ptr, " %s %d %d %d %d %s %s", name, &number, &rel,
0$

and also:

$ git grep sscanf| grep '%\*s'
libr/debug/p/debug_gdb.c:		ret = sscanf (ptr, "%s %s %"PFMT64x" %*s %*s %[^\n]", &region1[2],
libr/debug/p/debug_native.c:		i = sscanf (line, "%s %s %08"PFMT64x" %*s %*s %[^\n]", &region[2], perms, &offset, name);
libr/io/p/io_self.c:		sscanf (line, "%s %s %*s %*s %*s %[^\n]", region + 2, perms, path);

thanks f

@trufae
Copy link
Collaborator

trufae commented Feb 6, 2024

importing this https://github.com/tusharjois/bscanf is probably the best option :) ill look into that and get it as part of r_util asap

@Crispy-fried-chicken
Copy link
Contributor Author

Crispy-fried-chicken commented Feb 6, 2024

importing this https://github.com/tusharjois/bscanf is probably the best option :) ill look into that and get it as part of r_util asap

OK, But why I can't pass the check, I don't kown how to fix it, can you tell me the reason so that I can update :)

// "12c00000-12c40000 rw-p 00000000 00:00 0 [anon:dalvik-main space (region space)]";
ret = sscanf (ptr, "%s %s %"PFMT64x" %*s %*s %[^\n]", &region1[2],
perms, &offset, name);
r_strf_var (format, 64, "%%%ds %%%ds %%PFMT64x %%*s %%*s %%%d [^\\n]", (int)sizeof (region1[2]), (int)sizeof (perms), (int)sizeof (offset), (int)sizeof (name));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PFMT64x is a define, and needs to be defined outside the string itself, so that wont work, also the %*s is not supported in sscanf, so that wont work either because that seems to be a nonportable gnuism

Suggested change
r_strf_var (format, 64, "%%%ds %%%ds %%PFMT64x %%*s %%*s %%%d [^\\n]", (int)sizeof (region1[2]), (int)sizeof (perms), (int)sizeof (offset), (int)sizeof (name));
r_strf_var (format, 64, "%%%ds %%%ds %%PFMT64x %%*s %%*s %%%d [^\\n]", (int)sizeof (region1[2]), (int)sizeof (perms), (int)sizeof (offset), (int)sizeof (name));

Copy link
Collaborator

@trufae trufae left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wait for me to push the new sscanf implementaion, it will make things much cleaner and safe and i'll add unit tests for all that, so we wont need to use PFMT64x and such anymore and %*s will work, and %s will be banned

@trufae
Copy link
Collaborator

trufae commented Feb 6, 2024

Thanks for the patience and time updating this PR. it's a good finding, but i want it to be addressed in the cleanest way as possible. Good work!

@Crispy-fried-chicken
Copy link
Contributor Author

Crispy-fried-chicken commented Feb 6, 2024

Thanks for the patience and time updating this PR. it's a good finding, but i want it to be addressed in the cleanest way as possible. Good work!

Thank you for your review and looking forward to your implementaion. Besides, due to the harm this vulnerability may cause, is it necessary to apply for a CVE for this vulnerability?

@trufae
Copy link
Collaborator

trufae commented Feb 6, 2024

Found a couple of bugs in their bscanf implementation. fixed them in my fork and extended it to support %.s (%*s is kept for backward compat), so we cant use the attribute check sadly. added some tests that may probably be helpful for those cases :)

#22567

let me know what do you think?

i still want to cleanup this code more, and add a lot of more tests. and probably integrate it with RNum, so we can use it properly with extended math operation expressions and other useful format strings

@trufae
Copy link
Collaborator

trufae commented Feb 6, 2024

The new r_str_scanf function is merged in, do you wanna give it a try to rewrite this function using the new API instead? will be good to test the patch and maybe add some more unit tests for it. thanks!

@Crispy-fried-chicken
Copy link
Contributor Author

Crispy-fried-chicken commented Feb 7, 2024

@trufae hi, I've seen the implementation of r_str_scanf and I can't find any implement of the %[\n], and find there is a TODO in r_str_scanf function just like:

/* TODO: '[': match a non-empty sequence of characters from a set. */
				_BSCANF_CHECK(0);

				/* String conversion requires a width. */
				_BSCANF_CHECK_STRING();
				/* '[' conversion specifiers DO NOT consume whitespace. */

so maybe the statement below can't rewrite by the api r_str_scanf?
ret = sscanf (ptr, "%s %s %"PFMT64x" %*s %*s %[^\n]", &region1[2],perms, &offset, name);
and Please tell me if my modifications are correct so that I can verify whether my understanding is accurate.
the origin statement is

r_strf_var (format, 64, "0x%%PFMT64x - 0x%%PFMT64x %%%ds %%%ds", (int)sizeof (perm), (int)sizeof (name));
			sscanf (str, format, &map_start, &map_end, perm, name);

the rewrite statement is

r_str_scanf (str, "0x%I64x - 0x%I64x %.s %.s", &map_start, &map_end, sizeof (perm), perm, sizeof (name), name);

Thank you!

Copy link
Collaborator

@trufae trufae left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see inline comments

@@ -1,6 +1,7 @@
/* radare - LGPL - Copyright 2009-2023 - pancake, defragger */

#include <r_core.h>
#include <r_util.h>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

r_util is included by r_core so its not necessary

// "12c00000-12c40000 rw-p 00000000 00:00 0 [anon:dalvik-main space (region space)]";
ret = sscanf (ptr, "%s %s %"PFMT64x" %*s %*s %[^\n]", &region1[2],
perms, &offset, name);
r_strf_var (format, 64, "%%%ds %%%ds %%PFMT64x %%*s %%*s %%%d [^\\n]", (int)sizeof (region1[2]), (int)sizeof (perms), (int)sizeof (offset), (int)sizeof (name));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uhm will be good to implement this [^\\n] part in r_str_scanf, or change the code to not make it depend on this.. for several reasons.

The code you should write to make this format string should be like this:

Suggested change
r_strf_var (format, 64, "%%%ds %%%ds %%PFMT64x %%*s %%*s %%%d [^\\n]", (int)sizeof (region1[2]), (int)sizeof (perms), (int)sizeof (offset), (int)sizeof (name));
r_strf_var (format, 64, "%%%ds %%%ds %%"PFMT64x" %%*s %%*s %%%d [^\\n]", (int)sizeof (region1[2]), (int)sizeof (perms), (int)sizeof (offset), (int)sizeof (name));

but still. scanf scansets introduce a bunch of issues

  • output buffer is not size bounded (so it will cause overflows)
  • not portable, not all libc implementations support this syntax
  • even if portable, does not support wide strings and is affected by locales
  • scanset size is not defined by the standard, and char ranges may vary depending on implementation

in other words, scanfs scansets are a bad idea because they introduce an unnecessary complexity to scanning strings.

After some thoughs and tries i end up implementing it in the new RStr.scanf() function you can find out the implementation+ test in here:

#22572

@@ -2,6 +2,7 @@

#include <r_debug.h>
#include <r_asm.h>
#include <r_util.h>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

r_util already included by r_asm iirc

sscanf (str, "0x%"PFMT64x" - 0x%"PFMT64x" %s %s",
&map_start, &map_end, perm, name);
r_strf_var (format, 64, "0x%%PFMT64x - 0x%%PFMT64x %%%ds %%%ds", (int)sizeof (perm), (int)sizeof (name));
sscanf (str, format, &map_start, &map_end, perm, name);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just use r_str_scanf. some notes here:

  • no need to cast (int)sizeof, because our scanf handles size_t properly
  • the 0x%PFMT64x thing is just 0x%Lx

please add tests for those strings before updating this, so we can be sure the formatting and parsing works well acrsoss all platforms, this is safer and cleaner than depending on libc

@@ -3,6 +3,7 @@
#include <r_userconf.h>
#include <r_drx.h>
#include <r_core.h>
#include <r_util.h>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not necessary

&region[2], &region2[2], &ign, &ign,
unkstr, perms, &ign, &ign) != 8) {
r_strf_var (format, 64, "%%%ds %%%ds %%d %%d 0x%%%ds %%%ds %%d %%d", (int)sizeof (region1[2]), (int)sizeof (region2[2]), (int)sizeof (unkstr), (int)sizeof (perms));
if (sscanf (line, format, &region[2], &region2[2], &ign, &ign, unkstr, perms, &ign, &ign) != 8) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use r_str_scanf instead of rstrvar

@@ -1109,7 +1109,8 @@ static RList *r_debug_native_map_get(RDebug *dbg) {
#else
ut64 offset = 0;
// 7fc8124c4000-7fc81278d000 r--p 00000000 fc:00 17043921 /usr/lib/locale/locale-archive
i = sscanf (line, "%s %s %08"PFMT64x" %*s %*s %[^\n]", &region[2], perms, &offset, name);
r_strf_var (format, 64, "%%%ds %%%ds %%08PFMT64x %%*s %%*s %%%d[^\n]", (int)sizeof (region[2]), (int)sizeof (perms), (int)sizeof (name));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here, use the new r_str_scanf

@Crispy-fried-chicken
Copy link
Contributor Author

@trufae hi, I've modified it, but I don't know why it can't pass the linux-test, please tell me why so that I can fix it. Thank you!

Copy link
Collaborator

@trufae trufae left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the sizeof where wrong. i fixed them now. let's hope it passes the tests now. i may add more tests for the scanf thing to cover all those usecases

@Crispy-fried-chicken
Copy link
Contributor Author

the sizeof where wrong. i fixed them now. let's hope it passes the tests now. i may add more tests for the scanf thing to cover all those usecases

thank you! but can you tell me why the sizeof should be sizeof (region1)-2 but not sizeof (region[2]) ?

@trufae
Copy link
Collaborator

trufae commented Feb 14, 2024

sizeof (region[2]) is 1 byte, which is clearly not enough to store a string, the argument expects to take sizeof(region1) substracting the "0x" which are the first two bytes written in that string

@trufae
Copy link
Collaborator

trufae commented Feb 14, 2024

let me jump into the linux machine and tryout your pr and see what else could be

Co-authored-by: pancake <[email protected]>
libr/debug/p/debug_native.c Outdated Show resolved Hide resolved
libr/debug/p/debug_native.c Outdated Show resolved Hide resolved
sscanf (buff, "%d %s %c %d %d %d %d %d %u %lu %lu %lu %lu"
"%"PFMT64x" %"PFMT64x" %ld %lu",
&no_num, no_str, &no_char, &no_num, &no_num, &no_num,
r_str_scanf (buff, "%d %.s %c %d %d %d %d %d %u %lu %lu %lu %Lx %Lx %ld %lu", &no_num, sizeof (no_str), no_str, &no_char, &no_num, &no_num, &no_num,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can get rid of all those no* variables if we use %*d in the arguments instead

@trufae
Copy link
Collaborator

trufae commented Feb 15, 2024

i just tested your pr in local and i missed another wrong sizeof(region[2]). i have squahed the 32 commits into this separate PR and merged it under your name.

Thank you!

@trufae trufae closed this Feb 15, 2024
@Crispy-fried-chicken
Copy link
Contributor Author

i just tested your pr in local and i missed another wrong sizeof(region[2]). i have squahed the 32 commits into this separate PR and merged it under your name.

Thank you!

Can you tell me what the pr id is ? and due to the harm this vulnerability may cause, is it necessary to apply for a CVE for this vulnerability?

@trufae
Copy link
Collaborator

trufae commented Feb 15, 2024

This is the PR #22599

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants