[newlib-cygwin/main] Cygwin: regex: wgetnext: Re-add kludge to be more glibc compatible

Corinna Vinschen corinna@sourceware.org
Thu Mar 16 12:55:01 GMT 2023


https://sourceware.org/git/gitweb.cgi?p=newlib-cygwin.git;h=0bdc764b421b56ac2961ce54f538d4a71f38b724

commit 0bdc764b421b56ac2961ce54f538d4a71f38b724
Author:     Corinna Vinschen <corinna@vinschen.de>
AuthorDate: Thu Mar 16 12:44:32 2023 +0100
Commit:     Corinna Vinschen <corinna@vinschen.de>
CommitDate: Thu Mar 16 13:46:01 2023 +0100

    Cygwin: regex: wgetnext: Re-add kludge to be more glibc compatible
    
    Add comment to explain.
    
    Signed-off-by: Corinna Vinschen <corinna@vinschen.de>

Diff:
---
 winsup/cygwin/regex/regcomp.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/winsup/cygwin/regex/regcomp.c b/winsup/cygwin/regex/regcomp.c
index 3c735931040f..59da896a90a1 100644
--- a/winsup/cygwin/regex/regcomp.c
+++ b/winsup/cygwin/regex/regcomp.c
@@ -1528,6 +1528,18 @@ wgetnext(struct parse *p)
 	wint_t wc;
 	size_t n;
 
+#ifdef __CYGWIN__
+       /* Kludge for more glibc compatibility.  On Cygwin as well as on
+	  Linux, mbrtowc returns -1 if the current local's codeset is ASCII
+	  and the character is >= 0x80.  Nevertheless, glibc's regcomp allows
+	  any char value, even stuff like [\xc0-\xff], if the locale's codeset
+	  is ASCII, so in regcomp it ignores the fact that chars >= 0x80 are
+	  invalid ASCII chars.  To be more Linux-compatible, we align the
+	  behaviour to glibc here.  Allow any character value if the current
+	  local's codeset is ASCII. */
+	if (*__current_locale_charset () == 'A') /* SCII */
+	  return (wint_t) (unsigned char) *p->next++;
+#endif
 	memset(&mbs, 0, sizeof(mbs));
 	n = mbrtowi(&wc, p->next, p->end - p->next, &mbs);
 	if (n == (size_t)-1 || n == (size_t)-2) {


More information about the Cygwin-cvs mailing list