[PATCH 3/3] fhandler_pty_slave::setup_locale: respect charset == "UTF-8"

Takashi Yano takashi.yano@nifty.ne.jp
Wed Sep 2 13:06:00 GMT 2020


On Wed, 2 Sep 2020 08:26:04 +0200 (CEST)
Johannes Schindelin wrote:
> Hi Takashi,
> 
> On Wed, 2 Sep 2020, Takashi Yano via Cygwin-patches wrote:
> 
> > On Wed, 2 Sep 2020 10:30:14 +0200
> > Corinna Vinschen wrote:
> > > On Sep  1 18:19, Johannes Schindelin wrote:
> > > > When `LANG=en_US.UTF-8`, the detected `LCID` is 0x0409, which is
> > > > correct, but after that (at least if Pseudo Console support is enabled),
> > > > we try to find the default code page for that `LCID`, which is ASCII
> > > > (437). Subsequently, we set the Console output code page to that value,
> > > > completely ignoring that we wanted to use UTF-8.
> > > >
> > > > Let's not ignore the specifically asked-for UTF-8 character set.
> > > >
> > > > While at it, let's also set the Console output code page even if Pseudo
> > > > Console support is disabled; contrary to the behavior of v3.0.7, the
> > > > Console output code page is not ignored in that case.
> > > >
> > > > The most common symptom would be that console applications which do not
> > > > specifically call `SetConsoleOutputCP()` but output UTF-8-encoded text
> > > > seem to be broken with v3.1.x when they worked plenty fine with v3.0.x.
> > > >
> > > > This fixes https://github.com/msys2/MSYS2-packages/issues/1974,
> > > > https://github.com/msys2/MSYS2-packages/issues/2012,
> > > > https://github.com/rust-lang/cargo/issues/8369,
> > > > https://github.com/git-for-windows/git/issues/2734,
> > > > https://github.com/git-for-windows/git/issues/2793,
> > > > https://github.com/git-for-windows/git/issues/2792, and possibly quite a
> > > > few others.
> > > >
> > > > Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> > > > ---
> > > >  winsup/cygwin/fhandler_tty.cc | 9 +++++++++
> > > >  1 file changed, 9 insertions(+)
> > >
> > > Ok guys, I'm not opposed to this change in terms of its result,
> >
> > I am sorry, but I cannot agree with Johannes's patch.
> >
> > For example, the default code page in Japan is CP932.
> > In this case, cmd.exe, netsh.exe and so on generate
> > messages in Japanese.
> >
> > If the code page is set to CP_UTF8, messages from such
> > commands change to English. I guess similar things happen
> > for other locales.
> >
> > I would prefer to avoid this result.
> >
> > Furthermore, I looked into the issue:
> > https://github.com/git-for-windows/git/issues/2734
> > and I found that git-for-windows always uses UTF-8
> > encoding even if the locale is ja_JP.CP932.
> > It does not change the encoding based on locale or code
> > page.
> >
> > Even with Johannes's patch, if mintty is started with
> > locale ja_JP.CP932, the file name will be garbled
> > because SetConsoleOutputCP(CP_UTF8) will not be called.
> >
> > IMHO, this is a problem in git-for-windows rather
> > than in cygwin and msys2.
> >
> > To make the current version of git-for-windows work, it is
> > necessary to set the code page to CP_UTF8 regardless of locale.
> > This does not make sense at all.
> 
> You are misrepresenting the problem. It has nothing to do with Git for
> Windows. For example, if you run tests in an Angular project inside
> Cygwin's MinTTY, the output will be garbled because node.js (or the
> Angular libraries, I don't know) will interpret `LANG` or something.

You listed these issues in git-for-windows:
> > > > https://github.com/git-for-windows/git/issues/2734,
> > > > https://github.com/git-for-windows/git/issues/2792,
didn't you? Therefore, I looked into them.

OK, I will check Angular/CLI next. But I am not familiar with
Angular/CLI. Could you please provide simple steps to reproduce
the problem?

> This is a much bigger problem than you make it sound. The console
> applications that do _not_ call `SetConsoleOutputCP()` are sooooo
> ubiquitous. I think you are underestimating that problem rather
> dramatically.
> 
> Ciao,
> Johannes

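For reference, the selection logic under discussion can be sketched as
follows. This is only an illustration, not the actual code in
fhandler_tty.cc: `choose_output_cp` and `kCpUtf8` are hypothetical names,
and the LCID default code page is passed in as a parameter instead of
being queried from Windows.

```cpp
#include <string>

// Hypothetical stand-in for the Windows constant CP_UTF8 (65001).
constexpr unsigned int kCpUtf8 = 65001;

// Hypothetical helper: pick the console output code page.
// charset         - charset part of the locale, e.g. "UTF-8" or "CP932"
// lcid_default_cp - default code page already looked up for the LCID
unsigned int choose_output_cp(const std::string &charset,
                              unsigned int lcid_default_cp)
{
  // Respect an explicitly requested UTF-8 charset.
  if (charset == "UTF-8")
    return kCpUtf8;
  // Otherwise keep the default code page of the locale.
  return lcid_default_cp;
}
```

With charset "UTF-8" and an LCID default of 437 this yields 65001. With
ja_JP.CP932 (charset "CP932", default 932) it yields 932, which is the
case mentioned above where SetConsoleOutputCP(CP_UTF8) is not called.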

-- 
Takashi Yano <takashi.yano@nifty.ne.jp>

