Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature request: callback to know when the sub-process is started, of why it failed to exec #265

Open
simark opened this issue Jan 25, 2019 · 15 comments
Labels
enhancement help wanted Issues identified as good community contribution opportunities

Comments

@simark
Copy link

simark commented Jan 25, 2019

Environment details

  • OS: GNU/Linux
  • OS version: 16.04
  • node-pty version: master

Issue description

When using the spawn function, I didn't find any way to know when the process has been successfully started, or if the exec failed for some reason. My concrete use case is that I need to start a process on a pty. On success, I need to report to the caller that the process has been started*. On failure (e.g. if the executable doesn't exist), I should return the error code. I haven't found a way to do this using the current API.

I have implemented something in our fork of node-pty here, but unfortunately my employer doesn't want me to sign the Microsoft CLA, and thus I can't contribute it. But I thought I would share the idea, to get some feedback. And if somebody wants to implement something similar in the upstream repo, I guess you can't copy the code but you can get some good inspiration from it.

In my patch, I add an exec event to the IPty/ITerminal interface, with an optional error parameter. If the exec goes well, the exec callback is invoked without parameter. If there is an error, it is invoked with an errno string (e.g. ENOENT).

On Linux/macOS, since we use fork + exec to spawn a new process, it's a bit difficult to get some feedback on the exec (since it happens in a new process). I looked at how libuv does (what node's child_process.spawn relies on) and did something similar, which is to use a pipe between the child process and our process to report success or failure.

For Windows, it looks like we just need to catch an exception on failure, here's what I did.

Any feedback is appreciated, and if anybody is willing to work on an upstream version of this, I would be happy to help.

  • My criterion for "started" just means that the exec system call returned "success". Many things can go wrong after that, like dynamic libraries not found, but that's out of scope here.
@Tyriar
Copy link
Member

Tyriar commented Jan 26, 2019

@simark to track a process I normally launch cmd ['/c', 'command'] on Windows otherwise sh ['-c', 'command'], that has cmd/sh launch the command immediately and returns the exit code of the sub-process. Am I not understanding the problem correctly?

@simark
Copy link
Author

simark commented Jan 26, 2019

Hmm not really. It's not related to the process exit. It is that we want to be able to know whether the process was started successfully (meaning, the exec system call succeeded) or not. In practice, in our UI, we want to show either "Process X started" or "Couldn't start process X: no such file or directory", for example. It could be a long running process.

Checking the exit event doesn't help. First, we are not notified when the process has been successfully started (to show the success message). Then, on failure (if the program does not exist), we just receive an exit event, which we can't differentiate from a normal process exit.

For example, with node's child_process.spawn, you get an error when trying to exec something that doesn't exist:

const cp = require('child_process');
const proc = cp.spawn('./doesnotexist');
proc.on('error', err => {
	console.log('error', err);
});
proc.on('exit', code => {
	console.log('exit', code);
});
$ node test.js  
error { Error: spawn ./doesnotexist ENOENT
    at _errnoException (util.js:992:11)
    at Process.ChildProcess._handle.onexit (internal/child_process.js:190:19)
    at onErrorNT (internal/child_process.js:372:16)
    at _combinedTickCallback (internal/process/next_tick.js:138:11)
    at process._tickCallback (internal/process/next_tick.js:180:9)
    at Function.Module.runMain (module.js:695:11)
    at startup (bootstrap_node.js:191:16)
    at bootstrap_node.js:612:3
  code: 'ENOENT',
  errno: 'ENOENT',
  syscall: 'spawn ./doesnotexist',
  path: './doesnotexist',
  spawnargs: [] }

with node-pty, we just get this exit event:

const cp = require('node-pty');
const proc = cp.spawn('./doesnotexist');
proc.on('exit', code => {
	console.log('exit', code);
});
$ node test.js 
exit 1

@Tyriar
Copy link
Member

Tyriar commented Jan 26, 2019

@simark I see, definitely seems useful. Can you get approval to sign the CLA to contribute this specific change?

@simark
Copy link
Author

simark commented Jan 27, 2019

We have already tried to get the approval for another Microsoft project (language-server stuff). IIUC, the problem our legal department has with the Microsoft CLA is that it applies to all open source Microsoft projects, which would be too wide (again, IIUC, I am not a lawyer). I doubt it will be possible to get an agreement signed just for one change...

If the situation changes, I will definitely update you.

@jerch
Copy link
Collaborator

jerch commented Feb 4, 2019

@simark You trigger the success event on nextTick, but it is not guaranteed that the exec call already happened in the forked process (the kernel process scheduler could have priorized the parent over the child process for some reason).
A reliable success indicator for exec call is not possible without hacking deeper into the system. A cheap workaround to this might be to misuse the exit code from the child with some unlikely return value (like 127). Thats a common pattern used by shells, bash returns 127 for "command not found".

@simark
Copy link
Author

simark commented Feb 4, 2019

@simark You trigger the success event on nextTick, but it is not guaranteed that the exec call already happened in the forked process (the kernel process scheduler could have priorized the parent over the child process for some reason).

My patch does something to avoid this (exactly the same technique as libuv). The parent synchronously waits for the child to exec, using a pipe:

theia-ide@f58e6ee#diff-32a8618c58cb849e9b5535ba19ffacc5R359

When the native function in the parent returns, the exec has either already succeeded or failed.

@simark
Copy link
Author

simark commented Feb 4, 2019

@jerch
Copy link
Collaborator

jerch commented Feb 4, 2019

Woops have overlooked this, yeah the CLOEXEC trick works for this.

@JohnStarich
Copy link

Could we document this behavior on spawn() and/or in the README?

I just spent hours banging my head against a wall, and it turned out the exec had silently failed because the file path was wrong 😄

I suppose I expected an exception or some sort of feedback of the failure. Maybe we can save the next person some debugging pain with a small note?

@simark
Copy link
Author

simark commented Jun 6, 2020

Even better, it should be fixed. @marcdumais-work, is the situation still the same at Ericsson (still not allowed to contribute to this project)?

@jerch
Copy link
Collaborator

jerch commented Jun 7, 2020

Yeah, the fact that a failed exec* gets silently swallowed by the module, is less than ideal.

@simark You already proposed a working patch for posix systems, would be nice to see this being applied. But also note, that the module is likely to see a major overhaul soon due to #405.

Not sure about the windows side here - maybe @Tyriar has some insights, whether it already promotes a failed process binary lookup?

@Tyriar
Copy link
Member

Tyriar commented Jun 8, 2020

Not sure what you mean by failed process binary lookup, but afaik on Windows it acts the same. VS Code uses a workaround for this by validating cwd and executable before invoking node-pty https://github.com/microsoft/vscode/blob/cf73ba7aac021ee448a258f597064dcd0f023b87/src/vs/workbench/contrib/terminal/node/terminalProcess.ts#L74-L110

It leads to some obscure errors though which could be better if we had this.

@marcdumais-work
Copy link

marcdumais-work commented Jun 8, 2020

Even better, it should be fixed. @marcdumais-work, is the situation still the same at Ericsson (still not allowed to contribute to this project)?

Yes, it's still the case, as far as I know, that we are not allowed to sign the Microsoft CLA.

However we kept our node-pty fork under the MIT license, so it's my (non-lawyer) understanding that anyone could take Simon's enhancement and submit it here, so long as the project's maintainers are willing to accept this.

Looking quickly I think these are the pieces of the patch and follow-ups:
theia-ide@f58e6ee
theia-ide@186e39c
theia-ide@89e077b

@jerch
Copy link
Collaborator

jerch commented Jun 8, 2020

Not sure what you mean by failed process binary lookup...

Sorry for sloppy wording - I mean whether it gives a meaningful error message, if the process binary lookup failed. (you already answered it)

Imho spawn is a 2 stage thingy on OS level:

  1. open a pty fd (posix) or conpty handle (windows)
  2. run given binary as first process - exec* on posix or CreateProcessW on windows

I think both should be exposed with proper error messages, so node-pty embedding code knows whats going on - 1. can happen if the the user context has not enough privileges to call the pty subsystem or the OS hits the max pty limit (is windows limiting this as well?), 2. is the binary lookup error.

I am also not sure, if the pty fd / conpty handle is correctly teared down atm, if 2. fails (they are two different steps on OS levels prolly wasting resources if they are not teared down).

@Tyriar Tyriar added enhancement help wanted Issues identified as good community contribution opportunities and removed need more info labels Oct 21, 2021
@Tyriar
Copy link
Member

Tyriar commented Oct 21, 2021

Summary: I think we do throw when conpty fails to create processes now, but may not in other cases. To action this we should review what happens when processes fail to launch and ensure they throw

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement help wanted Issues identified as good community contribution opportunities
Projects
None yet
Development

No branches or pull requests

5 participants