# GitHub Actions Self-hosted Runner Is Idle but Jobs Fail to Run
When you set up a self-hosted runner in GitHub Actions, the runner itself usually comes online without much trouble. The more frustrating part starts after that: the runner shows Idle, but once a workflow is assigned, the actual commands inside the job do not run properly.
In many cases, this is not really a networking issue. It is usually a mix of job routing/matching and the runtime environment on the runner machine.
## The key point: think in labels, not runner names
One thing that causes confusion early on is the runner name.
When you register a self-hosted runner, you give it a name, but that name is mostly just an identifier for humans.
In practice, GitHub Actions decides where to send a job based on the labels defined in runs-on.
So when working with self-hosted runners, it is better to think of the workflow as matching against labels, not the machine name itself.
| Type | Example |
|---|---|
| Less reliable approach | runs-on: my-server-01 |
| Recommended approach | runs-on: [self-hosted, linux, build] |
In other words, an Idle runner only tells you that the runner is connected and waiting. It does not guarantee that the workflow is matched the way you expect, and it definitely does not guarantee that commands will execute correctly once the job starts.
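Putting that into a minimal workflow sketch: the routing decision is made entirely by the `runs-on` label list, and the job and step contents below are illustrative:

```yaml
name: build
on: push

jobs:
  build:
    # Routed to any connected runner that carries ALL of these labels;
    # the runner's display name plays no part in the decision.
    runs-on: [self-hosted, linux, build]
    steps:
      - uses: actions/checkout@v4
      - run: ./gradlew build   # illustrative build command
```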
## Custom labels make self-hosted runners much easier to manage
At first, the default labels may seem enough. But once you have more than one machine, or different roles for each runner, adding custom labels makes things much easier to understand and maintain.
- `build` for build machines
- `deploy` for deployment machines
- `docker` for machines that can run Docker commands
- `gpu` for machines with GPU workloads
For example, a build-only runner can be targeted like this:
```yaml
runs-on: [self-hosted, linux, build]
```
If the job also needs Docker, you can be more explicit:
```yaml
runs-on: [self-hosted, linux, docker, build]
```
This makes job routing much clearer and helps avoid the situation where a workflow lands on the wrong machine.
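Custom labels are attached when the runner is registered. A sketch of the registration command, where the URL and token are placeholders and `--labels` is the relevant flag:

```shell
# Register a runner with custom labels (URL and token are placeholders).
# The default labels (self-hosted, plus OS and architecture) are added
# automatically; --labels appends your custom ones on top.
./config.sh \
  --url https://github.com/your-org/your-repo \
  --token YOUR_REGISTRATION_TOKEN \
  --labels build,docker
```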
## If the runner is Idle but commands do not work
When you get to this stage, the runner itself is usually not the real problem. More often, the job is assigned successfully, but the environment inside the runner machine is not ready for the command being executed.
### 1. PATH issues
This is one of the most common problems.
A command works fine when you SSH into the server and run it manually, but fails inside GitHub Actions with messages like `docker: command not found` or `node: command not found`.
The reason is simple: the environment used by the runner is not always the same as your interactive shell. A binary that exists on the machine may still not be available in the runner’s PATH.
A simple debug step like this can save a lot of time:
```yaml
- name: Debug runner environment
  run: |
    whoami
    pwd
    echo "$PATH"
    which docker || true
    which node || true
    which npm || true
```
If `which docker` or `which node` returns nothing, the command is either not installed where you think it is, or it is not visible in the runner's PATH.
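If the binary does exist but is simply not on the runner's PATH, one fix is to append its directory to the `GITHUB_PATH` file, which GitHub Actions reads to extend PATH for all subsequent steps. A sketch, where `/usr/local/bin` is an assumption and should be whatever directory actually holds the missing tool:

```yaml
- name: Add tool directory to PATH
  # /usr/local/bin is illustrative; point this at the directory that
  # actually contains the missing binary on your runner machine.
  run: echo "/usr/local/bin" >> "$GITHUB_PATH"

- name: Verify the tool is now visible
  run: which docker
```

Note that the change takes effect from the next step onward, not within the step that writes to `GITHUB_PATH`.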
### 2. working-directory and path problems
Another common issue is the current directory. The command may have worked locally because you were already inside the right folder, but GitHub Actions may be running it from somewhere else.
```yaml
- run: npm install
  working-directory: ./app
```
If the `./app` directory does not actually exist at runtime, the step will fail right away, often showing up as exit code 1 or exit code 2.
The quickest way to verify this is to check:
- `pwd` to confirm the current directory
- `ls -al` to inspect the actual file structure
It is also worth checking whether the step is trying to use a path before `actions/checkout` has even run.
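As a sketch, the checkout step has to come first so that `./app` exists before `working-directory` points at it (the directory name is illustrative):

```yaml
steps:
  - uses: actions/checkout@v4   # creates the repository files, including ./app
  - run: npm install
    working-directory: ./app    # only exists after checkout has run
```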
### 3. Permission problems
Sometimes the script is there, but it is simply not executable.
This shows up often with files like `./gradlew`, `./mvnw`, or `./deploy.sh`.
```shell
chmod +x gradlew
chmod +x deploy.sh
```
If the machine was set up recently, or file permissions got changed somewhere along the way, the runner may look healthy while the actual step fails immediately.
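Rather than re-running `chmod` on the runner after every checkout, you can record the execute bit in git itself so that every fresh checkout restores it. A throwaway demo of the idea, where the file name is illustrative:

```shell
# Throwaway demo: store the execute bit in git so that every fresh
# checkout on the runner restores it (file name is illustrative).
cd "$(mktemp -d)"
git init -q demo && cd demo
git config user.email "ci@example.com"
git config user.name "ci"

printf '#!/bin/sh\necho deploy\n' > deploy.sh
git add deploy.sh
git update-index --chmod=+x deploy.sh    # record mode 100755 in the index
git commit -qm "Add executable deploy.sh"

git ls-files -s deploy.sh                # the mode column should read 100755
```

In a real repository you would run `git update-index --chmod=+x deploy.sh` once, commit, and the permission problem disappears for every machine that checks the code out.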
### 4. Missing environment variables
If the same command works in your terminal but fails in Actions, missing environment variables are another likely cause.
Things like `JAVA_HOME`, `NODE_ENV`, internal API endpoints, or authentication tokens may already exist in your normal shell session, but not inside the runner process.
```yaml
env:
  NODE_ENV: production
  JAVA_HOME: /usr/lib/jvm/java-17-openjdk
```
In general, it is safer to define the values your workflow needs explicitly, rather than assuming the machine will always provide them.
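When in doubt, a quick step that dumps the environment the job actually sees makes any gap obvious. A sketch (job name and labels are illustrative):

```yaml
jobs:
  deploy:
    runs-on: [self-hosted, linux, deploy]
    env:
      NODE_ENV: production   # defined explicitly rather than assumed
    steps:
      - name: Show the environment the job actually sees
        run: printenv | sort
```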
## How I usually interpret exit codes 1 and 2
### Exit code 1
This is the most general kind of failure. It can mean a missing file, a permission issue, a missing environment variable, or even that the application itself failed after starting correctly.
- Check whether the file or directory really exists
- Check execute permissions
- Check whether required environment variables are present
- Check whether the underlying program failed internally
### Exit code 2
I usually treat this as a command usage or shell issue first. It often points to a bad argument, a broken path, or a shell syntax problem.
- Check whether the command options or arguments are valid
- Check whether the target directory actually exists
- Check whether `working-directory` is correct
- Check for shell syntax mistakes in the script
The most important thing is not the number itself, but which step failed and what command was running at the time.
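`grep` is a convenient illustration of this convention, because POSIX pins its exit codes down precisely: 0 for a match, 1 for a "normal" failure (no match), and 2 for a usage or I/O error:

```shell
# POSIX-defined grep exit codes: 0 = match, 1 = no match, 2 = error.
echo "hello" | grep -q hello;  echo "match:    $?"            # prints 0
echo "hello" | grep -q absent; echo "no match: $?"            # prints 1
grep -q pattern /no/such/file 2>/dev/null; echo "error: $?"   # prints 2
```

Most well-behaved tools follow a similar pattern, which is why exit code 2 is worth reading as "the command itself was invoked wrongly" before suspecting the application logic.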
## A minimal checklist I use first
- Make sure `runs-on` is based on labels, not on a machine name pattern
- Check `pwd` and `ls -al` to confirm the current path and files
- Check `echo "$PATH"`, `which docker`, and `which node`
- Check execute permissions on scripts
- Check whether required environment variables are explicitly defined
## Final thoughts
The most important thing to remember with self-hosted runners is this: GitHub Actions matches jobs based on labels, not on the runner name.
And once the runner reaches Idle status, the next question is no longer “is it connected?” but rather “can this machine actually run the commands this job needs?”
In my experience, most failures at this point come down to one of four causes:
- PATH issues
- directory or working-directory problems
- permission problems
- missing environment variables
So if your self-hosted runner is already Idle but steps still fail, the fastest path is usually to check things in this order: label matching → current directory → PATH → permissions → environment variables.