142 changes: 142 additions & 0 deletions crowdsec-docs/unversioned/troubleshooting/log_processor_offline.md
@@ -0,0 +1,142 @@
---
title: Log Processor Offline
id: log_processor_offline
---

When the Console or a notification rule reports **Log Processor Offline**, the local agent has not checked in with the Local API (LAPI) for more than 24 hours. The alert is different from **Log Processor No Alert**, which only means logs were parsed but no scenarios fired. Use the sections below to identify why the heartbeat stopped and how to bring the agent back online.

## Common Root Causes & Diagnostics

### Service stopped or stuck

- Confirm the service state on the host:

```bash
sudo systemctl status crowdsec
sudo journalctl -u crowdsec -n 50
```

- For containerised deployments, verify the workload is still running:

```bash
docker ps --filter name=crowdsec
kubectl get pods -n crowdsec
```

- On the LAPI node, run `sudo cscli machines list` and check whether the `Last Update` column is older than 24 hours for the affected machine.
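
The 24-hour comparison can be scripted. The sketch below is a minimal helper, assuming GNU `date` and a timestamp copied from the `Last Update` column in a format `date -d` can parse; the function name `is_stale` is hypothetical and not part of `cscli`.

```bash
# Return 0 (stale) when the given timestamp is older than the allowed age.
# Usage: is_stale <timestamp> [max_age_hours]   (default: 24 hours)
# Assumes GNU date, which parses ISO 8601 strings such as 2024-05-01T10:00:00Z.
is_stale() {
  local last_update="$1" max_age_hours="${2:-24}"
  local last_epoch now_epoch
  # Convert the timestamp to seconds since the epoch; return 2 if it cannot be parsed.
  last_epoch=$(date -u -d "$last_update" +%s 2>/dev/null) || return 2
  now_epoch=$(date -u +%s)
  # The arithmetic test succeeds only when the age exceeds the threshold.
  (( now_epoch - last_epoch > max_age_hours * 3600 ))
}
```

For example, `is_stale "2024-05-01T10:00:00Z" && echo "heartbeat older than 24h"` prints the message only when the timestamp is more than 24 hours old.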

### Machine not validated or credentials revoked

- Run `sudo cscli machines list` on the LAPI and check whether the machine is in `PENDING` state or missing entirely.
- On the agent host, ensure `/etc/crowdsec/local_api_credentials.yaml` exists and contains the expected login and password.
- If you recently reinstalled or renamed the machine, it must be re-validated. See [Machines management](/u/user_guides/machines_mgmt) for details.
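
The credentials check above can be sketched as a small shell function. It assumes the file uses top-level `login:` and `password:` keys; `check_credentials` is a hypothetical helper name, and it never prints the secret values.

```bash
# Report whether a credentials file exists and defines non-empty login and password keys.
# Usage: check_credentials [path]   (default: /etc/crowdsec/local_api_credentials.yaml)
check_credentials() {
  local file="${1:-/etc/crowdsec/local_api_credentials.yaml}"
  local key
  # The file must exist and be readable by the current user.
  [ -r "$file" ] || { echo "missing or unreadable: $file"; return 1; }
  for key in login password; do
    # Require "key:" at the start of a line, followed by a non-blank value.
    grep -Eq "^${key}:[[:space:]]*[^[:space:]]" "$file" \
      || { echo "no value for '${key}' in $file"; return 1; }
  done
  echo "ok: $file"
}
```

Run it from a root shell if the file is not readable by your user.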

### Local API unreachable

- From the agent, run:

```bash
sudo cscli lapi status
```

Errors such as `401 Unauthorized`, TLS failures, or connection timeouts indicate an authentication or network issue.

- Verify the API endpoint declared in `/etc/crowdsec/config.yaml` (`api.client.credentials_path`, `url`, `ca_cert`, `insecure_skip_verify`) matches your LAPI setup. Refer to [Local API configuration](/docs/local_api/configuration) and [TLS authentication](/docs/local_api/tls_auth) if certificates changed.
- Confirm the network path between the agent and the LAPI host is open (default port `8080/TCP`). Firewalls or reverse proxies introduced after installation commonly block the heartbeat.
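
To see which endpoint the agent is configured to contact, the URL can be read from the credentials file. This is a sketch under the assumption that the file referenced by `api.client.credentials_path` stores the endpoint as an unquoted top-level `url:` key; `lapi_url` is a hypothetical helper name.

```bash
# Print the first top-level "url:" value found in a credentials file.
# Usage: lapi_url [path]   (default: /etc/crowdsec/local_api_credentials.yaml)
lapi_url() {
  local file="${1:-/etc/crowdsec/local_api_credentials.yaml}"
  # Strip the "url:" prefix and surrounding whitespace, then keep the first match only.
  sed -n 's/^url:[[:space:]]*//p' "$file" | sed 's/[[:space:]]*$//' | head -n 1
}
```

Compare the printed value with the address and port the LAPI actually listens on.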

### Local API unavailable

- If several agents show as offline simultaneously, the LAPI service might be down. Check its status on the LAPI machine:

```bash
sudo systemctl status crowdsec
sudo journalctl -u crowdsec -n 50
```

- Inspect `/var/log/crowdsec/` (or container logs) for database or authentication errors that prevent the LAPI from responding.
- Use `sudo cscli metrics show engine` on the LAPI to confirm it is still ingesting events from other agents. See the [Health Check guide](/u/getting_started/health_check) for additional diagnostics.
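
When the log files are large, filtering for error-level entries narrows the search. The helper below is a sketch that assumes the text log format, where each line carries a `level=error` or `level=fatal` field; adjust the pattern if your deployment logs in another format. `recent_errors` is a hypothetical name.

```bash
# Print the last N error-level lines from a log file.
# Usage: recent_errors <log_file> [count]   (default: 20 lines)
recent_errors() {
  local file="$1" count="${2:-20}"
  # Keep only error and fatal entries, then show the most recent ones.
  grep -E 'level=(error|fatal)' "$file" | tail -n "$count"
}
```

For example, `recent_errors /var/log/crowdsec/crowdsec.log 10` shows the ten most recent error entries.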

## Recovery Actions

### Restart the Log Processor service

- Systemd:

```bash
sudo systemctl restart crowdsec
```

- Docker:

```bash
docker restart crowdsec
```

- Kubernetes:

```bash
kubectl rollout restart deployment/crowdsec -n crowdsec
```

After the restart, re-run `sudo cscli machines list` on the LAPI to confirm the `Last Update` timestamp is refreshed.
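
A service can take a few seconds to come back, so it may help to poll rather than check once. The following is a generic retry sketch; `wait_until` is a hypothetical helper, and the attempt count and delay are arbitrary values to tune.

```bash
# Retry a command until it succeeds.
# Usage: wait_until <attempts> <delay_seconds> <command> [args...]
wait_until() {
  local attempts="$1" delay="$2"
  shift 2
  local i
  for (( i = 1; i <= attempts; i++ )); do
    # Stop as soon as the command exits with status 0.
    "$@" && return 0
    # Sleep between attempts, but not after the final one.
    (( i < attempts )) && sleep "$delay"
  done
  return 1
}
```

On a systemd host, `wait_until 10 3 systemctl is-active --quiet crowdsec` waits up to about 30 seconds for the service to report active.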

### Validate or re-register the machine

#### Using credentials

:::info
This method is better suited to single-machine setups.
:::

- To regenerate credentials directly on the LAPI host when the agent runs locally, run:

```bash
sudo cscli machines add -a
```

#### Using registration system

:::info
The registration system is better suited to distributed setups.
:::

- Approve pending machines on the LAPI:

```bash
sudo cscli machines validate <machine_name>
```

- If credentials were removed or the agent was rebuilt, re-register it against the LAPI:

```bash
sudo cscli lapi register --url http://<lapi_host>:8080 --machine <machine_name>
sudo systemctl restart crowdsec
```

Update the `--url` value to match your deployment. Auto-registration tokens are covered in [Machines management](/u/user_guides/machines_mgmt#machine-auto-validation).

### Restore connectivity to the Local API

- Open the required port on firewalls or security groups and verify with:

```bash
nc -zv <lapi_host> 8080
```

- If TLS certificates were renewed, update the agent trust store (`ca_cert`) or temporarily enable `insecure_skip_verify: true` for testing. Follow the hardening recommendations in [TLS authentication](/docs/local_api/tls_auth).
- When using proxies or load balancers, ensure they forward HTTP headers and TLS material expected by the LAPI.
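
If the LAPI is not on the default port, the host and port for the `nc` probe can be derived from the LAPI URL. The sketch below uses only shell parameter expansion; `lapi_host_port` is a hypothetical helper, it does not handle IPv6 literals, and it falls back to the scheme's standard port (80 or 443) when the URL omits one.

```bash
# Split a URL such as http://lapi.example:8080/ into "host port".
# Usage: lapi_host_port <url>
lapi_host_port() {
  local url="$1" scheme hostport host port
  scheme="${url%%://*}"          # text before "://"
  hostport="${url#*://}"         # drop the scheme
  hostport="${hostport%%/*}"     # drop any path
  host="${hostport%%:*}"         # text before the first ":"
  if [[ "$hostport" == *:* ]]; then
    port="${hostport##*:}"       # explicit port
  elif [[ "$scheme" == https ]]; then
    port=443
  else
    port=80
  fi
  echo "$host $port"
}
```

The result can feed the probe directly, for example `read -r host port <<< "$(lapi_host_port "http://<lapi_host>:8080")"` followed by `nc -zv "$host" "$port"`.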

### Stabilise the Local API

- Restart the LAPI service or pod if it was unresponsive:

```bash
sudo systemctl restart crowdsec
kubectl rollout restart deployment/crowdsec-lapi -n crowdsec
```

- Run `sudo cscli support dump` to collect diagnostics if the LAPI repeatedly crashes or loses database access. Review the resulting archive for database connectivity errors and consult the [Security Engine troubleshooting guide](/u/troubleshooting/security_engine) when escalation is required.

Once the heartbeat is restored, the Console alert clears automatically during the next polling cycle. Consider adding a [notification rule](/u/console/notification_integrations/rule) for **Log Processor Offline** so you are alerted promptly when it happens again.
@@ -0,0 +1,99 @@
---
title: Security Engine Offline
id: security_engine_offline
---

The **Security Engine Offline** alert appears in the Console and notification integrations when an enrolled engine has not reported or logged in to CrowdSec for more than 48 hours. This usually means the core `crowdsec` service (Log Processor + Local API) has stopped working or communicating with our infrastructure.

## Common Root Causes & Diagnostics

### Host or service down

- Check that the `crowdsec` service is running:

```bash
sudo systemctl status crowdsec
sudo journalctl -u crowdsec -n 50
```

- For container or Kubernetes deployments, confirm the workload is still healthy:

```bash
docker ps --filter name=crowdsec
kubectl get pods -n crowdsec
```

- If the host itself is unreachable (hypervisor, VM, or cloud instance down), the Console cannot receive a heartbeat and marks the engine offline.

### Enrollment revoked or pending

- On the engine, run `sudo cscli console status` to verify it is still enrolled and accepted.
- In the Console, visit **Security Engines** and confirm the engine is not archived or removed. Follow [Pending Security Engines](/u/console/security_engines/pending_security_engines) if it shows as waiting for approval.
- Review `/etc/crowdsec/console.yaml` for disabled options (`console_management`, `custom`, `tainted`, `context`) that may prevent expected data from being sent.
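
Disabled options can be listed quickly with a one-line filter. This sketch assumes `console.yaml` is a flat list of `key: value` pairs with boolean values; `disabled_options` is a hypothetical helper name.

```bash
# List the keys that are set to false in a flat "key: value" YAML file.
# Usage: disabled_options [path]   (default: /etc/crowdsec/console.yaml)
disabled_options() {
  local file="${1:-/etc/crowdsec/console.yaml}"
  # Print only the key name for lines of the form "key: false".
  sed -n -E 's/^([A-Za-z_]+):[[:space:]]*false[[:space:]]*$/\1/p' "$file"
}
```

An empty result means no option in the file is explicitly set to `false`.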

### Console connectivity issues

- `sudo cscli console status` may show errors such as `permission denied`, `unable to reach console`, or TLS failures. Inspect `/var/log/crowdsec/crowdsec.log` (or container stdout) for more details.
- Ensure outbound access to the CrowdSec Console endpoints listed in [Network management](/docs/configuration/network_management). Firewalls or proxy changes often block the HTTPS calls required for heartbeats.
- Verify system time is synced (via NTP). Large clock drifts can invalidate console tokens.

### Local API unavailable

- If the Local API is stopped, the Security Engine cannot gather or forward alerts. Check its status on the same host:

```bash
sudo cscli machines list
sudo cscli metrics show engine
```

- Errors in `/var/log/crowdsec/crowdsec_api.log` regarding database connectivity or TLS indicate the Local API is not processing alerts, which in turn stops Console updates. Refer to [Security Engine troubleshooting](/u/troubleshooting/security_engine) and [Log Processor Offline](/u/troubleshooting/log_processor_offline) if needed.

## Recovery Actions

### Restart the Security Engine service

- Systemd:

```bash
sudo systemctl restart crowdsec
```

- Docker:

```bash
docker restart crowdsec
```

- Kubernetes:

```bash
kubectl rollout restart deployment/crowdsec -n crowdsec
```

After restarting, re-run `sudo cscli console status` to ensure the heartbeat is restored.

### Re-enroll the engine in the Console

- If the engine was removed or enrollment expired, obtain a fresh key from **Settings > Enrollment** in the Console and run:

```bash
sudo cscli console enroll <ENROLLMENT_KEY>
sudo systemctl restart crowdsec
```

- When replacing an existing enrollment, append `--overwrite` so the Console updates the existing record.
- Confirm the engine appears as **Healthy** in the Console after the restart.

### Restore connectivity to the Console

- Check that the host can reach the CrowdSec services and APIs listed in [Network management](https://doc.crowdsec.net/docs/next/configuration/network_management/).
- If a proxy is required, configure it in `/etc/crowdsec/config.yaml` under `common.http_proxies` and reload the service.
- Renew TLS trust stores if the host cannot validate the Console certificate chain.

### Stabilise the Local API

- Restart the Local API component (the same `crowdsec` service or the dedicated LAPI pod) and confirm it responds to local commands such as `sudo cscli machines list`.

- Investigate persistent database or authentication errors using `sudo cscli support dump`, then consult the [Security Engine troubleshooting guide](/u/troubleshooting/security_engine) if issues remain.

Once the engine resumes contact, the Console clears the **Security Engine Offline** alert during the next poll. Consider enabling the **Security Engine Offline** notification in your preferred integration so future outages are caught quickly.