Amazon Linux 2023 Troubleshooting on AWS¶
Common issues and solutions when using Kryden Solutions CIS Level 2 + STIG-hardened Amazon Linux 2023 AMIs on AWS.
SSH Connection Issues¶
Cannot Connect via SSH¶
Symptoms: Connection timeout or refused
Possible Causes:
- Security Group - Ensure port 22 is open from your IP
- Network ACL - Check VPC network ACLs allow SSH
- Instance not running - Verify instance state in EC2 console
Solutions:
# Verify security group allows your IP
aws ec2 describe-security-groups --group-ids sg-xxxxx
# Check instance status
aws ec2 describe-instance-status --instance-ids i-xxxxx
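The network ACL cause listed above can be checked from the CLI as well (a sketch; vpc-xxxxx is a placeholder for your VPC ID, matching the placeholder style used elsewhere in this doc):

```shell
# List network ACL entries for the instance's VPC and review
# that inbound/outbound rules allow port 22 and ephemeral return ports
aws ec2 describe-network-acls --filters Name=vpc-id,Values=vpc-xxxxx \
  --query 'NetworkAcls[].Entries'
```

Remember that network ACLs are stateless, so return traffic on ephemeral ports must be allowed outbound as well.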
Permission Denied (publickey)¶
Symptoms: Permission denied (publickey) error
Possible Causes:
- Wrong SSH key
- Wrong username (must use ec2-user)
- Incorrect key permissions
Solutions:
# Ensure correct permissions on your key
chmod 600 /path/to/your-key.pem
# Connect with verbose output
ssh -v -i /path/to/your-key.pem ec2-user@<ip>
Session Disconnects After Inactivity¶
Symptoms: SSH session terminates after ~10 minutes of no activity
Cause:
The AMI enforces a 10-minute idle timeout via TMOUT=600 (set in /etc/profile.d/tmout.sh). This is a CIS Level 2 requirement and is read-only (readonly TMOUT).
Solutions:
# Option 1: Use tmux for long-running sessions
sudo dnf install -y tmux
tmux new -s mysession
# Option 2: Use screen
sudo dnf install -y screen
screen -S mysession
# Option 3: Send keepalives from your SSH client
# (prevents network-level idle disconnects, e.g. NAT timeouts;
# note that keepalives do not reset the shell's TMOUT timer)
# Add to ~/.ssh/config on your local machine:
# ServerAliveInterval 60
# ServerAliveCountMax 9
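To confirm the timeout configuration described in the Cause above, you can inspect it on the instance (file path as stated in this section):

```shell
# Show the enforced idle timeout (600 seconds = 10 minutes)
cat /etc/profile.d/tmout.sh
# Confirm TMOUT is marked read-only in the current shell
readonly -p | grep TMOUT
```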
Boot Issues¶
Instance Stuck Waiting for Devices (NVMe/Xen Mismatch)¶
Symptoms:
- Instance never passes status checks
- System log shows messages such as:
A start job is running for dev-nvme0n1p2.device
Timed out waiting for device dev-nvme0n1p2.device
Cause:
You launched the AMI on a Xen-based instance type (t2, m4, c4, r4). These AMIs require Nitro-based instances (t3, m5, c5, r5, t4g, m6g, etc.).
Solution:
- Terminate the stuck instance
- Launch a new instance using a Nitro-based instance type
- See Supported Instance Types for the full list
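The terminate-and-relaunch steps above can be done from the CLI (a sketch; the ami-xxxxx, key, and security group values are placeholders for your own, and t3.micro is just one example of a Nitro-based type):

```shell
# Terminate the stuck Xen-based instance
aws ec2 terminate-instances --instance-ids i-xxxxx
# Relaunch the same AMI on a Nitro-based type
aws ec2 run-instances --image-id ami-xxxxx --instance-type t3.micro \
  --key-name my-key --security-group-ids sg-xxxxx
```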
How to Identify Instance Type
In the system log, look for the Hypervisor detected line:
- Hypervisor detected: Xen HVM → wrong instance type (Xen-based)
- Hypervisor detected: KVM → correct instance type (Nitro-based)
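The system log can be pulled without connecting to the instance, using the standard CLI (a sketch):

```shell
# Fetch the console output and look for the hypervisor line
aws ec2 get-console-output --instance-id i-xxxxx --output text \
  | grep "Hypervisor detected"
```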
Cloud-init Failures¶
Check cloud-init logs:
# Show overall cloud-init status
cloud-init status --long
# View user-data and boot script output
sudo cat /var/log/cloud-init-output.log
# View the detailed cloud-init log
sudo less /var/log/cloud-init.log
Firewall Issues¶
Application Cannot Accept Connections¶
Symptoms: Your application is running but connections are refused or time out from outside the instance
Cause:
The AMI uses firewalld with the default zone set to drop. All inbound traffic is blocked unless explicitly allowed. Only SSH (port 22) is pre-configured.
Solution:
# Check current firewall rules
sudo firewall-cmd --list-all --zone=drop
# Open a specific port permanently
sudo firewall-cmd --permanent --zone=drop --add-port=8080/tcp
sudo firewall-cmd --reload
# Open a named service (e.g., http, https, postgresql)
sudo firewall-cmd --permanent --zone=drop --add-service=http
sudo firewall-cmd --permanent --zone=drop --add-service=https
sudo firewall-cmd --reload
# Verify the rule was applied
sudo firewall-cmd --list-all --zone=drop
Warning
Always add rules to the drop zone (the default). Rules added to other zones will not apply to incoming traffic on the primary interface.
Loopback Traffic Blocked¶
Symptoms: Application connecting to 127.0.0.1 or ::1 (localhost) fails
Cause:
Loopback traffic is routed through the trusted zone (fully permitted). If you're seeing loopback connection issues, check that the lo interface is assigned to the trusted zone:
# List active zones and the interfaces assigned to them
sudo firewall-cmd --get-active-zones
Expected output should show lo under trusted.
Service Issues¶
Service Won't Start¶
# Check service status
sudo systemctl status <service>
# View service logs
sudo journalctl -u <service> -n 50
# Check SELinux denials
sudo ausearch -m AVC -ts recent | grep <service>
SELinux Blocking Application¶
# Find the denial
sudo ausearch -m AVC -ts recent
# Generate a policy module (if appropriate)
sudo ausearch -m AVC -ts recent | audit2allow -M myapp
sudo semodule -i myapp.pp
Warning
Only create custom SELinux policies if you understand the security implications.
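Before generating a custom module, it is often enough to flip an existing SELinux boolean; a sketch, using httpd-related booleans purely as an example:

```shell
# List booleans related to a service (httpd shown as an example)
sudo getsebool -a | grep httpd
# Enable one persistently (-P writes it to the policy store)
sudo setsebool -P httpd_can_network_connect on
```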
File Permission Issues (umask 027)¶
Symptoms: Files created by your application are not readable by other users or processes
Cause:
The AMI enforces umask 027, which means newly created files get 640 permissions (owner read/write, group read only, no world access) instead of the typical 644. Directories get 750 instead of 755.
Solutions:
# Fix permissions on existing files
chmod 644 /path/to/file
chmod 755 /path/to/directory
# Or add the user to the owning group for read access
sudo usermod -aG <group> <user>
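The permission behavior described above can be demonstrated in a throwaway subshell (the /tmp paths are arbitrary):

```shell
# Run in a subshell so the umask change doesn't leak into your session
(
  umask 027
  touch /tmp/umask-demo-file        # new file: 666 & ~027 = 640
  mkdir -p /tmp/umask-demo-dir      # new dir:  777 & ~027 = 750
  stat -c '%a %n' /tmp/umask-demo-file /tmp/umask-demo-dir
  rm -rf /tmp/umask-demo-file /tmp/umask-demo-dir
)
```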
Container Workloads¶
Container Networking Issues¶
The AMI has IP forwarding and IPv6 forwarding enabled to support container runtimes (Podman, Kubernetes/k3s, etc.). If you encounter container networking issues:
# Verify IP forwarding is enabled
sysctl net.ipv4.ip_forward
# Expected: net.ipv4.ip_forward = 1
# Check Podman/Docker service status
sudo systemctl status podman
BPF / eBPF Tools (Cilium, Falco, etc.)¶
Unprivileged user namespaces and BPF access are enabled on this AMI to support security tools and CNI plugins that require them. No additional configuration is needed.
AWS-Specific Issues¶
Instance Metadata Service¶
If applications cannot access instance metadata:
# Check IMDSv2 token
TOKEN=$(curl -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
curl -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/
Note
IMDSv2 is enforced on this AMI. Applications that use IMDSv1 (unauthenticated metadata requests) will receive a 401 response. Update your application or SDK to use IMDSv2 token-based requests.
EBS Volume Issues¶
# Check disk space
df -h
# List block devices
lsblk
# On Nitro instances, devices are /dev/nvme*
# Check a filesystem read-only (-n = no modify; unmount it first)
sudo xfs_repair -n /dev/nvme1n1p1
Device Names on Nitro
On Nitro-based instances, EBS volumes appear as NVMe devices:
- Root volume: /dev/nvme0n1
- Additional volumes: /dev/nvme1n1, /dev/nvme2n1, etc.
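For an additional volume that appears as /dev/nvme1n1, a typical format-and-mount sequence looks like this (a sketch; mkfs destroys existing data, so only run it on a new, empty volume, and /data is an arbitrary mount point):

```shell
# Verify the device is present and has no existing filesystem
lsblk -f /dev/nvme1n1
# Create an XFS filesystem (DESTROYS any existing data)
sudo mkfs.xfs /dev/nvme1n1
# Mount it
sudo mkdir -p /data
sudo mount /dev/nvme1n1 /data
# Persist across reboots via UUID (more stable than device names)
echo "UUID=$(sudo blkid -s UUID -o value /dev/nvme1n1) /data xfs defaults,nofail 0 2" \
  | sudo tee -a /etc/fstab
```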