Amazon Linux 2023 Troubleshooting on AWS¶
Common issues and solutions when using Kryden Solutions CIS Level 2 + STIG-hardened Amazon Linux 2023 AMIs on AWS.
SSH Connection Issues¶
Cannot Connect via SSH¶
Symptoms: Connection timeout or refused
Possible Causes:
- Security Group - Ensure port 22 is open from your IP
- Network ACL - Check VPC network ACLs allow SSH
- Instance not running - Verify instance state in EC2 console
Solutions:
# Verify security group allows your IP
aws ec2 describe-security-groups --group-ids sg-xxxxx
# Check instance status
aws ec2 describe-instance-status --instance-ids i-xxxxx
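The network ACL cause listed above can be checked from the CLI as well (a sketch; vpc-xxxxx is a placeholder for your VPC ID, matching the placeholder style used elsewhere in this doc):

```shell
# List network ACL entries for the instance's VPC and review
# that inbound/outbound rules allow port 22 and ephemeral return ports
aws ec2 describe-network-acls --filters Name=vpc-id,Values=vpc-xxxxx \
  --query 'NetworkAcls[].Entries'
```

Remember that network ACLs are stateless, so return traffic on ephemeral ports must be allowed outbound as well.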
Permission Denied (publickey)¶
Symptoms: Permission denied (publickey) error
Possible Causes:
- Wrong SSH key
- Wrong username (must use ec2-user)
- Incorrect key permissions
Solutions:
# Ensure correct permissions on your key
chmod 600 /path/to/your-key.pem
# Connect with verbose output
ssh -v -i /path/to/your-key.pem ec2-user@<ip>
Session Disconnects After Inactivity¶
Symptoms: SSH session terminates after ~10 minutes of no activity
Cause:
The AMI enforces a 10-minute idle timeout via TMOUT=600 (set in /etc/profile.d/tmout.sh). This is a CIS Level 2 requirement and is read-only (readonly TMOUT).
Solutions:
# Option 1: Use tmux for long-running sessions
sudo dnf install -y tmux
tmux new -s mysession
# Option 2: Use screen
sudo dnf install -y screen
screen -S mysession
# Option 3: Send keepalives from your SSH client
# (prevents network-level idle disconnects, e.g. NAT timeouts;
# note that keepalives do not reset the shell's TMOUT timer)
# Add to ~/.ssh/config on your local machine:
# ServerAliveInterval 60
# ServerAliveCountMax 9
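To confirm the timeout configuration described in the Cause above, you can inspect it on the instance (file path as stated in this section):

```shell
# Show the enforced idle timeout (600 seconds = 10 minutes)
cat /etc/profile.d/tmout.sh
# Confirm TMOUT is marked read-only in the current shell
readonly -p | grep TMOUT
```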
Boot Issues¶
Instance Stuck Waiting for Devices (NVMe/Xen Mismatch)¶
Symptoms:
- Instance never passes status checks
- System log shows messages such as:
A start job is running for dev-nvme0n1p2.device
Timed out waiting for device dev-nvme0n1p2.device
Cause:
You launched the AMI on a Xen-based instance type (t2, m4, c4, r4). These AMIs require Nitro-based instances (t3, m5, c5, r5, t4g, m6g, etc.).
Solution:
- Terminate the stuck instance
- Launch a new instance using a Nitro-based instance type
- See Supported Instance Types for the full list
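The terminate-and-relaunch steps above can be done from the CLI (a sketch; the ami-xxxxx, key, and security group values are placeholders for your own, and t3.micro is just one example of a Nitro-based type):

```shell
# Terminate the stuck Xen-based instance
aws ec2 terminate-instances --instance-ids i-xxxxx
# Relaunch the same AMI on a Nitro-based type
aws ec2 run-instances --image-id ami-xxxxx --instance-type t3.micro \
  --key-name my-key --security-group-ids sg-xxxxx
```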
How to Identify Instance Type
In the system log, look for the Hypervisor detected line:
- Hypervisor detected: Xen HVM → wrong instance type (Xen-based)
- Hypervisor detected: KVM → correct instance type (Nitro-based)
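The system log can be pulled without connecting to the instance, using the standard CLI (a sketch):

```shell
# Fetch the console output and look for the hypervisor line
aws ec2 get-console-output --instance-id i-xxxxx --output text \
  | grep "Hypervisor detected"
```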
Cloud-init Failures¶
Check cloud-init logs:
# Show overall cloud-init status
cloud-init status --long
# View user-data and boot script output
sudo cat /var/log/cloud-init-output.log
# View the detailed cloud-init log
sudo less /var/log/cloud-init.log
Firewall Issues¶
Application Cannot Accept Connections¶
Symptoms: Your application is running but connections are refused or time out from outside the instance
Cause:
The AMI uses firewalld with the default zone set to drop. All inbound traffic is blocked unless explicitly allowed. Only SSH (port 22) is pre-configured.
Solution:
# Check current firewall rules
sudo firewall-cmd --list-all --zone=drop
# Open a specific port permanently
sudo firewall-cmd --permanent --zone=drop --add-port=8080/tcp
sudo firewall-cmd --reload
# Open a named service (e.g., http, https, postgresql)
sudo firewall-cmd --permanent --zone=drop --add-service=http
sudo firewall-cmd --permanent --zone=drop --add-service=https
sudo firewall-cmd --reload
# Verify the rule was applied
sudo firewall-cmd --list-all --zone=drop
Warning
Always add rules to the drop zone (the default). Rules added to other zones will not apply to incoming traffic on the primary interface.
Loopback Traffic Blocked¶
Symptoms: Application connecting to 127.0.0.1 or ::1 (localhost) fails
Cause:
Loopback traffic is routed through the trusted zone (fully permitted). If you're seeing loopback connection issues, check that the lo interface is assigned to the trusted zone:
# List active zones and the interfaces assigned to them
sudo firewall-cmd --get-active-zones
Expected output should show lo under trusted.
Service Issues¶
Service Won't Start¶
# Check service status
sudo systemctl status <service>
# View service logs
sudo journalctl -u <service> -n 50
# Check SELinux denials
sudo ausearch -m AVC -ts recent | grep <service>
SELinux Blocking Application¶
# Find the denial
sudo ausearch -m AVC -ts recent
# Generate a policy module (if appropriate)
sudo ausearch -m AVC -ts recent | audit2allow -M myapp
sudo semodule -i myapp.pp
Warning
Only create custom SELinux policies if you understand the security implications.
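Before generating a custom module, it is often enough to flip an existing SELinux boolean; a sketch, using httpd-related booleans purely as an example:

```shell
# List booleans related to a service (httpd shown as an example)
sudo getsebool -a | grep httpd
# Enable one persistently (-P writes it to the policy store)
sudo setsebool -P httpd_can_network_connect on
```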
File Permission Issues (umask 027)¶
Symptoms: Files created by your application are not readable by other users or processes
Cause:
The AMI enforces umask 027, which means newly created files get 640 permissions (owner read/write, group read only, no world access) instead of the typical 644. Directories get 750 instead of 755.
Solutions:
# Fix permissions on existing files
chmod 644 /path/to/file
chmod 755 /path/to/directory
# Or add the user to the owning group for read access
sudo usermod -aG <group> <user>
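The permission behavior described above can be demonstrated in a throwaway subshell (the /tmp paths are arbitrary):

```shell
# Run in a subshell so the umask change doesn't leak into your session
(
  umask 027
  touch /tmp/umask-demo-file        # new file: 666 & ~027 = 640
  mkdir -p /tmp/umask-demo-dir      # new dir:  777 & ~027 = 750
  stat -c '%a %n' /tmp/umask-demo-file /tmp/umask-demo-dir
  rm -rf /tmp/umask-demo-file /tmp/umask-demo-dir
)
```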
Container Workloads¶
Container Networking Issues¶
The AMI has IP forwarding and IPv6 forwarding enabled to support container runtimes (Podman, Kubernetes/k3s, etc.). If you encounter container networking issues:
# Verify IP forwarding is enabled
sysctl net.ipv4.ip_forward
# Expected: net.ipv4.ip_forward = 1
# Check Podman/Docker service status
sudo systemctl status podman
BPF / eBPF Tools (Cilium, Falco, etc.)¶
Unprivileged user namespaces and BPF access are enabled on this AMI to support security tools and CNI plugins that require them. No additional configuration is needed.
AWS-Specific Issues¶
Instance Metadata Service¶
If applications cannot access instance metadata:
# Check IMDSv2 token
TOKEN=$(curl -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
curl -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/
Note
IMDSv2 is enforced on this AMI. Applications that use IMDSv1 (unauthenticated metadata requests) will receive a 401 response. Update your application or SDK to use IMDSv2 token-based requests.
EBS Volume Issues¶
# Check disk space
df -h
# List block devices
lsblk
# On Nitro instances, devices are /dev/nvme*
# Check a filesystem read-only (-n = no modify; unmount it first)
sudo xfs_repair -n /dev/nvme1n1p1
Device Names on Nitro
On Nitro-based instances, EBS volumes appear as NVMe devices:
- Root volume: /dev/nvme0n1
- Additional volumes: /dev/nvme1n1, /dev/nvme2n1, etc.
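For an additional volume that appears as /dev/nvme1n1, a typical format-and-mount sequence looks like this (a sketch; mkfs destroys existing data, so only run it on a new, empty volume, and /data is an arbitrary mount point):

```shell
# Verify the device is present and has no existing filesystem
lsblk -f /dev/nvme1n1
# Create an XFS filesystem (DESTROYS any existing data)
sudo mkfs.xfs /dev/nvme1n1
# Mount it
sudo mkdir -p /data
sudo mount /dev/nvme1n1 /data
# Persist across reboots via UUID (more stable than device names)
echo "UUID=$(sudo blkid -s UUID -o value /dev/nvme1n1) /data xfs defaults,nofail 0 2" \
  | sudo tee -a /etc/fstab
```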