Ubuntu Server Crash Troubleshooting Guide

Introduction

This guide is designed to help you diagnose and troubleshoot crashes on Ubuntu Server. If you’re experiencing unexpected shutdowns, freezes, or performance issues, the steps below will teach you how to identify and resolve the most common problems using logs and diagnostic tools.

By the end of this guide, you’ll know:

  • Where to find system logs

  • How to read logs to identify problems

  • Basic troubleshooting steps for common issues


Chapter 1: Understanding Ubuntu Server Crashes

Common Causes of Crashes

Here are some common causes of crashes on Ubuntu Server:

  • Insufficient memory (RAM): Your server may run out of memory, especially if too many processes are running.

  • Disk space issues: Low disk space can cause crashes when the server cannot write critical data.

  • Driver issues: Although servers usually have fewer driver-related issues than desktops, outdated or incompatible hardware drivers can still cause crashes.

  • Kernel panics: The kernel (the operating system's core) may crash due to misconfiguration, software bugs, or hardware failures.


Chapter 2: How to Check Logs

Where to Find Logs

Ubuntu Server stores detailed logs of system activities that are essential for troubleshooting. These logs are located in the /var/log/ directory. Critical log files to check include:

  • /var/log/syslog: The primary system log. It logs boot messages, service events, and general system errors.

  • /var/log/kern.log: Logs related to the Linux kernel. This is useful for diagnosing kernel panics and hardware-related issues.

  • /var/log/dmesg: Logs system boot and hardware information. Use this to check for hardware or startup issues.

  • /var/log/auth.log: Tracks authentication events like login attempts, which can be helpful if you suspect security-related crashes.

Example: Checking System Logs

If your server crashes or becomes unresponsive, the first step is to check the logs. Use the following command to view the latest system logs:

sudo tail -f /var/log/syslog

This shows the last few log entries and then keeps printing new ones as they are written (press Ctrl+C to stop). Pay attention to any errors or warnings near the time the crash occurred. For example, an entry like this could indicate a memory issue:

Oct 15 14:55:23 myserver kernel: [12345.678901] Out of memory: Kill process 1234 (nginx) score 987 or sacrifice child

In this case, the server ran out of memory, and the kernel had to terminate a process to recover.
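To find every such entry rather than scanning by eye, a small helper along these lines may be useful. It is a minimal sketch: the function name find_oom_events is made up for illustration, and the two search phrases are assumptions based on the wording the kernel commonly uses, so adjust them if your logs look different.

```shell
# Print every out-of-memory entry found in the given log file(s).
# With no file arguments, grep reads from standard input.
# (find_oom_events is a hypothetical helper name, not a standard tool.)
find_oom_events() {
    grep -iE 'out of memory|oom-killer' "$@"
}

# Example (run on the server):
#   sudo cat /var/log/syslog | find_oom_events
```

If this prints many entries for the same process, that process is a likely candidate for a memory limit or a closer look.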


Chapter 3: Basic Troubleshooting Steps

1. Check Available Disk Space

Running out of disk space can cause severe issues for a server. To check available disk space, use the following command:

df -h

Look for any partitions that are close to 100% usage. If your root partition (/) is full, free up space by removing unnecessary files or moving data to another storage location.
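On a server with many mounts, the check above can be narrowed to the filesystems that matter. The sketch below assumes the POSIX output format of df -P (usage percentage in the fifth column, mount point in the sixth) and mount points without spaces; the function name full_filesystems and the default threshold of 90 are illustrative choices, not recommendations from this guide.

```shell
# Read `df -P` output on standard input and print the mount point and usage
# of every filesystem at or above the given percentage (default: 90).
# (full_filesystems is a hypothetical helper name.)
full_filesystems() {
    awk -v limit="${1:-90}" 'NR > 1 {
        use = $5
        sub(/%/, "", use)                      # "97%" -> "97"
        if (use + 0 >= limit + 0) print $6 " " $5
    }'
}

# Example:
#   df -P | full_filesystems 85
```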

2. Check for Memory Issues

If your server is low on RAM, it can slow down or crash entirely. Use this command to check memory usage:

free -h

Look at the "available" memory column. If this number is very low, your server may be about to run out of memory. Consider stopping resource-heavy processes or upgrading the server's RAM.
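For scripts or quick comparisons over time, the same figure can be expressed as a percentage. This is a rough sketch that reads the MemTotal and MemAvailable fields from /proc/meminfo, which is where the "available" figure comes from; the function name mem_available_pct is hypothetical.

```shell
# Print MemAvailable as a whole-number percentage of MemTotal.
# Reads /proc/meminfo-formatted text from the given file
# (default: /proc/meminfo). mem_available_pct is a hypothetical name.
mem_available_pct() {
    awk '/^MemTotal:/     { total = $2 }
         /^MemAvailable:/ { avail = $2 }
         END { if (total > 0) printf "%d\n", avail * 100 / total }' "${1:-/proc/meminfo}"
}

# Example:
#   mem_available_pct
```

What counts as too low depends on the workload, so treat any threshold you pick as a starting point rather than a rule.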

3. Check for Kernel Panics

Kernel panics can cause the server to crash or reboot unexpectedly. To diagnose a kernel panic, check the kernel logs with the following command:

dmesg | grep -i panic

Example output:

[12345.678901] Kernel panic - not syncing: Fatal exception

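This means the kernel encountered a critical issue and halted the system. Kernel panics can indicate serious hardware problems or kernel misconfigurations. Note that dmesg only shows messages from the current boot, so after a crash and restart you will usually need to search /var/log/kern.log or the previous boot's journal instead. On recent Ubuntu releases, dmesg may also need to be run with sudo.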
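Because the panic usually belongs to a boot that has already ended, it can help to run the same search against sources that survive a restart. The helper below is only a sketch: find_kernel_panics is a made-up name, and it matches the phrase from the example output above. A panic can also halt the machine before anything is written to disk, so an empty result does not rule one out.

```shell
# Print lines mentioning a kernel panic from the given file(s),
# or from standard input when no files are given.
# (find_kernel_panics is a hypothetical helper name.)
find_kernel_panics() {
    grep -i 'kernel panic' "$@"
}

# Examples (run on the server):
#   sudo cat /var/log/kern.log | find_kernel_panics
#   journalctl -k -b -1 | find_kernel_panics    # kernel messages, previous boot
```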

4. Verify Software and Drivers

On servers, driver-related crashes are less common, but they can still happen. Make sure your server is up to date with the latest software and driver updates by running:

sudo apt update
sudo apt upgrade

5. Check for Application-Specific Crashes

Sometimes, only a specific application might crash. For example, if your web server (like NGINX or Apache) crashes frequently, check its logs for clues. Logs for most applications are stored in the /var/log/ directory or a subdirectory (e.g., /var/log/nginx/error.log for NGINX).

You can use the following command to check the most recent lines of an application log:

sudo tail -f /var/log/nginx/error.log

Look for errors logged around the time the crash occurred.
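A busy error log contains many low-severity lines. As a sketch, the helper below keeps only the entries that NGINX tags with one of its more serious levels, assuming the usual error log layout in which the level appears in square brackets after the timestamp. The function name nginx_serious_errors is hypothetical.

```shell
# Print nginx error-log lines logged at "error" severity or worse.
# Reads the given file(s), or standard input when none are given.
# (nginx_serious_errors is a hypothetical helper name.)
nginx_serious_errors() {
    grep -E '\[(error|crit|alert|emerg)\]' "$@"
}

# Example (run on the server):
#   sudo cat /var/log/nginx/error.log | nginx_serious_errors | tail -n 20
```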

Chapter 4: Log Rotation and Wiped Logs

Log Rotation

Ubuntu Server uses a process called log rotation to prevent logs from growing too large and consuming too much disk space. Log rotation works by archiving old logs and creating new ones. Archived logs may be compressed and stored with extensions like .gz. The most recent logs will be in their regular form (e.g., syslog, kern.log), while older logs will be named something like syslog.1 or syslog.1.gz.

How to View Older Logs

To view older logs, you can either open the archived log files directly or use zcat to read compressed logs. For example:

sudo zcat /var/log/syslog.1.gz

This command will display the contents of the compressed log file.
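When you do not know which archive holds the crash, searching the current log and all of its rotated copies in one pass can save time. The following is a minimal sketch; search_rotated_logs is a made-up name, and it relies on gzip -cdf passing uncompressed files through unchanged while decompressing .gz files.

```shell
# Search a log and all of its rotated copies, compressed or not.
#   $1 = pattern to search for
#   $2 = base log path, e.g. /var/log/syslog
# (search_rotated_logs is a hypothetical helper name.)
search_rotated_logs() {
    for f in "$2" "$2".*; do
        [ -f "$f" ] || continue
        # -c writes to stdout, -d decompresses, -f copies plain files as-is
        gzip -cdf "$f" | grep -e "$1"
    done
}

# Example (run as root on the server):
#   search_rotated_logs 'Out of memory' /var/log/syslog
```

Files are visited in the shell's glob order, so the current log is searched first and the output is not strictly chronological.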

Logs After Reboot

After a reboot, some logs (like the kernel ring buffer shown by the dmesg command) may be wiped, as they are held in memory and not persisted to disk. However, the system log (/var/log/syslog) is written to disk and persists across reboots.

To check boot-specific logs or events, use the following command to view logs from the previous boot:

journalctl -b -1

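This command tells journalctl to show logs from the previous boot (-b -1), which can be helpful if a crash occurred before the server was restarted. It only works if the journal is kept on disk; if the journal is stored in memory only, logs from earlier boots are not available.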

Why Logs May Disappear

  • Reboots: Certain logs, such as dmesg, are wiped on reboot unless configured otherwise.

  • Log Rotation: Older entries are moved out of the active log file into rotated copies (e.g., syslog.1), which may be compressed and are eventually deleted.

  • Log Settings: The logrotate configuration determines how often logs are rotated and how long they are kept. The main configuration file is /etc/logrotate.conf, and additional per-service settings are stored in /etc/logrotate.d/.

If important logs seem to be missing after a crash, reviewing log rotation settings or ensuring persistent logging is enabled may help.
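One quick way to check persistent logging is sketched below. It assumes journald is running with its default Storage=auto setting, under which the journal is written to disk only when the /var/log/journal directory exists; if Storage has been changed in /etc/systemd/journald.conf, this check does not apply. The function name journal_is_persistent is hypothetical.

```shell
# Succeed if the journal directory exists, which under journald's default
# Storage=auto setting means logs are kept on disk across reboots.
# An alternative directory can be passed for testing.
# (journal_is_persistent is a hypothetical helper name.)
journal_is_persistent() {
    [ -d "${1:-/var/log/journal}" ]
}

# Example (run on the server):
#   if journal_is_persistent; then echo "journal is on disk"; else echo "journal is in memory only"; fi
#
# Under the same Storage=auto assumption, creating the directory and
# restarting journald enables persistence:
#   sudo mkdir -p /var/log/journal
#   sudo systemctl restart systemd-journald
```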



Conclusion

Understanding how to access logs and perform basic diagnostics can help you troubleshoot Ubuntu Server crashes effectively. Monitoring memory usage, disk space, and system logs will also help you prevent and resolve many of the most common issues.