CH11: Linux Forensics

Chapter Overview

Chapters 4 through 10 built an investigative toolkit grounded almost entirely in Windows. You learned to parse the Registry for user activity, extract browser artifacts from SQLite databases, capture volatile memory with specialized tools, trace email headers through SMTP infrastructure, and detect anti-forensic techniques designed to undermine every phase of the forensic lifecycle. That focus on Windows reflects reality: it dominates the corporate desktop and personal computing landscape, and the majority of endpoint investigations begin and end on a Windows machine.

This chapter confronts the other half of the equation. The infrastructure that powers the modern internet runs overwhelmingly on Linux. Web servers, cloud instances, containerized applications, database backends, network appliances, IoT devices, and NAS systems all rely on some variant of the Linux kernel. When an investigation leads to a compromised web server hosting a phishing kit, a cloud-based command-and-control node, an employee's Raspberry Pi plugged into the corporate network, or a suspect's Network Attached Storage device, the investigator must be able to navigate the system, locate its artifacts, and interpret the evidence it holds.

This chapter serves two purposes. First, it provides a grounded introduction to how Linux works, using your existing Windows knowledge as a bridge. You will learn the file system hierarchy, the permissions model, the ext4 file system and its inode structure, and the core commands that every investigator needs. Second, it introduces the forensic artifacts unique to Linux systems: log structures, user activity traces, configuration files, persistence mechanisms, and the purpose-built forensic distributions that analysts use to conduct examinations.

No prior Linux experience is assumed. By the end of this chapter, you will have the foundational knowledge needed to approach a Linux system with the same confidence you now bring to a Windows investigation.

Learning Objectives

By the end of this chapter, you will be able to:

Explain why Linux forensics is an essential skill for digital forensic investigators and identify the types of systems and investigations where Linux evidence is commonly encountered.
Navigate a Linux file system using command-line tools and map the Linux directory hierarchy to equivalent Windows structures.
Interpret Linux timestamps (MAC times) and explain their forensic significance.
Describe how the ext4 file system uses inodes to track files and explain how file deletion on ext4 differs from NTFS, including the implications for data recovery.
Identify key forensic artifacts on Linux systems, including log files, user history, configuration files, scheduled tasks, and persistence mechanisms, and describe what each reveals to an investigator.
Describe the purpose and capabilities of major Linux-based forensic distributions (SIFT, Kali, Tsurugi, CAINE) and explain when each is best suited for use.

11.1 Why Linux Forensics Matters

Where Linux Lives in Investigations

If you have used the internet today, you have interacted with Linux. The majority of the world's web servers run on Linux. Most cloud infrastructure, from Amazon Web Services EC2 instances to Microsoft Azure virtual machines to Google Cloud Platform compute nodes, defaults to Linux-based operating systems. The containers that power modern application deployment (Docker, Kubernetes) run on a Linux kernel. The routers, firewalls, and intrusion detection sensors that protect enterprise networks frequently run embedded Linux. Smart home devices, security cameras, point-of-sale terminals, and medical equipment all increasingly rely on Linux-based firmware.

For investigators, this means that Linux evidence appears in a wide range of case types:

Server compromises: A company's Apache or Nginx web server is breached and used to host malware or a phishing kit. The access logs, error logs, and web shell artifacts all reside on a Linux file system.
Cloud-based investigations: A threat actor spins up an AWS EC2 instance to serve as a command-and-control (C2) server for a botnet. The forensic image of that instance is a Linux disk.
Insider threats: An employee connects a Raspberry Pi to the corporate network to exfiltrate data. The device runs Raspberry Pi OS, a Debian-based Linux distribution.
IoT and embedded devices: A Network Attached Storage (NAS) device contains evidence relevant to an intellectual property theft case. Its operating system is a customized Linux build.
Criminal infrastructure: Dark web marketplaces, bulletproof hosting providers, and ransomware C2 panels overwhelmingly operate on Linux servers.

The common thread is that even investigations that begin on a Windows endpoint frequently lead to Linux infrastructure. An investigator who can only work in Windows will eventually hit a wall.

The Linux Forensic Workstation

Linux is not only a source of evidence. It is also the platform on which many of the most powerful forensic tools are built and run. The Sleuth Kit, Autopsy, Volatility, log2timeline (Plaso), and bulk_extractor all have deep roots in the Linux ecosystem. The SANS SIFT Workstation, one of the most widely used forensic environments in the field, is an Ubuntu-based Linux distribution.

This creates a dual relationship that is unique to Linux forensics: the same operating system you are learning to examine is the operating system you will use to examine it. The command-line skills you develop in this chapter are directly transferable to your forensic workstation workflow.

Tip

You can find more information and download the .ova virtual machine image of SANS SIFT Workstation at https://www.sans.org/tools/sift-workstation -- I recommend downloading VMware Workstation Pro to run the VM.

Windows vs. Linux: A Conceptual Bridge

Many of the concepts you learned in previous chapters have direct counterparts in Linux. The underlying principles of how an operating system manages files, users, processes, and logs are consistent across platforms. What changes is the implementation and the location. The following table maps the Windows concepts you already know to their Linux equivalents.

Windows Concept	Linux Equivalent	Key Difference
Registry (HKLM, HKCU)	Configuration files in /etc/ and ~/.	Linux stores settings in human-readable text files, not a binary database
Event Viewer / .evtx logs	syslog, journald (/var/log/)	Linux logs are primarily text-based (or queryable via journalctl)
Task Manager	`ps`, `top`, `htop`	Linux process management is command-line driven
File Explorer	Terminal (`ls`, `cd`, `find`)	GUI file managers exist, but investigators work in the terminal
PowerShell / CMD	Bash shell	Bash is the default shell on most Linux distributions
NTFS	ext4 (and others: XFS, Btrfs)	ext4 uses inodes instead of MFT entries
Windows Services	systemd units / daemons	Services are defined in unit files, not the Registry
Recycle Bin	No equivalent (CLI `rm` is permanent)	Desktop environments have a Trash folder, but CLI deletions are immediate
MFT (Master File Table)	Inode table	Both track file metadata; deletion behavior differs significantly

11.2 Linux Fundamentals for Investigators

The Linux Architecture

At the highest level, a Linux system has three layers. The kernel is the core of the operating system. It manages hardware, memory, processes, and file systems. You do not interact with the kernel directly. Instead, you interact with the shell, which is a command-line interpreter that accepts your typed commands, passes them to the kernel for execution, and returns the results. The default shell on most Linux distributions is Bash (Bourne Again Shell). Above the shell sits the user space, which includes all the applications, utilities, and graphical interfaces (if installed) that you work with day to day.

Vertical layered architecture diagram of the Linux operating system showing four tiers connected by downward-pointing gold arrows. Top layer: 'User Space' containing 'Applications,' 'Utilities,' and 'GUI (if installed),' with a side note 'You interact here →.' Second layer: 'Shell (Bash)' described as 'Command-line interpreter,' with a side note 'Commands passed down →.' Third layer: 'Kernel' managing 'Memory,' 'Processes,' and 'File Systems,' with a side note 'Manages resources →.' Bottom layer: 'Hardware.' Each layer is a wide horizontal bar with a dark slate header and light content area.

A Linux distribution (often shortened to "distro") is a complete operating system built around the Linux kernel. Different organizations package the kernel with different sets of software, desktop environments, package managers, and default configurations. The result is a large ecosystem of distributions, each tailored to specific use cases. The most common distributions you will encounter in forensic work include:

Ubuntu / Debian: Widely used for servers and workstations. Ubuntu is the base for the SANS SIFT forensic distribution. Uses the apt package manager.
CentOS / Red Hat Enterprise Linux (RHEL) / Rocky Linux: Common in enterprise server environments. Uses the yum or dnf package manager.
Fedora: A community-driven distribution often used as a development platform. Also uses dnf.

The distribution matters to investigators for two reasons. First, log file names and locations can vary between distributions (for example, Debian-based systems use /var/log/auth.log while Red Hat-based systems use /var/log/secure). Second, the package manager logs (/var/log/dpkg.log on Debian, /var/log/yum.log on Red Hat) record software installation history, which is a forensic artifact in its own right.

The File System Hierarchy

One of the most fundamental differences between Windows and Linux is how the file system is organized. Windows uses drive letters (C:\, D:) as the top-level entry points. Linux has a single, unified directory tree that begins at the root directory, represented by a forward slash (/). Every file, directory, device, and even running process is accessible somewhere within this tree.

Linux follows the Filesystem Hierarchy Standard (FHS), which defines a common directory layout across distributions. As an investigator, knowing this layout tells you exactly where to look for specific types of evidence.

Directory	Purpose	Forensic Relevance
`/`	Root of the entire file system	Starting point for all examination
`/home`	User home directories (e.g., /home/jsmith)	User documents, bash history, SSH keys, browser profiles, desktop files
`/root`	Home directory for the root (admin) user	Same as /home but for the superuser account
`/etc`	System-wide configuration files	User accounts (/etc/passwd), network config, service settings, sudoers
`/var`	Variable data: logs, caches, spools	Primary log location (/var/log/), web server data, mail spools
`/tmp`	Temporary files (often cleared on reboot)	Attacker staging area, malware drop zone, exploit artifacts
`/dev`	Device files (hard drives, terminals, etc.)	Disk device references for imaging (e.g., /dev/sda)
`/proc`	Virtual filesystem exposing kernel/process data	Running processes, network connections, system info (live systems only)
`/mnt`, `/media`	Mount points for external/removable media	USB drives, mounted disk images, network shares
`/opt`	Optional or third-party software	Non-standard application installations
`/usr`	User programs, libraries, documentation	Installed application binaries and shared libraries
`/bin`, `/sbin`	Essential system binaries	Core commands (ls, cp, mount); /sbin holds admin-only tools

A guiding principle in Linux is that everything is a file. Directories are files. Hardware devices are represented as files in /dev. Running processes are exposed as files in /proc. This unified abstraction means that investigators can use the same file-reading tools to inspect a text configuration, a raw disk device, or process memory.

The ext4 File System and Inodes

The ext4 (Fourth Extended Filesystem) is the default file system on the majority of modern Linux distributions. It fills the same role that NTFS fills on Windows: it is the structure responsible for organizing data on a storage volume, tracking which blocks belong to which files, and maintaining the metadata that describes every file on disk.

In Chapter 4, you learned that NTFS uses the Master File Table (MFT) as its central metadata structure. Every file and directory on an NTFS volume has a corresponding MFT entry that records the file's name, size, timestamps, permissions, and the cluster runs that point to the file's data on disk.

Ext4 uses a different but conceptually parallel structure called an inode (short for "index node"). Every file and directory on an ext4 volume has an inode, and each inode stores the following metadata:

File size
Ownership (User ID and Group ID)
Permissions (read, write, execute for owner, group, and others)
Timestamps (access time, modification time, change time)
Pointers to the data blocks where the file's actual content is stored on disk
Link count (how many directory entries reference this inode)

There is one critical distinction between an inode and an MFT entry that catches many new investigators off guard: the inode does not store the filename. Filenames are stored in directory entries, which are separate structures that map a human-readable name to an inode number. This separation means that a single file can have multiple names (called hard links) that all point to the same inode and the same underlying data.

Each inode has a unique number within the file system. You can view a file's inode number using ls -i and view the full inode metadata using the stat command.

Three-column architecture diagram showing the relationship between Directory Entries, an Inode, and Data Blocks in the ext4 file system. Left column labeled 'Directory Entries' contains two rounded boxes: 'report.pdf → Inode 131074' and 'final_report.pdf → Inode 131074,' with the note '(Two names, same inode = hard link).' A gold arrow points right to the center column labeled 'Inode 131074,' which lists metadata fields: 'Size: 4096 bytes,' 'Owner: jsmith,' 'Permissions: rw-r--r--,' 'mtime: 2025-03-10,' and 'Block Pointers: → → →.' A gold note below the inode reads 'No filename stored here.' A gold arrow points right to the third column labeled 'Data Blocks,' showing three boxes labeled 'Block 1024,' 'Block 1025,' and 'Block 1026' with the caption 'Actual file content on disk.'

How ext4 Handles File Deletion

Understanding how ext4 handles deletion is essential for forensic recovery work, and this is where the comparison to NTFS becomes particularly important.

When you delete a file on an NTFS volume (even bypassing the Recycle Bin), the MFT entry is marked as available for reuse, and the clusters it occupied are marked as free. However, the MFT entry itself often retains the cluster run information, the pointers that tell you exactly where the file's data was stored on disk. This is why NTFS deleted file recovery is relatively straightforward in many cases: the roadmap back to the data frequently survives.

Comparison diagram titled 'NTFS Deletion' on the left and 'ext4 Deletion' on the right, each showing a 'Before' and 'After Deletion' state. NTFS Before: an MFT Entry box containing 'Filename: report.pdf' and 'Cluster Runs: 100-102' points via arrow to three Data Clusters boxes. NTFS After Deletion: the same MFT Entry and Data Clusters appear inside a dashed border labeled 'Marked as available,' with a note 'Pointers often preserved' marked with a gold checkmark and 'Data intact on disk.' ext4 Before: a Directory Entry box containing 'report.pdf → Inode 131074' points to an Inode box containing 'Block Pointers: 1024-1026,' which points to three Data Blocks. ext4 After Deletion: the Directory Entry is removed (marked with a red X and 'Removed'), the Inode shows 'Block Pointers: 0, 0, 0' with a note 'Pointers zeroed out' marked with a red X, and Data Blocks remain with the note 'Data intact but no map to find it.' Bottom summary reads 'NTFS: Follow the surviving pointers | ext4: File carving required.'

Ext4 behaves differently. When a file is deleted using rm:

The directory entry linking the filename to the inode is removed.
The inode is marked as free, and its data block pointers are zeroed out.
The actual data blocks on disk are NOT overwritten. They remain intact until the file system reuses that space for new data.

Step 2 is the critical difference. On NTFS, the pointers from the MFT entry to the data clusters often survive deletion. On ext4, the pointers from the inode to the data blocks are actively cleared. The data is still on disk, but the file system no longer provides a map to find it.

Warning

Unlike NTFS, ext4 zeroes out the block pointers in the inode when a file is deleted. This means traditional metadata-based recovery (following the inode's pointers back to the data) is significantly harder on Linux than on Windows. Investigators working with ext4 evidence should plan for file carving and journal analysis as primary recovery strategies rather than relying on inode-based reconstruction.

The ext4 Journal

Ext4 maintains a journal that logs metadata changes before committing them to the main file system structures. This is conceptually similar to the NTFS $LogFile you encountered in Chapter 4. The journal exists to protect file system integrity: if the system crashes mid-write, the journal allows ext4 to replay or roll back incomplete operations and return to a consistent state.

For investigators, the journal has a secondary value. Because it records metadata changes, the journal can sometimes contain remnants of deleted file metadata, including block pointers that have already been zeroed in the inode itself. Journal analysis is not guaranteed to recover deleted file information, but it provides a partial recovery path that should always be explored.

Recovery Tools and Strategies

Given the challenges of ext4 deletion, investigators rely on several approaches:

File carving (using tools like Foremost, Scalpel, or PhotoRec, which you encountered in Chapter 4) remains highly effective on ext4. Carving operates on raw data blocks and searches for file signatures (magic bytes) regardless of file system metadata. Because the data blocks survive deletion, carving can recover files even when the inode pointers are gone.
Journal analysis using tools such as extundelete or ext4magic can attempt to reconstruct deleted file metadata from journal entries.
The Sleuth Kit (fls, icat, ifind) can enumerate deleted directory entries and, in some cases, recover file content when the data blocks have not yet been reused.
Time is the critical variable. The longer a system runs after a file is deleted, the more likely its data blocks are reused by new files. Rapid acquisition is even more important on ext4 than on NTFS.

Feature	NTFS	ext4
Central metadata structure	MFT (Master File Table)	Inode table
Metadata per file	MFT entry	Inode
Filename storage	Inside the MFT entry	In directory entries (separate from inode)
Data location pointers	Cluster runs in MFT entry	Block pointers in inode
Pointers after deletion	Often preserved in MFT entry	Zeroed out in inode
Transaction log	$LogFile	ext4 journal
Deleted file recovery	Metadata-based recovery often viable	File carving and journal analysis preferred
Key recovery tools	Autopsy, FTK, Recuva	Foremost, Scalpel, extundelete, Sleuth Kit

Timestamps in Linux

Three-row reference diagram explaining Linux MAC timestamps. Top row: 'mtime (Modification Time)' described as 'Last content change,' triggered by 'Write new data to file' and 'Append content,' each shown with a document icon. Middle row: 'atime (Access Time)' described as 'Last read,' triggered by 'Open/read file' and 'cat, less, head,' each with an icon, and a note 'Often limited by relatime/noatime mount options.' Bottom row: 'ctime (Change Time)' described as 'Last metadata change,' triggered by 'chmod (permissions),' 'chown (ownership),' and 'Rename file,' each with an icon, and a highlighted warning note 'NOT creation time.' A footer reads 'View all timestamps with: stat filename.'

Linux tracks three timestamps for every file, commonly referred to as MAC times (not to be confused with Apple Mac computers):

mtime (Modification time): The last time the file's content was changed. Writing new data to the file updates mtime.
atime (Access time): The last time the file was read. Opening or reading a file updates atime (though many modern systems mount with relatime or noatime to reduce disk writes, which limits atime updates).
ctime (Change time): The last time the file's metadata was changed. This includes permission changes, ownership changes, and renaming. Critically, ctime is not a "creation time." Linux ext4 traditionally did not record a creation time at all, although ext4 does support a crtime (creation time) field that is accessible through stat on newer systems.

The stat command displays all timestamps for a file:

$ stat report.pdf
  File: report.pdf
  Size: 4096       Blocks: 8          IO Block: 4096   regular file
  Device: 802h/2050d   Inode: 131074     Links: 1
  Access: (0644/-rw-r--r--)  Uid: ( 1000/ jsmith)  Gid: ( 1000/ analysts)
  Access: 2025-03-15 09:22:14.000000000 -0400
  Modify: 2025-03-10 14:35:02.000000000 -0400
  Change: 2025-03-10 14:35:02.000000000 -0400
   Birth: 2025-03-01 08:15:30.000000000 -0400

Compare this to the Windows timestamp model from Chapter 4. NTFS stores timestamps in the $STANDARD_INFORMATION and $FILE_NAME attributes of each MFT entry. The principle is the same (timestamps track when a file was created, accessed, modified, and when its metadata changed), but the specific names and behaviors differ. In particular, the ctime/mtime distinction on Linux is a frequent source of confusion: changing a file's permissions updates ctime but not mtime, because the content did not change, only the metadata.

11.3 Essential Linux Commands for Forensic Investigators

The Linux command line is the primary interface for forensic work. Unlike Windows, where GUI tools dominate the investigator's workflow, Linux forensic analysis is built around chaining small, purpose-built commands together to search, filter, and analyze data. This section covers the commands you will use most frequently, organized by function.

Activity

The best way to learn Linux is by DOING it. After you have read through the various commands in the upcoming sections, try out various Linux file system navigation and forensic commands in this simulated, hands-on activity.

Screenshot of an interactive forensic exercise titled 'Where Am I?' with a green checkmark icon and a 'COMPLETE' badge. The instruction reads: 'You have just opened a terminal on a forensic workstation. Before doing anything else, confirm your current location in the file system by printing the present working directory.' Below is a dark terminal window showing the prompt 'analyst@sift:~$' with the command 'pwd' entered and the output '/home/analyst.' A green explanation box at the bottom reads: 'You are in your home directory, /home/analyst. Every user has a home directory, and ~ is a shortcut that refers to it.'

These commands let you move through the file system, examine files, and inspect metadata.

pwd (print working directory) displays your current location in the file system. On a mounted forensic image, this helps you confirm that you are examining the correct directory.

ls (list) displays the contents of a directory. The flags you will use most often include -l (long format showing permissions, owner, size, and timestamps), -a (show hidden files, which start with a dot), and -h (human-readable file sizes). In practice, ls -la is the investigator's default command upon entering any directory.

cd (change directory) moves you between directories. Use cd /var/log/ to navigate to the log directory, cd ~ to return to your home directory, and cd .. to move up one level.

cat displays the entire contents of a file. Useful for short files like configuration files. For large files (like multi-gigabyte logs), use less instead, which provides a paginated, scrollable view. head and tail display the first or last N lines of a file, respectively. tail -n 100 /var/log/auth.log is a quick way to check the most recent authentication events.

file identifies a file's type by examining its magic bytes (the file header), not its extension. This is the same concept you encountered in Chapter 10 when detecting file extension mismatch as an anti-forensic technique. Running file suspicious.dat might return suspicious.dat: PHP script, ASCII text, revealing that someone renamed a PHP web shell with a .dat extension.

stat displays detailed file metadata including all three timestamps (atime, mtime, ctime), inode number, permissions, and ownership. This is one of the most important commands for forensic timeline work.

wc (word count) counts lines, words, and characters. wc -l access.log quickly tells you how many entries are in a log file, helping you gauge its size before diving in.

Terminal screenshot showing three uses of the 'file' command in the ~/cases/2024-0392/evidence directory. A green comment reads '# The file command reads magic bytes, not extensions.' First command: 'file report.pdf' returns 'PDF document, version 1.7 (password protected).' Second command: 'file image_backup.jpg' returns in yellow 'gzip compressed data, last modified: Sat Dec 16 03:41:12 2023.' Third command: 'file system_update.sh' returns in green 'ELF 64-bit LSB executable, x86-64, dynamically linked.' Fourth command: 'file notes.txt' returns 'ASCII text, with very long lines (413).' Key findings noted in green comments: 'image_backup.jpg is NOT an image (gzip archive disguised as .jpg),' 'system_update.sh is NOT a script (compiled ELF binary disguised as .sh),' and 'Never trust the extension. Always verify with magic bytes.'

Searching and Filtering

These commands are the backbone of log analysis and evidence discovery on Linux.

find locates files based on criteria such as name, type, size, modification time, and permissions. It is one of the most versatile commands in the investigator's toolkit. Examples:

find /home -name "*.php" finds all PHP files under /home (potential web shells)
find / -mtime -3 finds all files modified in the last 3 days
find / -perm -4000 -type f finds all SUID executables on the system

grep searches for text patterns within files. It supports regular expressions and is essential for extracting specific events from log files. Examples:

grep "Failed password" /var/log/auth.log extracts all failed login attempts
grep -i "error" /var/log/syslog performs a case-insensitive search for errors
grep -r "eval(" /var/www/html/ recursively searches web files for a common PHP web shell indicator

Terminal screenshot showing the command grep "Failed password" /var/log/auth.log | tail -8 run on a SIFT workstation. A green comment reads # Search auth.log for failed SSH login attempts. The output displays eight log entries from Dec 17 between 22:41:03 and 22:41:22, all from IP address 198.51.100.14. The entries show failed password attempts cycling through usernames: first root (two attempts), then admin (two attempts highlighted in yellow), then deploy (four attempts highlighted in magenta). Green comments note: Pattern: attacker cycled root > admin > deploy, Source IP: 198.51.100.14 (document for IOC list), and Rapid-fire attempts every 2 seconds = automated brute force.

Piping (|) and redirection (>, >>) are what make the Linux command line powerful. The pipe operator takes the output of one command and feeds it as input to the next. Redirection sends output to a file. These operators allow you to build analysis chains:

cat /var/log/auth.log | grep "Failed password" | awk '{print $11}' | sort | uniq -c | sort -rn

This single command chain reads the authentication log, extracts failed password entries, pulls out the source IP address (field 11), sorts the results, counts the unique occurrences, and sorts by frequency in descending order. The output is a ranked list of IP addresses that generated the most failed login attempts, which is a brute-force attack indicator.

sort arranges lines alphabetically or numerically. uniq removes adjacent duplicate lines (which is why sort typically precedes it). uniq -c counts the occurrences of each unique line, making it invaluable for frequency analysis. cut extracts specific fields from delimited text. awk is a more powerful field-based text processor that can handle complex parsing tasks.

Analyst Perspective

The command chain shown above is not a theoretical exercise. SOC analysts and forensic investigators build exactly these kinds of pipelines daily to triage log data during incident response. Memorizing individual commands is less important than understanding how to chain them together. Each command does one thing well. Piping lets you assemble them into an analysis workflow that answers specific investigative questions.

Terminal screenshot showing a compound piped command on a SIFT workstation. Green comments read # Objective: Find which IPs made the most failed SSH login attempts and # Pipeline: grep > extract IPs > sort > count unique > rank by frequency. The command is: grep "Failed password" auth.log | awk "{print $(NF-3)}" | sort | uniq -c | sort -rn. The output lists IP addresses ranked by attempt count: 4218 from 198.51.100.14, 892 from 203.0.113.77, 341 from 192.0.2.201, 87 from 198.51.100.55, 34 from 203.0.113.12, 12 from 192.0.2.88, 5 from 10.0.1.200, 3 from 10.0.1.50, 2 from 10.0.1.15, and 1 from 10.0.1.1. Green comments explain: Breakdown: grep filters > awk extracts IP field > sort alphabetizes > uniq -c counts duplicates > sort -rn ranks highest first and Result: 198.51.100.14 leads with 4,218 attempts (primary attacker).

Infographic titled How the Pipeline Works with subtitle Step-by-step breakdown of the compound command. The full command is displayed at top in colored text: grep "Failed password" /var/log/auth.log | awk "{print $(NF-3)}" | sort | uniq -c | sort -rn. Below are five dark cards showing each stage. Card 1, grep — Filter: Extract only lines containing Failed password from auth.log, with sample output showing full log lines. Card 2, awk — Extract: Pull out just the IP address field from each matching line, with output showing raw IPs like 198.51.100.14 and 192.0.2.201. Card 3, sort — Alphabetize: Group identical IPs together so uniq can count them, with output showing IPs in alphabetical order. Card 4, uniq -c — Count: Count consecutive duplicates and prefix each line with the count, with output showing counts like 341 and 4218 next to IPs. Card 5, sort -rn — Rank: Sort numerically in reverse order (highest count first), with final output showing 4218 198.51.100.14 at top, then 892 203.0.113.77, then 341 192.0.2.201.

System Information

These commands reveal details about the system itself, which is essential context for any examination.

Terminal screenshot showing two commands for system identification on a SIFT workstation. A green comment reads # First step: identify the OS, kernel, and architecture of the target system. First command: uname -a returns Linux webserver-prod-03 5.15.0-91-generic #101-Ubuntu SMP Tue Nov 14 13:30:08 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux. Green comments annotate key observations: Hostname: webserver-prod-03 (production web server), Kernel: 5.15.0-91-generic (Ubuntu 22.04 LTS series), Architecture: x86_64 (64-bit system), and Build date: Nov 14, 2023 (check if patches are current). Second command: cat /etc/os-release | head -4 returns NAME="Ubuntu", VERSION="22.04.3 LTS (Jammy Jellyfish)", ID=ubuntu, and ID_LIKE=debian.

uname -a displays the kernel version, hostname, architecture, and OS information. This tells you exactly what you are examining.

hostname returns the system's hostname, helping you confirm which machine you are analyzing.

df -h shows disk usage across all mounted file systems. Unusual disk utilization (a nearly full /tmp partition, for example) can indicate data staging.

lsblk lists all block devices (physical drives, partitions, logical volumes) in a tree format. This gives you a quick map of the system's storage layout.

fdisk -l displays the partition table for a specified device, showing partition types, sizes, and starting sectors.

mount (with no arguments) lists all currently mounted file systems and their mount options. On a live system, this shows you what is attached and how.

free -h displays memory usage (total, used, free, cached), providing a snapshot of system resource state.

uptime shows how long the system has been running since its last boot, which tells an investigator when the system was last restarted, a key reference point for timeline analysis.

who and w show currently logged-in users. last reads the /var/log/wtmp file and displays a history of user logins, logouts, and system reboots.

Terminal screenshot showing the 'df -h' command run on a SIFT workstation in the ~/cases/2024-0392 directory. A green comment reads '# Identify mounted filesystems, sizes, and mount points.' The command output displays a table with columns: Filesystem, Size, Used, Avail, Use%, and Mounted on. Entries include: /dev/sda2 (234G, 85%, mounted on /), /dev/sda1 (512M, 61%, /boot), tmpfs (3.9G, 1%, /run), /dev/sdb1 (932G, 70%, /var/www), /dev/sdc1 highlighted in red (466G, 99%, /tmp), and tmpfs (3.9G, 0%, /sys/fs/cgroup). Key observations noted in green comments: '/tmp is 99% full on a separate 466GB drive (suspicious),' '/var/www is on its own partition (web content),' and 'Total storage: ~1.6TB across three drives.'

File Operations and Integrity

cp, mv, and rm copy, move, and remove files. The critical forensic note about rm is that it is permanent. Unlike Windows, where deleted files go to the Recycle Bin (Chapter 4), Linux's rm command removes the directory entry immediately. There is no graphical "undo." Desktop environments (GNOME, KDE) do implement a Trash folder, but most server environments have no graphical desktop, and most administrative and attacker activity happens on the command line.

chmod and chown change file permissions and ownership. On a forensic image, you will not use these to modify evidence, but understanding them helps you interpret permission strings you encounter.

md5sum and sha256sum generate cryptographic hash values, connecting directly to the chain of custody principles from Chapter 3. Hashing a forensic image immediately after acquisition and verifying that hash before examination is a non-negotiable step.

dd is the foundational Linux disk imaging tool. The command dd if=/dev/sda of=image.dd bs=4096 creates a bit-for-bit copy of a source drive. In Chapter 3, you learned about the raw/dd image format; this is the tool that creates it. More advanced alternatives like dc3dd add built-in hashing and progress reporting.

strings extracts printable ASCII and Unicode text from any file, including binary executables and memory dumps. Running strings against a suspicious binary can reveal embedded URLs, IP addresses, commands, and other indicators of compromise.

xxd produces a hex dump of a file, allowing you to examine raw file headers and data at the byte level.

nano is a simple terminal-based text editor. During an examination, you may use it to create notes or examination logs. (On forensic images, you read files; you do not edit evidence.)

Terminal screenshot showing the command 'strings system_update.sh | head -20' run in the ~/cases/2024-0392/evidence directory. A green comment reads '# Extract human-readable strings from the suspicious binary.' The output lists extracted strings including system libraries ('/lib64/ld-linux-x86-64.so.2,' 'libcurl.so.4,' 'libcrypto.so.1.1'), network functions ('socket,' 'connect,' 'recv,' 'send'), a C2 URL highlighted in green ('https://c2.malicious-domain.xyz/beacon'), the file path in green ('/tmp/.hidden/beacon.sh'), a spoofed User-Agent ('Mozilla/5.0 (compatible)'), a content type ('application/octet-stream'), sensitive file paths highlighted in red ('/etc/shadow' and '/home/%s/.ssh/id_rsa'), and data encoding tools ('base64,' 'gzip,' 'sleep'). Green comments summarize: 'Key findings: C2 URL, shadow file access, SSH key theft, data encoding (base64/gzip), and spoofed User-Agent.'

Process and Network

These commands are relevant during live response on a running Linux system, when volatile data must be captured before the system is powered down.

ps aux lists all running processes with their process IDs, users, CPU/memory usage, and the command that launched them. Suspicious processes (unusual names, running from /tmp, or running under unexpected user accounts) are key triage indicators.

top (and its more user-friendly alternative htop) provides a real-time, updating view of system processes and resource usage.

netstat -tulnp and its modern replacement ss -tulnp display active network connections and listening ports. The flags stand for: TCP (-t), UDP (-u), listening (-l), numeric output (-n), and show process names (-p). Identifying unexpected outbound connections or unusual listening ports is a core step in live incident triage.

ip addr show (replacing the older ifconfig) displays the system's network interface configuration, including IP addresses, MAC addresses, and interface status.

Terminal screenshot showing the command 'ps aux --sort=-start_time | head -15' run on a SIFT workstation. A green comment reads '# List all running processes, sorted by most recent start time.' The output displays a process table with columns: USER, PID, %CPU, %MEM, VSZ, RSS, TTY, STAT, START, TIME, and COMMAND. Two suspicious processes are highlighted in red: PID 28917 running as user 'deploy' at 2.3% CPU, started at 02:14, command '/tmp/.hidden/beacon.sh'; and PID 28844 running as user 'www-data' at 0.4% CPU, started at 01:58, command '/usr/bin/python3 /tmp/sess_a8f3c12e.' Other processes shown include cron (root, PID 28102), two apache2 workers (www-data), mysqld (mysql, PID 1087), apache2 main (root, PID 1204), and sshd (root, PID 841). Green comments note: 'Suspicious processes: PID 28917: beacon.sh running from /tmp/.hidden/ as deploy' and 'PID 28844: Python process spawned by www-data in /tmp/.'

Terminal screenshot showing the command 'netstat -tulnp' run on a SIFT workstation. A green comment reads '# Show all TCP/UDP listeners with process info.' The output displays a table with columns: Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State, and PID/Program. Normal services shown: sshd on port 22, apache2 on ports 80 and 443, mysqld on port 3306. Three suspicious entries are highlighted in red: 'beacon.sh' (PID 28917) listening on port 4444 with state LISTEN, an ESTABLISHED connection from 10.0.1.15:4444 to 198.51.100.14:51992 by beacon.sh with 412 bytes in the send queue, and an ESTABLISHED connection from 10.0.1.15:443 to 203.0.113.45:62104 by apache2. Green comments note: 'Suspicious: beacon.sh listening on port 4444 (common Metasploit port),' 'Active C2 connection to 198.51.100.14 (same IP from brute force),' and '412 bytes in send queue = data currently being exfiltrated.'

11.4 Linux Forensic Artifacts

The investigative value of a Linux system lies in the artifacts it generates during normal operation. Every login, every command, every service start, every network connection produces a trace somewhere in the file system. This section maps the key artifact categories, their locations, and what they reveal to an investigator.

Log Files: The Investigator's Primary Evidence

The /var/log/ directory is the single most important location on a Linux system for forensic analysis. Nearly all system, application, and security logs are written here. Unlike Windows, where event logs are stored in a proprietary binary format (.evtx), most Linux logs are plain text files that can be read directly with cat, grep, less, and the other tools covered in Section 11.3.

The key log files include:

/var/log/syslog (Debian/Ubuntu) or /var/log/messages (Red Hat/CentOS): The general-purpose system log. Captures a wide range of system events including service starts and stops, kernel messages, and application notices.
/var/log/auth.log (Debian/Ubuntu) or /var/log/secure (Red Hat/CentOS): Authentication events. This log records every login attempt (successful and failed), every sudo command, every SSH connection, and every su (switch user) event. For most investigations, this is the first log you read.
/var/log/kern.log: Kernel-level messages, including hardware events, driver loading, and USB device connections.
/var/log/dmesg: Boot-time messages. Useful for determining what hardware was detected at startup.
/var/log/apache2/ or /var/log/nginx/: Web server access and error logs. The access log records every HTTP request (source IP, timestamp, requested URL, response code, user agent). The error log records server-side failures and application errors.
/var/log/cron or /var/log/cron.log: Records the execution of scheduled tasks (cron jobs), including which user's crontab triggered the task and when.
/var/log/dpkg.log (Debian/Ubuntu) or /var/log/yum.log (Red Hat/CentOS): Records all package installations, upgrades, and removals. An attacker who installs tools using the package manager leaves traces here.

syslog vs. journald

Modern Linux distributions that use systemd (which includes virtually all current distributions) have a second logging system called journald. Journald captures the same types of events as syslog but stores them in a binary format rather than plain text. You query journald using the journalctl command:

journalctl -u ssh shows all log entries for the SSH service
journalctl --since "2025-03-01" --until "2025-03-15" filters by date range
journalctl -p err shows only error-level and above messages

Many distributions run both systems in parallel: journald captures events from systemd services, and syslog receives a copy for writing to traditional text files. Investigators should check both sources.

Warning

Linux log files are subject to log rotation. The logrotate utility automatically compresses and archives old logs on a schedule (daily, weekly, or based on file size). Rotated logs are typically named with a numeric suffix or date stamp (e.g., auth.log.1, auth.log.2.gz). Depending on the rotation policy, logs older than a few weeks may be compressed or deleted entirely. Investigators must check the logrotate configuration (/etc/logrotate.conf and /etc/logrotate.d/) to understand what retention policy was in effect and identify any gaps in the log record.

User Activity Artifacts

Bash History

The ~/.bash_history file in each user's home directory is one of the most forensically valuable artifacts on a Linux system. Every command typed in a Bash shell is recorded here by default. In a compromise investigation, bash history can reveal exactly what an attacker did after gaining access: what files they viewed, what tools they downloaded, what data they exfiltrated, and what they tried to cover up.

Several environment variables control bash history behavior:

HISTFILE: The path to the history file. Default is ~/.bash_history.
HISTSIZE: The maximum number of commands stored in memory during a session.
HISTFILESIZE: The maximum number of lines retained in the history file.
HISTCONTROL: Controls what gets recorded. If set to ignorespace, commands starting with a space are not recorded. If set to ignoredups, consecutive duplicate commands are not recorded.

From an anti-forensic perspective (connecting back to Chapter 10), attackers frequently target bash history. Common techniques include running history -c to clear the in-memory history, deleting the .bash_history file, setting HISTSIZE=0 to prevent recording, or running unset HISTFILE at the start of a session. The absence of a bash history file, or a suspiciously empty one on an active account, is itself an indicator of anti-forensic activity.

SSH Artifacts

The ~/.ssh/ directory in a user's home folder contains several forensically significant files:

authorized_keys: Lists the public keys authorized to log into this account via SSH without a password. An attacker who adds their own public key to this file establishes persistent, password-less backdoor access.
known_hosts: Records the fingerprint of every SSH server this user has connected to. This reveals lateral movement targets.
id_rsa / id_ed25519 (and their .pub counterparts): The user's private and public SSH key pair. The private key, if exfiltrated by an attacker, grants access to any system that trusts the corresponding public key.

Browser Artifacts

If the Linux system has a desktop environment, browser artifacts follow the same patterns you learned in Chapter 7. Firefox stores its profile under ~/.mozilla/firefox/, and Chromium-based browsers store under ~/.config/google-chrome/ or ~/.config/chromium/. The SQLite databases containing history, downloads, cookies, and cached data are structurally identical to their Windows counterparts.

Crontab Entries

Cron is the Linux task scheduler, equivalent in purpose to Windows Task Scheduler. Each user can have their own crontab (viewable with crontab -l for that user), and system-wide scheduled tasks are defined in /etc/crontab and the /etc/cron.d/, /etc/cron.daily/, /etc/cron.hourly/, /etc/cron.weekly/, and /etc/cron.monthly/ directories.

System Configuration Artifacts

The /etc/ directory is the Linux equivalent of the Windows Registry for system configuration, with one major difference: every configuration is stored in a human-readable text file rather than a binary database. Key files include:

/etc/passwd, /etc/shadow, /etc/group: User account database (covered in Section 11.2).
/etc/hostname and /etc/hosts: The system's hostname and local DNS override entries. An attacker modifying /etc/hosts can redirect traffic silently.
/etc/fstab: Defines the file systems that should be mounted at boot, including network shares. This tells an investigator what storage was expected to be connected.
/etc/sudoers: Defines which users and groups can execute commands as root (or other users) using sudo. Unauthorized modifications to this file grant privilege escalation.
/etc/ssh/sshd_config: The SSH server configuration file. Investigators should check for non-standard settings such as PermitRootLogin yes, modified port numbers, or the addition of AuthorizedKeysFile paths that point to unusual locations.

Persistence Mechanisms

When an attacker compromises a Linux system, maintaining access across reboots requires establishing a persistence mechanism. These are the Linux equivalents of the Windows Registry autostart keys you studied in Chapter 5. The most common persistence techniques include:

Cron jobs: A malicious cron entry that periodically downloads and executes a payload or opens a reverse shell. Check both system-level (/etc/crontab, /etc/cron.d/) and per-user (crontab -l) schedules.
Systemd services and timers: An attacker creates a new .service unit file in /etc/systemd/system/ that starts a malicious binary at boot. Systemd timers can also trigger periodic execution, similar to cron.
rc.local and init scripts: On older systems, /etc/rc.local runs commands at the end of the boot process. Attackers may append commands here.
.bashrc / .profile injection: Adding malicious commands to a user's ~/.bashrc or ~/.profile causes them to execute every time the user opens a new shell session.
SSH authorized_keys backdoors: As described above, adding an attacker's public key to ~/.ssh/authorized_keys grants persistent, password-less access.
LD_PRELOAD and shared library hijacking: An advanced technique where an attacker creates a malicious shared library and configures the system to load it before legitimate libraries, allowing code injection into running processes.

Putting It Together

Scenario: A mid-size e-commerce company contacts your firm after their web application firewall flagged anomalous outbound traffic from their primary web server. The server runs Ubuntu 22.04 with Apache 2.4. A forensic image has been acquired and your task is to determine the scope of the compromise.

Step 1: Authentication Analysis. Mount the image read-only and begin with /var/log/auth.log. Search for failed SSH login attempts: grep "Failed password" auth.log | awk '{print $11}' | sort | uniq -c | sort -rn. The results show 4,200 failed attempts from a single IP address over a 6-hour window, followed by a successful login to the "deploy" service account.

Step 2: Post-Compromise Activity. Navigate to /home/deploy/.bash_history. The attacker's commands are visible: they downloaded a PHP web shell using wget, placed it in /var/www/html/uploads/, modified its permissions to be executable, and then ran history -c (but the history was already written to disk before the clear command took effect).

Step 3: Persistence Check. Examine the deploy user's crontab using the crontab file in the image. A new entry exists: */5 * * * * /tmp/.hidden/beacon.sh. This script runs every 5 minutes and sends a heartbeat to an external IP address, maintaining the attacker's C2 channel.

Step 4: Web Server Log Analysis. Read /var/log/apache2/access.log. Search for requests to the uploaded web shell: grep "uploads/config.php" access.log. The results show 38 POST requests to the web shell from three different source IP addresses over two days, indicating that the shell was actively used for further operations.

Step 5: Timeline Reconstruction. Combine the timestamps from auth.log (initial brute-force and successful login), bash_history file timestamps, the web shell's inode timestamps via stat, and the cron job's creation time to build a complete timeline of the intrusion from initial access through persistence establishment.

This five-step process, grounded entirely in the artifacts and commands covered in this chapter, gives the analyst a clear narrative: the attacker brute-forced SSH access, deployed a web shell, established cron-based persistence, and used the web shell for ongoing access. The timeline provides the legal team with the dates and times needed for their response.

11.5 Forensic Linux Distributions

What Are Forensic Distributions?

A forensic Linux distribution is a purpose-built operating system that comes preconfigured with forensic tools, validated workflows, and (in most cases) default settings designed to protect evidence integrity. These distributions can be booted from a USB drive or installed on a forensic workstation, providing a ready-to-use examination environment without the need to manually install and configure dozens of individual tools.

Forensic distributions matter for several reasons. Their tool chains are tested and validated for use in legal proceedings. Many default to write-blocking behavior, ensuring that connected evidence drives are not inadvertently modified. They provide a consistent, reproducible environment, which strengthens the defensibility of your examination in court.

SANS SIFT Workstation

The SANS Investigative Forensic Toolkit (SIFT) is one of the most widely recognized forensic distributions in the industry. Maintained by SANS and originally developed by Rob Lee, SIFT is built on Ubuntu and is designed for comprehensive disk and memory forensics.

SIFT comes preloaded with tools including Autopsy, The Sleuth Kit, Volatility, log2timeline/Plaso (for super-timeline creation), bulk_extractor, and numerous other utilities. It is the distribution most commonly used in SANS DFIR training courses and is well-suited for detailed forensic examinations where the investigator needs a full analysis environment.

Kali Linux

Kali Linux, maintained by Offensive Security, is built on Debian and is primarily known as a penetration testing distribution. Its default toolset is oriented toward offensive security: network scanning, vulnerability assessment, exploitation, and password cracking.

However, Kali includes a Forensic Mode that can be selected at boot. Forensic Mode prevents the system from mounting any internal drives automatically or enabling swap, protecting evidence from accidental modification. Kali also includes a number of forensic and network analysis tools. While it is not a dedicated forensic platform, its forensic mode makes it useful for live response and network forensic tasks, and many analysts are already familiar with it from their penetration testing work in courses like CFS275.

Tsurugi Linux

Tsurugi is an Italian-developed forensic distribution designed for DFIR, open-source intelligence (OSINT), and malware analysis. It is built with law enforcement use cases in mind and includes a broad set of tools covering disk forensics, memory analysis, network forensics, and OSINT gathering. Tsurugi is particularly well-regarded for its OSINT investigation capabilities.

CAINE (Computer Aided INvestigative Environment)

CAINE is another Italian forensic distribution, built on Ubuntu and designed with an emphasis on a user-friendly graphical interface. CAINE implements write-blocking by default on all connected devices and provides a GUI-oriented workflow that is accessible to investigators who prefer graphical tools over the command line. It is well-suited for disk forensics where a streamlined, GUI-based workflow is preferred.

Distribution	Base OS	Primary Focus	Write-Block Default	Key Tools	Best Suited For
SANS SIFT	Ubuntu	Disk & memory forensics	Configurable	Autopsy, Sleuth Kit, Volatility, Plaso	Comprehensive DFIR examinations
Kali Linux	Debian	Penetration testing (Forensic Mode available)	In Forensic Mode	Nmap, Wireshark, Hashcat, Volatility	Network forensics, live response, offensive testing
Tsurugi	Independent	DFIR, OSINT, malware analysis	Yes	OSINT tools, Autopsy, Volatility	OSINT investigations, malware RE
CAINE	Ubuntu	Disk forensics (GUI-focused)	Yes	Autopsy, Guymager, Foremost	GUI-based disk forensic examinations

Analyst Perspective

Most experienced forensic analysts do not rely on a single distribution. They maintain several bootable USB drives, each with a different distribution, and select the one best suited for the specific case. SIFT for a deep-dive disk and memory examination. Kali in forensic mode for quick network triage and live response. CAINE when the team needs a GUI-driven workflow. Knowing what each distribution offers, and its limitations, lets you pick the right tool for the job.

11.6 Simulated Linux Terminal: Test Your Understanding

Activity

Try out various Linux file system navigation and forensic commands in this simulated, hands-on activity.

Chapter Summary

This chapter transitioned from the Windows-centric forensic toolkit built in Chapters 4 through 10 to the Linux environment that powers servers, cloud infrastructure, IoT devices, and the forensic workstations investigators use daily.

Key concepts covered include:

Linux is everywhere in investigations. Even Windows-focused cases frequently involve Linux infrastructure, from compromised web servers to cloud-hosted C2 nodes to embedded devices. Investigators who cannot work in Linux will encounter evidence they cannot analyze.
The Linux file system hierarchy follows the Filesystem Hierarchy Standard. The most forensically relevant directories are /home (user data), /etc (system configuration), /var/log (logs), and /tmp (temporary files and attacker staging areas).
Ext4 uses inodes as its central metadata structure, serving the same purpose as NTFS MFT entries. The critical forensic difference is that ext4 zeroes out inode block pointers upon file deletion, making metadata-based recovery harder than on NTFS. File carving and journal analysis are the primary recovery strategies.
Timestamps on Linux include mtime (content modification), atime (access), and ctime (metadata change). Ctime is frequently misunderstood as "creation time" but actually tracks the last metadata change.
The Linux command line is the primary forensic interface. Commands like grep, find, sort, uniq, and awk chain together through pipes to build powerful analysis workflows, especially for log analysis.
Log files in /var/log/ are the primary evidence source on Linux systems. Authentication logs (auth.log/secure) are typically the first artifact an investigator examines. Log rotation policies determine how much history is available.
User activity artifacts include bash history (commands typed), SSH artifacts (keys, known hosts), browser profiles, and crontab entries. Bash history is both highly valuable and a frequent target of anti-forensic techniques.
Persistence mechanisms on Linux include cron jobs, systemd services, rc.local scripts, shell profile injection, and SSH authorized_keys backdoors, serving the same purpose as Windows Registry autostart keys.
Forensic distributions (SIFT, Kali, Tsurugi, CAINE) provide preconfigured examination environments with validated tool chains and write-blocking defaults. Each distribution has strengths suited to different investigation types.

The commands and artifact locations covered in this chapter are consolidated in the reference tables that follow. These tables are designed as study aids and quick-reference resources that you will return to throughout the remainder of the course and in hands-on lab exercises.

Quick-Reference Tables

Table A: Linux Command Quick Reference:Navigation & File Inspection

Command	Syntax Example	Purpose	Forensic Use
pwd	`pwd`	Print working directory	Verify current location on mounted image
ls	`ls -la /home/`	List files with details	View permissions, timestamps, hidden files
cd	`cd /var/log/`	Change directory	Navigate to evidence locations
cat	`cat auth.log`	Display file contents	Read log files and configuration files
head / tail	`tail -n 50 syslog`	View first or last N lines	Check recent log entries quickly
less	`less largefile.log`	Paginated file viewer	Browse large log files
file	`file suspicious.dat`	Identify file type by magic bytes	Detect file extension mismatch (Ch10)
stat	`stat document.pdf`	Show detailed file metadata	Examine all three timestamps and inode info
wc	`wc -l access.log`	Count lines, words, characters	Gauge log file size before analysis

Table B: Linux Command Quick Reference:Searching & Filtering

Command	Syntax Example	Purpose	Forensic Use
find	`find / -name "*.php" -mtime -7`	Locate files by criteria	Find recently modified files (web shells, backdoors)
grep	`grep -i "failed password" auth.log`	Search text patterns in files	Extract specific events from log files
sort	`sort access.log`	Sort lines alphabetically or numerically	Organize log entries for analysis
uniq	`sort \\| uniq -c`	Remove or count duplicate lines	Frequency analysis of log events
cut	`cut -d' ' -f1 access.log`	Extract fields from delimited text	Isolate IPs or timestamps from logs
awk	`awk '{print $1}' access.log`	Field-based text processing	Parse structured log fields

Table C: Linux Command Quick Reference:System Information & Disk

Command	Syntax Example	Purpose	Forensic Use
uname	`uname -a`	Display OS and kernel info	Identify target system version and architecture
hostname	`hostname`	Display system hostname	Confirm system identity
df	`df -h`	Show disk space usage	Check for suspicious volume utilization
lsblk	`lsblk`	List block devices	Map drives, partitions, and logical volumes
fdisk	`fdisk -l /dev/sda`	Display partition table	Identify partition layout and types
mount	`mount`	Show mounted file systems	Verify what is attached and mount options
free	`free -h`	Display memory usage	Snapshot of system RAM state
uptime	`uptime`	Show system uptime	Determine last reboot time
who / w	`who`	Show logged-in users	Identify active user sessions
last	`last`	Display login history	Review historical user login patterns

Table D: Linux Command Quick Reference:Integrity & Acquisition

Command	Syntax Example	Purpose	Forensic Use
md5sum	`md5sum image.dd`	Generate MD5 hash	Verify forensic image integrity (Ch3)
sha256sum	`sha256sum image.dd`	Generate SHA-256 hash	Chain of custody hash verification (Ch3)
dd	`dd if=/dev/sda of=image.dd bs=4096`	Bit-stream disk copy	Forensic disk imaging (Ch3)
strings	`strings binary_file`	Extract printable text from any file	Find readable text in binaries or memory dumps
xxd	`xxd file.bin \\| head`	Hex dump of file contents	Examine file headers and raw byte data
nano	`nano notes.txt`	Simple terminal text editor	Create examination notes during analysis

Table E: Linux Command Quick Reference:Process & Network

Command	Syntax Example	Purpose	Forensic Use
ps	`ps aux`	List all running processes	Identify suspicious or unexpected processes
top / htop	`top`	Real-time process monitor	Observe system resource usage
netstat	`netstat -tulnp`	Display network connections	Identify active connections and listening ports
ss	`ss -tulnp`	Socket statistics (modern netstat)	Same as netstat, preferred on newer systems
ip addr	`ip addr show`	Show network interface details	Identify system network configuration

Table F: Key Linux Forensic Artifact Locations

Artifact	Location	What It Reveals
User accounts	/etc/passwd	All accounts, UIDs, home directories, default shells
Password hashes	/etc/shadow	Hashed passwords, password change dates, account status
Group membership	/etc/group	Group assignments and privilege associations
Command history	~/.bash_history	Commands typed by each user (or attacker)
SSH keys	~/.ssh/	Authorized keys, known hosts, private key pairs
System logs	/var/log/syslog or /var/log/messages	General system events
Authentication events	/var/log/auth.log or /var/log/secure	Logins, sudo usage, SSH authentication
Web server logs	/var/log/apache2/ or /var/log/nginx/	HTTP requests, errors, access patterns
Cron jobs (user)	crontab -l (per user)	User-level scheduled tasks (potential persistence)
Cron jobs (system)	/etc/crontab, /etc/cron.d/, /etc/cron.daily/	System-level scheduled tasks
Systemd services	/etc/systemd/system/	Service unit definitions (potential persistence)
Mounted file systems	/etc/fstab	Expected mount points and network shares
Sudo privileges	/etc/sudoers	Who can run commands as root and what commands
Network configuration	/etc/network/ or /etc/netplan/	Static IP assignments, DNS servers, default gateway
Installed packages	/var/log/dpkg.log or /var/log/yum.log	Software installation and removal history
System hostname	/etc/hostname	System identity
Local DNS overrides	/etc/hosts	Manual hostname-to-IP mappings (potential traffic redirection)
SSH server config	/etc/ssh/sshd_config	SSH access settings, allowed authentication methods

CH11: Linux Forensics

Chapter Overview

Learning Objectives

11.1 Why Linux Forensics Matters

Where Linux Lives in Investigations

The Linux Forensic Workstation

Windows vs. Linux: A Conceptual Bridge

11.2 Linux Fundamentals for Investigators

The Linux Architecture

The File System Hierarchy

The ext4 File System and Inodes

How ext4 Handles File Deletion

The ext4 Journal

Recovery Tools and Strategies

Timestamps in Linux

11.3 Essential Linux Commands for Forensic Investigators

Navigation and File Inspection

Searching and Filtering

System Information

File Operations and Integrity

Process and Network

11.4 Linux Forensic Artifacts

Log Files: The Investigator's Primary Evidence

syslog vs. journald

User Activity Artifacts

Bash History

SSH Artifacts

Browser Artifacts

Crontab Entries

System Configuration Artifacts

Persistence Mechanisms

Putting It Together

11.5 Forensic Linux Distributions

What Are Forensic Distributions?

SANS SIFT Workstation

Kali Linux

Tsurugi Linux

CAINE (Computer Aided INvestigative Environment)

11.6 Simulated Linux Terminal: Test Your Understanding

Chapter Summary

Quick-Reference Tables

Table A: Linux Command Quick Reference:Navigation & File Inspection

Table B: Linux Command Quick Reference:Searching & Filtering

Table C: Linux Command Quick Reference:System Information & Disk

Table D: Linux Command Quick Reference:Integrity & Acquisition

Table E: Linux Command Quick Reference:Process & Network

Table F: Key Linux Forensic Artifact Locations