1. Introduction
The rm command is one of the most commonly used utilities in Unix-like operating systems. From casual desktop users to seasoned system administrators, it sits at the core of everyday file management tasks: removing logs, cleaning up temporary files, pruning old backups, and more. Yet despite how often we invoke it, few of us take the time to understand what rm actually does under the hood. We develop a reflexive confidence (“if I type rm file.txt, it’s gone forever”) without appreciating the underlying mechanics of filesystems.
In reality, rm does not obliterate the data contained in a file. Instead, it removes (or “unlinks”) the reference that the filesystem holds to that data. The actual content on the disk remains intact, meaning that deleted files on many filesystems can often be recovered, at least until their blocks are overwritten.
In this article, we will:
- Explain why rm on an ext4 filesystem does not actually erase data.
- Show how deleted files can be recovered using the extundelete tool, illustrating why relying on rm for secure deletion is unsafe.
- Introduce a simple command, shred, that actually overwrites file data to make recovery far more difficult.
- Briefly contrast traditional ext4 behavior with modern copy-on-write (CoW) filesystems, where deleted data is far less likely to be recoverable.
By the end, you will have a practical understanding of how deletion works on ext4, how to recover, or permanently erase, files, and when deletion really means deletion.
2. How rm Works on an ext4 Filesystem
Let’s start by digging into how files are stored and deleted on an ext4 filesystem, the most common default for Linux distributions.
2.1 The Basics: Inodes, Blocks, and Links
To begin, let’s have a look at the filesystem architecture from the highest to the lowest level of abstraction.
- Superblocks: The superblock is the control center of the filesystem. It stores global information: the total number of inodes and blocks, block size, mount status, and other critical metadata needed to manage the entire volume.
- Block and Inode Bitmaps: These bitmaps track which blocks and inodes are in use. When a file is created or deleted, these bitmaps are updated to mark the corresponding blocks or inodes as allocated or free.
- Directory Entry: Each file name you see in a directory is actually a hard link that points to an inode. That’s why multiple names (via ln) can reference the same underlying file: they all point to the same inode.
- Inode: An inode is a small data structure that stores metadata about a file: owner, permissions, timestamps, and pointers to the actual data blocks on disk. However, it does not contain the file’s name.
- Data Blocks: These are the chunks of disk space where file contents are actually stored. Each block is typically 4 KiB. The inode’s pointers reference these blocks in order.
We will focus on the inodes and the data blocks, as they are at the heart of what does, or more importantly, does not happen when you delete a file using rm.
Note
When you run ls -li, the -i flag adds each entry’s inode number as the first column, and the -l flag shows the link count right after the permissions:
[ls -li listing not preserved in this copy]
- Inode 15896956 is a directory with a link count of 2: its entry in the parent (current) folder and its own "." entry.
- Inode 13663711 is a regular file with a single link: its one name in the directory.
- Inode 15787809 is another directory, again with a link count of 2 for the same reason.
- Inode 15896971 is a symbolic link. It has its own inode and its own link count, and it does not add to the link count of the file it points to.
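The relationship between names, inodes, and link counts can be checked directly from Python. The sketch below is only an illustration: it works in a throwaway temporary directory, and the file names are made up for the example. It assumes a filesystem that supports hard links and symlinks.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    original = os.path.join(workdir, "a_file")
    alias = os.path.join(workdir, "a_hard_link")
    symlink = os.path.join(workdir, "a_symlink")
    subdir = os.path.join(workdir, "a_dir")

    with open(original, "w") as handle:
        handle.write("hello\n")
    os.link(original, alias)       # second directory entry for the same inode
    os.symlink(original, symlink)  # separate inode that merely stores a path
    os.mkdir(subdir)

    file_info = os.stat(original)
    alias_info = os.stat(alias)
    symlink_info = os.lstat(symlink)  # lstat inspects the link itself

    same_inode = file_info.st_ino == alias_info.st_ino
    file_links = file_info.st_nlink
    symlink_has_own_inode = symlink_info.st_ino != file_info.st_ino
    symlink_links = symlink_info.st_nlink
    # Usually 2 on ext4 (parent entry plus "."); Btrfs reports 1 for directories.
    dir_links = os.stat(subdir).st_nlink

print("hard link shares the inode:", same_inode)
print("link count of the file:", file_links)
print("symlink has its own inode:", symlink_has_own_inode)
print("link count of the symlink:", symlink_links)
print("link count of the new directory:", dir_links)
```

The two hard-linked names report the same inode number, while the symlink reports a different one, which is why only hard links raise a file's link count.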
2.2 What Happens When You Use rm?
When you type rm sensitive_data.txt, you’re telling the operating system to remove the directory entry that links the file name (sensitive_data.txt) to its underlying data. This does not erase the file’s contents. Instead, it’s like tearing the label off a drawer: you no longer know what’s inside or how to find it, but the contents are still there, undisturbed. Indeed, the rm command is essentially a wrapper around the unlink(2) system call (or the closely related unlinkat(2)). The steps are:
- Locate the directory entry for the specified file.
- Remove that entry from the directory.
- Decrement the inode’s link count by one.
- If the link count reaches zero and no process has the file open, mark the inode and its associated data blocks as free in the filesystem’s tables.
Crucially, the contents of the data blocks are not immediately zeroed out; they are simply marked as available for future allocations. Until the filesystem reassigns those blocks to another file, the data bits remain intact on disk.
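The condition in the last step ("no process has the file open") can be observed from user space. The following sketch assumes a POSIX system. It unlinks a file while a descriptor is still open and then reads the content through that descriptor. The file name is the one used above; the temporary directory is an assumption of the example.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "sensitive_data.txt")
with open(path, "w") as handle:
    handle.write("top secret\n")

fd = os.open(path, os.O_RDONLY)  # an open descriptor keeps the inode alive
os.unlink(path)                  # same system call that rm relies on

name_exists = os.path.exists(path)          # the directory entry is gone
links_after_unlink = os.fstat(fd).st_nlink  # typically 0 on Linux
content_after_unlink = os.read(fd, 100).decode()

os.close(fd)        # last reference dropped: inode and blocks can now be freed
os.rmdir(workdir)

print("name still exists:", name_exists)
print("link count seen through the descriptor:", links_after_unlink)
print("content read after unlink:", repr(content_after_unlink))
```

Only when the descriptor is closed does the kernel mark the inode and blocks as free, and even then it does not overwrite them.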
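Deleting a file
After rm a_file, the directory entry is gone and the inode’s link count drops to zero, so the inode and its data blocks are marked free while their contents stay on disk.
Deleting a directory
The -r flag makes rm recurse: it unlinks every entry within the directory, then removes the directory itself.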
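As a rough illustration of recursive removal, the sketch below walks a tree bottom-up, unlinking files first and removing each directory once it is empty. It is not how GNU rm is implemented. The function name remove_tree and the demo tree are hypothetical, and in real Python code shutil.rmtree is the usual choice.

```python
import os
import tempfile


def remove_tree(root):
    """Unlink every entry under root, children before parents, then root itself."""
    removed = []
    for current, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames:
            target = os.path.join(current, name)
            os.unlink(target)          # drop the directory entry for a file
            removed.append(target)
        for name in dirnames:
            target = os.path.join(current, name)
            if os.path.islink(target):
                os.unlink(target)      # a symlink to a directory is just an entry
            else:
                os.rmdir(target)       # already emptied by the bottom-up walk
            removed.append(target)
    os.rmdir(root)
    removed.append(root)
    return removed


# Build a small demo tree: root/top.txt and root/a/b/deep.txt
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "a", "b"))
for relative in ("top.txt", os.path.join("a", "b", "deep.txt")):
    with open(os.path.join(root, relative), "w") as handle:
        handle.write("x\n")

removed = remove_tree(root)
print("entries removed:", len(removed))
print("root still exists:", os.path.exists(root))
```

Each step is still only an unlink or rmdir, so the same caveat applies: entries disappear, but block contents are not wiped.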
Note
A symbolic link has its own inode and never counts toward its target’s link count. When the target is deleted, the symlink’s inode remains and its own link count is unchanged; the link simply dangles, pointing at a path that no longer exists.
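The behavior of a symlink whose target has been removed can be checked with a few standard library calls. This is a small sketch in a temporary directory with made-up names, assuming a POSIX system.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "a_file")
    link = os.path.join(workdir, "a_symlink")
    with open(target, "w") as handle:
        handle.write("data\n")
    os.symlink(target, link)

    links_before = os.stat(target).st_nlink  # the symlink does not add to this
    os.unlink(target)

    link_still_present = os.path.lexists(link)  # the symlink entry itself
    link_resolves = os.path.exists(link)        # follows the link to the target
    stored_path = os.readlink(link)             # the path text kept in the symlink
    symlink_links = os.lstat(link).st_nlink

print("target link count before deletion:", links_before)
print("symlink entry still present:", link_still_present)
print("symlink still resolves:", link_resolves)
print("symlink link count:", symlink_links)
```

The symlink survives the deletion but no longer resolves, because it stores a path rather than a reference to the inode.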
2.3 The Role of the ext4 Journal
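Ext4 uses a journal to ensure filesystem consistency. When you delete a file:
- The updated directory and inode metadata are first written to the journal as part of a transaction.
- The transaction is committed, at which point the deletion will survive a crash.
- The changes are later written (checkpointed) to their final locations on disk.
In the default data=ordered mode only metadata goes through the journal, and none of these steps overwrites the file’s data blocks.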
Note
Key takeaway:
On ext4, rm removes the reference to the data, not the data itself.
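To make the takeaway concrete, here is a deliberately simplified in-memory model of the structures from section 2.1. It is a teaching sketch, not a description of ext4's real on-disk format: the class ToyFilesystem and all of its fields are invented for illustration.

```python
BLOCK_SIZE = 4096  # bytes per block, matching the typical ext4 block size


class ToyFilesystem:
    """In-memory stand-in for a directory, inode table, bitmaps, and data blocks."""

    def __init__(self, block_count=8):
        self.blocks = [b""] * block_count          # raw block contents
        self.block_bitmap = [False] * block_count  # True = allocated
        self.inode_bitmap = {}                     # inode number -> allocated?
        self.inode_table = {}                      # inode number -> metadata
        self.directory = {}                        # file name -> inode number
        self.next_inode = 1

    def create(self, name, data):
        needed = max(1, -(-len(data) // BLOCK_SIZE))  # ceiling division
        free = [i for i, used in enumerate(self.block_bitmap) if not used]
        if len(free) < needed:
            raise OSError("no free blocks")
        chosen = free[:needed]
        for position, index in enumerate(chosen):
            start = position * BLOCK_SIZE
            self.blocks[index] = data[start:start + BLOCK_SIZE]
            self.block_bitmap[index] = True
        inode = self.next_inode
        self.next_inode += 1
        self.inode_table[inode] = {"links": 1, "size": len(data), "blocks": chosen}
        self.inode_bitmap[inode] = True
        self.directory[name] = inode
        return inode

    def unlink(self, name):
        inode = self.directory.pop(name)      # remove the directory entry
        entry = self.inode_table[inode]
        entry["links"] -= 1                   # decrement the link count
        if entry["links"] == 0:
            self.inode_bitmap[inode] = False  # inode marked free
            for index in entry["blocks"]:
                self.block_bitmap[index] = False  # block marked free, bytes untouched
        return inode


fs = ToyFilesystem()
inode = fs.create("sensitive_data.txt", b"account: 1234\n")
data_block = fs.inode_table[inode]["blocks"][0]
fs.unlink("sensitive_data.txt")

print("name still listed:", "sensitive_data.txt" in fs.directory)
print("block marked allocated:", fs.block_bitmap[data_block])
print("bytes still in block:", fs.blocks[data_block])
```

After unlink, the name is gone and the bitmaps say "free", yet the bytes are still sitting in the block until something else is written there.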
3. Recovering “Deleted” Files with extundelete
Because ext4 merely frees blocks without wiping them, specialized recovery tools can scan raw disk areas for remnants of files. One of the most popular tools is extundelete.
3.1 Installing extundelete
On Debian‑based systems:
sudo apt install extundelete
Using nix:
nix shell nixpkgs#extundelete
3.2 Basic Usage
Suppose you accidentally deleted a_file from the mounted filesystem /dev/sdb1. To recover:
Ensure the partition is unmounted (to avoid overwriting blocks):
sudo umount /dev/sdb1
Warning
Don’t unmount your system partition
If you want to experiment with extundelete, do not try to unmount the partition your Linux system is currently running on (usually /). Doing so will freeze or crash your system, as critical processes and binaries rely on access to that partition.
If you want to test things safely, use a virtual machine, a secondary disk, or boot from a live USB environment.
Run extundelete in restore mode:
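For example: sudo extundelete /dev/sdb1 --restore-file a_file (or --restore-all to attempt every deleted file it can find)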
Find recovered files in the RECOVERED_FILES directory that extundelete creates.
3.3 How extundelete Actually Recovers Files
Under the hood, extundelete performs a sequence of low‑level operations against the raw ext4 filesystem device to unearth “deleted” data:
- Reading the Superblock and Group Descriptors
  - extundelete first parses the filesystem’s superblock to determine layout parameters (block size, inode count, number of block groups, etc.).
  - It then reads each group descriptor to locate the block and inode bitmaps, the inode tables, and the journal area.
- Scanning Inode and Block Bitmaps
  - Deleted inodes are those whose entries in the inode bitmap have been flipped from “allocated” to “free”, but whose contents may still exist.
  - extundelete scans these bitmaps to build a list of candidate inodes that were recently freed.
  - For each candidate inode, it checks the inode table on disk to recover its metadata (timestamps, file size, block pointers).
- Reconstructing file contents
  - From each recovered inode, extundelete reads the block pointers to copy the corresponding data blocks into a safe recovery directory.
- Recovering file names from Journal or Directory Blocks
  - The journal may still hold older copies of directory blocks. extundelete can parse these journal entries to find directory-to-inode mappings that no longer exist in the live filesystem.
- Handling partially overwritten data
  - When some blocks have already been reallocated and overwritten, extundelete will recover only the intact portions. It writes each recovered fragment as a separate file (e.g., FILE.13663711.part1) so you can manually inspect and piece together what remains.
Walkthrough Example
Imagine you deleted a 10 KiB text file:
- The inode bitmap marks its inode as free but doesn’t zero out its entry.
- extundelete spots that freed inode, reads its old block pointers (say, blocks 100–102).
- It copies the raw contents from blocks 100–102 into a new file under RECOVERED_FILES/.
- If ext4’s journal still contains the directory record linking “a_file” to that inode, extundelete recreates the file under its original name.
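The walkthrough can be mimicked on a hand-built toy "disk image". This is only a sketch of the idea, not extundelete's actual algorithm or data layout: the dictionaries, the recover_deleted function, and the fallback name pattern are all assumptions made for the example.

```python
BLOCK_SIZE = 4096

# Hand-built toy "disk image" for the walkthrough (all values are illustrative).
original = (b"line of text\n" * 1000)[:10 * 1024]  # a 10 KiB file
blocks = {100 + i: original[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE] for i in range(3)}
inode_bitmap = {13663711: False}                   # False = marked free
inode_table = {13663711: {"size": len(original), "block_pointers": [100, 101, 102]}}
journal_names = {13663711: "a_file"}               # stale directory record


def recover_deleted(inode_bitmap, inode_table, blocks, journal_names):
    """Follow leftover block pointers of freed inodes and rebuild their contents."""
    recovered = {}
    for inode, allocated in inode_bitmap.items():
        if allocated:
            continue                               # live file, nothing to recover
        entry = inode_table.get(inode)
        if entry is None or not entry["block_pointers"]:
            continue                               # no pointers left to follow
        raw = b"".join(blocks[p] for p in entry["block_pointers"])
        name = journal_names.get(inode, f"file.{inode}")  # made-up fallback name
        recovered[name] = raw[:entry["size"]]      # trim padding in the last block
    return recovered


with_name = recover_deleted(inode_bitmap, inode_table, blocks, journal_names)
without_name = recover_deleted(inode_bitmap, inode_table, blocks, {})

print("recovered with journal record:", list(with_name))
print("recovered without journal record:", list(without_name))
print("content matches original:", with_name["a_file"] == original)
```

When the journal record is missing, the content is still recoverable in this model, but only under a generic name derived from the inode number.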
Info
Limitations Recap
- Overwritten Blocks: Any data blocks already reused by new files are irretrievable.
- High Fragmentation: Files with many non‑contiguous extents may recover incompletely or out of order.
- Journaling Mode Variances: In “writeback” mode (metadata journaled, data written directly), file data in the journal may be truncated or absent.
Because of these caveats, extundelete’s success depends on acting quickly, ideally before the filesystem has had a chance to reallocate freed blocks. This powerful recovery capability clearly illustrates why relying on rm alone is insufficient for secure deletion on ext4.
4. How to Actually Delete Files with shred
To make file contents practically irrecoverable, you need to overwrite the data blocks themselves. The GNU Coreutils package provides the shred tool for this purpose.
4.1 Overview of shred
By default, shred overwrites a file multiple times with patterns designed to obscure the original data, then optionally truncates and removes the file. Key options include:
- -u or --remove: After overwriting, truncate and remove the file.
- -v or --verbose: Show progress on standard error.
- -z or --zero: After overwriting the file with random data, perform a final overwrite with zeros. This can help hide the fact that the file was shredded, since the storage space is left filled with zeros rather than random-looking data.
4.2 Example
To securely delete a_file:
shred -uvz a_file
This command does three things:
- It overwrites the file’s data blocks multiple times with random bytes. The goal is to make it much harder for forensic tools to recover any previous content by analyzing magnetic remnants on the disk surface.
- Then it performs a final overwrite pass with zeros. This step helps obscure the fact that the file was shredded at all. Without it, someone inspecting the disk might guess that random-looking blocks had been intentionally wiped.
- Finally, it truncates the file to zero bytes and deletes it from the directory. Truncation clears the content, while deletion removes the filename and inode reference, effectively unlinking it from the filesystem.
By default, shred uses three passes of random data, but you can increase this with -n.
After this process, the original bits have been wiped, making recovery via extundelete or low‑level block scanning virtually impossible.
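The overwrite-then-remove idea can be sketched in a few lines of Python. This is an illustration of the principle only, not a substitute for shred. The function names overwrite_in_place and shred_like are invented here, and the sketch assumes the filesystem writes the new bytes over the old blocks in place.

```python
import os
import tempfile


def overwrite_in_place(path, passes=3, final_zero=True):
    """Overwrite a file's bytes in place; returns the number of passes written."""
    size = os.path.getsize(path)
    patterns = [None] * passes + ([b"\x00"] if final_zero else [])
    written = 0
    with open(path, "r+b") as handle:            # r+b: update without truncating
        for pattern in patterns:
            handle.seek(0)
            remaining = size
            while remaining > 0:
                chunk = min(remaining, 64 * 1024)
                data = os.urandom(chunk) if pattern is None else pattern * chunk
                handle.write(data)
                remaining -= chunk
            handle.flush()
            os.fsync(handle.fileno())            # push this pass to the device
            written += 1
    return written


def shred_like(path, passes=3):
    """Overwrite, truncate, then unlink, loosely mirroring shred -uz."""
    overwrite_in_place(path, passes, final_zero=True)
    os.truncate(path, 0)
    os.unlink(path)


workdir = tempfile.mkdtemp()
demo_path = os.path.join(workdir, "a_file")
with open(demo_path, "wb") as handle:
    handle.write(b"secret " * 1000)

passes_written = overwrite_in_place(demo_path)
with open(demo_path, "rb") as handle:
    after_overwrite = handle.read()

shred_like(demo_path, passes=1)
file_removed = not os.path.exists(demo_path)
os.rmdir(workdir)

print("passes written:", passes_written)
print("file is all zeros after overwrite:", after_overwrite == b"\x00" * 7000)
print("file removed:", file_removed)
```

The key difference from rm is the order of operations: the blocks are rewritten while the file still owns them, and only then is the name unlinked.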
Info
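Note: shred assumes the filesystem overwrites data in place, which holds for ext4 in its default data=ordered mode. Metadata updates (such as the file name) may still leave traces in the journal, and in data=journal mode copies of file contents can linger there as well.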
5. A Glance at CoW Filesystems – When rm Does Destroy Data
Copy‑on‑write (CoW) filesystems like Btrfs and ZFS adopt a fundamentally different approach to data updates compared to traditional in‑place filesystems such as ext4. Instead of modifying blocks directly, CoW filesystems always allocate fresh blocks for every write, ensuring that the on‑disk state remains consistent and that snapshots can reference past versions without interference. This design has the added benefit that deleting a file truly makes its old data inaccessible, provided no snapshots refer to it.
5.1 CoW Principles
- Allocating New Blocks for Every Change:
Whenever you create, modify, or delete a file, the filesystem writes the new data (or metadata) to previously unused blocks.
The existing blocks remain untouched until the new write is fully committed.
- Atomic Metadata Switch:
Once the new blocks are safely written (and checksummed, in ZFS), the filesystem atomically updates the metadata pointers to reference the new blocks.
This atomic switch guarantees that even in the event of a crash, you’ll either see the old version or the new version, never an inconsistent mix.
- Freeing the Old Blocks:
After the metadata update, the old blocks are marked as free in the allocation map.
Unlike ext4’s deferred freeing (which simply marks pointers as unused), CoW frees are treated as new metadata operations and are themselves journaled or transactional.
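The three principles can be sketched with a tiny in-memory store. It is a conceptual model only, not how Btrfs or ZFS are implemented: the class ToyCowStore is hypothetical, and for simplicity it never reuses a block id.

```python
class ToyCowStore:
    """Minimal copy-on-write model: new block per write, then one pointer switch."""

    def __init__(self):
        self.blocks = {}   # block id -> bytes
        self.free = set()  # ids marked free (contents left untouched)
        self.root = {}     # live metadata: file name -> block id
        self.next_id = 0

    def _allocate(self, data):
        block_id = self.next_id
        self.next_id += 1
        self.blocks[block_id] = data
        return block_id

    def write(self, name, data):
        new_id = self._allocate(data)  # 1. new data goes to a fresh block
        new_root = dict(self.root)     # 2. updated metadata is built on the side
        new_root[name] = new_id
        old_id = self.root.get(name)
        self.root = new_root           # 3. single switch to the new metadata
        if old_id is not None:
            self.free.add(old_id)      # 4. old block is marked free afterwards
        return new_id

    def delete(self, name):
        new_root = dict(self.root)
        old_id = new_root.pop(name)
        self.root = new_root           # metadata no longer points at the block
        self.free.add(old_id)
        return old_id


store = ToyCowStore()
first = store.write("a_file", b"version 1")
second = store.write("a_file", b"version 2")
old_bytes = store.blocks[first]        # still present, just unreferenced
deleted = store.delete("a_file")

print("live name after delete:", "a_file" in store.root)
print("blocks marked free:", sorted(store.free))
print("bytes left in first block:", old_bytes)
```

In this model a reader always sees either the old root or the new one, which is the atomicity property described above.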
5.2 Why Deletion Is More Final on CoW
- No in-place overwrites: Since no block is ever modified in place, deleting a file simply switches metadata away from its blocks, without touching the old data. (rm triggers new metadata writes. Since nothing points to the old blocks anymore, recovery is far less likely.)
- Immediate reclamation: The freed blocks enter the free-space map, where new allocations can reclaim them.
- Data scrubbing and TRIM:
  - ZFS scrubs read allocated data, verify checksums, and repair damaged blocks; they do not touch free space. Freed blocks are overwritten only when later writes reuse them, or discarded when TRIM is used (zpool trim or the autotrim pool property).
  - Btrfs supports the discard operation (with mount -o discard or scheduled fstrim) to inform underlying storage that blocks are no longer needed, enabling SSDs to erase them at the flash level.
5.3 Handling Snapshots
CoW filesystems derive much of their power from snapshot capabilities. A snapshot captures a read‑only view of the filesystem at a particular point in time by retaining references to all blocks in use at that moment. Because snapshots hold references:
- Old Blocks Persist: Even after deleting a file, if any snapshot still references its blocks, those blocks cannot be freed.
- True Deletion Requires Snapshot Clean‑Up: You must destroy or expire all snapshots that include the file before its data blocks become reclaimable.
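The effect of snapshots on reclamation can be modeled as a reachability check: a block is reclaimable only when neither the live filesystem nor any snapshot references it. The class SnapshottingStore below is a hypothetical toy, not the Btrfs or ZFS implementation.

```python
class SnapshottingStore:
    """Toy model: blocks stay pinned while the live tree or a snapshot points at them."""

    def __init__(self):
        self.blocks = {}     # block id -> bytes
        self.live = {}       # file name -> block id
        self.snapshots = {}  # snapshot label -> frozen copy of the metadata
        self.next_id = 0

    def write(self, name, data):
        block_id = self.next_id
        self.next_id += 1
        self.blocks[block_id] = data
        self.live[name] = block_id
        return block_id

    def snapshot(self, label):
        self.snapshots[label] = dict(self.live)  # read-only view of current metadata

    def delete(self, name):
        del self.live[name]

    def destroy_snapshot(self, label):
        del self.snapshots[label]

    def reclaimable_blocks(self):
        referenced = set(self.live.values())
        for frozen in self.snapshots.values():
            referenced.update(frozen.values())
        return set(self.blocks) - referenced


store = SnapshottingStore()
block = store.write("a_file", b"sensitive")
store.snapshot("daily")
store.delete("a_file")
pinned_by_snapshot = store.reclaimable_blocks()  # snapshot still holds the block

store.destroy_snapshot("daily")
after_cleanup = store.reclaimable_blocks()       # now nothing references it

print("reclaimable while snapshot exists:", pinned_by_snapshot)
print("reclaimable after destroying snapshot:", after_cleanup)
```

Deleting the file changes nothing for the block until the last snapshot that references it is gone.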
5.4 Practical Implications
- Secure Deletion by Default: On a CoW filesystem without snapshots, rm effectively abandons the old data blocks, making it much harder for recovery tools to locate them.
- Snapshot Oversight: If you rely heavily on snapshots, you must manage their lifecycles to avoid unintentionally preserving deleted data.
- Performance Trade‑Offs: CoW writes can lead to fragmentation, so periodic defragmentation or rebalancing may be necessary.
- Use Cases: CoW filesystems are ideal for scenarios requiring atomic updates, point‑in‑time recovery, or built‑in versioning, such as virtualization hosts, backup servers, or large‑scale storage arrays.
Note
Key takeaway:
In contrast to ext4’s in‑place updates, where deleted data lingers until overwritten, CoW filesystems allocate new space for every change and free old blocks immediately (unless held by snapshots), making rm a genuinely destructive operation in the common case. Nonetheless, snapshot management remains critical to ensure that sensitive data is not inadvertently preserved.
6. Conclusion
- rm on ext4 is not secure deletion. It only unlinks directory entries and frees inodes; data remains on disk until overwritten.
- Data recovery is possible. Tools like extundelete can restore deleted files on ext4, demonstrating the need for caution.
- Use shred for secure erase. With multiple overwrite passes and zeroing, shred -uvz effectively destroys file contents.
- CoW filesystems behave differently. On Btrfs or ZFS, deletion via CoW genuinely frees old blocks without in-place updates, making data less accessible, but watch out for snapshots.
Selecting the right tool and filesystem depends on your requirements: performance, reliability, snapshotting, and security. When privacy or data sensitivity is a concern, never assume that rm equates to irrecoverable deletion. Instead, choose targeted secure‑erase utilities or filesystems whose design aligns with your security goals.