From deduplication to frequency analysis, learn how to turn raw terminal output into actionable insight.
Raw output lies to you.
A list of 3,000 IP addresses means nothing until you know which ones appear 500 times and which ones appear once. A credential dump is useless until you know which passwords are shared across accounts. Log files tell you nothing until events are ranked by frequency.
sort and uniq answer those questions. They do not find things and they do not extract things — they organize and count what you already have. In security work, that step is often where the actual insight lives.
This article covers both tools from scratch: every flag worth knowing and how they fit into real security pipelines.
Two Tools, One Workflow
sort — takes lines of input and puts them in order. Alphabetical, numerical, reverse, by field. The ordering itself is often the goal, but sort also sets up uniq to work correctly.
uniq — removes or counts duplicate lines. It only works on adjacent duplicates — consecutive lines that are identical. This is why sort almost always comes first.
They are separate tools that solve separate problems, but in practice they are almost always used together.
Part One — sort
What sort Does
sort reads lines from input and writes them back out in sorted order. The basic call:
bash
sort filename
Or pass output from another command:
bash
some_command | sort
By default, sort orders lines lexicographically — the same way a dictionary orders words. It compares character by character from left to right.
Input:
banana
apple
cherry
date
Output:
apple
banana
cherry
date
That is the default. The flags are where the real control comes in.
The Core Flags
-n — Sort Numerically
Default sort is lexicographic, which means numbers sort as strings. 10 comes before 2 because 1 comes before 2 in character order.
bash
sort numbers.txt
Input:
30
5
10
2
Lexicographic output:
10
2
30
5
That is wrong for numbers. -n fixes it:
bash
sort -n numbers.txt
Output:
2
5
10
30
When to use it: Any time you are sorting counts, port numbers, sizes, UIDs, or anything numeric. If the values are numbers, always use -n.
-r — Reverse the Sort Order
Reverses whatever order sort would normally produce.
bash
sort -r names.txt
Alphabetical becomes reverse alphabetical. Numeric ascending becomes descending.
bash
sort -rn numbers.txt
Combines -r and -n — numeric sort, highest first. This is the pattern you will use constantly: rank by frequency, highest count at the top.
-u — Sort and Remove Duplicates
-u tells sort to output only unique lines — the first occurrence of each value, duplicates discarded.
bash
sort -u ips.txt
Sorts the list and removes any duplicate IP addresses in one step. This is a shortcut for sort | uniq when you only need deduplication and not counts.
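As a quick check of that equivalence, the sketch below runs both forms on a small made-up list and compares the results. The addresses are sample data from the documentation ranges, not taken from any real log.

```bash
# Sample data: a few documentation-range IPs with repeats (illustrative only).
ips='203.0.113.7
198.51.100.23
203.0.113.7
192.0.2.10
198.51.100.23'

one_step=$(printf '%s\n' "$ips" | sort -u)       # sort and deduplicate in one command
two_step=$(printf '%s\n' "$ips" | sort | uniq)   # sort, then drop adjacent duplicates

printf '%s\n' "$one_step"
[ "$one_step" = "$two_step" ] && echo "both forms produced the same list"
```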
-f — Case-Insensitive Sort
Treats uppercase and lowercase as equivalent when sorting.
bash
sort -f names.txt
Admin, admin, and ADMIN sort to the same position. Without -f, the result depends on the locale: in the C locale, all uppercase letters sort before all lowercase letters (plain ASCII order), which can give counterintuitive results.
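A minimal sketch of the difference, using made-up names. LC_ALL=C is set here as an assumption so that plain byte order applies and the result does not depend on the system locale.

```bash
# Sample names (made up).
names='bob
Alice
admin
Zed'

# Byte order: every uppercase letter sorts before every lowercase letter.
byte_order=$(printf '%s\n' "$names" | LC_ALL=C sort)
# Case folded: letters are compared as if they were all the same case.
folded=$(printf '%s\n' "$names" | LC_ALL=C sort -f)

printf 'plain sort:\n%s\n\n' "$byte_order"   # Alice, Zed, admin, bob
printf 'sort -f:\n%s\n' "$folded"            # admin, Alice, bob, Zed
```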
-k — Sort by a Specific Field
By default sort uses the entire line. -k tells it to sort by a specific column.
bash
sort -k2 data.txt
Sorts by the second whitespace-separated field.
bash
sort -k2 -n data.txt
Sorts by the second field, numerically.
The field syntax is worth knowing precisely. -k2,2 means “start at field 2, end at field 2”, so only that field is compared. -k2 without an end position means “start at field 2, continue to the end of the line”, so every later field also influences the order, which can produce unexpected results. For a single-field sort, always use -k n,n. Type modifiers can be attached to the key itself: -k2,2n sorts on field 2 only, numerically.
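The difference between an open-ended key and a single-field key is easiest to see on two lines that tie on field 2. This is a sketch on made-up data, with LC_ALL=C assumed so the comparison is plain byte order.

```bash
# Made-up three-column data: name, group, score.
data='a x 2
b x 1'

# -k2: the key runs from field 2 to the end of the line, so the third
# column takes part in the comparison ("x 1" sorts before "x 2").
open_ended=$(printf '%s\n' "$data" | LC_ALL=C sort -k2)

# -k2,2: the key is field 2 only. Both keys are "x", so sort falls back
# to comparing the whole line and "a ..." comes before "b ...".
single_field=$(printf '%s\n' "$data" | LC_ALL=C sort -k2,2)

printf 'with -k2:\n%s\n\n' "$open_ended"
printf 'with -k2,2:\n%s\n' "$single_field"
```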
-t — Define the Field Separator for -k
By default sort splits fields on whitespace. -t changes the separator so -k works on delimited data.
bash
sort -t':' -k3,3n /etc/passwd
Splits on :, sorts on field 3 only (the UID), numerically. Shows you users ordered by UID from lowest to highest.
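To try this without touching the real file, the sketch below uses a few made-up passwd-style entries. Actual accounts and UIDs on a given system will differ.

```bash
# Made-up passwd-style entries (illustrative only).
passwd_sample='root:x:0:0:root:/root:/bin/bash
www-data:x:33:33::/var/www:/usr/sbin/nologin
alice:x:1000:1000::/home/alice:/bin/bash
daemon:x:1:1::/usr/sbin:/usr/sbin/nologin'

# Sort on field 3 only, numerically, then keep just the name and the UID.
by_uid=$(printf '%s\n' "$passwd_sample" | sort -t':' -k3,3n | cut -d':' -f1,3)
printf '%s\n' "$by_uid"
```

A plain sort on the same field would place 1000 ahead of 33, which is why the n modifier matters here.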
-h — Human-Readable Sort
Sorts values that include size suffixes — K, M, G — correctly.
bash
du -sh * | sort -h
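Without -h, the suffix is ignored or misread. A plain sort puts 10M before 9K because 1 comes before 9, and sort -n puts 2G before 10M because it only sees 2 and 10. With -h, sort takes the suffix into account and orders them 9K, 10M, 2G.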
When to use it: Sorting file sizes, disk usage output, anything with human-readable size units.
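A short comparison of the three modes on made-up size values. It assumes a sort that supports -h, such as GNU sort, and sets LC_ALL=C so the plain sort uses byte order.

```bash
# Made-up size values.
sizes='10M
2G
9K'

plain=$(printf '%s\n' "$sizes" | LC_ALL=C sort)       # character order
numeric=$(printf '%s\n' "$sizes" | LC_ALL=C sort -n)  # leading number only, suffix ignored
human=$(printf '%s\n' "$sizes" | LC_ALL=C sort -h)    # number and suffix together

printf 'sort:    %s\n' "$(printf '%s' "$plain" | tr '\n' ' ')"    # 10M 2G 9K
printf 'sort -n: %s\n' "$(printf '%s' "$numeric" | tr '\n' ' ')"  # 2G 9K 10M
printf 'sort -h: %s\n' "$(printf '%s' "$human" | tr '\n' ' ')"    # 9K 10M 2G
```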
-R — Randomize Order
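Puts lines into random order. Not often needed in analysis, but useful for sampling a large dataset or randomizing a wordlist. With GNU sort, identical lines still end up next to each other, because -R sorts by a random hash of each line. For a true shuffle, use shuf.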
bash
sort -R wordlist.txt
sort in Security Workflows
Sort a List of IPs Numerically
bash
sort -t'.' -k1,1n -k2,2n -k3,3n -k4,4n ips.txt
Sorts IP addresses numerically by each octet. Each field is separated by . and sorted as a number. The result is a properly ordered IP list — not the lexicographic mess you get from a plain sort.
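The contrast with a plain sort can be sketched on a handful of made-up addresses. LC_ALL=C is assumed for the plain sort so that it uses byte order.

```bash
# Made-up addresses (illustrative only).
ips='192.168.1.1
10.0.0.100
9.9.9.9
10.0.0.12
10.0.0.5'

# Character order: 10.0.0.100 lands before 10.0.0.5, and 9.9.9.9 comes last.
plain=$(printf '%s\n' "$ips" | LC_ALL=C sort)

# One numeric key per octet, compared left to right.
by_octet=$(printf '%s\n' "$ips" | sort -t'.' -k1,1n -k2,2n -k3,3n -k4,4n)

printf 'plain sort:\n%s\n\n' "$plain"
printf 'per-octet numeric sort:\n%s\n' "$by_octet"
```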
Sort Nmap Output by Port Number
bash
grep "open" nmap.txt | cut -d'/' -f1 | sort -n
grep filters open ports. cut extracts the port number. sort -n orders them numerically lowest to highest.
Sort Files by Size
bash
ls -lh | sort -k5,5h
List files with human-readable sizes, then sort on the size column only (field 5) using human-readable sort.
Order Discovered Paths Alphabetically
bash
grep "Status: 200" gobuster.txt | cut -d' ' -f1 | sort
Clean alphabetical list of discovered paths. Easier to read and identify patterns than unsorted output.
Part Two — uniq
What uniq Does
uniq filters adjacent duplicate lines from input. It compares each line to the one immediately before it — if they are identical, the duplicate is removed or counted depending on the flags you use.
The critical point: uniq only acts on adjacent duplicates. Lines must be consecutive to be compared. This is why sort almost always comes first — it groups identical lines together so uniq can process them reliably.
bash
sort data.txt | uniq
This is the standard pattern. sort groups duplicates together, uniq removes them.
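To see why the sort step matters, the sketch below feeds the same made-up event list to uniq with and without sorting first.

```bash
# Made-up event names; the two "login" lines are not adjacent.
events='login
logout
login'

# Without sort: no two neighbouring lines match, so nothing is removed.
unsorted=$(printf '%s\n' "$events" | uniq)

# With sort: the two "login" lines become neighbours and collapse into one.
sorted=$(printf '%s\n' "$events" | sort | uniq)

printf 'uniq alone:\n%s\n\n' "$unsorted"
printf 'sort | uniq:\n%s\n' "$sorted"
```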
The Core Flags
No Flag — Remove Duplicates
bash
uniq filename
Removes consecutive duplicate lines. Each unique line appears once.
Input (already sorted):
apple
apple
banana
cherry
cherry
cherry
Output:
apple
banana
cherry
-c — Count Occurrences
This is the flag you will use most. -c prepends a count to each line — how many times that line appeared in the input.
bash
sort data.txt | uniq -c
Output:
2 apple
1 banana
3 cherry
The count is on the left, the value on the right. Combined with sort -rn, this gives you a frequency-ranked list — one of the most useful patterns in log analysis and output parsing.
bash
sort data.txt | uniq -c | sort -rn
Output:
3 cherry
2 apple
1 banana
Highest count first. This three-command pipeline appears constantly in security work.
-d — Show Only Duplicate Lines
Prints only lines that appear more than once — one copy of each duplicate.
bash
sort usernames.txt | uniq -d
Shows which usernames appear multiple times in the list — useful for finding repeated entries in a data dump or identifying reused credentials.
-u — Show Only Unique Lines
The opposite of -d. Prints only lines that appear exactly once — entries with no duplicates anywhere in the input.
bash
sort usernames.txt | uniq -u
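To see -d and -u side by side, here is a minimal sketch on a made-up username list. Between them, the two flags split the distinct values into those that repeat and those that do not.

```bash
# Made-up usernames: alice appears three times, bob twice, carol once.
users='alice
bob
alice
carol
bob
alice'

repeated=$(printf '%s\n' "$users" | sort | uniq -d)  # one copy of each repeated value
singles=$(printf '%s\n' "$users" | sort | uniq -u)   # values that occur exactly once

printf 'uniq -d:\n%s\n\n' "$repeated"   # alice, bob
printf 'uniq -u:\n%s\n' "$singles"      # carol
```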
-i — Case-Insensitive Comparison
Treats lines as duplicates even if they differ only in case.
bash
sort -f usernames.txt | uniq -i
Admin, admin, and ADMIN are treated as the same line. One copy survives.
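The pairing with sort -f is what makes this reliable. In the sketch below (made-up names, LC_ALL=C assumed so sort uses byte order), a plain sort leaves Bob between Admin and admin, so uniq -i never sees them as neighbours.

```bash
# Made-up names.
names='admin
Bob
Admin'

# Byte-order sort gives Admin, Bob, admin: the two admin spellings are not adjacent.
without_f=$(printf '%s\n' "$names" | LC_ALL=C sort | uniq -i)

# Case-folded sort gives Admin, admin, Bob: the spellings are adjacent and collapse.
with_f=$(printf '%s\n' "$names" | LC_ALL=C sort -f | uniq -i)

printf 'sort | uniq -i:\n%s\n\n' "$without_f"
printf 'sort -f | uniq -i:\n%s\n' "$with_f"
```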
-f n — Skip First n Fields
Ignores the first n whitespace-separated fields when comparing lines for duplicates.
bash
uniq -f 1 data.txt
Compares lines starting from field 2 onward. Useful when lines have a timestamp or sequence number in field 1 that you want to ignore during deduplication.
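A small sketch with a made-up log whose first field is a sequence number. Note that the adjacency rule still applies: the last line is kept because the line before it differs.

```bash
# Made-up log lines: sequence number, then the message.
log='1 login failed
2 login failed
3 login ok
4 login failed'

# Ignore field 1 when comparing; the first line of each run is the one kept.
deduped=$(printf '%s\n' "$log" | uniq -f 1)
printf '%s\n' "$deduped"
```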
-s n — Skip First n Characters
Ignores the first n characters of each line when comparing.
bash
uniq -s 8 logfile.txt
Skips the first 8 characters (often a timestamp prefix) and compares the rest of each line.
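For illustration, assume a made-up log where every line starts with an 8-character HH:MM:SS timestamp. The prefix width is an assumption of the sample, so count the characters in your own format before choosing the number.

```bash
# Made-up log lines with an 8-character time prefix.
log='12:00:01 disk full
12:00:02 disk full
12:00:09 disk ok'

# Skip the 8 timestamp characters; the rest of the line decides equality.
deduped=$(printf '%s\n' "$log" | uniq -s 8)
printf '%s\n' "$deduped"
```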
uniq in Security Workflows
Count and Rank IP Addresses in a Log
bash
grep -oE "\b([0-9]{1,3}\.){3}[0-9]{1,3}\b" access.log | sort | uniq -c | sort -rn
grep extracts every IP. sort groups identical IPs together. uniq -c counts each one. sort -rn ranks highest first.
Output shows you exactly which IPs hit the server most — your top talkers, potential scanners, or brute-force sources at a glance.
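The pipeline can be tried on a few made-up access log lines before pointing it at a real file. This sketch assumes a grep that understands \b in extended patterns, as GNU grep does. The log content is invented for the example.

```bash
# Made-up Combined Log Format lines (illustrative only).
access_log='203.0.113.7 - - [10/Oct/2024:13:55:36 +0000] "GET / HTTP/1.1" 200 512
198.51.100.23 - - [10/Oct/2024:13:55:37 +0000] "GET /login HTTP/1.1" 404 128
203.0.113.7 - - [10/Oct/2024:13:55:38 +0000] "POST /login HTTP/1.1" 403 64'

# Extract dotted quads, group them, count them, rank them.
ranked=$(printf '%s\n' "$access_log" \
  | grep -oE '\b([0-9]{1,3}\.){3}[0-9]{1,3}\b' \
  | sort | uniq -c | sort -rn)
printf '%s\n' "$ranked"

# The first line of the ranking is the busiest source.
top_ip=$(printf '%s\n' "$ranked" | head -n 1 | awk '{print $2}')
echo "top talker: $top_ip"
```

The pattern matches anything shaped like four dot-separated numbers, so it does not validate that each octet is 255 or less.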
Find Duplicate Usernames in a Dump
bash
sort usernames.txt | uniq -d
Any username appearing more than once surfaces immediately.
Count Unique vs Total
bash
sort ips.txt | uniq | wc -l
How many unique IPs are in this list? wc -l counts the lines after deduplication. Compare it with the total from wc -l < ips.txt to see how much repetition the list contains.
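The unique-versus-total comparison in the heading can be sketched as follows, using a made-up list and sort -u as the deduplication step.

```bash
# Made-up list: five entries, three distinct addresses.
ips='203.0.113.7
198.51.100.23
203.0.113.7
192.0.2.10
198.51.100.23'

# tr strips the padding some wc implementations add around the number.
total=$(printf '%s\n' "$ips" | wc -l | tr -d ' ')
unique=$(printf '%s\n' "$ips" | sort -u | wc -l | tr -d ' ')

echo "total=$total unique=$unique repeats=$((total - unique))"
```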
Frequency Analysis on HTTP Status Codes
bash
cut -d' ' -f9 access.log | sort | uniq -c | sort -rn
In the Combined Log Format used by Apache and Nginx, the HTTP status code sits at field 9 when the line is space-delimited. Log formats vary — verify your field position against a sample line before relying on the number. sort groups the codes. uniq -c counts each. sort -rn ranks by frequency.
Output:
8423 200
1204 404
341 403
89 500
12 301
The traffic distribution, error rate, and whether something is hammering your 403s or 500s — visible in seconds.
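Since the field number depends on the log format, one way to check it is to print a sample line with each space-separated field numbered. The sketch below uses a made-up Combined Log Format line. Against a real log, the input would come from head -n 1 access.log instead.

```bash
# Made-up sample line (illustrative only).
line='203.0.113.7 - - [10/Oct/2024:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 512 "-" "curl/8.5.0"'

# Put each space-separated field on its own line and number it.
numbered=$(printf '%s\n' "$line" | tr ' ' '\n' | awk '{print NR ": " $0}')
printf '%s\n' "$numbered"

# Field 9 is the status code for this sample.
status=$(printf '%s\n' "$line" | cut -d' ' -f9)
echo "status field: $status"
```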
Find Unique Ports Across Multiple Scans
bash
cat scan1.txt scan2.txt scan3.txt | grep "open" | cut -d'/' -f1 | sort -n | uniq
Combine output from multiple nmap runs, extract open ports, sort numerically, deduplicate. One clean list of every unique open port found across all scans.
Detect Password Reuse in a Credential Dump
bash
cut -d':' -f2 creds.txt | sort | uniq -d
cut extracts the password field. sort groups identical passwords. uniq -d shows only duplicates — passwords used by more than one account.
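A sketch on a made-up user:password list. One deliberate difference from the command above: it uses -f2- rather than -f2, on the assumption that everything after the first colon is the password, so a password containing a colon is not cut short. If your dump has more fields after the password, keep -f2.

```bash
# Made-up credentials (illustrative only). dave's password contains a colon.
creds='alice:Summer2024
bob:hunter2
carol:Summer2024
dave:p:ss'

# Everything after the first colon is treated as the password.
reused=$(printf '%s\n' "$creds" | cut -d':' -f2- | sort | uniq -d)
printf 'reused passwords:\n%s\n' "$reused"
```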
Rank User-Agent Strings from Web Logs
bash
cut -d'"' -f6 access.log | sort | uniq -c | sort -rn | head -20
In Combined Log Format, splitting on double-quotes puts the User-Agent string at field 6. This assumes the format has not been customized — check a sample line if results look wrong. sort + uniq -c counts each distinct User-Agent. sort -rn ranks by frequency. head -20 shows the top 20.
Rare or unusual User-Agent strings often indicate scanners, custom tooling, or automated clients worth investigating. They sit at the bottom of the ranking, so replace head -20 with tail -20 to see them.
Part Three — sort and uniq Together
The flags make sense individually. The power comes from chaining them.
The Core Pipeline
bash
sort | uniq -c | sort -rn
This three-step pipeline is the foundation of frequency analysis in the terminal. You will use it constantly.
- sort groups identical lines together
- uniq -c counts each group
- sort -rn ranks by count, highest first
Everything else is just feeding different data into this pipeline.
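Because the pipeline never changes, it can be wrapped in a small shell function and reused. The name freq is a hypothetical choice for this sketch, not a standard command, and the status codes are made up.

```bash
# Hypothetical helper: reads lines on stdin, prints them ranked by frequency.
freq() {
  sort | uniq -c | sort -rn
}

# Made-up status codes fed through the helper.
ranked=$(printf '%s\n' 404 200 200 500 200 404 | freq)
printf '%s\n' "$ranked"

# The value on the first line is the most frequent one.
top=$(printf '%s\n' "$ranked" | head -n 1 | awk '{print $2}')
echo "most frequent: $top"
```

Any command that prints one value per line can be piped into it, for example cut -d' ' -f9 access.log | freq.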
Top Attacking IPs
bash
grep "Failed password" /var/log/auth.log | grep -oP "(?<=from )\S+" | sort | uniq -c | sort -rn | head -10
Step by step:
- grep finds failed SSH login lines
- grep -oP extracts only the source IP using a lookbehind
- sort groups identical IPs together
- uniq -c counts each IP
- sort -rn ranks highest first
- head -10 shows only the top 10
One pipeline. Immediate visibility into your top brute-force sources.
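grep -P is not available in every grep build. As a hedged alternative, the sketch below does the extraction step with awk instead, printing the word that follows the last "from" on each line. The log lines are made up, and real sshd messages may be worded differently on your system.

```bash
# Made-up auth.log lines (illustrative only).
auth_log='Jan 10 03:12:01 host sshd[1001]: Failed password for root from 203.0.113.7 port 50022 ssh2
Jan 10 03:12:03 host sshd[1001]: Failed password for invalid user admin from 203.0.113.7 port 50023 ssh2
Jan 10 03:12:09 host sshd[1002]: Failed password for root from 198.51.100.23 port 41000 ssh2
Jan 10 03:13:00 host sshd[1003]: Accepted password for alice from 192.0.2.10 port 51000 ssh2'

# Keep failed logins, pull the field after the last "from", then group, count, rank.
ranked=$(printf '%s\n' "$auth_log" \
  | grep "Failed password" \
  | awk '{ ip = ""; for (i = 1; i < NF; i++) if ($i == "from") ip = $(i + 1); if (ip != "") print ip }' \
  | sort | uniq -c | sort -rn | head -n 10)
printf '%s\n' "$ranked"

top_source=$(printf '%s\n' "$ranked" | head -n 1 | awk '{print $2}')
echo "top source: $top_source"
```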
Deduplicate a Wordlist
bash
sort wordlist.txt | uniq > clean_wordlist.txt
Sorts the list, removes duplicates, and writes to a new file. Smaller and cleaner for the next tool.
Compare Two Lists — What Is in One but Not the Other
bash
sort list1.txt list2.txt | uniq -u
When you combine both files and sort them, a value present in both lists appears twice, so uniq -u filters it out. A value present in only one list appears once, so uniq -u keeps it. The result is the symmetric difference: everything that is in exactly one of the two lists, with no indication of which list it came from.
The caveat: this works only when each value appears exactly once in each file. If a value is repeated within a single file, it is dropped even when the other list does not contain it. When the lists may have internal duplicates, or when you need to know which side a value came from, use comm instead. It is purpose-built for comparing sorted files.
bash
comm -23 <(sort -u list1.txt) <(sort -u list2.txt)
comm -23 prints only lines unique to the first file. -13 gives lines unique to the second. -12 gives lines in both. The <( ) process substitution requires bash or zsh, and sort -u removes internal duplicates so each value is compared once.
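The sketch below shows the caveat on made-up lists where bob appears twice in the first file. It writes sorted copies to a temporary directory instead of using process substitution, so it also runs in a plain POSIX shell. The file names inside that directory are arbitrary.

```bash
export LC_ALL=C                 # sort and comm must agree on the ordering
work=$(mktemp -d)
printf '%s\n' alice bob bob dave > "$work/list1.txt"
printf '%s\n' dave erin > "$work/list2.txt"

# uniq -u on the combined lists: bob vanishes because list1 repeats it.
naive=$(sort "$work/list1.txt" "$work/list2.txt" | uniq -u)

# comm on sorted, deduplicated copies reports each side separately.
sort -u "$work/list1.txt" > "$work/a.sorted"
sort -u "$work/list2.txt" > "$work/b.sorted"
only_in_1=$(comm -23 "$work/a.sorted" "$work/b.sorted")
only_in_2=$(comm -13 "$work/a.sorted" "$work/b.sorted")
in_both=$(comm -12 "$work/a.sorted" "$work/b.sorted")
rm -rf "$work"

printf 'uniq -u:      %s\n' "$(printf '%s' "$naive" | tr '\n' ' ')"      # alice erin
printf 'only in 1:    %s\n' "$(printf '%s' "$only_in_1" | tr '\n' ' ')"  # alice bob
printf 'only in 2:    %s\n' "$only_in_2"                                 # erin
printf 'in both:      %s\n' "$in_both"                                   # dave
```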
Count Unique Values in a Specific Field
bash
cut -d':' -f1 /etc/passwd | sort | uniq -c | sort -rn
Extracts usernames, counts each one. In /etc/passwd every username should appear once — if any show up with a count greater than 1, something is wrong.
Rank Error Types in Application Logs
bash
grep "ERROR" app.log | cut -d' ' -f5- | sort | uniq -c | sort -rn | head -15
grep filters to error lines. cut extracts everything from field 5 onward — adjust the field number to match your log format, since application log structures vary. sort + uniq -c counts each distinct error message. sort -rn ranks by frequency. head -15 shows the most common ones.
Immediately separates systemic errors from one-off events.
Quick Reference
sort
| Flag | What It Does |
|---|---|
| -n | Sort numerically |
| -r | Reverse sort order |
| -u | Sort and remove duplicates |
| -f | Case-insensitive sort |
| -k n | Sort by field n |
| -k n,n | Sort by field n only |
| -t 'x' | Use x as field separator for -k |
| -h | Sort by human-readable size (K, M, G) |
| -R | Randomize order |
| -rn | Numeric sort, highest first (common combo) |
uniq
| Flag | What It Does |
|---|---|
| (no flag) | Remove consecutive duplicate lines |
| -c | Prefix each line with its occurrence count |
| -d | Show only lines that appear more than once |
| -u | Show only lines that appear exactly once |
| -i | Case-insensitive comparison |
| -f n | Skip first n fields when comparing |
| -s n | Skip first n characters when comparing |
The Core Pipeline
bash
sort input | uniq -c | sort -rn
Group → Count → Rank. Use this for any frequency analysis task.
Closing
sort and uniq do not find vulnerabilities. They do not generate payloads. What they do is take the raw output of every other tool you run and make it readable and actionable.
Frequency analysis is one of the most underrated skills in security work. Knowing which IP sent 3,000 requests in ten minutes, which password is shared across 40 accounts, which error type fires hundreds of times per hour — that is the signal buried in the noise. One pipeline pulls it out.
bash
sort | uniq -c | sort -rn
That is the whole idea.