Sort and Uniq — How to Turn Noise Into Signal


From deduplication to frequency analysis, learn how to turn raw terminal output into actionable insight.


Raw output lies to you.

A list of 3,000 IP addresses means nothing until you know which ones appear 500 times and which ones appear once. A credential dump is useless until you know which passwords are shared across accounts. Log files tell you nothing until events are ranked by frequency.

sort and uniq answer those questions. They do not find things and they do not extract things — they organize and count what you already have. In security work, that step is often where the actual insight lives.

This article covers both tools from scratch: every flag worth knowing and how they fit into real security pipelines.


Two Tools, One Workflow

sort — takes lines of input and puts them in order. Alphabetical, numerical, reverse, by field. The ordering itself is often the goal, but sort also sets up uniq to work correctly.

uniq — removes or counts duplicate lines. It only works on adjacent duplicates — consecutive lines that are identical. This is why sort almost always comes first.

They are separate tools that solve separate problems, but in practice they are almost always used together.


Part One — sort


What sort Does

sort reads lines from input and writes them back out in sorted order. The basic call:

bash

sort filename

Or pass output from another command:

bash

some_command | sort

By default, sort orders lines lexicographically — the same way a dictionary orders words. It compares character by character from left to right.

Input:

banana
apple
cherry
date

Output:

apple
banana
cherry
date

That is the default. The flags are where the real control comes in.


The Core Flags


-n — Sort Numerically

Default sort is lexicographic, which means numbers sort as strings. 10 comes before 2 because 1 comes before 2 in character order.

bash

sort numbers.txt

Input:

10
2
30
5

Lexicographic output:

10
2
30
5

That is wrong for numbers: 10 still sits ahead of 2, and 30 ahead of 5. -n fixes it:

bash

sort -n numbers.txt

Output:

2
5
10
30

When to use it: Any time you are sorting counts, port numbers, sizes, UIDs, or anything numeric. If the values are numbers, always use -n.


-r — Reverse the Sort Order

Reverses whatever order sort would normally produce.

bash

sort -r names.txt

Alphabetical becomes reverse alphabetical. Numeric ascending becomes descending.

bash

sort -rn numbers.txt

Combines -r and -n — numeric sort, highest first. This is the pattern you will use constantly: rank by frequency, highest count at the top.


-u — Sort and Remove Duplicates

-u tells sort to output only unique lines — the first occurrence of each value, duplicates discarded.

bash

sort -u ips.txt

Sorts the list and removes any duplicate IP addresses in one step. This is a shortcut for sort | uniq when you only need deduplication and not counts.


-f — Case-Insensitive Sort

Treats uppercase and lowercase as equivalent when sorting.

bash

sort -f names.txt

Admin, admin, and ADMIN sort next to each other. Without -f, the result depends on your locale: in the C (plain ASCII) locale every uppercase letter sorts before every lowercase one, which can give counterintuitive results.
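
A minimal sketch of the difference, using made-up names. LC_ALL=C is set here only as an assumption to make the byte-order behaviour reproducible; under many UTF-8 locales a plain sort already interleaves cases.

```shell
# Sample names are hypothetical. LC_ALL=C forces plain byte (ASCII) ordering.
names=$(printf 'bob\nAlice\nalice\nBob\n')

# Byte order: every uppercase letter sorts before every lowercase letter.
plain=$(printf '%s\n' "$names" | LC_ALL=C sort)

# -f folds lowercase to uppercase for comparison, so case variants sit together.
folded=$(printf '%s\n' "$names" | LC_ALL=C sort -f)

printf 'plain:\n%s\n\nfolded:\n%s\n' "$plain" "$folded"
```

The plain run lists Alice and Bob before alice and bob. The folded run pairs Alice with alice and Bob with bob.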


-k — Sort by a Specific Field

By default sort uses the entire line. -k tells it to sort by a specific column.

bash

sort -k2 data.txt

Sorts by the second whitespace-separated field.

bash

sort -k2 -n data.txt

Sorts by the second field, numerically.

The field syntax is worth knowing precisely. -k2,2 means “start at field 2, end at field 2” — sort only on that field. -k2 without an end position means “start at field 2, continue to end of line,” which can produce unexpected results on lines with trailing fields. For a single-field sort, always use -k n,n.
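
The difference is easiest to see on two lines that tie on field 2. A minimal sketch with made-up data; LC_ALL=C is only there so the byte-order comparison is reproducible.

```shell
# Hypothetical three-column data: name, count, tag.
data=$(printf 'x 10 b\ny 10 a\n')

# -k2 : the key runs from field 2 to the end of the line,
# so the tag column (b vs a) decides the order.
open_key=$(printf '%s\n' "$data" | LC_ALL=C sort -k2)

# -k2,2 : the key is field 2 only. The keys tie, and sort falls back
# to comparing the whole line, so x stays ahead of y.
closed_key=$(printf '%s\n' "$data" | LC_ALL=C sort -k2,2)

printf 'with -k2:\n%s\n\nwith -k2,2:\n%s\n' "$open_key" "$closed_key"
```

With -k2 the y line comes first because its trailing field sorts lower. With -k2,2 the trailing field plays no part in the key.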


-t — Define the Field Separator for -k

By default sort splits fields on whitespace. -t changes the separator so -k works on delimited data.

bash

sort -t':' -k3,3 -n /etc/passwd

Split on :, sort by field 3 (the UID), numerically. Shows you users ordered by UID from lowest to highest.
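
To try this without touching a real system file, here is a sketch against a made-up passwd-style sample. The accounts are hypothetical, and the key is limited to field 3 with -k3,3n.

```shell
# Hypothetical sample in passwd layout (name:password:UID:GID:comment:home:shell).
sample=$(mktemp)
cat > "$sample" <<'EOF'
alice:x:1001:1001::/home/alice:/bin/bash
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
svc:x:0:0:backup:/root:/bin/sh
EOF

# Split on ':', sort numerically on field 3 only, then keep name and UID.
by_uid=$(sort -t':' -k3,3n "$sample" | cut -d':' -f1,3)

printf '%s\n' "$by_uid"
rm -f "$sample"
```

A second account with UID 0, like the invented svc entry here, is exactly the kind of thing this ordering pushes to the top of the list.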


-h — Human-Readable Sort

Sorts values that include size suffixes — K, M, G — correctly.

bash

du -sh * | sort -h

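Without -h, a plain sort puts 10G before 2K because the character 1 comes before 2, and sort -n puts 2G before 10M because it reads the number and ignores the suffix. With -h, the suffix counts: 2K sorts before 10M, and 10M before 2G.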

When to use it: Sorting file sizes, disk usage output, anything with human-readable size units.
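
A short comparison of -n and -h on the same input. The sizes and names below are invented in the style of du -sh output, and -h is assumed to be available (it is in GNU sort).

```shell
# Made-up sizes in the style of `du -sh` output: size, tab, name.
sizes=$(printf '10M\tlogs\n2G\tbackups\n512K\tconfig\n')

# -n reads only the leading number and ignores the suffix: 2 < 10 < 512.
numeric=$(printf '%s\n' "$sizes" | sort -n | cut -f1)

# -h takes the suffix into account: 512K < 10M < 2G.
human=$(printf '%s\n' "$sizes" | sort -h | cut -f1)

printf 'sort -n:\n%s\n\nsort -h:\n%s\n' "$numeric" "$human"
```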


-R — Randomize Order

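Shuffles lines into random order. GNU sort does this by hashing each line, so identical lines still end up next to each other; for a true shuffle of a list that contains duplicates, use shuf instead. Not often needed in analysis, but useful for sampling a large dataset or randomizing a wordlist.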

bash

sort -R wordlist.txt

sort in Security Workflows


Sort a List of IPs Numerically

bash

sort -t'.' -k1,1n -k2,2n -k3,3n -k4,4n ips.txt

Sorts IP addresses numerically by each octet. Each field is separated by . and sorted as a number. The result is a properly ordered IP list — not the lexicographic mess you get from a plain sort.
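
A quick side-by-side on four made-up addresses. LC_ALL=C on the plain sort is an assumption added only to make the lexicographic result reproducible.

```shell
# Hypothetical address list.
ips=$(printf '192.168.1.10\n10.0.0.12\n9.1.1.1\n10.0.0.5\n')

# Plain sort compares character by character, so 10.0.0.12 lands before 10.0.0.5
# and 9.1.1.1 lands last.
lexical=$(printf '%s\n' "$ips" | LC_ALL=C sort)

# One numeric key per octet gives the order a human expects.
by_octet=$(printf '%s\n' "$ips" | sort -t'.' -k1,1n -k2,2n -k3,3n -k4,4n)

printf 'plain sort:\n%s\n\nper-octet sort:\n%s\n' "$lexical" "$by_octet"
```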


Sort Nmap Output by Port Number

bash

grep "open" nmap.txt | cut -d'/' -f1 | sort -n

grep filters open ports. cut extracts the port number. sort -n orders them numerically lowest to highest.
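
The same pipeline run against a small invented excerpt. The sample only imitates the shape of nmap's port table and is not real scan output.

```shell
# Hypothetical excerpt shaped like nmap's normal-output port table.
scan=$(mktemp)
cat > "$scan" <<'EOF'
PORT     STATE  SERVICE
443/tcp  open   https
22/tcp   open   ssh
8080/tcp closed http-proxy
80/tcp   open   http
EOF

# Keep open ports, cut everything after the slash, order numerically.
open_ports=$(grep "open" "$scan" | cut -d'/' -f1 | sort -n)

printf '%s\n' "$open_ports"
rm -f "$scan"
```

The closed port and the header line drop out at the grep step, leaving 22, 80, and 443 in numeric order.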


Sort Files by Size

bash

ls -lh | sort -k5,5h

List files with human-readable sizes, then sort by the size column (field 5) using human-readable sort.


Order Discovered Paths Alphabetically

bash

grep "Status: 200" gobuster.txt | cut -d' ' -f1 | sort

Clean alphabetical list of discovered paths. Easier to read and identify patterns than unsorted output.


Part Two — uniq


What uniq Does

uniq filters adjacent duplicate lines from input. It compares each line to the one immediately before it — if they are identical, the duplicate is removed or counted depending on the flags you use.

The critical point: uniq only acts on adjacent duplicates. Lines must be consecutive to be compared. This is why sort almost always comes first — it groups identical lines together so uniq can process them reliably.

bash

sort data.txt | uniq

This is the standard pattern. sort groups duplicates together, uniq removes them.
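
To see why the sort step matters, feed uniq an input where the duplicates are never adjacent. A minimal sketch with made-up data:

```shell
# Duplicates exist, but no two identical lines are next to each other.
data=$(printf 'apple\nbanana\napple\nbanana\napple\n')

# uniq alone finds nothing to remove: all five lines come back.
unsorted=$(printf '%s\n' "$data" | uniq)

# After sort, identical lines are adjacent and uniq collapses them.
sorted=$(printf '%s\n' "$data" | sort | uniq)

printf 'uniq alone:\n%s\n\nsort | uniq:\n%s\n' "$unsorted" "$sorted"
```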


The Core Flags


No Flag — Remove Duplicates

bash

uniq filename

Removes consecutive duplicate lines. Each unique line appears once.

Input (already sorted):

apple
apple
banana
cherry
cherry
cherry

Output:

apple
banana
cherry

-c — Count Occurrences

This is the flag you will use most. -c prepends a count to each line — how many times that line appeared in the input.

bash

sort data.txt | uniq -c

Output:

      2 apple
      1 banana
      3 cherry

The count is on the left, the value on the right. Combined with sort -rn, this gives you a frequency-ranked list — one of the most useful patterns in log analysis and output parsing.

bash

sort data.txt | uniq -c | sort -rn

Output:

      3 cherry
      2 apple
      1 banana

Highest count first. This three-command pipeline appears constantly in security work.


-d — Show Only Duplicate Lines

Prints only lines that appear more than once — one copy of each duplicate.

bash

sort usernames.txt | uniq -d

Shows which usernames appear multiple times in the list — useful for finding repeated entries in a data dump or identifying reused credentials.


-u — Show Only Unique Lines

The opposite of -d. Prints only lines that appear exactly once — entries with no duplicates anywhere in the input.

bash

sort usernames.txt | uniq -u
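
-d and -u split the same input into two halves. A small sketch on an invented username list:

```shell
# Hypothetical usernames: alice three times, bob twice, carol once.
users=$(printf 'alice\nbob\nalice\ncarol\nbob\nalice\n')

# -d: one copy of every line that occurs more than once.
repeated=$(printf '%s\n' "$users" | sort | uniq -d)

# -u: only the lines that occur exactly once.
singles=$(printf '%s\n' "$users" | sort | uniq -u)

printf 'repeated:\n%s\n\nsingles:\n%s\n' "$repeated" "$singles"
```

Here -d reports alice and bob, and -u reports carol. Every distinct value lands in exactly one of the two outputs.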

-i — Case-Insensitive Comparison

Treats lines as duplicates even if they differ only in case.

bash

sort -f usernames.txt | uniq -i

Admin, admin, and ADMIN are treated as the same line. One copy survives.


-f n — Skip First n Fields

Ignores the first n whitespace-separated fields when comparing lines for duplicates.

bash

uniq -f 1 data.txt

Compares lines starting from field 2 onward. Useful when lines have a timestamp or sequence number in field 1 that you want to ignore during deduplication.
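
A sketch on an invented log where field 1 is a timestamp. Note that the adjacency rule still applies: only consecutive matches collapse.

```shell
# Hypothetical log lines: a timestamp in field 1, then the message.
log=$(printf '09:01 login failed\n09:02 login failed\n09:03 logout\n09:04 login failed\n')

# Skip field 1 when comparing. The first two lines collapse into one;
# the last "login failed" survives because a different line sits in between.
deduped=$(printf '%s\n' "$log" | uniq -f 1)

printf '%s\n' "$deduped"
```

uniq keeps the first line of each run, so the surviving timestamp is the earliest one in that run.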


-s n — Skip First n Characters

Ignores the first n characters of each line when comparing.

bash

uniq -s 8 logfile.txt

Skips the first 8 characters (often a timestamp prefix) and compares the rest of each line.
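
Combined with -c, this counts repeated messages while ignoring a fixed-width prefix. A sketch assuming an eight-character HH:MM:SS timestamp, on made-up lines:

```shell
# Hypothetical log lines with an 8-character HH:MM:SS prefix.
log=$(printf '10:00:01 disk full\n10:00:05 disk full\n10:00:09 cpu high\n')

# Ignore the first 8 characters when comparing, and count each run.
# sed strips the left padding that uniq -c puts in front of the count.
counted=$(printf '%s\n' "$log" | uniq -s 8 -c | sed 's/^ *//')

printf '%s\n' "$counted"
```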


uniq in Security Workflows


Count and Rank IP Addresses in a Log

bash

grep -oE '\b([0-9]{1,3}\.){3}[0-9]{1,3}\b' access.log | sort | uniq -c | sort -rn

grep extracts every IP. sort groups identical IPs together. uniq -c counts each one. sort -rn ranks highest first.

Output shows you exactly which IPs hit the server most — your top talkers, potential scanners, or brute-force sources at a glance.
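
A runnable sketch against a tiny invented access log. The pattern is written with backslash-escaped word boundaries (\b, assumed to be supported as in GNU grep) and literal dots (\.), and the addresses come from the documentation ranges.

```shell
# Hypothetical access log lines; addresses are from reserved documentation ranges.
log=$(mktemp)
cat > "$log" <<'EOF'
203.0.113.5 - - [10/Oct/2024:13:55:36 +0000] "GET / HTTP/1.1" 200 512
198.51.100.7 - - [10/Oct/2024:13:55:37 +0000] "GET /login HTTP/1.1" 403 128
203.0.113.5 - - [10/Oct/2024:13:55:38 +0000] "POST /login HTTP/1.1" 401 64
203.0.113.5 - - [10/Oct/2024:13:55:39 +0000] "GET /admin HTTP/1.1" 404 64
EOF

# Extract IPs, group, count, rank. sed trims the padding uniq -c adds.
ranked=$(grep -oE '\b([0-9]{1,3}\.){3}[0-9]{1,3}\b' "$log" \
  | sort | uniq -c | sort -rn | sed 's/^ *//')

printf '%s\n' "$ranked"
rm -f "$log"
```

The pattern is loose: it accepts any four dot-separated groups of one to three digits, including values such as 999.999.999.999 that are not valid addresses.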


Find Duplicate Usernames in a Dump

bash

sort usernames.txt | uniq -d

Any username appearing more than once surfaces immediately.


Count Unique vs Total

bash

sort ips.txt | uniq | wc -l

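How many unique IPs are in this list? wc -l counts the lines after deduplication. Run wc -l ips.txt on the raw file for the total; the gap between the two numbers is how much of the list is repetition.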
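
Both numbers side by side, as a minimal sketch on a made-up list. The tr call is a portability assumption: some wc implementations pad their output with spaces.

```shell
# Hypothetical list: five lines, three distinct addresses.
ips=$(mktemp)
printf '10.0.0.1\n10.0.0.2\n10.0.0.1\n10.0.0.3\n10.0.0.1\n' > "$ips"

# Total lines in the raw file.
total=$(wc -l < "$ips" | tr -d ' ')

# Distinct lines after sort | uniq.
unique=$(sort "$ips" | uniq | wc -l | tr -d ' ')

echo "total=$total unique=$unique"
rm -f "$ips"
```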


Frequency Analysis on HTTP Status Codes

bash

cut -d' ' -f9 access.log | sort | uniq -c | sort -rn

In the Combined Log Format used by Apache and Nginx, the HTTP status code sits at field 9 when the line is space-delimited. Log formats vary — verify your field position against a sample line before relying on the number. sort groups the codes. uniq -c counts each. sort -rn ranks by frequency.

Output:

   8423 200
   1204 404
    341 403
     89 500
     12 301

The traffic distribution, error rate, and whether something is hammering your 403s or 500s — visible in seconds.


Find Unique Ports Across Multiple Scans

bash

cat scan1.txt scan2.txt scan3.txt | grep "open" | cut -d'/' -f1 | sort -n | uniq

Combine output from multiple nmap runs, extract open ports, sort numerically, deduplicate. One clean list of every unique open port found across all scans.


Detect Password Reuse in a Credential Dump

bash

cut -d':' -f2 creds.txt | sort | uniq -d

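cut extracts the password field. sort groups identical passwords. uniq -d shows only duplicates: passwords that appear on more than one line, which usually means more than one account. If passwords in the dump can themselves contain a colon, use -f2- so cut keeps everything after the first separator.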
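
A sketch on an invented user:password file. The awk step at the end is an addition beyond the pipeline above: it reads the file twice to list which accounts share a password, and it assumes passwords contain no colon.

```shell
# Hypothetical credential dump in user:password form.
creds=$(mktemp)
cat > "$creds" <<'EOF'
alice:Summer2024!
bob:hunter2
carol:Summer2024!
dave:correct-horse
EOF

# The pipeline from the text: passwords that occur more than once.
reused=$(cut -d':' -f2 "$creds" | sort | uniq -d)

# Extra step (not in the original pipeline): the first pass counts each password,
# the second pass prints every line whose password was seen more than once.
accounts=$(awk -F':' 'NR == FNR { seen[$2]++; next } seen[$2] > 1' "$creds" "$creds")

printf 'reused passwords:\n%s\n\naffected accounts:\n%s\n' "$reused" "$accounts"
rm -f "$creds"
```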


Rank User-Agent Strings from Web Logs

bash

cut -d'"' -f6 access.log | sort | uniq -c | sort -rn | head -20

In Combined Log Format, splitting on double-quotes puts the User-Agent string at field 6. This assumes the format has not been customized — check a sample line if results look wrong. sort + uniq -c counts each distinct User-Agent. sort -rn ranks by frequency. head -20 shows the top 20.

Rare or unusual User-Agent strings near the bottom often indicate scanners, custom tooling, or automated clients worth investigating.


Part Three — sort and uniq Together

The flags make sense individually. The power comes from chaining them.


The Core Pipeline

bash

sort | uniq -c | sort -rn

This three-step pipeline is the foundation of frequency analysis in the terminal. You will use it constantly.

  • sort — group identical lines together
  • uniq -c — count each group
  • sort -rn — rank by count, highest first

Everything else is just feeding different data into this pipeline.
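
The three stages above can be watched one at a time. A minimal sketch that keeps each intermediate result in a variable, using made-up input:

```shell
# Hypothetical input: cherry x3, apple x2, banana x1, in no particular order.
data=$(printf 'cherry\napple\ncherry\nbanana\napple\ncherry\n')

# Stage 1: group identical lines together.
grouped=$(printf '%s\n' "$data" | sort)

# Stage 2: collapse each group into "count value".
counted=$(printf '%s\n' "$grouped" | uniq -c)

# Stage 3: rank by the leading count, highest first.
ranked=$(printf '%s\n' "$counted" | sort -rn)

printf 'grouped:\n%s\n\ncounted:\n%s\n\nranked:\n%s\n' "$grouped" "$counted" "$ranked"
```

The counts in the last two stages are left-padded with spaces by uniq -c. Pipe through sed 's/^ *//' if a later tool needs them flush left.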


Top Attacking IPs

bash

grep "Failed password" /var/log/auth.log | grep -oP '(?<=from )\S+' | sort | uniq -c | sort -rn | head -10

Step by step:

  1. grep finds failed SSH login lines
  2. grep -oP extracts only the source IP using a lookbehind
  3. sort groups identical IPs together
  4. uniq -c counts each IP
  5. sort -rn ranks highest first
  6. head -10 shows only the top 10

One pipeline. Immediate visibility into your top brute-force sources.
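
grep -P needs a grep built with PCRE support, which not every system has. The sketch below swaps in awk for the extraction step, an alternative that is not part of the pipeline above, and runs against an invented log excerpt. The sample lines only imitate typical sshd messages; real wording varies.

```shell
# Hypothetical auth.log excerpt; real sshd wording can differ by version and config.
authlog=$(mktemp)
cat > "$authlog" <<'EOF'
Jan 10 10:00:01 host sshd[100]: Failed password for root from 203.0.113.5 port 4022 ssh2
Jan 10 10:00:02 host sshd[101]: Failed password for invalid user admin from 203.0.113.5 port 4023 ssh2
Jan 10 10:00:03 host sshd[102]: Failed password for root from 198.51.100.7 port 5111 ssh2
Jan 10 10:00:04 host sshd[103]: Accepted password for alice from 192.0.2.9 port 6001 ssh2
EOF

# awk stands in for grep -oP: print the word that follows "from" on each line.
top=$(grep "Failed password" "$authlog" \
  | awk '{ for (i = 1; i < NF; i++) if ($i == "from") print $(i + 1) }' \
  | sort | uniq -c | sort -rn | head -10 | sed 's/^ *//')

printf '%s\n' "$top"
rm -f "$authlog"
```

The successful login from 192.0.2.9 never reaches the count because the first grep drops it.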


Deduplicate a Wordlist

bash

sort wordlist.txt | uniq > clean_wordlist.txt

Sorts the list, removes duplicates, and writes to a new file. Smaller and cleaner for the next tool.


Compare Two Lists — What Is in One but Not the Other

bash

sort list1.txt list2.txt | uniq -u

This prints every value that appears in exactly one of the two lists, without telling you which list it came from. It works correctly only when each value appears exactly once in each file: once both files are combined and sorted, a value present in both lists appears twice and uniq -u filters it out, while a value present in only one list appears once and is kept.

The caveat: if a value appears more than once within a single file, the count changes and the result is unreliable. For a quick check on well-formed lists, this pattern works. For anything with internal duplicates, or when you need to know which side a value belongs to, use comm instead. It is purpose-built for comparing sorted files.

bash

comm -23 <(sort list1.txt) <(sort list2.txt)

comm -23 prints only lines unique to the first file. -13 gives lines unique to the second. -12 gives lines in both.
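
A sketch of the failure mode and the fix, on two invented lists where the first contains an internal duplicate. Temporary files replace process substitution here so the example also runs in plain sh; that is a portability choice, not a requirement of comm.

```shell
# Hypothetical lists. "guest" appears twice inside list1 and never in list2.
dir=$(mktemp -d)
printf 'admin\nguest\nguest\nroot\n' > "$dir/list1.txt"
printf 'root\nsvc\n' > "$dir/list2.txt"

# uniq -u drops guest (it occurs twice) even though list2 does not contain it.
via_uniq=$(sort "$dir/list1.txt" "$dir/list2.txt" | uniq -u)

# comm on sorted, deduplicated copies gives the real "only in list1" set.
sort -u "$dir/list1.txt" > "$dir/list1.sorted"
sort -u "$dir/list2.txt" > "$dir/list2.sorted"
only_in_1=$(comm -23 "$dir/list1.sorted" "$dir/list2.sorted")

printf 'sort | uniq -u:\n%s\n\ncomm -23:\n%s\n' "$via_uniq" "$only_in_1"
rm -rf "$dir"
```

The uniq -u route reports admin and svc, mixing both sides and losing guest. comm -23 reports admin and guest, which is what "in list1 but not list2" should mean.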


Count Unique Values in a Specific Field

bash

cut -d':' -f1 /etc/passwd | sort | uniq -c | sort -rn

Extracts usernames, counts each one. In /etc/passwd every username should appear once — if any show up with a count greater than 1, something is wrong.


Rank Error Types in Application Logs

bash

grep "ERROR" app.log | cut -d' ' -f5- | sort | uniq -c | sort -rn | head -15

grep filters to error lines. cut extracts everything from field 5 onward — adjust the field number to match your log format, since application log structures vary. sort + uniq -c counts each distinct error message. sort -rn ranks by frequency. head -15 shows the most common ones.

Immediately separates systemic errors from one-off events.


Quick Reference

sort

| Flag | What It Does |
|---|---|
| -n | Sort numerically |
| -r | Reverse sort order |
| -u | Sort and remove duplicates |
| -f | Case-insensitive sort |
| -k n | Sort from field n to end of line |
| -k n,n | Sort by field n only |
| -t 'x' | Use x as field separator for -k |
| -h | Sort by human-readable size (K, M, G) |
| -R | Randomize order |
| -rn | Numeric sort, highest first (common combo) |

uniq

| Flag | What It Does |
|---|---|
| (no flag) | Remove consecutive duplicate lines |
| -c | Prefix each line with its occurrence count |
| -d | Show only lines that appear more than once |
| -u | Show only lines that appear exactly once |
| -i | Case-insensitive comparison |
| -f n | Skip first n fields when comparing |
| -s n | Skip first n characters when comparing |

The Core Pipeline

bash

sort input | uniq -c | sort -rn

Group → Count → Rank. Use this for any frequency analysis task.


Closing

sort and uniq do not find vulnerabilities. They do not generate payloads. What they do is take the raw output of every other tool you run and make it readable and actionable.

Frequency analysis is one of the most underrated skills in security work. Knowing which IP sent 3,000 requests in ten minutes, which password is shared across 40 accounts, which error type fires hundreds of times per hour — that is the signal buried in the noise. One pipeline pulls it out.

bash

sort | uniq -c | sort -rn

That is the whole idea.
