Bash: check for or remove duplicate lines in a file ("sort -u" vs "awk" benchmark)

Published on 24 September 2019 (updated 29 May 2022) by admin5699

Remove duplicate lines from the file input.txt (the output is sorted):

sort -u input.txt

Or, keeping the first occurrence of each line in its original order:

awk '!a[$0]++' input.txt
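The two commands do not produce identical output, and the awk counter can also be used to check for duplicates without removing them. The following is a minimal sketch on a hypothetical six-line sample file (the file contents and the variable names `sample`, `sorted`, `ordered` and `dups` are assumptions made for illustration, not part of the original post):

```shell
# Hypothetical sample file, created only for this illustration.
sample=$(mktemp)
printf '%s\n' pear apple pear banana apple pear > "$sample"

# sort -u: one copy of each line, in sorted order.
sorted=$(sort -u "$sample")

# a[$0]++ yields the count before incrementing: 0 (false) the first time
# a line is seen, so !a[$0]++ is true only for first occurrences.
# The input order is preserved.
ordered=$(awk '!a[$0]++' "$sample")

# Check only: print each duplicated line once, when its second copy is seen.
dups=$(awk 'a[$0]++ == 1' "$sample")

printf 'sort -u:\n%s\n\nawk:\n%s\n\nduplicated lines:\n%s\n' \
    "$sorted" "$ordered" "$dups"
rm -f "$sample"
```

Here `sort -u` prints apple, banana, pear, while awk prints pear, apple, banana, and the check reports pear and apple. A sort-based way to list duplicated lines is `sort input.txt | uniq -d`.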

Performance on a large file (24.8 million lines): awk finishes in about 41 s, sort -u in about 3 min 48 s.

$ time awk '!a[$0]++' input.txt | wc -l
24800000

real 0m41.185s
user 0m39.145s
sys 0m1.214s

$ time sort -u input.txt | wc -l
24800000

real 3m47.940s
user 3m38.976s
sys 0m1.976s
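Two factors probably explain part of this gap, although the post does not measure them. First, awk keeps every distinct line in an in-memory hash table, so it avoids sorting but its memory use grows with the number of distinct lines. Second, sort compares lines using the current locale's collation rules, which is often much slower than plain byte comparison; forcing the C locale usually helps. The sketch below compares the two approaches on generated data. The file size, the value range and the variable names `data`, `n_sort` and `n_awk` are arbitrary assumptions, and no timings are claimed.

```shell
# Hypothetical test data: 200,000 lines drawn from 50,000 possible values.
# An arbitrary size for a quick run, not the 24.8 M-line file used above.
data=$(mktemp)
awk 'BEGIN { srand(1); for (i = 0; i < 200000; i++) print int(rand() * 50000) }' > "$data"

# Byte-wise comparison in the C locale avoids locale-aware collation.
n_sort=$(LC_ALL=C sort -u "$data" | wc -l)

# Hash-based deduplication; memory grows with the number of distinct lines.
n_awk=$(awk '!a[$0]++' "$data" | wc -l)

# Both approaches should report the same number of distinct lines.
echo "sort -u: $n_sort  awk: $n_awk"
rm -f "$data"
```

In Bash, prefixing either pipeline with `time`, as in the runs above, shows how the two compare on a given machine and data set.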