Bash: check or remove duplicate lines in a file (« sort -u » vs « awk » benchmark)
Remove duplicate lines from the file input.txt:
sort -u input.txt
Or
awk '!a[$0]++' input.txt
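The two commands do not produce identical output: sort -u returns the unique lines in sorted order, while the awk one-liner keeps the first occurrence of each line in its original order. The sketch below shows this on a small, made-up sample file (the file contents and variable names are hypothetical, for illustration only). It also shows one possible way to only check for duplicates, using sort with uniq -d, which is not part of the commands above.

```shell
# Hypothetical sample data: "a" and "b" each appear twice
sample=$(mktemp)
printf 'b\na\nb\nc\na\n' > "$sample"

# sort -u: duplicates removed, output sorted
sorted_unique=$(sort -u "$sample")

# awk: a[$0]++ is 0 (false) the first time a line is seen,
# so !a[$0]++ is true only for first occurrences; input order is kept
ordered_unique=$(awk '!a[$0]++' "$sample")

# Check only: list every line that occurs more than once
dupes=$(sort "$sample" | uniq -d)

printf '%s\n' "sort -u:" "$sorted_unique" "awk:" "$ordered_unique" "duplicated:" "$dupes"
rm -f "$sample"
```

If the duplicated list is empty, the file has no duplicate lines.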
Performance on a big file (24.8 million lines):
$ time awk '!a[$0]++' input.txt | wc -l
24800000

real 0m41.185s
user 0m39.145s
sys 0m1.214s

$ time sort -u input.txt | wc -l
24800000

real 3m47.940s
user 3m38.976s
sys 0m1.976s
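To repeat this kind of comparison on another machine, a minimal sketch follows. The generated test file, its size and the variable names are assumptions chosen for illustration, not the data used above. The LC_ALL=C run is an added variant: byte-wise collation often speeds up sort, but the effect depends on the system and locale, so it should be measured rather than assumed. Note that awk keeps every distinct line in memory, whereas sort can spill to temporary files, so results may differ on inputs larger than available RAM.

```shell
# Hypothetical test data: 200000 lines, 50000 distinct values
bench=$(mktemp)
seq 200000 | awk '{ print $1 % 50000 }' > "$bench"

# Both methods should report the same number of unique lines
awk_count=$(awk '!a[$0]++' "$bench" | wc -l)
sort_count=$(sort -u "$bench" | wc -l)
echo "awk: $awk_count unique lines, sort -u: $sort_count unique lines"

# Time each method (bash "time" keyword); output discarded
time awk '!a[$0]++' "$bench" > /dev/null
time sort -u "$bench" > /dev/null
time LC_ALL=C sort -u "$bench" > /dev/null

rm -f "$bench"
```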