The little things give you away... A collection of various small helper stuff

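```bash
#!/bin/bash
usage_exit() {
  echo 'Usage: dedupe FILE1 FILE2' >&2
  echo >&2
  echo 'Prints all lines from FILE2 that do not appear in FILE1, in the order of FILE2.' >&2
  echo "WARNING: FILE1 is read fully into memory, and memory use is roughly 40 times FILE1's size. If both files are sorted, use comm instead." >&2
  exit "$1"
}
if [[ "$1" == '-h' || "$1" == '--help' ]]; then usage_exit 0; fi
if [[ $# -ne 2 ]]; then usage_exit 1; fi
# Perl seems to be ~30% faster than AWK for this, but grep is ~2-3 times faster than Perl.
# AWK uses the least memory, Perl about 1.5 times as much, and grep about twice as much as AWK.
# The AWK test is FILENAME == ARGV[1] rather than NR == FNR, so an empty FILE1 still works.
#awk 'FILENAME == ARGV[1] { s[$0]=1; next } !($0 in s)' "$1" "$2"
#perl -ne 'if (@ARGV == 1) { $seen{$_}=1; } else { print $_ if !(exists $seen{$_}); }' "$1" "$2"
# -F: fixed strings, -x: whole-line matches, -v: invert, -f: read the patterns from FILE1.
# "--" keeps a FILE2 name that starts with '-' from being parsed as an option.
grep -F -x -v -f "$1" -- "$2"
# grep exits with 1 when it prints nothing (every FILE2 line is in FILE1). That is not
# an error for dedupe, so map it to 0 and pass real errors (status 2) through.
status=$?
if [[ $status -eq 1 ]]; then exit 0; fi
exit "$status"
```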
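To make the semantics concrete, here is a small self-contained sketch that builds throwaway input files (the sample contents are made up), runs the `grep` command from the script, the AWK alternative from its comments (wrapped in `dedupe_awk`, a hypothetical helper name used only here), and, on sorted copies of the same inputs, `comm -13`. It assumes a POSIX shell with `mktemp -d`, and sets `LC_ALL=C` so that `sort` and `comm` agree on collation.

```shell
#!/bin/sh
# Sketch: compare dedupe's grep command with the AWK and comm alternatives.
# Sample file contents are hypothetical; dedupe_awk is a hypothetical helper.
LC_ALL=C   # identical collation for sort and comm
export LC_ALL

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

printf '%s\n' apple cherry > "$tmp/file1"                 # FILE1: lines to drop
printf '%s\n' banana apple date cherry fig > "$tmp/file2" # FILE2: order is kept
: > "$tmp/empty"                                          # an empty FILE1

# dedupe's approach: whole-line (-x), fixed-string (-F) matches, inverted (-v)
grep_out=$(grep -F -x -v -f "$tmp/file1" -- "$tmp/file2")

# AWK alternative; FILENAME == ARGV[1] identifies the first file even when it
# is empty (NR == FNR would also be true for every line of FILE2 in that case)
dedupe_awk() {
  awk 'FILENAME == ARGV[1] { s[$0] = 1; next } !($0 in s)' "$1" "$2"
}
awk_out=$(dedupe_awk "$tmp/file1" "$tmp/file2")

# With an empty FILE1, nothing should be filtered out
empty_grep=$(grep -F -x -v -f "$tmp/empty" -- "$tmp/file2")
empty_awk=$(dedupe_awk "$tmp/empty" "$tmp/file2")

# comm needs sorted input; -13 hides column 1 (lines only in FILE1) and
# column 3 (lines in both), leaving lines only in FILE2, in sorted order
sort "$tmp/file1" > "$tmp/file1.sorted"
sort "$tmp/file2" > "$tmp/file2.sorted"
comm_out=$(comm -13 "$tmp/file1.sorted" "$tmp/file2.sorted")

printf 'grep:\n%s\n' "$grep_out"
printf 'awk:\n%s\n' "$awk_out"
printf 'comm (sorted):\n%s\n' "$comm_out"
```

With these inputs all three commands print `banana`, `date` and `fig`, but only `grep` and AWK keep FILE2's original order. Also note that `comm` pairs up lines one to one as it walks the two sorted files, so if a line occurs more often in FILE2 than in FILE1, the extra copies are still printed, whereas `grep -v` drops every copy.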