Let's say that you have two large unsorted files, each of which is essentially a long list of strings, and you want to find their intersection. There's the simple brute-force way:
while read line; do grep "$line" file2; done < file1
There's a problem - this is quadratic! Fortunately, we can cut this down pretty easily with the old time-memory tradeoff:
sort file1 > file1.sorted
sort file2 > file2.sorted
sort -m file1.sorted file2.sorted > combined
diff combined <(uniq combined)
No comments:
Post a Comment