Post Syndicated from Anonymous original http://deliantech.blogspot.com/2017/05/file-deduplication-written-in-bash.html
Deduplicating files is not very hard, and it is a curious problem, so I am publishing my code here:
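#!/bin/bash
# Replace duplicate files under the directory given as $1 with hard links.
[ ! -d "$1" ] && echo "$1 is not a directory! exit" && exit 1
cd "$1" || exit 1
oldsize="yyyyy"; oldname="xxxxx"
# Emit "size:path" per file (GNU find -printf), largest first, so equal sizes are adjacent.
find . -type f -printf '%s:%p\n' | sort -t: -k 1,1 -n -r | while IFS= read -r line; do
  size=${line%%:*}   # text before the first colon
  name=${line#*:}    # text after the first colon
  # Same size and byte-identical content: replace the copy with a hard link.
  if [ "$oldsize" == "$size" ] && [ -f "$name" ] && [ -f "$oldname" ] && cmp -s "$oldname" "$name"; then
    rm -f "$name"
    ln "$oldname" "$name"
    continue
  fi
  oldsize="$size"
  oldname="$name"
done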
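To check afterwards that two paths really ended up as hard links, one option is to compare their inodes. The sketch below is not part of the script above: `same_file` is a hypothetical helper name, and it relies on bash's `-ef` test, which is true when both paths refer to the same device and inode. The demo runs in a throwaway directory created with `mktemp -d`.

```shell
#!/bin/bash
# Hypothetical helper (an assumption, not from the original script):
# succeeds when both paths point at the same device and inode.
same_file() {
  [ "$1" -ef "$2" ]
}

# Small demo in a temporary directory.
demo=$(mktemp -d)
printf 'hello\n' > "$demo/a"
ln "$demo/a" "$demo/b"   # hard link to a
cp "$demo/a" "$demo/c"   # independent copy with identical content

if same_file "$demo/a" "$demo/b"; then echo "a and b share an inode"; fi
if ! same_file "$demo/a" "$demo/c"; then echo "a and c are separate files"; fi

rm -rf "$demo"
```

After a deduplication run, `same_file` should succeed for any pair of files that were identical before the run.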
I wonder whether the script could be made even simpler…