Python: Finding duplicate files among files of 10 MB or larger

2011. 7. 12. 14:26

After doing it in PHP and Ruby, I wrote a duplicate-file finder in Python as well.

# -*- coding: cp949 -*-
# Requires Python 3.2.2 or later
 
from operator import itemgetter
from hashlib import md5
import os
 
TARGET_DIR   = "M:\\PATH\\TO\\특정디렉토리" # placeholder path ("특정디렉토리" = "the target directory")
LIMITED_SIZE = 10*(1024*1024) # 10MB
 
 
def md5sum(filename, buf_size=4096):
    # Return the MD5 hex digest of the file, reading it in buf_size-byte chunks.
    m = md5()
    with open(filename, 'rb') as f:
        data = f.read(buf_size)
        while data:
            m.update(data)
            data = f.read(buf_size)
    return m.hexdigest()
 
 
def main():
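    hash_cnt  = {}  # MD5 digest -> number of files with that digest
    file_list = []  # "digest|path" entries for every file that was hashed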
    for p, ds, fs in os.walk(TARGET_DIR):
        for f in fs:
            filename = os.path.join(p, f)
            # Skip non-regular files, symbolic links, and files below the size limit.
            if not os.path.isfile(filename): continue
            if os.path.islink(filename): continue
            if os.path.getsize(filename) < LIMITED_SIZE: continue
            digest = md5sum(filename)
            hash_cnt[digest] = hash_cnt.get(digest, 0) + 1
            file_list.append(digest+"|"+filename)
 
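    # Print each digest shared by two or more files, largest groups first.
    for digest, cnt in sorted(hash_cnt.items(), key=itemgetter(1), reverse=True):
        if cnt < 2: continue
        print("\n["+digest+"]")
        for item in file_list:
            # Split only on the first "|" so a path containing "|" stays intact.
            (digest2, filename) = item.split("|", 1)
            if digest == digest2: print(filename)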
 
 
if __name__ == '__main__':
    main()
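
The script above hashes every file of 10 MB or more, which can be slow on a large tree. One possible refinement, which is not part of the original script, is to bucket files by size first and compute MD5 only for files that share their size with at least one other file, since files of different sizes cannot be identical. The sketch below assumes that approach; `find_duplicates`, its parameters, and the 1 MiB read buffer are hypothetical choices made for illustration.

```python
import os
from collections import defaultdict
from hashlib import md5


def md5sum(filename, buf_size=1024 * 1024):
    """Return the MD5 hex digest of a file, read in buf_size-byte chunks."""
    m = md5()
    with open(filename, 'rb') as f:
        for chunk in iter(lambda: f.read(buf_size), b''):
            m.update(chunk)
    return m.hexdigest()


def find_duplicates(target_dir, min_size):
    """Return lists of paths with identical content, largest groups first."""
    # Pass 1: bucket candidate files by size; no file contents are read yet.
    by_size = defaultdict(list)
    for p, ds, fs in os.walk(target_dir):
        for f in fs:
            filename = os.path.join(p, f)
            if os.path.islink(filename) or not os.path.isfile(filename):
                continue
            size = os.path.getsize(filename)
            if size >= min_size:
                by_size[size].append(filename)

    # Pass 2: hash only files that share their size with another file.
    by_digest = defaultdict(list)
    for size, names in by_size.items():
        if len(names) < 2:
            continue
        for filename in names:
            by_digest[(size, md5sum(filename))].append(filename)

    # Keep only digests seen more than once; sort paths inside each group.
    groups = [sorted(names) for names in by_digest.values() if len(names) > 1]
    return sorted(groups, key=len, reverse=True)
```

Calling `find_duplicates(TARGET_DIR, LIMITED_SIZE)` and printing each returned group should list the same duplicate files as the script above, without the `[digest]` header lines.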

License: Attribution, NonCommercial, ShareAlike
