Abstract:In document similarity detection, coarse grain will reduce the accuracy and too small particle size will increase the computation time. Proposes a quick document similarity detection method based on b-bit Minwise Hash. Firstly extracts the document text to generate a grouping fingerprint features; Then establishes the index structure of fine-grained grouping fingerprint; Finally computes the resemblance of document part by Hamming distance, and stores and displays the evidence of similarity by XML document format. Through system practice, verifies the effectiveness of the method and increases the efficiency of retrieval.