Approximate clustering of fingerprint vectors with missing values
We study the problem of clustering fingerprints with at most p missing values (CMV (p) for short) naturally arising in oligonucleotide fingerprinting, which is an efficient method for characterizing DNA clone libraries.We show that already CMV(2) is NP-hard. We also show that a greedy algorithm yields a min(1 + ln n, 2+pln l) approximation for CMV(p), and can be implemented to run in O(nl2p) time.