Thread: Duplicate Fish
Old 10-27-2009, 02:32 AM   #6
jleslie
Engineer
 
Join Date: Aug 2002

Location: London, UK
Posts: 1,279
Also, random doesn't work the way some people think: if it's really random, the odds of collisions go up quickly...

This is called the "Birthday Problem". While the chance of two particular people having the same birthday is roughly 1 in 365 (I think, ignoring leap years), the number of people you need in a group to have a better than 50% chance of at least two of them sharing a birthday is actually just 23. Once you get to 57 people the chance is over 99% (random people, not selected people).

This is due to the number of comparisons going up fast as the group grows. In a group of 23:
Person A has 22 chances of a match.
Then Person B has 21 chances (having already been compared with A).
Person C has 20...
So you have 22+21+20+...+1 comparisons of birthdays (that's 253 in total, to save lots of people's fingers).
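For anyone who wants to check the numbers, here is a minimal Python sketch. It assumes 365 equally likely birthdays (no leap years, no seasonal bias), and the function name is just my own choice for illustration.

```python
from math import comb


def shared_birthday_probability(people: int, days: int = 365) -> float:
    """Chance that at least two of `people` share a birthday.

    Assumes every day is equally likely and birthdays are independent.
    """
    p_all_different = 1.0
    for i in range(people):
        # The (i+1)-th person must avoid the i birthdays already taken.
        p_all_different *= (days - i) / days
    return 1.0 - p_all_different


if __name__ == "__main__":
    print("pairs in a group of 23:", comb(23, 2))
    for n in (22, 23, 57):
        print(n, "people:", round(shared_birthday_probability(n), 4))
```

The pair count is the 22+21+20+...+1 sum from above, and the probability crosses 50% between 22 and 23 people.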

I think (from another post) this isn't strictly true in this case, but not a lot of people know about it, so I thought I'd work it in. It has many uses, including in cryptography, where you can get duplicates faster than people would think.
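To give a feel for the cryptography side, here is a rough sketch using the standard square-root approximation for the birthday problem: with N equally likely values, you need about sqrt(2 * N * ln 2) random draws for a 50% chance of a duplicate. The 64-bit example is a hypothetical illustration, not something from this thread, and the function name is my own.

```python
import math


def draws_for_half_chance(space_size: float) -> float:
    """Approximate number of random draws for a 50% chance of a duplicate.

    Uses the usual approximation sqrt(2 * N * ln 2), which assumes
    uniformly random, independent values.
    """
    return math.sqrt(2.0 * space_size * math.log(2.0))


if __name__ == "__main__":
    # Sanity check against birthdays: should land just under 23.
    print(draws_for_half_chance(365))
    # Hypothetical 64-bit random identifier: about 5 billion draws,
    # far fewer than the 2**64 possible values.
    print(draws_for_half_chance(2 ** 64))
```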

Another interesting application is in mass DNA matching (or any data mining operation). Say the odds of a DNA match are 1 in 10 trillion. That is the best figure in current usage; it depends on the number of regions (loci) compared, the figure is for 13 loci, which I think is the largest in use, and it assumes true randomness, which is unlikely for DNA. With odds like that, a jury will think they have the guilty person. However, if you have a large number of people on file and do a lot of comparisons (most new crimes and a bunch of old ones), the number of comparisons goes up roughly with the square of the number of profiles and heads into the trillions (ignoring the errors lab work always introduces). This means you can actually get a lot of random matches, and you have to hope someone points that out to the jury.

In America a little while back, a modest DNA database was used for some tests and generated several clearly random matches at fewer loci.

A quote:
A recent analysis of the Arizona convicted offender data base (a database that uses the 13 CODIS loci) revealed that among the approximately 65,000 entries listed there were 144 individuals whose DNA profiles match at 9 loci (including one match between individuals of different races, one Caucasian, the other African American), another few who match at 10 loci, one pair that match at 11, and one pair that match at 12. The 11 and 12 loci matches were siblings, hence not random. But matches on 9 or 10 loci among a database as small as 65,000 entries cast considerable doubt in my mind on figures such as the oft-cited "one in ten trillion" for a match that extends to just 3 or 4 additional loci.
It works the same if you could track people: with enough people, some would turn up by chance at every crime scene in a series of robberies, etc. You would hope the police would realise this before deciding to kick someone's door in at 3am...
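To put a number on how fast the comparisons pile up, here is a small sketch that counts every-profile-against-every-other-profile pairs. The 65,000 figure is the database size from the quote; the 1,500,000 figure is purely a hypothetical larger database I picked for illustration.

```python
from math import comb


def pairwise_comparisons(profiles: int) -> int:
    """Number of distinct pairs when every profile is compared with every other."""
    return comb(profiles, 2)


if __name__ == "__main__":
    # Database size mentioned in the quote above.
    print(f"{pairwise_comparisons(65_000):,}")
    # Hypothetical larger database: the pair count passes one trillion.
    print(f"{pairwise_comparisons(1_500_000):,}")
```

So even 65,000 entries means a couple of billion pairs, and the count grows with the square of the database size.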

BTW this isn't meant to be political, just a maths curiosity that I hope some will find interesting.

John