BLAST and (Im)probability

What can you say with 4 letters?

Just in case you're not a Nancy Drew fan, Nancy is the girl detective, Bess is the pleasingly plump if clueless girlfriend, and George is the tomboy. That just about sums up most of the plots. Anyway, in some updated universe, Bess writes for a fashion magazine and George runs the Human Genome Project, where she has just discovered 'the sleuthing gene' and is now being interrogated by Bess.

"So you're telling me that you have an 'alphabet' with just 4 letters?"

"Yup, just A, C, T and G"

"It doesn't seem like you could say anything very interesting with 4 letters. I mean, come on, you'd be repeating yourself constantly!!"

"That's not true. The chances of repeating even a moderately long string of DNA are astronomically low. Even a single 3-letter 'word'..."

"Oh really? They all look the same to me. There's CAT and TAT and TAG and TAA... It's not exactly Shakespeare. I bet there aren't more than about a dozen possibilities."

How many possible 3-letter 'words' can you make with A, C, T, and G?

I think I have the answer: each of the 3 positions can hold any of the 4 letters, so there are 4*4*4 = 64 possibilities.
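
If you would rather count than multiply, here is a minimal Python sketch that simply lists every 3-letter 'word' and counts them. The names `BASES` and `words` are made up for this illustration and are not part of the original module.

```python
from itertools import product

BASES = "ACTG"  # the 4-letter DNA 'alphabet' from the dialogue

# Every ordered choice of 3 letters, repeats allowed (AAA, AAC, ...).
words = ["".join(letters) for letters in product(BASES, repeat=3)]

print(len(words))   # 64, the same as 4 * 4 * 4
print(words[:4])    # the first few words in the list
```

Changing `repeat=3` to a larger number shows how quickly the count grows: each extra letter multiplies the total by 4.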

"Wait a minute," interrupts a disgruntled Bess. "I know there are only 20 amino acids, and every 3 bases code for one amino acid, so there can't be 64 possibilities, but only 20."

Patiently George explains that the reason there are 64 three-letter combinations but only 20 amino acids is that most amino acids can be coded in at least 2 ways. Thus, CAT is the same as CAC, as far as the cell's machinery is concerned. They both code for histidine. Nevertheless, they are still distinct combinations of A, C, T, and G.
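
George's point can be checked with a short Python sketch. It assumes the standard genetic code, written here as a compact 64-character string in the usual T, C, A, G ordering, with `*` marking the three 'stop' signals (these code for no amino acid and are not discussed above). The variable names are invented for this illustration.

```python
from collections import defaultdict
from itertools import product

# Assumption: the standard genetic code, one letter per codon,
# listed in T, C, A, G order for the 1st, 2nd, and 3rd base.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each 3-letter codon to its one-letter amino acid code.
codon_to_aa = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}

# Group the codons by the amino acid they code for.
codons_for = defaultdict(list)
for codon, aa in codon_to_aa.items():
    codons_for[aa].append(codon)

amino_acids = [aa for aa in codons_for if aa != "*"]  # leave out 'stop'
single = sorted(aa for aa in amino_acids if len(codons_for[aa]) == 1)

print(len(codon_to_aa), len(amino_acids))  # 64 codons, 20 amino acids
print(codons_for["H"])                     # histidine: CAT and CAC
print(single)                              # amino acids with only one codon
```

Only two amino acids (M and W, methionine and tryptophan) turn out to have a single codon; the other 18 can each be spelled at least 2 ways, which is how 64 distinct words shrink to 20 meanings.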