"Two Identical Sequences" vs. "Two Similar Sequences" ===================================================== by Alan Kwan 4 March 1999 This article is a re-organization of an idea of mine earlier. It compares the "likelihood" of two New Style Mahjong patterns, "2 Identical Sequences" and "2 Similar Sequences". I think these 'scientific' names are self-explanatory, but anyway here are the examples: 2 Identical Sequences: C-234 C-234 plus 2 sets and a pair 2 Similar Sequences: C-234 B-234 plus 2 sets and a pair The reader is assumed to possess some background in elementary Combinatorics. The Problem ----------- How do we compare the "likelihood", or "chances", of these 2 patterns? Here I'll illustrate the question with this simplified problem: "If we randomly draw 6 tiles from the 136-tile deck, what is the ratio of the probability of drawing 2 identical sequences to the probability of drawing 2 similar sequences?" This simple and well-defined problem looks like a reasonable approach to the question. The Errors ---------- Now, I'll start by refuting several incorrect solutions. Error #1: The ratio is 1:1, since there are "as many" identical sequences as similar ones. For each number, there are 3 cases for identical (BB, CC, DD) and 3 cases for similar (BC, CD, BD). Refutation #1: This way of counting neglects the fact that the "cases" are not equiprobable. Every backgammon player (worth his salt) knows that one is twice as likely to get 1-2 (which includes both 1-2 and 2-1) as it is to get 1-1 on 2 dice. Error #2: The ratio is 1:2, taking the above "backgammon dice" factor into account. That is, for each number, there are 3 cases for identical and 6 cases for similar (BC, BD, CB, CD, DB, DC). Refutation #2: This solution still misses the problem. The "cases" are not equiprobable. Error #3: We should not forget that there are 4 of each tile, but to get the second of 2 identical sequences, we can only take from the remaining 3 tiles. 
Thus there should also be a factor of (4/3)^3. So the ratio is
1:(2*(4/3)^3), or approximately 1:4.74.

Refutation #3: Still not correct. See the correct solution.

The Correct Solution
--------------------

Okay, let's do it the full, correct way.

First, we (still) reduce the problem to one number (such as 123) only,
since it is obvious that, by symmetry, the probability of 2 identical
sequences is 7 times that of 2 identical 123 sequences, and the same for
similar sequences. So we can factor out the common factor of 7.

Take one suit, say C. There are 4 each of C1, C2, and C3, and we need to
draw 2 of each to get 2 identical C-123 sequences. That gives C(4,2)^3
tile combinations. Since there are 3 suits, by symmetry, there are
3*C(4,2)^3 = 3*6^3 = 2^3 * 3^4 tile combinations for 2 identical 123
sequences.

Now, for similar sequences. Take 2 suits, say B and C. There are 4 each
of B1, B2, B3, C1, C2, and C3, and we need to draw 1 of each. That gives
C(4,1)^6 = 4^6 tile combinations for 2 similar sequences with B-123
C-123. Since we're taking 2 suits out of 3, there is also a factor of
C(3,2), which is 3. So there are 3*4^6 = 2^12 * 3 tile combinations for
2 similar 123 sequences.

Now we get the ratio we want:

    (2^3 * 3^4) : (2^12 * 3) = (3^3) : (2^9) = 27 : 512

which is close to 1:19!

Also note that the incorrect solution in Error #3 is off by exactly a
factor of 4. Error #3 misses the fact that a sequence consists of 3
tiles, so the "backgammon dice" factor should be applied 3 times, not
just once. Each factor is a double, and 2 doubles have been missed, so
the error is a factor of 4.

Insights
--------

A factor of 19 is generally too large a difference for 2 'easily
comparable' low-value patterns to be assigned the same value. We see
many cases where a factor of 4 (self-draw) or 10 (/ippatsu/ in Modern
Japanese) gets an extra "faan". In fact, Perlmen & Chan suggest
different values for these 2 patterns (pp. 80-81).
Thus, if we see a scoring system where these 2 patterns are assigned the
same value (when the rest of the system gives credit for smaller
differences), we have good reason to believe that the designer(s) of the
system based that decision on one of the erroneous solutions above.
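
As a quick check of the arithmetic in the correct solution, the counts
can be reproduced with a short script. This is only a sketch in Python
(the variable names are mine and not part of the original argument). It
counts tile combinations for the 123 sequences of the simplified 6-tile
problem and compares the result with the ratio proposed in Error #3.

```python
from fractions import Fraction
from math import comb

COPIES = 4   # copies of each tile in the deck
SUITS = 3    # B, C, D

# 2 identical 123 sequences: pick a suit, then 2 of the 4 copies
# of each of its three tiles.
identical = SUITS * comb(COPIES, 2) ** 3

# 2 similar 123 sequences: pick 2 of the 3 suits, then 1 of the 4
# copies of each of the six tiles involved.
similar = comb(SUITS, 2) * comb(COPIES, 1) ** 6

ratio = Fraction(identical, similar)

# Ratio proposed in Error #3: 1 : (2 * (4/3)^3)
error3 = Fraction(1, 2) * Fraction(3, 4) ** 3

print(identical, similar)   # 648 12288
print(ratio)                # 27/512
print(error3 / ratio)       # 4
```

Exact fractions are used instead of floating point so that the factor of
4 between Error #3 and the correct ratio comes out exactly.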