mutation rate and several other bioinformatic estimates of functionality [3]. The nine CAN genes showed a bias towards the earlier category: six were classified as earlier (INHBE, KIAA0427/CTIF, MYH9, PCDHB15, RNU3IP2/RRP9, TP53) and three as later (ABCB8, KIAA0934/DIP2C, NCB5OR/CYB5R4).

Strikingly different from the overall distribution of mutations in HCC1187 was the proportion of sequence-level truncating mutations in the earlier rather than the later category: all eight classifiable INDEL mutations happened earlier, and combining these with the nonsense mutations showed that 11/13 (85%) of protein-truncating mutations happened earlier. This difference in proportion (11/13 truncating vs. 23/58 missense) is statistically significant (p < 0.01, chi-squared test with continuity correction).

We used a statistical model to estimate the number of mutations that showed non-random timing. The model assumes that any given class of mutations is a mixture of non-random mutations that must happen earlier (that is, before endoreduplication) and randomly timed mutations that can happen earlier or later. The randomly timed mutations are classified as earlier with probability p and later with probability 1 - p, independently for each such mutation. Given an estimate of p, we find the most likely number, n, of non-randomly timed mutations (the maximum likelihood estimate, or MLE) and its 95% lower confidence bound. Further details of the model may be found in File S3. Estimates of p based on total missense mutations or on those predicted to be non-functional (see Table 1) are 0.40 (= 23/58) and 0.32 (= 9/28), respectively, and a plausible upper bound would be 0.59 (= 13/22), the proportion of earlier chromosome translocations. Most classes of mutation, including non-synonymous point mutations, chromosome translocations, duplications, deletions, predicted functional mutations and CAN genes, did not show any excess of mutation earlier or later.
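The two calculations described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`yates_chi2`, `likelihood`, `mle_n`) are hypothetical, and the likelihood is only one plausible reading of the mixture model, in which n mutations are forced into the earlier category and each of the remaining mutations falls earlier with probability p. The exact formulation is given in File S3, so the estimates produced here need not agree with every reported figure, and the confidence bound is not computed.

```python
from math import comb, erfc, sqrt


def yates_chi2(a, b, c, d):
    """Chi-squared test with continuity correction for the 2x2 table
    [[a, b], [c, d]]. Returns (statistic, p-value) for 1 degree of freedom."""
    total = a + b + c + d
    # Yates correction: shrink |ad - bc| by total/2, never below zero.
    corrected = max(abs(a * d - b * c) - total / 2, 0.0)
    chi2 = total * corrected ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return chi2, erfc(sqrt(chi2 / 2))


def likelihood(n, earlier, total, p):
    """Assumed likelihood: n mutations must be earlier; each of the other
    (total - n) mutations is independently earlier with probability p."""
    if n < 0 or n > earlier:
        return 0.0
    random_total = total - n
    random_earlier = earlier - n
    random_later = random_total - random_earlier
    return comb(random_total, random_earlier) * p ** random_earlier * (1 - p) ** random_later


def mle_n(earlier, total, p):
    """Value of n in 0..earlier that maximises the assumed likelihood."""
    return max(range(earlier + 1), key=lambda n: likelihood(n, earlier, total, p))


if __name__ == "__main__":
    # Rows: truncating (11 earlier, 2 later) and missense (23 earlier, 35 later).
    chi2, p_value = yates_chi2(11, 2, 23, 35)
    print(f"chi-squared = {chi2:.2f}, p = {p_value:.4f}")
    # 11 of 13 truncating mutations were earlier; p is taken from missense data.
    print("MLE of n at p = 0.40:", mle_n(11, 13, 0.40))
```

Scanning every candidate n is feasible here because n can only take the values 0 to 11, so no numerical optimiser is needed.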
However, the observed proportion of truncating mutations falling earlier (11/13) suggests that n > 0. When p = 0.4, the MLE is n = 10 mutations that had to happen before endoreduplication, with a lower confidence bound of 6 (File S3) [24]. For p = 0.32, the MLE is n = 10, with a lower confidence bound of 7. Thus our simple statistical model suggests that a number of the truncating mutations had to occur before endoreduplication. When we use the high estimate, p = 0.59, the MLE is n = 9, but the lower confidence bound is 0, so data from more tumors would be required to confirm this.

Discussion

We present one of the most complete studies of any cancer genome to date, combining the coding sequence scan of Wood et al. [3] with molecular cytogenetic analysis of genome rearrangement. For most of the mutations and genome rearrangements, we were able to deduce whether they most likely occurred before or after endoreduplication of the genome, giving us a picture of the pattern of mutation before and after this time point for this case. Such detailed analysis was limited to a single cell line, as this was so far the only breast cancer cell line for which there are rather complete coding sequence data, cytogenetic data and evidence of endoreduplication, but it serves to demonstrate the feasibility and potential interest of the approach.

The Earlier Versus Later Classification

Endoreduplication in HCC1187 proved to be a useful milestone, because numbers of structural changes and point mutations were fairly equally distributed between the earlier and later categories.