Drawing of DNA restriction map
?
Draw DNA restriction maps are an important issue in genetic biology. Since the DNA molecule is very long, the current experimental technique cannot be directly measured, so the biologists need to cut the DNA molecule, one segment to measure. During the cutting, the DNA fragment is lost in the order of the original DNA molecule, and how to find the order of the arrangement of these segments is a key issue.
In order to construct a restriction map, biologists use different biochemical techniques to obtain indirect information about the map, and then use these data to reconstruct the map with these data. One method is to digest DNA molecule with restriction enzyme. These enzymes cut DNA chain in a restriction site, and each enzyme corresponding to the restriction site is different. For each kind of enzyme, each DNA molecule may have multiple restriction sites, at which point you can choose to cut a certain site (not necessarily continuous) as needed. After the DNA molecule is cut, the length of each segment obtained is to reconstruct the basic information of the original order of these segments. In a variety of experimental methods for obtaining this information, there is a widely used method: the THE Partial Digest, PDP method.
In PDP, a enzyme is employed, and the length between the segments between any two restriction sites is obtained. Assuming that the restriction site corresponding to the enzyme used is N, through a large number of experiments, the distance between N 2 points (N position points plus two end points), a total
Value. Then use this
The distance to reconstruct the position of N restriction sites (solve unique, the two endpoints correspond to the longest distance). If
Is the point set on the line segment
The collection of distance between all points, PDP is given
begging
. The figure below gives an example.
?
???????? 2 ?????????????????????? 5 ???? ????? ??2
?
???????
? A ?????? a ?????????? b ??????????????????? D ???? ?? B
Figure 1. ?? a, b is two endpoints of DNA molecules. A, B, C and D are restriction sites. • You can get experiments
= {2, 3, 4, 5, 2, 5, 9, 14, 16, 7, 12, 14, 9, 11, 7}.
Come to seek
, Corresponding to the above figure
= {0, 2, 5, 9, 14, 16} is a solution.
?
The above method is to cut the DNA molecules at any two restriction sites, which is quite difficult for current experimental techniques, and it is also difficult to process the experimental data. Recent researchers have proposed a new method called a simplified partial digestive method (SPDP). This method and PDP differ in that it avoids difficulties in any two sites to cut DNA molecules and to process duplicate data. It is still assumed that there is n to have n. First, the DNA molecule is copied into N 1, and each of the pre-N compound is cut at a restriction site, and the last replicate is cut at all restriction sites. Thus we gave a 2N fragment length (referred to as the first set of data) and N 1 segment length (referred to as a second set of data). Under the premise without errors, 2N length in the first set of data can be divided into N pairs, each pair and all equal to the total length of the DNA molecule; N 1 length in the second set of data and the total of DNA molecules length. The SPDP problem is how to reconstruct this N 1 fragment in the DNA molecule using the two sets of data, so that the 2N segment length obtained by the N position is equal to 2N length obtained by the experiment. . The figure below gives an example.
? (a)
2 ?????? 6 ???????? 1 ?????? 4 ?????? 3
?
(b)
?????????? 2 ?????????????????? 14
?
?????????????????? 8 ????????????????? 8?
?????????????????? 9 ??????????????????? 7
?
?????????????????????????????????? 3
?
??????????
2 ???? 1 ???????? 4 ???????????????????????? 6
?
?
Figure 2. This example is 4 corresponding to the site. (a) is the order we want to reconstruct. The first 4 pair of 4 pairs of data is the first set of data, which is obtained by cutting a site, each pair of lengths are 16, and the remaining is the second set of data, including 5 segment length, which is cut Open all sites, their length is also 16, but the experimental results are only inform to each of the lengths, and they do not know that they are arranged in DNA molecules.
?
Now, the above SPDP problem is created, and the mathematical model is established, and the following issues are studied:
(1) Design to solve the algorithm for this problem and evaluate the efficiency and effect of the algorithm. A answer to the following 2 instances:
Example 1: First set of data: 2, 14, 8, 8, 9, 7, 13, 3
The second group of data: 2, 1, 4, 3, 6
?
Example 2: 1 The first set of data: 1, 14, 12, 3, 7, 8, 9, 6, 11, 4, 12, 3, 13, 2, 5, 10
The second group of data: 1, 1, 2, 1, 2, 2, 1, 2, 3
?
(2) ??? Discuss the error of the measurement fragment length during the experiment, will affect the effect of the algorithm, when the error is large, and the reconstruction of the restriction map will not be possible.