Through our earlier project research, we explained the overall idea of protein folding in the "Protein Folding Summary". To deploy protein-folding applications on the P2P platform we have already built, we must understand every step of the computation in depth. Following this idea (my current working plan), I will describe the current work in detail and analyze the difficulties we face today:
1. Assign tasks to Workers for computation (transfer the necessary parameter files, data files, and computation program to the Worker). According to the original development plan, we were to use the molecular dynamics simulation software Tinker as the code base for the Worker side. However, because Tinker is written in Fortran and we have no contact with the Physics Department, we decided to abandon this plan. After further study, we chose NAMD. NAMD is parallel molecular dynamics software that won the 2002 Gordon Bell Award; it can run on parallel platforms with hundreds of CPUs, on clusters with dozens of CPUs, and even on a single machine with only one CPU (we mainly run it on a single machine). To avoid complicating the problem, our approach is not to dig into the internal algorithms, for two reasons: first, the code base is huge (the source code alone is about 2 MB); second, we still lack domain expertise, and we worry that large-scale modifications could not guarantee the correctness of the program. Instead, we are trying to become familiar with the program's input and output files, and we hope to adapt NAMD to our process framework through limited modifications to its code.
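As a sketch of what one task handed to a Worker might contain, the Python code below bundles the input files named above (structure, coordinates, parameters) and renders a minimal NAMD configuration file for them. The class `WorkerTask`, the function `render_namd_config`, and the file names are hypothetical illustrations; the configuration keywords are standard NAMD keywords, but the numeric settings (cutoffs, time step, output frequencies) are placeholder assumptions, not values chosen for this project.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class WorkerTask:
    """Hypothetical description of one unit of work sent to a Worker."""
    task_id: str
    psf_file: str   # structure file (generated with VMD)
    pdb_file: str   # initial coordinates (generated with VMD)
    parameter_files: List[str] = field(default_factory=list)  # CHARMM parameter files
    temperature: float = 310.0  # initial temperature in K (placeholder)
    steps: int = 1000           # number of MD steps to run (placeholder)

    def files_to_transfer(self) -> List[str]:
        """Input files the Worker needs besides the NAMD executable itself."""
        return [self.psf_file, self.pdb_file, *self.parameter_files]


def render_namd_config(task: WorkerTask) -> str:
    """Build a minimal NAMD configuration for a fresh (non-restarted) run.

    All numeric settings below are illustrative placeholders.
    """
    lines = [
        f"structure          {task.psf_file}",
        f"coordinates        {task.pdb_file}",
        "paraTypeCharmm     on",
    ]
    lines += [f"parameters         {p}" for p in task.parameter_files]
    lines += [
        f"temperature        {task.temperature}",
        "exclude            scaled1-4",
        "1-4scaling         1.0",
        "cutoff             12.0",
        "switching          on",
        "switchdist         10.0",
        "pairlistdist       14.0",
        "timestep           1.0",
        f"outputName         {task.task_id}",  # prefix of all output files
        "outputEnergies     100",
        "dcdfreq            100",
        "restartfreq        500",
        f"run                {task.steps}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    demo = WorkerTask("task001", "protein.psf", "protein.pdb",
                      ["par_all27_prot_lipid.inp"])
    print(demo.files_to_transfer())
    print(render_namd_config(demo))
```

The Java wrapper would then ship the files from `files_to_transfer()` together with the rendered configuration to the Worker.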
The NAMD website also offers Mindy, a molecular dynamics simulation program. Mindy is a simplified version of NAMD designed to run sequentially. This seemed to be exactly what we needed: we could quickly grasp its runtime flow, and its sequential execution fits the Worker's architecture. However, after studying and running the software, we found that its functionality is too limited. For example, it does not output result data (which is fatal for us), and the computation cannot be configured (meaning it can only simulate protein folding in one specific environment, which is a real limitation). Since Mindy is intended only for beginners and is not a complete software product, its documentation is very sparse. We wrote to the author hoping he could give us reasonable advice, but he did not answer the questions we raised. We therefore had to give up Mindy and continue studying NAMD, which is considerably harder. Our unfamiliarity with the application software and with molecular dynamics has led us down some detours.
The Theoretical and Computational Biophysics Group develops not only NAMD but also a range of supporting software for different tasks. The most important of these is VMD, which is closely tied to NAMD (the .pdb and .psf files that NAMD needs are generated with VMD; Figure 1 shows the relationship between the two programs). VMD is molecular visualization software for displaying, analyzing, and manipulating biomolecules. In the "NAMD Tutorial", the entire first part describes how to use VMD to generate these files, which shows how important VMD is. VMD uses the Tcl scripting language, and file generation and processing are done through Tcl scripts. To master the relevant parameters, one therefore needs a working knowledge of Tcl.
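Because NAMD expects the coordinate file to describe the same atoms as the structure file, a Worker could run a quick sanity check on a .psf/.pdb pair before starting a simulation. The following sketch assumes the files were already produced (for example with VMD's psfgen plugin, as in the NAMD Tutorial) and only compares atom counts; it does not verify atom names or order. The function names are hypothetical.

```python
def psf_atom_count(psf_text: str) -> int:
    """Read the atom count from the '<n> !NATOM' header line of a PSF file."""
    for line in psf_text.splitlines():
        if "!NATOM" in line:
            return int(line.split()[0])
    raise ValueError("no !NATOM section found in PSF text")


def pdb_atom_count(pdb_text: str) -> int:
    """Count ATOM and HETATM records (record name occupies columns 1-6)."""
    return sum(1 for line in pdb_text.splitlines()
               if line.startswith(("ATOM  ", "HETATM")))


def check_structure_pair(psf_text: str, pdb_text: str) -> int:
    """Hypothetical pre-flight check: raise if the atom counts differ.

    Returns the shared atom count when the two files agree.
    """
    n_psf = psf_atom_count(psf_text)
    n_pdb = pdb_atom_count(pdb_text)
    if n_psf != n_pdb:
        raise ValueError(f"PSF has {n_psf} atoms but PDB has {n_pdb}")
    return n_psf
```

A check this cheap can run on the Worker before NAMD is launched, so that a mismatched file pair is reported back instead of wasting a computation slot.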
Our plan is to modify NAMD (the current source code targets Linux, so it must be ported to Windows) so that it becomes the software the Worker runs. We will then wrap it in Java so that it fetches data from the DataPool, performs the computation, completes the specified operations, and returns intermediate results to the main program. The current problem is that NAMD does not produce the single output we need but a series of files (with little documentation describing them). Extracting what we need from these files is a problem we must solve! Continuing a computation from these files is another open problem!
2. The main program needs to process the intermediate results returned by the Workers. This is mainly an energy analysis: it picks the lowest-energy results from these intermediate results and redistributes them. Among the supporting software developed by the Theoretical and Computational Biophysics Group there is mdenergy, a program that calculates energies from DCD or PDB files. The current problem is that the output files generated by NAMD do not match this program's input format.
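For the redistribution step, one possible approach (an assumption, not the project's settled design) is to keep the lowest-energy segments and continue each of them with NAMD's restart mechanism, which also addresses the continuation problem raised under item 1. In a NAMD restart, `bincoordinates` and `binvelocities` point at the binary `.coor` and `.vel` files written under the previous `outputName`, `firsttimestep` resumes the step counter, and `temperature` must be removed because NAMD does not accept it together with input velocities. The `SegmentResult` record, the `_next` naming scheme, and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List

# Keywords that must change (or disappear) when a run is continued.
# NAMD keywords are case-insensitive, so they are compared in lower case.
_RESTART_DROP = {"temperature", "outputname", "run", "firsttimestep",
                 "bincoordinates", "binvelocities", "extendedsystem"}


@dataclass
class SegmentResult:
    """Hypothetical summary a Worker returns for one finished NAMD segment."""
    worker_id: str
    output_name: str  # the outputName used for the segment
    last_step: int    # timestep counter at the end of the segment
    energy: float     # final energy reported for the segment (kcal/mol)


def _keyword(line: str) -> str:
    tokens = line.split()
    return tokens[0].lower() if tokens else ""


def restart_config(base_config: str, prev: SegmentResult, new_output: str,
                   steps: int, use_xsc: bool = False) -> str:
    """Continue `prev` from its final binary coordinates and velocities."""
    kept = [line for line in base_config.splitlines()
            if _keyword(line) not in _RESTART_DROP]
    kept += [f"bincoordinates     {prev.output_name}.coor",
             f"binvelocities      {prev.output_name}.vel"]
    if use_xsc:  # only for periodic systems that actually wrote an .xsc file
        kept.append(f"extendedSystem     {prev.output_name}.xsc")
    kept += [f"firsttimestep      {prev.last_step}",
             f"outputName         {new_output}",
             f"run                {steps}"]
    return "\n".join(kept) + "\n"


def next_round(results: List[SegmentResult], base_config: str,
               keep: int, steps: int) -> Dict[str, str]:
    """Keep the `keep` lowest-energy segments and build one restart config each.

    Maps each new outputName to its config text; which Worker receives
    each config is left to the scheduler.
    """
    best = sorted(results, key=lambda r: r.energy)[:keep]
    configs: Dict[str, str] = {}
    for r in best:
        new_output = f"{r.output_name}_next"  # hypothetical naming scheme
        configs[new_output] = restart_config(base_config, r, new_output, steps)
    return configs
```

The `.coor` and `.vel` files of each selected segment would also have to be sent to whichever Worker continues it.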
If we could obtain energy data directly from NAMD's output files, our workload would be greatly reduced; this requires us to continue studying the software.
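NAMD already prints energies to its log (standard output) as `ETITLE:` header lines followed by `ENERGY:` lines, with values in kcal/mol, so one option worth testing is to read them from there instead of converting files for mdenergy. The sketch below parses those lines, taking column names from the `ETITLE:` line rather than hard-coding their order, and returns the value of one column at the last reported step. Using the final `POTENTIAL` value as the selection criterion is an assumption; the function names are hypothetical.

```python
from typing import Dict, List


def parse_namd_energies(log_text: str) -> List[Dict[str, float]]:
    """Collect every ENERGY: line of a NAMD log as a {column: value} dict.

    Column names come from the most recent ETITLE: line, so the parser
    does not depend on a fixed column order.
    """
    titles: List[str] = []
    records: List[Dict[str, float]] = []
    for line in log_text.splitlines():
        if line.startswith("ETITLE:"):
            titles = line.split()[1:]
        elif line.startswith("ENERGY:") and titles:
            values = line.split()[1:]
            records.append({name: float(v) for name, v in zip(titles, values)})
    return records


def final_energy(log_text: str, column: str = "POTENTIAL") -> float:
    """Return `column` from the last ENERGY: line (assumed selection criterion)."""
    records = parse_namd_energies(log_text)
    if not records:
        raise ValueError("no ENERGY: lines found in the log")
    return records[-1][column]
```

If the Worker returns its log text, the main program could rank results by `final_energy` without any file conversion.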