Twelve Challenging Problems in IT

zhaozj  2021-02-16

Dr. Jim Gray holds a Ph.D. from Berkeley and an honorary doctorate from the University of Stuttgart, Germany, and is a member of the National Academy of Engineering. Winner of the 1998 Turing Award, he is a renowned database expert, a leader in database and transaction processing systems, and the planner of Microsoft's TerraServer, today's largest online database project.

On May 4, 1999, at the ACM conference held in Atlanta, Jim Gray accepted the Turing Award and delivered a Turing Award lecture entitled "What Next? A Dozen Remaining IT Problems", in which he proposed 12 long-term research goals that information technology should address in the future.

These 12 research goals are not merely Gray's personal opinion. They reflect the views of many computer scientists and information scientists, are broadly representative, and deserve our attention.

Over the past few decades, the information industry has grown exponentially. Moore predicted that the density of semiconductor chips would double every 18 months. In 1995, George Gilder predicted that network bandwidth would triple every year; that prediction now looks conservative, since bandwidth has grown faster than he predicted.

Of course, exponential growth cannot go on forever, and something will eventually limit it. But in the information industry, new inventions have overcome one obstacle after another, so the momentum has kept accelerating. This accelerating development means the industry is constantly redefining itself: things that were very hard 10 years ago are easy today, and things will be very different again 10 years from now.

1. Scalability: Design a software and hardware architecture that scales up by a factor of 10^6. That is, the storage and processing capacity of an application can grow automatically by a factor of a million simply by adding resources, either finishing a job 10^6 times faster or finishing a job 10^6 times larger in the same time.

Much of my work has been inspired by the scalability goal described by John Cocke, the 1987 Turing Award winner and originator of the RISC concept. That goal is to design a software and hardware architecture whose performance can scale without limit. In practice a system is always limited by money, power, or space, so a more realistic goal is a system that scales to a million nodes, all working on the same problem.

Scalability touches every aspect of large computer systems. A system grows by adding modules, each of which does a small part of the whole job. As the system grows, data and computation are simply migrated to the new modules. When a module fails, the remaining modules mask the failure and continue to provide service. Automatic management, fault tolerance, and load distribution of this kind remain challenging.
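
The idea of spreading work over modules and moving as little as possible when a module joins or fails can be sketched in a few lines. The sketch below uses rendezvous (highest-random-weight) hashing, which is an illustration chosen here and not a technique named in the lecture; the function names `owner` and `placement` and the module labels are hypothetical.

```python
import hashlib


def _score(key, module):
    """Deterministic pseudo-random weight for a (module, key) pair."""
    digest = hashlib.sha256(f"{module}:{key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def owner(key, modules):
    """Return the module responsible for `key`: the one with the highest weight."""
    if not modules:
        raise ValueError("at least one module is required")
    return max(modules, key=lambda module: _score(key, module))


def placement(keys, modules):
    """Map every key to its owning module."""
    return {key: owner(key, modules) for key in keys}


# Hypothetical example: four modules, then module "m3" fails.
keys = [f"item-{i}" for i in range(1000)]
before = placement(keys, ["m1", "m2", "m3", "m4"])
after = placement(keys, ["m1", "m2", "m4"])

# Only the keys that lived on the failed module are reassigned.
moved = [k for k in keys if before[k] != after[k]]
print(len(moved), "of", len(keys), "keys moved")
```

Because each key simply goes to the surviving module with the highest weight, a failure moves only the failed module's share of the work, and adding a module pulls keys only onto the new module. Real systems also have to copy the data itself and detect failures, which this sketch leaves out.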

In the past few years, research on scalability has made striking progress, and many scalable systems now exist, most of them born of the Internet.

The Internet is a worldwide computer system with 100 million nodes, and it is doubling in size every year. People worry that the growth of the network and its servers cannot be managed, and I think more research is needed on protocols and network engineering. Meanwhile, giant servers have developed rapidly, and some companies have launched systems that handle 1 billion transactions a day. Managing these huge clusters is a serious problem. Such scalable systems are so far only partly automated; in fact, almost every large machine needs its own specialized management.

In the next decade, solving the scalability problem will become more urgent. New computer architectures will place multiple processors on a single chip, so each processor chip will be a symmetric multiprocessor (SMP). Another trend is to embed processors in micro-electro-mechanical systems (MEMS). A MEMS device costing only about $10 will have sensors, effectors, and on-board processing, and programming a collection of more than a million such devices is a challenge.

2. Turing test: Build a computer system that wins the imitation game at least 30% of the time.

The Turing test is based on an imitation game played by three people. Turing's own forecast for the test is worth studying. He wrote: "I believe that in about fifty years' time it will be possible to programme computers, with a storage capacity of about 10^9, to make them play the imitation game so well that an average interrogator will not have more than 70 per cent chance of making the right identification after five minutes of questioning. The original question, 'Can machines think?' I believe to be too meaningless to deserve discussion. Nevertheless I believe that at the end of the century the use of words and general educated opinion will have altered so much that one will be able to speak of machines thinking without expecting to be contradicted."

In the past 50 years, computers have made great progress toward the Turing test, and they now have storage and computing power approaching that of a simple brain. Machine intelligence has become a shared goal of many scientists, and computers already help people in every stage of design: concept capture, simulation, production, manufacturing, testing, and evaluation. Yet the computer is still a tool and a collaborator rather than an intelligent machine. Such computers do not generate new concepts; they merely execute static programs and have little ability to adapt or learn. Even in the best case, a carefully built structure is needed so that parameters can be tuned to the environment. That is adaptation, but it is not learning something new. With current supercomputer software and databases, therefore, no system will pass the Turing test in the next decade. Ideas completely different from today's will be needed.

We also face another open question: how genes, chromosomes, and the brain work. We have hardly a clue here, and finding the answer would be an excellent long-term research goal.

3. Speech to text: Hear as well as a native speaker.

4. Text to speech: Speak as well as a native speaker.

5. See as well as a person: Recognize objects and motion.

The Turing test also contains two further difficult and challenging problems: (1) reading and understanding as well as a person; (2) thinking and writing as well as a person. Both are as hard as the Turing test itself.

So far we have made great progress on three slightly easier problems. The first is hearing: computers listening to and understanding natural language, music, and other sounds. Systems that convert speech into text have now become practical. This is partly thanks to faster and cheaper computers, but also to better algorithms and a deeper understanding of language: dictionaries, lexical analysis, and semantic networks. The field is advancing steadily, the error rate is dropping by about 10% a year, and a usable word recognition rate can be reached even with an unlimited vocabulary. Computers can now understand English better than most people, and many blind, hard-of-hearing, and deaf people have begun to read, listen, and type through speech-to-text conversion systems.

Simple language translation systems already exist today. For a system to pass the Turing test, it will probably need a richer internal representation. If a person teaches such a system a second language, the computer should form a similar internal representation of information in that language. This opens the possibility of faithful translation between languages. There may be a more direct way, but so far it is not clear what it would be.

The third area is visual recognition: building a system that can identify objects, including moving ones (a running horse, a smiling person, body posture, ...). In visual rendering, computers already perform outstandingly, though still not as well as the best human artists. This too is a human-computer symbiosis, and the imagery produced by Lucasfilm and Pixar is amazing. Still, letting children and adults create such images in real time to entertain or communicate remains a challenging task.

Although progress in these three areas is still limited, it has already been of great benefit to people with disabilities and to certain industries. Optical character recognition (OCR) is used for scanning text. Speech synthesizers read text aloud. Speech recognition systems let deaf people answer the telephone and let people who cannot use their arms enter text and commands; indeed, some programmers already use voice input. For most deaf people, a device can be connected directly to the auditory nerve to convert sound into nerve impulses, replacing the eardrum and cochlea. Unfortunately, no one yet understands the encoding the body uses.

Long-term projects like these will help a much wider population and build a revolutionary bridge between computers and people. When computers can "see" and "hear", communicating with them will be easier and more convenient, and computers will help us see more, hear better, and remember more.

6. Personal Memex: Record everything a person sees and hears, and quickly retrieve any item on request.

Vannevar Bush was one of the earliest information scientists, and he built analog computers at the Massachusetts Institute of Technology. In 1945, Bush published a visionary article in The Atlantic Monthly describing Memex: a "desk" that could store "a billion books", holding newspapers, pamphlets, journals, and other literary works, all tied together by hyperlinks. Bush also proposed glasses with a built-in camera that could take pictures on demand, and a recorder to capture what was said, with everything fed into Memex.

Memex could find documents, or jump from one document to another by following references. Anyone could annotate a document with links, and the annotated documents could be shared with other users. Bush realized that finding information in Memex would be a challenge, and he proposed "associative search": finding documents that match certain criteria. Bush believed the machine should recognize spoken commands and type on command, and he also proposed "a direct electrical path to the human nervous system" as a more efficient way to ask questions and get answers.

Fifty years on, Memex is almost a reality. Most scientific literature is online; the scientific literature doubles every 8 years, and most of the literature of the last 15 years is online. But anyone who has used the Internet knows its limitations: (1) it is very hard to find what you need online; (2) much of what you want is not yet online.

So why not put everything into cyberspace? The simplest answer is that most information is valuable property, and cyberspace currently shows no respect for that. In fact, the goal of cyber culture is that anyone can get any information, anywhere, for free. Protecting intellectual property raises many technical issues, but the truly thorny problems are legal and commercial.

Facing these challenges, building a personal Memex is undoubtedly a good place to start. Such a device records everything you see, hear, or read. Of course, it must have safeguards so that only you can get at the information. On command, it can also find relevant events and display them to you. The personal Memex does not analyze or summarize data; it simply returns everything you have seen and heard.

Memex seems feasible today for everything except video. A personal record of everything you have read is about 25 GB, and a record of everything you have heard is a few TB. A personal Memex will grow by 250 MB a year to hold what you read and by 100 GB a year to hold what you hear. In three years, a single disk or tape per year may be enough. A video Memex seems beyond our technical capabilities today, but in a few decades it is likely to be economical. High-quality video needs hundreds of times more capacity: about 80 TB a year. That is a great deal of storage, 8 PB per lifetime (1 PB = 10^15 bytes), which is more than most people can afford. If you want still higher image quality, the figure could be 10 times larger, that is, about 80 PB. On the other hand, object recognition techniques may make much more efficient image compression possible. To hold the rate to 1 TB a year, the best that current compression technology can offer is about 10 TV-quality frames per second, and quality is expected to improve at least 100-fold each decade. Capturing, storing, organizing, and presenting video information will be an enticing long-term research goal.
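
The storage figures in this paragraph can be cross-checked with a little arithmetic. The sketch below assumes a recording span of 100 years, which the text does not state but which makes the yearly rates agree with the lifetime totals; the names `RATE_PER_YEAR`, `lifetime_bytes`, and `human_readable` are made up for the illustration.

```python
# Decimal storage units, matching the text's 1 PB = 10^15 bytes.
MB, GB, TB, PB = 10**6, 10**9, 10**12, 10**15

# Yearly capture rates quoted in the text.
RATE_PER_YEAR = {"read": 250 * MB, "heard": 100 * GB, "video": 80 * TB}

LIFETIME_YEARS = 100  # assumption: not stated in the text


def lifetime_bytes(rate_per_year, years=LIFETIME_YEARS):
    """Total bytes recorded over a lifetime at a constant yearly rate."""
    return rate_per_year * years


def human_readable(n_bytes):
    """Format a byte count using the largest decimal unit that fits."""
    for unit, size in (("PB", PB), ("TB", TB), ("GB", GB), ("MB", MB)):
        if n_bytes >= size:
            return f"{n_bytes / size:g} {unit}"
    return f"{n_bytes} bytes"


for medium, rate in RATE_PER_YEAR.items():
    print(medium, human_readable(lifetime_bytes(rate)))
```

Under that assumption, reading comes to 25 GB and video to 8 PB, matching the text, while hearing comes to 10 TB, the same order of magnitude as "a few TB".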

7. World Memex: Build a system that, given a "complete works" corpus of text (and likewise music, images, art, and movies), can summarize it and answer questions about it as quickly and accurately as a human expert in the field.

In 1999, the global storage industry produced a total of about 1 EB (1 EB = 10^18 bytes) of disk and 100 EB of tape, and the price of nearline tape and online disk was about $10,000 per TB, or 10 million US dollars per PB. The era in which all existing information can be stored very cheaply is therefore close at hand.

Where will this vision take us? If everything is stored in cyberspace, how will we find anything? Today, search relies on titles and the first few words, and websites provide some summaries, but very little real analysis or abstraction is done.

The personal Memex only returns what it has recorded, so the next challenging goal is to build a Memex that can analyze a "complete works" corpus and present it in a very convenient way. Raj Reddy, the 1994 Turing Award winner, has described a system that can read a textbook and then correctly answer the questions posed in it. A more demanding task is to summarize the Internet, the computer science journals, or the Encyclopaedia Britannica, and to answer questions about them as well as a human expert in the field.

Once we have mastered text, the obvious next step is to build a similar system to organize a sound library (speech, conversation, music, and so on); the challenge after that is to build a system that can collect, organize, and integrate photos, movies, and other images. The main challenge is to analyze and organize the information automatically. A person should be able to pose a question in natural language, by gesture, graphically, or through a window interface, and the system should answer in an appropriate form.

This is the task before us. It may be simpler than making a computer behave like a person, but it would be truly useful.

8. Remote intervention (telepresence): Let an observer experience a past event as though present when it happened (remote observation), or let a participant take part in an ongoing event and interact with others as though actually there (remote attendance).

One reason that recording everything is interesting is that it lets other people see an event or follow it later. Most of us find this "time shifting" even more valuable than "space shifting". At present, however, only TV and radio broadcasts are handled this way, and doing it over the network amounts to turning the Internet into the world's most expensive VCR.

With computers and virtual reality technology, a high-quality on-the-spot experience becomes possible. By recording an event from many angles in high fidelity, the computer can reconstruct a high-fidelity view from any vantage point and give the observer the full experience. The challenge is to record an event and then generate a virtual environment on demand, so that the observer experiences the event as an actual participant would. We call this "remote intervention". Today's TV and radio offer a low-quality version of this, but they are completely passive.

The next challenge is to let remote participants interact with the people on site, that is, to be present from a distance. Remote attendance exists today in the form of the telephone, teleconferences, and chat rooms. However, these are much worse than actually being there, so people are still willing to travel long distances for a more real experience. One operational test of remote attendance is whether students attending remotely do as well as students taught face to face, and whether the teacher has an equally good rapport with both groups.

9. Trouble-free systems: Build a system that is used by millions of people every day and yet needs only one part-time administrator.

What happens when computers become ever faster, with ever more memory and bandwidth? Of course, unlimited capacity is not going to arrive any time soon. But the accelerating improvement in price/performance is astonishing, and it changes the rules of the game. When processing, memory, and transmission cost almost nothing, the only value lies in data and its organization. There is, however, a problem that computer scientists must solve: even the best programs have about one bug per thousand lines, and a computer costs at least thousands of US dollars in system management.

Today, the purchase price of a computer is not large: a palmtop or desktop costs hundreds of dollars, a workstation thousands, and a server only about a million. Owners certainly do not want to pay even more in wages for staff to manage these systems, so the systems must be self-organizing. For a simple system such as a handheld computer, the customer needs it to work and store data without faults and never to lose data. When the system needs repair, it can "phone home" for service, or a replacement system or module can be sent by mail, with no loss of information. If the problem is in software or data, the system simply refreshes itself from a server that is available everywhere. If you buy a new part, all you have to do is plug it in and restore it from the server, just as when an old device fails.

So who manages the servers that such systems depend on? Server systems are often more complex: many semi-custom applications run on them under heavy load, and they often provide vital services. To some extent, the complexity has not disappeared; it has just moved.

In fact, people who own servers do not care how servers are managed, and they do not want to become system management experts. Server systems must therefore be self-managing: the administrator only sets goals, policies, and budgets, and the system does everything else, including allocating work among servers. When a new module is plugged in, it should be integrated into the system automatically. When a server fails, its storage should already have been replicated elsewhere, so that storage and computation can continue in the new location. When hardware fails, the system should diagnose itself and order a replacement by express mail. Software and hardware upgrades should also be automatic.

This leads to a new goal: trouble-free systems. The ultimate aim is a system that serves millions of people every day and can be managed very well by one person spending only a small amount of time. At present, such a system usually needs experienced administrators on duty 24 hours a day, and experts are needed for upgrades, maintenance, system updates, database administration, backup, network management, and so on.

10. Secure systems: Ensure that the system serves only authorized users, that unauthorized users cannot shut down the service, and that information cannot be stolen (and prove it).

11. Always up: Ensure that the system is unavailable for less than 1 second per 100 years (and prove it).

Recently there has been a series of security problems, such as the Melissa virus, the Chernobyl virus (CIH virus), and a mathematical attack on the RSA algorithm that makes 512-bit keys look too small and dangerously weak.

If this trend continues, we cannot entrust our property to cyberspace. The major challenge for system designers is to build systems that serve only authorized users and cannot be made to refuse them service. Attackers must be unable to destroy data or to force the system to deny service to authorized users, and users must be unable to see data unless they are authorized to.

Since most systems let hackers get inside by posing as an authorized user, any authentication based on passwords or other tokens seems unsafe. I think we will have to use methods such as retinal scans to prevent impersonation. In addition, all software should be signed in a way that cannot be forged.

The first operational test of this research goal is that hackers cannot penetrate the system. Unfortunately, passing this test does not really guarantee security; a secure system must understand the threats from every direction and be able to defend against them.

The second test is whether the system is always available. System availability has risen from 90% in the 1950s to 99.99% today (for well-managed systems). The Internet, because of its fragility, its lack of uniformity, and the over-emphasis on getting to market, reaches only 99%.

Even so, we have improved availability 1,000-fold in 45 years, one order of magnitude every 15 years. We should strive for a further improvement of about 100,000-fold: unavailable for only 1 second in 100 years. This is an extreme goal, but with very cheap hardware and abundant bandwidth it seems achievable. Services can be replicated at many sites, transactions can keep the replicated data consistent, failed modules can be masked, and failed nodes can be repaired quickly. A goal like this cannot be verified by testing, so achieving it will require very careful and detailed analysis.
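
The availability figures above can be checked by working with unavailability, the fraction of time a system is down. The sketch below assumes a year of 365.25 days; the function names are made up for the illustration.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumption: average year length


def unavailability(availability):
    """Fraction of time the system is down."""
    return 1.0 - availability


def downtime_seconds_per_year(availability):
    """Expected downtime per year, in seconds."""
    return unavailability(availability) * SECONDS_PER_YEAR


def improvement_factor(old_availability, new_availability):
    """How many times smaller the downtime has become."""
    return unavailability(old_availability) / unavailability(new_availability)


# 90% in the 1950s versus 99.99% today: a 1,000-fold improvement.
past_gain = improvement_factor(0.90, 0.9999)

# Target: at most 1 second of downtime per 100 years.
target_unavailability = 1.0 / (100 * SECONDS_PER_YEAR)
further_gain = unavailability(0.9999) / target_unavailability

print(f"past gain: {past_gain:.0f}x")
print(f"downtime at 99.99%: {downtime_seconds_per_year(0.9999):.0f} s/year")
print(f"further gain needed: {further_gain:.0f}x")
```

Under these assumptions, the further gain comes out at roughly 3 x 10^5, so the text's "100,000-fold" is best read as an order-of-magnitude figure.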

12. Automatic programming: Develop a specification language or user interface that (a) makes it easy to express designs, (b) computers can compile, and (c) can describe all applications (that is, it is complete). The system should reason about the whole application by asking questions, and it should be convenient to use.

Writing software is the only thing in cyberspace that is getting more expensive and less reliable. Individual pieces of software are not really getting less reliable, but even after repeated testing a program still has about one bug per thousand lines. As software products grow, they contain more and more bugs.

You may ask why software is so expensive. The answer is simple: designing, writing, and documenting a program costs about 20 US dollars per line, and testing the code costs 150% of that. Once the code is delivered, maintaining and supporting it over its life cycle costs as much again as everything above. This is a harsh reality. As computers get cheaper, there will be more and more programs, so this burden will keep growing.

How can we solve this problem and escape from this situation? So far, the solution has been to use high-level non-procedural languages so that less code has to be written. This has been very successful, but it does not help when an entirely new program has to be written. Perhaps software engineering will eventually deliver, but I am pessimistic about it. I think a new approach is needed, though it may not come quickly, because this is a "Turing tar pit". I think we must: (1) have a specification language that is more powerful and easier to use than today's; (2) have computers compile that specification language; (3) make the specification language powerful enough to describe all applications.
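
The per-line cost figures above add up as follows. The sketch assumes that "as much again" means maintenance equals the combined cost of writing and testing, which is one reading of the text; the function names and default parameters are made up for the illustration.

```python
def cost_per_line(build=20.0, test_ratio=1.5, maintenance_ratio=1.0):
    """Lifetime cost of one line of code, in US dollars.

    build: design, coding, and documentation cost per line.
    test_ratio: testing cost as a fraction of the build cost.
    maintenance_ratio: lifetime support cost as a fraction of the
        delivered cost (assumed to be 1.0, i.e. "as much again").
    """
    test = build * test_ratio
    delivered = build + test
    maintenance = delivered * maintenance_ratio
    return {
        "build": build,
        "test": test,
        "maintenance": maintenance,
        "total": delivered + maintenance,
    }


def expected_bugs(lines, bugs_per_thousand_lines=1.0):
    """Residual bugs at the rate of one per thousand lines quoted in the text."""
    return lines / 1000 * bugs_per_thousand_lines


costs = cost_per_line()
print(costs)
print(expected_bugs(1_000_000), "bugs expected in a million-line program")
```

On this reading, a line that costs 20 dollars to write ends up costing about 100 dollars over its lifetime, and a million-line program still ships with about a thousand bugs.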

At present, some systems meet two of these three requirements, but none meets all three. Program design today is essentially an iterative dialogue: the customer finds a designer and describes the application; the designer returns a proposed design; there follows discussion, a prototype, and more discussion; finally the application is built. Changing this process requires automatic programming. Research on automatic programming languages and systems has a 45-year history, but progress is still small.

The operational test of this goal is to replace the program designer with a computer and get a better result in less time. Such a system is still some way off, but if Turing was right about machine intelligence, it is only a matter of time.

When reposting, please cite the original address: https://www.9cbs.com/read-19963.html
