Game theory and the Nash equilibrium [reposted]

xiaoxiao, 2021-03-05

The basic assumption of game theory is that people are rational (or selfish): a rational person chooses among specific strategies with the aim of maximizing his own interest. Game theory studies how rational people make strategic choices against one another.

The classic story of the "Prisoner's Dilemma" illustrates the non-cooperative game established by John Nash and its equilibrium solution, which is called the "Nash equilibrium".

Every game involves three elements. In the prisoners' story, the two prisoners are the participants (players); confessing to or denying the murder are the strategies; and the outcome each of them finally receives is the payoff. If one prisoner confesses and the other denies, the one who denies receives the harshest sentence, death. In Nash's story both prisoners therefore confess, and both receive the intermediate outcome.

Similarly, a "Nash equilibrium" can also be seen in the theory of the "selfish gene".

In the primeval jungle of the Internet, how does the optimal strategy arise?

I. How the best strategy emerges in a game

Robert Axelrod set two premises before beginning his study of cooperation: first, everyone is selfish; second, no authority intervenes in individual decisions. In other words, individuals are free to decide in whatever way they believe maximizes their own interest. Under these premises, the questions to be studied are: first, why do people cooperate; second, when do people cooperate and when do they not; third, how can you get others to cooperate with you.

Social practice is full of such problems. Take tariff retaliation between countries: raising tariffs on other countries' goods helps protect the domestic economy, but when states raise tariffs against one another, product prices rise, competitiveness is lost, and the complementary advantages of international trade are harmed. In games of this kind, each party's pursuit of its own maximum interest damages the interest of the group. This problem is captured by the famous Prisoner's Dilemma.

A and B each represent one player, and they face exactly the same choices. C stands for cooperating and D for not cooperating. If A and B both choose C, each gets 3 points; if one chooses C and the other chooses D, the one who chose C gets 0 points and the one who chose D gets 5 points; if A and B both choose D, each gets 1 point.

Obviously, the best result for the group is for both parties to choose C: each gets 3 points, 6 points in total. If one party chooses C and the other chooses D, the total is 5 points. If both choose D, the total is only 2 points.

This matrix describes the conflict between individual rationality and group rationality: everyone pursues his own maximum interest, and the interest of the group suffers. That is the prisoner's dilemma. In the matrix, for A, when the other party chooses C, choosing D gives him 5 points and choosing C gives 3; when the other party chooses D, choosing D gives him 1 point and choosing C gives 0. So whether the other party chooses C or D, A scores more by choosing D. D is therefore a dominant strategy. When the two dominant strategies meet, that is, when A and B both choose D, each gets 1 point. This result is not the best one in the matrix. The dilemma is that when each player follows his own dominant strategy the solution is stable but not Pareto optimal, which reflects the contradiction between individual rationality and group rationality. Mathematically, this one-shot decision matrix has no optimal solution.
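
The dominance argument can be checked mechanically. The following is a minimal sketch, assuming the point values given in the text (3, 0, 5, 1); the names PAYOFF, best_reply and is_pareto_optimal are illustrative and not from the original article.

```python
# Payoffs from the text: (A's move, B's move) -> (A's score, B's score).
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}
MOVES = ("C", "D")


def best_reply(opponent_move):
    """Return A's highest-scoring move against a fixed move by B."""
    return max(MOVES, key=lambda m: PAYOFF[(m, opponent_move)][0])


def is_pareto_optimal(outcome):
    """True if no other outcome is at least as good for both players
    and strictly better for at least one of them."""
    a, b = PAYOFF[outcome]
    for other, (x, y) in PAYOFF.items():
        if other != outcome and x >= a and y >= b and (x > a or y > b):
            return False
    return True


# D is the best reply to C and also the best reply to D, so it is dominant.
dominant = {best_reply(m) for m in MOVES}
print(dominant)                        # {'D'}
print(is_pareto_optimal(("D", "D")))   # False: (C, C) gives both players more
print(is_pareto_optimal(("C", "C")))   # True
```

The stable outcome (D, D) is exactly the one the Pareto check rejects, which is the contradiction the paragraph describes.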

If the game is played many times and the players know the number of rounds, they will certainly defect against each other in the last round. That being so, there is no need to cooperate in any earlier round either, so in a repeated game with a known number of rounds the players never cooperate. If the game is repeated many times and the number of rounds is unknown, the players come to realize that if they keep cooperating and reach a tacit understanding, each of them keeps getting 3 points, whereas if they keep refusing to cooperate, each of them always gets 1 point. This is where the motive for cooperation appears. In a repeated game, future payoffs are discounted relative to present ones by a factor W; the larger W is, the more important future payoffs are. In such games, when W is large enough, the optimal strategy depends on the strategies adopted by the others. Suppose someone's strategy is to cooperate at first and, once the other party has failed to cooperate even once, never to cooperate again. Against this player, cooperating all the way is of course the best policy. If someone always cooperates no matter what the other party does, then never cooperating scores the most against him. Against those who never cooperate, one can only adopt a non-cooperative strategy as well.
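
To make the role of the discount factor concrete, here is a small sketch under stated assumptions: the opponent plays the "cooperate until betrayed once, then never again" strategy described above (commonly called grim trigger), payoffs are the 3, 5 and 1 points from the matrix, and a payoff received k rounds from now is weighted by w to the power k. The function names are hypothetical.

```python
def discounted_sum(first, rest, w):
    """Value of receiving `first` now and `rest` in every later round,
    where each later round is discounted by a further factor w (0 <= w < 1)."""
    return first + w * rest / (1 - w)


def cooperate_vs_grim(w):
    # Mutual cooperation in every round: 3 points now and 3 points forever after.
    return discounted_sum(3, 3, w)


def defect_vs_grim(w):
    # Defect at once: 5 points now, then mutual defection (1 point) forever after.
    return discounted_sum(5, 1, w)


for w in (0.3, 0.5, 0.9):
    print(w, cooperate_vs_grim(w), defect_vs_grim(w))
```

Under these assumptions the two payoffs are equal at w = 1/2 (solve 3/(1-w) = 5 + w/(1-w)); below that value the one-off gain from defecting wins, above it continued cooperation wins.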

Axelrod ran an experiment in which he invited many people to take part in the game. The scoring rules were the same as in the matrix above, and the players did not know when the game would end. He asked each entrant to write his strategy as a computer program aiming for the highest score, and then ran a round-robin tournament in which the programs played one another in pairs, to find out which kind of strategy scored highest.

The first tournament had 14 programs taking part, plus Axelrod's own random program (which chooses cooperation or non-cooperation with 50% probability each), and was run 300 times. The highest score went to "Tit for Tat", written by the Canadian scholar Anatol Rapoport. Its characteristic is to cooperate on the first move and from then on to copy the opponent's previous move: if you cooperated last time, I cooperate this time; if you did not cooperate last time, I do not cooperate this time. Axelrod also found that the top-ranked programs share three features: first, they never betray first, that is, they are "nice"; second, they always retaliate against the opponent's betrayal and do not cooperate unconditionally, that is, they are "provocable"; third, they do not keep retaliating forever over a single betrayal, but cooperate again once the opponent returns to cooperation, that is, they are "forgiving".
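
A round-robin of this kind is easy to sketch. The code below is an illustration under assumptions, not a reconstruction of Axelrod's actual programs: strategies are plain functions taking the two move histories, the field contains only a few textbook strategies, and the number of rounds is a free parameter.

```python
import random
from itertools import combinations

PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}


def tit_for_tat(own_history, their_history):
    # Cooperate first, then copy the opponent's previous move.
    return their_history[-1] if their_history else "C"


def always_defect(own_history, their_history):
    return "D"


def always_cooperate(own_history, their_history):
    return "C"


def make_random(seed):
    """Random program: cooperate or defect with 50% probability each."""
    rng = random.Random(seed)
    return lambda own_history, their_history: rng.choice("CD")


def play(strategy_a, strategy_b, rounds):
    """Play one repeated game and return (score of A, score of B)."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        a = strategy_a(hist_a, hist_b)
        b = strategy_b(hist_b, hist_a)
        pa, pb = PAYOFF[(a, b)]
        score_a, score_b = score_a + pa, score_b + pb
        hist_a.append(a)
        hist_b.append(b)
    return score_a, score_b


def round_robin(players, rounds):
    """Let every pair of distinct players meet once; return total scores."""
    totals = {name: 0 for name in players}
    for (name_a, strat_a), (name_b, strat_b) in combinations(players.items(), 2):
        a, b = play(strat_a, strat_b, rounds)
        totals[name_a] += a
        totals[name_b] += b
    return totals


field = {"TFT": tit_for_tat, "ALLD": always_defect,
         "ALLC": always_cooperate, "RANDOM": make_random(seed=0)}
print(round_robin(field, rounds=300))
```

In a field this small, always-defect can come out ahead simply by exploiting the unconditional cooperator; the advantage of Tit for Tat reported in the text was observed in a much larger and more varied field. Note also that in any single pairing `play(tit_for_tat, x, n)` never gives Tit for Tat more points than its opponent.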

To verify these conclusions further, Axelrod decided to invite more people to a second tournament, and published the results of the first one. The second tournament collected 62 programs, plus his own random program, and another competition was run. First place again went to "Tit for Tat". Axelrod summarized this tournament as follows. First, "Tit for Tat" is still the best strategy. Second, the three features mentioned earlier still hold: among the top 15 of the 63 entrants, only the eighth-placed Harrington program was "not nice", and among the bottom 15, only one was "nice". Provocability and forgiveness were confirmed as well. In addition, a good strategy must have one more feature, "clarity": the opponent should be able to recognize its pattern within three to five moves, and an overly complicated strategy is not necessarily a good one. "Tit for Tat" has good clarity, so the opponent quickly discovers its rule and therefore has to adopt a cooperative attitude.

II. The process and laws of cooperation

The "Tit for Tat" strategy scores well in a static population. In a dynamically evolving population, can such cooperators arise, develop, and survive? Does the population evolve toward cooperation or toward non-cooperation? If everyone starts out non-cooperative, can cooperation arise in the course of evolution? To answer these questions, Axelrod used the principles of ecology to analyze the evolution of cooperation. Assume that the population of strategies evolves generation by generation, under the following rules. First, trial and error: when people face their surroundings they do not know at first what to do, so they try this and try that, and keep whichever gives the better result. Second, heredity: an individual who does well leaves more descendants. Third, learning: competition is a process in which the players learn from one another, and if "Tit for Tat" is a good strategy, others are willing to learn it. Following these ideas, Axelrod designed an experiment with 63 players in which the higher a player's score in one round, the larger his share of the population in the next round, the share being an increasing function of his score. In this way the composition of the population changes as it evolves, and one can see in which direction the population is evolving.

The experimental results are very interesting. "Tit for Tat" originally made up 1/63 of the population; after 1000 generations of evolution, when the composition had stabilized, it made up 24%. Some programs disappeared in the course of evolution. One program deserves a closer look: the only "not nice" program in the top 15, Harrington's. Its strategy is to cooperate at first and, when it finds that the other party keeps cooperating, to betray suddenly. If the other party retaliates at once, it resumes cooperation; if the other party still cooperates, it goes on betraying. This program grew very quickly at first, but once the programs other than "Tit for Tat" began to disappear, it began to decline. Measured by the degree of cooperation, therefore, the population became more and more cooperative.
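
The generation-by-generation process can be sketched with a simple share-update rule. This is an illustration under assumptions, not Axelrod's actual procedure: the field has only three textbook strategies, each strategy's next-generation share is taken to be proportional to its current share times its average score against the current population, and a strategy also meets copies of itself.

```python
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}


def tit_for_tat(own, other):
    return other[-1] if other else "C"


def always_defect(own, other):
    return "D"


def always_cooperate(own, other):
    return "C"


STRATEGIES = {"TFT": tit_for_tat, "ALLD": always_defect, "ALLC": always_cooperate}


def average_score(strategy_a, strategy_b, rounds):
    """Average per-round score of strategy_a when playing strategy_b."""
    hist_a, hist_b, total = [], [], 0
    for _ in range(rounds):
        a, b = strategy_a(hist_a, hist_b), strategy_b(hist_b, hist_a)
        total += PAYOFF[(a, b)][0]
        hist_a.append(a)
        hist_b.append(b)
    return total / rounds


def next_generation(shares, score):
    """Reweight each share by the strategy's average score against the
    current population, then renormalize so the shares sum to 1."""
    fitness = {i: sum(shares[j] * score[i][j] for j in shares) for i in shares}
    mean = sum(shares[i] * fitness[i] for i in shares)
    return {i: shares[i] * fitness[i] / mean for i in shares}


def evolve(shares, generations, rounds=200):
    score = {i: {j: average_score(STRATEGIES[i], STRATEGIES[j], rounds)
                 for j in STRATEGIES} for i in STRATEGIES}
    for _ in range(generations):
        shares = next_generation(shares, score)
    return shares


start = {name: 1 / 3 for name in STRATEGIES}
final = evolve(start, generations=200)
print(final)
```

In this toy population the always-defect share shrinks toward zero once the unconditional cooperators it exploits become scarce, which mirrors the fate of the exploiting program described above.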

The evolution experiment reveals a piece of philosophy: a strategy's success should rest on the success of the other party. When "Tit for Tat" plays against a single opponent, its score can never exceed the opponent's; at best it is a draw. Yet its total score is the highest. The basis of its survival is very solid, because it lets the other party score high as well. The Harrington program is different. When it scores high, the other party necessarily scores low. Its success is built on the failure of others, and losers are always eliminated. Once the losers have been eliminated, the winner who lived by taking advantage of them is eliminated too.

Then, in a population of extremely selfish non-cooperators, can "Tit for Tat" survive? Axelrod found that, given the score matrix and the discount factor for the future, one can calculate that as long as 5% or more of the population play "Tit for Tat", these cooperators can survive; and as long as their score exceeds the population average, this cooperating group grows larger and larger and finally spreads to the whole population. Conversely, whatever proportion of a cooperative population the non-cooperators make up, they cannot gain a foothold and grow. This means that the ratchet of the evolution of cooperation is irreversible: the cooperativeness of the population keeps growing. With this inspiring conclusion Axelrod led the study of the "Prisoner's Dilemma" out of its difficulties.
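
The 5% figure can be reproduced with a short calculation under stated assumptions: the payoffs are the 5, 3, 1 and 0 points of the matrix, the discount factor is w = 0.9 (an assumed value, since the text does not state it), a newcomer playing "Tit for Tat" meets another newcomer with probability p, and the always-defect natives meet essentially only each other. The function name is hypothetical.

```python
from fractions import Fraction

# Points from the matrix: temptation, reward, punishment, sucker's payoff.
T, R, P, S = 5, 3, 1, 0


def invasion_threshold(w):
    """Smallest share p of encounters among Tit for Tat newcomers at which
    they score as much as always-defect natives score against each other."""
    w = Fraction(w)
    tft_vs_tft = R / (1 - w)              # mutual cooperation forever
    tft_vs_alld = S + w * P / (1 - w)     # exploited once, then mutual defection
    alld_vs_alld = P / (1 - w)            # mutual defection forever
    # Solve p * tft_vs_tft + (1 - p) * tft_vs_alld = alld_vs_alld for p.
    return (alld_vs_alld - tft_vs_alld) / (tft_vs_tft - tft_vs_alld)


print(invasion_threshold(Fraction(9, 10)))   # 1/21, a little under 5%
```

With w = 0.9 the three discounted scores are 30, 9 and 10, so the newcomers do better than the natives as soon as p exceeds 1/21.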

The study found that the necessary conditions for cooperation are: first, the relationship must continue, because in a one-shot game or a game with a limited number of rounds the players have no motive to cooperate; second, the other party's behavior must be reciprocated, because nobody will cooperate with a player who always cooperates no matter what. So how can cooperation be improved? First, establish lasting relationships; even love needs a marriage contract to keep the two parties together. (Why are there always swindlers at train stations? Why are fixed teams formed at work? Whenever troops are rotated there is a fight; this is how it goes on the front line.) Second, improve the ability to recognize the other party's behavior: if you cannot tell whether the other party has cooperated or not, you cannot reciprocate. Third, maintain a reputation: if you say you will retaliate, you must do it, so that people know you are not to be bullied and do not dare to refuse to cooperate with you. Fourth, a game that can be completed step by step should not be completed all at once, so as to keep the relationship long-term; trade and negotiations, for example, should proceed step by step to induce the other party to cooperate. Fifth, do not envy other people's success; "Tit for Tat" is a model of this. Sixth, do not be the first to betray, to avoid bearing the moral pressure of being the culprit. Seventh, reciprocate not only betrayal but also cooperation. Eighth, do not be too clever and always try to take advantage of others.

(The difference between playing bridge and playing mahjong)

Axelrod drew several conclusions at the end of "The Evolution of Cooperation". First, friendship is not a necessary condition for cooperation; even enemies may cooperate, as long as the relationship continues and behavior is reciprocated. For example, during the First World War the German and British armies faced a three-month rainy season in trench warfare. In those three months the two sides reached a tacit understanding not to attack each other's food supplies, leaving the fight to the death for afterwards. This example shows that friendship is not a prerequisite for cooperation. Second, foresight is not a prerequisite for cooperation either; Axelrod cites cooperation among lower animals and among plants to illustrate this. However, once humans, who do have foresight, understand the laws of cooperation, the process of cooperation speeds up. At that point foresight is useful, and so is learning.

When random interference is taken into account, that is, when players begin to betray each other because of misunderstandings, Wu Jianzhong found two revised forms of "Tit for Tat": "generous Tit for Tat", which with a certain probability does not retaliate against the opponent's betrayal, and "contrite Tit for Tat", which with a certain probability takes the initiative to stop betraying. The better the members of the population are at coping with a random environment, the better "contrite Tit for Tat" performs and the worse "generous Tit for Tat" performs.

III. Axelrod's contributions and limitations

Axelrod used mathematical and computational methods to study how to break out of the prisoner's dilemma and reach cooperation, taking the study of this problem to a new level. His mathematical proofs are undoubtedly convincing, and some of the conclusions he drew from computer simulation are remarkable discoveries. For example, the player with the highest total score did not achieve the highest score in any single game. (Compare the war between Liu Bang and Xiang Yu.)

From the perspective of sociology, the "Tit for Tat" strategy that Axelrod found can be seen as a form of "reciprocity". The motive of this behavior is self-interested, but its result benefits both sides, and through reciprocity it may well cover the widest range of social life. People form a social life through giving gifts and returning them. This order is also the one most easily understood between groups of people who otherwise have no means of communicating. For example, when Columbus landed on the American continent, his first contact with the Indians began with the exchange of gifts. Some behavior that looks purely altruistic, such as unpaid donation, is also rewarded through indirect means such as social reputation. Studying this behavior is of great significance for our understanding of social life. When the prisoner's dilemma is extended to a game among many players, it reflects a broader issue, the "social paradox" or "resource paradox". The resources that humans share are limited. When everyone tries to take a little more from them, a conflict arises between local interests and overall interests. Population problems, resource crises, and traffic jams can all be explained by the social paradox; in these problems, the key is to control everyone's behavior by studying and setting the rules of the game.

Some of Axelrod's conclusions are easily found in the moral traditions of classical Chinese culture: sayings such as "return a plum for a peach" and "if others do not offend me, I will not offend them" reflect the idea of "Tit for Tat". But these are not optimal, because "Tit for Tat" has defects in a real social life that is full of randomness. On this point, what Confucius said more than two thousand years ago, "repay kindness with kindness, repay injury with uprightness", is a wonderfully corrected strategy. "Uprightness" here means fairness: responding to the opponent's betrayal with a fair, measured return. It corrects "Tit for Tat" in the degree of retaliation: where the retaliation would have cost you 5 points, it now costs you only 3. A fair judgment thus puts an end to the endless cycle of retaliation, and civilization takes shape.

However, some of the assumptions and conclusions of the game theorists inevitably leave their research disconnected from reality. First, "The Evolution of Cooperation" contains an important implicit assumption, namely that the players in the game are completely equal. In real games, absolute equality between players is impossible. On the one hand, the players differ in actual strength. When the two sides betray each other, the result may not be 1 point each, but 5 points for the strong and 0 for the weak, so that retaliation by the weak is meaningless. On the other hand, even when the players are in fact evenly matched, some may have a gambler's psychology, believing that they are the stronger side and that a strategy of betrayal will give them an advantage. Axelrod's score matrix ignores this situation, yet this gambler's psychology is exactly what lies behind a large number of zero-sum games in society. The program can therefore be improved further on this basis.

Second, Axelrod holds that cooperation does not require expectation or trust. This is highly questionable. Players set their own tactics according to the other party's earlier tactics, and cooperation requires individuals to recognize those they have met before and to remember how they interacted, so all of this already implies "expectation". Trust may be an indispensable element for players to reach cooperation in a complex game environment. How expectation and trust can be reflected in a computer program, however, still needs to be studied.

Finally, repeated games are hard to realize fully in reality. One-shot games exist in large numbers and give rise to a great deal of non-cooperative behavior, and a party who has been betrayed often never gets the chance to retaliate. Examples are defaults during the stage of capital accumulation and nuclear deterrence between states. In such cases, for transactions to be possible and betrayal to be prevented, society must use legal means to replace "Tit for Tat" between individuals and to regulate social behavior. This is an important lesson that Axelrod's research holds for the institutional school.

Source address: http://www.chedong.com/blog/archives/000728.html

