Agent-based models (ABMs) have often been used to investigate how decisions made by individuals within a system lead to systemic outcomes that might not be obvious from knowing those micro decisions. DeepMind (a Google-owned company) has published a paper on using machine learning techniques to study how agents in Prisoner’s Dilemma-style games learn whether to cooperate with or exploit each other. In the paper’s conclusion DeepMind joins a chorus of researchers proposing the use of agent-based modelling to assess how changes in regulations will affect behaviour, including testing for unintended consequences of policy. From an asset owner’s perspective this has potential application to the design of DC retirement arrangements – particularly the ability to model decision-making in response to choice with incomplete information, and how this might lock people into different decision paths. For anyone with a paternalistic perspective, designing such systems with a “least harm” bias makes sense. Developing the tools to test whether a system encourages harmful behaviour would seem a necessary part of that process.
In its paper (Multi-agent Reinforcement Learning in Sequential Social Dilemmas), DeepMind describes applying its experience of using neural networks for decision-making to repeated play of Prisoner’s Dilemma-style games. The results showed cooperation (playing so that both players benefit) and defection (playing for individual benefit at the expense of the other player) emerging spontaneously in each game. Whether players learnt to cooperate or defect depended both on the game being played and on the “cognitive ability” of the player.
As noted in the paper, from a structural perspective the single-play versions of the games in the paper are identical to the Prisoner’s Dilemma. However, incorporating repeated play and learning by the players introduces temporal and path dependence into the strategies employed by the players and the behaviours and outcomes that result. The paper notes that a number of real-world dilemmas that could be considered single-play Prisoner’s Dilemma-style games are actually better thought of as repeated, sequential games of the type modelled in the paper. The real-world examples given are the extraction of renewable versus non-renewable resources and the emergence of social behaviour patterns from experience of sustainable versus unsustainable social behaviours.
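To give a concrete (if highly simplified) sense of the mechanics, the sketch below shows two tabular Q-learning agents repeatedly playing a standard Prisoner’s Dilemma, each conditioning its next move on the previous round’s joint outcome. This is an illustrative toy, not DeepMind’s implementation – the paper trains deep neural networks on richer grid-world games – and the structure and parameter values here are my own assumptions.

```python
# Toy sketch: two independent Q-learning agents in a repeated Prisoner's Dilemma.
# Not DeepMind's method (which uses deep RL on grid-world social dilemmas);
# the parameters and structure here are illustrative assumptions only.
import random

COOPERATE, DEFECT = 0, 1
# Payoffs as (row player, column player) for each joint action.
PAYOFFS = {
    (COOPERATE, COOPERATE): (3, 3),
    (COOPERATE, DEFECT):    (0, 5),
    (DEFECT,    COOPERATE): (5, 0),
    (DEFECT,    DEFECT):    (1, 1),
}

class QAgent:
    """One-step-memory Q-learner: the 'state' is last round's joint action."""
    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.q = {}  # state -> [value of cooperating, value of defecting]
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def act(self, state):
        if random.random() < self.epsilon:            # occasional exploration
            return random.choice((COOPERATE, DEFECT))
        values = self.q.get(state, [0.0, 0.0])
        return COOPERATE if values[COOPERATE] >= values[DEFECT] else DEFECT

    def learn(self, state, action, reward, next_state):
        values = self.q.setdefault(state, [0.0, 0.0])
        best_future = max(self.q.get(next_state, [0.0, 0.0]))
        values[action] += self.alpha * (reward + self.gamma * best_future - values[action])

agent_a, agent_b = QAgent(), QAgent()
state = (COOPERATE, COOPERATE)                        # arbitrary starting memory
for _ in range(50_000):                               # repeated, sequential play
    act_a = agent_a.act(state)
    act_b = agent_b.act(state[::-1])                  # b sees the state from its own side
    pay_a, pay_b = PAYOFFS[(act_a, act_b)]
    next_state = (act_a, act_b)
    agent_a.learn(state, act_a, pay_a, next_state)
    agent_b.learn(state[::-1], act_b, pay_b, next_state[::-1])
    state = next_state

# Inspect the strategy agent A has learnt for each previous joint outcome.
policy = {s: "C" if v[COOPERATE] >= v[DEFECT] else "D" for s, v in agent_a.q.items()}
print(policy)
```

In runs of this kind, whether the learnt policies settle on mutual cooperation or mutual defection is sensitive to the payoffs and the agents’ learning parameters – a crude analogue of the paper’s point that outcomes depend on the game being played and on the players’ “cognitive ability”.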
While many in the investment industry aim to exploit machine learning for its potential to assist in security selection, portfolio management or trading, this paper from DeepMind shows that these advances can also help to model financial decision-making and the impact of policy in more realistic simulations. Possible applications of such modelling include insight into how market strategies might evolve, or into the unintended consequences of different regulations on the financial industry.
The DeepMind paper also shows how successful players pursued strategies that they were able to execute successfully (ie strategies they could actually implement) even when theoretically better strategies were available. In an investment context this resonates with the concept of asset owners selecting an investment strategy that their governance allows them to execute effectively, in preference to a theoretically “better” strategy which cannot be executed successfully. This suggests that, for an asset owner, understanding one’s governance and building a strategy that can be executed within that governance capability (or improving the governance capability to match the desired strategy) is the appropriate approach to take in a competitive environment. Making best use of a finite supply of governance capability requires a full exploration of beliefs and objectives in order to identify the strategies where successful execution is most likely. This is particularly important in harder areas such as sustainability.
A new study (latest draft dated 23 December 2016) by two Wharton professors (van Binsbergen and Opp) addressed a different but relevant question: how much potential value could society gain if all informational inefficiencies in current asset prices were eliminated? The authors quantitatively assessed the real value losses associated with financial market anomalies. It is well known that firms make the wrong investment decisions as a result of distortions in market prices and in the cost of capital. The maths is complicated but the conclusion is clear: society could gain value worth 10.6% of public firm net payouts by eliminating price inefficiency completely. I have used free cash flow as a reasonable proxy for the paper’s net payouts. Given the latest reading of the free cash flow yield for the S&P 500 of 4.7% (as of 23 Jan 2017), the potential value to society of eliminating all existing price inefficiencies in the S&P 500 is around 50bps of market cap. This finding, that there is still value on the table, provides an incentive for the job of price discovery. The lack of a suitable counterfactual, however, means that we cannot directly quantify the amount of value (V) that active management has delivered from a base of complete market inefficiency. Nonetheless it is probably reasonable to assume that the more efficient financial markets are, the smaller the economic gain from further reducing pricing anomalies and the higher the value society is already deriving from active investing (everything else being equal).
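For transparency, the back-of-the-envelope arithmetic behind that figure (with free cash flow standing in for the paper’s net payouts, as above) is simply:

$$ 10.6\% \times 4.7\% \approx 0.50\% \approx 50\,\text{bps of market capitalisation} $$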
There is evidence suggesting that financial markets have become more efficient. Bai, Philippon and Savov (2015) claimed that, using certain measures, prices in financial markets were 80% more efficient in 2010 than in 1960 – a date well before the first ever index fund was launched. The upward trend in market efficiency is steady throughout the 50-year sample. Along with the shift towards passive, in the last few decades we have also experienced the rise of high-cost, highly active alternative sectors such as hedge funds. It is plausible to suggest that these two trends together produced better price discovery for society (V).
How about the cost (P)? The aforementioned study by Kenneth French covered the shorter period of 1980-2006, and his data indicated a relatively stable P throughout: it started at 64bps in 1980 and ended at 66bps in 2006, with the highest readings of 74bps in 1983 and 1986 and the lowest of 56bps in 1981.
I believe this shows the investment ‘system’ is a dynamic ecosystem (or complex adaptive system, in deeper jargon). The system has acted as a search engine, looking for an improved solution – better price discovery for the same spend. It appears to have achieved that by barbelling away from traditional active towards both cheap passive and expensive, highly active strategies. So much for the past; what happens now? It is reasonable to expect the system to continue searching for an even better position, and that could involve more in passive, lower fees on hedge fund allocations, and a further shift from 200-stock traditional active portfolios to 20-stock high-conviction portfolios.
James Price