Monday, June 20, 2022

[ohenhfle] measuring strategic richness with layered time controls

start with a collection of chess engines, all at or below some baseline strength.  by "engine" we mean an instance of an engine program with chosen parameters, typically its time control parameters.

next, collect another set of engines, each of which averages at least 2/3 points per game in matches against every engine in the baseline set.  choose the best engine from this second set, where best means the engine that achieves 2/3 points per game with the shortest time control.  let T1 be that time control.
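a minimal python sketch of choosing T1.  it assumes a hypothetical `score(a, b)` giving the mean points per game engine a scores against engine b.  in a real measurement that number would come from playing long matches; here a toy logistic Elo model stands in for it, and every engine name, time control and rating is made up.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

TARGET = 2 / 3  # required mean points per game


@dataclass(frozen=True)
class Engine:
    name: str
    time_control: float  # hypothetical unit: seconds per move
    rating: float        # hidden toy strength, used only by elo_expected below


# score(a, b): mean points per game that engine a scores against engine b
ScoreFn = Callable[[Engine, Engine], float]


def elo_expected(a: Engine, b: Engine) -> float:
    """Toy stand-in for a real match: the logistic Elo expected score of a vs b."""
    return 1.0 / (1.0 + 10 ** ((b.rating - a.rating) / 400))


def pick_t1(baseline: Sequence[Engine], candidates: Sequence[Engine],
            score: ScoreFn = elo_expected) -> Optional[Engine]:
    """Among candidates scoring >= 2/3 against EVERY baseline engine,
    return the one with the shortest time control, or None if none qualifies."""
    qualifying = [c for c in candidates
                  if all(score(c, b) >= TARGET for b in baseline)]
    return min(qualifying, key=lambda c: c.time_control, default=None)


# usage with made-up engines
baseline = [Engine("random", 0.0, 0.0), Engine("mate-in-1", 0.0, 100.0)]
candidates = [
    Engine("x", 0.1, 150.0),  # too weak against mate-in-1
    Engine("y", 0.2, 240.0),  # qualifies
    Engine("x", 0.4, 260.0),  # qualifies, but needs more time
]
best = pick_t1(baseline, candidates)
print(best.name, best.time_control)  # y 0.2
```

the same program at two time controls ("x" at 0.1 and 0.4) counts as two different engines, matching the definition of "engine" above.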

consider all engines that meet the T1 time control.  we will call these the T1 engines.  the T1 engines only need to meet the time control; they do not need to achieve 2/3 against the baseline above.  consider another set of engines that achieve at least 2/3 points per game against every engine among the T1 engines.  note well: these second tier engines must score 2/3 against every T1 engine, not just against the "best" engine above that defined the T1 time control.  the T1 engines might include engines that in general play poorly but are tuned specifically to play well against some candidate second tier engine (inspired by the various engines nowadays that are tuned against Stockfish).  among these second tier engines, choose the best, again the one that reaches its 2/3 level of strength with the shortest time control.

note well, engines are playing at unequal time controls: the T1 engines play at the T1 time control, while each second tier engine plays at whatever time advantage it needs to score 2/3 points per game.  ponder must be disabled, so that no engine thinks on its opponent's clock.  let T2 be the time control of the best second tier engine.
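a minimal python sketch of the second tier rule, showing why "against every T1 engine" matters.  it assumes match results are already available as a hypothetical table of mean points per game; a table, unlike a single rating per engine, can express an engine tuned against one particular opponent.  all names and numbers are made up.

```python
from typing import Dict, List, Optional, Tuple

TARGET = 2 / 3  # required mean points per game

# Hypothetical match results: (challenger, opponent) -> mean points per game for
# the challenger. Each challenger plays at its own, longer time control; each
# opponent plays at T1; pondering is off in every game.
Results = Dict[Tuple[str, str], float]


def pick_next_tier(t1_pool: List[str], challengers: Dict[str, float],
                   results: Results) -> Optional[Tuple[str, float]]:
    """t1_pool: every engine that meets T1, however weak or specialised.
    challengers: challenger name -> the time control it plays at.
    Returns (name, time control) of the qualifying challenger with the
    shortest time control, where qualifying means >= 2/3 against EVERY
    engine in t1_pool; None if nobody qualifies."""
    qualifying = [(name, tc) for name, tc in challengers.items()
                  if all(results[(name, opp)] >= TARGET for opp in t1_pool)]
    return min(qualifying, key=lambda pair: pair[1], default=None)


# made-up example: "anti-a" plays poorly in general but is tuned against "a"
t1_pool = ["t1-best", "anti-a"]
challengers = {"a": 1.0, "b": 3.0}
results = {
    ("a", "t1-best"): 0.75, ("a", "anti-a"): 0.50,
    ("b", "t1-best"): 0.70, ("b", "anti-a"): 0.80,
}
print(pick_next_tier(t1_pool, challengers, results))  # ('b', 3.0)
```

against "t1-best" alone, "a" would have set T2 = 1.0; the tuned "anti-a" pushes T2 out to 3.0.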

continuing this way, we can define a sequence of time controls, each of which has an engine that scores at least 2/3 points per game against every engine at the preceding, shorter time control.  continue until the time control becomes so long that testing is infeasible.  the number of time controls obtained, i.e., the number of levels, measures the strategic richness, or depth, of the game.
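a minimal python sketch of the whole procedure as a loop over a grid of affordable time controls, whose last entry plays the role of the feasibility limit.  `engines_meeting(t)` and `score(a, b)` are hypothetical hooks; the usage below replaces real engines with a made-up toy in which an engine is just an Elo rating and one engine family gains 100 Elo per doubling of thinking time.

```python
import math
from typing import Callable, List, Sequence

TARGET = 2 / 3  # required mean points per game


def find_levels(baseline: Sequence[float],
                engines_meeting: Callable[[float], List[float]],
                time_grid: Sequence[float],
                score: Callable[[float, float], float]) -> List[float]:
    """Return the level time controls T1, T2, ... found on time_grid.

    time_grid: increasing time controls we can afford to test.
    engines_meeting(t): every engine that meets time control t.
    score(a, b): mean points per game for engine a against engine b."""
    levels: List[float] = []
    pool = list(baseline)  # engines the next level must beat
    for t in time_grid:  # increasing, so the first hit is the shortest t
        if any(all(score(c, opp) >= TARGET for opp in pool)
               for c in engines_meeting(t)):
            levels.append(t)
            pool = engines_meeting(t)  # next level must beat everything meeting t
    return levels


# toy model (made up): logistic Elo expected score between two ratings
def elo_expected(a: float, b: float) -> float:
    return 1.0 / (1.0 + 10 ** ((b - a) / 400))


grid = [2.0 ** k for k in range(11)]  # 1 .. 1024 time units


def family_meeting(t: float) -> List[float]:
    # ratings of the toy family at every grid time control that fits within t
    return [100 * math.log2(u) for u in grid if u <= t]


levels = find_levels([0.0], family_meeting, grid, elo_expected)
print(levels, len(levels))  # [4.0, 16.0, 64.0, 256.0, 1024.0] 5
```

in this toy, each level costs a fourfold increase in time: under the logistic model, 2/3 points per game corresponds to a rating edge of 400·log10(2) ≈ 120 Elo, and one doubling buys only 100.  a transitive rating toy cannot represent engines tuned against particular opponents, so a real measurement would need actual matches between a heterogeneous set of engines.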

what should be the baseline?  perhaps an engine that searches for a mate in N and, if none exists, plays a random move.  what value of N?  (previously, mate in 1.)
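a minimal python sketch of such a baseline engine over a hypothetical generic game interface (`moves`, `play`, `is_won`), not tied to any real chess library.  N counts the mover's own moves, so N = 1 is the mate-in-1 baseline.  a tiny take-away game stands in for chess so the example runs on its own.

```python
import random
from typing import Any, Callable, List, Optional


def forced_win(state: Any, n: int,
               moves: Callable[[Any], List[Any]],
               play: Callable[[Any, Any], Any],
               is_won: Callable[[Any], bool]) -> Optional[Any]:
    """Return a move that wins within n of the mover's own moves against any
    defence, or None. is_won(s) is True when the player who just moved into s
    has won (in chess: the side to move in s is checkmated)."""
    for m in moves(state):
        after = play(state, m)
        if is_won(after):
            return m  # immediate win ("mate in 1")
        if n > 1:
            replies = moves(after)
            # no replies without a win is a draw (stalemate), not a forced win
            if replies and all(
                    not is_won(play(after, r))
                    and forced_win(play(after, r), n - 1, moves, play, is_won) is not None
                    for r in replies):
                return m
    return None


def baseline_move(state: Any, n: int, moves, play, is_won,
                  rng: random.Random) -> Any:
    """Baseline engine: a forced win within n moves if one exists, else a random legal move."""
    m = forced_win(state, n, moves, play, is_won)
    return m if m is not None else rng.choice(moves(state))


# toy game (stand-in for chess): a pile of tokens, each move removes 1 or 2,
# and whoever takes the last token wins
def pile_moves(pile: int) -> List[int]:
    return [k for k in (1, 2) if k <= pile]


def pile_play(pile: int, k: int) -> int:
    return pile - k


def pile_won(pile: int) -> bool:
    return pile == 0


print(forced_win(4, 1, pile_moves, pile_play, pile_won))  # None
print(forced_win(4, 2, pile_moves, pile_play, pile_won))  # 1
# no forced win exists from a pile of 3, so this plays a random legal move
print(baseline_move(3, 2, pile_moves, pile_play, pile_won, random.Random(0)))
```

the search cost of this exhaustive approach grows exponentially with N, which is one practical constraint on how large N can be.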

improvements in software and hardware could change the measured strategic richness.  interpret this as improving the measurement.  it helps, and may even be necessary, for there to be a heterogeneous and active community of engine programmers seeking to make better programs.

one application: design games, e.g., chess variants, with high strategic richness, i.e., many levels.  with AlphaZero-style reinforcement learning able to learn to play a game from its rules alone, measuring strategic richness could be fully automated.

previously: the same idea, with people instead of engines.
