Alastair Aitchison

At-a-glance functions for modelling utility-based game AI


One of my current projects involves developing a decision-making AI for autonomous characters (i.e. NPCs) in a computer game.

The idea is pretty straightforward – instead of switching between a set of finite states based on simple triggers (i.e. if Pacman has eaten a power pill then “run away”, else “chase”), each character constantly assesses the actions available to them in their current environment, and assigns a utility (i.e. the benefit) to each of those actions on a continuous scale. Then, being rational agents, each character simply chooses to perform the action that provides them with the greatest utility. I’m no psychologist, but that seems like a reasonable model of human behaviour to me.

For example, when a character is already at full health, the utility from picking up a health-giving medikit is zero, so you’d never expect the character to perform that action. As the character’s health diminishes, the relative utility of the medikit increases, but in what way? Is it a linear scale?

Or, would a character remain relatively content with 90%, 80%, or 70% health, but then feel increasing urgency to look for medikits when their energy drops under 30%, say? More like this:

Likewise, consider how the utility of searching for an ammo box changes depending on how many bullets the player already has, or how the utility of running away varies with the relative strength of an opponent’s weapon.

When deciding how to assign utility to different actions, I find it helpful to refer to the following graphs and consider which function best describes how I think a rational human player would determine the utility of that action, based on different environment variables. (Remember, because this is artificial intelligence, there are no “correct” answers – what’s nice is that you can swap different models in and out until you find something that looks right, without worrying about whether it has sound psychological reasoning.)


Step Function

If x > 0.5 then y = 1, else y = 0.

This is equivalent to the simple Boolean trigger logic that the Pacman ghosts use – “Has Pacman eaten a power pill? Then definitely run away!”. Note, however, that the utility assigned to this action when the condition is true doesn’t have to be 100% – it’s possible to step up in stages, or to step up only a certain amount, so that other actions may still have greater utility even when the condition is true.
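As a sketch, a step utility can be a one-line helper – the threshold and the output levels here are arbitrary placeholders, not values from any particular game:

```python
def step_utility(x, threshold=0.5, low=0.0, high=1.0):
    # Boolean-trigger utility: jumps from low to high once x passes threshold.
    return high if x > threshold else low
```

Setting high to something below 1.0 gives the partial step described above, so other actions can still win even when the condition holds.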


Linear Function

y = mx + c

Remember that you can change both the gradient (m) and the intercept (c), but any given increase in the underlying variable always produces the same increase in utility – the marginal utility is constant. To give the line a downward slope, set m to be less than 0.
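A minimal sketch, where m and c are whatever slope and intercept suit your game:

```python
def linear_utility(x, m=1.0, c=0.0):
    # y = m*x + c; a negative m gives a downward slope.
    return m * x + c
```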


Increasing rate of increase (i.e. an accelerating power curve)

y = x^a where a>1

As the independent variable increases, the marginal utility increases more dramatically.


Decreasing rate of increase (i.e. a decelerating power curve)

y = x^a where 0 < a < 1

When the independent variable is small, a little increase leads to a big increase in utility from that action. As the independent variable gets larger, the marginal utility increase becomes less and less.
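Both of these power curves share the same form and differ only in the exponent, so one helper covers them – assuming the independent variable has been normalised to the range 0 to 1:

```python
def power_utility(x, a):
    # y = x**a on [0, 1]: a > 1 accelerates, 0 < a < 1 decelerates.
    return x ** a
```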


Exponential Decay

y = a^x where 0 < a < 1

When the independent variable is small, a small increase causes a steep drop in utility. As the independent variable increases, each further increase reduces utility by less and less.


Sigmoid curve

y = 1/(1 + e^(-x)) (or y = 1/(1 + e^x) for the reverse)

This gives an S-shaped curve that, as defined above, is centred about x=0, but is easy to shift to make the middle of the curve (where the gradient is steepest) lie wherever is appropriate.
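A sketch of the shifted version – the centre and steepness parameters are my own additions, for moving the midpoint and sharpening the S-curve:

```python
import math

def sigmoid_utility(x, centre=0.0, steepness=1.0):
    # Rising S-curve: y = 1 / (1 + e^(-steepness * (x - centre))),
    # steepest at x = centre.
    return 1.0 / (1.0 + math.exp(-steepness * (x - centre)))
```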


Frequently, the utility of an action (y) varies not only with a single variable (x) as shown here, but with multiple variables – the utility of attacking another character may increase linearly with the value of the prize if successful, but decrease with the square of the relative strength of that character. That’s no problem – just combine the utilities as:

Utility from attacking = w1 × (prize to be won) – w2 × (relative strength)^2

By changing the functions used to calculate the utility of each action, and adjusting the weights (w1 and w2 above), you can create surprisingly sophisticated AI decision-making behaviour using only the handful of functions above.
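The weighted combination above might be sketched like this – the default weights are illustrative, not tuned values:

```python
def attacking_utility(prize, relative_strength, w1=1.0, w2=1.0):
    # Utility from attacking = w1 * prize - w2 * relative_strength**2.
    return w1 * prize - w2 * relative_strength ** 2
```

A big prize guarded by a much stronger opponent can still come out with negative utility, which is exactly the trade-off the weights let you tune.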


Choosing the Action with the Greatest Utility

As an example, suppose that your computer-controlled agent was capable of three actions: attack, heal, or reload, and that you had chosen utility curves for those actions based, respectively, on the perceived enemy strength, current health, and number of bullets held, as follows:

We can now determine what the “best” course of action would be for the character at any point in time by plugging in the current values of strength, health, and bullets and reading off the associated utility of each action. For example, starting from this situation:

The action with the greatest utility is to attack. So, the character starts to attack, and fires off a few bullets. The situation now becomes:

Now, the greatest utility comes from reloading. So the player refills the chamber of their gun. Unfortunately, in doing so they take a few hits, damaging their health:

At this point, health has become an issue, and the greatest utility now comes from healing.
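The walkthrough above can be sketched end to end. The three curves and the numbers below are hypothetical stand-ins for the figures (a falling linear curve for attacking, a reversed sigmoid centred at 30% health for healing, and a convex power curve for reloading), but they reproduce the attack → reload → heal sequence:

```python
import math

def attack_utility(enemy_strength):
    # Attacking looks less attractive as the enemy looks stronger
    # (illustrative linear curve; all inputs normalised to 0..1).
    return max(0.0, 1.0 - enemy_strength)

def heal_utility(health):
    # Urgency to heal rises sharply once health drops low
    # (reversed sigmoid, centred at 30% health).
    return 1.0 / (1.0 + math.exp(10.0 * (health - 0.3)))

def reload_utility(bullets):
    # Reloading matters more and more as the magazine empties.
    return (1.0 - bullets) ** 2

def best_action(enemy_strength, health, bullets):
    # Rational agent: pick whichever action currently offers the greatest utility.
    utilities = {
        "attack": attack_utility(enemy_strength),
        "heal": heal_utility(health),
        "reload": reload_utility(bullets),
    }
    return max(utilities, key=utilities.get)
```

With these curves, best_action(0.2, 1.0, 1.0) chooses "attack", best_action(0.2, 1.0, 0.1) chooses "reload", and best_action(0.2, 0.1, 1.0) chooses "heal" – the same sequence as the scenario described above.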

By tweaking the utility curves, you can create differences in characters’ personalities – for example, a “brave” character would derive greater utility from attacking than a more cowardly character would, so the brave character’s attacking utility curve would sit higher than the cowardly character’s. Likewise, a cautious character might place greater emphasis on reloading and healing than on attacking.