Edict: Action Trees

Jul 02, 2019
ai edict gamedev
7 min read

A recent task I’ve been working on for `Edict' has been how to represent agent planning and behaviour.

A core idea of the Edict concept has always been indirect control of the village’s population through abstract rules. We might achieve this by allowing players to label regions of the world, modify scalar agent parameters, incentivise specific agent behaviours with learning algorithms, or any of a variety of other mechanics.

However, I am personally intrigued by some level of direct, runtime, interaction with the agents' programming.

Constraints

The game world we create aims to place up to a few dozen agents in the village for the player to guide. If we include animals and zombies in this figure we are looking at supporting perhaps an upper limit in the region of 100-200 agents. This rules out some high resource approaches but it’s still low enough to give us some flexibility.

Given the development team consists of two people it is hard to justify methods that are complex to implement or reason about; we need the ability to experiment fairly rapidly, and extend our capabilities incrementally.

This is also likely to benefit our players: I would love to provide everyone with similar tools to what we use internally.

Decision Trees

One technique that appears to fit our constraints is the `decision tree'.

We might represent a simple high level villager behaviour as below:

We show actions that an agent can perform as ovals, and conditions that lead to the agent changing behaviours as arrows.

eg: An idle agent that notices it is hungry may transition to the 'eat' node. And, if it lacks food, subsequently transition to the 'forage' node.

There are some immediate advantages for projects like Edict:

This tree structure is reasonably straightforward to manipulate.
The flow of execution has an obvious visual representation.
The behaviour exhibited by an agent is repeatable.

Ideally players will encounter surprising outcomes in our simulated world. Sometimes these surprises will be negative; this is fine if the underlying cause is traceable and the player has tools to avoid these circumstances in the future. Decision trees enable these moments of introspection fairly directly.

A notable downside, when emplying a straightforward implementation, is that there isn’t immense scope for agent learning or stochastic behaviours. However, we believe that these outcomes will flow naturally from the interactions of the whole population and their environment.

Concepts in Edict

Within Edict we have introduced three primary concepts to represent behaviours:

action	A named, stateless, bundle of planning or activity.
status	The result of executing an action
variable	A mechanism to store named, typed, values for each agent.

Action

Actions are the smallest available component of agent behaviour. They provide the mechanism that instructs an agent to perform one activity, or record planning data.

Some broad categories of currently implemented actions include:

movement	go to a coordinate, randomly walk the terrain
tasks	attack an agent, craft an item, sleep
query	find the closest hostile, choose a random accessible coordinate
compound	execute a sequence of child actions, execute a random child action

A user may specify a list of child actions to be executed at the parents discretion. This allows us to build up branching structures like those displayed in our original example.

To affect a change in the world we execute the root action every simulation tick [1]. This enacts the underlying behaviour and may dispatch further execute calls to any of its children.

And every action will receive a list of events that the evaluated agent can perceive. These include visibility changes, path blockages, agent deaths, attribute warnings, and many others.

random_explore

If we wanted to specify a naive world exploration algorithm we might do the following: choose a random point, move to the point, repeat. It’s not intelligent but it can get the job done for small worlds.

The sequence action executes its children left-to-right; recording its progress across invocations. In this case it first chooses a destination, then instructs the agent to move to the destination.

The repeat action re-evaluates its child until they fail.

The above is remarkably similar to the naive algorithm we use for testing right now. It can be conveniently serialised to the form below.

random_explore.json

{
    "action": [
        "action": "repeat",
        "children": [
            "destination::sample",
            "destination::move"
        ]
    ]
}

Status

Every call that an action handles will return a status value that indicates its progress.

success	The goal has been achieved
continue	More time is required, but the goal is (likely) achievable
failure	The goal can no longer be achieved

The values are typically used to:

avoid evaluating inconsistent actions; eg, don’t try to attack an agent that died last frame.
control which child actions are executed; eg, the success of a child of sequence allows the next sibling to execute.

Variables

However, if actions are stateless, how do we communicate between them? sequence needs to store an index to the next sibling somewhere.

For this we introduce the concept of a variable. A named, typed, persistent handle to a single piece of planning data. [2].

At load time we query each action for the list of variables it requires and then set aside sufficient memory for future use.

Each agent will own a unique instance of each of these variables; the same variable refers to a different instance for each agent. It is not possible (at this time) to operate on two agent’s variable instances simultaneously.

Random Exploration

Thus, in the 'random_walk' example from above, it is a variable that allows communication between the destination::sample and destination::move actions.

By convention we use the destination variable; a point2f that specifies the target of a move operation.

human_threatened

A slightly more complex example lifted from our test level is the rules used by humans that can see a zombie.

We didn’t have a lot to work with at this point and so settled on a very simple two pronged approach:

If another human is nearby then run up to them before attacking the zombie. Hopefully this draws the second human into the fight.
Otherwise, run away.

The 'find_friendly' section communicates by means of the target variable. Setting the target to the friendly human so we can move to it, then setting the target to the zombie so it can be attacked.

If for any reason this series of operations fails we will fail out of the sequence action, and allow the any action to execute its next child.

The goal of any is to run just one child through to success. If 'find_friendly' fails then it will try the 'alone' subtree instead. Critically, the failure which triggers this backtracking might be the target[friendly] action (if we happen to be alone).

The above can be compactly represented through the JSON definition below [3].

human_uninfected_threatened.json

{
    "action": {
        "type": "any",
        "children": [ [
                { "type": "target", "affiliation": "friendly" },
                { "type": "agent::destination" },
                { "type": "target", "affiliation": "hostile" },
                { "type": "attack" }
        ], [
                { "type": "target", "affiliation": "hostile" },
                { "type": "avoid" }
        ] ]
    }
}

Putting this together we can see the behaviour play out in our test level below:

Rationalisation

To justify all this work I will fall back on my favourite scenario from the Edict design document: the town bell.

A player formulating a town defense strategy might specify that:

When a hostile agent is sighted they are not to engage. Rather, they should return to the town centre and ring a bell.
When a bell is heard a villager should put down their current task, locate a weapon, and muster at a defense post.

The above consists of two conditions and five actions; something it seems clear we can specify quite succinctly using the methods outlined above. Yet it expresses a reasonably useful defense strategy.

Future Work

This system has been merged into the currently builds of Edict for a few weeks and has been serving our needs fairly well so far.

Upcoming work will mostly focus on getting a simple UI that allows us to experiment with these concepts.

Edict is not a programming game and so JSON isn’t an appropriate input method. Experimentation will shed some light on the most useful and fulfilling subset of these features to exposure to our players.

I’m excited to see what surprises players will give us with these tools in the near future.

There have been some mild concerns raised about performance issues when evaluating the entire tree every tick. It’s definitely something to keep in mind, however limited testing with 1000 agents hasn’t shown worrying slowdowns just yet. ↩
We record other attributes for each variable which are outside the scope of this description; such as read/write intentions, and global/local scoping ↩
We require variables typed as a constant to be specified at load time. The use of 'affiliation' as an object key in 'human_uninfected_threatened.json' is an example of this ↩

nerdcruft