A Unit Testing Framework for GOAL

Introduction

We have created a unit testing framework for GOAL that takes modules as the basic unit for testing. Here, we provide several examples that illustrate how to use the framework and we discuss its main features.

We start with a simple agent that we have written for the Blocks World. The goal of this agent is to build towers by stacking blocks on top of each other. The agent that we use here has two modules: clear and build(Tower). The clear module puts all blocks on the table. The build(Tower) module builds a given tower, provided that all blocks in the tower are either clear, already in the right position, or stacked in such a way that picking up a block clears blocks needed later on.

The agent works, but only because it first puts all blocks on the table. This isn't very efficient and we'd like to improve on that. However, while doing so, we'd also like to make sure that we don't break anything.

To do this, we first create a test to make sure the build(Tower) module works correctly. Once we've created that test, we'll improve the agent and test whether it has the desired functionality. The test puts all blocks on the table, executes the build module to build two towers, and finally checks whether each tower has been built.

Test file

Tests are written in a .test2g file. We'll give a brief introduction to the layout of the test file.

%blocksworld.test2g

masTest {
	mas = "blocksworld.mas2g".
	timeout = 10. % seconds
	stackbuilder {	
		testBuildModule { 
			do clear.			

			evaluate {
				atend bel(tower([3,2,1])).
			} in do build([3,2,1]).
		
			evaluate {
				always bel(tower([3,2,1])).
				eventually bel(tower([4])).
			} in do build([9,8,7,6,5]) until bel(tower([9,8,7,6,5])).
		}
	}
}

In the masTest section there are two annotations at the top.

mas
The file name or path of the multi-agent system under test. Files are resolved by verifying whether the file name refers to an absolute path, or whether the file can be found in the local directory from which the test is run.
timeout
Optional. Timeout in seconds. A test fails if not all conditions have been satisfied within the specified timeout.

In this case, the agent that is being tested is named 'stackbuilder'. This name should match a name that can be found in the launch rule of the MAS file that is used in the test. The test section that follows can be given any name. It is used to instruct the agent to execute a module (or a single action) and to evaluate test conditions while the module (or action) is executing. The scope of the test conditions is identical to the scope of the module or action being called. That is, these conditions are evaluated from the start until the termination of the module.

Three kinds of test sections can be incorporated into a test: a do section for performing actions or modules, an assert section for verifying whether a condition holds, and an evaluate in section for evaluating test conditions while executing a module.

do
The agent will invoke the subsequent action. This can be any action from the action spec, a module invocation or a mental action.
evaluate in
Evaluates temporal mental state conditions while executing a module or action.
assert
Evaluates a mental state condition at the current moment; no temporal operators can be used.
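
As a minimal sketch of how the three section types can be combined, the test below reuses the Blocks World agent and the tower and on predicates that appear elsewhere on this page. The test name testSections is made up for illustration, and the sketch assumes that on(X,0) means that block X is on the table and that comment lines starting with % are allowed inside a test.

```
masTest {
	mas = "blocksworld.mas2g".
	timeout = 10. % seconds
	stackbuilder {
		testSections {	% hypothetical test name
			% do: run the clear module, which puts all blocks on the table
			do clear.
			% assert: check a condition on the current mental state
			assert bel(on(1,0)).

			% evaluate in: check a temporal condition while build runs
			evaluate {
				atend bel(tower([3,2,1])).
			} in do build([3,2,1]).

			% assert again, now on the state after build has finished
			assert bel(tower([3,2,1])).
		}
	}
}
```

The assert sections only look at the mental state at the point where they occur, whereas the atend condition is tied to the termination of the build module.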

The following temporal operators can be used in test conditions that occur in an evaluate in section:

eventually
The mental state condition should hold at some point in time, either before, during, or right after termination of the evaluate-in block.
always
The mental state condition should always hold, before, during, and right after termination of the evaluate-in block.
never
The mental state condition should never hold, before, during, and right after termination of the evaluate-in block.
atend
The mental state condition should hold right after the module has terminated.
Nesting these operators is possible by using the -> operator. In other words, if the test condition on the left-hand side holds, the test condition on the right-hand side is added to the set of conditions that are evaluated (keeping the current variable substitution intact).
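
To make the four operators concrete, here is a sketch of a single evaluate-in block that uses each of them once. It assumes the Blocks World agent introduced above, that never is written in the same form as always, and that on(X,0) means block X is on the table. The test name testOperators is hypothetical, and whether each condition actually holds depends on how the agent builds the tower. The -> operator is left out because its exact syntax is not shown on this page.

```
masTest {
	mas = "blocksworld.mas2g".
	timeout = 10. % seconds
	stackbuilder {
		testOperators {	% hypothetical test name
			do clear.

			evaluate {
				% block 1 is the bottom of the tower, so it should stay on the table
				always bel(on(1,0)).
				% block 1 should at no point be stacked on block 2
				never bel(on(1,2)).
				% block 2 should be on block 1 at some point during the run
				eventually bel(on(2,1)).
				% the complete tower should exist once build has terminated
				atend bel(tower([3,2,1])).
			} in do build([3,2,1]).
		}
	}
}
```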

The two following operators can indicate when we should stop evaluating an evaluate-in block; the default is to keep running until the corresponding module terminates or the timeout is reached:

until / while
The corresponding do-action will be executed until or while this mental state condition holds. A test fails if this condition is not met.
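
The sketch below illustrates both stop conditions on the Blocks World agent. The until form follows the example test file above. The while form is an assumption: it is written here by analogy, with while in place of until and the condition negated. The test name testStopConditions is hypothetical.

```
masTest {
	mas = "blocksworld.mas2g".
	timeout = 10. % seconds
	stackbuilder {
		testStopConditions {	% hypothetical test name
			do clear.
			% keep executing build until the tower is believed to exist
			evaluate {
				eventually bel(on(2,1)).
			} in do build([3,2,1]) until bel(tower([3,2,1])).

			do clear.
			% assumed syntax: keep executing build while the tower does not exist yet
			evaluate {
				eventually bel(on(2,1)).
			} in do build([3,2,1]) while bel(not(tower([3,2,1]))).
		}
	}
}
```

In both cases the timeout still applies, so a stop condition that is never reached makes the test fail rather than run forever.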

Running a test file

A test file can be run from within Eclipse if you're using the Eclipse plug-in for GOAL, or else from the command line.

In the Eclipse plug-in, you can either 'run' or 'debug' the test file (by right-clicking such a file). In debug mode, the agent stepper will break as soon as a condition fails, allowing you to inspect the agent's mental state at the code location that caused the condition to fail.

For running a test from the command line, open a console or terminal and do the following:

  • change the working directory to the GOAL installation directory
  • execute java -cp goal.jar -Djava.library.path=swifiles/libs goal.tools.Run <source> [options]

Where <source> is the .test2g file you want to run. For more information see the GOAL User Manual.

Small example

In the Blocks World example that we introduced above, the agent can build two towers when all blocks are on the table. We did not test, however, what happens when some blocks are already placed in a tower.

To limit the complexity of the test we will create a MAS file that launches a Blocks World with only 3 blocks. Next we create a test that will build a few towers.

We distinguish four test cases here:

  • Building a tower when all blocks are already in that tower.
  • Building a tower when all blocks are already in a tower in reverse order.
  • Building a tower when some blocks are in a tower in reverse order.
  • Building a tower when a block from the middle of the tower is stacked on top of a block at the bottom.

When running the test you may or may not notice that the agent never actually builds all towers.

masTest {
	mas = "smallblocksworld.mas2g".
	timeout = 10. %seconds
	stackbuilder {
		test { 
			do build([3,2,1]).	
			do build([3,2,1]).			
					
			do clear.

			do build([3,2,1]).	
			do build([1,2,3]).			
					
			do clear.
			
			do build([1,2]).
			do build([2,1,3]).		
			
			do clear.
					
			do build([3,2,1]).	
			do build([2,1,3]).
		}
	}
}

To verify that all towers are built, we add (temporal) test conditions to the build modules, starting from the second one. Using the atend operator, we check whether the agent has built the desired tower after the module has been executed. This test will fail when trying to build the last tower.

masTest {
	mas = "blocksworld.mas2g".
	timeout = 10. %seconds
	stackbuilder {
		test {
			do build([3,2,1]).
			evaluate {
				atend bel(tower([3,2,1])).
			} in do build([3,2,1]).	

			do clear.

			do build([1,2,3]).
			evaluate {
				atend bel(tower([3,2,1])).
			} in do build([3,2,1]).	
					
			do clear.
			
			do build([1,2]).
			evaluate {
				atend bel(tower([3,2,1])).
			} in do build([3,2,1]).		
			
			do clear.
					
			do build([3,2,1]).	
			evaluate {
				atend bel(tower([2,1,3])).
			} in do build([2,1,3]).
		}
	}
}

The test fails because once tower([3,2,1]) is built, block #3 is clear and can be moved to the table, but block #1 is blocked by block #2. At this point the build module can take no further actions and exits. Since this failure is expected, we change the final condition so that it states that the tower has not been built.

			do build([3,2,1]).	
			evaluate {
				atend bel(not(tower([2,1,3]))).
			} in do build([2,1,3]).

Improving the agent

We now know which towers the agent can and can't build. Using this knowledge, we can write knowledge rules that determine whether the tower we have in mind can be built. We add these knowledge rules and new program rules to the main module of the stack builder.

Files:

  • blocksworld.mas2g
  • stackBuilder.goal
  • blocksworld.mas2g
    	knowledge {
    		available([]).
    		available(T) :- tower(T).
    		available([X|T]) :- clear(X), available(T). 
    		available([X,Y|T]) :- on(Y,X), available([Y|T]).
    	}
    
    	goals{
    		on(1,0), on(2,1), on(3,2).
    	}
    
    	program[order=linearall] {
    		if goal(tower(T)),bel(not(tower(T))), bel(available(T)) then build(T).
    		if bel(not(clear)) then clear + insert(clear).
    		
    	}
    

To verify that the agent works correctly even when the build module can't build the tower, we first build a tower that blocks building the tower we need. This means that block #2 will need to be moved to the table, which we can test using the eventually operator. It also means that block #1 should always stay on the table, which we check using the always operator. Finally, we test whether the desired tower has actually been built.

masTest {
	mas = "smallblocksworld.mas2g".
	timeout = 10. %seconds
	stackbuilder {
		test { 
			do build([2,3,1]).	
			evaluate {
				always bel(on(1,0)).
				eventually bel(on(2,0)).
				atend bel(tower([3,2,1])).
			} in do main.
		}
	}
}

In the small Blocks World the agent now works correctly and efficiently. However, in a larger example, the clear module may actually undo work that has already been done. The next step would be to replace the if bel(not(clear)) then clear + insert(clear) rule with a rule that removes only a single block, preferably one that unblocks as much as possible; this is left as an exercise for the reader.

Multiple Agents

When testing multiple agents, each agent has its own test section. Because agents are inherently asynchronous, some synchronisation may be needed. An example of this is shown below. Ping sends a message to Pong, which replies with a message back. Ping then replies again and exits. Pong waits for this last message and also exits. The waitFor modules simply repeat until their goal is met. These may be useful when setting up the test requires coordination between multiple agents.

Files:

  • pingpong.mas2g
  • pingpong.goal
  • pingpong.test2g
    masTest {
    	mas = "pingpong.mas2g".
    	timeout = 10. %seconds
    	ping {	
    		test { 
    			do waitFor(pong).
    			do send(pong,ping).
    			do waitForReceived(pong,pong).
    			do reply.
    		}
    	}
    	pong {
    		test { 
    			do waitFor(ping).		
    			do waitForReceived(ping,ping).		
    			do reply + delete(received(ping,ping)).
    			do waitForReceived(ping,ping).			
    		}
    	}
    }
    
    

Unreal Tournament

Testing in Unreal Tournament showcases a few more tricks.

In addition to waiting for the server to appear, we also have to wait for the first batch of percepts from the agent. This is done using the awaitSelf module.

The server agent can be used to manipulate the environment. This is done by sending messages to the server agent through the server module. In the example below the server module is called with the action the server should execute.

To separate the test actions from the actual implementation, the simplectf.goal agent imports test-util.mod2g and server.mod2g. This allows modules used for testing to be kept separate from the regular agent.

%ut3-navigatewithpickups.test2g
masTest {
	mas = "ut3-simplectf-test.mas2g".
	bot {
		test { 
			do await(server).
			do awaitSelf.
			assert bel(game(_,'CTF-FacingWorlds',_,_)).

			do server(spawnItem('PathNode_125',weapon,shock_rifle)).

			do server(respawn('UTTeamPlayerStart_17', rotation(0,0,0))).
			
			evaluate {
				atend bel(atLocation('PathNode_126')).
				atend bel(weapon(shock_rifle,20,20)). 
			} in do navigateWithPicksups('PathNode_126').
		}
	}
}

Last modified on Mar 6, 2015, 5:51:07 PM