Enhance AI Observation: Add Map Geometry To Game Space
The AI agent's ability to perceive and interact with its environment is paramount to its success in any virtual world. In the fast-paced and dynamic arena of Towerfall, understanding the physical layout of the map isn't just helpful – it's crucial. Imagine trying to navigate a complex level or plan a tactical move without any awareness of walls, platforms, or obstacles. That's precisely the challenge our AI agents have faced. This article dives into TASK-013: Add map geometry encoding to observation space, a vital upgrade that equips our machine learning agents with a fundamental understanding of the game's spatial environment. By encoding static map geometry, or blocks, directly into the agent's observation space, we're moving beyond simple object tracking to providing a richer, more intuitive sense of the game world. This enhancement is a direct extension of the foundational work done in TASK-012, which focused on observation space normalization, and paves the way for more sophisticated AI behaviors and decision-making.
This endeavor is part of the larger Epic #14 (Observation & Action Space), aiming to comprehensively define and refine how our AI agents perceive and act within the game. The core idea is simple yet powerful: bots should be able to see the map. The map data, represented in the Go backend by text files where 'B' characters denote solid blocks, is static per game session. Each 'B' translates into a BlockGameObject with defined polygon points, and the entire map operates on a grid system where 20 pixels equal one meter. Typical map dimensions hover around 40x30 blocks, translating to 800x600 pixels. This static, grid-based nature lends itself perfectly to an efficient encoding strategy, ensuring our agents don't just react to immediate threats but can proactively plan their movements and strategies based on the immutable structure of the game world. This is about building smarter, more spatially aware AI.
Understanding the Game's Canvas: Encoding Map Geometry
To effectively add map geometry encoding to the observation space, we need a robust and efficient method for representing the static map data. Given that the map is unchanging within a game session and laid out on a grid, a binary occupancy grid emerges as the most suitable encoding strategy. Instead of processing individual block features, which could become cumbersome, we create a 2D grid where each cell represents a small portion of the map: a cell is marked 1.0 if it's occupied by a solid block, indicating impassable terrain, or -1.0 if it's empty and passable. This binary representation provides a clear, unambiguous signal about the traversable and non-traversable areas of the map.

The resolution of this grid is a key consideration, balancing the need for detail against the size of the observation vector. We've defined several options:

- **Full resolution**: matches the block size, resulting in 40x30 (1200 values) for a default map.
- **Downsampled 2x**: the recommended default of 20x15, yielding 300 values.
- **Downsampled 4x**: 10x8 (80 values), for a more generalized, large-scale awareness.

This flexibility allows us to tune the map representation based on the specific needs of the AI architecture and the computational resources available. The MapEncodingConfig dataclass serves as the central hub for managing these configuration parameters, ensuring consistency and ease of modification as we continue to refine our AI's perception capabilities. This approach ensures that the map's geometry is not just present but is meaningfully integrated into the agent's understanding of its surroundings, enabling more informed navigation and strategic gameplay.
Implementation: Bringing the Map into Focus
The implementation details for adding map geometry to the observation space are centered around creating new components within our bot2 framework and integrating them seamlessly. The core logic resides in bot2/observation/map_encoder.py, a new file dedicated to handling the conversion of block states into the grid-based observation format. This module works in tandem with bot2/observation/observation_space.py, where the ObservationBuilder is extended to incorporate this new data stream. At the heart of the configuration is the MapEncodingConfig dataclass. This structure holds crucial parameters such as grid_width, grid_height, and the pixel dimensions of the room (room_width_px, room_height_px). It also conveniently provides a total_size property, calculated as grid_width * grid_height, which helps in determining the final observation vector dimensions. For instance, the default configuration of 20x15 results in 300 values for the map component.
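A minimal sketch of this configuration, using the field names and the 20x15 / 800x600 defaults described above (the exact defaults in bot2 may differ):

```python
from dataclasses import dataclass


@dataclass
class MapEncodingConfig:
    """Grid resolution and room dimensions for the map encoding."""
    grid_width: int = 20       # downsampled-2x default
    grid_height: int = 15
    room_width_px: int = 800   # 40 blocks * 20 px per block
    room_height_px: int = 600  # 30 blocks * 20 px per block

    @property
    def total_size(self) -> int:
        # Number of values the map grid contributes to the observation vector.
        return self.grid_width * self.grid_height
```

With these defaults, total_size is 300, matching the recommended 20x15 grid; a full-resolution 40x30 configuration yields 1200.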
The MapEncoder class is the workhorse of this system. Its constructor accepts a MapEncodingConfig and is designed for efficiency. It maintains a cached version of the grid and a hash of the block IDs. The primary method, encode(blocks: list[BlockState]), takes a list of block states from the game state, calculates a hash of these blocks, and compares it against the cached hash. If they match, indicating no change in map geometry since the last observation, it returns the cached grid. Otherwise, it proceeds to convert the blocks into a 2D occupancy grid using _blocks_to_grid and then downsamples it to the configured resolution via _downsample_grid. The resulting flattened 1D array, with values in the [-1, 1] range, is returned. To parse the incoming block data, we've introduced a BlockState Pydantic model. This model gracefully handles the id and points (corner coordinates) of each block, providing convenient properties like center and grid_indices for easier manipulation and mapping to our occupancy grid. This structured approach ensures that the map data is accurately interpreted and transformed into a format that our AI agents can readily utilize for spatial reasoning.
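Putting those pieces together, here is a condensed sketch of the encoder. It uses a plain dataclass in place of the Pydantic BlockState for brevity, and accepts any config object exposing the grid and room dimensions named above; the real module will differ in detail:

```python
from dataclasses import dataclass
import numpy as np

BLOCK_SIZE_PX = 20  # one map grid unit: 20 px == 1 m


@dataclass
class BlockState:
    # Plain-dataclass stand-in for the Pydantic model described above.
    id: str
    points: list  # polygon corners in pixels: [(x, y), ...]

    @property
    def center(self):
        xs = [p[0] for p in self.points]
        ys = [p[1] for p in self.points]
        return (sum(xs) / len(xs), sum(ys) / len(ys))


class MapEncoder:
    """Converts block states into a flattened, cached occupancy grid."""

    def __init__(self, config):
        # config: any object with grid_width, grid_height,
        # room_width_px, room_height_px (e.g. MapEncodingConfig).
        self.config = config
        self._cached_hash = None
        self._cached_grid = None

    def encode(self, blocks):
        # Cheap change check: the set of block IDs identifies the map.
        block_hash = hash(frozenset(b.id for b in blocks))
        if block_hash == self._cached_hash:
            return self._cached_grid
        full = self._blocks_to_grid(blocks)
        coarse = self._downsample_grid(full)
        self._cached_hash = block_hash
        self._cached_grid = coarse.flatten()
        return self._cached_grid

    def _blocks_to_grid(self, blocks):
        # Full resolution: one cell per 20x20 px block; -1.0 = empty.
        rows = self.config.room_height_px // BLOCK_SIZE_PX
        cols = self.config.room_width_px // BLOCK_SIZE_PX
        grid = np.full((rows, cols), -1.0)
        for b in blocks:
            cx, cy = b.center
            row, col = int(cy // BLOCK_SIZE_PX), int(cx // BLOCK_SIZE_PX)
            if 0 <= row < rows and 0 <= col < cols:
                grid[row, col] = 1.0
        return grid

    def _downsample_grid(self, grid):
        # Max-pool to the configured resolution: a coarse cell is solid
        # (1.0) if any full-resolution cell it covers is solid.
        gh, gw = self.config.grid_height, self.config.grid_width
        fr, fc = grid.shape[0] // gh, grid.shape[1] // gw
        return grid[:gh * fr, :gw * fc].reshape(gh, fr, gw, fc).max(axis=(1, 3))
```

Max-pooling during downsampling is a deliberate choice: marking a coarse cell solid when any covered cell is solid errs on the side of caution, so the agent never mistakes a partially blocked region for open space.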
Integrating Spatial Awareness into the Observation Vector
The integration of map geometry into the AI's observation space is a carefully orchestrated process, extending the existing ObservationConfig and ObservationBuilder. The ObservationConfig now includes new fields: map_encoding (an instance of the MapEncodingConfig dataclass, with a default factory for convenience) and include_map (a boolean flag to easily toggle map encoding on or off). The total_size property of ObservationConfig is dynamically updated to reflect the addition of the map grid's size when include_map is set to True. This ensures that the observation space dimensions are always accurate.
When the ObservationBuilder.build() method is called, it first constructs the existing observation components (own player state, other players, arrows, etc.). Subsequently, if include_map is True, it extracts the block information from the game_state. This block data, typically found within ObjectStates, is then passed to the map_encoder.encode() method. The resulting map observation vector, a flattened 1D array representing the occupancy grid, is then inserted into its designated slice within the overall observation vector. This meticulous placement ensures that the map data is consistently positioned, allowing the AI to rely on a predictable structure. The updated observation vector layout now includes the map grid, significantly increasing the total number of values. For example, with the default 20x15 map grid (300 values), the new total observation vector size becomes 414 values (14 for own player + 36 for other players + 64 for arrows + 300 for the map). This expanded vector provides a much more comprehensive picture of the game state, empowering the AI with the spatial context needed for intelligent decision-making.
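The layout arithmetic can be sketched with a hypothetical build_observation helper; the component sizes are taken from the example above, and appending the map last means earlier offsets never move when it is disabled:

```python
import numpy as np

# Component sizes from the layout described above (default 20x15 map grid).
OWN_PLAYER, OTHER_PLAYERS, ARROWS = 14, 36, 64
MAP_GRID = 20 * 15  # 300


def build_observation(own, others, arrows, map_grid=None):
    # Hypothetical helper: each component occupies a fixed slice, and
    # the map grid (when enabled) fills the final slice of the vector.
    parts = [own, others, arrows]
    if map_grid is not None:
        parts.append(map_grid)
    return np.concatenate(parts)


obs = build_observation(
    np.zeros(OWN_PLAYER),
    np.zeros(OTHER_PLAYERS),
    np.zeros(ARROWS),
    np.full(MAP_GRID, -1.0),  # an all-empty map
)
# The map occupies the final slice, obs[114:414], since 14 + 36 + 64 = 114.
```

With the map disabled, the same helper returns the original 114-value vector, which is precisely why the map slice sits at the end.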
Optimizing Perception: Caching and Player-Centric Views
Caching the encoded map geometry is a critical optimization, recognizing that the map is static throughout a single game session. Recomputing the occupancy grid from scratch for every single observation would be computationally wasteful. Therefore, our implementation incorporates a smart caching strategy. The MapEncoder class stores the generated grid and a hash of the block IDs. On subsequent calls to encode(), it first calculates the hash of the current block IDs. If this hash matches the cached hash, the map hasn't changed, and the encoder can immediately return the previously computed and stored grid. This significantly speeds up observation generation after the initial computation. Only when a new game starts, indicated by a change in the block ID hash, does the encode() method perform the full computation, update the cache, and return the new grid. This ensures that the AI benefits from spatial awareness without incurring unnecessary performance overhead.
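The cache check itself reduces to a few lines. In this illustrative sketch, blocks_digest and encode_cached are hypothetical names, and the single-entry cache is module-level here only for brevity; the real MapEncoder keeps this state on the instance:

```python
import hashlib


def blocks_digest(block_ids):
    # Order-independent digest: the same map always hashes the same.
    return hashlib.sha1("|".join(sorted(block_ids)).encode()).hexdigest()


_cache = {"digest": None, "grid": None}


def encode_cached(block_ids, compute):
    # Recompute only when the digest changes (i.e. a new game/map);
    # otherwise hand back the stored grid untouched.
    key = blocks_digest(block_ids)
    if key != _cache["digest"]:
        _cache["digest"] = key
        _cache["grid"] = compute()
    return _cache["grid"]
```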
Beyond the global map encoding, we've also considered an alternative encoding strategy: a player-centric local view. This approach is particularly relevant for certain Reinforcement Learning architectures, such as Convolutional Neural Networks (CNNs), which often perform better when processing localized, grid-like inputs. Instead of a global map view, a LocalMapEncoder class can be implemented. This encoder would generate a map grid centered around the player's current position. By specifying a view_radius (e.g., 5 units), it creates a square grid of (2*view_radius + 1) x (2*view_radius + 1) cells. For a view_radius of 5, this results in an 11x11 grid, totaling 121 values. This local view dynamically updates as the player moves, always providing the AI with a relevant snapshot of the immediate surroundings. This player-centric perspective can be more efficient for agents that primarily focus on local interactions and navigation, offering a different, yet equally valuable, way to incorporate spatial information into the observation space. Both the global and local approaches offer distinct advantages, and the choice between them may depend on the specific AI model and task requirements.
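A sketch of what such a LocalMapEncoder might look like. Padding out-of-bounds cells as solid (1.0) is an assumption on our part, chosen so that map edges read the same way as walls:

```python
import numpy as np


class LocalMapEncoder:
    """Player-centric view: a (2r+1) x (2r+1) window of the full grid."""

    def __init__(self, view_radius: int = 5):
        self.view_radius = view_radius

    @property
    def total_size(self) -> int:
        side = 2 * self.view_radius + 1
        return side * side  # 121 for the default radius of 5

    def encode(self, full_grid, player_row: int, player_col: int):
        # Pad with 1.0 (solid) so cells beyond the map boundary read
        # as walls, then slice the window centered on the player.
        r = self.view_radius
        padded = np.pad(full_grid, r, constant_values=1.0)
        window = padded[player_row:player_row + 2 * r + 1,
                        player_col:player_col + 2 * r + 1]
        return window.flatten()
```

Because the window is recomputed per frame from the (cached) full grid, the local view stays cheap while still tracking the player's position.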
Defining Success: Acceptance Criteria and Test Cases
To ensure the successful implementation of map geometry encoding, a clear set of acceptance criteria and robust test cases have been defined. The acceptance criteria serve as a checklist, guaranteeing that all essential components and functionalities are in place. These include the creation of the MapEncodingConfig dataclass with configurable grid dimensions, and the BlockState Pydantic model for accurate parsing of block data from the game state. Crucially, the MapEncoder class must correctly convert block lists into a normalized occupancy grid with values in the [-1, 1] range, where -1 signifies empty space and 1 denotes solid blocks. The implementation of caching to prevent redundant computations for static maps is also a key requirement. Furthermore, the ObservationConfig must be extended with map_encoding and include_map options, and the ObservationBuilder needs to seamlessly integrate the map encoding into the final observation vector. Adherence to type hinting guidelines and correct import structures are also specified.
To validate these implementations, a comprehensive suite of test cases is essential. These tests cover various scenarios to ensure the encoder's accuracy and robustness. Test Case 1: Empty map verifies that with no blocks present, all grid cells are correctly assigned -1. Test Case 2: Full floor map checks the representation of a simple, single-layer floor. Test Case 3: Complex map uses scenarios like gauntlet or tower_1 with multi-level platforms to ensure intricate geometries are handled properly. Test Case 4: Downsampling specifically tests the accuracy of the downsampling process, ensuring blocks spanning multiple grid cells are represented correctly. Test Case 5: Caching confirms that the caching mechanism works as expected, returning cached results for identical map configurations and recomputing when the map changes. Finally, Test Case 6: Edge cases examine scenarios like blocks at the map's boundaries or partially obscured blocks to ensure they are handled without errors. These tests, along with examples of map structures from the codebase like the default map, Tower_1, and Gauntlet map, provide a thorough validation framework, ensuring that the map geometry encoding functions reliably and accurately under diverse conditions.
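To make the first two cases concrete, here is how they might look as pytest-style tests. The blocks_to_grid function below is a simplified, hypothetical stand-in for the full-resolution conversion, keyed on block centers rather than full polygon data:

```python
import numpy as np


def blocks_to_grid(block_centers, rows=30, cols=40, px=20):
    # Minimal stand-in for the full-resolution grid conversion:
    # -1.0 everywhere, 1.0 wherever a block center lands.
    grid = np.full((rows, cols), -1.0)
    for cx, cy in block_centers:
        grid[int(cy // px), int(cx // px)] = 1.0
    return grid


def test_empty_map():
    # Test Case 1: no blocks -> every cell reads -1 (passable).
    grid = blocks_to_grid([])
    assert (grid == -1.0).all()


def test_full_floor():
    # Test Case 2: a single bottom row of blocks -> last grid row solid.
    floor = [(c * 20 + 10, 590) for c in range(40)]
    grid = blocks_to_grid(floor)
    assert (grid[29] == 1.0).all()
    assert (grid[:29] == -1.0).all()
```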
Map Layouts: The Building Blocks of Our World
The visual representation and structure of maps are fundamental to understanding how the map geometry encoding to observation space functions. The codebase provides several example maps that illustrate the variety and complexity our system needs to handle. The Default map, a relatively simple 40x30 grid, features a single row of blocks forming a floor at the bottom, represented by a line of 'B' characters in its layout file (.BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB.). This serves as a basic test case for straightforward geometry. Moving to more complex environments, the Tower_1 map (also 40x30) introduces multi-level platforms at various heights and a central vertical structure, demanding more sophisticated spatial reasoning from the AI. Finally, the Gauntlet map (42x32) presents a highly intricate obstacle course with top and bottom borders, multiple platform levels, and distinct left and right wall sections. These varied layouts, from simple floors to complex mazes, are crucial for testing the robustness of our encoding strategy. They ensure that the binary occupancy grid, whether at full resolution or downsampled, accurately captures the traversable and non-traversable areas regardless of the map's complexity. By testing against these diverse examples, we confirm that the AI agent receives a faithful representation of its environment, enabling it to navigate, strategize, and ultimately perform better in the game. The efficiency of the grid encoding, as opposed to individual block features, is particularly evident here, as it scales effectively across these different map complexities while remaining memory-efficient.
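The text format lends itself to a very simple parser. As a sketch, assuming each character cell maps to a 20x20 px block (parse_map_layout is a hypothetical helper, not the Go backend's actual loader):

```python
BLOCK_PX = 20  # one grid unit: 20 px == 1 m


def parse_map_layout(lines):
    # Turn a text layout ('B' = solid block, '.' = empty) into block
    # center coordinates in pixels, row by row, column by column.
    centers = []
    for row, line in enumerate(lines):
        for col, ch in enumerate(line):
            if ch == "B":
                centers.append((col * BLOCK_PX + BLOCK_PX // 2,
                                row * BLOCK_PX + BLOCK_PX // 2))
    return centers
```

Running this over the Default map's floor line yields 38 block centers along the bottom row, exactly the geometry the occupancy grid should reproduce.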
Future Directions and Considerations
While the current implementation of map geometry encoding to observation space provides a significant leap forward in AI spatial awareness, there are several avenues for future enhancement and consideration. The grid encoding itself is a powerful tool, offering a more memory-efficient representation compared to encoding each block as a separate feature. This efficiency is crucial as we aim to increase the complexity of observable features without exponentially growing the observation space. The current downsampling strategy strikes a balance between spatial precision and observation size, but further research into optimal downsampling ratios for different AI architectures could yield even better results. For instance, tailoring the downsampling based on the specific needs of a CNN versus a traditional feed-forward network might be beneficial.
Looking ahead, integrating CNN-based architectures that can directly process the 2D grid representation is a natural and promising next step. Instead of flattening the grid into a 1D vector, feeding the 2D occupancy grid directly into a CNN could allow the network to learn spatial hierarchies and patterns more effectively, potentially leading to more sophisticated navigation and tactical behaviors. Furthermore, while the current caching strategy effectively handles static maps within a single session, future iterations could explore dynamic map updates or procedurally generated environments, requiring more advanced caching or real-time encoding mechanisms. The consideration of a player-centric local view as an alternative encoding is also a key development, offering flexibility for different AI models. Ultimately, the goal is to provide the AI with the most informative and efficient representation of the game world possible, enabling it to master the intricacies of Towerfall. The map is encoded only once per game session due to caching, a testament to our focus on performance and efficiency as we continuously evolve our AI's capabilities.
For further exploration into AI and game development, you can find valuable resources at **OpenAI** and **DeepMind**.