Fuzzy Logic Based Traffic Control
1. Introduction
Transportation research aims to optimize the flow of people and goods. As
the number of road users constantly increases, and resources provided by
current infrastructures are limited, intelligent control of traffic will become
a very important issue in the future. However, some limitations to the usage of
intelligent traffic control exist. Avoiding traffic jams for example is thought
to be beneficial to both environment and economy, but improved traffic-flow may
also lead to an increase in demand [Levinson, 2003]. There are several models
for traffic simulation. In our research we focus on microscopic models that
model the behavior of individual vehicles, and thereby can simulate dynamics of
groups of vehicles. Research has shown that such models yield realistic
behavior [Nagel and Schreckenberg, 1992, Wahle and Schreckenberg, 2001].
Cars in urban
traffic can experience long travel times due to inefficient traffic light
control. Optimal control of traffic lights using sophisticated sensors and
intelligent optimization algorithms might therefore be very beneficial.
Optimization of traffic light switching increases road capacity and traffic
flow, and can prevent traffic congestions. Traffic light control is a complex
optimization problem and several intelligent algorithms, such as fuzzy logic,
evolutionary algorithms, and reinforcement learning (RL) have already been used
in attempts to solve it. In this paper we describe a model-based, multi-agent
reinforcement learning algorithm for controlling traffic lights.
In our approach,
reinforcement learning [Sutton and Barto, 1998, Kaelbling et al., 1996] with
road-user-based value functions [Wiering, 2000] is used to determine optimal
decisions for each traffic light. The decision is based on a cumulative vote of
all road users standing at a traffic junction, where each car votes using its
estimated advantage (or gain) of having its light set to green. We compare the performance of our
model-based RL method to that of other controllers using the Green Light District simulator (GLD). GLD is a traffic simulator that allows us to
design arbitrary infrastructures and traffic patterns, monitor traffic flow
statistics such as average waiting times, and test different traffic light
controllers. The experimental results show that in crowded traffic, the RL
controllers outperform all other tested non-adaptive controllers. We also test
the use of the learned average waiting times for choosing routes of cars
through the city (co-learning), and show that by using co-learning road users
can avoid bottlenecks.
2. Modelling and Controlling Traffic
In
this section, we focus on the use of information technology in transportation.
A lot of ground can be gained in this area, and Intelligent Transportation Systems
(ITS) have gained the interest of several governments and commercial companies [Ten-T
expert group on ITS, 2002, White Paper, 2001, EPA98, 1998]. ITS research
includes in-car safety systems, simulating effects of infrastructural changes, route
planning, optimization of transport, and smart infrastructures. Its main goals
are: improving safety, minimizing travel time, and increasing the capacity of
infrastructures. Such improvements are beneficial to health, economy, and the
environment, and this shows in the budgets allocated to ITS.
In
this paper we are mainly interested in the optimization of traffic flow, thus
effectively minimizing average travelling (or waiting) times for cars. A common
tool for analyzing traffic is the traffic simulator. In this section we will
first describe two techniques commonly used to model traffic. We will then
describe how models can be used to obtain real-time traffic information or
predict traffic conditions. Afterwards we describe how information can be communicated
as a means of controlling traffic, and what the effect of this communication on
traffic conditions will be. Finally, we describe research in which all cars are
controlled using computers.
2.1 Modelling Traffic
Traffic dynamics bear resemblance to, for example, the dynamics of fluids or of
sand in a pipe. Different approaches to modelling traffic flow can be used
to explain phenomena specific to traffic, like the spontaneous formation of
traffic jams. There are two common approaches for modelling traffic: macroscopic
and microscopic models.
2.1.1 Macroscopic models.
Macroscopic
traffic models are based on gas-kinetic models and use equations relating
traffic density to velocity [Lighthill and Whitham, 1955, Helbing et al.,
2002]. These equations can be extended with terms for build-up and relaxation
of pressure to account for phenomena like stop-and-go traffic and spontaneous
congestions [Helbing et al., 2002, Jin and Zhang, 2003, Broucke and Varaiya,
1996]. Although macroscopic models can be tuned to simulate certain driver
behaviors, they do not offer a direct, flexible, way of modelling and
optimizing them, making them less suited for our research.
2.1.2 Microscopic models.
In
contrast to macroscopic models, microscopic traffic models offer a way of
simulating various driver behaviors. A microscopic model consists of an
infrastructure that is occupied by a set of vehicles. Each vehicle interacts
with its environment according to its own rules. Depending on these rules,
different kinds of behavior emerge when groups of vehicles interact.
·
Cellular Automata.
One
specific way of designing and simulating (simple) driving rules of cars on an
infrastructure, is by using cellular automata (CA). CA use discrete partially connected
cells that can each be in a specific state. For example, a road-cell can contain a
car or be empty. Local transition rules determine the dynamics of the system
and even simple rules can lead to chaotic dynamics. Nagel and Schreckenberg
(1992) describe a CA model for traffic simulation. At each discrete time-step,
vehicles increase their speed by a certain amount until they reach their
maximum velocity. In case of a slower moving vehicle ahead, the speed will be
decreased to avoid collision. Some randomness is introduced by adding for each
vehicle a small chance of slowing down. Experiments showed realistic behavior
of this CA model on a single road with emerging behaviors like the formation of
start-stop waves when traffic density increases.
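To make these update rules concrete, here is a minimal sketch of one Nagel-Schreckenberg step in Python; the parameter values and ring-road layout are our own illustrative choices, not those of the original paper:

```python
import random

def nasch_step(road, length=100, v_max=5, p_slow=0.3):
    """One parallel update of the Nagel-Schreckenberg CA on a circular
    road. `road` maps cell index -> current speed of the car in it."""
    new_road = {}
    for pos, v in road.items():
        v = min(v + 1, v_max)                 # 1. accelerate towards v_max
        gap = 1                               # 2. count empty cells ahead
        while gap <= v and (pos + gap) % length not in road:
            gap += 1
        v = min(v, gap - 1)                   #    brake to avoid collision
        if v > 0 and random.random() < p_slow:
            v -= 1                            # 3. random slowdown
        new_road[(pos + v) % length] = v      # 4. move v cells forward
    return new_road

# 100-cell ring with 20 cars started at rest.
road = {5 * i: 0 for i in range(20)}
for _ in range(1000):
    road = nasch_step(road)
```

At higher densities, iterating this step reproduces the spontaneous start-stop waves described above.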
·
Cognitive Multi-Agent Systems.
A
more advanced approach to traffic simulation and optimization is the Cognitive
Multi-Agent System approach (CMAS), in which agents interact and communicate
with each other and the infrastructure. A cognitive agent is an entity that autonomously
tries to reach some goal state using minimal effort. It receives information from
the environment using its sensors, believes certain things about its
environment, and uses these beliefs and inputs to select an action. Because
each agent is a single entity, it can optimize (e.g., by using learning
capabilities) its way of selecting actions. Furthermore, using heterogeneous
multi-agent systems, different agents can have different sensors, goals, behaviors,
and learning capabilities, thus allowing us to experiment with a very wide
range of (microscopic) traffic models. Dia (2002) used a CMAS based on a study
of real drivers to model the drivers’ response to travel information. In a
survey taken at a congested corridor, factors influencing the choice of route
and departure time were studied. The results were used to model a driver
population, where drivers respond to presented travel information differently.
Using this population, the effect of different information systems on the area
where the survey was taken could be simulated. The research seems promising,
though no results were presented.
2.2 Predicting Traffic
The ability to predict traffic conditions is important for optimal control. For
example, if we knew that some road would become congested after some time
under current conditions, this information could be transmitted to road users
who could then avoid this road, thereby relieving the congestion.
Furthermore, if we can accurately predict the consequences of
different driving strategies, an optimal (or at least optimal for the predicted
interval) decision can be made by comparing the predicted results.
The
simplest form of traffic prediction at a junction is by measuring traffic over
a certain time, and assuming that conditions will be the same for the next period.
One approach to predicting is presented in [Ledoux, 1996], where neural
networks are used to perform long-term prediction of the queue length at a
traffic light. A multi-layer perceptron [Rumelhart et al., 1986] is trained to
predict the queue length for the next time-step, and long-term predictions can
be made by iterating the one-step predictor. The resulting network is quite
accurate when predicting ten steps ahead, but has not yet been integrated into
a controller.
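The iteration scheme itself is simple; the following is a hedged sketch, where the one-step predictor stands in for the trained multi-layer perceptron and the sliding-window input format is our assumption, not Ledoux's actual setup:

```python
def iterate_prediction(predict_one_step, recent_queue_lengths, horizon):
    """Long-term prediction by iterating a one-step queue-length
    predictor, feeding each prediction back in as input."""
    window = list(recent_queue_lengths)
    forecast = []
    for _ in range(horizon):
        q_next = predict_one_step(window)  # predicted queue length at t+1
        forecast.append(q_next)
        window = window[1:] + [q_next]     # slide the input window forward
    return forecast

# Example with a dummy predictor (moving average), ten steps ahead:
print(iterate_prediction(lambda w: sum(w) / len(w), [4, 6, 5, 7], 10))
```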
A
traffic prediction model that has been applied to a real-life situation, is
described in [Wahle and Schreckenberg, 2001]. The model is a multi-agent system
(MAS) where driving agents occupy a simulated infrastructure similar to a real
one. Each agent has two layers of control; one for the (simple) driving
decision, and one for tactical decisions like route choice. The real world
situation was modelled by using detection devices already installed. From these
devices, information about the number of cars entering and leaving a stretch of
road is obtained. Using this information, the number of vehicles that take a
certain turn at each junction can be inferred. By instantiating this
information in a faster-than-real-time simulator, predictions on actual traffic
can be made. A system installed in Duisburg uses information from the existing
traffic control center and produces real-time information on the Internet.
Another system was installed on the freeway system of North Rhine-Westphalia, using
data from about 2,500 inductive loops to predict traffic on 6,000 km of roads.
2.3 Controlling Traffic by Communicating Traffic Conditions
Once
accurate (predictive) information is available, there are several ways of
communicating it to road users. Drivers could be presented with information
through dynamic road signs, radio, or even on-board navigation systems. Several
studies have shown the effects of the availability of relevant information.
Levinson (2003) uses a micro-economic model of the cost of a trip, in which the
provision of travel information increases system reliability, since congestions
can be better avoided. This results in a shift of the supply curve.
Experiments show that increasing the
percentage of informed drivers reduces the average travel time for both
informed and uninformed drivers. The travel time reduction is largest in crowded
traffic. In the case of unexpected congestions (for example due to accidents)
informed travellers reduce their travel time by switching routes, but as a
result of this the alternative routes become more crowded, possibly increasing
travel times for uninformed drivers.
Emmerink
et al. (1996) present the results of a survey taken amongst drivers in
Amsterdam. The results show that 70% of the drivers sometimes use information
presented through radio or variable message signs to adapt their routes. Both
media are used in similar ways, and commuters are less likely to be influenced
by information than people with other trip purposes. Business drivers indicated
that they would be willing to pay for in-vehicle information. Simulation could
be used to test control strategies before they are implemented in real-life environments.
Ben-Akiva et al. (2003) tested different strategies for a large highway
project, such as highway access control, drivers’ route choice, and lane
control. The simulator offered a way of testing different configurations of
vehicle-detectors, and showed that interacting control systems could actually
worsen traffic conditions. Integration of strategies requires careful analysis
of the impact of each component to avoid interference. A simulator proves to be
a useful tool for such an analysis.
2.4 Vehicle Control
It is a well-known fact that traffic flow would increase drastically if all
drivers drove at the same (maximum) speed. It is equally clear that this will
never happen as long as drivers decide for themselves. In this section we first show how
vehicles could learn to cooperate. We then describe an ambitious research
program that aims to control all vehicles by on-board computers. Moriarty and
Langley (1998) have used reinforcement learning for distributed traffic
control. Their approach enabled cars to learn lane selection strategies from
experience with a traffic simulator. Experimental studies showed that learned
strategies let drivers more closely match their desired speeds than
hand-crafted controllers and reduce the number of lane changes. Their approach,
like ours, focuses on distributed car-based controllers, which makes it easy to
take specific desires/goals of drivers into account such as desired speed or destination.
In
the California Partners for Advanced Transit and Highways (PATH) program, the
Automated Highway System (PATH-AHS) project aims to completely automate traffic
[Horowitz and Varaiya, 2000]. Cars on special roads would travel in so-called
platoons. A platoon is a number of cars that travel at high speed, with little
distance in between. Each car controls its own speed and lateral movement, and
makes sure it follows the leader. The leader navigates the platoon, and makes
sure that there is enough space between platoons. In order to optimize flow, a
platoon leader receives information about the optimal speed from a roadside coordinating
system. Because of this, and the fact that there is little distance in between cars
in a platoon, an AHS is said to be able to increase road capacity by a factor
of about four. Another aspect of traffic control is controlling traffic lights
in a way that minimizes the time drivers have to wait. We will describe
previous research in this area and our car-based, multi-agent reinforcement
learning algorithm in section 3. A reinforcement learning (RL) agent is able to
learn a policy (or plan) that optimizes the cumulative reward.
3. Traffic Light Control
Traffic
light optimization is a complex problem. Even for single junctions there might
be no obvious optimal solution. With multiple junctions, the problem becomes
even more complex, as the state of one light influences the flow of traffic
towards many other lights. Another complication is the fact that flow of
traffic constantly changes, depending on the time of day, the day of the week,
and the time of year. Roadwork and accidents further influence complexity and
performance.
In
practice most traffic lights are controlled by fixed-cycle controllers. A cycle
of configurations is defined in which all traffic gets a green light at some
point. The split time determines for how long the lights should stay in each
state. Busy roads can get preference by adjusting the split time. The cycle
time is the duration of a complete cycle. In crowded traffic, longer cycles
lead to better performance. The offset of a cycle defines the starting time of
a cycle relative to other traffic lights. Offset can be adjusted to let several
lights cooperate, and for example create green waves. Fixed controllers have to
be adapted to the specific situation to perform well. Often a table of
time-specific settings is used to enable a light to adapt to recurring events
like rush hour traffic. Setting the control parameters for fixed controllers is
a lot of work, and controllers have to be updated regularly due to changes in
the traffic situation. Unique events cannot be handled well, since they require a
lot of manual changes to the system. Fixed controllers could respond to
arriving traffic by starting a cycle only when traffic is present, but such
vehicle actuated controllers still require lots of fine-tuning. Most research
in traffic light control focuses on adapting the duration or the order of the
control cycle. In our approach we do not use cycles, but let the decision
depend on the actual traffic situation around a junction, which can lead to
much more accurate control. Of course, our approach requires that information
about the actual traffic situation can be obtained by using different sensors
or communication systems. We will first describe related work on intelligent
traffic light control, and then describe our car-based reinforcement learning algorithm.
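Before doing so, to make the fixed-cycle terminology above concrete, here is a minimal sketch of such a controller; the phase encoding is an illustrative assumption:

```python
class FixedCycleController:
    """Fixed-cycle controller sketch. `phases` is a list of
    (safe light configuration, split time) pairs; the splits sum to the
    cycle time, and `offset` shifts the cycle relative to a global clock
    so neighbouring junctions can form green waves."""

    def __init__(self, phases, offset=0):
        self.phases = phases
        self.offset = offset
        self.cycle_time = sum(split for _, split in phases)

    def configuration(self, t):
        t = (t - self.offset) % self.cycle_time
        for config, split in self.phases:
            if t < split:
                return config
            t -= split

# A busy north-south road gets a longer split than the east-west road.
controller = FixedCycleController(
    phases=[("NS-green", 40), ("EW-green", 20)], offset=10)
print(controller.configuration(t=55))  # -> "EW-green"
```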
3.1 Related Work in Intelligent
Traffic Light Control
3.1.1 Expert Systems.
An
expert system uses a set of given rules to decide upon the next action. In
traffic light control, such an action can change some of the control
parameters. Findler and Stapp (1992) describe a network of roads connected by
traffic light-based expert systems. The expert systems can communicate to allow
for synchronization. Performance on the network depends on the rules that are
used. For each traffic light controller, the set of rules can be optimized by
analyzing how often each rule fires, and the success it has. The system could
even learn new rules. Findler and Stapp showed that their system could improve
performance, but they had to make some simplifying assumptions to avoid too
much computation.
3.1.2 Prediction-based optimization.
Tavladakis
and Voulgaris (1999) describe a traffic light controller using a simple
predictor. Measurements taken during the current cycle are used to test several
possible settings for the next cycle, and the setting resulting in the least
amount of queued vehicles is executed. The system seems highly adaptive, and
maybe even too much so. Since it only uses data of one cycle, it could not
handle strong fluctuations in traffic flow well. In this case, the system would
adapt too quickly, resulting in poor performance.
Liu
et al. (2002) introduce a way to overcome problems with fluctuations. Traffic
detectors at both sides of a junction and vehicle identification are used to
measure delay of vehicles at a junction. This is projected to an estimated
average delay time using a filter function to smooth out random fluctuations.
The control system tries to minimize not only the total delay, but the summed
deviations from the average delay as well. Since it is no longer beneficial to
let a vehicle wait for a long time, even if letting it pass would increase the
total waiting time, this introduces a kind of fairness. Data of about 15
minutes is used to determine the optimal settings for the next cycle, and even
using a simple optimization algorithm, the system performs well compared to
preset and actuated controllers.
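A plausible formalization of this objective (our reading; Liu et al. may weigh the terms differently) is to minimize J = Σ_i d_i + λ Σ_i (d_i − d̄)², where d_i is the filtered delay of vehicle i, d̄ is the average delay, and λ > 0 trades total throughput against fairness: the quadratic term penalizes letting any single vehicle wait much longer than average.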
3.1.3 Evolutionary Algorithms.
Taale
et al. (1998) compare using evolutionary algorithms (a (μ, λ) evolution strategy [Rechenberg, 1989]) to evolve a traffic
light controller for a single simulated intersection to using the common
traffic light controller in the Netherlands (the RWS C-controller). They found
comparable results for both systems. Unfortunately they did not try their
system on multiple coupled intersections, since dynamics of such networks of
traffic nodes are much more complex and learning or creating controllers for
them could show additional interesting behaviors and research questions.
3.2
Our Approach
We
are interested in the behavior of cars and traffic lights. At each time step, a
car can wait at a traffic light, drive to the next position of a road-lane, or
cross an intersection and go to another road-lane. We
define a (waiting) queue as all cars that are immediately affected by the
setting of their traffic light, because they will have to brake for a red
light. Note that a car standing on the first place of a road-lane is always in
the queue. If a car is the first in the queue and the light is green, it
crosses the intersection and starts on the last place (cell) of the new
road-lane for the next traffic light. After one or more time-steps, it will
finally end up at some place in a new queue at the next traffic light or it
reaches its destination (and exits the infrastructure). The goal is to minimize
the cumulative waiting time of all cars at all traffic lights met before
exiting the city. There are two ways of approaching this problem:
• Traffic-light based controllers.
We can make a controller for each traffic node, taking into account environmental
inputs such as the number of cars waiting at each of the 8 directions, and
learn a value function mapping environmental states and traffic node
decisions to the overall waiting time until all cars standing at the
intersection have exited the city.
• Car-based controllers. We can make a predictor for each car to
estimate the waiting time of the car alone when the light is green or red and
combine all car predictions to make the decision of the traffic light. Making a
control policy and learning a value function for each traffic node (as done by Thorpe)
has the disadvantage that the number of situations for a traffic node can be
huge.
Furthermore,
it is difficult to compute total trip waiting times for all road users standing
at the traffic node, since this quantity has a huge variance. Although we could
cope with these issues by designing a special traffic-node representation
summing individual waiting times of cars, this system does not allow for
communicating information to cars or for making decisions for cars (e.g. which
paths they should take).
3.2.1 Car-based traffic light control.
Therefore, we chose the second possibility, which allows for a lot of flexibility, since we
have the option to model each car’s destination, position, and possibly speed.
This is also a natural system, since if cars knew exactly their overall
waiting time until their destination (note that this would require static
traffic patterns) when the light is green or red, a voting system that adds all
waiting times for different traffic node decisions can be used to minimize the
overall waiting time. Furthermore, the number of states for a car is not so
large. For example if we use the information that the car is at a traffic node,
occupies some place, is at some direction, and has some destination, this makes
a feasible number of car-states which can be stored in lookup tables. Note that
we take the destination address of cars into account. This allows us to compute
the global waiting time until the car has reached its destination.
Finally,
by computing different expected waiting times for a car when it is standing at
one of the different directions (traffic lights) before a traffic node, we can
also use this information to choose a path to the destination address for each
car. This makes co-adaptivity (co-learning) possible in which both the traffic
nodes and the driving policies of cars are optimized.
3.2.2 Car-based value functions.
To optimize the settings of traffic lights, we can simply sum the expected waiting
times of all cars given all possible choices of a traffic light, and select the
decision which minimizes the overall waiting time at the intersection. For
this, we need to learn a value function which estimates for each car how long
its waiting time will be until it has reached its destination
address
given that the light is green or red.
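A minimal sketch of this voting rule follows; the data layouts (sets of directions, car states as tuples, a Q-lookup table) are illustrative assumptions, not the paper's actual implementation:

```python
def choose_light_configuration(safe_configurations, queued_cars, Q):
    """Score each safe light configuration by the summed expected
    waiting times of all queued cars, and pick the minimizer.
    `safe_configurations`: list of sets of directions that may get
    green simultaneously; `queued_cars`: list of (node, dir, place,
    dest) states; Q maps (state, light) -> expected waiting time."""
    def total_waiting(config):
        return sum(
            Q[(car, "green" if car[1] in config else "red")]
            for car in queued_cars)
    return min(safe_configurations, key=total_waiting)
```

Note that minimizing the summed expected waiting time is equivalent to maximizing the summed gain Q(red) − Q(green) over the cars that receive green.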
3.2.3 Global design of the system.
Each
car is at a specific traffic node (node), a direction at that node (dir), a position in the queue (place), and has a particular destination
address (des). We are interested in estimating the
total waiting time for all traffic lights for each car until it arrives at the destination
address given its current node, direction, place, and the decision of the light
(red or green). We write Q([node, dir, place, destination], action) to
denote this value, which is sometimes abbreviated as Q([n, d, p, des], L). We write V([node, dir, place, destination]) to denote the average waiting time
(without knowing the traffic light decision) for a car at (node, dir, place) until it has reached its destination
address. Given the current traffic situation, we can make a choice for each
traffic node as follows. All cars standing at those directions of the traffic
node which are made green by a decision vote with their expected gain
Q([n, d, p, des], red) − Q([n, d, p, des], green), and the decision with the
largest summed gain (i.e., the smallest overall expected waiting time) is
executed. Note that the algorithm requires that cars can be detected at each
place (e.g., by measuring and communicating the distance to the traffic light,
or by reliable inductive loops in the roads). Furthermore, the algorithm uses
the destination address of cars, which should be communicated to the traffic light.
3.2.4 Some notes on the behavior of
the controllers.
If we analyze the learning equations and the voting mechanism, we can
characterize the behavior of the adaptive traffic light controller:
• If many cars are waiting at a traffic light, they count more
in the overall decision process than if only a single car has to wait.
Therefore long queues are usually passed first.
• If the first car has a large advantage of exiting
immediately, instead of entering a new long queue, this car may get a stronger
vote since we expect its gain to be larger.
• If cars at some light have to wait a long time, the
probability of waiting gets large. Note that if this probability goes to 1, the
overall computed expected waiting time for the red light goes to infinity. Since large waiting probabilities mean large expected waiting times
and thus large votes against having to wait, the individual waiting times of
cars are likely to be spread across the traffic lights of a traffic node. In
this perspective, the traffic light controllers are very fair.
• The traffic light controllers are adapting themselves all
the time. Therefore they can handle changing traffic flow patterns, although it
may cost some time for the algorithm to adjust the parameters.
3.2.5 Adapting the system parameters.
For
adapting the system, we only need to compute the state transition
probabilities, since the reward function is fixed (standing still costs 1,
otherwise the reward/cost is 0). First of all we compute P(L|[node,
dir, place, des]),
the probability that a light is red or green for a car with some destination at
a specific place. For computing this probability we update counters after each
simulation step: C([node, dir, place, des], L) counts the number of times the light was
red or green when a car was standing in the queue at the position determined by
place. C([node, dir, place, des]) denotes the number of cars with some
destination which have been standing in the queue at place. From this, we can compute the
probability:
P(L|[node, dir, place, des]) = C([node, dir, place, des], L) / C([node, dir, place, des])
Furthermore, we have to compute the state transition probabilities for the
first (or other) cars. In the general version this is:
P([node, dir, place, des], L, [n, d, p]). For cars which are not able to
cross an intersection, this is quite trivial. There are only a few possible
state transitions: the next place or the current place. State transitions are
stored in counter variables:
C([node, dir, place, des],
L, [n, d, p]).
If the car has made a transition to the next traffic light, it arrives at a new
place at the next road-lane, and we also update the counter variables. To
compute the requested probability, we normalize these counters by dividing each
counter value by the total number of times a car was standing at a particular
road-cell (with the specific destination of the car):
P([node, dir, place, des], L, [n, d, p]) = C([node, dir, place, des], L, [n, d, p]) / C([node, dir, place, des], L)
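In code, the counters and the resulting maximum-likelihood probabilities could be maintained as follows; this is a sketch, and the state encoding as a (node, dir, place, des) tuple is an assumption:

```python
from collections import defaultdict

class TransitionModel:
    """Counter-based estimates of the model probabilities above."""

    def __init__(self):
        self.c_state = defaultdict(int)        # C([node, dir, place, des])
        self.c_state_light = defaultdict(int)  # C([node, dir, place, des], L)
        self.c_transition = defaultdict(int)   # C([...], L, [n, d, p])

    def observe(self, state, light, next_state):
        """Update all counters after one simulation step."""
        self.c_state[state] += 1
        self.c_state_light[(state, light)] += 1
        self.c_transition[(state, light, next_state)] += 1

    def p_light(self, state, light):
        """P(L | [node, dir, place, des])"""
        n = self.c_state[state]
        return self.c_state_light[(state, light)] / n if n else 0.0

    def p_transition(self, state, light, next_state):
        """P([node, dir, place, des], L, [n, d, p])"""
        n = self.c_state_light[(state, light)]
        return self.c_transition[(state, light, next_state)] / n if n else 0.0
```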
3.3 Discussion
As
mentioned before, most traffic light controllers are fixed-cycle controllers,
in which all alternative traffic lights settings get a particular time-interval
for being green. In our approach, we use the actual situation to set the
traffic lights, and therefore we have much more possibilities for optimizing
the traffic light settings. Furthermore, our algorithm learns automatically, and
does not require any manual tuning. This also leads to an adaptive system which
can cope with changing traffic conditions. Since we use car-based value
functions instead of traffic-light based value functions as used by Thorpe
(1997), we do not need to use an approximation technique such as neural
networks for learning the value function. Instead, by using voting, we are able
to get very accurate predictions of the consequences of different traffic light
settings. Furthermore, our system is very fair; it never lets one vehicle wait
all the time, not even if competing road-lanes are much more crowded. The
fairness is a result of the learning and voting mechanism, and we did not need
to design it ad hoc. Finally, we can use the value functions for optimizing
driving policies, where cars drive to minimize their waiting times, and traffic
light controllers set the lights to minimize the waiting times as well.
This
co-learning system exploits the learned value functions and leads to very
interesting co-operative multi-agent system dynamics. We have not yet included
green waves in our system, although we have included a simple technique for
dealing with congested traffic. The bucket algorithm, which we use, propagates gains
of one traffic light to the next lane, if the first car on a lane cannot drive
onwards, because the next lane is full. This will lead to higher gain values
for setting green lights to road-lanes that are full, and for which other cars
have to wait if their light is set to green. This bucket technique is used in
our current experiments, but needs more research to work fruitfully with our
adaptive model-based RL algorithm.
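As a rough sketch of this idea (the data layout is our assumption, not GLD's actual interface):

```python
def propagate_bucket_gains(gains, blocked):
    """Bucket algorithm sketch. `gains` maps a lane id to its current
    gain; `blocked` maps a lane id to the full destination lane that
    blocks its first car (or None). A blocked lane passes its gain on
    to the full lane, raising the value of giving that full lane a
    green light so it empties first."""
    for lane, full_lane in blocked.items():
        if full_lane is not None:
            gains[full_lane] += gains[lane]
    return gains
```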
Figure 1: A screenshot from the Green Light District simulator.
4. Green Light District
We used the Green Light District (GLD) traffic simulator for our experiments.
GLD is free, open-source software, and can be downloaded from
http://sourceforge.net/projects/stoplicht/. GLD consists of an editor to define
infrastructures (based on cellular automata), a Multi-Agent System (MAS) to run
the simulation, and a set of controllers for the agents. The simulator has several
statistical functions to measure performance. In the following subsections we
describe the simulator.
4.1 The Simulator
4.1.1 Infrastructures.
An
infrastructure consists of roads and nodes. A road connects two nodes, and can
have several lanes in each direction (see Figure 1). The length of each road is
expressed in units. A node is either a junction where traffic lights are
operational (although when it connects only two roads, no traffic lights are
used), or an edge-node.
4.1.2 Agents.
There
are two types of agents that occupy an infrastructure; vehicles and traffic
lights. All agents act autonomously, following some simple rules, and get
updated every time-step. Vehicles enter the network at the edge-nodes. Each
edge-node has a certain probability of generating a vehicle at each time step.
Each vehicle that is generated is assigned a destination, which is one of the
other edge-nodes. The distribution of destinations for each edge-node can be
adjusted.
There are several types of vehicles, defined by their speed, length, and number of
passengers. For our experiments we only used cars, which move at a speed of two
units (or one or zero if they have to brake) per time step, have a length of
two units, and have two passengers.
The
state of each vehicle is updated every time step. It either moves with the
distance given by its speed, or stops when there is another vehicle or a red
traffic light ahead. At a junction, a car decides to which lane it should go
next according to its driving policy. Once a car has entered a lane, it cannot
switch lanes.
Junctions
can be occupied by traffic lights. For each junction, there are a number of possible
ways of switching the lights that are safe. At each time-step, the traffic
light controller decides which of these is the best. It can use information on
waiting vehicles and their destination, and about other traffic lights to make
this decision.
4.2 Driving Policies
At a
junction, a car has to decide which lane to go to next in order to reach its
destination. The way this decision is made can be changed by using different
driving policies.
4.2.1 Shortest Path.
For
all junctions, there is a list of shortest paths to every destination. All
paths that are no more than ten percent longer than the shortest path are in
this list. When a decision has to be made, one of these shortest paths is
selected randomly.
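A minimal sketch of this policy, assuming path length is measured in units:

```python
import random

def near_shortest_paths(paths):
    """All paths at most ten percent longer than the shortest one."""
    shortest = min(len(p) for p in paths)
    return [p for p in paths if len(p) <= 1.1 * shortest]

def shortest_path_policy(paths):
    """Pick one of the near-shortest paths uniformly at random."""
    return random.choice(near_shortest_paths(paths))
```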
4.2.2 Co-learning.
The “theory” of co-learning was already explained in section 3.2. In GLD,
co-learning is implemented as follows. When a driving decision has to be made,
all the shortest paths to the destination are collected. The shortest path with
the lowest co-learn value is selected for the car intending to cross the
intersection. The selected path should be the path to the destination with
minimal expected waiting time.
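A corresponding sketch, reusing near_shortest_paths from the previous sketch; the path representation and V-table interface are illustrative assumptions:

```python
def co_learning_policy(paths, V, car_dest):
    """Among the near-shortest paths, choose the one whose first queue
    position has the lowest learned expected waiting time V (the
    co-learn value). Each path is given as a list of (node, dir, place)
    queue positions, the first being where the car would enter."""
    def co_learn_value(path):
        node, direction, place = path[0]
        return V[(node, direction, place, car_dest)]
    return min(near_shortest_paths(paths), key=co_learn_value)
```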
5. Conclusions
In
this article we first showed that traffic control is an important research
area, and its benefits make investments worthwhile. We described how traffic
can be modelled, and showed the practical use of some models. In section 3 we
described the problem of traffic light control and several intelligent traffic
light controllers, before showing how car-based reinforcement learning can be
used for the traffic light control problem. In our approach we let cars
estimate their gain of setting
their lights to green and let all cars vote to generate the traffic light
decision. Co-learning is a special feature of our car-based reinforcement
learning algorithm that allows drivers to choose the shortest route with lowest
expected waiting time.
We
performed three series of experiments, using the Green Light District traffic
simulator. We described how this simulator works, and which traffic light
controllers were tested. The experiments were performed on three different
infrastructures. The first experiment, which uses a large grid, shows that
reinforcement learning is efficient in controlling traffic, and that the use of
co-learning further improves performance. The second experiment shows that by
using co-learning, vehicles avoid crowded intersections. This way, vehicles
avoid having to wait, and actively decrease pressure on crowded intersections.
The third experiment shows that on a more complex, city-like infrastructure,
the RL algorithms again outperform the fixed controllers, reducing waiting time
by more than 25%. The third experiment also shows that in some situations a
simplified version of the reinforcement learning algorithm performs as well as
the complete version, and that co-learning does not always increase performance.
5.1 Further Research
Although
the reinforcement learning algorithm presented here outperforms a number of
fixed algorithms, there are several improvements that could be researched. For
example, we can use communication between road-lanes to make green waves
possible, and let estimated waiting times depend on the amount of traffic on
the next road-lane.
The
co-learning driving policy might be improved as well. The current
implementation suffers from saturation and oscillation. Because all drivers on
a route choose the optimal lane, this lane might become crowded. Only when the
performance of such a lane decreases because of this crowding, drivers will
choose another lane. A less greedy form of co-learning might prevent this
effect. Although the bucket algorithm works well for the fixed algorithms, it
did not work well together with the RL algorithms. We have to study this more
carefully, since the bucket algorithm, when designed well, may help in creating
green waves in very crowded traffic conditions.
The simulator might be refined as well, to allow for comparison with other
research. Refinements could include more complex dynamics for the vehicles and
other road users, as well as an implementation of fixed-cycle traffic-light
controllers.