Fuzzy Logic Based Traffic Control
1. Introduction
Transportation research aims to optimize the flow of people and goods. As
the number of road users constantly increases, and resources provided by
current infrastructures are limited, intelligent control of traffic will become
a very important issue in the future. However, some limitations to the usage of
intelligent traffic control exist. Avoiding traffic jams for example is thought
to be beneficial to both environment and economy, but improved traffic-flow may
also lead to an increase in demand [Levinson, 2003]. There are several models
for traffic simulation. In our research we focus on microscopic models that
model the behavior of individual vehicles, and thereby can simulate dynamics of
groups of vehicles. Research has shown that such models yield realistic
behavior [Nagel and Schreckenberg, 1992, Wahle and Schreckenberg, 2001].
Cars in urban
traffic can experience long travel times due to inefficient traffic light
control. Optimal control of traffic lights using sophisticated sensors and
intelligent optimization algorithms might therefore be very beneficial.
Optimization of traffic light switching increases road capacity and traffic
flow, and can prevent traffic congestions. Traffic light control is a complex
optimization problem and several intelligent algorithms, such as fuzzy logic,
evolutionary algorithms, and reinforcement learning (RL) have already been used
in attempts to solve it. In this paper we describe a model-based, multi-agent
reinforcement learning algorithm for controlling traffic lights.
In our approach,
reinforcement learning [Sutton and Barto, 1998, Kaelbling et al., 1996] with
road-user-based value functions [Wiering, 2000] is used to determine optimal
decisions for each traffic light. The decision is based on a cumulative vote of
all road users standing at a traffic junction, where each car votes using its
estimated advantage (or gain) of having its light set to green. We compare the performance of our
model-based RL method to that of other controllers using the Green Light District simulator (GLD). GLD is a traffic simulator that allows us to
design arbitrary infrastructures and traffic patterns, monitor traffic flow
statistics such as average waiting times, and test different traffic light
controllers. The experimental results show that in crowded traffic, the RL
controllers outperform all other tested non-adaptive controllers. We also test
the use of the learned average waiting times for choosing routes of cars
through the city (co-learning), and show that by using co-learning road users
can avoid bottlenecks.
2. Modelling and Controlling Traffic
In
this section, we focus on the use of information technology in transportation.
A lot of ground can be gained in this area, and Intelligent Transportation Systems
(ITS) have gained the interest of several governments and commercial companies [Ten-T
expert group on ITS, 2002, White Paper, 2001, EPA98, 1998]. ITS research
includes in-car safety systems, simulating effects of infrastructural changes, route
planning, optimization of transport, and smart infrastructures. Its main goals
are: improving safety, minimizing travel time, and increasing the capacity of
infrastructures. Such improvements are beneficial to health, economy, and the
environment, and this shows in the budgets allocated to ITS.
In
this paper we are mainly interested in the optimization of traffic flow, thus
effectively minimizing average travelling (or waiting) times for cars. A common
tool for analyzing traffic is the traffic simulator. In this section we will
first describe two techniques commonly used to model traffic. We will then
describe how models can be used to obtain real-time traffic information or
predict traffic conditions. Afterwards we describe how information can be communicated
as a means of controlling traffic, and what the effect of this communication on
traffic conditions will be. Finally, we describe research in which all cars are
controlled using computers.
2.1 Modelling Traffic
Traffic dynamics bear resemblance to, for example, the dynamics of fluids or of
sand in a pipe. Different approaches to modelling traffic flow can be used
to explain phenomena specific to traffic, like the spontaneous formation of
traffic jams. There are two common approaches for modelling traffic: macroscopic
and microscopic models.
2.1.1 Macroscopic models.
Macroscopic
traffic models are based on gas-kinetic models and use equations relating
traffic density to velocity [Lighthill and Whitham, 1955, Helbing et al.,
2002]. These equations can be extended with terms for build-up and relaxation
of pressure to account for phenomena like stop-and-go traffic and spontaneous
congestions [Helbing et al., 2002, Jin and Zhang, 2003, Broucke and Varaiya,
1996]. Although macroscopic models can be tuned to simulate certain driver
behaviors, they do not offer a direct, flexible, way of modelling and
optimizing them, making them less suited for our research.
2.1.2 Microscopic models.
In
contrast to macroscopic models, microscopic traffic models offer a way of
simulating various driver behaviors. A microscopic model consists of an
infrastructure that is occupied by a set of vehicles. Each vehicle interacts
with its environment according to its own rules. Depending on these rules,
different kinds of behavior emerge when groups of vehicles interact.
·
Cellular Automata.
One
specific way of designing and simulating (simple) driving rules of cars on an
infrastructure, is by using cellular automata (CA). CA use discrete partially connected
cells that can each be in a specific state. For example, a road-cell can contain a
car or be empty. Local transition rules determine the dynamics of the system
and even simple rules can lead to chaotic dynamics. Nagel and Schreckenberg
(1992) describe a CA model for traffic simulation. At each discrete time-step,
vehicles increase their speed by a certain amount until they reach their
maximum velocity. In case of a slower moving vehicle ahead, the speed will be
decreased to avoid collision. Some randomness is introduced by adding for each
vehicle a small chance of slowing down. Experiments showed realistic behavior
of this CA model on a single road with emerging behaviors like the formation of
start-stop waves when traffic density increases.
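To make these update rules concrete, here is a minimal sketch of one Nagel-Schreckenberg step in Python; the parameter values and ring-road layout are our own illustrative choices, not those of the original paper:

```python
import random

def nasch_step(road, length=100, v_max=5, p_slow=0.3):
    """One parallel update of the Nagel-Schreckenberg CA on a circular
    road. `road` maps cell index -> current speed of the car in it."""
    new_road = {}
    for pos, v in road.items():
        v = min(v + 1, v_max)                 # 1. accelerate towards v_max
        gap = 1                               # 2. count empty cells ahead
        while gap <= v and (pos + gap) % length not in road:
            gap += 1
        v = min(v, gap - 1)                   #    brake to avoid collision
        if v > 0 and random.random() < p_slow:
            v -= 1                            # 3. random slowdown
        new_road[(pos + v) % length] = v      # 4. move v cells forward
    return new_road

# 100-cell ring with 20 cars started at rest.
road = {5 * i: 0 for i in range(20)}
for _ in range(1000):
    road = nasch_step(road)
```

At higher densities, iterating this step reproduces the spontaneous start-stop waves described above.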
·
Cognitive Multi-Agent Systems.
A
more advanced approach to traffic simulation and optimization is the Cognitive
Multi-Agent System approach (CMAS), in which agents interact and communicate
with each other and the infrastructure. A cognitive agent is an entity that autonomously
tries to reach some goal state using minimal effort. It receives information from
the environment using its sensors, believes certain things about its
environment, and uses these beliefs and inputs to select an action. Because
each agent is a single entity, it can optimize (e.g., by using learning
capabilities) its way of selecting actions. Furthermore, using heterogeneous
multi-agent systems, different agents can have different sensors, goals, behaviors,
and learning capabilities, thus allowing us to experiment with a very wide
range of (microscopic) traffic models. Dia (2002) used a CMAS based on a study
of real drivers to model the drivers’ response to travel information. In a
survey taken at a congested corridor, factors influencing the choice of route
and departure time were studied. The results were used to model a driver
population, where drivers respond to presented travel information differently.
Using this population, the effect of different information systems on the area
where the survey was taken could be simulated. The research seems promising,
though no results were presented.
2.2 Predicting Traffic
The ability to predict traffic conditions is important for optimal control. For
example, if we knew that some road would become congested after some time
under current conditions, this information could be transmitted to road users
who could then avoid this road, thereby relieving the congestion.
Furthermore, if we can accurately predict the consequences of
different driving strategies, an optimal (or at least optimal for the predicted
interval) decision can be made by comparing the predicted results.
The
simplest form of traffic prediction at a junction is by measuring traffic over
a certain time, and assuming that conditions will be the same for the next period.
One approach to predicting is presented in [Ledoux, 1996], where neural
networks are used to perform long-term prediction of the queue length at a
traffic light. A multi-layer perceptron [Rumelhart et al., 1986] is trained to
predict the queue length for the next time-step, and long-term predictions can
be made by iterating the one-step predictor. The resulting network is quite
accurate when predicting ten steps ahead, but has not yet been integrated into
a controller.
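The iteration scheme itself is simple; the following is a hedged sketch, where the one-step predictor stands in for the trained multi-layer perceptron and the sliding-window input format is our assumption, not Ledoux's actual setup:

```python
def iterate_prediction(predict_one_step, recent_queue_lengths, horizon):
    """Long-term prediction by iterating a one-step queue-length
    predictor, feeding each prediction back in as input."""
    window = list(recent_queue_lengths)
    forecast = []
    for _ in range(horizon):
        q_next = predict_one_step(window)  # predicted queue length at t+1
        forecast.append(q_next)
        window = window[1:] + [q_next]     # slide the input window forward
    return forecast

# Example with a dummy predictor (moving average), ten steps ahead:
print(iterate_prediction(lambda w: sum(w) / len(w), [4, 6, 5, 7], 10))
```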
A
traffic prediction model that has been applied to a real-life situation, is
described in [Wahle and Schreckenberg, 2001]. The model is a multi-agent system
(MAS) where driving agents occupy a simulated infrastructure similar to a real
one. Each agent has two layers of control; one for the (simple) driving
decision, and one for tactical decisions like route choice. The real world
situation was modelled by using detection devices already installed. From these
devices, information about the number of cars entering and leaving a stretch of
road is obtained. Using this information, the number of vehicles that take a
certain turn at each junction can be inferred. By instantiating this
information in a faster-than-real-time simulator, predictions on actual traffic
can be made. A system installed in Duisburg uses information from the existing
traffic control center and produces real-time information on the Internet.
Another system was installed on the freeway system of North Rhine-Westphalia, using
data from about 2,500 inductive loops to predict traffic on 6,000 km of roads.
2.3 Controlling Traffic by Communicating Traffic Conditions
Once
accurate (predictive) information is available, there are several ways of
communicating it to road users. Drivers could be presented with information
through dynamic road signs, radio, or even on-board navigation systems. Several
studies have shown the effects of the availability of relevant information.
Levinson (2003) uses a micro-economic model of the cost of a trip, in which the
provision of travel information increases system reliability, since congestions
can be better avoided. This results in a shift of the supply curve.
Experiments show that increasing the
percentage of informed drivers reduces the average travel time for both
informed and uninformed drivers. The travel time reduction is largest in crowded
traffic. In the case of unexpected congestions (for example due to accidents)
informed travellers reduce their travel time by switching routes, but as a
result of this the alternative routes become more crowded, possibly increasing
travel times for uninformed drivers.
Emmerink
et al. (1996) present the results of a survey taken amongst drivers in
Amsterdam. The results show that 70% of the drivers sometimes use information
presented through radio or variable message signs to adapt their routes. Both
media are used in similar ways, and commuters are less likely to be influenced
by information than people with other trip purposes. Business drivers indicated
that they would be willing to pay for in-vehicle information. Simulation could
be used to test control strategies before they are implemented in real-life environments.
Ben-Akiva et al. (2003) tested different strategies for a large highway
project, such as highway access control, drivers’ route choice, and lane
control. The simulator offered a way of testing different configurations of
vehicle-detectors, and showed that interacting control systems could actually
worsen traffic conditions. Integration of strategies requires careful analysis
of the impact of each component to avoid interference. A simulator proves to be
a useful tool for such an analysis.
2.4 Vehicle Control
It is a well-known fact that traffic flow would increase drastically if all
drivers drove at the same (maximum) speed. It is equally clear that this will
never happen as long as drivers decide for themselves. In this section we first show how
vehicles could learn to cooperate. We then describe an ambitious research
program that aims to control all vehicles by on-board computers. Moriarty and
Langley (1998) have used reinforcement learning for distributed traffic
control. Their approach enabled cars to learn lane selection strategies from
experience with a traffic simulator. Experimental studies showed that learned
strategies let drivers more closely match their desired speeds than
hand-crafted controllers and reduce the number of lane changes. Their approach,
like ours, focuses on distributed car-based controllers, which makes it easy to
take specific desires/goals of drivers into account such as desired speed or destination.
In
the California Partners for Advanced Transit and Highways (PATH) program, the
Automated Highway System (PATH-AHS) project aims to completely automate traffic
[Horowitz and Varaiya, 2000]. Cars on special roads would travel in so-called
platoons. A platoon is a number of cars that travel at high speed, with little
distance in between. Each car controls its own speed and lateral movement, and
makes sure it follows the leader. The leader navigates the platoon, and makes
sure that there is enough space between platoons. In order to optimize flow, a
platoon leader receives information about the optimal speed from a roadside coordinating
system. Because of this, and the fact that there is little distance in between cars
in a platoon, an AHS is said to be able to increase road capacity by a factor
of about four. Another aspect of traffic control is controlling traffic lights
in a way that minimizes the time drivers have to wait. We will describe
previous research in this area and our car-based, multi-agent reinforcement
learning algorithm in section 3. A reinforcement learning (RL) agent is able to
learn a policy (or plan) that optimizes the cumulative reward.
3. Traffic Light Control
Traffic
light optimization is a complex problem. Even for single junctions there might
be no obvious optimal solution. With multiple junctions, the problem becomes
even more complex, as the state of one light influences the flow of traffic
towards many other lights. Another complication is the fact that flow of
traffic constantly changes, depending on the time of day, the day of the week,
and the time of year. Roadwork and accidents further influence complexity and
performance.
In
practice most traffic lights are controlled by fixed-cycle controllers. A cycle
of configurations is defined in which all traffic gets a green light at some
point. The split time determines for how long the lights should stay in each
state. Busy roads can get preference by adjusting the split time. The cycle
time is the duration of a complete cycle. In crowded traffic, longer cycles
lead to better performance. The offset of a cycle defines the starting time of
a cycle relative to other traffic lights. Offset can be adjusted to let several
lights cooperate, and for example create green waves. Fixed controllers have to
be adapted to the specific situation to perform well. Often a table of
time-specific settings is used to enable a light to adapt to recurring events
like rush hour traffic. Setting the control parameters for fixed controllers is
a lot of work, and controllers have to be updated regularly due to changes in
the traffic situation. Unique events cannot be handled well, since they require a
lot of manual changes to the system. Fixed controllers could respond to
arriving traffic by starting a cycle only when traffic is present, but such
vehicle actuated controllers still require lots of fine-tuning. Most research
in traffic light control focuses on adapting the duration or the order of the
control cycle. In our approach we do not use cycles, but let the decision
depend on the actual traffic situation around a junction, which can lead to
much more accurate control. Of course, our approach requires that information
about the actual traffic situation can be obtained by using different sensors
or communication systems. We will first describe related work on intelligent
traffic light control, and then describe our car-based reinforcement learning algorithm.
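Before doing so, to make the fixed-cycle terminology above concrete, here is a minimal sketch of such a controller; the phase encoding is an illustrative assumption:

```python
class FixedCycleController:
    """Fixed-cycle controller sketch. `phases` is a list of
    (safe light configuration, split time) pairs; the splits sum to the
    cycle time, and `offset` shifts the cycle relative to a global clock
    so neighbouring junctions can form green waves."""

    def __init__(self, phases, offset=0):
        self.phases = phases
        self.offset = offset
        self.cycle_time = sum(split for _, split in phases)

    def configuration(self, t):
        t = (t - self.offset) % self.cycle_time
        for config, split in self.phases:
            if t < split:
                return config
            t -= split

# A busy north-south road gets a longer split than the east-west road.
controller = FixedCycleController(
    phases=[("NS-green", 40), ("EW-green", 20)], offset=10)
print(controller.configuration(t=55))  # -> "EW-green"
```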
3.1 Related Work in Intelligent
Traffic Light Control
3.1.1 Expert Systems.
An
expert system uses a set of given rules to decide upon the next action. In
traffic light control, such an action can change some of the control
parameters. Findler and Stapp (1992) describe a network of roads connected by
traffic light-based expert systems. The expert systems can communicate to allow
for synchronization. Performance on the network depends on the rules that are
used. For each traffic light controller, the set of rules can be optimized by
analyzing how often each rule fires, and the success it has. The system could
even learn new rules. Findler and Stapp showed that their system could improve
performance, but they had to make some simplifying assumptions to avoid too
much computation.
3.1.2 Prediction-based optimization.
Tavladakis
and Voulgaris (1999) describe a traffic light controller using a simple
predictor. Measurements taken during the current cycle are used to test several
possible settings for the next cycle, and the setting resulting in the least
amount of queued vehicles is executed. The system seems highly adaptive, and
maybe even too much so. Since it only uses data of one cycle, it could not
handle strong fluctuations in traffic flow well. In this case, the system would
adapt too quickly, resulting in poor performance.
Liu
et al. (2002) introduce a way to overcome problems with fluctuations. Traffic
detectors at both sides of a junction and vehicle identification are used to
measure delay of vehicles at a junction. This is projected to an estimated
average delay time using a filter function to smooth out random fluctuations.
The control system tries to minimize not only the total delay, but the summed
deviations from the average delay as well. Since it is no longer beneficial to
let a vehicle wait for a long time, even if letting it pass would increase the
total waiting time, this introduces a kind of fairness. Data of about 15
minutes is used to determine the optimal settings for the next cycle, and even
using a simple optimization algorithm, the system performs well compared to
preset and actuated controllers.
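A plausible formalization of this objective (our reading; Liu et al. may weigh the terms differently) is to minimize J = Σ_i d_i + λ Σ_i (d_i − d̄)², where d_i is the filtered delay of vehicle i, d̄ is the average delay, and λ > 0 trades total throughput against fairness: the quadratic term penalizes letting any single vehicle wait much longer than average.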
3.1.3 Evolutionary Algorithms.
Taale
et al. (1998) compare using evolutionary algorithms (a (μ, λ) evolution strategy [Rechenberg, 1989]) to evolve a traffic
light controller for a single simulated intersection to using the common
traffic light controller in the Netherlands (the RWS C-controller). They found
comparable results for both systems. Unfortunately they did not try their
system on multiple coupled intersections, since dynamics of such networks of
traffic nodes are much more complex and learning or creating controllers for
them could show additional interesting behaviors and research questions.
3.2
Our Approach
We
are interested in the behavior of cars and traffic lights. At each time step, a
car can wait at a traffic light, drive to the next position of a road-lane, or
cross an intersection and go to another road-lane. We
define a (waiting) queue as all cars that are immediately affected by the
setting of their traffic light, because they will have to brake for a red
light. Note that a car standing on the first place of a road-lane is always in
the queue. If a car is the first in the queue and the light is green, it
crosses the intersection and starts on the last place (cell) of the new
road-lane for the next traffic light. After one or more time-steps, it will
finally end up at some place in a new queue at the next traffic light or it
reaches its destination (and exits the infrastructure). The goal is to minimize
the cumulative waiting time of all cars at all traffic lights met before
exiting the city. There are two ways of approaching this problem:
• Traffic-light based controllers.
We can make a controller for each traffic node, taking into account environmental
inputs such as the number of cars waiting at each of the 8 directions, and
learn a value function mapping environmental states and traffic node
decisions to the overall waiting time until all cars standing at the
intersection have exited the city.
• Car-based controllers. We can make a predictor for each car to
estimate the waiting time of the car alone when the light is green or red and
combine all car predictions to make the decision of the traffic light. Making a
control policy and learning a value function for each traffic node (as done by Thorpe)
has the disadvantage that the number of situations for a traffic node can be
huge.
Furthermore,
it is difficult to compute total trip waiting times for all road users standing
at the traffic node, since this quantity has a huge variance. Although we could
cope with these issues by designing a special traffic-node representation
summing individual waiting times of cars, this system does not allow for
communicating information to cars or for making decisions for cars (e.g. which
paths they should take).
3.2.1 Car-based traffic light control.
Therefore, we chose the second possibility, which allows for a lot of flexibility, since we
have the option to model each car’s destination, position, and possibly speed.
This is also a natural system, since if cars knew exactly their overall
waiting time until their destination (note that this would require static
traffic patterns) when the light is green or red, a voting system that adds all
waiting times for different traffic node decisions can be used to minimize the
overall waiting time. Furthermore, the number of states for a car is not so
large. For example if we use the information that the car is at a traffic node,
occupies some place, is at some direction, and has some destination, this makes
a feasible number of car-states which can be stored in lookup tables. Note that
we take the destination address of cars into account. This allows us to compute
the global waiting time until the car has reached its destination.
Finally,
by computing different expected waiting times for a car when it is standing at
one of the different directions (traffic lights) before a traffic node, we can
also use this information to choose a path to the destination address for each
car. This makes co-adaptivity (co-learning) possible in which both the traffic
nodes and the driving policies of cars are optimized.
3.2.2 Car-based value functions.
To optimize the settings of traffic lights, we can simply sum the expected waiting
times of all cars given all possible choices of a traffic light, and select the
decision which minimizes the overall waiting time at the intersection. For
this, we need to learn a value function which estimates for each car how long
its waiting time will be until it has reached its destination
address
given that the light is green or red.
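A minimal sketch of this voting rule follows; the data layouts (sets of directions, car states as tuples, a Q-lookup table) are illustrative assumptions, not the paper's actual implementation:

```python
def choose_light_configuration(safe_configurations, queued_cars, Q):
    """Score each safe light configuration by the summed expected
    waiting times of all queued cars, and pick the minimizer.
    `safe_configurations`: list of sets of directions that may get
    green simultaneously; `queued_cars`: list of (node, dir, place,
    dest) states; Q maps (state, light) -> expected waiting time."""
    def total_waiting(config):
        return sum(
            Q[(car, "green" if car[1] in config else "red")]
            for car in queued_cars)
    return min(safe_configurations, key=total_waiting)
```

Note that minimizing the summed expected waiting time is equivalent to maximizing the summed gain Q(red) − Q(green) over the cars that receive green.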
3.2.3 Global design of the system.
Each
car is at a specific traffic node (node), a direction at that node (dir), a position in the queue (place), and has a particular destination
address (des). We are interested in estimating the
total waiting time for all traffic lights for each car until it arrives at the destination
address given its current node, direction, place, and the decision of the light
(red or green). We write Q([node, dir, place, destination], action) to
denote this value, which is sometimes abbreviated as Q([n, d, p, des], L). We write V([node, dir, place, destination]) to denote the average waiting time
(without knowing the traffic light decision) for a car at (node, dir, place) until it has reached its destination
address. Given the current traffic situation, we can make a choice for each
traffic node as follows. All cars standing at those directions of the traffic
node which are made green by a decision vote with their expected gain
Q([n, d, p, des], red) − Q([n, d, p, des], green), and the decision with the
largest summed gain (i.e., the smallest overall expected waiting time) is
executed. Note that the algorithm requires that cars can be detected at each
place (e.g., by measuring and communicating the distance to the traffic light,
or by reliable inductive loops in the roads). Furthermore, the algorithm uses
the destination address of cars, which should be communicated to the traffic light.
3.2.4 Some notes on the behavior of
the controllers.
If we analyze the learning equations and the voting mechanism, we can
characterize the behavior of the adaptive traffic light controller:
• If many cars are waiting at a traffic light, they count more
in the overall decision process than if only a single car has to wait.
Therefore long queues are usually passed first.
• If the first car has a large advantage of exiting
immediately, instead of entering a new long queue, this car may get a stronger
vote since we expect its gain to be larger.
• If cars at some light have to wait a long time, the
probability of waiting gets large. Note that if this probability goes to 1, the
overall computed expected waiting time for the red light goes to infinity. Since large waiting probabilities mean large expected waiting times
and thus large votes against having to wait, the individual waiting times of
cars are likely to be spread across the traffic lights of a traffic node. In
this perspective, the traffic light controllers are very fair.
• The traffic light controllers are adapting themselves all
the time. Therefore they can handle changing traffic flow patterns, although it
may cost some time for the algorithm to adjust the parameters.
3.2.5 Adapting the system parameters.
For
adapting the system, we only need to compute the state transition
probabilities, since the reward function is fixed (standing still costs 1,
otherwise the reward/cost is 0). First of all we compute P(L|[node,
dir, place, des]),
the probability that a light is red or green for a car with some destination at
a specific place. For computing this probability we update counters after each
simulation step: C([node, dir, place, des], L) counts the number of times the light was
red or green when a car was standing in the queue at the position determined by
place. C([node, dir, place, des]) denotes the number of cars with some
destination which have been standing in the queue at place. From this, we can compute the
probability:
P(L|[node, dir, place, des]) = C([node, dir, place, des], L) / C([node, dir, place, des])
Furthermore, we have to compute the state transition probabilities for the
first (or other) cars. In the general version this is:
P([node, dir, place, des], L, [n, d, p]). For cars which are not able to
cross an intersection, this is quite trivial. There are only a few possible
state transitions: the next place or the current place. State transitions are
stored in counter variables:
C([node, dir, place, des],
L, [n, d, p]).
If the car has made a transition to the next traffic light, it arrives at a new
place at the next road-lane, and we also update the counter variables. To
compute the requested probability, we normalize these counters by dividing each
counter value by the total number of times a car was standing at a particular
road-cell (with the specific destination of the car):
P([node, dir, place, des], L, [n, d, p]) = C([node, dir, place, des], L, [n, d, p]) / C([node, dir, place, des], L)
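In code, the counters and the resulting maximum-likelihood probabilities could be maintained as follows; this is a sketch, and the state encoding as a (node, dir, place, des) tuple is an assumption:

```python
from collections import defaultdict

class TransitionModel:
    """Counter-based estimates of the model probabilities above."""

    def __init__(self):
        self.c_state = defaultdict(int)        # C([node, dir, place, des])
        self.c_state_light = defaultdict(int)  # C([node, dir, place, des], L)
        self.c_transition = defaultdict(int)   # C([...], L, [n, d, p])

    def observe(self, state, light, next_state):
        """Update all counters after one simulation step."""
        self.c_state[state] += 1
        self.c_state_light[(state, light)] += 1
        self.c_transition[(state, light, next_state)] += 1

    def p_light(self, state, light):
        """P(L | [node, dir, place, des])"""
        n = self.c_state[state]
        return self.c_state_light[(state, light)] / n if n else 0.0

    def p_transition(self, state, light, next_state):
        """P([node, dir, place, des], L, [n, d, p])"""
        n = self.c_state_light[(state, light)]
        return self.c_transition[(state, light, next_state)] / n if n else 0.0
```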
3.3 Discussion
As
mentioned before, most traffic light controllers are fixed-cycle controllers,
in which all alternative traffic lights settings get a particular time-interval
for being green. In our approach, we use the actual situation to set the
traffic lights, and therefore we have much more possibilities for optimizing
the traffic light settings. Furthermore, our algorithm learns automatically, and
does not require any manual tuning. This also leads to an adaptive system which
can cope with changing traffic conditions. Since we use car-based value
functions instead of traffic-light based value functions as used by Thorpe
(1997), we do not need to use an approximation technique such as neural
networks for learning the value function. Instead, by using voting, we are able
to get very accurate predictions of the consequences of different traffic light
settings. Furthermore, our system is very fair; it never lets one vehicle wait
all the time, not even if competing road-lanes are much more crowded. The
fairness is a result of the learning and voting mechanism, and we did not need
to design it ad hoc. Finally, we can use the value functions for optimizing
driving policies, where cars drive to minimize their waiting times, and traffic
light controllers set the lights to minimize the waiting times as well.
This
co-learning system exploits the learned value functions and leads to very
interesting co-operative multi-agent system dynamics. We have not yet included
green waves in our system, although we have included a simple technique for
dealing with congested traffic. The bucket algorithm, which we use, propagates gains
of one traffic light to the next lane, if the first car on a lane cannot drive
onwards, because the next lane is full. This will lead to higher gain values
for setting green lights to road-lanes that are full, and for which other cars
have to wait if their light is set to green. This bucket technique is used in
our current experiments, but needs more research to work fruitfully with our
adaptive model-based RL algorithm.
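As a rough sketch of this idea (the data layout is our assumption, not GLD's actual interface):

```python
def propagate_bucket_gains(gains, blocked):
    """Bucket algorithm sketch. `gains` maps a lane id to its current
    gain; `blocked` maps a lane id to the full destination lane that
    blocks its first car (or None). A blocked lane passes its gain on
    to the full lane, raising the value of giving that full lane a
    green light so it empties first."""
    for lane, full_lane in blocked.items():
        if full_lane is not None:
            gains[full_lane] += gains[lane]
    return gains
```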
Figure 1: A screenshot from the Green Light District simulator.
4. Green Light District
We used the Green Light District (GLD) traffic simulator for our experiments.
GLD is free, open-source software, and can be downloaded from
http://sourceforge.net/projects/stoplicht/. GLD consists of an editor to define
infrastructures (based on cellular automata), a Multi-Agent System (MAS) to run
the simulation, and a set of controllers for the agents. The simulator has several
statistical functions to measure performance. In the following subsections we
describe the simulator.
4.1 The Simulator
4.1.1 Infrastructures.
An
infrastructure consists of roads and nodes. A road connects two nodes, and can
have several lanes in each direction (see Figure 1). The length of each road is
expressed in units. A node is either a junction where traffic lights are
operational (although when it connects only two roads, no traffic lights are
used), or an edge-node.
4.1.2 Agents.
There
are two types of agents that occupy an infrastructure; vehicles and traffic
lights. All agents act autonomously, following some simple rules, and get
updated every time-step. Vehicles enter the network at the edge-nodes. Each
edge-node has a certain probability of generating a vehicle at each time step.
Each vehicle that is generated is assigned a destination, which is one of the
other edge-nodes. The distribution of destinations for each edge-node can be
adjusted.
There are several types of vehicles, defined by their speed, length, and number of
passengers. For our experiments we only used cars, which move at a speed of two
units (or one or zero if they have to brake) per time step, have a length of
two units, and have two passengers.
The
state of each vehicle is updated every time step. It either moves with the
distance given by its speed, or stops when there is another vehicle or a red
traffic light ahead. At a junction, a car decides to which lane it should go
next according to its driving policy. Once a car has entered a lane, it cannot
switch lanes.
Junctions
can be occupied by traffic lights. For each junction, there are a number of possible
ways of switching the lights that are safe. At each time-step, the traffic
light controller decides which of these is the best. It can use information on
waiting vehicles and their destination, and about other traffic lights to make
this decision.
4.2 Driving Policies
At a
junction, a car has to decide which lane to go to next in order to reach its
destination. The way this decision is made can be changed by using different
driving policies.
4.2.1 Shortest Path.
For
all junctions, there is a list of shortest paths to every destination. All
paths that are no more than ten percent longer than the shortest path are in
this list. When a decision has to be made, one of these shortest paths is
selected randomly.
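A minimal sketch of this policy, assuming path length is measured in units:

```python
import random

def near_shortest_paths(paths):
    """All paths at most ten percent longer than the shortest one."""
    shortest = min(len(p) for p in paths)
    return [p for p in paths if len(p) <= 1.1 * shortest]

def shortest_path_policy(paths):
    """Pick one of the near-shortest paths uniformly at random."""
    return random.choice(near_shortest_paths(paths))
```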
4.2.2 Co-learning.
The “theory” of co-learning was already explained in section 3.2. In GLD,
co-learning is implemented as follows. When a driving decision has to be made,
all the shortest paths to the destination are collected. The shortest path with
the lowest co-learn value is selected for the car intending to cross the
intersection. The selected path should be the path to the destination with
minimal expected waiting time.
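A corresponding sketch, reusing near_shortest_paths from the previous sketch; the path representation and V-table interface are illustrative assumptions:

```python
def co_learning_policy(paths, V, car_dest):
    """Among the near-shortest paths, choose the one whose first queue
    position has the lowest learned expected waiting time V (the
    co-learn value). Each path is given as a list of (node, dir, place)
    queue positions, the first being where the car would enter."""
    def co_learn_value(path):
        node, direction, place = path[0]
        return V[(node, direction, place, car_dest)]
    return min(near_shortest_paths(paths), key=co_learn_value)
```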
5. Conclusions
In
this article we first showed that traffic control is an important research
area, and its benefits make investments worthwhile. We described how traffic
can be modelled, and showed the practical use of some models. In section 3 we
described the problem of traffic light control and several intelligent traffic
light controllers, before showing how car-based reinforcement learning can be
used for the traffic light control problem. In our approach we let cars
estimate their gain of setting
their lights to green and let all cars vote to generate the traffic light
decision. Co-learning is a special feature of our car-based reinforcement
learning algorithm that allows drivers to choose the shortest route with lowest
expected waiting time.
We
performed three series of experiments, using the Green Light District traffic
simulator. We described how this simulator works, and which traffic light
controllers were tested. The experiments were performed on three different
infrastructures. The first experiment, which uses a large grid, shows that
reinforcement learning is efficient in controlling traffic, and that the use of
co-learning further improves performance. The second experiment shows that by
using co-learning, vehicles avoid crowded intersections. This way, vehicles
avoid having to wait, and actively decrease pressure on crowded intersections.
The third experiment shows that on a more complex, city-like infrastructure,
the RL algorithms again outperform the fixed controllers, reducing waiting time
by more than 25%. The third experiment also shows that in some situations a
simplified version of the reinforcement learning algorithm performs as well as
the complete version, and that co-learning does not always increase performance.
5.1 Further Research
Although
the reinforcement learning algorithm presented here outperforms a number of
fixed algorithms, there are several improvements that could be researched. For
example, we can use communication between road-lanes to make green waves
possible, and let estimated waiting times depend on the amount of traffic on
the next road-lane.
The
co-learning driving policy might be improved as well. The current
implementation suffers from saturation and oscillation. Because all drivers on
a route choose the optimal lane, this lane might become crowded. Only when the
performance of such a lane decreases because of this crowding, drivers will
choose another lane. A less greedy form of co-learning might prevent this
effect. Although the bucket algorithm works well for the fixed algorithms, it
did not work well together with the RL algorithms. We have to study this more
carefully, since the bucket algorithm, when designed well, may help in creating
green waves in very crowded traffic conditions.
The simulator might be refined as well, to allow for comparison with other
research. Refinements could include more complex dynamics for the vehicles and
other road users, as well as an implementation of fixed-cycle traffic-light
controllers.