graph/doc/kevin_bacon.html
CromwellEnage bf217327df Documentation cleanup
Replace links to Hewlett Packard's retired Silicon Graphics STL website.
2019-01-21 16:09:17 -05:00

318 lines
10 KiB
HTML

<HTML>
<!--
Copyright (c) Jeremy Siek 2000
Distributed under the Boost Software License, Version 1.0.
(See accompanying file LICENSE_1_0.txt or copy at
http://www.boost.org/LICENSE_1_0.txt)
-->
<Head>
<Title>Kevin Bacon Example</Title>
<BODY BGCOLOR="#ffffff" LINK="#0000ee" TEXT="#000000" VLINK="#551a8b"
ALINK="#ff0000">
<IMG SRC="../../../boost.png"
ALT="C++ Boost" width="277" height="86">
<BR Clear>
<H1>Six Degrees of Kevin Bacon</H1>
<P>
A fun application of graph theory comes up in the popular game ``Six
Degrees of Kevin Bacon''. The idea of the game is to connect an actor
to Kevin Bacon through a trail of actors who appeared together in
movies, and do so in less than six steps. For example, Theodore
Hesburgh (President Emeritus of the University of Notre Dame) was in
the movie ``Rudy'' with the actor Gerry Becker, who was in the movie
``Sleepers'' with Kevin Bacon. Why Kevin Bacon? For some reason the
three students who invented the game, Mike Ginelli, Craig Fass, and
Brian Turtle decided that Kevin Bacon is the center of the
entertainment world. The Kevin Bacon game is quite similar to the <a
href="http://www.oakland.edu/~grossman/erdoshp.html">Erd&ouml;s
Number</a> which has ``been a part of the folklore of
mathematicians throughout the world for many years''.
<P>
The ``Six Degrees of Kevin Bacon'' game is really a graph problem. If
you assign each actor to a vertex, and add an edge between two actors
if they have appeared together in a movie, then you have a graph that
represents the data for this game. Then the problem of finding a trail
of actors to Kevin Bacon becomes a traditional graph problem: that of
finding a <I>path</I> between two vertices. Since we wish to find a
path that is shorter than six steps, ideally we would find the
<I>shortest path</I> between the vertices. One might think to apply
Dijkstra's shortest path algorithm to this problem, but that would be
overkill since Dijkstra's algorithm is meant for situations when each
edge has an associated length (or weight) and the goal is to find the
path with the shortest cumulative length. Since we are only concerned
with finding the shortest paths in terms of the number of edges the
breadth-first search algorithm will solve the problem (and use less
time than Dijkstra's).
<P>
<H2>Input File and Graph Setup</H2>
<P>
For this example we will use a much smaller graph than all movies and
all actors. The full source code for this example is in <a
href="../example/kevin-bacon.cpp"><TT>examples/kevin-bacon.cpp</TT></a>.
We have supplied a file <TT>kevin-bacon.dat</TT> which contains a list
of actor pairs who appeared in the same movie. Each line of the file
contains an actor's name, a movie, and another actor that appeared in
the movie. The ``;'' character is used as a separator. Here is an
excerpt.
<P>
<PRE>
Patrick Stewart;Prince of Egypt, The (1998);Steve Martin
</PRE>
<P>
Our first task will be to read in the file and create a graph from
it. We use a <TT>std::ifstream</TT> to input the file.
<P>
<PRE>
std::ifstream datafile("./kevin-bacon.dat");
if (!datafile) {
std::cerr &lt;&lt; "No ./kevin-bacon.dat file" &lt;&lt; std::endl;
return EXIT_FAILURE;
}
</PRE>
<P>
We will want to attach the actor's names to the vertices in the graph,
and the movie names to the edges. We use the <TT>property</TT> class to
specify the addition of these vertex and edge properties to the graph.
We choose to use an undirected graph, because the relationship of
actors appearing together in a movie is symmetric.
<P>
<PRE>
using namespace boost;
typedef adjacency_list&lt;vecS, vecS, undirectedS,
property&lt;vertex_name_t, string&gt;,
property&lt;edge_name_t, string &gt; &gt;
&gt; Graph;
</PRE>
<P>
To access the properties, we will need to obtain property map
objects from the graph. The following code shows how this is done.
<P>
<PRE>
property_map&lt;Graph, vertex_name_t&gt;::type actor_name = get(vertex_name, g);
property_map&lt;Graph, edge_name_t&gt;::type connecting_movie = get(edge_name, g);
</PRE>
<P>
Next we can start to loop through the file, parsing each line into a
sequence of tokens using the <a href="../../tokenizer/index.html">Boost
Tokenizer Library</a>.
<P>
<PRE>
for (std::string line; std::getline(datafile, line);) {
char_delimiters_separator&lt;char&gt; sep(false, "", ";");
tokenizer&lt;&gt; line_toks(line, sep);
...
}
</PRE>
<P>
Each line of the input corresponds to an edge in the graph, with the
two actors as the vertices incident to the edge, and the movie name
will be a property attached to the edge. One challenge in creating the
graph from this format of input is that the edges are specified by a
pair of actor names. As we add vertices to the graph, we'll need to
create a map from actor names to their vertices so that later
appearances of the same actor (on a different edge) can be linked with
the correct vertex in the graph. The <a
href="http://www.boost.org/sgi/stl/Map.html"><TT>std::map</TT></a>
can be used to implement this mapping.
<P>
<PRE>
typedef graph_traits&lt;Graph&gt;::vertex_descriptor Vertex;
typedef std::map&lt;string, Vertex&gt; NameVertexMap;
NameVertexMap actors;
</PRE>
<P>
The first token will be an actor's name. If the actor is not already
in our actor's map we will add a vertex to the graph, set the name
property of the vertex, and record the vertex descriptor in the map.
If the actor is already in the graph, we will retrieve the vertex
descriptor from the map's <TT>pos</TT> iterator.
<P>
<PRE>
NameVertexMap::iterator pos;
bool inserted;
Vertex u, v;
tokenizer&lt;&gt;::iterator i = line_toks.begin();
std::string actors_name = *i++;
boost::tie(pos, inserted) = actors.insert(std::make_pair(actors_name, Vertex()));
if (inserted) {
u = add_vertex(g);
actor_name[u] = actors_name;
pos-&gt;second = u;
} else
u = pos-&gt;second;
</PRE>
<P>
The second token is the name of the movie, and the third token is the
other actor. We use the same technique as above to insert the second
actor into the graph.
<P>
<PRE>
std::string movie_name = *i++;
boost::tie(pos, inserted) = actors.insert(std::make_pair(*i, Vertex()));
if (inserted) {
v = add_vertex(g);
actor_name[v] = *i;
pos-&gt;second = v;
} else
v = pos-&gt;second;
</PRE>
<P>
The final step is to add an edge connecting the two actors, and record
the name of the connecting movie.
<P>
<PRE>
graph_traits&lt;Graph&gt;::edge_descriptor e;
boost::tie(e, inserted) = add_edge(u, v, g);
if (inserted)
connecting_movie[e] = movie_name;
</PRE>
<P>
<H2>A Custom Visitor Class</H2>
<P>
We create a custom visitor class to record the Kevin Bacon
numbers. The Kevin Bacon number will be the shortest distances from
each actor to Kevin Bacon. This distance has also been referred to as
the ``Bacon Number'' in memory of the ``Erdos Number'' used to rank
mathematicians according to how many publications separate them from
Paul Erdos. The shortest distance to from Kevin Bacon to each actor
will follow the breadth-first tree. The BFS algorithm identifies edges
that belong to the breadth-first tree and calls our visitor's
<tt>tree_edge</tt> function. We record the distance to the target
vertex as one plus the distance to the source vertex.
<PRE>
template &lt;typename DistanceMap&gt;
class bacon_number_recorder : public default_bfs_visitor
{
public:
bacon_number_recorder(DistanceMap dist) : d(dist) { }
template &lt;typename Edge, typename Graph&gt;
void tree_edge(Edge e, const Graph& g) const
{
typename graph_traits&lt;Graph&gt;::vertex_descriptor
u = source(e, g), v = target(e, g);
d[v] = d[u] + 1;
}
private:
DistanceMap d;
};
// Convenience function
template &lt;typename DistanceMap&gt;
bacon_number_recorder&lt;DistanceMap&gt;
record_bacon_number(DistanceMap d)
{
return bacon_number_recorder&lt;DistanceMap&gt;(d);
}
</PRE>
<P>
We will use a vector to store the bacon numbers.
<P>
<PRE>
std::vector&lt;int&gt; bacon_number(num_vertices(g));
</PRE>
<H2>Apply the Breadth-First Search</H2>
<P>
The BFS algorithm requires a source vertex as input; of course this
will be Kevin Bacon. We now call the BGL BFS algorithm, passing in the
graph, source vertex, and the visitor.
<P>
<PRE>
Vertex src = actors["Kevin Bacon"];
bacon_number[src] = 0;
breadth_first_search(g, src, visitor(record_bacon_number(&amp;bacon_number[0])));
</PRE>
<P>
We can output the Bacon number for each actor simply by looping
through all the vertices in the graph and use them to index into the
<TT>bacon_number</TT> vector.
<P>
<PRE>
graph_traits&lt;Graph&gt;::vertex_iterator i, end;
for (boost::tie(i, end) = vertices(g); i != end; ++i)
std::cout &lt;&lt; actor_name[*i] &lt;&lt; "'s bacon number is "
&lt;&lt; bacon_number[*i] &lt;&lt; std::endl;
</PRE>
<P>
Note that vertex descriptor objects can not always be used as indices
into vectors or arrays such as <TT>bacon_number</TT>. This is valid
with the <TT>adjacency_list</TT> class with <TT>VertexList=vecS</TT>,
but not with other variations of <TT>adjacency_list</TT>. A more
generic way to index based on vertices is to use the ID property
map (<TT>vertex_index_t</TT>) in coordination with the <A
HREF="../../property_map/doc/iterator_property_map.html"><TT>iterator_property_map</TT></a>.
<P>
Here are some excepts from the output of the program.
<PRE>
William Shatner's bacon number is 2
Denise Richards's bacon number is 1
Kevin Bacon's bacon number is 0
Patrick Stewart's bacon number is 2
Steve Martin's bacon number is 1
...
</PRE>
<br>
<HR>
<TABLE>
<TR valign=top>
<TD nowrap>Copyright &copy; 2000-2001</TD><TD>
<A HREF="http://www.boost.org/people/jeremy_siek.htm">Jeremy Siek</A>, Indiana University (<A HREF="mailto:jsiek@osl.iu.edu">jsiek@osl.iu.edu</A>)
</TD></TR></TABLE>
</BODY>
</HTML>
<!-- LocalWords: gif Hesburgh Ginelli Fass Erd vertices cerr namespace vecS
-->
<!-- LocalWords: undirectedS Tokenizer sep tokenizer toks NameVertexMap pos
-->
<!-- LocalWords: bool Erdos BFS typename DistanceMap bfs const num BGL src
-->
<!-- LocalWords: indices Shatner's Richards's siek htm
-->