When constructing a JGraphT graph, it’s important to select the vertex and edge types carefully in order to ensure correct behavior while satisfying application requirements. This page walks through a number of variations based on common application use cases:
Vertex and edge objects are used as keys inside of the default graph implementation, so when choosing their types, you must follow these rules:
java.lang.Object
for both equals and hashCode.equals
or hashCode
, you must override them bothhashCode
must produce a value which does not change over the lifetime of the objectThis article explains some of the nuances.
Additional guidelines are provided in the scenario-specific sections below.
Applications interested only in graph structure (e.g. graph theory
research) may want to save memory by keeping vertices as minimalist as
possible. In this case, just use
java.lang.Object
as the vertex type. In this case, the vertex hashCode
may serve as
a “nearly” unique identifier during debugging.
More common is for each vertex to represent a key value, as in the
HelloJGraphT example (where each vertex
corresponds to a website identified by a URL). For this use case, a
String
, Integer
, or similar class is a good choice for vertex
type. In this case, it’s important to note that the value must be
unique within the graph. In other words, for a vertex of type
String
, the String
value is an identifier, not a mere label.
More sophisticated applications may need to associate non-key attributes (possibly multiple) with each vertex. The obvious way to do this is with a class that stores the attributes, but there are a few different cases to consider.
If all attributes on the vertex are non-key, then the approach is straightforward. For example, suppose you are modeling molecular structures, where the vertices are atoms and the edges are the bonds between them. Then you might define a class
class AtomVertex
{
Element element; // using enum of the periodic table
int formalCharge; // for bookkeeping purposes
... other atomic properties ...
}
In this case, you should not override equals
and hashCode
,
since there may be many distinct atoms with the exact same properties.
Conversely, if all attributes are components of a key, then the approach is also simple. Suppose your application is a software package manager, and each vertex in your graph corresponds to a package, with edges representing package dependencies. Then you might define a class like
class SoftwarePackageVertex
{
final String orgName;
final String packageName;
final String packageVersion;
... constructor etc ...
public String toString()
{
return orgName + "-" + packageName + "-" + packageVersion;
}
public int hashCode()
{
return toString().hashCode();
}
public boolean equals(Object o)
{
return (o instanceof SoftwarePackageVertex) && (toString().equals(o.toString()));
}
}
Here, you almost certainly do want to override equals
and
hashCode
, since there should not be more than one vertex object
representing the same package version. And you’ll be able to access
an existing vertex in a graph just by constructing it, without having
to iterate over all vertices in the graph to find it.
Note that the fields are declared final; this is important since vertices and edges are used as hash keys, meaning their hash codes must never change after construction.
Now we come to the problematic case. Continuing the previous example,
suppose you want to add a releaseDate
field to
SoftwarePackageVertex
in order to track when the package version was
released. This new field is not part of the key; it’s just data. But
what do we do about equals
/hashCode
?
releaseDate
into them for a number of reasons. For example, if we want to reference the vertex for a package by its identifier, but we don’t know its release date, how do we find the vertex? And what if the release date changes for an unreleased package? How do we avoid two distinct vertex objects representing the same package version?releaseDate
into equals
/hashCode
, then we could end up with inconsistencies due to two vertex objects with different releaseDate
values being treated as equivalent.So if you try to implement this case, beware that you’re likely to run into unforeseen pitfalls.
A more flexible way to handle the situation above is to make the
vertex refer to an external object rather than containing data itself.
For the example above, the vertex type could be
SoftwarePackageVertex
as originally defined, without the release
date. Then additional information such as the release date would be
stored via a separate SoftwarePackageVersion
class, with a map keyed
on SoftwarePackageVertex
for lookups. This keeps the graph
representation clear, but adds some lookup cost.
An optimization is to implement the vertex as a direct reference:
class SoftwarePackageVertex
{
final SoftwarePackageVersion version;
public String toString()
{
return version.keyString();
}
public int hashCode()
{
return toString().hashCode();
}
public boolean equals(Object o)
{
return (o instanceof SoftwarePackageVertex) && (toString().equals(o.toString()));
}
}
class SoftwarePackageVersion
{
final String orgName;
final String packageName;
final String packageVersion;
Date releaseDate;
public String keyString()
{
return orgName + "-" + packageName + "-" + packageVersion;
}
}
This way, we can construct a vertex from a package version at any time, and given a vertex, we can directly access package version information without any map lookup required. The application is still responsible for avoiding inconsistencies due to the existence of multiple SoftwarePackageVersion objects with the same key, but now that responsibility is separate from the graph representation.
Now let’s move on to edge types. The most common case is that there
is no information associated with an edge other than its connectivity.
Generally, you can use the DefaultEdge
class provided by JGraphT for
this and not think about it any further. However, there is one point
you should be aware of:
DefaultEdge
-based graphs by using an intrusive technique in which the connectivity information is stored inside the edge object itself (rather than inside the graph).As a result, if you need to add the same edge object to two different
graphs, then those graphs must have the same vertex connections for
that edge, otherwise undefined behavior will result. If this (rare)
case applies to your application, then instead of using DefaultEdge
,
just use java.lang.Object
as your edge type. Note that this adds a
map lookup penalty to connectivity accessor methods.
It’s common in JGraphT for edge objects to be reused across graphs; for example, an algorithm may return a subgraph of the input graph as its result, and the subgraph will reuse subsets of the input graph’s vertex and edge sets. In these cases, the connectivity equivalence is valid (or if it’s not, the algorithm avoids reuse).
Another common case is for each edge to bear a double-valued weight as
its only attribute (e.g. a physical network with latency measured for
each link). For this case, JGraphT supplies the DefaultWeightedEdge
class, which extends the optimization mentioned in the previous
section by storing the weight directly on the edge object.
The same caveats apply, with the additional restriction that if a
DefaultWeightedEdge
is reused in another graph, it will have the
same weight in both graphs. Again, if this presents a problem, then
use java.lang.Object
as your edge class instead.
Sometimes, applications may be able to associate a unique key value
with each edge. For example, consider a graph representing money
transfers between bank accounts, where the vertices represent the
accounts and the edges represent the transfers. In this case, the
application could use a String
containing the transfer transaction
ID as the edge type.
DefaultEdge
above.However, it would definitely be incorrect to use the transaction amount as the edge type, since this is not unique across the entire graph. (A weighted edge could instead be used for this purpose.)
For edges with multiple attributes or non-key attributes, the same
considerations as those
given previously for vertices apply. In
addition, when defining a class which will be used as an edge type,
applications will typically want to subclass either DefaultEdge
or
DefaultWeightedEdge
(subject to the caveats already mentioned).
Those base classes do not override equals
/hashCode
, but
applications are free to do so in subclasses as appropriate.
Note that when overriding equals
/hashCode
for an edge class, it is
incorrect to incorporate the edge source/target into the operations;
the edge identity is always independent of its connectivity.
For an example of how to apply a String
attribute as a non-key label
on each edge, see the LabeledEdges demo. JGraphT
does not provide a labeled edge class since there are many different
ways to implement this pattern.
In some cases, an application may want to use existing complex objects
as vertex and/or edge types directly. For example, consider an
application in which the graph is used in a manager thread to optimize
concurrent dataflow, with each vertex representing a worker thread and
each edge representing a dataflow producer/consumer queue. In this
case, it would be OK to use
java.lang.Thread
for the vertex type and
LinkedBlockingDeque
for the edge type (since these classes do no override
equals
/hashCode
).
However, if the queue implementation were such that it allowed two
queue instances to be compared for value-equality via equals
, then this
would not be a good choice for edge type. In this case, it would
be necessary to wrap the queue in a custom edge class which references
it,
similar to what was described for vertices above.
This is one case for which JGraphT does not currently support a 100%
efficient implementation. Suppose we want to represent a
finite state machine
using a pseudograph (allowing both self-loops and multiple edges
between vertices). Vertices will represent states, and edges
will represent transitions. For the vertex type, we might choose
String
, but for the edge type, we can’t use String
since
transition names are not unique across the entire graph; they are only
unique within the subset of edges between a given pair of vertices.
Instead, we can use a labeled edge class as described above. However, suppose we want to find an existing edge given a pair of vertices and a transition name. This requires invoking getAllEdges for the vertex pair and then searching through the result, filtering by transition name. If many transitions are defined, this may become slow.
It would be nice if there were a faster solution for this problem, especially since the graph’s edge set already provides an index into all edges in the graph. There are kludgy ways to accomplish a constant time lookup, but we don’t recommend them, so we won’t go into them here.