< WebGraph Adapters | JGraphT
JGraphT Logo

WebGraph Adapters

WebGraph is a framework for storing and accessing graphs (and in particular web graphs) in a compressed form, making it possible to load and access very large graphs with a moderate amount of memory. You can download ready-made graphs in this form from the LAW web site, or compress your own graphs using the instructions provided in the package overview.

For example, the memory footprint of a snapshot of web sites from Indochina in 2004 with 200 million edges would be a few gigabytes in a trivial representation, it is 260MB in JGraphT’s sparse representation, but it is just 59MB in WebGraph.

The adapters in the package org.jgrapht.webgraph make it possible to use graphs in WebGraph format in JGraphT.

The typical use case for these adapters is:

Such metadata can be easily stored in an array indexed by the vertices, or possibly by a fastutil big array if you have more than 231 vertices (lists and big lists are another option).

If you need to associate metadata with the arcs, and manage the graph in a compact format, a succinct representation from the package org.jgrapht.sux4j might be more appropriate, as those representation associate with each edge an integer in a contiguous range starting from zero. A separate guide is available for succinct graph adapters.

WebGraph has two versions: the standard version manages graph with at most 231 vertices, whereas the big version manages graphs with at most 263 vertices. For each version, there is a directed adapter and an undirected adapter. The Javadoc documentation of ImmutableDirectedGraphAdapter and ImmutableUndirectedGraphAdapter contain examples of how to load graphs in webgraph and use them to build an adapter. The big adapters work in the same way.

Note that WebGraph has two main representations: BVGraph uses compression techniques that work well with web graphs; EFGraph uses succinct techniques, which might be more useful with less repetitive graphs such as social graphs. In particular, EFGraph implements a fast adjacency test. You should choose the representation that better suits your data and access primitives.