What is Google Pregel?

Pregel is a data flow paradigm and system for large-scale graph processing, created at Google to solve problems that are hard or expensive to solve using only the MapReduce framework. It is named after the Pregel river, which was spanned by the Bridges of Königsberg, the puzzle that led Euler to the first theorem of graph theory.

What is Pregel used for?

Pregel is Google's system for large graph computations such as PageRank, which makes it a very interesting system to study. It is also the inspiration for Apache Giraph, which Facebook uses to analyze its social graph. Some graph processing problems fit on a single machine; Pregel targets the ones that do not and must be distributed across many machines.

What is a Superstep?

A superstep is one iteration of a Pregel computation. Within a superstep, the same user-defined function runs in parallel on every active vertex: each vertex reads the messages sent to it in the previous superstep, updates its own value, and sends messages to other vertices. All workers then wait at a synchronization barrier. Once every vertex has finished, the messages are grouped by destination and delivered, and the next superstep begins.
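The superstep cycle described above can be sketched in a few lines of single-process Python. This is only an illustration, not Google's implementation: the function name `run_supersteps`, the dictionary-based graph, and the choice of "propagate the maximum value" as the computation are all assumptions made for the example. The point is the shape of the loop: compute on every vertex that has mail, then swap inbox and outbox at the barrier.

```python
from collections import defaultdict


def run_supersteps(values, edges, max_supersteps=30):
    """Spread the largest vertex value through the graph, superstep by superstep.

    values: dict mapping vertex -> initial value
    edges:  dict mapping vertex -> list of out-neighbours
    Returns (final values, number of supersteps executed).
    """
    values = dict(values)  # work on a copy, leave the caller's dict alone

    # Superstep 0: every vertex announces its value to its out-neighbours.
    inbox = defaultdict(list)
    for vertex, value in values.items():
        for neighbour in edges.get(vertex, []):
            inbox[neighbour].append(value)
    supersteps = 1

    # Keep going while any message is in flight.
    while inbox and supersteps < max_supersteps:
        outbox = defaultdict(list)
        # Compute phase: in a real system these iterations run in parallel.
        for vertex, messages in inbox.items():
            best = max(messages)
            if best > values[vertex]:
                values[vertex] = best
                for neighbour in edges.get(vertex, []):
                    outbox[neighbour].append(best)
        # Synchronization barrier: messages sent in this superstep only
        # become visible in the next one.
        inbox = outbox
        supersteps += 1

    return values, supersteps


if __name__ == "__main__":
    ring = {"a": ["b"], "b": ["c"], "c": ["a"]}
    print(run_supersteps({"a": 3, "b": 6, "c": 2}, ring))
```

A vertex that learns nothing new sends nothing, so the computation stops by itself once no messages are left.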

What is the Pregel API?

Pregel is a vertex-centric computation model: you define your own algorithm by writing a compute function that runs on each node. Within that function, a node can receive messages from other nodes, typically its neighbors. Based on the received messages and its currently stored value, a node can compute a new value and send messages of its own.
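To make the idea of a user-defined compute function concrete, here is a minimal sketch in plain Python. The signature of `sssp_compute`, the `pregel` driver, and the adjacency-list graph format are hypothetical and chosen for this example; real Pregel-style systems each define their own API. The compute function implements single-source shortest paths: a node takes the smallest distance it has been offered and, if that improves on its stored value, forwards updated distances to its neighbors.

```python
import math
from collections import defaultdict


def sssp_compute(value, messages, out_edges):
    """Hypothetical user-defined compute function for shortest paths.

    value:     distance currently stored at this node
    messages:  distances offered by other nodes in the previous superstep
    out_edges: list of (neighbour, edge_weight) pairs
    Returns (new_value, outgoing) where outgoing is a list of (target, message).
    """
    best = min(messages)
    if best >= value:
        return value, []  # no improvement: stay silent
    return best, [(target, best + weight) for target, weight in out_edges]


def pregel(graph, compute, initial_value, initial_messages, max_supersteps=50):
    """Tiny sequential driver that calls `compute` on every node with mail."""
    values = {node: initial_value for node in graph}
    inbox = dict(initial_messages)
    for _ in range(max_supersteps):
        if not inbox:
            break  # no messages in flight, so the computation has finished
        outbox = defaultdict(list)
        for node, messages in inbox.items():
            values[node], outgoing = compute(values[node], messages, graph[node])
            for target, message in outgoing:
                outbox[target].append(message)
        inbox = outbox
    return values


if __name__ == "__main__":
    graph = {"a": [("b", 1), ("c", 4)], "b": [("c", 2)], "c": [], "d": []}
    # Seed the source node "a" with distance 0.
    print(pregel(graph, sssp_compute, math.inf, {"a": [0]}))
```

Only `sssp_compute` is specific to the algorithm; swapping in a different compute function changes what is computed without touching the driver.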

Is Pregel open source?

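Google's own Pregel implementation is proprietary, but open-source systems built on the same model exist, including Apache Giraph and Pregel+. According to its authors, Pregel+ is not just another open-source Pregel implementation, but a substantially improved distributed graph computing system with effective message reduction. Compared with existing Pregel-like systems, they state that Pregel+ provides a simpler programming interface and yet achieves higher computational efficiency.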

What is Spark GraphX?

GraphX is the Spark component for graphs and graph-parallel computation. At a high level, GraphX extends the Spark RDD by introducing a new Graph abstraction: a directed multigraph with properties attached to each vertex and edge.
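The phrase "directed multigraph with properties attached to each vertex and edge" is easier to picture with a small data structure. The class below is a plain-Python sketch, not the GraphX API: `PropertyGraph` and its method names are invented for this example (the `triplets` name is borrowed from the triplet view GraphX offers). Vertices are stored by id with a property, and edges are kept in a list so that several parallel edges between the same pair of vertices are allowed.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator, List, Tuple


@dataclass
class PropertyGraph:
    """Illustrative directed multigraph with vertex and edge properties."""

    vertices: Dict[int, Any] = field(default_factory=dict)  # id -> property
    edges: List[Tuple[int, int, Any]] = field(default_factory=list)  # (src, dst, property)

    def add_vertex(self, vertex_id: int, prop: Any) -> None:
        self.vertices[vertex_id] = prop

    def add_edge(self, src: int, dst: int, prop: Any) -> None:
        # A list (not a set or dict) keeps parallel edges: that is what
        # makes this a multigraph.
        self.edges.append((src, dst, prop))

    def out_degree(self, vertex_id: int) -> int:
        return sum(1 for src, _, _ in self.edges if src == vertex_id)

    def triplets(self) -> Iterator[Tuple[Any, Any, Any]]:
        """Yield (source property, edge property, destination property)."""
        for src, dst, prop in self.edges:
            yield self.vertices[src], prop, self.vertices[dst]


if __name__ == "__main__":
    g = PropertyGraph()
    g.add_vertex(1, "alice")
    g.add_vertex(2, "bob")
    g.add_edge(1, 2, "follows")
    g.add_edge(1, 2, "messages")  # second edge between the same vertices
    print(list(g.triplets()))
```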

What is Spark Pregel?

A simple Pregel implementation on Spark keeps one RDD for the immutable graph state and separate RDDs for the vertex states and messages at each iteration. Each step is performed with groupByKey, and the resulting vertex and message RDDs are cached.
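The recipe above can be imitated without a Spark cluster. The sketch below is plain Python under loud assumptions: lists and dictionaries stand in for RDDs, `group_by_key` is a hypothetical helper that mimics what Spark's `groupByKey` does to key-value pairs, and PageRank is used as the example computation. In real Spark code the new rank and message RDDs would also be cached after each iteration, which has no counterpart here.

```python
from collections import defaultdict


def group_by_key(pairs):
    """Stand-in for Spark's groupByKey: [(k, v), ...] -> {k: [v, ...]}."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def pagerank(edges, num_iterations=20, damping=0.85):
    """Pregel-style PageRank over a list of (source, destination) edges.

    Simplification: vertices without outgoing edges are not treated specially.
    """
    # Immutable graph state: built once, reused unchanged in every iteration.
    links = group_by_key(edges)
    vertices = {vertex for edge in edges for vertex in edge}
    n = len(vertices)

    # Per-iteration state: vertex ranks, replaced wholesale each step.
    ranks = {vertex: 1.0 / n for vertex in vertices}

    for _ in range(num_iterations):
        # Each vertex splits its rank evenly among its out-neighbours.
        messages = [
            (dst, ranks[src] / len(dsts))
            for src, dsts in links.items()
            for dst in dsts
        ]
        # One step: group the messages by destination, then update each vertex.
        inbox = group_by_key(messages)
        ranks = {
            vertex: (1 - damping) / n + damping * sum(inbox.get(vertex, []))
            for vertex in vertices
        }
    return ranks


if __name__ == "__main__":
    print(pagerank([("a", "b"), ("b", "c"), ("c", "a")]))
```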

What is TinkerPop?

Apache TinkerPop™ is an open source, vendor-agnostic graph computing framework distributed under the commercially friendly Apache 2.0 license. When a data system is TinkerPop-enabled, its users are able to model their domain as a graph and analyze that graph using the Gremlin graph traversal language.