PageRank

PageRank (Preparations)


The database you start with must contain all of the data you loaded in the setup for this course.

This is what you will see when you click the database icon database icon.

DataLoaded

If you do not see this in your Neo4j Browser, you will need to perform the setup and Load Data steps.

PageRank (Graph Catalog)


The projected graph roads must be created and stored in the GDS graph catalog.

This is what you will see when you execute the following query:

CALL gds.graph.list()
YIELD graphName, nodeCount, relationshipCount

 

LoadedDatabase

If you do not see this in your Neo4j Browser, you will need to perform the Graph Catalog steps.

PageRank (Overview)


PageRank measures the transitive influence or connectivity of nodes. It can be computed by iteratively distributing one node’s rank (originally based on degree) over its neighbors.

In this exercise, you will execute PageRank on the European Roads dataset:

  • Part 1: Perform the PageRank algorithm.

  • Part 2: Perform the weighted variant of the PageRank algorithm.

  • Part 2: Perform the Personalized PageRank algorithm.

Go to the next page to start this exercise.

Part 1: Perform the PageRank algorithm. (Instructions)


You will start by running the PageRank algorithm on the European Roads network. The PageRank algorithm will determine the most influential Place nodes in the network. Remember that PageRank interprets each connection in the network as a vote of importance or influence. The places with the most connections to other well-connected cities will rank the highest. In the context of the European Roads network, you can use the PageRank algorithm to find the most well-connected places in the system.

Write Cypher code to perform PageRank analysis using these guidelines:

  • The algorithm must use the projected graph roads, which is stored in the graph catalog.

  • Specify a maximum of 20 iterations.

  • Use the stream mode of the PageRank algorithm.

  • Use the gds.util.asNode() function to fetch the node from the nodeId value and return its name.

  • Order the results by PageRank score descending.

  • Limit it to the top ten results.

Hint: Call gds.pageRank.stream.

Part 1: Perform the PageRank algorithm. (Solution)


Write Cypher code to perform PageRank analysis using these guidelines:

  • The algorithm must use the projected graph roads, which is stored in the graph catalog.

  • Specify a maximum of 20 iterations.

  • Use the stream mode of the PageRank algorithm.

  • Use the gds.util.asNode() function to fetch the node from the nodeId value and return its name.

  • Order the results by PageRank score descending.

  • Limit it to the top ten results.

Hint: Call gds.pageRank.stream.

Here is the solution code:

CALL gds.pageRank.stream('roads',
    {maxIterations:20})
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS place, score
ORDER BY score DESC
LIMIT 10

The results returned will look like this:

EXPR.1

 

Here you see that Berlin, Budapest, and Praha are the most well-connected places in the networks judging by the PageRank centrality score.

Perform the weighted variant of the PageRank algorithm. (Instructions)


Next, you will run the weighted variant of the PageRank algorithm. Just like with the Community Detection algorithms, the PageRank algorithm also deems that a higher relationship weight value represents a stronger connection. Again, you will use the inverse_distance relationship property as the input relationship weight.

Write Cypher code to perform the weighted variant of the PageRank analysis using these guidelines:

  • The algorithm must use the projected graph roads, which is stored in the graph catalog.

  • Use the stream mode of the PageRank algorithm.

  • Specify a maximum of 20 iterations.

  • The relationship weight property name is inverse_distance.

  • Use the gds.util.asNode() function to fetch the node from the nodeId value and return its name.

  • Order the results by PageRank score descending.

  • Limit it to the top ten results.

Hint: Call gds.pageRank.stream.

Perform the weighted variant of the PageRank algorithm. (Solution)


Write Cypher code to perform the weighted variant of the PageRank analysis using these guidelines:

  • The algorithm must use the projected graph roads, which is stored in the graph catalog.

  • Use the stream mode of the PageRank algorithm.

  • Specify a maximum of 20 iterations.

  • Use the gds.util.asNode() function to fetch the node from the nodeId value and return its name.

  • The relationship weight property name is inverse_distance.

  • Order the results by PageRank score descending.

  • Limit it to the top ten results.

Hint: Call gds.pageRank.stream.

CALL gds.pageRank.stream('roads',{
    maxIterations:20,
    relationshipWeightProperty:'inverse_distance'})
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS place, score
ORDER BY score DESC
LIMIT 10

The results returned will look like this:

EXPR.2

 

Using the weighted variant of the PageRank algorithm, you can observe that now the most well-connected place is Basel, which was in 6th place of the unweighted variant. Berlin lost the 1st place but is still going strong in second place. Praha keeps the third place. On the other hand, Budapest has dropped down to 9th place. You can observe how using various configuration settings can influence the PageRank ranking score.

Perform the Personalized PageRank algorithm. (Instructions/Solution)


In the last example, you will use the Personalized PageRank algorithm. Personalized PageRank is a variation of PageRank, which is biased towards a set of source nodes. You need to match the predefined source nodes and add them as an input to the sourceNodes parameter.

Write Cypher code to perform the Personalized PageRank analysis using these guidelines:

  • The algorithm must use the projected graph roads, which is stored in the graph catalog.

  • Use the stream mode of the PageRank algorithm.

  • Add the sourceNodes parameter.

  • Specify all the Place nodes with the country code E as the source nodes.

  • Limit it to the top ten results.

Hint: Call gds.pageRank.stream.

MATCH (p:Place)
WHERE p.countryCode = 'E'
WITH collect(p) as source_nodes
CALL gds.pageRank.stream('roads', {
    sourceNodes:source_nodes})
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS place, score
ORDER BY score DESC
LIMIT 10
EXPR.3

 

All the top ten ranked places are located in Spain, which makes sense as the Personalized PageRank algorithm is biased towards the cities located in Spain. You can try to use source nodes located in other countries and observe the results.

PageRank: Taking it further


  1. Change the iterations and dampening factor to see how it affects the results.

  2. Change the sourceNodes parameter to see how it affects the results.

  3. Try using the non-stream version of the algorithm.

PageRank (Summary)


PageRank measures the transitive influence or connectivity of nodes. It can be computed by iteratively distributing one node’s rank (originally based on degree) over its neighbors.

In this exercise, you analyzed PageRank for the European Roads dataset.