Local Clustering Coefficient

Local Clustering Coefficient (Preparations)


The database you start with must contain all of the data you loaded in the setup for this course.

This is what you will see when you click the database icon.

[Image: DataLoaded]

If you do not see this in your Neo4j Browser, you will need to perform the setup and Load Data steps.

Local Clustering Coefficient (Graph Catalog)


The projected graphs roads and dach-region must be created and stored in the GDS graph catalog.

This is what you will see when you execute the following query:

CALL gds.graph.list()
YIELD graphName, nodeCount, relationshipCount

[Image: LoadedDatabase]

If you do not see this in your Neo4j Browser, you will need to perform the Graph Catalog steps.

Local Clustering Coefficient (Overview)


The Local Clustering Coefficient (LCC) algorithm computes the local clustering coefficient for each node in the graph. The LCC of a node describes the likelihood that the neighbours of the node are also connected to each other. It is used to determine how tightly knit the network is.

In this exercise, you will gain some experience with writing Cypher to run the Local Clustering Coefficient algorithm on the European Roads dataset.
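As a reference for the parts that follow, the coefficient is commonly written in terms of triangles and degree. The symbols below are notation chosen here for illustration; they do not appear in the course material.

```latex
C_n = \frac{2\,T_n}{d_n\,(d_n - 1)}
```

Here T_n is the number of triangles that node n belongs to and d_n is its number of neighbours. The denominator counts the ordered pairs of neighbours, so C_n is the fraction of neighbour pairs that are themselves connected. A node with fewer than two neighbours has no neighbour pairs, so the ratio does not apply to it directly.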

  • Part 1: Determine the average local clustering coefficient.

  • Part 2: Determine the local clustering coefficient for each Place node.

  • Part 3: Verify the clustering coefficient results.

Go to the next page to start this exercise.

Part 1: Determine the average local clustering coefficient (Instructions)


To get an overall sense of how tightly connected the European Roads network is, you will execute the stats mode of the Local Clustering Coefficient algorithm to calculate the average clustering coefficient of the whole network.

Write a query to determine the average local clustering coefficient for the European Roads network using these guidelines:

  • The algorithm must use the projected graph roads, which is stored in the graph catalog.

  • YIELD the following result: averageClusteringCoefficient.

Hint: You will call gds.localClusteringCoefficient.stats.

Part 1: Determine the average local clustering coefficient (Solution)


Write a query to determine the average local clustering coefficient for the European Roads network using these guidelines:

  • The algorithm must use the projected graph roads, which is stored in the graph catalog.

  • YIELD the following result: averageClusteringCoefficient.

Hint: You will call gds.localClusteringCoefficient.stats.

CALL gds.localClusteringCoefficient.stats('roads')
YIELD averageClusteringCoefficient

[Image: EX3.1]

The average clustering coefficient for the European Roads network is low, which indicates that the network is not tightly knit. In fact, the network is quite sparse: LCC values range from 0 to 1, and the result of 0.014 is very close to zero. In a typical online social network, you can expect much higher average clustering coefficients, of 0.5 or even higher.

Part 2: Determine the local clustering coefficient for each Place node (Instructions)


To inspect the nodes with the highest LCC value, you will use the stream mode of the LCC algorithm.

Write a query to determine the local clustering coefficient for each Place node in the European Roads network.

  • The algorithm must use the projected graph roads, which is stored in the graph catalog.

  • YIELD the following results: nodeId, localClusteringCoefficient.

  • Use the gds.util.asNode() function to fetch the node from the nodeId value and return its name.

  • Order the results by local clustering coefficient score descending.

  • Limit it to the top ten results.

Hint: You will call gds.localClusteringCoefficient.stream.

Part 2: Determine the local clustering coefficient for each Place node (Solution)


Write a query to determine the local clustering coefficient for each Place node in the European Roads network.

  • The algorithm must use the projected graph roads, which is stored in the graph catalog.

  • YIELD the following results: nodeId, localClusteringCoefficient.

  • Use the gds.util.asNode() function to fetch the node from the nodeId value and return its name.

  • Order the results by local clustering coefficient score descending.

  • Limit it to the top ten results.

Hint: You will call gds.localClusteringCoefficient.stream.

CALL gds.localClusteringCoefficient.stream('roads')
YIELD nodeId, localClusteringCoefficient
RETURN gds.util.asNode(nodeId).name AS place, localClusteringCoefficient
ORDER BY localClusteringCoefficient DESC
LIMIT 10

[Image: EXLCC.2.2]

A node reaches the maximum local clustering coefficient of 1 when each of its neighbours is connected to all of its other neighbours. Four places have all of their neighbours connected. Given the results from previous algorithms on the European Roads network, you might hypothesize that these are nodes with only two neighbours that are connected to each other.

Part 3: Verify the clustering coefficient results (Instructions)
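To see why a node with only two neighbours is the easiest way to reach a score of 1, here is the arithmetic under the usual definition of the coefficient. The symbols T (triangles through the node) and d (number of neighbours) are notation assumed here for illustration.

```latex
C = \frac{2\,T}{d\,(d - 1)}
\qquad\text{with } d = 2,\ T = 1:\qquad
C = \frac{2 \cdot 1}{2 \cdot (2 - 1)} = 1
```

A node with d neighbours needs all d(d - 1)/2 neighbour pairs to be connected to score 1. That is a single pair for d = 2 but already six pairs for d = 4, which is unlikely in a road network.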


Validate the hypothesis by inspecting the neighbouring nodes and their connections for the Bradford node.

Write a query to match Bradford and its neighbours. Include connections between neighbours.

Hint: Use variable-length pattern matching.

Part 3: Verify the clustering coefficient results (Solution)


Let’s confirm that the clustering coefficient score of 1 for the Bradford node is correct.

Write a query to match Bradford and its neighbours. Include connections between neighbours.

Hint: Use variable-length pattern matching.

Here is the solution code:

MATCH path = (p:Place {name: 'Bradford'})-[*..2]-(neighbour)
WHERE (p)--(neighbour)
RETURN path

The results will be:

[Image: EX3.3]

Bradford belongs to a single triangle and has no other connections. This confirms both that the clustering coefficient of 1 for the Bradford node is correct and that our hypothesis holds for this node.
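The same check can be done numerically rather than visually. The query below is a minimal sketch that recomputes the coefficient for Bradford from its neighbours using plain Cypher. It assumes that the relationships stored in the database around Bradford match those in the roads projection, and it uses id(), which newer Neo4j versions deprecate in favour of elementId(). The aliases degree, triangles and lcc are names chosen for this example.

```cypher
// Sketch: recompute Bradford's coefficient by hand from the stored graph.
// Assumes the stored relationships match the 'roads' projection.
MATCH (p:Place {name: 'Bradford'})--(n)
WITH p, collect(DISTINCT n) AS neighbours
UNWIND neighbours AS a
UNWIND neighbours AS b
// Keep each unordered neighbour pair once, and only if the pair is connected.
WITH p, neighbours, a, b
WHERE id(a) < id(b) AND (a)--(b)
// Each connected neighbour pair closes one triangle through Bradford.
WITH p, size(neighbours) AS degree, count(*) AS triangles
RETURN p.name AS place,
       degree,
       triangles,
       2.0 * triangles / (degree * (degree - 1)) AS lcc
```

If Bradford has exactly two neighbours that share a connection, this should report a degree of 2, one triangle, and a coefficient of 1.0. A node whose neighbours share no connections returns no rows from this sketch, because the WHERE clause filters out every pair.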

Local Clustering Coefficient: Taking it further


  1. Try using the write mode of the algorithm.

  2. Try different configuration values.
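As a starting point for the first suggestion, here is a minimal sketch of write mode. The property name lcc is a hypothetical choice for this example; any unused property name would do. The sketch assumes the roads projection is still in the graph catalog.

```cypher
// Sketch of write mode; 'lcc' is a hypothetical property name.
CALL gds.localClusteringCoefficient.write('roads', {writeProperty: 'lcc'})
YIELD averageClusteringCoefficient, nodeCount, nodePropertiesWritten;

// Read the stored values back with plain Cypher.
MATCH (p:Place)
WHERE p.lcc IS NOT NULL
RETURN p.name AS place, p.lcc AS localClusteringCoefficient
ORDER BY localClusteringCoefficient DESC
LIMIT 10;
```

Once the scores are stored on the nodes, later queries can use them without re-running the algorithm. For the second suggestion, general options such as concurrency can be added to the same configuration map.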

Local Clustering Coefficient (Summary)


In this exercise, you gained some experience with writing Cypher to run the Local Clustering Coefficient algorithm and return the clustering coefficient for the Place nodes of the European Roads dataset.