Examples

In the project’s repository there are many examples on how to use GraphRepo to index and mine data.

Please note that in order to run the plotting examples you have to install pandas and plotly, for example using pip:

$ pip install pandas

1. Index data

In this example, we index all data from PyDriller in Neo4j. The example assumes you are running a Neo4j instance in Docker, as indicated in Configuration.

In order to run the example, clone the projects using the following commands:

$ git clone --recurse-submodules https://github.com/NullConvergence/GraphRepo
$ cd graphrepo
$ mkdir repos
$ cd repos
$ git clone https://github.com/ishepard/pydriller

In this step we cloned the GraphRepo project, which includes the example scripts to run and the PyDriller project, which we want to experiment with.

In order to run the indexing example, make sure to configure the config file in examples/configs/pydriller.yml and set the neo object to your database settings.

Then run:

$ python -m examples.index_all --config=examples/config/pydriller.yml

After indexing finishes, you can go to http://<database-url>:7474/browser/ and explore the project, with a query like: MATCH (n) RETURN n.

2. Retrieve all data

This step assumes you already indexed the PyDriller repository in Neo4j, as indicated at Step 1. In order to retrieve all information for PyDriller, we can run the following example:

$ python -m examples.mine_all --config=examples/config/pydriller.yml

This script will print the number of nodes indexed in the database.

3. Plot file complexity over time

This step assumes you already indexed the PyDriller repository in Neo4j, as indicated at Step 1. In this example we will use the miners to retrieve a file and plot its complexity evolution over time. The file used is examples/file_complexity.py. The complexity is stored in the UpdateFile relationship (see Schema). The get_change_history from the File miner retrieves all the UpdateFile relationships that point to the file.

For plotting, in the example we map the data to a pandas DataFrame and use Plotly, although any other libraries can be used.

In order to display the plot, run:

$ python -m examples.file_complexity --config=examples/configs/pydriller.yml

3. Plot file methods complexity over time

This step assumes you already indexed the PyDriller repository in Neo4j, as indicated at Step 1. In this example we will use the miners to retrieve and plot the complexity evolution over time of all methods in a file. The file used is examples/all_method_complexity.py. The complexity is stored in the UpdateFile relationship (see Data Structure). We first get all the methods for a file, then, for each method, we get the update information as in Step 2.

For plotting, in the example we map the data to a pandas DataFrame and use Plotly, although any other libraries can be used.

In order to display the plot, run:

$ python -m examples.all_method_complexity --config=examples/configs/pydriller.yml