CS3907/CS6444 Big Data and Analytics
Class project #1
Due October 4, 2021 COB
R and Graph Analytics
1. Data Set: CAIDA AS Relationships Datasets
The dataset contains 122 CAIDA AS graphs, from January 2004 to November 2007 – http://www.caida.org/data/active/as-relationships/.
The file contains a full AS graph derived from a set of RouteViews BGP table snapshots.
Note that at the end of the website above are links to references and to additional information you might find useful.
Dataset statistics are calculated for the graph with the highest number of nodes – dataset from November 5 2007.
|Dataset statistics for the graph with the highest number of nodes (November 5, 2007)|
|Nodes in largest WCC|26475 (1.000)|
|Edges in largest WCC|106762 (1.000)|
|Nodes in largest SCC|26475 (1.000)|
|Edges in largest SCC|106762 (1.000)|
|Average clustering coefficient|0.2082|
|Number of triangles|36365|
|Fraction of closed triangles|0.002452|
|Diameter (longest shortest path)|17|
|90-percentile effective diameter|4.7|
Due Date: Lecture 4 COB
J. Leskovec, J. Kleinberg and C. Faloutsos. Graphs over Time: Densification Laws, Shrinking Diameters and Possible Explanations. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2005.
2. Install the igraph package from one of the CRAN mirrors. Determine how to create a graph and plot. Show the plot in your report.
When displaying your graph, label all nodes with one of the four labels mentioned in the data set. See diagrams in the referenced website.
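As a starting point, the following is a minimal sketch of loading an edge list into igraph and plotting it. It assumes the snapshot files are tab-separated with three columns (source AS, destination AS, relationship code) and comment lines starting with "#"; check this against the file you actually download. The sample lines, column names, and plot settings are hypothetical and only stand in for real data.

```r
library(igraph)

# Hypothetical sample in the ASSUMED "from<TAB>to<TAB>relationship" layout.
# For a real snapshot, pass the file path to read.table() instead of text=.
sample_lines <- c(
  "# FromNodeId\tToNodeId\tRelationship",
  "1\t2\t1",
  "2\t1\t-1",
  "2\t3\t0",
  "3\t4\t2",
  "1\t4\t1"
)

# Lines starting with "#" are skipped as comments.
edges <- read.table(text = sample_lines, sep = "\t", comment.char = "#",
                    col.names = c("from", "to", "rel"))

# The first two columns become the edge endpoints; the remaining column
# ("rel") is stored as an edge attribute.
g <- graph_from_data_frame(edges, directed = TRUE)

# Show the raw relationship code on each edge. Mapping codes to names is
# left to you, based on the data set documentation.
plot(g, edge.label = E(g)$rel, edge.arrow.size = 0.4, vertex.size = 20)
```

A full snapshot will be far too dense to read when plotted this way, which is why some simplification is usually needed first.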
3. Apply some of the functions shown in the Introduction to Graph Analytics document on Blackboard to the graph generated from the data set. Present the results in your write-up.
Results should show the function call with parameters and the results.
For screenshots, remember to crop the image so that you just show the results of the function.
This is a medium data set. You may have to simplify the graph somewhat in order to execute this project. If so, describe how you simplified the graph. You may use the simplify function, but you will have to do more than that.
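One possible way to go beyond the simplify function is sketched below. It is only an illustration under stated assumptions: a random graph from sample_pa stands in for the AS graph, and the degree threshold of 2 is an arbitrary, hypothetical choice. Whatever approach you take, describe it and justify it in your report.

```r
library(igraph)

set.seed(42)
# Stand-in graph (ASSUMPTION): replace with the graph built from the data set.
g <- sample_pa(200, directed = FALSE)
# Add one duplicate edge (1-2) and one self-loop (1-1) so simplify() has work to do.
g <- add_edges(g, c(1, 2, 1, 1))

# Step 1: remove multi-edges and self-loops.
g_simple <- simplify(g, remove.multiple = TRUE, remove.loops = TRUE)

# Step 2: keep only the largest connected component.
comp <- components(g_simple)
g_lcc <- induced_subgraph(g_simple,
                          which(comp$membership == which.max(comp$csize)))

# Step 3: drop low-degree vertices (threshold is a hypothetical choice).
min_degree <- 2
g_core <- induced_subgraph(g_lcc, which(degree(g_lcc) >= min_degree))

cat("vertices:", vcount(g), "->", vcount(g_lcc), "->", vcount(g_core), "\n")
```

Plotting the graph after each step gives the successive simplifications requested in the deliverables.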
4. Explore other functions in the igraph package, at least 10 of them. Apply them to the graph generated above. Do NOT pick ones mentioned in the Introduction to Graph Analytics! You may have to do a little programming in R. There are numerous books posted on Blackboard.
5. Determine the (a) central node(s) in the graph, (b) longest path(s), (c) largest clique(s), (d) ego(s), and (e) power centrality. We will talk about some of these in Lecture 2.
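The five quantities in item 5 can be sketched with standard igraph calls, as below. This is a hedged illustration, not the expected solution: it uses the small built-in Zachary graph as a stand-in, treats the highest-degree vertex as "central" (other centrality measures are equally defensible), interprets "longest path" as the diameter (the longest shortest path), and picks the power centrality exponent of 0.1 arbitrarily.

```r
library(igraph)

# Stand-in graph (ASSUMPTION): replace with your simplified AS graph.
g <- make_graph("Zachary")

# (a) Central node(s): here, the vertices with maximum degree.
deg <- degree(g)
central <- V(g)[deg == max(deg)]

# (b) Longest path, interpreted as the diameter (longest shortest path).
d <- diameter(g)
diameter_path <- get_diameter(g)

# (c) Largest clique(s): a list with one vertex sequence per clique.
cliques <- largest_cliques(g)

# (d) Ego network(s): neighbors within one step of each central node.
egos <- ego(g, order = 1, nodes = central)

# (e) Bonacich power centrality; the exponent is a hypothetical choice.
pc <- power_centrality(g, exponent = 0.1)

print(central)
print(d)
print(diameter_path)
print(length(cliques[[1]]))
print(head(sort(pc, decreasing = TRUE)))
```

On a graph the size of the AS snapshot, some of these calls (clique search in particular) can be slow, so running them on the simplified graph is a reasonable first step.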
6. Deliverables: Submit a zip file to your group's Blackboard file area, named Group-N-Project-1.zip, where N is your group number. Your deliverable should include the following items:
- A listing of all R functions that you have written
- The plot(s) of the graph including successive simplifications that make it somewhat readable for #2
- Demonstrations for #3
- Demonstrations of the igraph functions that you have explored as per #4.
- Results for #5
- A discussion of your approach to working this project: loading data, simplifying, etc.
7. Project #1 Value: 25 points
a. Discuss your approach to working this project: loading data, simplifying, and what you learned from it (3 points)
b. Demonstrations for #2 (3 points)
c. Demonstrations for #3 (4 points)
d. Demonstrations for #4 (10 points)
e. Demonstrations for #5 (5 points: 1 each for a to e)
Be clear about what you are doing with each function. Identify any problems you had and how you solved them.
Remember to save your workspace! In your Group area would be a good place so all members can get to it.
Include the required results in your Word document (use CTRL-ALT-PrintScreen to grab the screen).
You may use IrfanView 4.58 or later. Paste in the screen image, crop it, and copy the image as JPEG to drop into your Word document.
If you are using a Mac, you can use PowerPoint to crop images.
If you need help, do not hesitate to ask for it.
You will need help in at least one or two areas.