In Part 1, we explored link analysis as a way to investigate and understand relationships between individuals and entities. Then, we introduced social network analysis (SNA), a specific type of link analysis that focuses on people, groups, and their relationships. We reviewed the basic concepts of SNA, including nodes (representing individuals) and edges (representing connections between individuals). Finally, we discussed how SNA can be used to understand social influence, group formation, and information flow with metrics such as degree centrality and betweenness centrality, using Billy Corgan and his relationship to the founding members of the Smashing Pumpkins as a simple example.
In that example, we kept the network small and simple. In this tutorial, we will continue to use Python and NetworkX to examine Billy Corgan’s sphere of influence. We will also expand Billy Corgan’s network to make it more complex and increase our understanding of degree centrality and betweenness centrality. As we work through this example, we will discuss context and how domain knowledge is essential to maximizing the benefits of social network analysis.
Domain knowledge and research are essential components of social network analysis because they provide the necessary context, theoretical framework, and understanding of the social and cultural factors that shape social networks. Without this understanding, you risk producing misleading or incorrect findings that fail to accurately capture the complexity and nuance of social network data.
Before you start…
- Do you have basic knowledge of Python? If not, start here.
- Are you familiar with basic concepts in social network analysis, like nodes and edges, or metrics like centrality? If not, start here.
So what kind of data do we need to start investigating Billy Corgan’s sphere of influence? Let’s start with all of his bandmates from the Smashing Pumpkins, current and former.
Using Wikipedia, we can get a fairly reliable list of all the musicians that played in the Smashing Pumpkins since 1988. By the way — did you know that Billy Corgan (briefly) had another band named Zwan in the early aughties? Spoiler alert, it did not end well. Let’s make a list of them too.
Then, open up your favorite IDE, import the relevant libraries, and make two lists — one for Smashing Pumpkins and one for Zwan.
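As a minimal sketch of this step, the setup might look like the following. The member names are taken from the output shown later in this tutorial; the variable names `smashing_pumpkins` and `zwan` are assumptions, since the original script is not reproduced here.

```python
# Standard-library helper for building pairs, plus NetworkX for the graph.
from itertools import combinations

import networkx as nx

# Smashing Pumpkins members, current and former (as listed in this tutorial).
smashing_pumpkins = [
    "Billy Corgan", "James Iha", "Jimmy Chamberlin", "Katie Cole",
    "D'arcy Wretzky", "Melissa Auf der Maur", "Ginger Pooley",
    "Mike Byrne", "Nicole Fiorentino",
]

# Zwan members (as listed in this tutorial).
zwan = [
    "Billy Corgan", "Jimmy Chamberlin", "Paz Lenchantin",
    "David Pajo", "Matt Sweeney",
]

print(len(smashing_pumpkins), len(zwan))  # 9 5
```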
Our next task is to build out some lists of tuples to represent the relationships between Billy Corgan and each of these band members. We also need to consider the relationship between each of the band members and all of the other band members.
In graph theory, this kind of relation is known as symmetric. If Billy is in a band with Jimmy, Jimmy is also in a band with Billy.
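A small illustration of what symmetry means in practice, assuming we model the network with NetworkX's undirected `Graph` class: adding an edge in one direction makes it visible from both ends, whereas a directed `DiGraph` would not.

```python
import networkx as nx

# Undirected graph: one edge covers both directions of the relationship.
undirected = nx.Graph()
undirected.add_edge("Billy Corgan", "Jimmy Chamberlin")
print(undirected.has_edge("Billy Corgan", "Jimmy Chamberlin"))  # True
print(undirected.has_edge("Jimmy Chamberlin", "Billy Corgan"))  # True

# Directed graph, shown only for contrast: the reverse edge does not exist.
directed = nx.DiGraph()
directed.add_edge("Billy Corgan", "Jimmy Chamberlin")
print(directed.has_edge("Jimmy Chamberlin", "Billy Corgan"))  # False
```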
To accomplish this, we can use Python to build a simple function that will ingest each list of band members and return all the possible combinations of the pairs.
Then, we can apply it to each list and combine the results to create a single list of tuples that contains the relationships between all the band members of Zwan and the Smashing Pumpkins.
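One way to sketch the pairing function and its use is with `itertools.combinations`. The function name `band_pairs` and the variable names are assumptions for illustration, not necessarily what the original script used.

```python
from itertools import combinations
from pprint import pprint


def band_pairs(members):
    """Return every unordered pair of band members as a list of tuples."""
    return list(combinations(members, 2))


smashing_pumpkins = [
    "Billy Corgan", "James Iha", "Jimmy Chamberlin", "Katie Cole",
    "D'arcy Wretzky", "Melissa Auf der Maur", "Ginger Pooley",
    "Mike Byrne", "Nicole Fiorentino",
]
zwan = [
    "Billy Corgan", "Jimmy Chamberlin", "Paz Lenchantin",
    "David Pajo", "Matt Sweeney",
]

# Pair up each band separately, then concatenate the two lists.
relationships = band_pairs(smashing_pumpkins) + band_pairs(zwan)
pprint(relationships)
```

Because the two bands are paired separately, the Billy Corgan and Jimmy Chamberlin pair shows up once per band. That is harmless here, since an undirected NetworkX graph stores a repeated edge only once.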
The output will look something like this:
[('Billy Corgan', 'James Iha'),
('Billy Corgan', 'Jimmy Chamberlin'),
('Billy Corgan', 'Katie Cole'),
('Billy Corgan', "D'arcy Wretzky"),
('Billy Corgan', 'Melissa Auf der Maur'),
('Billy Corgan', 'Ginger Pooley'),
('Billy Corgan', 'Mike Byrne'),
('Billy Corgan', 'Nicole Fiorentino'),
('James Iha', 'Jimmy Chamberlin'),
('James Iha', 'Katie Cole'),
('James Iha', "D'arcy Wretzky"),
('James Iha', 'Melissa Auf der Maur'),
('James Iha', 'Ginger Pooley'),
('James Iha', 'Mike Byrne'),
('James Iha', 'Nicole Fiorentino'),
('Jimmy Chamberlin', 'Katie Cole'),
('Jimmy Chamberlin', "D'arcy Wretzky"),
('Jimmy Chamberlin', 'Melissa Auf der Maur'),
('Jimmy Chamberlin', 'Ginger Pooley'),
('Jimmy Chamberlin', 'Mike Byrne'),
('Jimmy Chamberlin', 'Nicole Fiorentino'),
('Katie Cole', "D'arcy Wretzky"),
('Katie Cole', 'Melissa Auf der Maur'),
('Katie Cole', 'Ginger Pooley'),
('Katie Cole', 'Mike Byrne'),
('Katie Cole', 'Nicole Fiorentino'),
("D'arcy Wretzky", 'Melissa Auf der Maur'),
("D'arcy Wretzky", 'Ginger Pooley'),
("D'arcy Wretzky", 'Mike Byrne'),
("D'arcy Wretzky", 'Nicole Fiorentino'),
('Melissa Auf der Maur', 'Ginger Pooley'),
('Melissa Auf der Maur', 'Mike Byrne'),
('Melissa Auf der Maur', 'Nicole Fiorentino'),
('Ginger Pooley', 'Mike Byrne'),
('Ginger Pooley', 'Nicole Fiorentino'),
('Mike Byrne', 'Nicole Fiorentino'),
('Billy Corgan', 'Jimmy Chamberlin'),
('Billy Corgan', 'Paz Lenchantin'),
('Billy Corgan', 'David Pajo'),
('Billy Corgan', 'Matt Sweeney'),
('Jimmy Chamberlin', 'Paz Lenchantin'),
('Jimmy Chamberlin', 'David Pajo'),
('Jimmy Chamberlin', 'Matt Sweeney'),
('Paz Lenchantin', 'David Pajo'),
('Paz Lenchantin', 'Matt Sweeney'),
('David Pajo', 'Matt Sweeney')]
Next, we can loop over the list of tuples to generate a graph with NetworkX.
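A minimal sketch of that loop follows. The helper `draw_network`, the layout seed, and the styling are assumptions; the original figure may have been drawn with different settings, so node positions will not match it exactly.

```python
from itertools import combinations

import networkx as nx

smashing_pumpkins = [
    "Billy Corgan", "James Iha", "Jimmy Chamberlin", "Katie Cole",
    "D'arcy Wretzky", "Melissa Auf der Maur", "Ginger Pooley",
    "Mike Byrne", "Nicole Fiorentino",
]
zwan = [
    "Billy Corgan", "Jimmy Chamberlin", "Paz Lenchantin",
    "David Pajo", "Matt Sweeney",
]
relationships = list(combinations(smashing_pumpkins, 2)) + list(combinations(zwan, 2))

# Build an undirected graph, adding one edge per tuple.
G = nx.Graph()
for person_a, person_b in relationships:
    G.add_edge(person_a, person_b)

# The repeated Billy Corgan / Jimmy Chamberlin pair collapses into one edge.
print(G.number_of_nodes(), G.number_of_edges())  # 12 45


def draw_network(graph):
    """Draw the graph (hypothetical helper; requires matplotlib)."""
    import matplotlib.pyplot as plt

    positions = nx.spring_layout(graph, seed=42)  # fixed seed for a repeatable layout
    nx.draw_networkx(graph, pos=positions, node_color="lightblue", font_size=8)
    plt.axis("off")
    plt.show()
```

Calling `draw_network(G)` renders the figure.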
Which generates this graph:
Let’s discuss two key observations that can be gleaned about the network from this graph.
- The upper right corner, where the Smashing Pumpkins members appear, is denser than the lower left corner, where the members of Zwan appear, because Zwan has fewer members and therefore fewer connections.
- Billy Corgan and Jimmy Chamberlin appear in the center because they are in both bands.
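Both observations can be checked directly from the two member lists. This is a small sketch; the helper `pair_count` is an assumption introduced for illustration.

```python
smashing_pumpkins = [
    "Billy Corgan", "James Iha", "Jimmy Chamberlin", "Katie Cole",
    "D'arcy Wretzky", "Melissa Auf der Maur", "Ginger Pooley",
    "Mike Byrne", "Nicole Fiorentino",
]
zwan = [
    "Billy Corgan", "Jimmy Chamberlin", "Paz Lenchantin",
    "David Pajo", "Matt Sweeney",
]

# Members who belong to both bands sit between the two groups.
in_both = sorted(set(smashing_pumpkins) & set(zwan))
print(in_both)  # ['Billy Corgan', 'Jimmy Chamberlin']


def pair_count(n):
    """Number of unordered pairs among n people: n * (n - 1) / 2."""
    return n * (n - 1) // 2


# Nine members yield far more pairwise connections than five.
print(pair_count(len(smashing_pumpkins)), pair_count(len(zwan)))  # 36 10
```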
Next, let’s consider how these observations may be reflected in degree centrality and betweenness centrality.
Degree Centrality and Betweenness Centrality with NetworkX
In Part 1, we calculated the degree centrality and betweenness centrality for Billy Corgan and the founding members of the Smashing Pumpkins. To accomplish this, we called on two methods in NetworkX, and wrote a simple script to execute them. This time, since we have our graph assembled, we can simply input the graph to calculate the centrality measures.
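A sketch of that calculation, assuming the graph is built from the two band lists as above. It uses `nx.degree_centrality` and `nx.betweenness_centrality`; the plain-text table formatting is an assumption, and the original script may have presented the results differently.

```python
from itertools import combinations

import networkx as nx

smashing_pumpkins = [
    "Billy Corgan", "James Iha", "Jimmy Chamberlin", "Katie Cole",
    "D'arcy Wretzky", "Melissa Auf der Maur", "Ginger Pooley",
    "Mike Byrne", "Nicole Fiorentino",
]
zwan = [
    "Billy Corgan", "Jimmy Chamberlin", "Paz Lenchantin",
    "David Pajo", "Matt Sweeney",
]

G = nx.Graph()
G.add_edges_from(combinations(smashing_pumpkins, 2))
G.add_edges_from(combinations(zwan, 2))

# Each function returns a dict mapping node name -> normalized score.
degree = nx.degree_centrality(G)
betweenness = nx.betweenness_centrality(G)

# Print a simple table, highest degree centrality first.
print(f"{'name':<22}{'degree':>10}{'betweenness':>14}")
for name in sorted(G, key=lambda node: degree[node], reverse=True):
    print(f"{name:<22}{degree[name]:>10.6f}{betweenness[name]:>14.6f}")
```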
This will generate the following output:
Let’s discuss how to interpret these results.
What does this table tell us about the degree centrality of all of the band members?
1. Billy Corgan has the highest degree centrality score of 1.000, indicating that he has the highest number of connections or collaborations within Smashing Pumpkins and Zwan. He is directly connected to every other member of both of the bands.
2. Jimmy Chamberlin also has a degree centrality score of 1.000, suggesting that he, like Billy Corgan, has direct connections to every other member of the two bands.
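3. James Iha, Katie Cole, D’arcy Wretzky, Melissa Auf der Maur, Ginger Pooley, Mike Byrne, and Nicole Fiorentino all have the same degree centrality score of 0.727273 (8 of a possible 11 connections), because each is directly connected to the other eight members of the Smashing Pumpkins but to none of the Zwan-only members. Paz Lenchantin, David Pajo, and Matt Sweeney, who played only in Zwan, each have four connections in the edge list above, which works out to a lower score of 4/11 ≈ 0.363636.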
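To see where these scores come from, here is a sketch that recomputes degree centrality by hand. NetworkX normalizes a node's degree by n - 1, the maximum number of neighbors it could have, where n is the number of nodes in the graph. The graph construction repeats the assumed setup from earlier in this tutorial.

```python
from itertools import combinations

import networkx as nx

smashing_pumpkins = [
    "Billy Corgan", "James Iha", "Jimmy Chamberlin", "Katie Cole",
    "D'arcy Wretzky", "Melissa Auf der Maur", "Ginger Pooley",
    "Mike Byrne", "Nicole Fiorentino",
]
zwan = [
    "Billy Corgan", "Jimmy Chamberlin", "Paz Lenchantin",
    "David Pajo", "Matt Sweeney",
]

G = nx.Graph()
G.add_edges_from(combinations(smashing_pumpkins, 2))
G.add_edges_from(combinations(zwan, 2))

n = G.number_of_nodes()  # 12 distinct musicians
library_scores = nx.degree_centrality(G)

for name in G:
    neighbors = G.degree(name)      # raw count of direct connections
    by_hand = neighbors / (n - 1)   # normalize by the maximum possible
    # The hand calculation should agree with NetworkX's own result.
    assert abs(by_hand - library_scores[name]) < 1e-12
    print(f"{name:<22}{neighbors:>3} / {n - 1} = {by_hand:.6f}")
```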
What does this table tell us about the betweenness centrality of all of the band members?
1. Billy Corgan and Jimmy Chamberlin also have the highest betweenness centrality scores of 0.190909, indicating that they are likely important intermediaries or bridges between other band members in terms of communication or collaboration.
2. Every other band member has a betweenness centrality score of zero. All of their neighbors are already directly connected to one another, so no shortest path between two other members passes through them, and they do not act as bridges.
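The betweenness scores can also be reproduced with a brute-force sketch: for every pair of other nodes, count the fraction of shortest paths that pass through the node in question, then divide by the number of such pairs. The function `brute_force_betweenness` is a hypothetical helper written for illustration. It is much slower than the algorithm NetworkX uses, but it should give the same normalized result on this small undirected graph.

```python
from itertools import combinations

import networkx as nx

smashing_pumpkins = [
    "Billy Corgan", "James Iha", "Jimmy Chamberlin", "Katie Cole",
    "D'arcy Wretzky", "Melissa Auf der Maur", "Ginger Pooley",
    "Mike Byrne", "Nicole Fiorentino",
]
zwan = [
    "Billy Corgan", "Jimmy Chamberlin", "Paz Lenchantin",
    "David Pajo", "Matt Sweeney",
]

G = nx.Graph()
G.add_edges_from(combinations(smashing_pumpkins, 2))
G.add_edges_from(combinations(zwan, 2))


def brute_force_betweenness(graph, node):
    """Normalized betweenness of `node` in a connected undirected graph."""
    others = [v for v in graph if v != node]
    total = 0.0
    for source, target in combinations(others, 2):
        paths = list(nx.all_shortest_paths(graph, source, target))
        through = sum(1 for path in paths if node in path)
        total += through / len(paths)  # share of shortest paths via `node`
    n = graph.number_of_nodes()
    return total / ((n - 1) * (n - 2) / 2)  # divide by the number of pairs


for name in ["Billy Corgan", "Jimmy Chamberlin", "James Iha", "Paz Lenchantin"]:
    print(f"{name:<22}{brute_force_betweenness(G, name):.6f}")
```

For Billy Corgan, the only pairs that contribute are the 7 × 3 = 21 pairings of a Smashing Pumpkins-only member with a Zwan-only member. Each such pair has two shortest paths, one through Billy and one through Jimmy, so Billy gets half credit for each: 21 × 0.5 / 55 ≈ 0.190909.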
Strengthening Inferences with Domain Knowledge
While centrality metrics provide data points from which we can draw inferences, these inferences are based solely on the information provided in the table.
To make more specific conclusions about Billy Corgan’s sphere of influence, you would need knowledge regarding nineties alternative music and musicians to offer a fully-fledged hypothesis on the dynamics between the members of these bands.
So if you are a nineties music aficionado, let me know what you think about these results in the comments. Be sure to stay tuned for Part 3, where we expand the network so we can explore closeness centrality, clustering, and communities in social network analysis.
If you would like the fully annotated Python script for this tutorial, visit my GitHub!
👩🏻💻 Christine Egan | medium | github | linkedin