bookssland.com » Other » Data Mining by Mehmed Kantardzic (inspirational novels TXT) 📗

Book online «Data Mining by Mehmed Kantardzic (inspirational novels TXT) 📗». Author Mehmed Kantardzic



1 ... 121 122 123 124 125 126 127 128 129 ... 193
Go to page:
graph.

The vertex-betweenness index reflects the amount of control exerted by a given vertex over the interactions between the other vertices in the graph. The other approach to measure betweenness is to concentrate on links in the graph instead of nodes. Edge-betweenness centrality is related to the frequency of an edge placed on the shortest paths between all pairs of vertices. The betweenness centrality of an edge in a network is given by the sum of the edge-betweenness values for all pairs of nodes in the graph going through the given edge. The edges with highest betweenness values are most likely to lie between subgraphs, rather than inside a subgraph. Consequently, successively removing edges with the highest edge-betweenness will eventually isolate subgraphs consisting of nodes that share connections only with other nodes in the same subgraph. This gives the edge-betweenness index a central role in graph-clustering algorithms where it is necessary to separate a large graph into smaller highly connected subgraphs. Edge- (also called link-) betweenness centrality is traditionally determined in two steps:

1. Compute the length and number of shortest paths between all pairs of nodes through the link.

2. Sum all link dependencies.

The overall betweenness centrality of a link v is obtained by summing up its partial betweenness values for this link, calculated using the graph transformation on breadth first strategy from each node. For example, at the beginning it is given a graph in Figure 12.10a and it is necessary to find link betweenness measures for all links in the graph. In the first step, we build a “modified” graph that starts with node A, and specifies all links in the graph, layer by layer: first neighbors, second neighbors, and so on. The resulting graph is given in Figure 12.10b. This graph is the starting point for partial betweenness computation. The total betweeness will be the sum of partial scores obtained for transformed graphs with root nodes A to K. The process with each transformed graph in (b) consists of a forward phase and a backward phase, and will be illustrated with activities on the graph in Figure 12.10b. In the forward phase, the count of shortest paths from A to all other nodes of the network is determined. The computation is performed iteratively, layer by layer. For example, the number of shortest paths from the initial node A to the node I is computed based on the number of shortest paths from node A to the nodes F and G.

Figure 12.10. Preparatory phase for link-betweenness computation. (a) Initial graph; (b) transformed graph with the root in node A.

The result of the completed forward phase is given in Figure 12.11a. Each node is labeled with the number of shortest paths from the root node A. For example, the node J has three shortest paths, two of them through the node H (ADHJ and AEHJ) and one through the node G (ADGJ).

Figure 12.11. Computation of the partial link-betweenness measure. (a) Forward phase; (b) backward phase.

The backward phase starts from the bottom of the layered graph structure, in our example, from node K. If there are multiple paths from the given node up, count betweenness measure of each link fractionally. The proportion is determined by the number of shortest paths to these nodes on the previous layer. What is the amount we are splitting between these links? The amount is defined as 1 + sum of all betweenness measures entering into the node from below. For example, from node K there are two paths toward nodes I and J, and because both nodes have the same number of shortest paths (3), the amount we are splitting is 1 + 0 = 1, and the partial betweenness measure for links IK and JK is 0.5. In a similar way, we may compute betweennes measures for node G. Total betweenness value for splitting is 1 + 0.5 + 0.5 = 2. There is only one node up; it is D, and the link-betweenness measure for GD is 2.

When we compute betweenness measures for all links in the graph, the procedure should be repeated for the other nodes in the graph, such as the root nodes, until each node of the network is explored. Finally, all partial link scores determined for different graphs should be added to determine the final link-betweenness score.

Graph-mining applications are far more challenging to implement because of the additional constraints that arise from the structural nature of the underlying graph. The problem of frequent pattern mining has been widely studied in the context of mining transactional data. Recently, the techniques for frequent-pattern mining have also been extended to the case of graph data. This algorithm attempts to find interesting or commonly occurring subgraphs in a set of graphs. Discovery of these patterns may be the sole purpose of the systems, or the discovered patterns may be used for graph classification or graph summarization. The main difference in the case of graphs is that the process of determining support is quite different. The problem can be defined in different ways, depending upon the application domain. In the first case, we have a group of graphs, and we wish to determine all the patterns that support a fraction of the corresponding graphs. In the second case, we have a single large graph, and we wish to determine all the patterns that are supported at least a certain number of times in this large graph. In both cases, we need to account for the isomorphism issue in determining whether one graph is supported by another or not. However, the problem of defining the support is much more challenging if overlaps are allowed between different embeddings. Frequently occurring subgraphs in a large graph or a set of graphs could represent important motifs in the real-world data.

Apriori-style algorithms can be extended to the case of discovering frequent subgraphs in graph data, by using a similar level-wise strategy of generating (k + 1) candidates from k-patterns. Various measures to mine substructure

1 ... 121 122 123 124 125 126 127 128 129 ... 193
Go to page:

Free e-book «Data Mining by Mehmed Kantardzic (inspirational novels TXT) 📗» - read online now

Comments (0)

There are no comments yet. You can be the first!
Add a comment