bookssland.com » Other » Data Mining by Mehmed Kantardzic (inspirational novels TXT) 📗

Book online «Data Mining by Mehmed Kantardzic (inspirational novels TXT) 📗». Author Mehmed Kantardzic



1 ... 120 121 122 123 124 125 126 127 128 ... 193
Go to page:
matrix representation of a graph (in the case of undirected and unlabeled graphs) is the k × k adjacency matrix, where both rows and columns are defined by nodes. The graph-structured data can be transformed without much computational effort into an adjacency matrix, which is a well-known representation of a graph in mathematical graph theory. On the intersection (i,j) is the value of 1 if the nodes ni and nj are connected by a link; otherwise it is 0 value (Fig. 12.4).

Figure 12.4. Matrix representations of graphs. (a) Incidence matrix:nodes × links; (b) adjancency matrix:nodes ×nodes.

If a graph has labeled links, the following conversion of the graph to a new graph that has labels at its nodes only is applied. This transformation reduces the ambiguity in the adjacency matrix representation. Given a node pair (u, v) and a directed or undirected link {u, v} between the nodes, that is, node(u)−link({u, v})−node(v) where node() and link() stand for the labels of a node and a link, respectively. The link() label information can be removed and transformed into a new node(u, v), and the following triangular graph may be deduced where the original information of the node pair and the link between them are preserved.

This operation on each link in the original graph can convert the graph into another graph representation having no labels on the links while preserving the topological information of the original graph.

Accordingly, the adjacency matrix of Figure 12.5a is converted to that of Figure 12.5b as follows.

Figure 12.5. Preprocessing of labeled links and self-looped nodes in a graph.

The aforementioned explanation was for directed graphs. But the identical preprocessing may be applied to undirected graphs. The difference from the case of directed graphs is that the adjacency matrix becomes diagonally symmetric. The adjacency matrix of Figure 12.6a is represented in Figure 12.6b.

Figure 12.6. Undirected graph representations.

For the sake not only of efficiency in memory consumption but also of efficiency in computations with graphs, we define a code representation of an adjacency matrix as follows. In the case of an undirected graph, the code of an adjacency matrix Xk, that is, code(Xk ), is represented by a binary number. It is obtained by scanning the upper triangular elements of the matrix along each column until diagonal elements (for an undirected graph). For example, the code of the adjacency matrix in Figure 12.6b is given by

A variety of operations is defined in graph theory, and corresponding algorithms are developed to efficiently perform these operations. The algorithms work with graphical, matrix, or code representations of graphs. One of the very important operations is the joining of two graphs to form a new, more complex graph. This operation is used in many graph-mining algorithms, including frequent-pattern mining in graphs. The join operation is demonstrated through the example depicted in Figure 12.7. The examples of the adjacency matrices X4, Y4, and Z5 are given in (d) representing the graphs at (a), (b), and (c).

Figure 12.7. An example of join operation between graphs (a) and (b).

Graph analysis includes a number of parameters that describe the important characteristics of a graph, and they are used as fundamental concepts in developing graphmining algorithms. Over the years, graph-mining researchers have introduced a large number of centrality indices, measures of the importance of the nodes in a graph according to one criterion or another.

Perhaps the simplest centrality measure is degree, which is the number of links for a given node. Degree is a measure in some sense of the “popularity” of a node in the presented graph. Nodes with a high degree are considered to be more central. However, this weights a node only by its immediate neighbors and not by, for example, its two-hop and three-hop neighbors. A more sophisticated centrality measure is closeness, which is the mean geodesic (i.e., shortest path) distance between a vertex and all other vertices reachable from it. Examples of computations for both measures are given in Figure 12.8. Closeness can be regarded as a measure of how long it will take for information to spread from a given node to others in the graph. Nodes that have a low distance to all other nodes in the graph have high closeness centrality.

Figure 12.8. Degree and closeness parameters of the graph.

Another important class of centrality measures is the class of betweenness measures. Betweenness is a measure of the extent to which a vertex lies on the paths between others. The simplest and most widely used betweenness measure is shortest path betweenness, or simply betweenness. The betweenness of a node i is defined to be the fraction of shortest paths between any pair of vertices in a graph that passes through node i. This is, in some sense, a measure of the influence a node has over the spread of connections through the network. Nodes with high betweenness values occur on a larger number of shortest paths and are presumably more important than nodes with low betweenness. The parameter is costly to compute especially when the graphs are complex with a large number of nodes and links. Currently, the fastest known algorithms require O(n3) time complexity and O(n2) space complexity, where n is the number of nodes in the graph.

Illustrative examples of centrality measures and their interpretations are given in Figure 12.9. Node X has importance because it bridges the structural hole between the two clusters of interconnected nodes. It has the highest betweenness measure compared with all the other nodes in the graph. Such nodes get lots of brokerage opportunities and can control the flow in the paths between subgraphs. On the other hand, node Y is in the middle of a dense web of nodes that provides easy, short path access to neighboring nodes; thus, Y also has a good central position in the subgraph. This characteristic of the node Y is described with the highest degree measure.

Figure 12.9. Different types of a node’s importance in a

1 ... 120 121 122 123 124 125 126 127 128 ... 193
Go to page:

Free e-book «Data Mining by Mehmed Kantardzic (inspirational novels TXT) 📗» - read online now

Comments (0)

There are no comments yet. You can be the first!
Add a comment