# Models¶

## Erdos-Reyni models¶

class graspy.models.EREstimator(directed=True, loops=False)[source]

Erdos-Reyni Model

The Erdos-Reyni (ER) model is a simple random graph model in which the probability of any potential edge in the graph existing is the same for any two nodes $$i$$ and $$j$$.

$$P_{ij} = p$$ for all i, j

Parameters: directed : boolean, optional (default=True) Whether to treat the input graph as directed. Even if a directed graph is inupt, this determines whether to force symmetry upon the block probability matrix fit for the SBM. It will also determine whether graphs sampled from the model are directed. loops : boolean, optional (default=False) Whether to allow entries on the diagonal of the adjacency matrix, i.e. loops in the graph where a node connects to itself. p_ : float Value between 0 and 1 (inclusive) representing the probability of any edge in the ER graph model p_mat_ : np.ndarray, shape (n_verts, n_verts) Probability matrix $$P$$ for the fit model, from which graphs could be sampled.

References

 [Rce4f95a7a410-1] https://en.wikipedia.org/wiki/Erd%C5%91s%E2%80%93R%C3%A9nyi_model
fit(self, graph, y=None)[source]

Fit the SBM to a graph, optionally with known block labels

If y is None, the block assignments for each vertex will first be estimated.

Parameters: graph : array_like or networkx.Graph Input graph to fit y : array_like, length graph.shape[0], optional Categorical labels for the block assignments of the graph
bic(self, graph)

Bayesian information criterion for the current model on the input graph.

Note that this implicitly assumes the input graph is indexed like the fit model.

Parameters: graph : np.ndarray Input graph bic : float The lower the better
get_params(self, deep=True)

Get parameters for this estimator.

Parameters: deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. params : mapping of string to any Parameter names mapped to their values.
mse(self, graph)

Compute mean square error for the current model on the input graph

Note that this implicitly assumes the input graph is indexed like the fit model.

Parameters: graph : np.ndarray Input graph mse : float Mean square error for the model's fit P matrix
sample(self, n_samples=1)

Sample graphs (realizations) from the fitted model

Can only be called after the the model has been fit

Parameters: n_samples : int (default 1), optional The number of graphs to sample graphs : np.array (n_samples, n_verts, n_verts) Array of sampled graphs, where the first dimension indexes each sample, and the other dimensions represent (n_verts x n_verts) adjacency matrices for the sampled graphs. Note that if only one sample is drawn, a (1, n_verts, n_verts) array will still be returned.
score(self, graph)

Compute the average log-likelihood over each potential edge of the given graph.

Note that this implicitly assumes the input graph is indexed like the fit model.

Parameters: graph : np.ndarray input graph score : float sum of log-loglikelihoods for each potential edge in input graph
score_samples(self, graph, clip=None)

Compute the weighted log probabilities for each potential edge.

Note that this implicitly assumes the input graph is indexed like the fit model.

Parameters: graph : np.ndarray input graph clip : scalar or None, optional (default=None) Values for which to clip probability matrix, entries less than c or more than 1 - c are set to c or 1 - c, respectively. If None, values will not be clipped in the likelihood calculation, which may result in poorly behaved likelihoods depending on the model. sample_scores : np.ndarray (size of graph) log-likelihood per potential edge in the graph
set_params(self, **params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.

Returns: self
class graspy.models.DCEREstimator(directed=True, loops=False, degree_directed=False)[source]

Degree-corrected Erdos-Reyni Model

The Degree-corrected Erdos-Reyni (DCER) model is an extension of the ER model in which each node has an additional "promiscuity" parameter $$\theta_i$$ that determines its expected degree in the graph.

$$P_{ij} = \theta_i \theta_j p$$

Parameters: directed : boolean, optional (default=True) Whether to treat the input graph as directed. Even if a directed graph is inupt, this determines whether to force symmetry upon the block probability matrix fit for the SBM. It will also determine whether graphs sampled from the model are directed. loops : boolean, optional (default=False) Whether to allow entries on the diagonal of the adjacency matrix, i.e. loops in the graph where a node connects to itself. degree_directed : boolean Whether to allow seperate degree correction parameters for the in and out degree of each node. Ignored if directed is False. p_ : float The $$p$$ parameter as described in the above model, which weights the overall probability of connections between any two nodes. p_mat_ : np.ndarray, shape (n_verts, n_verts) Probability matrix $$P$$ for the fit model, from which graphs could be sampled. degree_corrections_ : np.ndarray, shape (n_verts, 1) or (n_verts, 2) Degree correction vector(s) $$theta$$. If degree_directed parameter was False, then will be of shape (n_verts, 1) and element i represents the degree correction for node $$i$$. Otherwise, the first column contains out degree corrections and the second column contains in degree corrections.

Notes

The DCER model is rarely mentioned in literature, though it is simply a special case of the DCSBM where there is only one community.

References

 [R52fcc65416a4-1] https://en.wikipedia.org/wiki/Erd%C5%91s%E2%80%93R%C3%A9nyi_model
 [R52fcc65416a4-2] Karrer, B., & Newman, M. E. (2011). Stochastic blockmodels and community structure in networks. Physical review E, 83(1), 016107.
fit(self, graph, y=None)[source]

Fit the DCSBM to a graph, optionally with known block labels

If y is None, the block assignments for each vertex will first be estimated.

Parameters: graph : array_like or networkx.Graph Input graph to fit y : array_like, length graph.shape[0], optional Categorical labels for the block assignments of the graph self : DCSBMEstimator object Fitted instance of self
bic(self, graph)

Bayesian information criterion for the current model on the input graph.

Note that this implicitly assumes the input graph is indexed like the fit model.

Parameters: graph : np.ndarray Input graph bic : float The lower the better
get_params(self, deep=True)

Get parameters for this estimator.

Parameters: deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. params : mapping of string to any Parameter names mapped to their values.
mse(self, graph)

Compute mean square error for the current model on the input graph

Note that this implicitly assumes the input graph is indexed like the fit model.

Parameters: graph : np.ndarray Input graph mse : float Mean square error for the model's fit P matrix
sample(self, n_samples=1)

Sample graphs (realizations) from the fitted model

Can only be called after the the model has been fit

Parameters: n_samples : int (default 1), optional The number of graphs to sample graphs : np.array (n_samples, n_verts, n_verts) Array of sampled graphs, where the first dimension indexes each sample, and the other dimensions represent (n_verts x n_verts) adjacency matrices for the sampled graphs. Note that if only one sample is drawn, a (1, n_verts, n_verts) array will still be returned.
score(self, graph)

Compute the average log-likelihood over each potential edge of the given graph.

Note that this implicitly assumes the input graph is indexed like the fit model.

Parameters: graph : np.ndarray input graph score : float sum of log-loglikelihoods for each potential edge in input graph
score_samples(self, graph, clip=None)

Compute the weighted log probabilities for each potential edge.

Note that this implicitly assumes the input graph is indexed like the fit model.

Parameters: graph : np.ndarray input graph clip : scalar or None, optional (default=None) Values for which to clip probability matrix, entries less than c or more than 1 - c are set to c or 1 - c, respectively. If None, values will not be clipped in the likelihood calculation, which may result in poorly behaved likelihoods depending on the model. sample_scores : np.ndarray (size of graph) log-likelihood per potential edge in the graph
set_params(self, **params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.

Returns: self

## Stochastic block models¶

class graspy.models.SBMEstimator(directed=True, loops=False, n_components=None, min_comm=1, max_comm=10, cluster_kws={}, embed_kws={})[source]

Stochastic Block Model

The stochastic block model (SBM) represents each node as belonging to a block (or community). For a given potential edge between node $$i$$ and $$j$$, the probability of an edge existing is specified by the block that nodes $$i$$ and $$j$$ belong to:

$$P_{ij} = B_{\tau_i \tau_j}$$

where $$B \in \mathbb{[0, 1]}^{K x K}$$ and $$\tau$$ is an n_nodes length vector specifying which block each node belongs to.

Parameters: directed : boolean, optional (default=True) Whether to treat the input graph as directed. Even if a directed graph is inupt, this determines whether to force symmetry upon the block probability matrix fit for the SBM. It will also determine whether graphs sampled from the model are directed. loops : boolean, optional (default=False) Whether to allow entries on the diagonal of the adjacency matrix, i.e. loops in the graph where a node connects to itself. n_components : int, optional (default=None) Desired dimensionality of embedding for clustering to find communities. n_components must be < min(X.shape). If None, then optimal dimensions will be chosen by select_dimension(). min_comm : int, optional (default=1) The minimum number of communities (blocks) to consider. max_comm : int, optional (default=10) The maximum number of communities (blocks) to consider (inclusive). cluster_kws : dict, optional (default={}) Additional kwargs passed down to GaussianCluster embed_kws : dict, optional (default={}) Additional kwargs passed down to AdjacencySpectralEmbed block_p_ : np.ndarray, shape (n_blocks, n_blocks) The block probability matrix $$B$$, where the element $$B_{i, j}$$ represents the probability of an edge between block $$i$$ and block $$j$$. p_mat_ : np.ndarray, shape (n_verts, n_verts) Probability matrix $$P$$ for the fit model, from which graphs could be sampled. vertex_assignments_ : np.ndarray, shape (n_verts) A vector of integer labels corresponding to the predicted block that each node belongs to if y was not passed during the call to fit. block_weights_ : np.ndarray, shape (n_blocks) Contains the proportion of nodes that belong to each block in the fit model.

References

 [R09567a76fb04-1] Holland, P. W., Laskey, K. B., & Leinhardt, S. (1983). Stochastic blockmodels: First steps. Social networks, 5(2), 109-137.
fit(self, graph, y=None)[source]

Fit the SBM to a graph, optionally with known block labels

If y is None, the block assignments for each vertex will first be estimated.

Parameters: graph : array_like or networkx.Graph Input graph to fit y : array_like, length graph.shape[0], optional Categorical labels for the block assignments of the graph
bic(self, graph)

Bayesian information criterion for the current model on the input graph.

Note that this implicitly assumes the input graph is indexed like the fit model.

Parameters: graph : np.ndarray Input graph bic : float The lower the better
get_params(self, deep=True)

Get parameters for this estimator.

Parameters: deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. params : mapping of string to any Parameter names mapped to their values.
mse(self, graph)

Compute mean square error for the current model on the input graph

Note that this implicitly assumes the input graph is indexed like the fit model.

Parameters: graph : np.ndarray Input graph mse : float Mean square error for the model's fit P matrix
sample(self, n_samples=1)

Sample graphs (realizations) from the fitted model

Can only be called after the the model has been fit

Parameters: n_samples : int (default 1), optional The number of graphs to sample graphs : np.array (n_samples, n_verts, n_verts) Array of sampled graphs, where the first dimension indexes each sample, and the other dimensions represent (n_verts x n_verts) adjacency matrices for the sampled graphs. Note that if only one sample is drawn, a (1, n_verts, n_verts) array will still be returned.
score(self, graph)

Compute the average log-likelihood over each potential edge of the given graph.

Note that this implicitly assumes the input graph is indexed like the fit model.

Parameters: graph : np.ndarray input graph score : float sum of log-loglikelihoods for each potential edge in input graph
score_samples(self, graph, clip=None)

Compute the weighted log probabilities for each potential edge.

Note that this implicitly assumes the input graph is indexed like the fit model.

Parameters: graph : np.ndarray input graph clip : scalar or None, optional (default=None) Values for which to clip probability matrix, entries less than c or more than 1 - c are set to c or 1 - c, respectively. If None, values will not be clipped in the likelihood calculation, which may result in poorly behaved likelihoods depending on the model. sample_scores : np.ndarray (size of graph) log-likelihood per potential edge in the graph
set_params(self, **params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.

Returns: self
class graspy.models.DCSBMEstimator(degree_directed=False, directed=True, loops=False, n_components=None, min_comm=1, max_comm=10, cluster_kws={}, embed_kws={})[source]

Degree-corrected Stochastic Block Model

The degree-corrected stochastic block model (DCSBM) represents each node as belonging to a block (or community). For a given potential edge between node $$i$$ and $$j$$, the probability of an edge existing is specified by the block that nodes $$i$$ and $$j$$ belong to as in the SBM. However, an additional "promiscuity" parameter $$\theta$$ is added for each node, allowing the vertices within a block to have heterogeneous expected degree distributions:

$$P_{ij} = \theta_i \theta_j B_{\tau_i \tau_j}$$

where $$B \in \mathbb{[0, 1]}^{K x K}$$ $$\tau$$ is an n_nodes length vector specifying which block each node belongs to, and $$\theta$$ is an n_nodes length vector specifiying the degree correction for each node.

The degree_directed parameter of this model allows the degree correction parameter to be different for the in and out degree of each node:

$$P_{ij} = \theta_i \eta_j B_{\tau_i \tau_j}$$

where $$\theta$$ and $$\eta$$ need not be the same.

Parameters: directed : boolean, optional (default=True) Whether to treat the input graph as directed. Even if a directed graph is inupt, this determines whether to force symmetry upon the block probability matrix fit for the SBM. It will also determine whether graphs sampled from the model are directed. degree_directed : boolean, optional (default=False) Whether to fit an "in" and "out" degree correction for each node. In the degree_directed case, the fit model can have a different expected in and out degree for each node. loops : boolean, optional (default=False) Whether to allow entries on the diagonal of the adjacency matrix, i.e. loops in the graph where a node connects to itself. n_components : int, optional (default=None) Desired dimensionality of embedding for clustering to find communities. n_components must be < min(X.shape). If None, then optimal dimensions will be chosen by select_dimension(). min_comm : int, optional (default=1) The minimum number of communities (blocks) to consider. max_comm : int, optional (default=10) The maximum number of communities (blocks) to consider (inclusive). cluster_kws : dict, optional (default={}) Additional kwargs passed down to GaussianCluster embed_kws : dict, optional (default={}) Additional kwargs passed down to LaplacianSpectralEmbed block_p_ : np.ndarray, shape (n_blocks, n_blocks) The block probability matrix $$B$$, where the element $$B_{i, j}$$ represents the expected number of edges between block $$i$$ and block $$j$$. p_mat_ : np.ndarray, shape (n_verts, n_verts) Probability matrix $$P$$ for the fit model, from which graphs could be sampled. degree_corrections_ : np.ndarray, shape (n_verts, 1) or (n_verts, 2) Degree correction vector(s) $$theta$$. If degree_directed parameter was False, then will be of shape (n_verts, 1) and element $$i$$ represents the degree correction for node $$i$$. Otherwise, the first column contains out degree corrections and the second column contains in degree corrections. vertex_assignments_ : np.ndarray, shape (n_verts) A vector of integer labels corresponding to the predicted block that each node belongs to if y was not passed during the call to fit. block_weights_ : np.ndarray, shape (n_blocks) Contains the proportion of nodes that belong to each block in the fit model.

Notes

Note that many examples in the literature describe the DCSBM as being sampled with a Poisson distribution. Here, we implement this model with a Bernoulli. When individual edge probabilities are relatively low these two distributions will yield similar results.

References

 [R3dc0ad7dd21a-1] Karrer, B., & Newman, M. E. (2011). Stochastic blockmodels and community structure in networks. Physical review E, 83(1), 016107.
fit(self, graph, y=None)[source]

Fit the DCSBM to a graph, optionally with known block labels

If y is None, the block assignments for each vertex will first be estimated.

Parameters: graph : array_like or networkx.Graph Input graph to fit y : array_like, length graph.shape[0], optional Categorical labels for the block assignments of the graph self : DCSBMEstimator object Fitted instance of self
bic(self, graph)

Bayesian information criterion for the current model on the input graph.

Note that this implicitly assumes the input graph is indexed like the fit model.

Parameters: graph : np.ndarray Input graph bic : float The lower the better
get_params(self, deep=True)

Get parameters for this estimator.

Parameters: deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. params : mapping of string to any Parameter names mapped to their values.
mse(self, graph)

Compute mean square error for the current model on the input graph

Note that this implicitly assumes the input graph is indexed like the fit model.

Parameters: graph : np.ndarray Input graph mse : float Mean square error for the model's fit P matrix
sample(self, n_samples=1)

Sample graphs (realizations) from the fitted model

Can only be called after the the model has been fit

Parameters: n_samples : int (default 1), optional The number of graphs to sample graphs : np.array (n_samples, n_verts, n_verts) Array of sampled graphs, where the first dimension indexes each sample, and the other dimensions represent (n_verts x n_verts) adjacency matrices for the sampled graphs. Note that if only one sample is drawn, a (1, n_verts, n_verts) array will still be returned.
score(self, graph)

Compute the average log-likelihood over each potential edge of the given graph.

Note that this implicitly assumes the input graph is indexed like the fit model.

Parameters: graph : np.ndarray input graph score : float sum of log-loglikelihoods for each potential edge in input graph
score_samples(self, graph, clip=None)

Compute the weighted log probabilities for each potential edge.

Note that this implicitly assumes the input graph is indexed like the fit model.

Parameters: graph : np.ndarray input graph clip : scalar or None, optional (default=None) Values for which to clip probability matrix, entries less than c or more than 1 - c are set to c or 1 - c, respectively. If None, values will not be clipped in the likelihood calculation, which may result in poorly behaved likelihoods depending on the model. sample_scores : np.ndarray (size of graph) log-likelihood per potential edge in the graph
set_params(self, **params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.

Returns: self

## Latent position models¶

class graspy.models.RDPGEstimator(loops=False, n_components=None, ase_kws={}, diag_aug_weight=1, plus_c_weight=1)[source]

Random Dot Product Graph

Under the random dot product graph model, each node is assumed to have a "latent position" in some $$d$$-dimensional Euclidian space. This vector dictates that node's probability of connection to other nodes. For a given pair of nodes $$i$$ and $$j$$, the probability of connection is the dot product between their latent positions:

$$P_{ij} = \langle x_i, y_j \rangle$$

where $$x_i$$ is the left latent position of node $$i$$, and $$y_j$$ is the right latent position of node $$j$$. If the graph being modeled is is undirected, then $$x_i = y_i$$. Latent positions can be estimated via AdjacencySpectralEmbed.

Parameters: loops : boolean, optional (default=False) Whether to allow entries on the diagonal of the adjacency matrix, i.e. loops in the graph where a node connects to itself. n_components : int, optional (default=None) The dimensionality of the latent space used to model the graph. If None, the method of Zhu and Godsie will be used to select an embedding dimension. ase_kws : dict, optional (default={}) Dictionary of keyword arguments passed down to AdjacencySpectralEmbed, which is used to fit the model. diag_aug_weight : int or float, optional (default=1) Weighting used for diagonal augmentation, which is a form of regularization for fitting the RDPG model. plus_c_weight : int or float, optional (default=1) Weighting used for a constant scalar added to the adjacency matrix before embedding as a form of regularization. latent_ : tuple, length 2, or np.ndarray, shape (n_verts, n_components) The fit latent positions for the RDPG model. If a tuple, then the graph that was input to fit was directed, and the first and second elements of the tuple are the left and right latent positions, respectively. The left and right latent positions will both be of shape (n_verts, n_components). If latent_ is an array, then the graph that was input to fit was undirected and the left and right latent positions are the same. p_mat_ : np.ndarray, shape (n_verts, n_verts) Probability matrix $$P$$ for the fit model, from which graphs could be sampled.

References

 [R8a60fabb19f1-1] Athreya, A., Fishkind, D. E., Tang, M., Priebe, C. E., Park, Y., Vogelstein, J. T., ... & Sussman, D. L. (2018). Statistical inference on random dot product graphs: a survey. Journal of Machine Learning Research, 18(226), 1-92.
 [R8a60fabb19f1-2] Zhu, M. and Ghodsi, A. (2006). Automatic dimensionality selection from the scree plot via the use of profile likelihood. Computational Statistics & Data Analysis, 51(2), pp.918-930.
fit(self, graph, y=None)[source]

Calculate the parameters for the given graph model

bic(self, graph)

Bayesian information criterion for the current model on the input graph.

Note that this implicitly assumes the input graph is indexed like the fit model.

Parameters: graph : np.ndarray Input graph bic : float The lower the better
get_params(self, deep=True)

Get parameters for this estimator.

Parameters: deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. params : mapping of string to any Parameter names mapped to their values.
mse(self, graph)

Compute mean square error for the current model on the input graph

Note that this implicitly assumes the input graph is indexed like the fit model.

Parameters: graph : np.ndarray Input graph mse : float Mean square error for the model's fit P matrix
sample(self, n_samples=1)

Sample graphs (realizations) from the fitted model

Can only be called after the the model has been fit

Parameters: n_samples : int (default 1), optional The number of graphs to sample graphs : np.array (n_samples, n_verts, n_verts) Array of sampled graphs, where the first dimension indexes each sample, and the other dimensions represent (n_verts x n_verts) adjacency matrices for the sampled graphs. Note that if only one sample is drawn, a (1, n_verts, n_verts) array will still be returned.
score(self, graph)

Compute the average log-likelihood over each potential edge of the given graph.

Note that this implicitly assumes the input graph is indexed like the fit model.

Parameters: graph : np.ndarray input graph score : float sum of log-loglikelihoods for each potential edge in input graph
score_samples(self, graph, clip=None)

Compute the weighted log probabilities for each potential edge.

Note that this implicitly assumes the input graph is indexed like the fit model.

Parameters: graph : np.ndarray input graph clip : scalar or None, optional (default=None) Values for which to clip probability matrix, entries less than c or more than 1 - c are set to c or 1 - c, respectively. If None, values will not be clipped in the likelihood calculation, which may result in poorly behaved likelihoods depending on the model. sample_scores : np.ndarray (size of graph) log-likelihood per potential edge in the graph
set_params(self, **params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.

Returns: self