Phenograph

Clusters high-dimensional data. This is an R wrapper around the Python PhenoGraph module found at https://github.com/jacoblevine/PhenoGraph

phenograph(
  rdf,
  k = 30,
  directed = FALSE,
  prune = FALSE,
  min_cluster_size = 10,
  jaccard = TRUE,
  primary_metric = "euclidean",
  n_jobs = NULL,
  q_tol = 0.001,
  louvain_time_limit = 2000,
  nn_method = "kdtree"
)

Arguments

rdf

Data to cluster, or a sparse matrix of a k-nearest-neighbor graph. If an ndarray, an n-by-d array of n cells in d dimensions. If a sparse matrix, an n-by-n adjacency matrix.

k

Number of nearest neighbors to use in the first step of graph construction (default = 30).

directed

Whether to use a symmetric (default) or asymmetric ("directed") graph. The graph construction process produces a directed graph, which is symmetrized by one of two methods (see prune).

prune

Whether to symmetrize by taking the average (prune=FALSE) or the product (prune=TRUE) of the graph and its transpose.
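As a rough illustration of the two symmetrization rules, the sketch below applies them to a small hand-made directed adjacency matrix in base R. The matrix `A` and the helper `symmetrize` are made up for this example and are not part of the wrapper. It assumes "product" means the element-wise product, so the Python module may differ in detail.

```r
# Hypothetical 3-node directed graph: A[i, j] is the weight of edge i -> j.
A <- matrix(c(0,   0.5, 0,
              0.5, 0,   0.8,
              0,   0,   0),
            nrow = 3, byrow = TRUE)

# Illustrative helper (not part of the package).
symmetrize <- function(A, prune = FALSE) {
  if (prune) {
    A * t(A)        # element-wise product: one-way edges become 0
  } else {
    (A + t(A)) / 2  # average: one-way edges are kept at half weight
  }
}

averaged <- symmetrize(A, prune = FALSE)
pruned   <- symmetrize(A, prune = TRUE)
```

In this toy graph the edge 2 -> 3 has no reverse edge, so it survives at half weight in `averaged` but is removed in `pruned`.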

min_cluster_size

Cells that end up in a cluster smaller than min_cluster_size are considered outliers and are assigned the label -1 in the cluster labels.
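The outlier rule can be sketched in base R as follows. The label vector and the cutoff of 3 are invented for illustration. This sketch does not renumber the surviving clusters, which the Python module may or may not do.

```r
# Hypothetical raw cluster labels for 12 cells.
labels <- c(0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 3)
min_cluster_size <- 3

sizes <- table(labels)                            # number of cells per cluster
small <- names(sizes)[sizes < min_cluster_size]   # clusters below the cutoff

# Cells in undersized clusters are relabeled -1; all others keep their label.
filtered <- ifelse(as.character(labels) %in% small, -1, labels)
```

Here clusters 2 and 3 contain fewer than 3 cells, so their cells are marked -1.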

jaccard

If TRUE, use the Jaccard metric between k-neighborhoods to build the graph. If FALSE, use a Gaussian kernel.
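For intuition, the Jaccard coefficient between two k-neighborhoods is the size of their intersection divided by the size of their union. The sketch below computes it for made-up neighbor lists. `nbrs` and `jaccard` are illustrative names, and the way the Python module turns these coefficients into edge weights is not shown here.

```r
# Hypothetical neighbor lists: row i holds the indices of the
# k = 3 nearest neighbors of cell i.
nbrs <- rbind(c(2, 3, 4),
              c(1, 3, 4),
              c(1, 2, 5),
              c(1, 2, 3),
              c(1, 3, 4))

# Jaccard coefficient of two index sets: |intersection| / |union|.
jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))

w12 <- jaccard(nbrs[1, ], nbrs[2, ])  # shared {3, 4}, union {1, 2, 3, 4}
w13 <- jaccard(nbrs[1, ], nbrs[3, ])  # shared {2},    union {1, 2, 3, 4, 5}
```

Cells 1 and 2 share more of their neighborhoods than cells 1 and 3, so `w12` is larger than `w13`.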

primary_metric

Distance metric used to define nearest neighbors. Options include 'euclidean', 'manhattan', 'correlation', and 'cosine'. Note that performance will be slower for correlation and cosine.
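To make the four options concrete, the sketch below computes each distance between two made-up vectors using the commonly used definitions (cosine distance as one minus cosine similarity, correlation distance as one minus Pearson correlation). That the Python module follows exactly these definitions is an assumption.

```r
# Two hypothetical cells measured in 3 dimensions.
x <- c(1, 2, 3)
y <- c(2, 4, 7)

euclidean   <- sqrt(sum((x - y)^2))   # straight-line distance
manhattan   <- sum(abs(x - y))        # sum of absolute coordinate differences
cosine      <- 1 - sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
correlation <- 1 - cor(x, y)          # 1 minus Pearson correlation
```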

n_jobs

Nearest neighbors and Jaccard coefficients will be computed in parallel using n_jobs jobs. If n_jobs=NULL, the number of jobs is determined automatically.

q_tol

Tolerance (i.e., precision) for monitoring modularity optimization.
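The quantity being optimized is the modularity Q of a partition. As a minimal sketch, the base R function below evaluates the standard modularity formula for an undirected graph. `modularity` and the toy graph are illustrative only, and how q_tol is applied to changes in Q during optimization is not specified here.

```r
# Standard modularity of a partition of an undirected (weighted) graph.
# A: symmetric adjacency matrix; membership: one community id per node.
modularity <- function(A, membership) {
  two_m <- sum(A)                              # twice the total edge weight
  k     <- rowSums(A)                          # (weighted) degree of each node
  same  <- outer(membership, membership, "==") # TRUE where nodes share a community
  sum((A - outer(k, k) / two_m) * same) / two_m
}

# Toy graph: two triangles (1-2-3 and 4-5-6) joined by the edge 3-4.
edges <- rbind(c(1, 2), c(1, 3), c(2, 3),
               c(4, 5), c(4, 6), c(5, 6),
               c(3, 4))
A <- matrix(0, 6, 6)
A[edges] <- 1
A[edges[, 2:1]] <- 1                           # mirror to make A symmetric

q_good <- modularity(A, c(1, 1, 1, 2, 2, 2))   # one community per triangle
q_bad  <- modularity(A, c(1, 2, 1, 2, 1, 2))   # communities that cut across triangles
```

The partition that follows the two triangles scores a higher Q than the one that splits them.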

louvain_time_limit

Maximum number of seconds to run modularity optimization. If exceeded, the best result so far is returned.

nn_method

Whether to use brute force or kdtree for nearest neighbor search. For very large high-dimensional data sets, brute force (with parallel computation) performs faster than kdtree.
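A brute-force search computes every pairwise distance and keeps the k smallest per cell, whereas a k-d tree uses a spatial index to skip some comparisons. The sketch below shows only the brute-force idea in base R on random data. It is not the wrapper's implementation, and excluding each cell from its own neighbor list is an assumption.

```r
set.seed(1)
X <- matrix(rnorm(20 * 3), nrow = 20)  # 20 hypothetical cells in 3 dimensions
k <- 5

D <- as.matrix(dist(X))                # all pairwise Euclidean distances
diag(D) <- Inf                         # a cell is not counted as its own neighbor

# For each cell (row), take the indices of the k smallest distances.
knn <- t(apply(D, 1, function(d) order(d)[seq_len(k)]))
```

`knn` has one row per cell and k columns, ordered from nearest to farthest neighbor.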

Value

A data.frame with community membership information.