Package 'LPS' reference manual

Title:	Linear Predictor Score, for Binary Inference from Multiple Continuous Variables
Description:	An implementation of the Linear Predictor Score approach, as initiated by Radmacher et al. (J Comput Biol 2001) and enhanced by Wright et al. (PNAS 2003) for gene expression signatures. Several tools for unsupervised clustering of gene expression data are also provided.
Authors:	Sylvain Mareschal
Maintainer:	Sylvain Mareschal <[email protected]>
License:	GPL (>= 3)
Version:	1.0.17
Built:	2025-03-01 04:50:59 UTC
Source:	https://github.com/maressyl/r.lps

Hierarchical clustering heat maps

Description

This function draws a heat map ordered according to hierarchical clusterings, similarly to heatmap. It offers more control over layout and allows multiple row annotations.

hclust.ward is derived from the 'stats' package hclust, with an alternative default (as arguments cannot be passed to it).

dist.COR mimics 'stats' package dist, computing distances as 1 - Pearson's correlation coefficient.
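As a rough illustration of the distance described above, the following base R sketch builds a 1 - Pearson correlation distance between the rows of a matrix and feeds it to hclust. The helper name `cor_dist` is hypothetical, and the Ward variant ("ward.D2") is an assumption, since the agglomeration method actually used by hclust.ward is not stated here.

```r
# Hypothetical helper mimicking the idea behind dist.COR (not the package code):
# distance between two rows = 1 - Pearson correlation of these rows
cor_dist <- function(input) {
  # cor() correlates columns, so transpose to compare rows as dist() does
  as.dist(1 - cor(t(input), method = "pearson"))
}

# Simulated data: 8 observations in rows, 5 features in columns
set.seed(1)
x <- matrix(rnorm(40), nrow = 8,
            dimnames = list(paste0("s", 1:8), paste0("g", 1:5)))

d <- cor_dist(x)

# Assumption: Ward agglomeration, here the "ward.D2" variant of stats::hclust
tree <- hclust(d, method = "ward.D2")
```

Under these assumptions, any function with the same single-argument shape could presumably be supplied through `fun.dist` or `fun.hclust`.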

Usage

  clusterize(expr, side = NULL, cex.col = NA, cex.row = NA, mai.left = NA,
    mai.bottom = NA, mai.right = 0.1, mai.top = 0.1, side.height = 1, side.col = NULL,
    side.srt = 0, side.cex = 1, col.heatmap = heat(), zlim = "0 centered",
    zlim.trim = 0.02, norm = c("rows", "columns", "none"), norm.clust = TRUE,
    norm.robust = FALSE, customLayout = FALSE, getLayout = FALSE, plot = TRUE,
    widths = c(1, 4), heights = c(1, 4), order.genes = NULL, order.samples = NULL,
    fun.dist = dist.COR, fun.hclust = hclust.ward, clust.genes = NULL,
    clust.samples = NULL)
  dist.COR(input)
  hclust.ward(input)

Arguments

`expr`	A numeric matrix, holding features (genes) in columns and observations (samples) in rows. Rows and columns will be ordered according to hierarchical clustering results.
`side`	To be passed to `heat.map`.
`cex.col`	To be passed to `heat.map`.
`cex.row`	To be passed to `heat.map`.
`mai.left`	To be passed to `heat.map`.
`mai.bottom`	To be passed to `heat.map`.
`mai.right`	To be passed to `heat.map`.
`mai.top`	To be passed to `heat.map`.
`side.height`	To be passed to `heat.map`.
`side.col`	To be passed to `heat.map`.
`side.srt`	To be passed to `heat.map`.
`side.cex`	To be passed to `heat.map`.
`col.heatmap`	To be passed to `heat.map`.
`zlim`	To be passed to `heat.map`.
`zlim.trim`	To be passed to `heat.map`.
`norm`	To be passed to `heat.map`.
`norm.clust`	Single logical value, whether to apply normalization before clustering or after. Normalization applied depends on `norm`.
`norm.robust`	To be passed to `heat.map`.
`customLayout`	Single logical value. As `layout` does not allow nested calls, set this to `TRUE` to make your own call to `layout` and embed this plot in a wider one.
`getLayout`	Single logical value, whether to only return the `layout` arguments that would be used with the set of arguments provided. It can prove useful to build custom layouts, e.g. merging this plot with another. See also `customLayout`.
`plot`	To be passed to `heat.map`.
`widths`	To be passed to `layout`.
`heights`	To be passed to `layout`.
`order.genes`	A function taking the gene dendrogram and `expr` as arguments, and returning the same dendrogram ordered in a custom way.
`order.samples`	A function taking the sample dendrogram and `expr` as arguments, and returning the same dendrogram ordered in a custom way.
`fun.dist`	A function to be used for distance computation in clustering. Default value uses 1 - Pearson's correlation as distance. See `dist` for further details.
`fun.hclust`	A function to be used for agglomeration in clustering. See `hclust` for further details.
`clust.genes`	If not `NULL`, an object coercible to the `dendrogram` class (typically the output from `hclust()`) to use instead of a fresh hierarchical clustering of genes. The `FALSE` value can also be used to disable computation and/or plotting of the dendrogram.
`clust.samples`	If not `NULL`, an object coercible to the `dendrogram` class (typically the output from `hclust()`) to use instead of a fresh hierarchical clustering of samples. The `FALSE` value can also be used to disable computation and/or plotting of the dendrogram.
`input`	See `hclust` and `dist` respectively for further details.
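The `order.genes` and `order.samples` arguments expect a function of the dendrogram and `expr`. A minimal sketch of such a function is given below, using only base R; the name `order_by_mean` is hypothetical, and it assumes that the leaves of the gene dendrogram are indexed in the same order as the columns of `expr`.

```r
# Hypothetical ordering function: arrange branches by increasing mean expression,
# as far as the tree structure allows.
# 'dendro' is the gene dendrogram, 'expr' the matrix with genes in columns.
order_by_mean <- function(dendro, expr) {
  reorder(as.dendrogram(dendro), wts = colMeans(expr, na.rm = TRUE), agglo.FUN = mean)
}

# Simulated data: 10 samples in rows, 6 genes in columns
set.seed(2)
expr <- matrix(rnorm(60), nrow = 10,
               dimnames = list(paste0("s", 1:10), paste0("g", 1:6)))

# Stand-in for the gene clustering that clusterize would compute itself
genes <- as.dendrogram(hclust(dist(t(expr))))
ordered <- order_by_mean(genes, expr)
```

Under these assumptions, such a function could be supplied as `order.genes = order_by_mean`.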

Value

clusterize invisibly returns the same list as heat.map, plus:

`genes`	The gene dendrogram.
`samples`	The sample dendrogram.

See hclust and dist respectively for the other functions.

Author(s)

Sylvain Mareschal

Examples

  # Data with features in columns
  data(rosenwald)
  group <- rosenwald.cli$group
  expr <- t(rosenwald.expr)[,1:100]
  
  # NA imputation (feature's mean to minimize impact)
  f <- function(x) { x[ is.na(x) ] <- round(mean(x, na.rm=TRUE), 3); x }
  expr <- apply(expr, 2, f)
  
  # Simple heat map
  clusterize(expr)
  
  # With annotation (row named data.frame)
  side <- data.frame(group, row.names=rownames(expr))
  clusterize(expr, side=side)

Heatmap palette generation

Description

This function generates a ramp of colors for functions derived from heat.map.

Usage

  heat(colors = c("#8888FF", "#000000", "#FF4444"), n = 256, shapeFun = heat.exp, ...)
  heat.exp(n, part, base = 1.015)
  heat.lin(n, part)

Arguments

`colors`	Character vector of length 3, determining starting, middle and final colors.
`n`	Single integer value, amount of colors / values to generate.
`shapeFun`	Function taking at least 2 arguments: `n` and `part`. `heat.exp` and `heat.lin` are provided as examples.
`...`	Further arguments to `heat` will be passed to `shapeFun`.
`part`	Single integer, defined as 1 while generating colors between the first two boundaries, and 2 otherwise.
`base`	Single numeric value, base for exponential slope.

Value

heat returns a character vector of colors in hexadecimal representation.

heat.lin and heat.exp return n numeric values, defining a curve whose slope will be mimicked during color interpolation.
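To give an idea of what such a three-color ramp looks like, here is a base R sketch relying on `colorRampPalette` from 'grDevices'. It only reproduces a linear interpolation between the default colors, loosely comparable to the heat.lin shape; it is not the implementation used by heat, and the exponential shape of heat.exp is not reproduced.

```r
# Linear three-color ramp with base R (a stand-in for the heat.lin shape only)
colors <- c("#8888FF", "#000000", "#FF4444")

# 256 colors interpolated from the first to the last color, through the middle one
pal <- colorRampPalette(colors)(256)

# Colors are returned as hexadecimal strings
head(pal, 3)
```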

Author(s)

Sylvain Mareschal

Examples

  # Classical heatmap colors
  palette <- heat(c("green", "black", "red"))
  heat.scale(zlim=c(-2,2), col.heatmap=palette)
  
  # Two distinct shapes provided
  heat.scale(zlim=c(-2,2), col.heatmap=heat(shapeFun=heat.lin))
  heat.scale(zlim=c(-2,2), col.heatmap=heat(shapeFun=heat.exp))

Enhanced heat map plotting

Description

This function draws a heatmap from a matrix, similarly to image. It also offers normalization and annotation features, with more control than heatmap.

side can provide multiple sample annotations, which are handled differently depending on their class:

numeric: values are assigned grey shades from the minimum to the maximum, which are provided in the legend.
factor: levels are assigned colors using a default or custom palette. Hexadecimal color codes starting with # and color names known by R are used "as is".
character: values are printed as is in a blank cell. Hexadecimal color codes starting with # and color names known by R are used as background colors instead of text.
logical: values are plotted in dark (TRUE) or light (FALSE) gray, leaving NAs in white.
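The four column classes listed above can be combined in a single annotation data.frame. The sketch below only builds such an object in base R; sample names and values are made up for illustration, and the call to heat.map is left out.

```r
# Hypothetical sample names, which must match the row names of 'expr'
samples <- paste0("s", 1:4)

# One annotation column per supported class (illustrative values only)
side <- data.frame(
  age      = c(54, 61, 47, 70),                      # numeric: grey shades
  group    = factor(c("ABC", "GCB", "GCB", "ABC")),  # factor: one color per level
  center   = c("A", "B", "A", "#FF0000"),            # character: text or color code
  relapsed = c(TRUE, FALSE, NA, TRUE),               # logical: dark / light / white
  row.names = samples,
  stringsAsFactors = FALSE
)

# Check the class of each annotation column
sapply(side, class)
```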

Usage

  heat.map(expr, side = NULL, cex.col = NA, cex.row = NA, mai.left = NA,
    mai.bottom = NA, mai.right = 0.1, mai.top = 0.1, side.height = 1, side.col = NULL,
    side.srt = 0, side.cex = 1, col.heatmap = heat(), zlim = "0 centered",
    zlim.trim = 0.02, norm = c("rows", "columns", "none"), norm.robust = FALSE,
    customLayout = FALSE, getLayout = FALSE, font = c(1, 3), xaxt = "s", yaxt = "s")

Arguments

`expr`	A numeric matrix, holding features (genes) in columns and observations (samples) in rows. Column and row order will not be altered.
`side`	An annotation `data.frame` for `expr`, or `NULL`. Must contain at least one row for each `expr` row, and one or more annotation columns. Merging is performed on row names, so rows must be named following the same conventions as `expr`. Hexadecimal color definitions will be used "as is", other values will be attributed colors according to `side.col`.
`cex.col`	Single numeric value, character expansion factor for column names. `NA` will compute a value from `expr` size, similarly to `heatmap`.
`cex.row`	Single numeric value, character expansion factor for row names. `NA` will compute a value from `expr` size, similarly to `heatmap`.
`mai.left`	Single numeric value, left margin in inches (for row names). Use `NA` for an automatic value computed from row name lengths. See `par`.
`mai.bottom`	Single numeric value, bottom margin in inches (for column names). Use `NA` for an automatic value computed from column name lengths. See `par`.
`mai.right`	Single numeric value, right margin in inches (for higher level functions). See `par`.
`mai.top`	Single numeric value, top margin in inches. See `par`.
`side.height`	Single numeric value, scaling factor for annotation track.
`side.col`	A function returning as many colors as requested by its sole argument, defining the colors to be used for `side` legend. Default uses a custom palette for few values, and a derivative of `rainbow` if more than 8 colors are needed.
`side.srt`	Single numeric value, determining the string rotation angle when writing character side columns (default is 0, horizontal, 90 is suggested for vertical text on busy heat maps).
`side.cex`	Single numeric value, the character expansion factor to use for character side columns.
`col.heatmap`	Character vector of colors, to be used for the cells of the heat map.
`zlim`	Numeric vector of length two, defining minimal and maximal `expr` values that will be mapped to colors in `col.heatmap`. Values outside of this range will be rounded to the nearest boundary. Two special values are also allowed: "0 centered" to get a symmetrical range around 0 (with the default palette, it enforces 0 as the center color), and "range" to get `expr` range after normalization.
`zlim.trim`	Single numeric value between 0 and 1, defining the proportion of extreme values (equally split on both sides) to remove before computing "0 centered" or "range" `zlim`.
`norm`	Single character value, normalization to be performed (use "none" to perform no normalization). "rows" will center and scale genes, while "columns" will center and scale samples. The functions used depend on `norm.robust`.
`norm.robust`	Single logical value, if `TRUE` `median` and `mad` will be used for centering and scaling, else `mean` and `sd`.
`customLayout`	Single logical value. As `layout` does not allow nested calls, set this to `TRUE` to make your own call to `layout` and embed this plot in a wider one. See also `getLayout`.
`getLayout`	Single logical value, whether to only return the `layout` arguments that would be used with the set of arguments provided. It can prove useful to build custom layouts, e.g. merging this plot with another. See also `customLayout`.
`font`	Integer vector of length two, the `font` used to draw X and Y axis labels respectively (see `par`). Default is to print X labels (usually samples) in normal font and Y labels (usually genes) in italic font.
`xaxt`	Single letter, whether to print column labels ("s") or not ("n").
`yaxt`	Single letter, whether to print row labels ("s") or not ("n").
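The interplay between `norm`, `norm.robust`, `zlim` and `zlim.trim` can be pictured with the base R sketch below. Both helpers (`center_scale` and `zlim_centered`) are hypothetical and only illustrate the documented behavior: centering and scaling with mean and sd (or median and mad), then a symmetrical range around 0 computed after trimming extreme values. The actual computation in heat.map may differ in its details.

```r
# Hypothetical helper: center and scale each column of a matrix
center_scale <- function(m, robust = FALSE) {
  center <- if (robust) median else mean
  spread <- if (robust) mad else sd
  apply(m, 2, function(x) (x - center(x, na.rm = TRUE)) / spread(x, na.rm = TRUE))
}

# Hypothetical helper: symmetrical range around 0, ignoring a proportion 'trim'
# of extreme values (equally split on both sides)
zlim_centered <- function(m, trim = 0.02) {
  q <- quantile(m, probs = c(trim / 2, 1 - trim / 2), na.rm = TRUE, names = FALSE)
  c(-1, 1) * max(abs(q))
}

# Simulated data
set.seed(3)
m <- matrix(rnorm(200, mean = 5, sd = 2), nrow = 20)

z  <- center_scale(m)                 # mean / sd
zr <- center_scale(m, robust = TRUE)  # median / mad
zlim <- zlim_centered(z)
```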

Value

Invisibly returns a named list:

`zlim`	Final value of the `zlim` argument.
`col.heatmap`	Final value of the `col.heatmap` argument.
`legend`	If `side` is used, a named character vector of colors used for annotation.
`cex.col`	Final value of the `cex.col` argument.
`cex.row`	Final value of the `cex.row` argument.
`mai.left`	Final value of the `mai.left` argument.
`mai.bottom`	Final value of the `mai.bottom` argument.

Author(s)

Sylvain Mareschal

Examples

  # Data with features in columns
  data(rosenwald)
  group <- rosenwald.cli$group
  expr <- t(rosenwald.expr)[,1:100]
  
  # NA imputation (feature's mean to minimize impact)
  f <- function(x) { x[ is.na(x) ] <- round(mean(x, na.rm=TRUE), 3); x }
  expr <- apply(expr, 2, f)
  
  # Simple heat map
  heat.map(expr)
  
  # With annotation (row named data.frame)
  side <- data.frame(group, row.names=rownames(expr))
  heat.map(expr, side=side)

Plots a heat map color scale, for legend

Description

This function plots a color scale using a custom color palette, to serve as a legend for functions derived from heat.map.

Usage

  heat.scale(zlim, col.heatmap, at = -10:10, labels = NULL, horiz = TRUE,
    robust = FALSE, customMar = FALSE, title=NA)

Arguments

`zlim`	Numeric vector of length 2, minimum and maximum of values in the palette. Should correspond to `zlim` in `heat.map`; consider using the invisible return value of `heat.map` to get special values.
`col.heatmap`	Character vector of colors used in the heat map. Should correspond to `col.heatmap` in `heat.map`; consider using the invisible return value of `heat.map` to get special values.
`at`	Numeric vector, values shown in the axis.
`labels`	Character vector as long as `at`, defining the values to show at `at`.
`horiz`	Single logical value, whether to plot a horizontal or a vertical scale.
`robust`	Single logical value, whether to legend `median` and `mad` or `mean` and `sd`. Should correspond to `heat.map` `norm.robust` value.
`customMar`	Single logical value, whether to skip the call to `par` to set `mar` or not.
`title`	Single character value, the axis title to use (`NA` for automatic generation).

Author(s)

Sylvain Mareschal

Linear Predictor Score fitting

Description

This function trains a Linear Predictor Score model, given pre-computed coefficients. It uses data with known classes to fit the model.

It can be called in numerous ways, and not all arguments are mandatory. See the 'Examples' section.

Usage

  LPS(data, coeff, response, k, threshold, formula, method = "fdr", ...)

Arguments

`data`	Continuous data used to retrieve classes, as a `data.frame` or `matrix`, with samples in rows and features (genes) in columns. Rows and columns should be named. Some precautions must be taken concerning data normalization, see the corresponding section below.
`coeff`	Pre-computed coefficients for the model, as returned by `LPS.coeff` (see there for format details).
`response`	Already known classes for the samples provided in `data`, preferably as a two-level `factor`. Can be missing if a `formula` with a response element is provided, but this argument takes precedence.
`k`	Single `integer` value, amount of features to include in the model, in decreasing order of coefficient. Can be missing if `threshold` or `formula` is provided, but this argument takes precedence over both of them.
`threshold`	Single `numeric` value, p-value threshold to apply for feature selection. Can be missing if `k` or `formula` is provided; `k` takes precedence over it, and it takes precedence over `formula`.
`formula`	A `formula` object, describing the model to fit (several templates are handled, see 'Examples'). The formula response element (before the "~" sign) can replace the `response` argument if it is not provided. The variables (after the "~" sign) can be a single integer (standing for the `k` argument), a single numeric (standing for the `threshold` argument) or a sum of feature names to use directly. "." is also handled in the usual way (all `data` columns), and "1" is a more efficient way to refer to all numeric columns of `data`.
`method`	Single character value, to be passed to `p.adjust` when `threshold` is provided.
`...`	Further arguments are passed to `model.frame` if `response` is missing (thus defined via `formula`). `subset` and `na.action` may be particularly useful for cross-validation schemes, see `model.frame.default` for details. `subset` is always handled but masked in "..." for compatibility reasons.

Value

An object of (S3) class "LPS":

`coeff`	Named numeric vector, the coefficients used in the model.
`classes`	Character vector, the labels of the two groups to be predicted.
`scores`	List of two numeric vectors, training dataset scores sorted by group.
`means`	Numeric vector, score means of each group in the training dataset.
`sds`	Numeric vector, score `sd` of each group in the training dataset.
`ovl`	Numeric value, overlapping coefficient as returned by `OVL`.
`k`	Integer value, amount of features selected in the model (if relevant).
`p.threshold`	Numeric value, threshold used for feature selection (if relevant).
`p.method`	Character value, p-value correction used for feature selection (if relevant).

Normalization

As expression values are directly used in the score, gene centering and scaling are strongly recommended. For Affymetrix raw expression values (strictly positive, linear and absolute), Wright et al. suggest a multiplicative centering on a median of 1000 followed by a log2 transformation. For log-ratios, gene centering and scaling should not be necessary, as they are naturally 0-centered.
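The multiplicative centering followed by a log2 transformation could be sketched as follows in base R. The text does not state along which dimension the median is computed; applying it per sample (row) is an assumption made here, and the data are simulated.

```r
# Simulated strictly positive "raw" values: 5 samples in rows, 7 genes in columns
set.seed(4)
raw <- matrix(rexp(35, rate = 1 / 500) + 1, nrow = 5,
              dimnames = list(paste0("s", 1:5), paste0("g", 1:7)))

# Assumption: the median is computed per sample (row).
# Dividing a matrix by a vector of length nrow() recycles it along rows.
scaled <- raw * 1000 / apply(raw, 1, median)

# log2 transformation of the rescaled values
logged <- log2(scaled)
```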

Time efficiency

Using a numeric matrix as data and a factor as response is the fastest way to compute coefficients, if time consumption matters (as in cross-validation schemes). formula is there only for consistency with R modeling functions, and to provide response, k or threshold in a single way.

Author(s)

Sylvain Mareschal

References

Radmacher MD, McShane LM, Simon R. A paradigm for class prediction using gene expression profiles. J Comput Biol. 2002;9(3):505-11.

Wright G, Tan B, Rosenwald A, Hurt EH, Wiestner A, Staudt LM. A gene expression-based method to diagnose clinically distinct subgroups of diffuse large B cell lymphoma. Proc Natl Acad Sci U S A. 2003 Aug 19;100(17):9991-6.

Bohers E, Mareschal S, Bouzelfen A, Marchand V, Ruminy P, Maingonnat C, Menard AL, Etancelin P, Bertrand P, Dubois S, Alcantara M, Bastard C, Tilly H, Jardin F. Targetable activating mutations are very frequent in GCB and ABC diffuse large B-cell lymphoma. Genes Chromosomes Cancer. 2014 Feb;53(2):144-53.

Examples

  # Data with features in columns
  data(rosenwald)
  group <- rosenwald.cli$group
  expr <- t(rosenwald.expr)
  
  # NA imputation (feature's mean to minimize impact)
  f <- function(x) { x[ is.na(x) ] <- round(mean(x, na.rm=TRUE), 3); x }
  expr <- apply(expr, 2, f)
  
  # Coefficients
  coeff <- LPS.coeff(data=expr, response=group)
  
  
  # 10 best features (straightforward)
  m <- LPS(data=expr, coeff=coeff, response=group, k=10)
  
  # 10 best features (formula)
  ### 'k' MUST be an integer, or will be understood as a 'threshold'
  ### Numbers are "numeric", enforce integer with "L" or "as.integer"
  m <- LPS(data=as.data.frame(expr), coeff=coeff, formula=group~10L)
  k <- as.integer(10)
  m <- LPS(data=as.data.frame(expr), coeff=coeff, formula=group~k)
  
  # FDR threshold
  thr <- 0.01
  m <- LPS(data=expr, coeff=coeff, response=group, threshold=thr)
  m <- LPS(data=as.data.frame(expr), coeff=coeff, formula=group~0.01)
  m <- LPS(data=as.data.frame(expr), coeff=coeff, formula=group~thr)
  
  # Custom model
  m <- LPS(data=expr, coeff=coeff[ c("27481","17013") ,], response=group, k=2)
  m <- LPS(data=as.data.frame(expr), coeff=coeff, formula=group~`27481`+`17013`)
  ### Notice backticks in formula for syntactically invalid names
  
  # Complete model
  m <- LPS(data=expr, coeff=coeff, response=group, k=ncol(expr))
  m <- LPS(data=expr, coeff=coeff, response=group, threshold=1)
  ### m <- LPS(data=as.data.frame(expr), coeff=coeff, formula=group~.)
  ### The last is correct but (really) slow on large datasets

Linear Predictor Score coefficient computation

Description

As Linear Predictor Score coefficients are genuinely t statistics, this function provides a faster implementation for large datasets than using t.test.

Usage

  LPS.coeff(data, response, formula = ~1, type = c("t", "limma"),
    p.value = TRUE, log = FALSE, weighted = FALSE, ...)

Arguments

`data`	Continuous data used to retrieve classes, as a `data.frame` or `matrix`, with samples in rows and features (genes) in columns. Rows and columns should be named. `NA` values are silently ignored. Some precautions must be taken concerning data normalization, see the corresponding section in `LPS` manual page.
`response`	Already known classes for the samples provided in `data`, preferably as a two-level `factor`. Can be missing if a `formula` with a response element is provided, but this argument takes precedence.
`formula`	A `formula` object, describing the features to consider in `data`. The formula response element (before the "~" sign) can replace the `response` argument if it is not provided. The features can be enumerated in the variable section of the formula (after the "~" sign). "." is also handled in the usual way (all `data` columns), and "1" is a more efficient way to refer to all numeric columns of `data`.
`type`	Single character value, "t" to compute genuine t statistics (unequal variances and unpaired samples) or "limma" to use the lmFit() and eBayes() t statistics from this microarray oriented Bioconductor package.
`p.value`	Single logical value, whether to compute (two-sided) p-values or not.
`log`	Single logical value, whether to log-transform t or not (sign will be preserved). Original description of the LPS does not include log-transformation, but it may be useful to not over-weight discriminant genes in large series. Values between -1 and 1 are transformed to 0 to avoid sign shifting, as it generally comes with non significant p-values.
`weighted`	Single logical value, whether to divide t (or log-transformed t) by gene mean or not. We recommend normalizing data only by samples and using `weighted = TRUE` to include gene centering in the model, rather than centering and scaling genes by normalizing each series independently as Wright et al. did.
`...`	Further arguments are passed to `model.frame` if `response` is missing (thus defined via `formula`). `subset` and `na.action` may be particularly useful for cross-validation schemes, see `model.frame.default` for details. `subset` is always handled but masked in "..." for compatibility reasons.
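The sign-preserving log transformation described for the `log` argument can be pictured with the one-line sketch below. The helper name `signed_log` is hypothetical, and the use of the natural logarithm is an assumption, as the base is not specified here.

```r
# Hypothetical helper: log-transform t statistics while preserving their sign.
# Values between -1 and 1 are mapped to 0, since log(1) = 0.
signed_log <- function(t) sign(t) * log(pmax(abs(t), 1))

# Large statistics are shrunk, small ones are zeroed, signs are kept
signed_log(c(-10, -0.5, 0, 0.5, 10))
```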

Value

Always returns a row named numeric matrix, with a "t" column holding statistics computed. If p.value is TRUE, a second "p.value" column is added.

Note

Using a numeric matrix as data and a factor as response is the fastest way to compute coefficients, if time consumption matters (as in cross-validation schemes). formula was added only for consistency with other R modeling functions, and possibly to subset the features to compute coefficients for.
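To illustrate why a matrix-based computation can be faster than repeated calls to t.test, the sketch below computes unequal-variance (Welch) t statistics for all columns at once and compares them to t.test. The helper `welch_t` is hypothetical, assumes data without `NA`, and uses "first level minus second level" as sign convention, which may not match LPS.coeff.

```r
# Hypothetical vectorized Welch t statistic, one value per column of 'x'
welch_t <- function(x, g) {
  g <- as.factor(g)
  a <- x[g == levels(g)[1], , drop = FALSE]
  b <- x[g == levels(g)[2], , drop = FALSE]

  # Column-wise unbiased variance, without looping over columns
  col_var <- function(m) colSums(sweep(m, 2, colMeans(m))^2) / (nrow(m) - 1)

  (colMeans(a) - colMeans(b)) / sqrt(col_var(a) / nrow(a) + col_var(b) / nrow(b))
}

# Simulated data: 20 samples in rows, 10 features in columns
set.seed(5)
x <- matrix(rnorm(200), nrow = 20, dimnames = list(NULL, paste0("g", 1:10)))
g <- factor(rep(c("A", "B"), each = 10))

t_fast <- welch_t(x, g)

# Reference: one t.test() call per feature (Welch test is the default)
t_ref <- apply(x, 2, function(v) t.test(v[g == "A"], v[g == "B"])$statistic)
```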

Author(s)

Sylvain Mareschal

References

http://www.bioconductor.org/packages/release/bioc/html/limma.html

Examples

  # Data with features in columns
  data(rosenwald)
  group <- rosenwald.cli$group
  expr <- t(rosenwald.expr)
  
  
  # All features, all samples
  k <- LPS.coeff(data=expr, response=group)
  k <- LPS.coeff(formula=group~1, data=as.data.frame(expr))
  ### LPS.coeff(formula=group~., data=as.data.frame(expr), na.action=na.pass)
  ### The last is correct but (really) slow on large datasets
  
  # Feature subset, all samples
  k <- LPS.coeff(data=expr[, c("27481","17013") ], response=group)
  k <- LPS.coeff(formula=group~`27481`+`17013`, data=as.data.frame(expr))
  ### Notice backticks in formula for syntactically invalid names
  
  # All features, sample subset
  training <- rosenwald.cli$set == "Training"
  ### training <- sample.int(nrow(expr), 10)
  ### training <- which(rosenwald.cli$set == "Training")
  ### training <- rownames(subset(rosenwald.cli, set == "Training"))
  k <- LPS.coeff(data=expr, response=group, subset=training)
  k <- LPS.coeff(formula=group~1, data=as.data.frame(expr), subset=training)

  # NA handling by model.frame()
  k <- LPS.coeff(formula=group~1, data=as.data.frame(expr), na.action=na.omit)

Overlap quantification for LPS object

Description

Quantifies the overlap between the Gaussian distributions of the two group scores, to assess model efficiency (the best models should show no overlap, to prevent false discoveries).

Usage

  OVL(means, sds, cutoff=1e-4, n=1e4)

Arguments

`means`	Numeric vector of two values, the means of the Gaussian distributions.
`sds`	Numeric vector of two values, the standard deviations of the Gaussian distributions.
`cutoff`	Single numeric value, minimal quantile for integration range definition (distributions will be considered between their `cutoff` and `1 - cutoff` quantiles only). The smaller it is, the more precise the returned value will be.
`n`	Single integer value, the amount of equidistant points to use for the computation. The greater it is, the more precise the returned value will be.

Value

Returns the proportion of the overlap between the two Gaussian distributions N1 and N2, i.e. min(N1, N2) / (N1 + N2).
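A numerical approximation of this ratio can be sketched in base R as follows. The helper `overlap_ratio` is hypothetical and follows the formula literally as written above, on a grid of `n` equidistant points between the `cutoff` and `1 - cutoff` quantiles; with this literal reading, two identical distributions yield 0.5. The value actually returned by OVL may be scaled differently.

```r
# Hypothetical helper: overlap ratio min(N1, N2) / (N1 + N2), summed over a grid
overlap_ratio <- function(means, sds, cutoff = 1e-4, n = 1e4) {
  # Integration range covering both distributions between their extreme quantiles
  lower <- min(qnorm(cutoff, means, sds))
  upper <- max(qnorm(1 - cutoff, means, sds))
  x <- seq(lower, upper, length.out = n)

  # Gaussian densities of the two groups along the grid
  d1 <- dnorm(x, means[1], sds[1])
  d2 <- dnorm(x, means[2], sds[2])

  # The grid step cancels out in the ratio, so plain sums are sufficient
  sum(pmin(d1, d2)) / sum(d1 + d2)
}

# The ratio decreases as the second distribution is shifted away
shifts <- c(0, 1, 2, 3, 10)
ratios <- sapply(shifts, function(s) overlap_ratio(c(0, s), c(1, 1)))
```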

Author(s)

Sylvain Mareschal

Examples

  # Full overlap between identical distributions
  OVL(c(0,0), c(1,1))
  
  # Increasing shift
  OVL(c(0,1), c(1,1))
  OVL(c(0,2), c(1,1))
  OVL(c(0,3), c(1,1))
  OVL(c(0,10), c(1,1))

Plot method for LPS objects

Description

This function plots the distributions of the LPS scores in each group for a fitted LPS object.

Usage

  ## S3 method for class 'LPS'
  plot(x, y, method = c("Wright", "Radmacher", "exact"), threshold = 0.9,
    values = FALSE, col.classes = c("#FFCC00", "#1144CC"), xlim, yaxt = "s",
    xlab = "LPS", ylab, las = 0, lwd = 2, ...)

Arguments

`x`	An object of class `"LPS"`, as returned by `LPS`.
`y`	Single character value defining y axis : "density" or (Bayesian) "probability".
`method`	Single character value, the method to use for predictions. See `predict.LPS`.
`threshold`	Single numeric value, the confidence threshold to use for the "gray zone" (scores for which none of the two groups can be assigned with a probability greater than this threshold). See `predict.LPS`.
`values`	Single logical value, whether to plot individual scores from the training series or not.
`col.classes`	Character vector of two values giving to each class a distinct color.
`xlim`	To be passed to `plot`, see `plot.default`.
`yaxt`	To be passed to `plot`, see `par`.
`xlab`	To be passed to `plot`, see `plot.default`.
`ylab`	To be passed to `plot`, see `plot.default`.
`las`	To be passed to `plot`, see `par`.
`lwd`	To be passed to `plot`, see `par`.
`...`	Further arguments to be passed to `plot` or `par`.
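The notions of Bayesian probability and "gray zone" used by `y` and `threshold` can be illustrated with the sketch below. It assumes Gaussian score distributions and equal prior probabilities for the two groups, in the spirit of Wright et al.; the helpers `posterior` and `classify` are hypothetical and do not necessarily match any of the methods implemented in `predict.LPS`.

```r
# Hypothetical helper: probability of belonging to the first group,
# given Gaussian score distributions and equal priors
posterior <- function(score, means, sds) {
  d1 <- dnorm(score, means[1], sds[1])
  d2 <- dnorm(score, means[2], sds[2])
  d1 / (d1 + d2)
}

# Hypothetical helper: assign a class only when its probability reaches the
# threshold, leaving NA in the "gray zone"
classify <- function(score, means, sds, classes = c("A", "B"), threshold = 0.9) {
  p1 <- posterior(score, means, sds)
  out <- rep(NA_character_, length(score))
  out[p1 >= threshold] <- classes[1]
  out[1 - p1 >= threshold] <- classes[2]
  out
}

# A score halfway between the two group means falls in the gray zone
calls <- classify(c(-3, 0, 3), means = c(-2, 2), sds = c(1, 1))
```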

Author(s)

Sylvain Mareschal

Examples

  # Data with features in columns
  data(rosenwald)
  group <- rosenwald.cli$group
  expr <- t(rosenwald.expr)
  
  # NA imputation (feature's mean to minimize impact)
  f <- function(x) { x[ is.na(x) ] <- round(mean(x, na.rm=TRUE), 3); x }
  expr <- apply(expr, 2, f)
  
  # Coefficients
  coeff <- LPS.coeff(data=expr, response=group)
  
  # 10 best features model
  m <- LPS(data=expr, coeff=coeff, response=group, k=10)
  
  
  # Distributions of scores in each group
  plot(m, "density")
  
  # Probability for each group along the score axis
  plot(m, "probability", yaxt="s")

Predict method for LPS objects

Description

This function allows predictions to be made from a fitted LPS model and a new dataset.

It can also plot a gene expression heatmap to visualize results of the prediction.

Usage

  ## S3 method for class 'LPS'
predict(object, newdata, type=c("class", "probability", "score"),
    method = c("Wright", "Radmacher", "exact"), threshold = 0.9, na.rm = TRUE,
    subset = NULL, col.lines = "#FFFFFF", col.classes = c("#FFCC00", "#1144CC"),
    plot = FALSE, side = NULL, cex.col = NA, cex.row = NA, mai.left = NA,
    mai.bottom = NA, mai.right = 1, mai.top = 0.1, side.height = 1, side.col = NULL,
    col.heatmap = heat(), zlim = "0 centered", norm = c("rows", "columns", "none"),
    norm.robust = FALSE, customLayout = FALSE, getLayout = FALSE, ...)

Arguments

`object`	An object of class `"LPS"`, as returned by `LPS`.
`newdata`	Continuous data used to retrieve classes, as a `data.frame` or `matrix`, with samples in rows and features (genes) in columns. Rows and columns should be named. It can also be a named numeric vector of already computed scores. Some precautions must be taken concerning data normalization, see the corresponding section in the `LPS` manual page.
`type`	Single character value, return type of the predictions to be made ("class", "probability" or "score"). See 'Value' section.
`method`	Single character value, the method to use to make predictions ("Wright", "Radmacher" or "exact"). See 'Details' section.
`threshold`	Threshold to use for class prediction. The "Wright" method was designed with 0.9; the "Radmacher" method makes no use of the threshold.
`na.rm`	Single logical value. If TRUE, samples with one or more `NA` features are scored too (the affected feature is dropped for the affected sample only, which may be debatable).
`subset`	A subsetting vector to apply on `newdata` rows. See `[` for handled values.
`col.lines`	If `plot` is TRUE, a single character value to be used for line drawing on the heatmap.
`col.classes`	If `plot` is TRUE, a character vector of two values, giving each class a distinct color.
`plot`	To be passed to `heat.map`.
`side`	To be passed to `heat.map`.
`cex.col`	To be passed to `heat.map`.
`cex.row`	To be passed to `heat.map`.
`mai.left`	To be passed to `heat.map`.
`mai.bottom`	To be passed to `heat.map`.
`mai.right`	To be passed to `heat.map` (used to plot score coefficients).
`mai.top`	To be passed to `heat.map`.
`side.height`	To be passed to `heat.map`.
`side.col`	To be passed to `heat.map`.
`col.heatmap`	To be passed to `heat.map`.
`zlim`	To be passed to `heat.map`.
`norm`	To be passed to `heat.map`.
`norm.robust`	To be passed to `heat.map`.
`customLayout`	To be passed to `heat.map`.
`getLayout`	To be passed to `heat.map`.
`...`	Ignored, just there to match the `predict` generic function.
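
The effect of `na.rm` can be sketched in base R. This is an illustration, not the package's code: it assumes the score is a plain coefficient-weighted sum of the feature values, and that a sample with missing features gets an `NA` score when `na.rm` is FALSE. The coefficients and data are made up.

```r
# Hypothetical coefficients for three features (made-up values)
coeff <- c(g1 = 2.0, g2 = -1.5, g3 = 0.5)

# Hypothetical new data: samples in rows, features in columns, one missing value
newdata <- rbind(
  s1 = c(g1 = 1.0, g2 = 0.2, g3 = -0.4),
  s2 = c(g1 = 0.5, g2 = NA,  g3 = 2.0)
)

# Align columns on the coefficient names, then weight each value by its coefficient
weighted <- sweep(newdata[, names(coeff), drop = FALSE], 2, coeff, "*")

# As with na.rm = TRUE: a missing feature is dropped for the affected sample only
score.narm <- rowSums(weighted, na.rm = TRUE)

# As assumed for na.rm = FALSE: a sample with any missing feature gets no score
score.strict <- rowSums(weighted)
```

Here `s2` is scored from `g1` and `g3` only in `score.narm`, which is why the documentation flags this behaviour as debatable: its score is not built from the same features as the score of `s1`.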

Details

The "Compound covariate predictor" from Radmacher et al. (method = "Radmacher") simply assigns each sample to the closest group, comparing the sample score to the mean score of each group in the training dataset.
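
A minimal base R sketch of this nearest-mean rule follows. It is not the package's code; the training scores, group labels and new sample scores are made-up values used only for illustration.

```r
# Hypothetical training scores for each group (made-up values)
train <- list(GCB = c(-3.1, -2.4, -1.8), ABC = c(2.2, 2.9, 3.5))

# Mean training score of each group
centers <- sapply(train, mean)

# Hypothetical scores of new samples
new <- c(s1 = -2.0, s2 = 0.9, s3 = 4.1)

# Absolute distance from each new score (rows) to each group mean (columns)
distances <- abs(outer(new, centers, "-"))

# Assign each sample to the group whose mean is closest
pred <- names(centers)[ apply(distances, 1, which.min) ]
names(pred) <- names(new)
pred
```

Every sample receives a class with this rule, which is consistent with the `threshold` argument being unused by the "Radmacher" method.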

The "Linear Predictor Score" from Wright et al. (method = "Wright") models the scores in each training sub-group with a distinct Gaussian distribution, and computes the probability for a sample to belong to one or the other using Bayes' rule.
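
The same idea can be sketched in base R. This is an illustration under stated assumptions, not the package's code: the training scores are made up, both groups are given equal priors, and a sample is left unclassified (`NA`) when neither probability exceeds the threshold.

```r
# Hypothetical training scores for each group (made-up values)
train <- list(GCB = c(-3.1, -2.4, -1.8, -2.9), ABC = c(2.2, 2.9, 3.5, 1.9))

# One Gaussian per group, fitted on the training scores
mu    <- sapply(train, mean)
sigma <- sapply(train, sd)

# Hypothetical scores of new samples
new <- c(s1 = -2.0, s2 = -0.25, s3 = 4.1)

# Density of each new score (rows) under each group's Gaussian (columns)
dens <- sapply(names(train), function(g) dnorm(new, mean = mu[[g]], sd = sigma[[g]]))

# Bayes' rule, assuming equal priors for both groups
prob <- dens / rowSums(dens)

# Class call: most probable group, or NA if it does not exceed the threshold
threshold <- 0.9
best <- apply(prob, 1, which.max)
p.best <- prob[ cbind(seq_along(new), best) ]
pred <- ifelse(p.best > threshold, colnames(prob)[best], NA_character_)
names(pred) <- names(new)
pred
```

The sample `s2` falls between the two groups and stays unclassified, which is the "gray zone" controlled by `threshold`.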

The "exact" mode is still under development and should not be used.

Value

For a "class" type, returns a character vector with group assignment for each new sample (possibly NA), named according to data row names.

For a "probability" type, returns a numeric matrix with two columns (probabilities to be in each group) and a row for each new sample, with rows named according to data row names and columns named according to the group labels.

For a "score" type, returns a numeric vector with the LPS score for each new sample, named according to data row names. Note that the score is the same for all methods.

If plot is TRUE, returns the list returned by heat.map, with the data described above in the first unnamed element.

Author(s)

Sylvain Mareschal

References

Radmacher MD, McShane LM, Simon R. A paradigm for class prediction using gene expression profiles. J Comput Biol. 2002;9(3):505-11.

Examples

  # Data with features in columns
  data(rosenwald)
  group <- rosenwald.cli$group
  expr <- t(rosenwald.expr)
  
  # NA imputation (feature's mean to minimize impact)
  f <- function(x) { x[ is.na(x) ] <- round(mean(x, na.rm=TRUE), 3); x }
  expr <- apply(expr, 2, f)
  
  # Coefficients
  coeff <- LPS.coeff(data=expr, response=group)
  
  # 10 best features model
  m <- LPS(data=expr, coeff=coeff, response=group, k=10)
  
  
  # Class prediction plot
  predict(m, expr, plot=TRUE)
  
  # Wright et al. class prediction
  table(
    group,
    prediction = predict(m, expr),
    exclude = NULL
  )
  
  # More stringent threshold
  table(
    group,
    prediction = predict(m, expr, threshold=0.99),
    exclude = NULL
  )
  
  # Radmacher et al. class prediction
  table(
    group,
    prediction = predict(m, expr, method="Radmacher"),
    exclude = NULL
  )
  
  # Probabilities
  predict(m, expr, type="probability", method="Wright")
  predict(m, expr, type="probability", method="Radmacher")
  predict(m, expr, type="probability", method="exact")
  
  # Probability plot
  predict(m, expr, type="probability", plot=TRUE)
  
  # Annotated probability plot
  side <- data.frame(group, row.names=rownames(expr))
  predict(m, expr, side=side, type="probability", plot=TRUE)
  
  # Score plot
  predict(m, expr, type="score", plot=TRUE)

Rosenwald et al. Lymphochip data

Description

This dataset contains 60 Diffuse Large B-Cell Lymphomas analysed on Lymphochip microarrays, as published by Rosenwald et al. The "Germinal Center B-cell like" and "Activated B-Cell like" subtypes, as determined by hierarchical clustering, were predicted by an LPS approach in Wright et al.

To minimize package size, values were rounded to 3 decimals and only 60 DLBCL were randomly selected from the series of 240 (40 from the "Training" set, 20 from the "Validation" set), excluding "Type III" sub-types.

Usage

  data(rosenwald)

Format

rosenwald.expr is a numeric matrix of expression values, with probes in rows and samples in columns. Both dimensions are named, probes by their "UNIQID" and samples by their "LYM numbers". Many NA values are present.

rosenwald.cli is a data.frame with a row for each sample, and 4 factor columns described below. Rows are named by samples "LYM numbers", in the same order as rosenwald.expr.

set: the "Training" or "Validation" set the sample comes from.
group: the DLBCL sub-type that is to be predicted ("GCB" or "ABC").
follow.up: follow-up of the patient, in years.
status: status of the patient at the end of the follow-up ("Dead" or "Alive").

Source

http://llmpp.nih.gov/DLBCL/

References

Rosenwald A et al. The use of molecular profiling to predict survival after chemotherapy for diffuse large-B-cell lymphoma. N Engl J Med. 2002 Jun 20;346(25):1937-47.

Wright G et al. A gene expression-based method to diagnose clinically distinct subgroups of diffuse large B cell lymphoma. Proc Natl Acad Sci U S A. 2003 Aug 19;100(17):9991-6.

Produces visual representation of survival data

Description

This function generates color shades for each individual, according to their respective right-censored survival data (whether the event occurred, and after how long a follow-up). This can prove useful to annotate heat maps with survival data.

Two color scales are used, one for right-censored individuals (lost to follow-up before the event occurs, yellow with default colors) and another for individuals with observed events (death, relapse ... black with default colors). Shades are generated according to their impact: fast events and long follow-ups without event have strong colors, while late events and short follow-ups without event are light-colored.
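
The shading principle can be sketched in base R with `colorRamp`. This is not the code of `surv.colors`: the linear rescaling of follow-up times by the longest follow-up is an assumption, and the `time` and `event` values are made up.

```r
# Hypothetical follow-up times (years) and event indicators (made-up values)
time  <- c(a = 0.5, b = 6.0, c = 2.0, d = 9.0)
event <- c(TRUE, TRUE, FALSE, FALSE)

# Default color scale boundaries, as in the Usage section
eventColors <- c("#000000", "#CCCCCC")
censColors  <- c("#FFFFEE", "#FFDD00")

# Rescale follow-up to [0, 1] (assumption: linear in time, relative to the longest)
rel <- time / max(time)

# Interpolate between two colors; colorRamp() returns one RGB row (0-255) per value
shade <- function(colors, x) {
  rgbs <- colorRamp(colors)(x)
  rgb(rgbs[, 1], rgbs[, 2], rgbs[, 3], maxColorValue = 255)
}

# Events: early is dark, late is light. Censored: short is pale, long is strong yellow
shades <- character(length(time))
shades[event]  <- shade(eventColors, rel[event])
shades[!event] <- shade(censColors,  rel[!event])
names(shades) <- names(time)
shades
```

With these boundaries the early event `a` gets a near-black shade and the longest event-free follow-up `d` gets the strongest yellow, matching the "impact" ordering described above.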

Usage

  surv.colors(time, event, eventColors = c("#000000", "#CCCCCC"),
    censColors = c("#FFFFEE", "#FFDD00"))

Arguments

`time`	Numeric vector, the follow-up times of each individual (see `Surv` in the `survival` package).
`event`	Logical vector, whether an event (death, relapse ...) occurred at the end of each individual follow-up or not (see `Surv` in the `survival` package).
`eventColors`	Character vector of length 2, the boundaries of the color scale to generate for individuals with events.
`censColors`	Character vector of length 2, the boundaries of the color scale to generate for right-censored individuals.

Value

Returns a character vector of colors, named according to the names of `time`.

Author(s)

Sylvain Mareschal

Examples

  # Rosenwald's dataset (hand-picked prognostic probes)
  data(rosenwald)
  probes <- c("30580", "16006", "32315", "16978", "26588")
  expr <- t(rosenwald.expr[ probes ,])
  
  # NA imputation (feature's mean to minimize impact)
  f <- function(x) { x[ is.na(x) ] <- round(mean(x, na.rm=TRUE), 3); x }
  expr <- apply(expr, 2, f)
  
  # Survival colors
  surv <- with(rosenwald.cli, surv.colors(time=follow.up, event=status=="Dead"))
  
  # Color scale legend
  with(rosenwald.cli, surv.scale(time=follow.up, event=status=="Dead"))
  
  # Annotated clustering
  side <- data.frame(OS=surv, row.names=rownames(rosenwald.cli))
  clusterize(expr, side=side)

Plots a survival color scale, for legend

Description

This function plots a color scale using a custom color palette, to serve as a legend for surv.colors annotations.

Usage

  surv.scale(time, event, eventColors = c("#000000", "#CCCCCC"),
    censColors = c("#FFFFEE", "#FFDD00"))

Arguments

`time`	Numeric vector, the follow-up times of each individual (see `Surv` in the `survival` package).
`event`	Logical vector, whether an event (death, relapse ...) occurred at the end of each individual follow-up or not (see `Surv` in the `survival` package).
`eventColors`	Character vector of length 2, the boundaries of the color scale to generate for individuals with events.
`censColors`	Character vector of length 2, the boundaries of the color scale to generate for right-censored individuals.

Author(s)

Sylvain Mareschal

Examples

  # Rosenwald's dataset (hand-picked prognostic probes)
  data(rosenwald)
  probes <- c("30580", "16006", "32315", "16978", "26588")
  expr <- t(rosenwald.expr[ probes ,])
  
  # NA imputation (feature's mean to minimize impact)
  f <- function(x) { x[ is.na(x) ] <- round(mean(x, na.rm=TRUE), 3); x }
  expr <- apply(expr, 2, f)
  
  # Survival colors
  surv <- with(rosenwald.cli, surv.colors(time=follow.up, event=status=="Dead"))
  
  # Annotated clustering
  side <- data.frame(OS=surv, row.names=rownames(rosenwald.cli))
  clusterize(expr, side=side)
  
  # Color scale legend
  with(rosenwald.cli, surv.scale(time=follow.up, event=status=="Dead"))
