Shaoxun Yuan
Affiliation: School of Artificial Intelligence and Information Technology, Nanjing University of Chinese Medicine, China
Email: yuanshaoxun@njucm.edu.cn
SEtoolbox is an R package for manipulating, analyzing, and visualizing SummarizedExperiment objects.
To install the SEtoolbox package, you first need to install the devtools package.
Run the following command in your R console:
install.packages("devtools")
devtools::install_github("shaoxunyuan/SEtoolbox")
During this tutorial, we might need to use a few additional packages.
If SEtoolbox was installed together with its dependencies (for example, by passing dependencies = TRUE to devtools::install_github), these additional packages are already installed.
We can load them directly.
library(SummarizedExperiment)
# plyr is loaded before the tidyverse so that it does not mask dplyr functions
library(plyr)
library(tidyverse)
library(dplyr)
library(reshape2)
library(DESeq2)
For this tutorial, SEtoolbox will be working with a SummarizedExperiment object.
The functions available in SEtoolbox can be listed with:
help(package="SEtoolbox")
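The functions described below all take a SummarizedExperiment object as input. As a minimal sketch, a toy object for experimenting could be built as follows. The gene names, sample names, and the group column are made up for illustration, and the assay is named "Count" only to match the default of SE_DEseq2. The requireNamespace() guard simply keeps the snippet runnable where SummarizedExperiment is not installed.

```r
# Simulated counts: 4 genes x 6 samples (all names are illustrative).
set.seed(1)
counts <- matrix(rpois(24, lambda = 20), nrow = 4,
                 dimnames = list(paste0("gene", 1:4), paste0("sample", 1:6)))
group <- rep(c("control", "treated"), each = 3)

# Wrap the matrix and the sample annotation into a SummarizedExperiment.
if (requireNamespace("SummarizedExperiment", quietly = TRUE)) {
  SE <- SummarizedExperiment::SummarizedExperiment(
    assays  = list(Count = counts),
    colData = S4Vectors::DataFrame(group = group, row.names = colnames(counts))
  )
  print(SE)
}
```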
Combine multiple SummarizedExperiment objects.
This function merges multiple SummarizedExperiment objects based on the specified merge type (intersection or union) for all assays present in the input list.
SE_combine(se_list, merge_type = "intersection")
se_list: A list of SummarizedExperiment objects to be combined, for example se_list = list(SE1, SE2, SE3).
merge_type: A character string specifying the type of merge to perform. Options are:
1. intersection (default): Keep only common features across all objects.
2. union: Keep all features.
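The difference between the two merge types can be illustrated on plain matrices standing in for the assays. This is only a sketch of the idea, not the SE_combine implementation; in particular, filling features that are absent from an object with NA in the union case is an assumption.

```r
# Two toy assays with partly overlapping features (names are illustrative).
m1 <- matrix(1:6, nrow = 3,
             dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2")))
m2 <- matrix(7:12, nrow = 3,
             dimnames = list(c("geneB", "geneC", "geneD"), c("s3", "s4")))
mats <- list(m1, m2)

# intersection: keep only the features present in every matrix.
common <- Reduce(intersect, lapply(mats, rownames))
merged_intersection <- do.call(cbind, lapply(mats, function(m) m[common, , drop = FALSE]))

# union: keep every feature; missing entries become NA (assumption).
all_features <- Reduce(union, lapply(mats, rownames))
pad <- function(m, features) {
  out <- matrix(NA, nrow = length(features), ncol = ncol(m),
                dimnames = list(features, colnames(m)))
  out[rownames(m), ] <- m
  out
}
merged_union <- do.call(cbind, lapply(mats, pad, features = all_features))

print(merged_intersection)
print(merged_union)
```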
Correct batch effects in a SummarizedExperiment object using ComBat from the sva package.
SE_combat(SE, col_for_combat, col_for_compare)
SE: A SummarizedExperiment object containing the data to be batch-corrected.
Fill missing values in a SummarizedExperiment object.
This function imputes missing values (NA) in the given SummarizedExperiment object using specified methods.
Multiple imputation techniques can be utilized to handle missing values, ensuring the robustness of subsequent analyses.
SE_impute(SE, assayname = "TPM", group = "group", ZerosAsNA = FALSE, RemoveNA = TRUE,
cutoff = 20, method = c("none", "LOD", "half_min", "median", "mean", "min", "knn", "rf", "global_mean", "svd", "QRILC"),
LOD = NULL, knum = 10)
SE: A SummarizedExperiment object containing the data to be imputed.
assayname: The name of the SummarizedExperiment assay, specifying the type of data to be imputed.
group: A character string specifying the grouping variable in the sample data.
ZerosAsNA: A logical value indicating whether to treat zeros as NA. Default is FALSE.
RemoveNA: A logical value indicating whether to remove samples with a high percentage of NA values based on the cutoff. Default is TRUE.
cutoff: A numerical value representing the percentage cutoff for NA samples. Default is 20.
method: A character string specifying the imputation method to use. Options include:
1. none (default): No imputation, replace NA with zero.
2. LOD: Replace NA with the limit of detection (LOD).
3. half_min: Replace NA with half of the minimum value.
4. median: Replace NA with the median value.
5. mean: Replace NA with the mean value.
6. min: Replace NA with the minimum value.
7. knn: K-nearest neighbors imputation.
8. rf: Random forest imputation.
9. global_mean: Global mean imputation.
10. svd: Singular value decomposition imputation.
11. QRILC: Quantile regression imputation of left-censored data.
LOD: A numerical value representing the limit of detection (used for the LOD imputation method). Default is NULL.
knum: An integer value representing the number of neighbors in the KNN imputation method. Default is 10.
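To make the simpler methods concrete, the following sketch imputes a plain matrix row by row. It is not the SE_impute implementation: the helper name impute_rowwise is hypothetical, and computing the replacement value per feature (rather than over the whole matrix) is an assumption.

```r
# Hypothetical helper: replace NA in each row (feature) with a statistic of that row.
impute_rowwise <- function(x, method = c("half_min", "median", "mean", "min")) {
  method <- match.arg(method)
  fill <- switch(method,
    half_min = function(v) min(v, na.rm = TRUE) / 2,
    median   = function(v) median(v, na.rm = TRUE),
    mean     = function(v) mean(v, na.rm = TRUE),
    min      = function(v) min(v, na.rm = TRUE)
  )
  # apply() returns features in columns, so transpose back afterwards.
  t(apply(x, 1, function(v) {
    v[is.na(v)] <- fill(v)
    v
  }))
}

x <- matrix(c(4, NA, 8, 10,
              2,  6, NA, NA,
              5,  5,  7,  9),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste0("f", 1:3), paste0("s", 1:4)))

imputed_half   <- impute_rowwise(x, "half_min")
imputed_median <- impute_rowwise(x, "median")
print(imputed_half)
```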
Calculate detection ratio and update SummarizedExperiment object’s rowData.
This function computes the detection ratio of expression data and updates the rowData of the provided SummarizedExperiment object with detection sample counts and ratios. It also generates a histogram of detection ratios.
SE_detectratio(SE, assayname = "TPM")
SE: A SummarizedExperiment object containing expression data.
assayname: The name of the assay to be used for calculations. Default is "TPM".
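As a rough illustration of what a detection ratio is, the sketch below computes it on a plain matrix. Treating a value greater than zero as "detected" is an assumption, and the column names detect_samples and detect_ratio are illustrative rather than the names SE_detectratio writes to rowData.

```r
# Toy TPM matrix: 3 genes x 4 samples.
tpm <- matrix(c(0.0, 1.2, 3.4, 0.0,
                5.0, 2.1, 0.7, 9.9,
                0.0, 0.0, 0.0, 4.2),
              nrow = 3, byrow = TRUE,
              dimnames = list(paste0("g", 1:3), paste0("s", 1:4)))

# Number of samples in which each gene is detected, and the corresponding ratio.
detect_samples <- rowSums(tpm > 0)
detect_ratio   <- detect_samples / ncol(tpm)

print(data.frame(detect_samples, detect_ratio))
```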
Perform differential expression analysis using DESeq2.
This function performs differential expression analysis on count data contained in a SummarizedExperiment object using the DESeq2 package.
SE_DEseq2(SE, assayname = "Count", groupname = "group")
SE: A SummarizedExperiment object containing count data.
assayname: The name of the assay to use for the analysis. Default is "Count".
groupname: The name of the column in colData(SE) that contains the factor for grouping samples. Default is "group".
Generates two plots illustrating the distribution of non-zero entries in a SummarizedExperiment object.
This function creates a bar plot to display the count of non-zero entries for each feature (gene) in the specified expression matrix and a histogram showing the distribution of the fraction of non-zero entries across samples.
It allows for the option to treat zeros as NA, thereby excluding them from the count.
SE_distribution(SE, assayname = "TPM", ZeroasNA = TRUE)
SE: A SummarizedExperiment object containing the expression data.
assayname: A string indicating the assay name to use from the SummarizedExperiment object. The default is “TPM”.
ZeroasNA: A logical value indicating whether zeros should be treated as NA. The default is TRUE.
Generates a boxplot for specified features in a SummarizedExperiment object.
This function creates a boxplot for the specified features (genes) within a given SummarizedExperiment object.
It supports normalization and grouping of the data.
SE_boxplot(SE, feature_of_interest, assayname = "TPM", group_col = NA, normalization = "none")
SE: A SummarizedExperiment object.
feature_of_interest: Character vector of gene identifiers.
assayname: The assay name in the SummarizedExperiment object.
group_col: Column name in colData(SE) for grouping.
normalization: Normalization method (“scale”, “log”, or “none”).
Generate PCA plots.
This function takes a SummarizedExperiment object, computes PCA, and visualizes the results.
SE_PCAplot(SE, assayname = "TPM", groupname = "group", outlier_threshold = 2, scale = TRUE, feature_of_interesting = NULL)
SE: SummarizedExperiment object containing gene expression data.
assayname: Name of the expression data, default is “TPM”.
groupname: Name of the grouping column, default is “group”.
outlier_threshold: Outlier filtering threshold, default is 2.
scale: Whether to standardize the data, default is TRUE.
feature_of_interesting: Vector of specific feature names; if NULL, all features are used, default is NULL.
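The core computation behind such a plot can be sketched with base R's prcomp on a plain matrix. This is not the SE_PCAplot implementation; how outlier_threshold is applied is not described here, so the sketch leaves outlier handling out. Samples must be in rows for prcomp, hence the transpose.

```r
# Toy expression matrix: 10 genes x 6 samples of random values.
set.seed(1)
expr <- matrix(rnorm(60), nrow = 10,
               dimnames = list(paste0("g", 1:10), paste0("s", 1:6)))

# PCA on samples; scale. = TRUE standardizes each gene first.
pca <- prcomp(t(expr), center = TRUE, scale. = TRUE)

# Proportion of variance explained by each component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Scores on the first two components, with an illustrative grouping column.
scores <- as.data.frame(pca$x[, 1:2])
scores$group <- rep(c("control", "treated"), each = 3)
print(scores)
```

Note that scale. = TRUE fails for features with zero variance, so constant features would need to be removed beforehand.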
Filter features in SummarizedExperiment object based on various criteria.
This function filters features (genes/proteins) in a SummarizedExperiment object based on various criteria including expression level, detection ratio, variance, and coefficient of variation. It provides flexible filtering options to remove low-quality features before downstream analysis.
SE_filter(SE, assayname = "TPM", min_expr = 0, min_detectratio = 0, min_variance = 0, min_cv = 0, group_colname = NULL)
SE: A SummarizedExperiment object containing gene expression data.
assayname: A string indicating which assay to use for filtering. The default value is “TPM”.
min_expr: Numeric value. Minimum expression threshold. Features with mean expression below this value will be removed. Default is 0.
min_detectratio: Numeric value between 0 and 1. Minimum detection ratio threshold. Features with detection ratio below this value will be removed. Default is 0.
min_variance: Numeric value. Minimum variance threshold. Features with variance below this value will be removed. Default is 0.
min_cv: Numeric value. Minimum coefficient of variation (CV) threshold. Features with CV below this value will be removed. Default is 0.
group_colname: A string representing the column name in colData that contains group information. This is optional; if provided, filtering will be performed within each group separately.
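The four criteria can be illustrated on a plain matrix. This is a sketch of the idea rather than the SE_filter implementation; the threshold values are arbitrary, "detected" is assumed to mean a value above zero, and CV is taken as standard deviation divided by mean.

```r
# Toy expression matrix: 4 features x 4 samples.
expr <- matrix(c(10, 12,  11,  9,
                  0,  0, 0.5,  0,
                  5, 50,   1, 20,
                  3,  3,   3,  3),
               nrow = 4, byrow = TRUE,
               dimnames = list(paste0("g", 1:4), paste0("s", 1:4)))

# Per-feature statistics used by the filters.
feature_mean <- rowMeans(expr)
detect_ratio <- rowMeans(expr > 0)
feature_var  <- apply(expr, 1, var)
feature_cv   <- sqrt(feature_var) / feature_mean

# Keep features that pass every threshold (values chosen for illustration).
keep <- feature_mean >= 1 &
  detect_ratio >= 0.5 &
  feature_var >= 0.5 &
  feature_cv >= 0.1
filtered <- expr[keep, , drop = FALSE]
print(rownames(filtered))
```

Here g2 is dropped for low mean expression and low detection ratio, and g4 is dropped because it is constant across samples.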
Subset SummarizedExperiment object by features or samples.
This function subsets a SummarizedExperiment object based on feature names, sample names, or conditions from colData. It provides flexible subsetting options to extract specific subsets of data for downstream analysis.
SE_subset(SE, features = NULL, samples = NULL, condition = NULL, exclude_features = NULL, exclude_samples = NULL)
SE: A SummarizedExperiment object containing gene expression data.
features: A character vector of feature names to keep. If NULL, all features are kept. Default is NULL.
samples: A character vector of sample names to keep. If NULL, all samples are kept. Default is NULL.
condition: A named list or vector for conditional subsetting. Names should be column names in colData, and values should be the values to keep. Default is NULL.
exclude_features: A character vector of feature names to exclude. Default is NULL.
exclude_samples: A character vector of sample names to exclude. Default is NULL.
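The condition argument can be pictured as a set of column filters on the sample table. The sketch below applies such a named list to a plain data frame standing in for colData; combining several conditions with a logical AND is an assumption about how SE_subset behaves, and the column names are made up.

```r
# Stand-in for colData(SE): one row per sample.
sample_info <- data.frame(
  group = c("control", "control", "treated", "treated"),
  batch = c("A", "B", "A", "B"),
  row.names = paste0("s", 1:4)
)

# Names are colData columns, values are the levels to keep.
condition <- list(group = "treated", batch = "A")

# A sample is kept only if it matches every condition (assumed AND logic).
keep <- rep(TRUE, nrow(sample_info))
for (col in names(condition)) {
  keep <- keep & sample_info[[col]] %in% condition[[col]]
}
selected_samples <- rownames(sample_info)[keep]
print(selected_samples)
```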
Normalize expression data in SummarizedExperiment object.
This function normalizes expression data in a SummarizedExperiment object using various normalization methods including TPM, FPKM, RPKM, log2 transformation, quantile normalization, and library size normalization.
SE_normalize(SE, assayname = "Counts", method = "log2", pseudocount = 1, gene_length = NULL)
SE: A SummarizedExperiment object containing gene expression data.
assayname: A string indicating which assay to normalize. The default value is “Counts”.
method: A character string specifying the normalization method. Options include “TPM”, “FPKM”, “RPKM”, “log2”, “quantile”, “library_size”, “median”, “upper_quartile”. Default is “log2”.
pseudocount: Numeric value to add before log transformation to avoid log(0). Default is 1.
gene_length: A numeric vector of gene lengths in base pairs. Required for TPM/FPKM/RPKM methods.
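For reference, the standard definitions of TPM, FPKM, and a log2 transform can be written out on a plain count matrix as below. This is a sketch of the textbook formulas, not necessarily the exact code path of SE_normalize; the counts and gene lengths are made up.

```r
# Toy counts: 3 genes x 2 samples, with gene lengths in base pairs.
counts <- matrix(c(100, 200, 300,
                   400, 500, 100),
                 nrow = 3,
                 dimnames = list(c("g1", "g2", "g3"), c("s1", "s2")))
gene_length <- c(g1 = 1000, g2 = 2000, g3 = 500)

# Reads per kilobase: divide each gene's counts by its length in kb.
rate <- counts / (gene_length / 1000)

# TPM: rescale the per-kb rates so that every sample sums to one million.
tpm <- sweep(rate, 2, colSums(rate), "/") * 1e6

# FPKM: per-kb rates divided by the library size in millions of reads.
fpkm <- sweep(rate, 2, colSums(counts) / 1e6, "/")

# log2 transform with a pseudocount to avoid log(0).
pseudocount <- 1
log2_counts <- log2(counts + pseudocount)

print(colSums(tpm))
```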
Perform differential expression analysis using limma.
This function performs differential expression analysis using the limma package, which is suitable for both microarray and RNA-seq data. It uses linear models and empirical Bayes methods to identify differentially expressed features.
SE_limma(SE, assayname = "log2", group_colname = "group", contrast = NULL, adjust_method = "BH", pvalue_threshold = 0.05, logFC_threshold = 1)
SE: A SummarizedExperiment object containing gene expression data.
assayname: A string indicating which assay to use for analysis. The default value is “log2”.
group_colname: A string representing the column name in colData that contains group information. Default is “group”.
contrast: A character string specifying the contrast for differential analysis (e.g., “Treatment-Control”). Default is NULL, which will compare the first two groups.
adjust_method: A character string specifying the p-value adjustment method. Options include “holm”, “hochberg”, “hommel”, “bonferroni”, “BH”, “BY”, “fdr”, “none”. Default is “BH”.
pvalue_threshold: Numeric value for p-value threshold. Default is 0.05.
logFC_threshold: Numeric value for log2 fold change threshold. Default is 1.
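How adjust_method, pvalue_threshold, and logFC_threshold interact can be sketched on a made-up results table using base R's p.adjust. Whether SE_limma compares against the adjusted or the raw p-value, and whether the comparisons are strict, is not stated here, so both choices below are assumptions.

```r
# Made-up differential expression results for five features.
res <- data.frame(
  feature = paste0("g", 1:5),
  logFC   = c(2.5, -1.8, 0.3, 1.2, -0.4),
  pvalue  = c(0.001, 0.004, 0.2, 0.02, 0.6)
)

# Multiple-testing correction (Benjamini-Hochberg).
res$padj <- p.adjust(res$pvalue, method = "BH")

# Classify each feature using both thresholds (assumed: adjusted p-value, strict).
pvalue_threshold <- 0.05
logFC_threshold  <- 1
significant <- res$padj < pvalue_threshold & abs(res$logFC) > logFC_threshold
res$status <- ifelse(!significant, "ns", ifelse(res$logFC > 0, "up", "down"))
print(res)
```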
Perform differential expression analysis using edgeR.
SE_edgeR(SE, assayname = "Counts", group_colname = "group", contrast = NULL, adjust_method = "BH", pvalue_threshold = 0.05, logFC_threshold = 1)
SE: A SummarizedExperiment object containing count data.
assayname: A string indicating which assay to use for analysis. The default value is “Counts”.
group_colname: A string representing the column name in colData that contains group information. Default is “group”.
contrast: A character string specifying the contrast for differential analysis (e.g., “Treatment-Control”). Default is NULL, which will compare the first two groups.
adjust_method: A character string specifying the p-value adjustment method. Options include “holm”, “hochberg”, “hommel”, “bonferroni”, “BH”, “BY”, “fdr”, “none”. Default is “BH”.
pvalue_threshold: Numeric value for p-value threshold. Default is 0.05.
logFC_threshold: Numeric value for log2 fold change threshold. Default is 1.
Identify highly variable genes in SummarizedExperiment object.
This function identifies highly variable genes (HVGs) in a SummarizedExperiment object using various methods including variance, coefficient of variation (CV), and the method from the scran package.
SE_HVG(SE, assayname = "TPM", method = "variance", n_top_genes = 1000, min_mean = 0.1, max_mean = 1000, min_dispersion = 0.1)
SE: A SummarizedExperiment object containing gene expression data.
assayname: A string indicating which assay to use. The default value is “TPM”.
method: A character string specifying the method to use for identifying HVGs. Options include “variance”, “cv”, “scran”. Default is “variance”.
n_top_genes: Numeric value. Number of top highly variable genes to return. Default is 1000.
min_mean: Numeric value. Minimum mean expression threshold for scran method. Default is 0.1.
max_mean: Numeric value. Maximum mean expression threshold for scran method. Default is 1000.
min_dispersion: Numeric value. Minimum dispersion threshold for scran method. Default is 0.1.
Generate volcano plot for differential expression analysis results.
This function generates a volcano plot for differential expression analysis results, showing the relationship between log2 fold change and adjusted p-value.
SE_volcano(SE, assayname = "log2", logFC_threshold = 1, pvalue_threshold = 0.05)
SE: A SummarizedExperiment object containing differential expression results in rowData.
assayname: A string indicating which assay to use. The default value is “log2”.
logFC_threshold: Numeric value for log2 fold change threshold. Default is 1.
pvalue_threshold: Numeric value for adjusted p-value threshold. Default is 0.05.
Generate MA plot for differential expression analysis results.
This function generates an MA plot for differential expression analysis results, showing the relationship between mean expression and log2 fold change.
SE_MAplot(SE, assayname = "log2", logFC_threshold = 1, pvalue_threshold = 0.05)
SE: A SummarizedExperiment object containing differential expression results in rowData.
assayname: A string indicating which assay to use. The default value is “log2”.
logFC_threshold: Numeric value for log2 fold change threshold. Default is 1.
pvalue_threshold: Numeric value for adjusted p-value threshold. Default is 0.05.
Perform Gene Set Enrichment Analysis (GSEA).
This function performs Gene Set Enrichment Analysis (GSEA) on a SummarizedExperiment object using the fgsea package. It identifies gene sets that are significantly enriched in a ranked list of genes.
SE_GSEA(SE, assayname = "log2", group_colname = "group", gene_sets, min_size = 10, max_size = 500, n_perm = 1000, adjust_method = "BH", pvalue_threshold = 0.05)
SE: A SummarizedExperiment object containing gene expression data.
assayname: A string indicating which assay to use. The default value is “log2”.
group_colname: A string representing the column name in colData that contains group information. Default is “group”.
gene_sets: A list of gene sets where each element is a character vector of gene symbols.
min_size: Numeric value. Minimum size of gene sets to consider. Default is 10.
max_size: Numeric value. Maximum size of gene sets to consider. Default is 500.
n_perm: Numeric value. Number of permutations to perform. Default is 1000.
adjust_method: A character string specifying the p-value adjustment method. Options include “holm”, “hochberg”, “hommel”, “bonferroni”, “BH”, “BY”, “fdr”, “none”. Default is “BH”.
pvalue_threshold: Numeric value for p-value threshold. Default is 0.05.
Perform Gene Ontology (GO) enrichment analysis.
This function performs Gene Ontology (GO) enrichment analysis on a list of genes using the clusterProfiler package. It identifies GO terms that are significantly enriched in the input gene list.
SE_GO(SE, genes, ont = "BP", organism = "human", pvalue_cutoff = 0.05, qvalue_cutoff = 0.2, min_gene = 10, max_gene = 500)
SE: A SummarizedExperiment object containing gene expression data.
genes: A character vector of gene symbols.
ont: A character string specifying the GO ontology to use. Options include “BP” (Biological Process), “CC” (Cellular Component), “MF” (Molecular Function). Default is “BP”.
organism: A character string specifying the organism. Default is “human”.
pvalue_cutoff: Numeric value for p-value cutoff. Default is 0.05.
qvalue_cutoff: Numeric value for q-value cutoff. Default is 0.2.
min_gene: Numeric value. Minimum number of genes in a GO term. Default is 10.
max_gene: Numeric value. Maximum number of genes in a GO term. Default is 500.
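Over-representation analysis of this kind is commonly based on a hypergeometric test. The sketch below computes the p-value for a single term in base R; all four numbers are made up, and the snippet only illustrates the test, not the clusterProfiler workflow that SE_GO wraps.

```r
N <- 1000  # genes in the background universe
M <- 50    # background genes annotated to the term
n <- 100   # genes in the input list
k <- 15    # input genes annotated to the term

# Probability of seeing k or more annotated genes in a random list of size n.
p_value <- phyper(k - 1, M, N - M, n, lower.tail = FALSE)
print(p_value)
```

By chance, about n * M / N = 5 annotated genes would be expected in the list, so an overlap of 15 yields a small p-value. In a full analysis, one such test is run per term and the p-values are then adjusted for multiple testing.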
Perform KEGG pathway enrichment analysis.
This function performs KEGG pathway enrichment analysis on a list of genes using the clusterProfiler package. It identifies KEGG pathways that are significantly enriched in the input gene list.
SE_KEGG(SE, genes, organism = "hsa", pvalue_cutoff = 0.05, qvalue_cutoff = 0.2, min_gene = 10, max_gene = 500)
SE: A SummarizedExperiment object containing gene expression data.
genes: A character vector of gene symbols.
organism: A character string specifying the KEGG organism code. Default is “hsa” (human).
pvalue_cutoff: Numeric value for p-value cutoff. Default is 0.05.
qvalue_cutoff: Numeric value for q-value cutoff. Default is 0.2.
min_gene: Numeric value. Minimum number of genes in a KEGG pathway. Default is 10.
max_gene: Numeric value. Maximum number of genes in a KEGG pathway. Default is 500.
Visualize enrichment analysis results.
This function visualizes the results of enrichment analysis (GO/KEGG/GSEA) using various plot types including bar plots, dot plots, and network plots.
SE_enrichplot(enrich_result, plot_type = "bar", n_terms = 10, pvalue_cutoff = 0.05)
enrich_result: An enrichment analysis result object from clusterProfiler or fgsea.
plot_type: A character string specifying the plot type. Options include “bar”, “dot”, “network”, “cnet”. Default is “bar”.
n_terms: Numeric value. Number of top terms to display. Default is 10.
pvalue_cutoff: Numeric value for p-value cutoff. Default is 0.05.
Perform hierarchical clustering analysis.
This function performs hierarchical clustering on a SummarizedExperiment object and visualizes the results as a dendrogram. It can cluster either samples or features.
SE_hierarchical(SE, assayname = "TPM", cluster_by = "samples", method = "complete", distance = "euclidean")
SE: A SummarizedExperiment object containing gene expression data.
assayname: A string indicating which assay to use. The default value is “TPM”.
cluster_by: A character string specifying whether to cluster samples or features. Options include “samples”, “features”. Default is “samples”.
method: A character string specifying the clustering method. Options include “complete”, “single”, “average”, “ward.D”, “ward.D2”. Default is “complete”.
distance: A character string specifying the distance metric. Options include “euclidean”, “maximum”, “manhattan”, “canberra”, “binary”, “minkowski”. Default is “euclidean”.
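The method and distance options listed above match those of base R's hclust and dist, so the underlying computation can be sketched as follows on a plain matrix. This assumes, without confirmation from the package source, that SE_hierarchical relies on these functions; the simulated data contain two clearly separated sample groups.

```r
# Toy data: 10 genes x 6 samples, the last three samples shifted upward.
set.seed(1)
expr <- cbind(matrix(rnorm(30, mean = 0), nrow = 10),
              matrix(rnorm(30, mean = 5), nrow = 10))
dimnames(expr) <- list(paste0("g", 1:10), paste0("s", 1:6))

# Cluster samples: dist() works on rows, so transpose first.
d  <- dist(t(expr), method = "euclidean")
hc <- hclust(d, method = "complete")

# Cut the tree into two groups; plot(hc) would draw the dendrogram.
clusters <- cutree(hc, k = 2)
print(clusters)
```

To cluster features instead of samples, dist(expr) would be used without the transpose.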
Perform k-means clustering analysis.
This function performs k-means clustering on a SummarizedExperiment object and adds cluster assignments to the rowData or colData.
SE_kmeans(SE, assayname = "TPM", n_clusters = 3, cluster_by = "samples", nstart = 10)
SE: A SummarizedExperiment object containing gene expression data.
assayname: A string indicating which assay to use. The default value is “TPM”.
n_clusters: Numeric value. Number of clusters to form. Default is 3.
cluster_by: A character string specifying whether to cluster samples or features. Options include “samples”, “features”. Default is “samples”.
nstart: Numeric value. Number of random initializations. Default is 10.
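A minimal sketch of the same idea with base R's kmeans is shown below, on simulated data with three well-separated sample groups. Whether SE_kmeans calls stats::kmeans internally is an assumption; the argument names centers and nstart belong to kmeans, with centers playing the role of n_clusters.

```r
# Toy data: 10 genes x 9 samples in three groups with different mean levels.
set.seed(1)
expr <- cbind(matrix(rnorm(30, mean = 0),  nrow = 10),
              matrix(rnorm(30, mean = 5),  nrow = 10),
              matrix(rnorm(30, mean = 10), nrow = 10))
dimnames(expr) <- list(paste0("g", 1:10), paste0("s", 1:9))

# Cluster samples (rows after transposing); nstart restarts guard against
# poor random initializations.
km <- kmeans(t(expr), centers = 3, nstart = 10)
sample_clusters <- km$cluster
print(sample_clusters)
```

The cluster labels themselves are arbitrary; only the grouping of samples is meaningful.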
Visualize clustering results.
This function visualizes clustering results using dimensionality reduction techniques such as PCA, t-SNE, or UMAP, with points colored by cluster assignment.
SE_clusterplot(SE, assayname = "TPM", cluster_colname, dim_reduction = "PCA", group_colname = NULL)
SE: A SummarizedExperiment object containing gene expression data.
assayname: A string indicating which assay to use. The default value is “TPM”.
cluster_colname: A string representing the column name in rowData or colData that contains cluster assignments.
dim_reduction: A character string specifying the dimensionality reduction method. Options include “PCA”, “tSNE”, “UMAP”. Default is “PCA”.
group_colname: A string representing the column name in colData that contains group information. Default is NULL.
Perform t-distributed Stochastic Neighbor Embedding (t-SNE) analysis.
This function performs t-SNE dimensionality reduction on a SummarizedExperiment object and visualizes the results.
SE_tSNE(SE, assayname = "TPM", group_colname = "group", perplexity = 30, theta = 0.5, max_iter = 1000)
SE: A SummarizedExperiment object containing gene expression data.
assayname: A string indicating which assay to use. The default value is “TPM”.
group_colname: A string representing the column name in colData that contains group information. Default is “group”.
perplexity: Numeric value. Perplexity parameter for t-SNE. Default is 30.
theta: Numeric value. Speed/accuracy trade-off parameter for t-SNE. Default is 0.5.
max_iter: Numeric value. Maximum number of iterations for t-SNE. Default is 1000.
Perform Uniform Manifold Approximation and Projection (UMAP) analysis.
This function performs UMAP dimensionality reduction on a SummarizedExperiment object and visualizes the results.
SE_UMAP(SE, assayname = "TPM", group_colname = "group", n_neighbors = 15, min_dist = 0.1, n_components = 2)
SE: A SummarizedExperiment object containing gene expression data.
assayname: A string indicating which assay to use. The default value is “TPM”.
group_colname: A string representing the column name in colData that contains group information. Default is “group”.
n_neighbors: Numeric value. Number of neighbors for UMAP. Default is 15.
min_dist: Numeric value. Minimum distance for UMAP. Default is 0.1.
n_components: Numeric value. Number of components for UMAP. Default is 2.
Perform Multidimensional Scaling (MDS) analysis.
This function performs MDS dimensionality reduction on a SummarizedExperiment object and visualizes the results.
SE_MDS(SE, assayname = "TPM", group_colname = "group", distance = "euclidean")
SE: A SummarizedExperiment object containing gene expression data.
assayname: A string indicating which assay to use. The default value is “TPM”.
group_colname: A string representing the column name in colData that contains group information. Default is “group”.
distance: A character string specifying the distance metric. Options include “euclidean”, “maximum”, “manhattan”, “canberra”, “binary”, “minkowski”. Default is “euclidean”.
Generate heatmap for gene expression data.
This function generates a heatmap for gene expression data in a SummarizedExperiment object, with optional clustering and annotation.
SE_heatmap(SE, assayname = "TPM", features = NULL, group_colname = "group", scale = "row", clustering_distance_rows = "euclidean", clustering_distance_cols = "euclidean", clustering_method = "complete")
SE: A SummarizedExperiment object containing gene expression data.
assayname: A string indicating which assay to use. The default value is “TPM”.
features: A character vector of feature names to include in the heatmap. If NULL, all features are included. Default is NULL.
group_colname: A string representing the column name in colData that contains group information. Default is “group”.
scale: A character string specifying the scaling method. Options include “row”, “column”, “none”. Default is “row”.
clustering_distance_rows: A character string specifying the distance metric for row clustering. Default is “euclidean”.
clustering_distance_cols: A character string specifying the distance metric for column clustering. Default is “euclidean”.
clustering_method: A character string specifying the clustering method. Default is “complete”.
Calculate correlation between features or samples.
This function calculates correlation coefficients between features or samples in a SummarizedExperiment object and visualizes the results as a heatmap.
SE_correlation(SE, assayname = "TPM", correlation_type = "pearson", correlate_by = "features", features = NULL)
SE: A SummarizedExperiment object containing gene expression data.
assayname: A string indicating which assay to use. The default value is “TPM”.
correlation_type: A character string specifying the correlation method. Options include “pearson”, “spearman”, “kendall”. Default is “pearson”.
correlate_by: A character string specifying whether to correlate features or samples. Options include “features”, “samples”. Default is “features”.
features: A character vector of feature names to include. If NULL, all features are included. Default is NULL.
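The difference between correlating features and correlating samples comes down to the orientation of the matrix passed to cor, which correlates columns. A sketch on a plain matrix, not the SE_correlation implementation:

```r
# Toy expression matrix: 5 genes x 8 samples of random values.
set.seed(1)
expr <- matrix(rnorm(40), nrow = 5,
               dimnames = list(paste0("g", 1:5), paste0("s", 1:8)))

# Feature-by-feature correlation: transpose so that genes are columns.
cor_features <- cor(t(expr), method = "pearson")

# Sample-by-sample correlation: samples are already columns.
cor_samples <- cor(expr, method = "spearman")

print(dim(cor_features))
print(dim(cor_samples))
```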
Perform Weighted Gene Co-expression Network Analysis (WGCNA).
This function performs WGCNA on a SummarizedExperiment object to identify co-expression modules and their relationships with sample traits.
SE_WGCNA(SE, assayname = "TPM", power = 6, minModuleSize = 30, mergeCutHeight = 0.25, traitData = NULL)
SE: A SummarizedExperiment object containing gene expression data.
assayname: A string indicating which assay to use. The default value is “TPM”.
power: Numeric value. Soft-thresholding power for network construction. Default is 6.
minModuleSize: Numeric value. Minimum module size. Default is 30.
mergeCutHeight: Numeric value. Cut height for merging modules. Default is 0.25.
traitData: A data frame containing sample traits. If NULL, colData(SE) is used. Default is NULL.
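The role of the soft-thresholding power can be illustrated with the unsigned adjacency used in WGCNA, in which the absolute correlation between two genes is raised to that power. The sketch below computes only this adjacency on random data in base R; module detection and trait correlation are left to the WGCNA package.

```r
# Toy expression matrix: 6 genes x 10 samples of random values.
set.seed(1)
expr <- matrix(rnorm(60), nrow = 6,
               dimnames = list(paste0("g", 1:6), paste0("s", 1:10)))

# Unsigned adjacency: |cor|^power. Higher powers push weak correlations to 0.
power <- 6
adjacency <- abs(cor(t(expr)))^power

# Connectivity of each gene: sum of its adjacencies, excluding itself.
connectivity <- rowSums(adjacency) - 1
print(round(connectivity, 3))
```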
Visualize gene co-expression networks.
This function visualizes gene co-expression networks or protein-protein interaction networks using igraph and ggraph.
SE_networkplot(network, layout = "fr", n_nodes = 100, n_edges = 500, node_size = 5, node_color = "blue", edge_color = "gray")
network: A network object (igraph or adjacency matrix).
layout: A character string specifying the layout algorithm. Options include “fr”, “kk”, “circle”, “grid”. Default is “fr”.
n_nodes: Numeric value. Number of top nodes to include. Default is 100.
n_edges: Numeric value. Number of top edges to include. Default is 500.
node_size: Numeric value. Node size. Default is 5.
node_color: Character string. Node color. Default is “blue”.
edge_color: Character string. Edge color. Default is “gray”.
Perform random forest classification or regression.
This function performs random forest analysis on a SummarizedExperiment object for classification or regression tasks.
SE_randomforest(SE, assayname = "TPM", group_colname = "group", n_trees = 500, mtry = NULL, importance = TRUE, ntree = 500)
SE: A SummarizedExperiment object containing gene expression data.
assayname: A string indicating which assay to use. The default value is “TPM”.
group_colname: A string representing the column name in colData that contains the response variable. Default is “group”.
n_trees: Numeric value. Number of trees in the forest. Default is 500.
mtry: Numeric value. Number of variables randomly sampled as candidates at each split. If NULL, default value is used. Default is NULL.
importance: Logical value. Whether to calculate variable importance. Default is TRUE.
ntree: Numeric value. Number of trees in the forest (alternative parameter name). Default is 500.
Perform Support Vector Machine (SVM) classification or regression.
This function performs SVM analysis on a SummarizedExperiment object for classification or regression tasks.
SE_SVM(SE, assayname = "TPM", group_colname = "group", kernel = "radial", cost = 1, gamma = "auto")
SE: A SummarizedExperiment object containing gene expression data.
assayname: A string indicating which assay to use. The default value is “TPM”.
group_colname: A string representing the column name in colData that contains the response variable. Default is “group”.
kernel: A character string specifying the kernel type. Options include “linear”, “polynomial”, “radial”, “sigmoid”. Default is “radial”.
cost: Numeric value. Cost parameter. Default is 1.
gamma: Numeric value or “auto”. Gamma parameter. Default is “auto”.
Perform cross-validation for machine learning models.
This function performs cross-validation for machine learning models (random forest, SVM) to evaluate model performance.
SE_crossvalidation(SE, assayname = "TPM", group_colname = "group", model_type = "randomforest", k_folds = 5, n_repeats = 1)
SE: A SummarizedExperiment object containing gene expression data.
assayname: A string indicating which assay to use. The default value is “TPM”.
group_colname: A string representing the column name in colData that contains the response variable. Default is “group”.
model_type: A character string specifying the model type. Options include “randomforest”, “SVM”. Default is “randomforest”.
k_folds: Numeric value. Number of folds for cross-validation. Default is 5.
n_repeats: Numeric value. Number of repeats for cross-validation. Default is 1.
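The fold assignment behind k-fold cross-validation can be sketched in base R as follows. This is a plain, non-stratified split on made-up labels; whether SE_crossvalidation stratifies folds by group is not stated here, and the model fitting step is omitted.

```r
# Made-up class labels for 20 samples.
set.seed(1)
labels <- factor(rep(c("control", "treated"), each = 10))
k_folds <- 5

# Assign every sample to one of k folds at random, with equal fold sizes.
fold_id <- sample(rep(seq_len(k_folds), length.out = length(labels)))

# For each fold, the held-out samples form the test set.
splits <- lapply(seq_len(k_folds), function(fold) {
  list(train = which(fold_id != fold), test = which(fold_id == fold))
})

# Each sample appears in exactly one test set.
print(table(fold_id))
```

A model would then be fitted on splits[[i]]$train and evaluated on splits[[i]]$test for each fold, and the performance averaged across folds.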
Calculate Area Under the Curve (AUC) for classification models.
This function calculates the AUC for classification models using the pROC package.
SE_AUCcalc(predictions, labels)
predictions: A numeric vector of predicted values or probabilities.
labels: A factor vector of true class labels.
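For intuition, the AUC equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, which can be computed from ranks without any package. The helper auc_rank below is hypothetical and uses this rank-based formula in place of pROC; taking the second factor level as the positive class is an assumption of the sketch.

```r
# Hypothetical helper: rank-based (Mann-Whitney) AUC.
auc_rank <- function(predictions, labels, positive = levels(labels)[2]) {
  is_pos <- labels == positive
  n_pos <- sum(is_pos)
  n_neg <- sum(!is_pos)
  r <- rank(predictions)  # ties receive average ranks
  (sum(r[is_pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

predictions <- c(0.9, 0.8, 0.35, 0.6, 0.2, 0.1)
labels <- factor(c("case", "case", "case", "control", "control", "control"),
                 levels = c("control", "case"))

auc <- auc_rank(predictions, labels)
print(auc)
```

In this example, 8 of the 9 case/control pairs are ordered correctly, so the AUC is 8/9.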
Perform feature selection for machine learning models.
This function performs feature selection using various methods including variance threshold, mutual information, and recursive feature elimination.
SE_featureselection(SE, assayname = "TPM", group_colname = "group", method = "variance", n_features = 100)
SE: A SummarizedExperiment object containing gene expression data.
assayname: A string indicating which assay to use. The default value is “TPM”.
group_colname: A string representing the column name in colData that contains the response variable. Default is “group”.
method: A character string specifying the feature selection method. Options include “variance”, “mutual_info”, “rfe”. Default is “variance”.
n_features: Numeric value. Number of top features to select. Default is 100.
Perform sample quality control.
This function performs quality control on samples in a SummarizedExperiment object, including assessment of library size, number of detected features, and sample correlation.
SE_sampleQC(SE, assayname = "TPM", group_colname = "group")
SE: A SummarizedExperiment object containing gene expression data.
assayname: A string indicating which assay to use. The default value is “TPM”.
group_colname: A string representing the column name in colData that contains group information. Default is “group”.
Perform feature quality control.
This function performs quality control on features in a SummarizedExperiment object, including assessment of expression levels, detection rates, and variability.
SE_featureQC(SE, assayname = "TPM", group_colname = "group")
SE: A SummarizedExperiment object containing gene expression data.
assayname: A string indicating which assay to use. The default value is “TPM”.
group_colname: A string representing the column name in colData that contains group information. Default is “group”.
Generate comprehensive quality control report.
This function generates a comprehensive quality control report for a SummarizedExperiment object, including sample and feature QC metrics, and data visualization.
SE_QCreport(SE, assayname = "TPM", group_colname = "group", output_dir = ".")
SE: A SummarizedExperiment object containing gene expression data.
assayname: A string indicating which assay to use. The default value is “TPM”.
group_colname: A string representing the column name in colData that contains group information. Default is “group”.
output_dir: A string specifying the output directory for the report. Default is “.”.
Calculate detection ratio and update SummarizedExperiment object’s rowData.
This function computes the detection ratio of expression data and updates the rowData of the provided SummarizedExperiment object with detection sample counts and ratios. It also generates a histogram of detection ratios.
SE_detectratio(SE, assayname = "TPM")
SE: A SummarizedExperiment object containing expression data.
assayname: The name of the assay to be used for calculations. Default is "TPM".
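The detection ratio can be sketched on a plain matrix standing in for the "TPM" assay. This assumes that a feature counts as detected in a sample when its value is greater than zero; the threshold SEtoolbox actually applies is not stated here, and the column names `detect_samples` and `detect_ratio` are illustrative.

```r
# Toy expression matrix standing in for the "TPM" assay: 3 features x 4 samples
tpm <- matrix(c(5, 0, 2, 1,
                0, 0, 0, 3,
                7, 8, 9, 4),
              nrow = 3, byrow = TRUE,
              dimnames = list(paste0("gene", 1:3), paste0("sample", 1:4)))

# Assumption: detected means value > 0
detect_samples <- rowSums(tpm > 0)            # number of samples with detection
detect_ratio   <- detect_samples / ncol(tpm)  # fraction of samples with detection

data.frame(detect_samples, detect_ratio)
```

In this toy matrix gene1 is detected in 3 of 4 samples (ratio 0.75), gene2 in 1 (0.25), and gene3 in all 4 (1).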
Export SummarizedExperiment object to various formats.
This function exports a SummarizedExperiment object to various formats including CSV, TSV, Excel, and RData.
SE_export(SE, assayname = "TPM", output_format = "csv", output_file = "SE_export")
SE: A SummarizedExperiment object containing gene expression data.
assayname: A string indicating which assay to export. The default value is “TPM”.
output_format: A character string specifying the output format. Options include “csv”, “tsv”, “xlsx”, “rdata”. Default is “csv”.
output_file: A string specifying the output file name (without extension). Default is “SE_export”.
Generate summary statistics for SummarizedExperiment object.
This function generates summary statistics for a SummarizedExperiment object, including number of features and samples, expression statistics, and metadata information.
SE_summary(SE, assayname = "TPM")
SE: A SummarizedExperiment object containing gene expression data.
assayname: A string indicating which assay to use for summary statistics. The default value is “TPM”.
Load example SummarizedExperiment object.
This function loads an example SummarizedExperiment object for demonstration purposes.
loadSE()
Load list of example SummarizedExperiment objects.
This function loads a list of example SummarizedExperiment objects for demonstration purposes.
loadSElist()
Perform differential expression analysis using edgeR.
This function performs differential expression analysis on count data using the edgeR package, which is specifically designed for RNA-seq data analysis.
SE_edgeR(SE, assayname = "Counts", group_colname = "group", contrast = NULL, adjust_method = "BH", pvalue_threshold = 0.05, logFC_threshold = 1)
SE: A SummarizedExperiment object containing count data.
assayname: A string indicating which assay to use for analysis. The default value is “Counts”.
group_colname: A string representing the column name in colData that contains group information. Default is “group”.
contrast: A character string specifying the contrast for differential analysis (e.g., “Treatment-Control”). Default is NULL, in which case the first two groups are compared.
adjust_method: A character string specifying the p-value adjustment method. Options include “holm”, “hochberg”, “hommel”, “bonferroni”, “BH”, “BY”, “fdr”, “none”. Default is “BH”.
pvalue_threshold: Numeric value for p-value threshold. Default is 0.05.
logFC_threshold: Numeric value for log2 fold change threshold. Default is 1.
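How `adjust_method`, `pvalue_threshold`, and `logFC_threshold` might interact can be shown with base R alone. The result table below is invented for illustration, and applying the p-value threshold to the adjusted rather than the raw p-value is an assumption; this sketch does not run edgeR itself.

```r
# Hypothetical result table: log2 fold changes and raw p-values for five genes
res <- data.frame(
  gene   = paste0("gene", 1:5),
  logFC  = c(2.5, -1.8, 0.4, 1.2, -0.2),
  pvalue = c(0.001, 0.004, 0.03, 0.2, 0.8)
)

# Adjust p-values with the Benjamini-Hochberg method (adjust_method = "BH")
res$padj <- p.adjust(res$pvalue, method = "BH")

# Apply pvalue_threshold = 0.05 and logFC_threshold = 1
res$status <- ifelse(res$padj < 0.05 & res$logFC >= 1, "up",
              ifelse(res$padj < 0.05 & res$logFC <= -1, "down", "ns"))
res
```

Here gene1 is called up-regulated and gene2 down-regulated; gene4 passes the fold-change threshold but not the significance threshold, so it stays "ns".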
Identify highly variable genes in SummarizedExperiment object.
This function identifies highly variable genes (HVGs) in a SummarizedExperiment object using various methods including variance, coefficient of variation (CV), and the method from the scran package.
SE_HVG(SE, assayname = "TPM", method = "variance", n_top_genes = 1000, min_mean = 0.1, max_mean = 1000, min_dispersion = 0.1)
SE: A SummarizedExperiment object containing gene expression data.
assayname: A string indicating which assay to use. The default value is “TPM”.
method: A character string specifying the method to use for identifying HVGs. Options include “variance”, “cv”, “scran”. Default is “variance”.
n_top_genes: Numeric value. Number of top highly variable genes to return. Default is 1000.
min_mean: Numeric value. Minimum mean expression threshold for scran method. Default is 0.1.
max_mean: Numeric value. Maximum mean expression threshold for scran method. Default is 1000.
min_dispersion: Numeric value. Minimum dispersion threshold for scran method. Default is 0.1.
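The "variance" and "cv" options can be approximated in a few lines of base R. This is a sketch on simulated data, not the package's code; the matrix `expr` stands in for the selected assay, and the scran-based option is not covered.

```r
set.seed(42)
# Toy matrix standing in for the "TPM" assay: 50 genes x 8 samples
expr <- matrix(rexp(50 * 8, rate = 0.1), nrow = 50,
               dimnames = list(paste0("gene", 1:50), paste0("s", 1:8)))

gene_var <- apply(expr, 1, var)                  # method = "variance"
gene_cv  <- apply(expr, 1, sd) / rowMeans(expr)  # method = "cv"

# Keep the n_top_genes features with the largest score
n_top_genes <- 10
hvg_by_var <- names(sort(gene_var, decreasing = TRUE))[seq_len(n_top_genes)]
hvg_by_cv  <- names(sort(gene_cv,  decreasing = TRUE))[seq_len(n_top_genes)]
```

Variance favors highly expressed genes, while the coefficient of variation scales by the mean, so the two rankings usually overlap only partly.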
Perform Gene Set Enrichment Analysis (GSEA).
This function performs Gene Set Enrichment Analysis (GSEA) on a SummarizedExperiment object using the fgsea package. It identifies gene sets that are significantly enriched in a ranked list of genes.
SE_GSEA(SE, assayname = "log2", group_colname = "group", gene_sets, min_size = 10, max_size = 500, n_perm = 1000, adjust_method = "BH", pvalue_threshold = 0.05)
SE: A SummarizedExperiment object containing gene expression data.
assayname: A string indicating which assay to use. The default value is “log2”.
group_colname: A string representing the column name in colData that contains group information. Default is “group”.
gene_sets: A list of gene sets where each element is a character vector of gene symbols.
min_size: Numeric value. Minimum size of gene sets to consider. Default is 10.
max_size: Numeric value. Maximum size of gene sets to consider. Default is 500.
n_perm: Numeric value. Number of permutations to perform. Default is 1000.
adjust_method: A character string specifying the p-value adjustment method. Options include “holm”, “hochberg”, “hommel”, “bonferroni”, “BH”, “BY”, “fdr”, “none”. Default is “BH”.
pvalue_threshold: Numeric value for p-value threshold. Default is 0.05.
Perform Gene Ontology (GO) enrichment analysis.
This function performs Gene Ontology (GO) enrichment analysis on a list of genes using the clusterProfiler package. It identifies GO terms that are significantly enriched in the input gene list.
SE_GO(SE, genes, ont = "BP", organism = "human", pvalue_cutoff = 0.05, qvalue_cutoff = 0.2, min_gene = 10, max_gene = 500)
SE: A SummarizedExperiment object containing gene expression data.
genes: A character vector of gene symbols.
ont: A character string specifying the GO ontology to use. Options include “BP” (Biological Process), “CC” (Cellular Component), “MF” (Molecular Function). Default is “BP”.
organism: A character string specifying the organism. Default is “human”.
pvalue_cutoff: Numeric value for p-value cutoff. Default is 0.05.
qvalue_cutoff: Numeric value for q-value cutoff. Default is 0.2.
min_gene: Numeric value. Minimum number of genes in a GO term. Default is 10.
max_gene: Numeric value. Maximum number of genes in a GO term. Default is 500.
Perform KEGG pathway enrichment analysis.
This function performs KEGG pathway enrichment analysis on a list of genes using the clusterProfiler package. It identifies KEGG pathways that are significantly enriched in the input gene list.
SE_KEGG(SE, genes, organism = "hsa", pvalue_cutoff = 0.05, qvalue_cutoff = 0.2, min_gene = 10, max_gene = 500)
SE: A SummarizedExperiment object containing gene expression data.
genes: A character vector of gene symbols.
organism: A character string specifying the KEGG organism code. Default is “hsa” (human).
pvalue_cutoff: Numeric value for p-value cutoff. Default is 0.05.
qvalue_cutoff: Numeric value for q-value cutoff. Default is 0.2.
min_gene: Numeric value. Minimum number of genes in a KEGG pathway. Default is 10.
max_gene: Numeric value. Maximum number of genes in a KEGG pathway. Default is 500.
Visualize enrichment analysis results.
This function visualizes the results of enrichment analysis (GO/KEGG/GSEA) using various plot types including bar plots, dot plots, and network plots.
SE_enrichplot(enrich_result, plot_type = "bar", n_terms = 10, pvalue_cutoff = 0.05)
enrich_result: An enrichment analysis result object from clusterProfiler or fgsea.
plot_type: A character string specifying the plot type. Options include “bar”, “dot”, “network”, “cnet”. Default is “bar”.
n_terms: Numeric value. Number of top terms to display. Default is 10.
pvalue_cutoff: Numeric value for p-value cutoff. Default is 0.05.
Perform hierarchical clustering analysis.
This function performs hierarchical clustering on a SummarizedExperiment object and visualizes the results as a dendrogram. It can cluster either samples or features.
SE_hierarchical(SE, assayname = "TPM", cluster_by = "samples", method = "complete", distance = "euclidean")
SE: A SummarizedExperiment object containing gene expression data.
assayname: A string indicating which assay to use. The default value is “TPM”.
cluster_by: A character string specifying whether to cluster samples or features. Options include “samples”, “features”. Default is “samples”.
method: A character string specifying the clustering method. Options include “complete”, “single”, “average”, “ward.D”, “ward.D2”. Default is “complete”.
distance: A character string specifying the distance metric. Options include “euclidean”, “maximum”, “manhattan”, “canberra”, “binary”, “minkowski”. Default is “euclidean”.
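The `method` and `distance` options listed above match those of `hclust()` and `dist()` in base R, so the underlying computation can be sketched as follows. Whether SEtoolbox calls these functions directly is an assumption; the simulated matrix stands in for the selected assay.

```r
set.seed(1)
# Toy matrix: 20 features x 6 samples; samples 4-6 are shifted to form a second group
expr <- matrix(rnorm(20 * 6), nrow = 20,
               dimnames = list(paste0("gene", 1:20), paste0("s", 1:6)))
expr[, 4:6] <- expr[, 4:6] + 10

# cluster_by = "samples": dist() compares rows, so transpose to put samples in rows
d  <- dist(t(expr), method = "euclidean")
hc <- hclust(d, method = "complete")

# plot(hc) would draw the dendrogram; cutree() extracts flat cluster labels
groups <- cutree(hc, k = 2)
groups
```

For `cluster_by = "features"` the transpose would be dropped, since features are already the rows of the assay matrix.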
Perform k-means clustering analysis.
This function performs k-means clustering on a SummarizedExperiment object and adds cluster assignments to the rowData or colData.
SE_kmeans(SE, assayname = "TPM", n_clusters = 3, cluster_by = "samples", nstart = 10)
SE: A SummarizedExperiment object containing gene expression data.
assayname: A string indicating which assay to use. The default value is “TPM”.
n_clusters: Numeric value. Number of clusters to form. Default is 3.
cluster_by: A character string specifying whether to cluster samples or features. Options include “samples”, “features”. Default is “samples”.
nstart: Numeric value. Number of random initializations. Default is 10.
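A base R sketch of the same idea, using `kmeans()` from the stats package on simulated data. The data frame `sample_info` is a hypothetical stand-in for the colData that would receive the cluster column; the column name SEtoolbox actually writes is not stated here.

```r
set.seed(1)
# Toy matrix: 20 features x 9 samples forming three shifted groups
expr <- matrix(rnorm(20 * 9), nrow = 20,
               dimnames = list(paste0("gene", 1:20), paste0("s", 1:9)))
expr[, 4:6] <- expr[, 4:6] + 10
expr[, 7:9] <- expr[, 7:9] - 10

# cluster_by = "samples": kmeans() clusters rows, so transpose
# nstart = 10 reruns the algorithm from 10 random starts and keeps the best fit
km <- kmeans(t(expr), centers = 3, nstart = 10)

# Cluster labels stored alongside the sample annotation
sample_info <- data.frame(sample = colnames(expr), cluster = km$cluster)
sample_info
```

Because k-means starts from random centers, a larger `nstart` lowers the chance of ending in a poor local optimum, at the cost of extra run time.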
Visualize clustering results.
This function visualizes clustering results using dimensionality reduction techniques such as PCA, t-SNE, or UMAP, with points colored by cluster assignment.
SE_clusterplot(SE, assayname = "TPM", cluster_colname, dim_reduction = "PCA", group_colname = NULL)
SE: A SummarizedExperiment object containing gene expression data.
assayname: A string indicating which assay to use. The default value is “TPM”.
cluster_colname: A string representing the column name in rowData or colData that contains cluster assignments.
dim_reduction: A character string specifying the dimensionality reduction method. Options include “PCA”, “tSNE”, “UMAP”. Default is “PCA”.
group_colname: A string representing the column name in colData that contains group information. Default is NULL.
Perform t-distributed Stochastic Neighbor Embedding (t-SNE) analysis.
This function performs t-SNE dimensionality reduction on a SummarizedExperiment object and visualizes the results.
SE_tSNE(SE, assayname = "TPM", group_colname = "group", perplexity = 30, theta = 0.5, max_iter = 1000)
SE: A SummarizedExperiment object containing gene expression data.
assayname: A string indicating which assay to use. The default value is “TPM”.
group_colname: A string representing the column name in colData that contains group information. Default is “group”.
perplexity: Numeric value. Perplexity parameter for t-SNE. Default is 30.
theta: Numeric value. Speed/accuracy trade-off parameter for t-SNE. Default is 0.5.
max_iter: Numeric value. Maximum number of iterations for t-SNE. Default is 1000.
Perform Uniform Manifold Approximation and Projection (UMAP) analysis.
This function performs UMAP dimensionality reduction on a SummarizedExperiment object and visualizes the results.
SE_UMAP(SE, assayname = "TPM", group_colname = "group", n_neighbors = 15, min_dist = 0.1, n_components = 2)
SE: A SummarizedExperiment object containing gene expression data.
assayname: A string indicating which assay to use. The default value is “TPM”.
group_colname: A string representing the column name in colData that contains group information. Default is “group”.
n_neighbors: Numeric value. Number of neighbors for UMAP. Default is 15.
min_dist: Numeric value. Minimum distance for UMAP. Default is 0.1.
n_components: Numeric value. Number of components for UMAP. Default is 2.
Perform Multidimensional Scaling (MDS) analysis.
This function performs MDS dimensionality reduction on a SummarizedExperiment object and visualizes the results.
SE_MDS(SE, assayname = "TPM", group_colname = "group", distance = "euclidean")
SE: A SummarizedExperiment object containing gene expression data.
assayname: A string indicating which assay to use. The default value is “TPM”.
group_colname: A string representing the column name in colData that contains group information. Default is “group”.
distance: A character string specifying the distance metric. Options include “euclidean”, “maximum”, “manhattan”, “canberra”, “binary”, “minkowski”. Default is “euclidean”.
Generate heatmap for gene expression data.
This function generates a heatmap for gene expression data in a SummarizedExperiment object, with optional clustering and annotation.
SE_heatmap(SE, assayname = "TPM", features = NULL, group_colname = "group", scale = "row", clustering_distance_rows = "euclidean", clustering_distance_cols = "euclidean", clustering_method = "complete")
SE: A SummarizedExperiment object containing gene expression data.
assayname: A string indicating which assay to use. The default value is “TPM”.
features: A character vector of feature names to include in the heatmap. If NULL, all features are included. Default is NULL.
group_colname: A string representing the column name in colData that contains group information. Default is “group”.
scale: A character string specifying the scaling method. Options include “row”, “column”, “none”. Default is “row”.
clustering_distance_rows: A character string specifying the distance metric for row clustering. Default is “euclidean”.
clustering_distance_cols: A character string specifying the distance metric for column clustering. Default is “euclidean”.
clustering_method: A character string specifying the clustering method. Default is “complete”.
Calculate correlation between features or samples.
This function calculates correlation coefficients between features or samples in a SummarizedExperiment object and visualizes the results as a heatmap.
SE_correlation(SE, assayname = "TPM", correlation_type = "pearson", correlate_by = "features", features = NULL)
SE: A SummarizedExperiment object containing gene expression data.
assayname: A string indicating which assay to use. The default value is “TPM”.
correlation_type: A character string specifying the correlation method. Options include “pearson”, “spearman”, “kendall”. Default is “pearson”.
correlate_by: A character string specifying whether to correlate features or samples. Options include “features”, “samples”. Default is “features”.
features: A character vector of feature names to include. If NULL, all features are included. Default is NULL.
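The difference between `correlate_by = "features"` and `correlate_by = "samples"` comes down to the orientation of the matrix passed to `cor()`. The sketch below uses base R on simulated data and leaves out the heatmap; it is an illustration, not the package's code.

```r
set.seed(7)
# Toy matrix standing in for the "TPM" assay: 5 features x 10 samples
expr <- matrix(rnorm(5 * 10), nrow = 5,
               dimnames = list(paste0("gene", 1:5), paste0("s", 1:10)))

# correlate_by = "features": cor() correlates columns, so transpose first
cor_features <- cor(t(expr), method = "pearson")   # 5 x 5 matrix

# correlate_by = "samples": samples are already the columns
cor_samples <- cor(expr, method = "spearman")      # 10 x 10 matrix
```

With all features included, the feature-by-feature matrix grows quadratically with the number of features, which is one reason to pass a subset through `features`.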
Perform Weighted Gene Co-expression Network Analysis (WGCNA).
This function performs WGCNA on a SummarizedExperiment object to identify co-expression modules and their relationships with sample traits.
SE_WGCNA(SE, assayname = "TPM", power = 6, minModuleSize = 30, mergeCutHeight = 0.25, traitData = NULL)
SE: A SummarizedExperiment object containing gene expression data.
assayname: A string indicating which assay to use. The default value is “TPM”.
power: Numeric value. Soft-thresholding power for network construction. Default is 6.
minModuleSize: Numeric value. Minimum module size. Default is 30.
mergeCutHeight: Numeric value. Cut height for merging modules. Default is 0.25.
traitData: A data frame containing sample traits. If NULL, colData(SE) is used. Default is NULL.
Visualize gene co-expression networks.
This function visualizes gene co-expression networks or protein-protein interaction networks using igraph and ggraph.
SE_networkplot(network, layout = "fr", n_nodes = 100, n_edges = 500, node_size = 5, node_color = "blue", edge_color = "gray")
network: A network object (igraph or adjacency matrix).
layout: A character string specifying the layout algorithm. Options include “fr”, “kk”, “circle”, “grid”. Default is “fr”.
n_nodes: Numeric value. Number of top nodes to include. Default is 100.
n_edges: Numeric value. Number of top edges to include. Default is 500.
node_size: Numeric value. Node size. Default is 5.
node_color: Character string. Node color. Default is “blue”.
edge_color: Character string. Edge color. Default is “gray”.
Perform random forest classification or regression.
This function performs random forest analysis on a SummarizedExperiment object for classification or regression tasks.
SE_randomforest(SE, assayname = "TPM", group_colname = "group", n_trees = 500, mtry = NULL, importance = TRUE, ntree = 500)
SE: A SummarizedExperiment object containing gene expression data.
assayname: A string indicating which assay to use. The default value is “TPM”.
group_colname: A string representing the column name in colData that contains the response variable. Default is “group”.
n_trees: Numeric value. Number of trees in the forest. Default is 500.
mtry: Numeric value. Number of variables randomly sampled as candidates at each split. If NULL, default value is used. Default is NULL.
importance: Logical value. Whether to calculate variable importance. Default is TRUE.
ntree: Numeric value. Number of trees in the forest (alternative parameter name). Default is 500.
Manage metadata for SummarizedExperiment object.
This function manages metadata for a SummarizedExperiment object, including adding, removing, and updating metadata entries.
SE_metadata(SE, metadata = NULL, action = "add")
SE: A SummarizedExperiment object containing gene expression data.
metadata: A named list of metadata entries. Default is NULL.
action: A character string specifying the action to perform. Options include “add”, “remove”, “update”, “get”. Default is “add”.
Rename features or samples in SummarizedExperiment object.
This function renames features or samples in a SummarizedExperiment object based on a mapping provided by the user.
SE_rename(SE, mapping, rename_by = "features")
SE: A SummarizedExperiment object containing gene expression data.
mapping: A named vector or data frame where names are old identifiers and values are new identifiers.
rename_by: A character string specifying whether to rename features or samples. Options include “features”, “samples”. Default is “features”.
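The named-vector form of `mapping` can be illustrated in base R. The identifiers below are made up, and leaving unmapped identifiers unchanged is an assumption about the function's behavior.

```r
# Named vector: names are old identifiers, values are new identifiers
mapping <- c(ENSG0001 = "GeneA", ENSG0003 = "GeneC")

# Stand-in for the current feature names of the object
old_ids <- c("ENSG0001", "ENSG0002", "ENSG0003")

# Assumption: identifiers missing from the mapping are left unchanged
new_ids <- ifelse(old_ids %in% names(mapping), mapping[old_ids], old_ids)
new_ids <- unname(new_ids)
new_ids
```

On a real object the new names would then be assigned with `rownames(SE) <- new_ids` for features or `colnames(SE) <- new_ids` for samples.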
Convert SummarizedExperiment object to other formats.
This function converts a SummarizedExperiment object to other formats including ExpressionSet, DGEList, and data frames.
SE_convert(SE, assayname = "TPM", to_format = "ExpressionSet")
SE: A SummarizedExperiment object containing gene expression data.
assayname: A string indicating which assay to use for conversion. The default value is “TPM”.
to_format: A character string specifying the format to convert to. Options include “ExpressionSet”, “DGEList”, “data.frame”, “matrix”. Default is “ExpressionSet”.
Merge multiple SummarizedExperiment objects.
This function merges multiple SummarizedExperiment objects based on common features or samples.
SE_merge(se_list, merge_by = "features", merge_type = "inner")
se_list: A list of SummarizedExperiment objects to merge.
merge_by: A character string specifying whether to merge by features or samples. Options include “features”, “samples”. Default is “features”.
merge_type: A character string specifying the merge type. Options include “inner”, “outer”, “left”, “right”. Default is “inner”.
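Assuming the merge types follow the usual database-join semantics, their effect on the feature set of two objects can be sketched with set operations. This is an interpretation of the option names, not a description of SEtoolbox internals.

```r
# Feature identifiers of two hypothetical objects
features_a <- c("gene1", "gene2", "gene3")
features_b <- c("gene2", "gene3", "gene4")

inner <- intersect(features_a, features_b)  # merge_type = "inner": shared features only
outer <- union(features_a, features_b)      # merge_type = "outer": all features
left  <- features_a                         # merge_type = "left": features of the first object
right <- features_b                         # merge_type = "right": features of the second object
```

Under this reading, any merge type other than "inner" leaves some feature/sample combinations without a measured value, which would need to be filled with NA.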
Generate volcano plot for differential expression analysis results.
This function generates a volcano plot for differential expression analysis results, showing the relationship between log2 fold change and adjusted p-value.
SE_volcano(SE, assayname = "log2", logFC_threshold = 1, pvalue_threshold = 0.05)
SE: A SummarizedExperiment object containing differential expression results in rowData.
assayname: A string indicating which assay to use. The default value is “log2”.
logFC_threshold: Numeric value for log2 fold change threshold. Default is 1.
pvalue_threshold: Numeric value for adjusted p-value threshold. Default is 0.05.
Generate MA plot for differential expression analysis results.
This function generates an MA plot for differential expression analysis results, showing the relationship between mean expression and log2 fold change.
SE_MAplot(SE, assayname = "log2", logFC_threshold = 1, pvalue_threshold = 0.05)
SE: A SummarizedExperiment object containing differential expression results in rowData.
assayname: A string indicating which assay to use. The default value is “log2”.
logFC_threshold: Numeric value for log2 fold change threshold. Default is 1.
pvalue_threshold: Numeric value for adjusted p-value threshold. Default is 0.05.
Generate density plot for gene expression data.
This function generates a density plot for gene expression data in a SummarizedExperiment object, showing the distribution of expression values.
SE_density(SE, assayname = "TPM", group_colname = "group")
SE: A SummarizedExperiment object containing gene expression data.
assayname: A string indicating which assay to use. The default value is “TPM”.
group_colname: A string representing the column name in colData that contains group information. Default is “group”.
Analyze expression trends across conditions or time points.
This function analyzes expression trends across conditions or time points in a SummarizedExperiment object.
SE_trend(SE, assayname = "TPM", group_colname = "group", feature_of_interest)
SE: A SummarizedExperiment object containing gene expression data.
assayname: A string indicating which assay to use. The default value is “TPM”.
group_colname: A string representing the column name in colData that contains group information. Default is “group”.
feature_of_interest: A character vector of feature names to analyze.
Analyze time series gene expression data.
This function analyzes time series gene expression data in a SummarizedExperiment object, including identifying temporal patterns and differentially expressed genes across time points.
SE_timeseries(SE, assayname = "TPM", time_colname = "time", group_colname = "group")
SE: A SummarizedExperiment object containing time series gene expression data.
assayname: A string indicating which assay to use. The default value is “TPM”.
time_colname: A string representing the column name in colData that contains time point information. Default is “time”.
group_colname: A string representing the column name in colData that contains group information. Default is “group”.
Transform expression data in SummarizedExperiment object.
This function transforms expression data in a SummarizedExperiment object using various transformation methods including log transformation, z-score normalization, and quantile transformation.
SE_transform(SE, assayname = "TPM", method = "log2", pseudocount = 1)
SE: A SummarizedExperiment object containing gene expression data.
assayname: A string indicating which assay to transform. The default value is “TPM”.
method: A character string specifying the transformation method. Options include “log2”, “log10”, “zscore”, “quantile”. Default is “log2”.
pseudocount: Numeric value to add before log transformation to avoid log(0). Default is 1.
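The "log2" and "zscore" methods can be sketched on a small matrix. Applying the z-score per feature (row) is an assumption made for this example; the package may scale in a different direction.

```r
# Toy matrix standing in for the "TPM" assay: 2 features x 3 samples
tpm <- matrix(c(0, 1, 3,
                7, 15, 31),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("gene1", "gene2"), c("s1", "s2", "s3")))

# method = "log2": the pseudocount keeps zeros finite, since log2(0 + 1) = 0
pseudocount <- 1
log2_tpm <- log2(tpm + pseudocount)

# method = "zscore", assumed here to be applied per feature:
# scale() standardizes columns, so transpose before and after
z_tpm <- t(scale(t(log2_tpm)))
```

With a pseudocount of 1, gene1 becomes 0, 1, 2 and gene2 becomes 3, 4, 5 on the log2 scale, and both rows share the z-scores -1, 0, 1.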
Detect batch effects in SummarizedExperiment object.
This function detects batch effects in a SummarizedExperiment object using principal component analysis (PCA) and correlation analysis.
SE_batchdetect(SE, assayname = "TPM", batch_colname = "batch")
SE: A SummarizedExperiment object containing gene expression data.
assayname: A string indicating which assay to use. The default value is “TPM”.
batch_colname: A string representing the column name in colData that contains batch information. Default is “batch”.
Calculate silhouette width for clustering evaluation.
This function calculates silhouette width for clustering results to evaluate cluster quality.
SE_silhouette(SE, assayname = "TPM", cluster_colname, distance = "euclidean")
SE: A SummarizedExperiment object containing gene expression data.
assayname: A string indicating which assay to use. The default value is “TPM”.
cluster_colname: A string representing the column name in rowData or colData that contains cluster assignments.
distance: A character string specifying the distance metric. Options include “euclidean”, “maximum”, “manhattan”, “canberra”, “binary”, “minkowski”. Default is “euclidean”.
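For reference, the silhouette width of a point is (b - a) / max(a, b), where a is its mean distance to the other members of its own cluster and b is its mean distance to the nearest other cluster. The base R sketch below implements that definition directly; `silhouette_width` is a hypothetical helper, not the function SEtoolbox uses.

```r
# Silhouette width per observation, given a distance object and cluster labels
silhouette_width <- function(d, cluster) {
  stopifnot(length(unique(cluster)) >= 2)
  d <- as.matrix(d)
  sapply(seq_along(cluster), function(i) {
    own <- cluster == cluster[i]
    own[i] <- FALSE
    if (!any(own)) return(0)  # singleton clusters get width 0 by convention
    a <- mean(d[i, own])      # mean distance within the own cluster
    others <- setdiff(unique(cluster), cluster[i])
    # mean distance to the nearest other cluster
    b <- min(sapply(others, function(k) mean(d[i, cluster == k])))
    (b - a) / max(a, b)
  })
}

# Two tight, well-separated pairs of points
points  <- matrix(c(0, 0,  0, 1,  10, 0,  10, 1), ncol = 2, byrow = TRUE)
cluster <- c(1, 1, 2, 2)

sil <- silhouette_width(dist(points, method = "euclidean"), cluster)
mean(sil)
```

Values near 1 indicate compact, well-separated clusters; values near 0 or below suggest overlapping clusters or misassigned points.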
Perform feature selection for machine learning.
This function performs feature selection for machine learning tasks using various methods including variance threshold, recursive feature elimination, and LASSO.
SE_featureselection(SE, assayname = "TPM", group_colname = "group", method = "variance", n_features = 100)
SE: A SummarizedExperiment object containing gene expression data.
assayname: A string indicating which assay to use. The default value is “TPM”.
group_colname: A string representing the column name in colData that contains the response variable. Default is “group”.
method: A character string specifying the feature selection method. Options include “variance”, “rfe”, “lasso”. Default is “variance”.
n_features: Numeric value. Number of features to select. Default is 100.
Perform circular permutation test for enrichment analysis.
This function performs circular permutation test for enrichment analysis, which is useful for testing enrichment of gene sets in ranked lists.
SE_circTest(ranked_list, gene_set, n_perm = 1000)
ranked_list: A named numeric vector of ranked values.
gene_set: A character vector of gene symbols.
n_perm: Numeric value. Number of permutations to perform. Default is 1000.
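One common way to build such a test is sketched below, under stated assumptions: the statistic is the mean ranked value of the gene set, and the null distribution comes from rotating the set's positions around the ranked list by random offsets, which preserves the spacing between set members. The helper `circ_perm_test()` is hypothetical and may differ from what SE_circTest actually computes.

```r
# Hypothetical helper illustrating a one-sided circular permutation test.
circ_perm_test <- function(ranked_list, gene_set, n_perm = 1000) {
  in_set <- names(ranked_list) %in% gene_set
  stopifnot(any(in_set), length(ranked_list) >= 2)
  n <- length(ranked_list)
  positions <- which(in_set)

  # Observed statistic: mean ranked value of the gene set.
  observed <- mean(ranked_list[positions])

  # Null distribution: shift all set positions by the same random offset,
  # wrapping around the end of the list.
  shifts <- sample.int(n - 1, n_perm, replace = TRUE)
  null <- vapply(shifts, function(k) {
    idx <- ((positions - 1 + k) %% n) + 1
    mean(ranked_list[idx])
  }, numeric(1))

  # One-sided p-value with the usual +1 correction.
  p_value <- (sum(null >= observed) + 1) / (n_perm + 1)
  list(observed = observed, p_value = p_value)
}

set.seed(1)
ranked_list <- setNames(seq(5, -5, length.out = 100), paste0("g", 1:100))
gene_set <- paste0("g", 1:10)  # the ten top-ranked genes
res <- circ_perm_test(ranked_list, gene_set, n_perm = 1000)
```

Because the toy gene set occupies the top ten positions of a strictly decreasing list, no rotation can match the observed mean, so the p-value reaches its minimum of 1 / (n_perm + 1).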
Perform COCONUT analysis for functional annotation.
This function performs COCONUT (COmprehensive COnsortium Network Utility Tool) analysis for functional annotation of genes.
SE_COCONUT(genes, organism = "human", database = "GO")
genes: A character vector of gene symbols.
organism: A character string specifying the organism. Default is “human”.
database: A character string specifying the database to use. Options include “GO”, “KEGG”, “Reactome”. Default is “GO”.
Load example SummarizedExperiment object.
This function loads an example SummarizedExperiment object for testing and demonstration purposes.
loadSE()
This function takes no arguments.
Load list of example SummarizedExperiment objects.
This function loads a list of example SummarizedExperiment objects for testing and demonstration purposes.
loadSElist()
This function takes no arguments.
The author would like to thank DeepSeek.
> sessionInfo()
R version 4.2.0 (2022-04-22)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS/LAPACK: /home/shaoxun/anaconda3/envs/yuanshaoxun/lib/libopenblasp-r0.3.21.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] SEtoolbox_0.1.0
loaded via a namespace (and not attached):
[1] Rcpp_1.0.14 pillar_1.10.1
[3] bslib_0.9.0 compiler_4.2.0
[5] later_1.4.1 jquerylib_0.1.4
[7] GenomeInfoDb_1.34.9 XVector_0.38.0
[9] MatrixGenerics_1.10.0 bitops_1.0-9
[11] tools_4.2.0 zlibbioc_1.44.0
[13] digest_0.6.37 tibble_3.2.1
[15] gtable_0.3.6 lattice_0.22-6
[17] jsonlite_1.9.1 memoise_2.0.1
[19] lifecycle_1.0.4 pkgconfig_2.0.3
[21] rlang_1.1.5 Matrix_1.6-5
[23] DelayedArray_0.24.0 shiny_1.10.0
[25] cli_3.6.4.9000 fastmap_1.2.0
[27] GenomeInfoDbData_1.2.9 dplyr_1.1.4
[29] generics_0.1.3 vctrs_0.6.5
[31] S4Vectors_0.36.2 sass_0.4.9
[33] IRanges_2.32.0 tidyselect_1.2.1
[35] grid_4.2.0 stats4_4.2.0
[37] glue_1.8.0 Biobase_2.58.0
[39] R6_2.6.1 ggplot2_3.5.1
[41] magrittr_2.0.3 scales_1.3.0
[43] promises_1.3.2 matrixStats_1.5.0
[45] htmltools_0.5.8.1 BiocGenerics_0.44.0
[47] GenomicRanges_1.50.2 SummarizedExperiment_1.28.0
[49] colorspace_2.1-1 mime_0.12
[51] xtable_1.8-4 httpuv_1.6.15
[53] munsell_0.5.1 RCurl_1.98-1.16
[55] cachem_1.1.0