Supplemental Data to Publications:

A biopsy sample reduction approach to identify significant alterations of the testicular transcriptome in the presence of Y-chromosomal microdeletions that are independent of germ cell composition.
Heike Cappallo-Obermann, Kathrein von Kopylow, Wolfgang Schulze and Andrej-Nikolai Spiess
Accepted in Human Genetics.

Supplemental Data 1. Excel worksheet containing the processed and normalized microarray data for all samples analyzed in this paper. It also includes the statistical analysis and the filtering of genes having ‘deletion effects’ or ‘germ cell effects’, sorted by p-values. A detailed description of the different sheets within the file is found under ‘Description’.
Download


An evaluation of R2 as an inadequate measure for nonlinear models in pharmacological and biochemical research.
Spiess AN & Neumeyer N
BMC Pharmacol (2010), 10:6.


The following scripts were used for the analysis of R-square vs. AIC (or Akaike weights):

R-square vs. AIC and Akaike weights for 9 different sigmoidal models. The five-parameter log-logistic model is the 'true' model.
Download

Effect of different Gaussian noise set-ups on AIC and three different definitions of R-square (see Supplemental Data 5 to the manuscript).
Download

Effect of increasing y-magnitude and 1% Gaussian noise on AIC and three different definitions of R-square (see Supplemental Data 5 to the manuscript).
Download
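
The quantities compared by these scripts can be sketched in a few lines. The following Python snippet is not one of the original R scripts; it is a minimal illustration, assuming least-squares fits with Gaussian errors, of how R-square, AIC (up to an additive constant that cancels when models are compared on the same data) and Akaike weights are obtained from observed and fitted values. All function names and example numbers are hypothetical.

```python
import math

def residual_sum_of_squares(y, fitted):
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

def r_squared(y, fitted):
    """Classical definition: 1 - RSS / TSS."""
    mean_y = sum(y) / len(y)
    tss = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - residual_sum_of_squares(y, fitted) / tss

def aic_least_squares(y, fitted, n_params):
    """AIC for a least-squares fit, omitting the constant term.
    k counts the model parameters plus one for the error variance."""
    n = len(y)
    k = n_params + 1
    return n * math.log(residual_sum_of_squares(y, fitted) / n) + 2 * k

def akaike_weights(aics):
    """Relative support for each model, normalized to sum to 1."""
    best = min(aics)
    rel = [math.exp(-0.5 * (a - best)) for a in aics]
    total = sum(rel)
    return [r / total for r in rel]

# Made-up observations and fitted values from two hypothetical models.
y = [1.0, 2.1, 2.9, 4.2, 5.0]
fit_a = [1.1, 2.0, 3.0, 4.0, 5.1]   # assumed 2-parameter model
fit_b = [1.0, 2.1, 3.0, 4.1, 5.0]   # assumed 4-parameter model

aics = [aic_least_squares(y, fit_a, 2), aic_least_squares(y, fit_b, 4)]
weights = akaike_weights(aics)
print(r_squared(y, fit_a), r_squared(y, fit_b), weights)
```

Unlike R-square, the AIC penalizes each additional parameter, so a model with more parameters is favoured only if its fit improves enough to outweigh the penalty.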


Highly accurate sigmoidal fitting of real-time PCR data by introducing a parameter for asymmetry.
Spiess AN, Feig C and Ritz C
BMC Bioinformatics (2008), 9: 221


Figure 6. Assessment of quantitative real-time PCR efficiencies from the replicates of four independent dilution datasets.
Four independent datasets (in rows) were analyzed with respect to the PCR efficiencies obtained by five different methods, as follows: 4-par: a four-parameter log-logistic model; 5-par: a five-parameter log-logistic model; exp: an exponential model after outlier cycle detection [2]; w-o-l: the window-of-linearity method as described in [1]; calib: a calibration curve obtained from linear regression of all dilution cycles. The boxplots depict the statistical features of the replicates within each dilution step. For methods 1-4, efficiencies were calculated per curve, while for method 5 a single efficiency estimate was obtained from all dilution steps. Estimated efficiencies larger than 2.1 are indicated in the graphs.
Download
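
For the calibration-curve method (calib), a single efficiency is commonly derived from the slope of threshold cycle against log10 of the template amount, E = 10^(-1/slope), where E = 2 corresponds to perfect doubling per cycle. The Python sketch below illustrates this standard relationship under that assumption; it is not the code used for the figure, and the function name and dilution values are made up.

```python
import math

def calibration_efficiency(log10_amounts, cts):
    """Efficiency from an ordinary least-squares line of Ct vs. log10(amount)."""
    n = len(cts)
    mean_x = sum(log10_amounts) / n
    mean_y = sum(cts) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log10_amounts, cts))
    sxx = sum((x - mean_x) ** 2 for x in log10_amounts)
    slope = sxy / sxx
    return 10.0 ** (-1.0 / slope)

# Idealized 10-fold dilution series: each 10-fold step costs 1/log10(2) cycles.
log10_amounts = [0.0, 1.0, 2.0, 3.0, 4.0]
ideal_slope = -1.0 / math.log10(2.0)
cts = [35.0 + ideal_slope * x for x in log10_amounts]

efficiency = calibration_efficiency(log10_amounts, cts)
print(efficiency)
```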

Table 1. Summary for accuracy and precision of dilution ratio quantitation obtained from four independent dilution datasets.

Seven commonly used quantification methods in conjunction with threshold cycles estimated from four- and five-parameter sigmoidal models were applied for the analysis. Four different datasets differing in the number of replicates, chemistry and platform were analyzed with respect to accuracy (average percentage of calculated ratios from real ratios) and precision (average c.v.; numbers in brackets). Threshold cycles estimated from the second-derivative maximum of four- and five-parameter sigmoidal models (4-par, 5-par) were used in combination with the following methods: sigmoidal model with Δct method (sigm/Δct), sigmoidal model with initial fluorescence (sigm/F0), exponential model with Δct method (exp/Δct), exponential model with initial fluorescence (exp/F0), window-of-linearity method with Δct method (w-o-l/Δct), window-of-linearity method with initial fluorescence (w-o-l/F0) and calibration curve with Δct method (calib/Δct). Numbers in bold are combinations in which the five-parameter model performs better. N.V.: no realistic estimation values due to high imprecision.
Download
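
The accuracy and precision measures named in the legend can be illustrated as follows. This Python sketch assumes the usual Δct formula, ratio = E^(ct_control - ct_sample), and reads "accuracy" as the mean calculated ratio expressed as a percentage of the known ratio; both the interpretation and all names and numbers are assumptions for illustration, not the authors' implementation.

```python
import statistics

def delta_ct_ratio(ct_control, ct_sample, efficiency=2.0):
    """Ratio of sample to control template from threshold cycles."""
    return efficiency ** (ct_control - ct_sample)

def accuracy_percent(calculated_ratios, true_ratio):
    """Mean calculated ratio as a percentage of the known ratio."""
    return 100.0 * statistics.mean(calculated_ratios) / true_ratio

def cv_percent(values):
    """Coefficient of variation (sample standard deviation / mean) in percent."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)

# Made-up replicate threshold cycles for a known 10-fold dilution step.
ct_undiluted = [20.1, 20.0, 19.9]
ct_diluted = [23.4, 23.3, 23.5]
ratios = [delta_ct_ratio(d, u) for u, d in zip(ct_undiluted, ct_diluted)]

print(accuracy_percent(ratios, true_ratio=10.0), cv_percent(ratios))
```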

Supplemental Data 1. Statistical summary of four independent dilution datasets and seven commonly used quantification methods in conjunction with threshold cycles estimated from four- and five-parameter sigmoidal models.
Four different dilution datasets differing in the number of replicates, chemistry and platform were analyzed with respect to efficiency, measures for the goodness of fit (RMSE, AIC, R-squared; see Materials & Methods), threshold cycles estimated from four- and five-parameter sigmoidal models, and calculated ratios obtained either by Δct methods or by estimation of the initial template fluorescence, where applicable. In total, seven different methods as described in the file under ‘Details’ were used for the statistical comparison (see also Legend to Table 1). Where the five-parameter models performed better than the four-parameter models, the statistical values are highlighted in yellow.
Download
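
A threshold cycle taken at the second-derivative maximum of a fitted sigmoid can be located numerically. The Python sketch below uses a plain four-parameter logistic in the cycle number as a stand-in (an assumption; it is not the log-logistic parameterization of the paper) and finds the maximum of a finite-difference second derivative on a fine grid. For this symmetric curve the result can be checked against the closed form midpoint - scale * ln(2 + sqrt(3)). Parameter names and values are hypothetical.

```python
import math

def logistic4(x, bottom, top, midpoint, scale):
    """Four-parameter logistic rising from bottom to top."""
    return bottom + (top - bottom) / (1.0 + math.exp(-(x - midpoint) / scale))

def second_derivative_maximum(curve, start=1.0, stop=40.0, step=0.01):
    """Grid search for the cycle at which the second derivative peaks."""
    best_x, best_d2 = start, float("-inf")
    n_steps = int(round((stop - start) / step))
    for i in range(n_steps + 1):
        x = start + i * step
        # Central finite-difference approximation of the second derivative.
        d2 = (curve(x + step) - 2.0 * curve(x) + curve(x - step)) / step ** 2
        if d2 > best_d2:
            best_x, best_d2 = x, d2
    return best_x

def example_curve(x):
    return logistic4(x, bottom=0.1, top=10.0, midpoint=20.0, scale=1.5)

ct = second_derivative_maximum(example_curve)
expected = 20.0 - 1.5 * math.log(2.0 + math.sqrt(3.0))
print(ct, expected)
```

An asymmetric five-parameter curve has no equally simple closed form, which is one reason a numerical search of this kind is convenient.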


Cross-platform gene expression signature of human spermatogenic failure reveals inflammatory-like response.
Spiess AN, Feig C, Schulze W, Chalmel F, Cappallo-Obermann H, Primig M and Kirchhoff C
Hum Reprod (2007), 22: 2936 - 46

Supplemental Figure 1. Platform comparison of the within-group variance of the four histological subtypes analyzed in the present study. 8263 genes common to both platforms were analyzed with respect to the within-group variance of the four groups (FTS: full testicular spermatogenesis; HYS: hypospermatogenesis; GCA: germ cell arrest; SCO: Sertoli-cell only). Hexbin scatter plots of the standard deviations from the groups are displayed, with the dashed line illustrating a ratio of 1. The majority of data points lie to the right of this diagonal, showing a tendency towards smaller within-group standard deviations for the GeneChip system.
Download

Supplemental Figure 2. (A) Hierarchical clustering of 28 Codelink samples using the differential set of 551 probes. All samples were clustered using average distance and Manhattan metric. The heat map was built using all genes and heat colours from dark red (high expression) to bright yellow (low expression). Sample labels include the morphological score and sample number. Red and black dots on the dendrogram nodes are permutation-based cluster stability p-values as calculated from the pvclust package (red: p-value ≤ 0.05, black: p-value > 0.05). (B) Hierarchical clustering as in (A) but using the median expression values from each pathological group. (C) Principal component analysis of the 28 samples. Dimensional reduction was done using scaled and centered data. Samples in all diagrams are colour coded in black (FTS), green (HYS), blue (GCA) and red (SCO).
Download

Supplemental Figure 3. (A) Hierarchical clustering of 27 GeneChip samples using the differential set of 2096 probesets. All samples were clustered using average distance and Manhattan metric. The heat map was built using all genes and heat colours from dark red (high expression) to bright yellow (low expression). Sample labels include the morphological score and sample number. Red and black dots on the dendrogram nodes are permutation-based cluster stability p-values as calculated from the pvclust package (red: p-value ≤ 0.05, black: p-value > 0.05). (B) Hierarchical clustering as in (A) but using the median expression values from each pathological group. (C) Principal component analysis of the 27 samples. Dimensional reduction was done using scaled and centered data. Samples in all diagrams are colour coded in black (FTS), green (HYS), blue (GCA) and red (SCO).
Download
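
The clustering set-up named in Supplemental Figures 2 and 3 (Manhattan metric with average linkage) can be sketched in plain Python as below. This is only an illustration of the technique on made-up toy data; it does not reproduce the pvclust stability p-values or the original analysis, and the function names are hypothetical.

```python
def manhattan(a, b):
    """Manhattan (city-block) distance between two expression profiles."""
    return sum(abs(x - y) for x, y in zip(a, b))

def average_linkage(samples):
    """Agglomerative clustering; returns the merge history as
    (members_of_cluster_a, members_of_cluster_b, average_distance)."""
    clusters = [[i] for i in range(len(samples))]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average of all pairwise distances between the two clusters.
                pairwise = [manhattan(samples[a], samples[b])
                            for a in clusters[i] for b in clusters[j]]
                dist = sum(pairwise) / len(pairwise)
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        dist, i, j = best
        merges.append((sorted(clusters[i]), sorted(clusters[j]), dist))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges

# Four toy samples with two measurements each (made-up values).
samples = [[0, 0], [0, 1], [10, 10], [10, 12]]
merges = average_linkage(samples)
for step in merges:
    print(step)
```

The merge history corresponds to the node heights of a dendrogram: samples 0 and 1 join first, then 2 and 3, and the two pairs are joined last.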

Supplemental Figure 4. Additional validation of microarray data by quantitative real-time PCR (qRT-PCR) for seven selected genes. qRT-PCR was applied to the same 28 Codelink samples used for microarray hybridization (with n=12, 6, 5 and 5 for the four pathological subtypes, respectively). Box plots show fold changes (relative to FTS) for HYS (bars 1 & 2), GCA (bars 3 & 4) and SCO (bars 5 & 6). Blue: microarray result; red: qRT-PCR result. The coefficient of variation was under 5% for all samples, so error bars were omitted.
Download

Supplemental Figure 5. Statistical analysis results obtained from counting the number of mast cells in semi-thin sections of the different testicular pathologies. Twelve different tests for distribution, location, variance and scale were applied to get an overview of the statistical characteristics underlying the mast cell counts (see Supplemental Data 7). Not all data points from the different pathological subtypes followed a normal distribution, so non-parametric methods were applied (Kolmogorov-Smirnov test, Kruskal-Wallis test). Grey bars summarize mast cell counts from each patient, black bars the respective counts from the complete group. Significant differences were found for FTS/HYS (p-value = 0.0043) and FTS/SCO (p-value = 0.0013). For details see Supplemental Data 7.
Download
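
For readers unfamiliar with the Kruskal-Wallis test mentioned above, its H statistic is computed from the ranks of the pooled observations. The Python sketch below implements the textbook formula with average ranks for ties and the usual tie correction. The counts are made up and are not the mast cell data of the study; a p-value would be obtained by comparing H with a chi-square distribution with (number of groups - 1) degrees of freedom, which is omitted here.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic with tie correction."""
    pooled = sorted((value, g) for g, values in enumerate(groups)
                    for value in values)
    n_total = len(pooled)
    rank_sums = [0.0] * len(groups)
    tie_term = 0
    i = 0
    while i < n_total:
        # Find the run of tied values starting at position i.
        j = i
        while j < n_total and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the ranks i+1 .. j
        for _, g in pooled[i:j]:
            rank_sums[g] += avg_rank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    h = (12.0 / (n_total * (n_total + 1))
         * sum(r * r / len(values) for r, values in zip(rank_sums, groups))
         - 3.0 * (n_total + 1))
    return h / (1.0 - tie_term / (n_total ** 3 - n_total))

# Made-up counts for three hypothetical groups.
h_stat = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(h_stat)
```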


Supplemental Figure 6. Systemic bias reduction of all samples by distance-weighted discrimination (DWD). Based on the 8263-gene overlap of both platforms, all 69 samples from the 28-sample Codelink training set, the 14-sample Codelink test set, and the 27-sample GeneChip set were normalized, scaled and centered by eliminating the systemic bias of the two platforms by the DWD method (see Materials & Methods). Colour coding is the same as in Supplemental Figure 2. Boxplots depict the average intensities of all samples before (A) and after (B) the normalization procedure.
Download

All Supplemental Data files referred to in the publication (Supplemental Data 1-8) can be downloaded as a zip-file here.


A new paradigm for profiling testicular gene expression during normal and disturbed human spermatogenesis.
Feig C, Kirchhoff C, Ivell R, Naether O, Schulze W and Spiess AN
Mol Hum Reprod (2007), 13: 33-43


Supplemental Data 1: Filtered, log-transformed and cyclic-loess normalized gene expression data sorted by ANOVA p-values and used for the statistical analysis and interpretation in this study. This also includes the results from the statistical testing and clustering procedures. Microsoft Excel spreadsheet. Description of column labels at end of data.
Download. The unfiltered and COMPLETE dataset with all genes can be downloaded here.

Supplemental Data 2: Characteristics of the primers used for validation by qRT-PCR. Microsoft Excel spreadsheet.
Download

Supplemental Data 3: Permutation F-test (ANOVA) results from BRB-Array Tools output. HTML file.
Download

Supplemental Data 4: Tissue distribution summary of the obtained differential gene expression set. Upper panel: Histogram showing the number of genes with their highest expression in a specific tissue. Lower panel: Box-plot of the relative expression strength of the gene set throughout the various human tissues. Data for both representations were extracted and summarized from the SymAtlas database (http://symatlas.gnf.org/SymAtlas/) using the batch function.
Download

Supplemental Data 5: Power analysis, based on 2000 random datasets, for the Fisher's exact tests used in over-representation analysis, as a function of the number of observations in the test set. The analysis assumed a proportion of 1% of a given transcription factor binding site (TFBS) in the background dataset and was run with different proportions (1-16%) of the same TFBS in the test set. The dashed line at P=0.8 marks the commonly accepted minimum power.
Download
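
A simulation-based power analysis of this kind can be sketched as follows. The Python code below is a simplified illustration, not the original analysis: it assumes a fixed background of 1000 genes (a hypothetical size) with 1% carrying the TFBS, an independent test set, a one-sided Fisher's exact test at alpha = 0.05, and binomially drawn test-set hits. All names and default values are assumptions.

```python
import math
import random

def fisher_one_sided(test_hits, test_size, bg_hits, bg_size):
    """P(at least test_hits in the test set) under the hypergeometric null."""
    total = test_size + bg_size
    hits = test_hits + bg_hits
    upper = min(hits, test_size)
    numerator = sum(math.comb(hits, x) * math.comb(total - hits, test_size - x)
                    for x in range(test_hits, upper + 1))
    return numerator / math.comb(total, test_size)

def simulated_power(test_size, test_prop, bg_size=1000, bg_prop=0.01,
                    n_sim=2000, alpha=0.05, seed=1):
    """Fraction of random test sets in which enrichment is detected."""
    rng = random.Random(seed)
    bg_hits = round(bg_size * bg_prop)
    significant = 0
    for _ in range(n_sim):
        # Draw the number of TFBS-carrying genes in one random test set.
        test_hits = sum(rng.random() < test_prop for _ in range(test_size))
        if fisher_one_sided(test_hits, test_size, bg_hits, bg_size) < alpha:
            significant += 1
    return significant / n_sim

# Power for a test-set proportion of 8% at a few test-set sizes.
for n in (25, 50, 100):
    print(n, simulated_power(n, 0.08, n_sim=500))
```

Plotting such estimates against the test-set size for each assumed proportion gives power curves that can be compared with the 0.8 threshold.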

Supplemental Data 6: Over-representation analysis for GO terms, KEGG pathways and transcription factor binding sites (TFBS). The data are p-values as a function of the number of genes obtained from the Pearson correlation gene merging procedure. Numbers in red were used for Figure 5.
Download