An R Introduction to Statistics

Hierarchical Linear Model

  Linear regression is probably the most familiar technique in data analysis, but its application is often hamstrung by model assumptions. For instance, if the data has a hierarchical structure, the assumptions of linear regression often hold only at local levels. We will investigate an extension of the linear model to bi-level hierarchies.

Bayesian Classification with Gaussian Process

  Despite the prowess of the support vector machine, it is not specifically designed to extract features relevant to the prediction. For example, in network intrusion detection, we need to learn which network statistics are relevant to network defense. In consumer credit rating, we would like to determine which financial records are relevant to the credit score. In medical genetics research, we aim to identify the genes relevant to an illness.

Bayesian Inference Using OpenBUGS

  In our previous statistics tutorials, we have treated population parameters as fixed values, and provided point estimates and confidence intervals for them. An alternative approach is Bayesian statistics, which treats population parameters as random variables. Probability becomes a measure of our belief in possible outcomes. With new tools like OpenBUGS, tackling new problems requires building new models, instead of creating yet another R command.

Significance Test for Kendall's Tau-b

  A variation of the standard definition of the Kendall correlation coefficient is necessary in order to deal with data samples with tied ranks. It is known as the Kendall’s tau-b coefficient and is more effective in determining whether two non-parametric data samples with ties are correlated.
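
  As a minimal sketch of the idea, the code below computes tau-b directly from its usual definition and compares it with the estimate reported by cor.test. The helper tau_b is a hypothetical name introduced only for illustration, and the choice of the cyl and gear columns of the built-in mtcars data set (both heavily tied) is an assumption, not the data used in the tutorial. To our understanding, R's Kendall estimate already applies the tie correction, so the two values should agree.

```r
# Illustrative helper (not part of base R): Kendall's tau-b from its definition.
tau_b <- function(x, y) {
  n  <- length(x)
  dx <- sign(outer(x, x, "-"))          # sign of every pairwise difference in x
  dy <- sign(outer(y, y, "-"))          # sign of every pairwise difference in y
  s  <- sum(dx * dy) / 2                # concordant minus discordant pairs
  n0 <- n * (n - 1) / 2                 # total number of pairs
  tx <- as.vector(table(x))             # sizes of tied groups in x
  ty <- as.vector(table(y))             # sizes of tied groups in y
  n1 <- sum(tx * (tx - 1) / 2)          # pairs tied in x
  n2 <- sum(ty * (ty - 1) / 2)          # pairs tied in y
  s / sqrt((n0 - n1) * (n0 - n2))
}

x <- mtcars$cyl                         # assumed example columns with many ties
y <- mtcars$gear

tau_manual <- tau_b(x, y)

# exact = FALSE requests the normal approximation, since exact p-values
# are not available when ties are present.
res <- cor.test(x, y, method = "kendall", exact = FALSE)

print(tau_manual)
print(res$estimate)                     # Kendall estimate reported by R
print(res$p.value)                      # p-value for H0: no association
```

  A small p-value would lead us to reject the null hypothesis that the two variables are uncorrelated.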

Support Vector Machine with GPU, Part II

  In our last tutorial on SVM training with GPU, we mentioned the need to pre-scale the data with rpusvm-scale and to reverse the scaling of the prediction outcome. This cumbersome procedure is now simplified with the latest RPUSVM.

Hierarchical Cluster Analysis

  With the distance matrix found in the previous tutorial, we can use various techniques of cluster analysis for relationship discovery. For example, in the data set mtcars, we can pass the distance matrix to hclust, and plot a dendrogram that displays a hierarchical relationship among the vehicles.
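
  The steps described above can be sketched in a few lines of base R. This is only an outline under stated assumptions: it uses the default Euclidean distance of dist and the default complete linkage of hclust, which may differ from the settings chosen in the tutorial itself.

```r
# Distance matrix between vehicles (Euclidean by default; an assumption here).
d <- dist(as.matrix(mtcars))

# Agglomerative hierarchical clustering (complete linkage by default).
hc <- hclust(d)

# Dendrogram showing the hierarchical relationship among the vehicles.
plot(hc)
```

  Vehicles that merge lower in the dendrogram are closer to each other under the chosen distance.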