New Tutorial on Resampling Methods

See the new tutorial on resampling methods, which are an indispensable tool in modern statistics. They involve repeatedly drawing samples from a training set and refitting a model of interest on each sample in order to obtain additional information about the fitted model. For example, in order to estimate the variability of a linear regression fit, we can repeatedly draw different samples from the training data, fit a linear regression to each new sample, and then examine the extent to which the resulting fits differ. Such an approach may allow us to obtain information that would not be available from fitting the model only once using the original training sample.

New Tutorials on Clustering

Clustering is a broad set of techniques for finding subgroups of observations within a data set. When we cluster observations, we want observations in the same group to be similar and observations in different groups to be dissimilar. Clustering allows us to identify which observations (i.e. customers, students, states) are alike, and potentially categorize them therein. Check out our new tutorials covering k-means and hierarchical clustering.

New Tutorial on Principal Components Analysis

Principal components analysis (PCA) reduces the dimensionality of our data, allowing most of the variability to be explained using fewer variables than the original data set. This allows us to understand the primary features that can best represent our data. Check out the latest tutorial which covers PCA

New Tutorial on Logistic Regression

Logistic regression is a foundational analytic technique for classification problems. It allows us to estimate the probability of a categorical response based on one or more predictor variables and tells us if the presence of a predictor increases (or decreases) the probability of a given outcome by a specific percentage. Check out the recently added tutorial on logistic regression.

New Tutorial on Linear Regression

Linear regression is a useful and widely used statistical learning method and serves as a good jumping-off point for newer predictive analytic approaches. Check out the newly added linear regression tutorial that covers the basics of this power analytic technique.