rpusvm {rpud}    R Documentation
Description

The rpusvm method trains an SVM model. In the non-free rpudplus add-on, rpusvm is implemented in NVIDIA CUDA and requires CUDA hardware with double-precision support. The trained SVM model assumes ascending ordering of the classification labels, carries an independent sigma coefficient for probabilistic regression, and may include scaling parameters. It is therefore incompatible with the SVM model objects created by e1071. Despite this incompatibility, rpusvm supports functionality equivalent to the LIBSVM-based svm in e1071. This method is not supported in the free rpud package.
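Because the fitted objects are not interchangeable, a model trained by rpusvm must be used with the rpusvm methods throughout. A minimal sketch, assuming the rpudplus add-on is installed:

library(rpud)
model <- rpusvm(Species ~ ., data = iris)  # trained on the GPU
pred  <- predict(model, iris[, -5])        # dispatches to predict.rpusvm
# do not pass this object to e1071 routines; the model structures differ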
Usage

## S3 method for class 'formula'
rpusvm(formula, data = NULL, ..., subset, na.action = na.omit,
       scale = TRUE, verbose = TRUE)

## Default S3 method:
rpusvm(x, y = NULL, scale = TRUE, type = NULL, kernel = "radial",
       degree = 3, gamma = if (is.vector(x)) 1 else 1 / ncol(x),
       coef0 = 0, cost = 1, nu = 0.5, class.weights = NULL,
       cachesize = 100, tolerance = 0.001, epsilon = 0.1,
       shrinking = TRUE, cross = 0, probability = FALSE, fitted = TRUE,
       seed = 0, ..., subset, na.action = na.omit, verbose = TRUE)
Arguments

formula: a symbolic description of the model to be fit.

data: an optional data frame containing the variables in the model. By default the variables are taken from the environment from which 'rpusvm' is called.
x: a data matrix, a vector, or a sparse matrix (object of class matrix.csr as provided by the package SparseM).

y: a response vector with one label for each row/component of x. Can be either a factor (for classification tasks) or a numeric vector (for regression).

scale: a numeric vector or a single logical value. If it is a numeric vector, the first and second elements give the lower and upper bounds into which each attribute of the x matrix is scaled. If it is TRUE, each attribute is scaled to the range [-1, 1].
type: rpusvm can be used as a classification machine, as a regression machine, or for novelty detection. Depending on whether y is a factor or not, the default setting for type is C-classification or eps-regression, respectively, but it may be overwritten by an explicit value. Valid options are: C-classification, nu-classification, one-classification (for novelty detection), eps-regression, and nu-regression.

kernel: the kernel used in training and predicting. You might consider changing some of the following parameters, depending on the kernel type: linear (u'v), polynomial ((gamma*u'v + coef0)^degree), radial basis (exp(-gamma*|u-v|^2)), sigmoid (tanh(gamma*u'v + coef0)).

degree: parameter needed for kernel of type polynomial (default: 3)

gamma: parameter needed for all kernels except linear (default: 1 for a vector x, otherwise 1/ncol(x))

coef0: parameter needed for kernels of type polynomial and sigmoid (default: 0)
cost: cost of constraints violation (default: 1); this is the 'C'-constant of the regularization term in the Lagrange formulation.

nu: parameter needed for nu-classification, nu-regression, and one-classification (default: 0.5)

class.weights: a named vector of weights for the different classes, used for asymmetric class sizes. Not all factor levels have to be supplied (default weight: 1). All components have to be named.

cachesize: cache memory in MB (default: 100), which may be constrained by the GPU device memory in rpusvm.
tolerance: tolerance of termination criterion (default: 0.001)

epsilon: epsilon in the insensitive-loss function (default: 0.1)

shrinking: option whether to use the shrinking-heuristics (default: TRUE)

cross: if an integer value k > 0 is specified, a k-fold cross validation on the training data is performed to assess the quality of the model: the accuracy rate for classification and the Mean Squared Error for regression

probability: logical indicating whether the model should allow for probability predictions (default: FALSE)

fitted: logical indicating whether the fitted values should be computed and included in the model or not (default: TRUE)

seed: integer indicating the seed of the random number generator used in cross-validation and probabilistic inference (default: 0)
...: additional parameters for the low-level fitting function rpusvm.default

subset: an index vector specifying the cases to be used in the training sample. (NOTE: If given, this argument must be named.)

na.action: a function specifying the action to be taken if NAs are found. The default action is na.omit, which leads to rejection of cases with missing values on any required variable. An alternative is na.fail, which causes an error if NA cases are found. (NOTE: If given, this argument must be named.)

verbose: logical indicating whether progress information should be displayed (default: TRUE)
Details

For multiclass-classification with k levels, k > 2, libsvm uses the 'one-against-one' approach, in which k(k-1)/2 binary classifiers are trained; the appropriate class is found by a voting scheme.
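For example, the three iris classes yield k(k-1)/2 = 3 pairwise classifiers whose decision values can be inspected. A minimal sketch, assuming a classification model fitted as in the Examples below:

model <- rpusvm(Species ~ ., data = iris)
dv <- attr(predict(model, iris[, -5], decision.values = TRUE), "decision.values")
ncol(dv)      # 3 binary classifiers for k = 3 classes
colnames(dv)  # one column per class pair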
libsvm internally uses a sparse data representation, for which high-level support is provided by the package SparseM (see the sketch below).
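A minimal sketch of passing a sparse matrix.csr object, assuming the SparseM package is installed:

library(SparseM)
x.sparse <- as.matrix.csr(as.matrix(iris[, 1:4]))  # dense to compressed sparse row
model <- rpusvm(x.sparse, iris$Species)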
If the predictor variables include factors, the formula interface must be used to get a correct model matrix.
plot.rpusvm allows a simple graphical visualization of classification models.
The probability model for classification fits a logistic distribution using maximum likelihood to the decision values of all binary classifiers, and computes the a-posteriori class probabilities for the multi-class problem using quadratic optimization. The probabilistic regression model assumes (zero-mean) Laplace-distributed errors for the predictions, and estimates the scale parameter using maximum likelihood.
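Class probabilities are only available when the model is fitted with probability = TRUE. A minimal sketch, mirroring the e1071 interface this method is based on:

model <- rpusvm(Species ~ ., data = iris, probability = TRUE)
pred <- predict(model, iris[, -5], probability = TRUE)
head(attr(pred, "probabilities"))  # a-posteriori class probabilities per row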
Value

An object of class "rpusvm" containing the fitted model, including:

SV: the resulting support vectors (possibly scaled).

index: the index of the resulting support vectors in the data matrix. Note that this index refers to the preprocessed data (after the possible effect of na.omit and subset).

coefs: the corresponding coefficients times the training labels.

rho: the negative intercept.

sigma: in case of a probabilistic regression model, the scale parameter of the hypothesized (zero-mean) Laplace distribution estimated by maximum likelihood.

probA, probB: numeric vectors of length k(k-1)/2, k number of classes, containing the parameters of the logistic distributions fitted to the decision values of the binary classifiers (1 / (1 + exp(a x + b))).
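For a binary model, these components combine into the decision function f(x) = sum_i coefs_i * K(SV_i, x) - rho. A minimal sketch verifying this for a two-class radial-basis model, assuming (as in e1071) that the fitted object also stores the gamma used:

ir <- droplevels(iris[1:100, ])    # two classes: setosa and versicolor
model <- rpusvm(Species ~ ., data = ir, scale = FALSE)
x <- as.matrix(ir[, 1:4])
# radial kernel K(u, v) = exp(-gamma * |u - v|^2) between data rows and SVs
K <- exp(-model$gamma * (outer(rowSums(x^2), rowSums(model$SV^2), "+")
                         - 2 * x %*% t(model$SV)))
dec <- K %*% model$coefs - model$rho
# should agree (up to sign convention) with the reported decision values
all.equal(c(dec), c(attr(predict(model, x, decision.values = TRUE),
                         "decision.values")))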
Note

Data are scaled internally, usually yielding better results. Parameters of SVM models usually must be tuned to yield sensible results!
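A minimal grid-search sketch over the cost parameter, assuming (as in e1071, and not confirmed by this page) that a model fitted with cross > 0 exposes the total cross-validation accuracy as $tot.accuracy:

costs <- 10^(-1:3)
acc <- sapply(costs, function(C)
  rpusvm(Species ~ ., data = iris, cost = C, cross = 5, seed = 1)$tot.accuracy)
costs[which.max(acc)]  # cost with the best 5-fold accuracy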
Author(s)

Chi Yau (based on the R documentation of svm in e1071 by David Meyer)
chi.yau@r-tutor.com
References

Chih-Chung Chang and Chih-Jen Lin: LIBSVM: a library for Support Vector Machines. http://www.csie.ntu.edu.tw/~cjlin/libsvm

Chih-Chung Chang and Chih-Jen Lin: LIBSVM: a library for Support Vector Machines (implementation document). http://www.csie.ntu.edu.tw/~cjlin/papers/libsvm.ps.gz

Rong-En Fan, Pai-Hsuen Chen and Chih-Jen Lin: Working Set Selection Using the Second Order Information for Training SVM. http://www.csie.ntu.edu.tw/~cjlin/papers/quadworkset.pdf

Austin Carpenter: cuSVM: a CUDA implementation of support vector classification and regression. http://patternsonascreen.net/cuSVMDesc.pdf
See Also

predict.rpusvm, plot.rpusvm, matrix.csr (in package SparseM)
Examples

## Not run:
library(rpud)
data(iris)
attach(iris)
## classification mode
# default with factor response:
model <- rpusvm(Species ~ ., data = iris)
# alternatively the traditional interface:
x <- subset(iris, select = -Species)
y <- Species
model <- rpusvm(x, y)
print(model)
summary(model)
# test with train data
pred <- predict(model, x)
# (same as:)
pred <- fitted(model)
# Check accuracy:
table(pred, y)
# compute decision values and probabilities:
pred <- predict(model, x, decision.values = TRUE)
attr(pred, "decision.values")[1:4,]
# visualize (classes by color, SV by crosses):
plot(cmdscale(dist(iris[,-5])),
col = as.integer(iris[,5]),
pch = c("o","+")[1:150 %in% model$index + 1])
## try regression mode on two dimensions
# create data
x <- seq(0.1, 5, by = 0.05)
y <- log(x) + rnorm(x, sd = 0.2)
# estimate model and predict input values
m <- rpusvm(x, y)
new <- predict(m, x)
# visualize
plot(x, y)
points(x, log(x), col = 2)
points(x, new, col = 4)
## density-estimation
# create 2-dim. normal with rho=0:
X <- data.frame(a = rnorm(1000), b = rnorm(1000))
attach(X)
# traditional way:
m <- rpusvm(X, gamma = 0.1)
# formula interface:
m <- rpusvm(~., data = X, gamma = 0.1)
# or:
m <- rpusvm(~ a + b, gamma = 0.1)
# test:
newdata <- data.frame(a = c(0, 4), b = c(0, 4))
predict(m, newdata)
# visualize:
plot(X, col = 1:1000 %in% m$index + 1, xlim = c(-5,5), ylim=c(-5,5))
points(newdata, pch = "+", col = 2, cex = 5)
# weights: (example not particularly sensible)
i2 <- iris
levels(i2$Species)[3] <- "versicolor"
summary(i2$Species)
wts <- 100 / table(i2$Species)
wts
m <- rpusvm(Species ~ ., data = i2, class.weights = wts)
## End(Not run)