mirror of https://asciireactor.com/otho/cs-5821.git
synced 2024-11-24 14:05:05 +00:00
almost done with hw3
parent a817d948c9
commit 84a0b4cfc9
100
hw3/answers
@@ -73,11 +73,13 @@ Part B: Choose one of Questions 10 or 11
 (a) On average, what fraction of people with an odds of 0.37 of
 defaulting on their credit card payment will in fact default?

+.27

 (b) Suppose that an individual has a 16 % chance of defaulting
 on her credit card payment. What are the odds that she will de-
 fault?

+.19

 11. In this problem, you will develop a model to predict whether a
 given car gets high or low gas mileage based on the Auto data
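Both answers in (a) and (b) above follow from the definition of odds, odds = p / (1 - p), which inverts to p = odds / (1 + odds). A minimal R sketch of that arithmetic follows; the helper names odds_to_prob and prob_to_odds are made up for illustration and are not part of the commit.

```r
# Convert between odds and probability.
#   odds = p / (1 - p)      p = odds / (1 + odds)
odds_to_prob <- function(odds) odds / (1 + odds)
prob_to_odds <- function(p) p / (1 - p)

frac_default <- odds_to_prob(0.37)   # part (a): odds of 0.37
odds_default <- prob_to_odds(0.16)   # part (b): 16 % chance

round(frac_default, 2)  # 0.27
round(odds_default, 2)  # 0.19
```

Rounded to two decimals, these reproduce the .27 and .19 recorded in the answers.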
@@ -90,28 +92,122 @@ Part B: Choose one of Questions 10 or 11
 data.frame() function to create a single data set containing
 both mpg01 and the other Auto variables.

+> auto$mpg01
+  [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 1 1 1 1 1 0 0 0 0 0 1 1 1 1 0 0 0 0
+ [38] 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
+ [75] 0 0 0 0 0 1 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
+[112] 0 0 0 1 0 0 1 1 0 0 0 1 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1
+[149] 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 1 1 0 1 0 0 0 0 1 1 1 1 1
+[186] 1 1 0 0 0 0 0 0 1 0 1 1 1 1 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0
+[223] 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0 1 1 1 1 1 0 0 0 0 0 0 1 0 0 0
+[260] 0 0 0 0 0 0 0 1 1 1 1 0 1 1 1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1
+[297] 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
+[334] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 0 1 1 1
+[371] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1


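The transcript above prints auto$mpg01 but not the command that created it. One way such an indicator could be built is sketched below, assuming mpg01 marks cars whose mpg lies above the median (an assumption, though the later remarks about "the median" point that way). The sketch uses the built-in mtcars data as a stand-in, since the Auto data does not ship with base R, and the name auto_demo is hypothetical.

```r
# Stand-in data: built-in mtcars, NOT the Auto data used in the commit.
auto_demo <- mtcars

# 1 if a car's mpg is strictly above the median mpg, 0 otherwise.
mpg01 <- as.integer(auto_demo$mpg > median(auto_demo$mpg))

# Combine the indicator with the existing variables in one data frame.
auto_demo <- data.frame(auto_demo, mpg01 = mpg01)

table(auto_demo$mpg01)
```

Cars whose mpg equals the median fall into class 0 here, so the two classes need not be exactly the same size.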
 (b) Explore the data graphically in order to investigate the
 association between mpg01 and the other features. Which of the
 other features seem most likely to be useful in predicting mpg01?
 Scatterplots and boxplots may be useful tools to answer this
 question. Describe your findings.

+Horsepower clearly has the best relationship from the
+scatter plots. All of the mpgs over the median are on one
+side of the plot. Weight and acceleration are alright, but
+there is significant overlap between middle values.
+Displacement is on the cusp and the other variables don't
+have a terribly useful relationship with this median.

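The findings above are stated without the plotting commands. A possible way to produce comparable boxplots is sketched here on the built-in mtcars data, whose hp, wt, qsec and disp columns stand in for horsepower, weight, acceleration and displacement; these are not the Auto variables, and auto_demo is a hypothetical name.

```r
# Stand-in data with a median-split indicator (assumption: mpg01 = mpg above median).
auto_demo <- transform(mtcars, mpg01 = as.integer(mpg > median(mpg)))

predictors <- c("hp", "wt", "qsec", "disp")

# One boxplot per predictor, split by mpg01.
op <- par(mfrow = c(2, 2))
for (v in predictors) {
  boxplot(split(auto_demo[[v]], auto_demo$mpg01),
          xlab = "mpg01", ylab = v, main = v)
}
par(op)

# Numeric companion to the plots: per-class medians of each predictor.
group_medians <- aggregate(cbind(hp, wt, qsec, disp) ~ mpg01,
                           data = auto_demo, FUN = median)
group_medians
```

Predictors whose boxes barely overlap between the two classes are the natural candidates for the models in the later parts.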
 (c) Split the data into a training set and a test set.

+Seems like a 50/50 random sampling is appropriate enough.

+> training_indices = sample(nrow(auto),397/2)
+> train_bools = rep(F,length(auto$mpg))
+> train_bools[training_indices]=T
+> head(train_bools)
+[1] FALSE  TRUE FALSE FALSE  TRUE FALSE
+> length(train_bools)
+[1] 397
+> train_data = auto[train_bools,]
+> test_data = auto[!train_bools,]


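In the split above, 397/2 is 198.5, which sample() appears to truncate to 198 (consistent with the 199 test rows used in part (f)), and no seed is set, so the split cannot be reproduced. A hedged variant is sketched below on the built-in mtcars data as a stand-in; the seed value and the name auto_demo are arbitrary choices for illustration.

```r
set.seed(1)  # arbitrary seed, only so the split can be reproduced

# Stand-in data: built-in mtcars, NOT the Auto data used in the commit.
auto_demo <- transform(mtcars, mpg01 = as.integer(mpg > median(mpg)))

n <- nrow(auto_demo)
training_indices <- sample(n, floor(n / 2))  # explicit integer sample size

train_data <- auto_demo[training_indices, ]
test_data  <- auto_demo[-training_indices, ]  # every row not in the training set

c(train = nrow(train_data), test = nrow(test_data))
```

Negative indexing gives the complement directly, so the intermediate logical vector is optional.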
 (d) Perform LDA on the training data in order to predict mpg01
 using the variables that seemed most associated with mpg01 in
 (b). What is the test error of the model obtained?

+> lda.fit
+Call:
+lda(mpg01 ~ horsepower + weight + acceleration + displacement,
+    data = train_data)

+Prior probabilities of groups:
+        0         1
+0.5431472 0.4568528

+Group means:
+  horsepower   weight acceleration displacement
+0  129.08411 3557.757     14.55981      269.729
+1   79.64444 2345.233     16.39222      116.800

+Coefficients of linear discriminants:
+                      LD1
+horsepower    0.005678626
+weight       -0.001137499
+acceleration -0.014950459
+displacement -0.007401647


+Error Rate against test data:
+> mean(lda.pred$class!=test_data$mpg01,na.rm=T)
+[1] 0.1179487



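The transcript above prints lda.fit and uses lda.pred without showing how either was created. A plausible sequence is sketched below on simulated data, assuming lda() from the MASS package; the data frame demo, its two predictors, and the seed are invented for illustration and do not reproduce the error rate recorded above.

```r
library(MASS)  # provides lda()

set.seed(1)
n <- 200
mpg01 <- rep(c(0, 1), each = n / 2)

# Simulated stand-ins: class 0 tends to be more powerful and heavier.
demo <- data.frame(
  mpg01      = mpg01,
  horsepower = rnorm(n, mean = ifelse(mpg01 == 0, 130, 80), sd = 25),
  weight     = rnorm(n, mean = ifelse(mpg01 == 0, 3600, 2300), sd = 500)
)

train_rows <- sample(n, n / 2)
train_data <- demo[train_rows, ]
test_data  <- demo[-train_rows, ]

# Fit on the training half, predict on the held-out half.
lda.fit  <- lda(mpg01 ~ horsepower + weight, data = train_data)
lda.pred <- predict(lda.fit, test_data)

# Test error: share of held-out rows whose predicted class is wrong.
lda_error <- mean(lda.pred$class != test_data$mpg01)
lda_error
```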
 (e) Perform QDA on the training data in order to predict mpg01
 using the variables that seemed most associated with mpg01 in
 (b). What is the test error of the model obtained?

+> qda.fit=qda(mpg01 ~ horsepower + weight + acceleration + displacement,data=train_data)
+> qda.fit
+Call:
+qda(mpg01 ~ horsepower + weight + acceleration + displacement,
+    data = train_data)

+Prior probabilities of groups:
+        0         1
+0.5431472 0.4568528

+Group means:
+  horsepower   weight acceleration displacement
+0  129.08411 3557.757     14.55981      269.729
+1   79.64444 2345.233     16.39222      116.800

+Error Rate:
+> mean(qda.pred$class!=test_data$mpg01,na.rm=T)
+[1] 0.1025641



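The QDA error above is reported as a single number. A confusion table shows which class the mistakes fall in; the sketch below illustrates that on simulated data, assuming qda() from the MASS package. The data frame demo and its columns are invented for illustration.

```r
library(MASS)  # provides qda()

set.seed(1)
n <- 200
mpg01 <- rep(c(0, 1), each = n / 2)

# Simulated stand-ins, not the Auto data.
demo <- data.frame(
  mpg01      = mpg01,
  horsepower = rnorm(n, mean = ifelse(mpg01 == 0, 130, 80), sd = 25),
  weight     = rnorm(n, mean = ifelse(mpg01 == 0, 3600, 2300), sd = 500)
)

train_rows <- sample(n, n / 2)
train_data <- demo[train_rows, ]
test_data  <- demo[-train_rows, ]

qda.fit  <- qda(mpg01 ~ horsepower + weight, data = train_data)
qda.pred <- predict(qda.fit, test_data)

# Rows: predicted class, columns: actual class.
confusion <- table(predicted = qda.pred$class, actual = test_data$mpg01)
confusion

# Off-diagonal cells are the misclassified test rows.
qda_error <- 1 - sum(diag(confusion)) / sum(confusion)
qda_error
```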
 (f) Perform logistic regression on the training data in order to
 predict mpg01 using the variables that seemed most associated
 with mpg01 in (b). What is the test error of the model obtained?

+> glm.fit=glm(mpg01 ~ horsepower + weight + acceleration + displacement,data=train_data,family=binomial)
+> glm.probs=predict(glm.fit,test_data,type="response")
+> glm.pred=rep(0,199)
+> glm.pred[glm.probs>.5]=1
+> mean(glm.pred!=test_data$mpg01)
+[1] 0.120603


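The transcript above hard-codes 199 test rows and, unlike parts (d) and (e), computes the mean without na.rm, so any test row with a missing predictor would appear to keep the default label 0 and be counted in the error. A variant that sizes the prediction vector from the probabilities themselves is sketched below on simulated data; demo, its columns, and the seed are invented for illustration.

```r
set.seed(1)
n <- 200
mpg01 <- rep(c(0, 1), each = n / 2)

# Simulated stand-ins, not the Auto data.
demo <- data.frame(
  mpg01      = mpg01,
  horsepower = rnorm(n, mean = ifelse(mpg01 == 0, 130, 80), sd = 25),
  weight     = rnorm(n, mean = ifelse(mpg01 == 0, 3600, 2300), sd = 500)
)

train_rows <- sample(n, n / 2)
train_data <- demo[train_rows, ]
test_data  <- demo[-train_rows, ]

glm.fit   <- glm(mpg01 ~ horsepower + weight, data = train_data, family = binomial)
glm.probs <- predict(glm.fit, test_data, type = "response")

# ifelse() returns one label per probability and keeps NA where the
# probability is NA, so no row count needs to be hard-coded.
glm.pred <- ifelse(glm.probs > 0.5, 1, 0)

# na.rm = TRUE drops rows that could not be scored, as in the LDA/QDA steps.
glm_error <- mean(glm.pred != test_data$mpg01, na.rm = TRUE)
glm_error
```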
 (g) Perform KNN on the training data, with several values of K,
 in order to predict mpg01 . Use only the variables that seemed
 most associated with mpg01 in (b). What test errors do you
 obtain? Which value of K seems to perform the best on this data
 set?



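Part (g) has no answer in this commit. For orientation only, a KNN comparison over several values of K might be set up as sketched below, assuming knn() from the class package and simulated data; demo, its columns, the seed, and the chosen K values are invented for illustration, and nothing here is a result for the Auto data.

```r
library(class)  # provides knn()

set.seed(1)
n <- 200
mpg01 <- rep(c(0, 1), each = n / 2)

# Simulated stand-ins, not the Auto data.
demo <- data.frame(
  mpg01      = mpg01,
  horsepower = rnorm(n, mean = ifelse(mpg01 == 0, 130, 80), sd = 25),
  weight     = rnorm(n, mean = ifelse(mpg01 == 0, 3600, 2300), sd = 500)
)
train_rows <- sample(n, n / 2)

# Standardize so that weight (thousands) does not dominate horsepower (tens)
# in the distance. For simplicity this scales over all rows; scaling with the
# training mean and sd only would avoid using test information.
x <- scale(demo[, c("horsepower", "weight")])
train_x <- x[train_rows, ]
test_x  <- x[-train_rows, ]
train_y <- factor(demo$mpg01[train_rows])
test_y  <- demo$mpg01[-train_rows]

# Test error for each candidate K.
ks <- c(1, 5, 10, 20)
knn_errors <- sapply(ks, function(k) {
  pred <- knn(train_x, test_x, cl = train_y, k = k)
  mean(pred != test_y)
})
names(knn_errors) <- paste0("k=", ks)
knn_errors
```

The K with the smallest entry in knn_errors would be the one to report, keeping in mind that differences between neighbouring values of K on a single split can be noise.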