mirror of
https://asciireactor.com/otho/cs-5821.git
synced 2024-11-21 17:15:06 +00:00
added hw3 answers
parent
8ffd07f1ef
commit
a817d948c9
0
ISLR Sixth Printing.pdf
Normal file → Executable file
0
hw1/.RData
Normal file → Executable file
0
hw1/.Rhistory
Normal file → Executable file
0
hw1/Auto.data
Normal file → Executable file
0
hw1/answers
Normal file → Executable file
0
hw1/assigned
Normal file → Executable file
0
hw1/auto_pairs.png
Normal file → Executable file
Before Width: | Height: | Size: 101 KiB After Width: | Height: | Size: 101 KiB |
0
hw2/.RData
Normal file → Executable file
0
hw2/.Rhistory
Normal file → Executable file
0
hw2/answers
Normal file → Executable file
117
hw3/answers
Normal file
@@ -0,0 +1,117 @@
Part B: Choose one of Questions 10 or 11

5. We now examine the differences between LDA and QDA.

(a) If the Bayes decision boundary is linear, do we expect LDA
or QDA to perform better on the training set? On the test set?

QDA is more flexible, so it will match the training set more
closely than LDA. LDA will perform better on the test set
because the true boundary is linear: QDA's extra flexibility
adds variance without any offsetting reduction in bias.


(b) If the Bayes decision boundary is non-linear, do we expect
LDA or QDA to perform better on the training set? On the test
set?

QDA will still perform better on the training set, and should
now also perform better than LDA on the test set, since QDA
can capture the curvature of the true boundary that LDA's
linear boundary cannot.


(c) In general, as the sample size n increases, do we expect the
test prediction accuracy of QDA relative to LDA to improve,
decline, or be unchanged? Why?

Improve. LDA has an advantage when the training set is small
because it is less sensitive to fluctuations in those few
observations (lower variance). As the training set grows, QDA
can estimate its class-specific covariance matrices reliably,
so its variance matters less, and if the true boundary is not
linear QDA should eventually outperform LDA.
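
One way to make the flexibility gap concrete is to count the
covariance parameters each method has to estimate. The sketch
below assumes the standard setup in which LDA fits one covariance
matrix shared by all classes and QDA fits one per class; the
function name and the example values of p are illustrative, not
part of the assignment.

```python
def covariance_parameters(p: int, n_classes: int, shared: bool) -> int:
    """Number of free covariance parameters the classifier estimates.

    A symmetric p x p covariance matrix has p * (p + 1) / 2 free entries.
    LDA (shared=True) estimates one matrix for all classes;
    QDA (shared=False) estimates a separate matrix for each class.
    """
    per_matrix = p * (p + 1) // 2
    return per_matrix if shared else n_classes * per_matrix


# Illustrative predictor counts for a two-class problem.
for p in (2, 10, 50):
    lda = covariance_parameters(p, n_classes=2, shared=True)
    qda = covariance_parameters(p, n_classes=2, shared=False)
    print(f"p={p:>2}: LDA {lda:>5} covariance parameters, QDA {qda:>5}")
```

With more parameters to estimate from the same n observations,
QDA's estimates are noisier when n is small, which is the variance
cost discussed in the answers above.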


(d) True or False: Even if the Bayes decision boundary for a
given problem is linear, we will probably achieve a superior
test error rate using QDA rather than LDA because QDA is
flexible enough to model a linear decision boundary. Justify
your answer.

False. With a linear Bayes boundary, LDA already has low bias,
so QDA's extra flexibility only adds variance: it will tend to
fit noise in the training data that does not carry over to the
test data, which raises its test error, especially for small n.


8. Suppose that we take a data set, divide it into equally-sized
training and test sets, and then try out two different
classification procedures. First we use logistic regression and
get an error rate of 20 % on the training data and 30 % on the
test data. Next we use 1-nearest neighbors (i.e. K = 1) and get
an average error rate (averaged over both test and training data
sets) of 18 %. Based on these results, which method should we
prefer to use for classification of new observations? Why?

Logistic regression. With K = 1, each training observation is
its own nearest neighbor, so the training error of 1-nearest
neighbors is 0 %. Because the training and test sets are the
same size, an average error rate of 18 % implies a test error
of 2 * 18 % - 0 % = 36 %. Logistic regression averages 25 %
over both sets, but what matters for new observations is the
test error, and its 30 % is lower than the 36 % of 1-nearest
neighbors. The low average for 1-nearest neighbors reflects
overfitting to the training data, not better prediction.
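
A quick check of the error rates in this question, as a minimal
sketch. It assumes distinct training points, so that every
training observation is its own nearest neighbor under K = 1. The
toy data and the helper names are hypothetical; only the 18 %,
20 % and 30 % figures come from the question.

```python
def one_nn_predict(train_x, train_y, x):
    """Label of the training point closest to x (1-D features for brevity)."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


def error_rate(predicted, actual):
    """Fraction of predictions that differ from the true labels."""
    return sum(p != a for p, a in zip(predicted, actual)) / len(actual)


def implied_test_error(average_error, training_error):
    """Test error implied by an average over equally sized train and test sets."""
    return 2 * average_error - training_error


# Hypothetical training data with distinct feature values.
train_x = [0.5, 1.2, 2.9, 3.1, 4.8]
train_y = [0, 1, 0, 1, 1]

# Each training point is its own nearest neighbor, so there are no training errors.
train_pred = [one_nn_predict(train_x, train_y, x) for x in train_x]
knn_train_error = error_rate(train_pred, train_y)

# Figures from the question: 18 % average for 1-NN, 20 % / 30 % for logistic regression.
knn_test_error = implied_test_error(0.18, knn_train_error)
logistic_average = (0.20 + 0.30) / 2

print(f"1-NN training error: {knn_train_error:.2f}")
print(f"1-NN implied test error: {knn_test_error:.2f}")
print(f"Logistic regression average error: {logistic_average:.2f}")
```

The comparison that matters for new observations is test error
against test error: 36 % for 1-nearest neighbors versus 30 % for
logistic regression.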


9. This problem has to do with odds.

(a) On average, what fraction of people with an odds of 0.37 of
defaulting on their credit card payment will in fact default?

(b) Suppose that an individual has a 16 % chance of defaulting
on her credit card payment. What are the odds that she will
default?
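
Both parts follow from the definition odds = p / (1 - p), which
inverts to p = odds / (1 + odds). The sketch below applies those
two formulas to the numbers in the question; the function names
are illustrative.

```python
def odds_to_probability(odds: float) -> float:
    """Invert odds = p / (1 - p) to recover the probability p."""
    return odds / (1 + odds)


def probability_to_odds(p: float) -> float:
    """Odds corresponding to probability p (requires p < 1)."""
    return p / (1 - p)


# (a) Odds of 0.37 of defaulting.
fraction_defaulting = odds_to_probability(0.37)

# (b) A 16 % chance of defaulting.
odds_of_default = probability_to_odds(0.16)

print(f"(a) fraction defaulting: {fraction_defaulting:.3f}")
print(f"(b) odds of default:     {odds_of_default:.3f}")
```

Under these formulas, about 27 % of people with odds of 0.37 will
default, and a 16 % chance of default corresponds to odds of about
0.19.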


11. In this problem, you will develop a model to predict whether a
given car gets high or low gas mileage based on the Auto data
set.

(a) Create a binary variable, mpg01, that contains a 1 if mpg
contains a value above its median, and a 0 if mpg contains a
value below its median. You can compute the median using the
median() function. Note you may find it helpful to use the
data.frame() function to create a single data set containing
both mpg01 and the other Auto variables.

(b) Explore the data graphically in order to investigate the
association between mpg01 and the other features. Which of the
other features seem most likely to be useful in predicting
mpg01? Scatterplots and boxplots may be useful tools to answer
this question. Describe your findings.

(c) Split the data into a training set and a test set.

(d) Perform LDA on the training data in order to predict mpg01
using the variables that seemed most associated with mpg01 in
(b). What is the test error of the model obtained?

(e) Perform QDA on the training data in order to predict mpg01
using the variables that seemed most associated with mpg01 in
(b). What is the test error of the model obtained?

(f) Perform logistic regression on the training data in order to
predict mpg01 using the variables that seemed most associated
with mpg01 in (b). What is the test error of the model obtained?

(g) Perform KNN on the training data, with several values of K,
in order to predict mpg01. Use only the variables that seemed
most associated with mpg01 in (b). What test errors do you
obtain? Which value of K seems to perform the best on this data
set?
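
The exercise refers to R's median() and data.frame(). As a rough
Python analogue of step (a), and of the test-error calculation
that parts (d) through (g) ask for, a minimal sketch follows. The
mpg values and the predicted labels are made up for illustration
and are not taken from the Auto data set; the helper names are
hypothetical.

```python
from statistics import median


def make_mpg01(mpg_values):
    """Return 1 where mpg is above the median of mpg_values, else 0."""
    cutoff = median(mpg_values)
    return [1 if value > cutoff else 0 for value in mpg_values]


def error_rate(predicted, actual):
    """Fraction of predicted labels that differ from the true labels."""
    mismatches = sum(p != a for p, a in zip(predicted, actual))
    return mismatches / len(actual)


# Hypothetical mpg values, NOT from the Auto data set.
mpg = [18.0, 15.0, 36.0, 24.0, 31.0, 14.0, 27.0, 22.0]
mpg01 = make_mpg01(mpg)
print(mpg01)

# Hypothetical predictions on a held-out test set, to show the error calculation.
test_labels = [1, 1, 1, 0]
test_predictions = [1, 0, 1, 1]
print(f"test error: {error_rate(test_predictions, test_labels):.2f}")
```

Whatever classifier is fitted in (d) through (g), its test error
is this same mismatch fraction computed on observations that were
held out of training in (c).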
0
lab2/.RData
Normal file → Executable file
0
lab2/.Rhistory
Normal file → Executable file
0
lab2/Figure.pdf
Normal file → Executable file
0
lab2/lab.r
Normal file → Executable file
0
lab2/program
Normal file → Executable file
0
project/ideas
Normal file → Executable file
0
usingR.pdf
Normal file → Executable file