2017-01-25 03:48:35 +00:00
1. For each of parts (a) through (d), indicate whether we would generally
expect the performance of a flexible statistical learning method to be
better or worse than an inflexible method. Justify your answer.

(a) The sample size n is extremely large, and the number of predictors p
is small.

This still depends somewhat on how the data are distributed, but
generally I would expect a more flexible method to perform better here.
With a very large n and only a few predictors, there are enough
observations to pin down a flexible fit without overfitting, so it can
capture structure that an inflexible method would miss.

(b) The number of predictors p is extremely large, and the number of
observations n is small.

A less flexible method is the safer choice in this case. The data are
sparse relative to the number of predictors, so a flexible method has
little to constrain it and will tend to overfit the few observations we
have, responding to noise rather than to real changes along and across
predictors.

(c) The relationship between the predictors and response is highly
non-linear.

A more flexible model should be expected to perform better here, as it
can reflect the non-linear nature of the real function, whereas an
inflexible model would carry a large bias.
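To make (c) concrete, here is a minimal simulation sketch in Python. Everything in it is an assumption made for illustration: the true function is sin(x), the noise standard deviation is 0.1, the inflexible method is a least-squares straight line, and the flexible method is a 5-nearest-neighbour average. It is only meant to show the kind of comparison the answer describes.

```python
import math
import random


def true_f(x):
    """Hypothetical non-linear truth, chosen only for this illustration."""
    return math.sin(x)


def make_data(n, rng, noise_sd=0.1):
    """Draw n points with x uniform on [0, 2*pi] and y = sin(x) + noise."""
    xs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]
    ys = [true_f(x) + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Inflexible method: ordinary least-squares straight line."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def fit_knn(xs, ys, k=5):
    """Flexible method: average of the k nearest training responses."""
    pairs = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / k

    return predict


def mse(model, xs, ys):
    """Mean squared error of a fitted model on the given points."""
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


rng = random.Random(0)  # fixed seed so the run is repeatable
train_x, train_y = make_data(200, rng)
test_x, test_y = make_data(200, rng)

line_mse = mse(fit_line(train_x, train_y), test_x, test_y)
knn_mse = mse(fit_knn(train_x, train_y), test_x, test_y)
print(f"straight line test MSE: {line_mse:.3f}")
print(f"5-nearest-neighbour test MSE: {knn_mse:.3f}")
```

On data like this the straight line cannot follow the curve, so its test MSE stays well above the nearest-neighbour one. The exact numbers depend on the seed and on the assumed noise level.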

(d) The variance of the error terms is extremely high.

A less flexible method will likely do better here. The variance of the
error ε is irreducible, so no method can remove it, but a flexible
method will chase that noise in the training data, which inflates the
variance of f̂ without any real reduction in its bias. A less flexible
method averages over more of the noise and so gives a more stable f̂.
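The argument in (d) can be read off the standard bias-variance decomposition of the expected test MSE at a point. It is stated here for reference; f̂ and ε are as above, while x_0 and y_0 (a test input and its response) are names introduced only for this illustration.

```latex
\[
E\big[(y_0 - \hat{f}(x_0))^2\big]
  = \mathrm{Var}\big(\hat{f}(x_0)\big)
  + \big[\mathrm{Bias}\big(\hat{f}(x_0)\big)\big]^2
  + \mathrm{Var}(\varepsilon)
\]
```

The last term does not depend on the method chosen, so a high Var(ε) sets a floor on the error. The choice of flexibility only moves the first two terms.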

2. Explain whether each scenario is a classification or regression problem,
and indicate whether we are most interested in inference or prediction.
Finally, provide n and p.

(a) We collect a set of data on the top 500 firms in the US. For each firm
we record profit, number of employees, industry and the CEO salary. We are
interested in understanding which factors affect CEO salary.

n = 500
p = 3

This is a regression problem, as the response, CEO salary, is a
numerical value. The predictors are profit, number of employees, and
industry; CEO salary is the response, so it does not count toward p.
Inference is the main interest here, because we want to understand
which of the 3 available predictors affect CEO salary rather than to
predict the salary itself.

(b) We are considering launching a new product and wish to know whether it
will be a success or a failure. We collect data on 20 similar products that
were previously launched. For each product we have recorded whether it
was a success or failure, price charged for the product, marketing budget,
competition price, and ten other variables.

n = 20
p = 13

This is a prediction problem, because we're interested in a predicted
outcome -- success or failure -- as a function of the various
predictors. It is a classification problem because the goal is to
predict a class: failure or success. That outcome is the response, so
the predictors are the price, the marketing budget, the competition
price, and the ten other variables.

(c) We are interested in predicting the % change in the US dollar in
relation to the weekly changes in the world stock markets. Hence we collect
weekly data for all of 2012. For each week we record the % change in the
dollar, the % change in the US market, the % change in the British market,
and the % change in the German market.

n = 52
p = 3

A clear regression setting, since the response is a numerical % change.
This is a prediction problem rather than an inference problem: the
stated goal is to predict how the dollar will change given the shifts
in the other markets, not to understand how each market is related to
the dollar. The % change in the dollar is the response, which leaves
the three market changes as predictors.

4. You will now think of some real-life applications for statistical
learning.

(a) Describe three real-life applications in which classification might be
useful. Describe the response, as well as the predictors. Is the goal of
each application inference or prediction? Explain your answer.

Image identification. The predictors could be things like "distribution
of greyscale intensity", "distribution of colors", and any number of
clever things I'm sure machine learning professionals have thought up.
The response is the most probable classification. This is a prediction.

Galactic classification. This is very similar to general image
identification, but we classify galaxies using light intensity in very
specific spectral bands as predictors. We also look at how strong
particular spikes or dips in the spectrum are, so we might have
predictors for "emission line strength" for several spectral features.
The response is the most likely galactic classification. This is a
prediction.

Speech recognition. The predictors would perhaps be the audio spectrum,
with the response being the word the audio spectrum corresponds to.
This would predict the most likely word for the audio received.

(b) Describe three real-life applications in which regression might be
useful. Describe the response, as well as the predictors. Is the goal of
each application inference or prediction? Explain your answer.

Marketing data is the obvious one. The predictor is perhaps how much was
spent on a certain type of marketing, or a few types of marketing -- this
is now sounding like the example from the book. The response is the
amount sold in the same fiscal period. You could use inference or
prediction here: inference to ask how many additional sales you might
gain per unit of marketing spend, or prediction to ask "how many sales
should we expect if we spend X amount on marketing?"

I want to try to use this for my project: understanding the time delay,
or reverberation, of a dynamic spectral feature compared against a
similarly dynamic reference feature. Two predictors, line-of-sight
velocity and time delay, give a response of light intensity. Our task
is to predict the light intensity as a function of these predictors.
This is actually a vanguard question in astrophysics, and I'll bet
somebody is already trying to do this!

Maybe something municipal. I could predict the taxable income of a city
based on a number of predictors, like availability of mass transit or
highways, demographics, resources, distance to neighbouring cities, and
all kinds of things. The response would be the taxable income given all
of these inputs. Perhaps it would be good to consider an inference
question here, for example: how would my city's taxable income change
if I increased the availability of public transit?

(c) Describe three real-life applications in which cluster analysis might
be useful.

Categorizing star type by spectral band strengths.

Plant and animal species identification.

Tracking objects in sensor data.

9. This exercise involves the Auto data set studied in the lab. Make sure
that the missing values have been removed from the data.

(a) Which of the predictors are quantitative, and which are qualitative?

mpg, horsepower, weight, acceleration, and displacement are all clearly
quantitative.

cylinders I think is arguably qualitative because each number of
cylinders defines a somewhat broad class of vehicles. For the years,
the same argument might apply: each year is a class of vehicles. The
origin is clearly qualitative, and so is name.

(b) What is the range of each quantitative predictor? You can answer this
using the range() function.

$mpg
[1] 9.0 46.6

$cylinders
[1] 3 8

$displacement
[1] 68 455

$horsepower
[1] 46 230

$weight
[1] 1613 5140

$acceleration
[1] 8.0 24.8

$year
[1] 70 82
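The listing above has the shape of R's range() applied column by column. For comparison, here is a rough Python sketch of the same idea. The function name column_ranges and the three rows are hypothetical, made up for illustration; they are not Auto observations and this is not the call that produced the output above.

```python
def column_ranges(rows, columns):
    """Return {column: (min, max)} for the named numeric columns."""
    return {
        col: (min(row[col] for row in rows), max(row[col] for row in rows))
        for col in columns
    }


# Hypothetical rows for illustration only; these are not Auto observations.
toy_rows = [
    {"mpg": 18.0, "cylinders": 8, "weight": 3504, "name": "car a"},
    {"mpg": 31.5, "cylinders": 4, "weight": 2045, "name": "car b"},
    {"mpg": 24.0, "cylinders": 6, "weight": 2900, "name": "car c"},
]

# Qualitative columns such as "name" are left out on purpose.
ranges = column_ranges(toy_rows, ["mpg", "cylinders", "weight"])
for col, (lo, hi) in ranges.items():
    print(col, lo, hi)
```

Passing the column list explicitly keeps the choice of which predictors count as quantitative visible, which is the judgment call part (a) is about.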

(c) What is the mean and standard deviation of each quantitative predictor?

$mpg
mu sigma
23.445918 7.805007

$cylinders
mu sigma
5.471939 1.705783

$displacement
mu sigma
194.412 104.644

$horsepower
mu sigma
104.46939 38.49116

$weight
mu sigma
2977.5842 849.4026

$acceleration
mu sigma
15.541327 2.758864

$year
mu sigma
75.979592 3.683737
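One detail worth pinning down for the sigma column is which divisor is used. R's sd() divides by n - 1, and assuming that is where these values came from, the matching Python call is statistics.stdev rather than statistics.pstdev. A minimal sketch, with made-up values that are not taken from Auto and a hypothetical helper name mean_and_sd:

```python
import statistics


def mean_and_sd(values):
    """Return (mean, sample standard deviation) using the n - 1 divisor."""
    return statistics.mean(values), statistics.stdev(values)


# Hypothetical values for illustration only; not taken from Auto.
toy_mpg = [18.0, 24.0, 30.0]
mu, sigma = mean_and_sd(toy_mpg)
print(mu, sigma)

# The population version divides by n instead and comes out smaller.
print(statistics.pstdev(toy_mpg))
```

With only a few hundred observations the two versions differ very little, but it helps to know which one a reported sigma refers to.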

(d) Now remove the 10th through 85th observations. What is the range, mean,
and standard deviation of each predictor in the subset of the data that
remains?

$mpg
mu sigma
24.404430 7.867283

$cylinders
mu sigma
5.373418 1.654179

$displacement
mu sigma
187.24051 99.67837

$horsepower
mu sigma
100.72152 35.70885

$weight
mu sigma
2935.9715 811.3002

$acceleration
mu sigma
15.726899 2.693721

$year
mu sigma
77.145570 3.106217
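Removing "the 10th through 85th observations" is easy to get off by one, since the exercise counts from 1 and includes both ends. A small Python sketch of the indexing follows. The helper name drop_observations is hypothetical, and the list of observation numbers stands in for the data, assuming the 392 rows that Auto has once missing values are dropped.

```python
def drop_observations(rows, first, last):
    """Drop the first-th through last-th rows, counting from 1, inclusive."""
    return rows[: first - 1] + rows[last:]


# Stand-in for the data: observation numbers 1..392 (assumed row count).
observations = list(range(1, 393))
kept = drop_observations(observations, 10, 85)

# 76 observations are removed, so 316 remain; 9 is now followed by 86.
print(len(kept), kept[8], kept[9])
```

The summaries in (d) would then be computed on the kept rows only.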

I've now changed my mind and say that both year and cylinders are
quantitative, since it makes plenty of sense to talk about the mean and
std of those predictors for this set of data.

(e) Using the full data set, investigate the predictors graphically, using
scatterplots or other tools of your choice. Create some plots highlighting
the relationships among the predictors. Comment on your findings.

(f) Suppose that we wish to predict gas mileage (mpg) on the basis of the
other variables. Do your plots suggest that any of the other variables
might be useful in predicting mpg? Justify your answer.