mirror of https://asciireactor.com/otho/cs-5821.git, synced 2025-10-31 17:48:04 +00:00
			
		
		
		
	added answers
parent 460fa7c4a8
commit 65c3edbed6
					
				
							
								
								
									
87 hw1/answers Normal file
									
								
							
							
						
						
									
									
								
							| @ -0,0 +1,87 @@ | ||||
| 1. For each of parts (a) through (d), indicate whether we would generally expect the performance of a flexible statistical learning method to be better or worse than an inflexible method. Justify your answer. | ||||
| 
 | ||||
| (a) The sample size n is extremely large, and the number of predictors p is small. | ||||
|     A more flexible method will generally perform better here. With many observations and few predictors, a flexible method can fit the true form of f closely without much risk of overfitting, because the large sample keeps its variance low. This still depends on the true f: if f is close to linear, an inflexible method loses little. | ||||
| 
 | ||||
| (b) The number of predictors p is extremely large, and the number | ||||
| of observations n is small. | ||||
|     A less flexible method will generally perform better here. With so few observations spread across so many predictors, a flexible method has enough freedom to fit the noise in the training data, so its variance is high and it overfits. | ||||
| 
 | ||||
| (c) The relationship between the predictors and response is highly | ||||
| non-linear. | ||||
|     A more flexible method is expected to perform better here, because it can capture the non-linear shape of the true function, whereas an inflexible method would be badly biased. | ||||
| (d) The variance of the error terms is extremely high. | ||||
|     A less flexible method will likely perform better here. When the variance of the error ε is high, a flexible method tends to follow the noise in the training data, which inflates the variance of f̂ without reducing the irreducible error. An inflexible method averages over more of that noise, so it accepts a little more bias in exchange for a large reduction in variance. | ||||
| 
 | ||||
| 
 | ||||
| 
 | ||||
| 2. Explain whether each scenario is a classification or regression problem, and indicate whether we are most interested in inference or prediction. Finally, provide n and p. | ||||
| 
 | ||||
| (a) We collect a set of data on the top 500 firms in the US. For each | ||||
| firm we record profit, number of employees, industry and the | ||||
| CEO salary. We are interested in understanding which factors | ||||
| affect CEO salary. | ||||
|     p = 3 (profit, number of employees, industry) | ||||
|     n = 500 | ||||
|     This is a regression problem, because the response, CEO salary, is quantitative. The goal is inference: we want to understand which of the three predictors affect CEO salary, not to predict the salary of a particular CEO. | ||||
| 
 | ||||
| (b) We are considering launching a new product and wish to know | ||||
| whether it will be a success or a failure. We collect data on 20 | ||||
| similar products that were previously launched. For each product | ||||
| we have recorded whether it was a success or failure, price | ||||
| charged for the product, marketing budget, competition price, | ||||
| and ten other variables. | ||||
|     p = 13 (price, marketing budget, competition price, and the ten other variables) | ||||
|     n = 20 | ||||
|     This is a classification problem, because the response is a class: success or failure. The goal is prediction: we want the predicted outcome for the new product as a function of the predictors. | ||||
| 
 | ||||
| (c) We are interested in predicting the % change in the US dollar in | ||||
| relation to the weekly changes in the world stock markets. Hence | ||||
| we collect weekly data for all of 2012. For each week we record | ||||
| the % change in the dollar, the % change in the US market, | ||||
| the % change in the British market, and the % change in the | ||||
| German market. | ||||
|     n = 52 | ||||
|     p = 3 (% change in the US, British, and German markets) | ||||
|     This is a regression problem, because the response, the % change in the dollar, is quantitative. The goal is prediction: the question states that we want to predict the change in the dollar given the weekly changes in the other markets. | ||||
| 
 | ||||
| 
 | ||||
| 
 | ||||
| 4. You will now think of some real-life applications for statistical learning. | ||||
| 
 | ||||
| (a) Describe three real-life applications in which classification might be useful. Describe the response, as well as the predictors. Is the goal of each application inference or prediction? Explain your | ||||
| answer. | ||||
| 
 | ||||
| (b) Describe three real-life applications in which regression might | ||||
| be useful. Describe the response, as well as the predictors. Is the | ||||
| goal of each application inference or prediction? Explain your | ||||
| answer. | ||||
| 
 | ||||
| (c) Describe three real-life applications in which cluster analysis | ||||
| might be useful. | ||||
|     Grouping stars into categories using their spectral strengths. | ||||
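As an illustration of the star example, here is a minimal k-means sketch. It is only one possible clustering technique, chosen here for brevity, and the data are invented: each tuple holds two hypothetical spectral-line strengths for one star, and the starting centers are picked by hand.

```python
import math


def assign(points, centers):
    """Return, for every point, the index of its nearest center."""
    return [
        min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
        for p in points
    ]


def kmeans(points, centers, n_iter=20):
    """Plain k-means (Lloyd's algorithm) from the given starting centers."""
    centers = [tuple(c) for c in centers]
    for _ in range(n_iter):
        labels = assign(points, centers)
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster ends up empty
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return assign(points, centers), centers


# Hypothetical data: (strength of line A, strength of line B) per star.
stars = [(0.90, 0.10), (0.80, 0.20), (0.85, 0.15),
         (0.20, 0.90), (0.10, 0.80), (0.15, 0.85)]

labels, centers = kmeans(stars, centers=[stars[0], stars[3]])
print(labels)
print(centers)
```

No response variable is involved: the groups come only from how the measurements sit relative to each other, which is what separates cluster analysis from the classification setting in part (a).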
| 
 | ||||
| 
 | ||||
| 
 | ||||
| 
 | ||||
| 9. This exercise involves the Auto data set studied in the lab. Make sure | ||||
| that the missing values have been removed from the data. | ||||
| (a) Which of the predictors are quantitative, and which are | ||||
| qualitative? | ||||
| (b) What is the range of each quantitative predictor? You can | ||||
| answer this using the range() function. | ||||
| (c) What is the mean and standard deviation of each quantitative | ||||
| predictor? | ||||
| (d) Now remove the 10th through 85th observations. What is the | ||||
| range, mean, and standard deviation of each predictor in the | ||||
| subset of the data that remains? | ||||
| (e) Using the full data set, investigate the predictors graphically, | ||||
| using scatterplots or other tools of your choice. Create some plots | ||||
| highlighting the relationships among the predictors. Comment | ||||
| on your findings. | ||||
| (f) Suppose that we wish to predict gas mileage ( mpg ) on the basis | ||||
| of the other variables. Do your plots suggest that any of the | ||||
| other variables might be useful in predicting mpg ? Justify your | ||||
| answer. | ||||
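Parts (b) through (d) come down to a few summary statistics plus one row deletion where an off-by-one error is easy to make. The exercise itself expects R; the sketch below uses Python only to make the arithmetic explicit. The `mpg` column is 100 invented values, not the real Auto data, and the helper names are hypothetical.

```python
import statistics


def summarize(values):
    """Return ((min, max), mean, sample sd), mirroring R's range(), mean(), sd()."""
    return (min(values), max(values)), statistics.mean(values), statistics.stdev(values)


def drop_observations(rows, first, last):
    """Drop the 1-based, inclusive observations first..last, like rows[-(first:last)] in R."""
    return rows[: first - 1] + rows[last:]


# Hypothetical stand-in column: 100 made-up values, NOT the real Auto data.
mpg = [10.0 + 0.25 * i for i in range(100)]

full_range, full_mean, full_sd = summarize(mpg)

# Part (d): remove the 10th through 85th observations (76 rows).
subset = drop_observations(mpg, 10, 85)
sub_range, sub_mean, sub_sd = summarize(subset)

print("full  :", full_range, full_mean, full_sd)
print("subset:", len(subset), sub_range, sub_mean, sub_sd)
```

The deletion keeps observations 1 to 9 and 86 onward, so exactly 76 rows disappear regardless of the size of the data set.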