mirror of https://asciireactor.com/otho/cs-5821.git
synced 2025-10-31 17:48:04 +00:00
commit 8ffd07f1ef
parent 3b3f51d41b

    finished h2 and added R log
							
								
								
									
										
hw2/.RData  BIN  (new file; binary file not shown)
							
								
								
									
hw2/.Rhistory  (new file, 57 lines)
@@ -0,0 +1,57 @@
auto = read.table("auto.data",header=T,na.strings="?")
auto
attach(auto)
horsepower
plot(mpg ~ horsepower)
fit = lm(mpg ~ horsepower)
fit
summary(fit)
lines(fit)
abline(fit)
abline(fit,col="red")
plot(mpg ~ horsepower,pch="x")
abline(fit,col="red")
abline(fit,col="red",size="2")
abline(fit,col="red",lt="2")
help(abline)
abline(fit,col="red",lwd="2")
abline(fit,col="red",lwd="4")
summary(lm)
summary(fit)
help(predict)
predict(fit,horsepower=98)
predict(fit)
predict(fit,98)
predict(fit[98])
help(predict)
help(predict.lm)
predict(lm(mpg ~ 98)
predict(lm(mpg ~ horsepower)
predict(lm(mpg ~ 98))
pre
predict(fit,data.frame(p=98(
predict(fit,data.frame(p=98))
predict(fit,data.frame(p=c(98)))
predict(fit,data.frame(p=c(98)),interval="confidence")
predict(fit,data.frame(horesepower=c(98)),interval="confidence")
predict(fit,interval="confidence")
fit
predict(fit,data.frame(c(98))interval="confidence")
predict(fit,data.frame(p=c(98))interval="confidence")
predict(fit,data.frame(p=c(98)),interval="confidence")
predict(fit,data.frame(p=c(98)),interval="confidence")
names(fit)
coef(fit)
confint(fit)
predict(fit,data.frame(horsepower=c(1,98)),interval="confidence")
predict(fit,data.frame(horsepower=c(98)),interval="confidence")
predict(fit,data.frame(horsepower=98),interval="confidence")
predict(fit,data.frame(horsepower=98),interval="prediction")
dev.print(pdf,"mpg_horsepower_regression.pdf")
plot(fit)
par(mfrow=c(2,2))
plot(fit)
dev.print(pdf,"fit_quality.pdf")
save
save()
q()
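The history above records several false starts around predict() before landing on the right call. A cleaned-up sketch of the same session, assuming auto.data sits in the working directory as in the log:

```r
# Cleaned-up version of the logged session; "?" marks missing values.
auto <- read.table("auto.data", header = TRUE, na.strings = "?")
fit <- lm(mpg ~ horsepower, data = auto)
summary(fit)

# predict() wants new data in a data frame whose column name matches
# the predictor; misspelled or unnamed columns are silently ignored
# and the fitted values are returned instead.
predict(fit, data.frame(horsepower = 98), interval = "confidence")
predict(fit, data.frame(horsepower = 98), interval = "prediction")

plot(mpg ~ horsepower, data = auto, pch = "x")
abline(fit, col = "red", lwd = 2)   # lwd (not size or lt) sets line width
```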
							
								
								
									
hw2/answers  (155 lines)
@@ -17,19 +17,19 @@
	which of these predictors have a strong relationship with sales
	of this product.

    TV marketing and radio marketing both have a strong relationship
    to sales, according to their linear regression p-values, but
    newspaper advertising does not appear to be effective, given
    that the linear model does not account for much of the variation
    in sales across that domain. We can conclude that cutting back
    on newspaper advertising will likely have little effect on the
    sales of the product, and that increasing TV and radio
    advertising budgets likely will have an effect. Furthermore, we
    can see that radio advertising spending has a stronger
    relationship with sales, as its best-fit slope is significantly
    more positive than the best-fit slope for TV advertising
    spending, so increasing the radio advertising budget will likely
    be more effective.


@@ -40,47 +40,142 @@
   dollars). Suppose we use least squares to fit the model, and get
   β₀ = 50, β₁ = 20, β₂ = 0.07, β₃ = 35, β₄ = 0.01, β₅ = −10.

    This is the model: ŷ = 50 + 20 X₁ + 0.07 X₂ + 35 X₃ + 0.01 X₄
        − 10 X₅

        For fixed IQ and GPA, we can infer that the starting salary
        for a female sharing an IQ and GPA with her male counterpart
        will be (35·1 − 10·(GPA·1)) salary units higher than his.
        This means that at very low GPAs (maybe this includes people
        who didn't attend school?), males have a lower starting
        wage; as GPA grows, the female advantage shrinks, and males
        overtake females at GPA = 3.5.

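The crossover point claimed above can be checked directly. A small sketch, where the gap function is my own illustration rather than part of the exercise:

```r
# Female-minus-male starting-salary gap at fixed IQ, as a function
# of GPA, using the fitted coefficients β₃ = 35 and β₅ = −10.
gap <- function(gpa) 35 - 10 * gpa
gap(3.0)                      # 5: females earn more at GPA 3.0
gap(4.0)                      # -5: males earn more at GPA 4.0
uniroot(gap, c(0, 4))$root    # crossover at GPA = 3.5
```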
    (a) Which answer is correct, and why? →  iii. For a fixed value
        of IQ and GPA, males earn more on average than females
        provided that the GPA is high enough.

            This one is correct.

    (b) Predict the salary of a female with IQ of 110 and a GPA of
    4.0.

        ŷ = 50 + 20*4.0 + 0.07*110 + 35*1 + 0.01*(4.0*110) - 10*(4.0*1)

        → ŷ = 137.1 salary units
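As a quick arithmetic check of the figure above (the variable names are mine):

```r
# Evaluate the fitted model at IQ = 110, GPA = 4.0, female (X₃ = 1).
gpa <- 4.0; iq <- 110; female <- 1
y_hat <- 50 + 20*gpa + 0.07*iq + 35*female + 0.01*gpa*iq - 10*gpa*female
y_hat   # 137.1 salary units
```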

    (c) True or false: Since the coefficient for the GPA/IQ
    interaction term is very small, there is very little evidence of
    an interaction effect. Justify your answer.

        False. There is still a noticeable effect, because the
        coefficient for IQ's effect alone is only 7 times greater
        than the coefficient of the interaction term. So this term
        holds significant weight compared to the overall response of
        the model to IQ.



4. I collect a set of data (n = 100 observations) containing a
   single predictor and a quantitative response. I then fit a linear
   regression model to the data, as well as a separate cubic
   regression, i.e. Y = β₀ + β₁ X + β₂ X² + β₃ X³ + ε.

	(a) Suppose that the true relationship between X and Y is
	linear, i.e. Y = β₀ + β₁ X + ε. Consider the training residual
	sum of squares (RSS) for the linear regression, and also the
	training RSS for the cubic regression. Would we expect one to be
	lower than the other, would we expect them to be the same, or is
	there not enough information to tell? Justify your answer.

        For the training data, the cubic regression's RSS can never
        be higher than the linear regression's, because the linear
        model is nested within the cubic one. Least squares will use
        the extra terms to chase points that vary only because of
        the ε random error, so we expect the cubic training RSS to
        be at least slightly lower.

	(b) Answer (a) using test rather than training RSS.

        For the test error, the RSS will almost certainly be greater
        for the cubic model than the linear model, because the
        random error ε will likely express itself in a way that is
        inconsistent with the noise that the cubic model adopted
        during its training. The linear model becomes more likely to
        have the lower RSS as more test data is used against the
        models.

	(c) Suppose that the true relationship between X and Y is not
	linear, but we don't know how far it is from linear. Consider
	the training RSS for the linear regression, and also the
	training RSS for the cubic regression. Would we expect one to be
	lower than the other, would we expect them to be the same, or is
	there not enough information to tell? Justify your answer.

	(d) Answer (c) using test rather than training RSS.

        The cubic model will pick up more information because of its
        additional degrees of freedom. If the true relationship is
        more complex than linear, then the cubic model will likely
        have a lower RSS than the linear model. If the relationship
        is less complex than linear (e.g. perhaps it is just a
        constant), then the linear model will still be more likely
        to have the smaller test RSS, because the cubic will again
        pick up information from the ε noise that is not inherent in
        the real relationship.
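The training-RSS claim in (a) can be illustrated with a short simulation. This sketch uses data and a seed of my own choosing, not anything from the assignment:

```r
# Simulate a truly linear relationship plus noise, then compare the
# training RSS of the nested linear and cubic least-squares fits.
set.seed(1)
x <- rnorm(100)
y <- 2 + 3 * x + rnorm(100)
rss <- function(m) sum(resid(m)^2)
rss(lm(y ~ x))            # linear fit
rss(lm(y ~ poly(x, 3)))   # cubic fit: training RSS is never higher
```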



8. This question involves the use of simple linear regression on the
   Auto data set.

    (a) Use the lm() function to perform a simple linear regression
    with mpg as the response and horsepower as the predictor. Use
    the summary() function to print the results. Comment on the
    output. For example:

        There is definitely a correlation between horsepower and
        mpg. The RSE is ~4.9, which is not insignificant and does
        indicate that the response may not be truly linear, but it
        is small enough relative to the mpg magnitude that it's
        clear a relationship exists. The R² statistic corroborates
        this: at ~0.6, a substantial proportion of the mpg
        variability is explained by the model. mpg has a negative
        correlation with horsepower, indicated by the negative
        coefficient on the horsepower factor.

        For example, for a vehicle with 98 horsepower, one can
        expect with 95% confidence that the mean mpg will be between
        23.97 and 24.96, if the vehicles follow our model. However,
        after incorporating the irreducible error, the prediction
        turns out to be much less precise, with a 95% prediction
        interval spanning 14.8 to 34.1. Some of this variability
        might also be reduced by using a quadratic model, judging
        from visual inspection of the plot.
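The two intervals quoted above correspond to the final predict() calls in the session log. Assuming the Auto data are loaded as in that log:

```r
fit <- lm(mpg ~ horsepower, data = auto)
# Confidence interval: uncertainty in the mean mpg at horsepower = 98.
predict(fit, data.frame(horsepower = 98), interval = "confidence")
# Prediction interval: adds the irreducible error for a single car.
predict(fit, data.frame(horsepower = 98), interval = "prediction")
```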

    (b) Plot the response and the predictor. Use the abline() function
    to display the least squares regression line.

        Attached.

    (c) Use the plot() function to produce diagnostic plots of the least
    squares regression fit. Comment on any problems you see with
    the fit.

        Attached. From these four plots it's clear there is a lot of
        variability that remains unexplained by the linear model.
        The plot of standardized residuals against fitted values
        shows clearly that the variability is strong, with values
        consistently lying outside 1 standardized residual unit, but
        still within a tight range that doesn't extend past 3, which
        is often considered an approximate threshold for values that
        aren't explained well by the model. There are many points
        with high leverage, and these values have less residual by
        default, of course, and in both of these graphs we see a few
        points (323, 330) rearing their ugly heads. These seem to be
        the bit of "uptick" toward the higher end of the horsepower
        scale that would probably be picked up by a quadratic fit.
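The quadratic fit suggested above could be sketched as follows, again assuming the Auto data frame from the session log:

```r
# Compare the linear fit with a quadratic one via poly().
fit1 <- lm(mpg ~ horsepower, data = auto)
fit2 <- lm(mpg ~ poly(horsepower, 2), data = auto)
anova(fit1, fit2)    # tests whether the quadratic term helps
par(mfrow = c(2, 2))
plot(fit2)           # diagnostics for the quadratic model
```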