#### Question Preview:

1. This question concerns a dataset on Mammals and the amount of sleep they have. The data is in spreadsheet Mammals.xls. The variables are:
Wt = weight of the mammal, in kg
Brain = brain size of mammal, in g.
Gest = length of gestation (pregnancy) for animal, in days
Lifespan = Maximum number of years the mammal typically lives.
Sleep = typical hours of sleep per day/night.
Dream = hours of sleep in a dreaming state
nDream = hours of sleep in a non-dreaming state
Pred = where the mammal fits on a scale from totally predator (1) to totally prey (5)
Expose = How exposed to predators the animal is when sleeping (1 = not exposed, 5 = very exposed)
Danger = An index suggesting how much danger the animal is in when sleeping (partly a combination of Pred and Expose; 1 = in little danger, 5 = in great danger)
(a) (i) Produce a scatterplot of Brain (Y) vs Wt (X), showing the least squares regression line and 95% prediction bounds (also known as individual confidence intervals).
(Hint: you may use Graphs > Chart Builder > Scatterplot and edit the graph to include the line and prediction bounds.)
(ii) Comment from the plot whether the regression line is a good model (mention any model imperfections).
(iii) Interpret the value of R², in words. (Interpret means give it meaning, not just state it.)
(iv) Identify which mammals lie outside the prediction bounds and what is unusual about them. (Name the mammals.)
(Hint: if working in SPSS, double-click on the graph, then select the Data Label Mode button. Then if you click on a point it will display the row number for the animal.)
(v) Also identify from the graph which two mammals will have the highest leverage. (Hint: you shouldn’t need to do a calculation, but explain how you know it’s them.)
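The quantities behind parts (a)(i) to (v) can also be checked by hand. The following is a minimal Python sketch, assuming a least-squares fit with a single predictor. The function names and the toy numbers are illustrative assumptions and are not taken from Mammals.xls. The critical value `t_star` is passed in, as it would be read from a t table with n - 2 degrees of freedom.

```python
import math

def fit_line(x, y):
    """Least-squares fit of y = b0 + b1 * x, returned as a dict of summaries."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return {
        "n": n, "b0": b0, "b1": b1, "x_bar": x_bar, "sxx": sxx,
        "s": math.sqrt(sse / (n - 2)),  # residual standard error
        "r2": 1 - sse / sst,            # proportion of variation in y explained
    }

def leverage(fit, x0):
    """Leverage at x0: grows with the squared distance of x0 from the mean of x."""
    return 1 / fit["n"] + (x0 - fit["x_bar"]) ** 2 / fit["sxx"]

def prediction_interval(fit, x0, t_star):
    """Individual (prediction) interval for a new observation at x0."""
    centre = fit["b0"] + fit["b1"] * x0
    half_width = t_star * fit["s"] * math.sqrt(1 + leverage(fit, x0))
    return centre - half_width, centre + half_width

# Toy data, made up for illustration only (not the Mammals data).
x = [1, 2, 3, 4, 10]
y = [2.1, 3.9, 6.2, 7.8, 20.3]
fit = fit_line(x, y)
t_star = 3.182  # tabulated two-sided 95% value for df = n - 2 = 3
lo, hi = prediction_interval(fit, 10, t_star)
leverages = [leverage(fit, xi) for xi in x]
```

In this sketch the point at x = 10 has the largest leverage because it lies furthest from the mean of X, which is why leverage can be judged from the horizontal position of a point on the scatterplot.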
(b) (i) Calculate two new variables 'logBrain' = log10(Brain) and 'logWt' = log10(Wt). Again produce a scatterplot of logBrain (Y) vs logWt (X), showing the least squares regression line and 95% prediction bounds.
(Hint: if working in SPSS, you may use Transform > Compute Variable, then type logBrain as the target variable and lg10(brain) as the numeric expression. Similarly for logWt.)
(ii) Comment on the shape of the graph of points. Is this a better model than before? Why?
(iii) Identify which mammals lie outside the prediction bounds and explain what is unusual about them. (name the mammals). You can label the points on the same graph as in part (a).
(c) Fit a regression of logBrain on logWt. Save the studentized residuals and individual 95% prediction intervals. Show relevant output.
(Hint: in SPSS, use Analyze > Regression > Linear… with logBrain as the dependent and logWt as the independent variable. Use the Save button and select the appropriate options.)
(i) Produce a normal probability plot of the studentized (i.e. standardized) residuals. Do you conclude that the errors are normally distributed? (Hint: use Analyze > Explore, and tick the option for Normality Plot with Tests.)
(ii) What is the 95% prediction interval of logBrain for Man? Convert this to a 95% prediction interval for brain weight (in grams). Do you think this is a useful interval for predicting the brain sizes of other animals that are about the same size as Man? Explain why or why not.
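The conversion in part (c)(ii) is a back-transformation from the log10 scale. A minimal sketch follows, assuming the interval limits were saved on the log10 scale. The function name and the limits used are hypothetical and are chosen only to show the arithmetic.

```python
def back_transform_log10_interval(lower_log, upper_log):
    """Map an interval for log10(Y) to the corresponding interval for Y."""
    return 10 ** lower_log, 10 ** upper_log

# Hypothetical limits on the log10 scale (not SPSS output for any mammal).
lower_g, upper_g = back_transform_log10_interval(2.0, 3.0)

# On the original scale the upper limit is 10 ** (width on the log scale)
# times the lower limit, so the interval is multiplicative, not symmetric.
ratio = upper_g / lower_g
```

An interval that is one unit wide on the log10 scale therefore spans a ten-fold range in grams, which is worth keeping in mind when judging whether the interval is useful.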
2. Still using the Mammals dataset, we now focus on the variable Sleep = the number of hours per day/night that an animal sleeps, on average.
(a) Produce a boxplot of Sleep, split by the category variable Danger. The latter is an index variable that someone has come up with for how much the animal is in danger (1 = low, 5 = high). Briefly describe the relationship (if any) between Sleep and Danger.
(Hint: in SPSS, use Graphs > Chart Builder > Boxplot with Sleep as the vertical axis and Danger as the category.)
(b) (i) In Curve Estimation, fit a model for Sleep vs Danger with Linear, Quadratic and Cubic ticked. Show the graph and the Model Summary, ANOVA and Coefficients tables for all three models (Linear, Quadratic and Cubic models respectively).
(ii) Based on the adjusted R², which is the best model?
(iii) Based on the desire for significant P-values for the t-ratios of the regression coefficients for danger, danger**2 and danger**3, which is the best model?
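For reference when comparing the three curve-estimation models, adjusted R² penalises R² for the number of predictors. A minimal sketch follows, assuming the usual formula 1 - (1 - R²)(n - 1)/(n - k - 1) with k predictors (1, 2 and 3 for the linear, quadratic and cubic models). The R² values and n below are invented for illustration and are not results from the Mammals data.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for a model with k predictors fitted to n complete cases."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical R-squared values: each extra term raises R-squared a little.
n = 21
candidates = {"linear": (0.40, 1), "quadratic": (0.41, 2), "cubic": (0.42, 3)}
adjusted = {name: adjusted_r2(r2, n, k) for name, (r2, k) in candidates.items()}

# The model with the largest adjusted R-squared is preferred on this criterion.
best = max(adjusted, key=adjusted.get)
```

With these made-up numbers the small gains in R² do not offset the extra terms, so the adjusted value falls as terms are added. The real output may behave differently.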
3. (a) Fit a regression of Sleep on Danger and logBrain (together). Save the unstandardized predicted values, unstandardized residuals, studentized residuals, leverages, standardized DFFITS, and individual 95% prediction intervals. Show the output with regression coefficients. Include partial regression plots.
(Hint: in SPSS, use Analyze > Regression > Linear… with Sleep as the dependent and both Danger and logBrain as independent variables. Use the Save button. Use Enter as the method of selection.)
(b) (i) What evidence is there that this is a better model for Sleep than the one in Question 2(b)? i.e. comment on the R², adjusted R² or S, and p-values.
(ii) Interpret the values of the slopes and intercept. In particular, what is the effect on sleep of a one-category rise in Danger, or of a ten-fold increase in brain size (g)?
(iii) What is the slope of the line and the R² for the partial regression plot of sleep vs danger?
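A partial regression plot for one predictor plots the residuals of the response (after regressing on the other predictor) against the residuals of that predictor (after regressing it on the other predictor). The slope of that plot equals the predictor's coefficient in the two-predictor regression. A minimal sketch of this identity follows, using invented toy numbers and hypothetical function names rather than the Mammals data.

```python
def centered(values):
    """Subtract the mean from every value."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def residuals(y, x):
    """Residuals from a simple regression (with intercept) of y on x."""
    yc, xc = centered(y), centered(x)
    slope = dot(xc, yc) / dot(xc, xc)
    return [yi - slope * xi for yi, xi in zip(yc, xc)]

def partial_plot_slope(y, x1, x2):
    """Slope of the partial regression plot for x1, adjusting for x2."""
    e_y = residuals(y, x2)
    e_1 = residuals(x1, x2)
    return dot(e_1, e_y) / dot(e_1, e_1)

def multiple_regression_slope(y, x1, x2):
    """Coefficient of x1 in the least-squares regression of y on x1 and x2."""
    yc, c1, c2 = centered(y), centered(x1), centered(x2)
    s11, s22, s12 = dot(c1, c1), dot(c2, c2), dot(c1, c2)
    s1y, s2y = dot(c1, yc), dot(c2, yc)
    return (s22 * s1y - s12 * s2y) / (s11 * s22 - s12 ** 2)

# Toy data, made up for illustration only.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [3, 5, 4, 8, 9]
slope_from_plot = partial_plot_slope(y, x1, x2)
slope_from_model = multiple_regression_slope(y, x1, x2)
```

The two slopes agree up to rounding, so the slope of the partial regression plot can be cross-checked against the Coefficients table.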
(c) Show the formula and numbers for calculating a confidence interval for the slope for logBrain, using the regression output and a pocket calculator. Hint: use the t* value with the closest tabulated df, or else, if your calculator will produce the number with the correct df, indicate that and quote the number.
Note that the dataset contains missing values, so n is not necessarily the same as the number of rows.
(You may check your computed CI using SPSS, but you need to show that you know how to use the specific numbers in a formula. Your answers should be similar to those of SPSS.)
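The hand calculation in part (c) has the form estimate ± t* × standard error, with residual df = n - k - 1, where n counts only the rows with no missing values among the variables in the model. A minimal sketch follows. The function names are hypothetical, and the coefficient, standard error and t* below are placeholders, not values from the regression output.

```python
def count_complete_cases(rows):
    """Number of rows with no missing (None) value in any model variable."""
    return sum(1 for row in rows if all(v is not None for v in row))

def residual_df(n_complete, k):
    """Residual degrees of freedom for a model with k predictors plus an intercept."""
    return n_complete - k - 1

def slope_confidence_interval(b, se_b, t_star):
    """Confidence interval b +/- t_star * se_b for a regression coefficient."""
    half_width = t_star * se_b
    return b - half_width, b + half_width

# Placeholder rows of (Sleep, Danger, logBrain); None marks a missing value.
rows = [(12.0, 1, 0.5), (None, 3, 1.2), (8.0, 4, None), (6.5, 5, 2.1), (10.0, 2, 0.9)]
n = count_complete_cases(rows)   # rows usable by the regression
df = residual_df(n, k=2)         # two predictors: Danger and logBrain

# Placeholder numbers, chosen only to show the arithmetic.
lower, upper = slope_confidence_interval(b=-1.5, se_b=0.5, t_star=2.0)
```

In practice b and its standard error come from the Coefficients table, and t* is looked up for the residual df reported in the ANOVA table.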
(d) Plot the unstandardized residuals vs the fitted values (i.e. unstandardized predicted values). Is the behaviour of the residuals acceptable?
(e) Plot the centered leverages vs logBrain, with separate symbols for each Danger value. What is the criterion for high leverage on your graph? Do any of the mammals have high leverage? (Hint: double-click on the “Set Colours” box and it should give you the option to set the symbols instead of the colours.)
(f) Plot Cook’s D vs logBrain. Do any of the mammals have high influence on the regression parameters?
(g) Plot the DFFITS vs logBrain. Are there any mammals that have a large influence on the fitted values of the model? (Hint: what is the formula?) If so, which mammals?
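Reference lines for the plots in parts (e) to (g) depend on n and the number of predictors k. The sketch below encodes some widely used rules of thumb. These are assumptions on my part, and the course notes may specify different cut-offs: leverage above 2(k + 1)/n (some texts use 3), |DFFITS| above 2√((k + 1)/n), and Cook's D compared with 1. It also assumes that the centered leverage saved by SPSS is the raw leverage minus 1/n.

```python
import math

def leverage_cutoff(n, k, multiplier=2):
    """Rule-of-thumb cut-off for raw leverage h: multiplier * (k + 1) / n."""
    return multiplier * (k + 1) / n

def centered_leverage_cutoff(n, k, multiplier=2):
    """The same cut-off on the centered scale (raw leverage minus 1 / n)."""
    return leverage_cutoff(n, k, multiplier) - 1 / n

def dffits_cutoff(n, k):
    """Rule-of-thumb cut-off for |DFFITS|: 2 * sqrt((k + 1) / n)."""
    return 2 * math.sqrt((k + 1) / n)

COOKS_D_CUTOFF = 1.0  # a common, fairly conservative benchmark

# Hypothetical sample size; use the number of complete cases in the real fit.
n, k = 60, 2
h_line = centered_leverage_cutoff(n, k)
dffits_line = dffits_cutoff(n, k)
```

Points above `h_line` on the leverage plot, or outside ±`dffits_line` on the DFFITS plot, would be the ones to name and discuss.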
4. Calculate logLifespan = lg10(lifespan) and logGest = lg10(gest). Also, in ‘Variable View’, change the measure for pred, expose and danger from Nominal to Scale.
(a) Produce a matrix plot of: dream pred expose danger logBrain logWt logLifespan logGest
Include smoother lines. (You do not need to comment on this graph, but you should have a look for yourself which of the variables look to be most highly related to each other.)
(Hint: in SPSS, use Graphs > Legacy dialogs > Scatter/dot > Matrix scatter and put all the above-mentioned variables in as matrix variables. Edit the graph to add in scatterplot smoother lines. You may also find it useful to change the colour of the lines.)
(b) Correlate the variables:
dream pred expose danger logBrain logWt logLifespan logGest
Which variables are significantly related to dream?
(Hint: in SPSS, use Analyze > Correlate > Bivariate and select the variables mentioned above. Use Pearson correlation. For pasting into Word, you might find it easier to select the correlation output and then use Edit > Copy Special > and change the formats to copy to Image (as shown at right). Then the output will paste as a picture, which will fit on your page.)
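Because the dataset has missing values, each correlation in part (b) may be based on a different number of mammals. A minimal sketch of a Pearson correlation computed over the pairs where both values are present follows, together with the t statistic used to test it against zero on n - 2 degrees of freedom. The function names and toy numbers are illustrative assumptions, not Mammals data.

```python
import math

def pearson_r(x, y):
    """Pearson correlation over the pairs where both values are present.

    Returns the correlation and the number of pairs used.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy), n

def t_statistic(r, n):
    """t statistic for testing correlation = 0, on n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))

# Toy data; None marks a missing value, so only four pairs are used.
x = [1, 2, 3, 4, None]
y = [1, 3, 2, 4, 7]
r, n = pearson_r(x, y)
t = t_statistic(r, n)
```

The p-value reported next to each correlation corresponds to comparing this t statistic with a t distribution on n - 2 degrees of freedom.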
(c) Regress dream on all seven predictor variables
pred expose danger logBrain logWt logLifespan logGest
Show output with Model summary table and the table with Coefficients.
Which of these variables has the biggest p-value? Drop this variable from the analysis and regress dream on the remaining six variables. Repeat this process until you have only significant variables (P<0.05) left in the list of predictors.
(Hint: in SPSS use Analyze > Regression > Linear… with dependent dream and independent variables pred expose danger logBrain logWt logLifespan logGest.
Stop the saving of output variables, and stop all plots. After each regression, copy the table for Coefficients and paste into your word document. Then repeat with one less variable ….)
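The procedure in part (c) is a backward elimination loop. A minimal sketch of the loop follows. `fit_pvalues` is a hypothetical callback standing in for "rerun the regression and read the Sig. column of the Coefficients table", and the p-values in the stub are invented to exercise the loop, so the predictor names a, b and c are placeholders rather than Mammals variables.

```python
def backward_eliminate(predictors, fit_pvalues, alpha=0.05):
    """Drop the predictor with the largest p-value until all are below alpha.

    fit_pvalues(names) must return a dict mapping each name to its p-value
    in a regression that uses exactly those predictors.
    """
    remaining = list(predictors)
    dropped = []
    while remaining:
        pvalues = fit_pvalues(remaining)          # refit with the current set
        worst = max(remaining, key=lambda name: pvalues[name])
        if pvalues[worst] < alpha:                # everything left is significant
            break
        remaining.remove(worst)
        dropped.append(worst)
    return remaining, dropped

def stub_pvalues(names):
    """Stand-in for a real refit: made-up p-values for each predictor set."""
    table = {
        ("a", "b", "c"): {"a": 0.01, "b": 0.60, "c": 0.20},
        ("a", "c"): {"a": 0.02, "c": 0.03},
    }
    return table[tuple(names)]

kept, removed = backward_eliminate(["a", "b", "c"], stub_pvalues)
```

Note that the p-values are recomputed after every drop: a variable that is non-significant in the full model (c in the stub) can become significant once a correlated predictor is removed.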
(d) Use the last regression model to describe a mammal that you would predict to spend a lot of time dreaming (e.g. relatively ……… relatively ……… relatively …….. relatively big/small brain compared to its body weight).
Does the animal that has the most dream time match your description? Comment.
