Correlation & Regression (DP IB Applications & Interpretation (AI))

Flashcards

Cards in this collection (29)

  • What is bivariate data?

    Bivariate data is data collected on two variables. Each data value from one variable is paired with a data value from the other variable.

  • True or False?

    The independent variable is plotted on the y-axis on a scatter diagram.

    False.

    The independent variable is plotted on the x-axis on a scatter diagram.

  • What type of correlation is shown in the diagram below?

    A scatter graph showing a number of data points following a rough line from the top left to the bottom right.

    The type of correlation shown is negative correlation.

    The data points follow a general trend from the top left of the graph to the bottom right.

  • True or False?

    If there is a strong positive correlation between two variables, then this indicates that increasing one variable causes the other variable to increase.

    False.

    If there is a strong positive correlation between two variables, this does not necessarily indicate that increasing one variable causes the other variable to increase.

    Correlation does not imply causation. There could be other factors which are causing the variables to increase together.

  • When is a line of best fit appropriate for bivariate data?

    A line of best fit is appropriate if there is strong linear (positive or negative) correlation.

  • True or False?

    The mean point $(\bar{x}, \bar{y})$ should always lie on a line of best fit.

    True.

    The mean point $(\bar{x}, \bar{y})$ should always lie on a line of best fit.

  • What does Pearson's product-moment correlation coefficient (r) measure?

    Pearson's product-moment correlation coefficient (r) measures the linear correlation between two variables.

  • What values can Pearson's product-moment correlation coefficient (r) take?

    Pearson's product-moment correlation coefficient (r) can take any value in the interval $-1 \le r \le 1$.
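
    The cards expect r to come from a GDC. As a minimal sketch of what the calculator does, the block below uses the standard definition $r = S_{xy} / \sqrt{S_{xx} S_{yy}}$. The function name `pmcc` and the data are made up for illustration.

```python
from math import sqrt

def pmcc(xs, ys):
    """Pearson's product-moment correlation coefficient for paired data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of products and squares of deviations from the means
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

# Hypothetical data, invented for this example only
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 60, 70]
r = pmcc(hours, score)
print(round(r, 3))  # 0.943
```

    The result lies between -1 and 1, and here it is close to 1 because the made-up points lie near an increasing straight line.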

  • True or False?

    If all the points on a scatter diagram lie on the same straight line, then $r = 1$.

    False.

    If all the points on a scatter diagram lie on the same straight line, then $r = 1$ or $r = -1$.

  • Is there evidence of linear correlation for a set of bivariate data if:

    • $r = 0.7500$,

    • the critical value is 0.4428?

    With $r = 0.7500$ and a critical value of 0.4428, there is evidence of linear correlation because the PMCC is greater than the critical value.
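
    The comparison in this card can be sketched in a few lines. The function name is hypothetical, and taking the absolute value of r (so that a strong negative correlation also counts) is an assumption that goes beyond what the card states.

```python
def evidence_of_linear_correlation(r, critical_value):
    """True when the size of r exceeds the critical value.

    Assumption: abs(r) is used so that negative correlation is handled too.
    """
    return abs(r) > critical_value

# Values taken from the card above
result = evidence_of_linear_correlation(0.7500, 0.4428)
print(result)  # True
```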

  • True or False?

    The closer the PMCC (r) is to 1, the steeper the gradient of the line of best fit.

    False.

    The PMCC does not measure the steepness of the line of best fit. The closer the PMCC is to 1, the closer the points are to a straight line.
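
    A small check of this idea, using made-up data: two perfectly straight data sets with very different gradients both give r = 1, while a steep but scattered data set gives a smaller r. The helper `pmcc` is a hypothetical name and uses the standard definition of r.

```python
from math import sqrt

def pmcc(xs, ys):
    """Pearson's product-moment correlation coefficient for paired data."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

x = [1, 2, 3, 4, 5]
shallow = [0.5 * v for v in x]           # exactly on a line with gradient 0.5
steep = [50 * v for v in x]              # exactly on a line with gradient 50
steep_noisy = [50, 120, 130, 210, 240]   # steep trend, but points are scattered

r_shallow = pmcc(x, shallow)
r_steep = pmcc(x, steep)
r_noisy = pmcc(x, steep_noisy)
print(r_shallow, r_steep, round(r_noisy, 2))
```

    Both straight-line data sets give r equal to 1 (up to rounding), whatever their gradient, and the scattered set gives a value below 1 even though its trend is steep.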

  • True or False?

    A scatter diagram where the points tend to go from the top left to the bottom right.

    The PMCC for the above scatter diagram is $r = 0.4$.

    False.

    The gradient of the line of best fit is negative, so there is a negative linear correlation. The value of the PMCC cannot be positive for this diagram.

  • What does a monotonic relationship mean?

    A monotonic relationship means that as one variable increases, the other always increases or always decreases.

  • What is the difference between Pearson's product-moment correlation coefficient and Spearman's rank correlation coefficient?

    Pearson's product-moment correlation coefficient tests for a linear relationship between bivariate data.

    Spearman's rank correlation coefficient tests for any monotonic relationship between bivariate data.

  • How do you calculate the Spearman's rank correlation coefficient?

    To calculate Spearman's rank correlation coefficient:

    • rank the data for the first variable from 1 to n, biggest to smallest (or smallest to biggest),

    • rank the data for the second variable from 1 to n, using the same method as the first variable,

    • calculate the PMCC between the two sets of rankings using your GDC.
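
    The three steps above can be sketched as follows. This is an illustration rather than a replacement for the GDC: the names `ranks`, `pmcc` and `spearman` are hypothetical, the data is made up, and the simple ranking helper assumes there are no tied values.

```python
from math import sqrt

def ranks(values):
    """Rank distinct values from 1 (smallest) to n (biggest). Assumes no ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def pmcc(xs, ys):
    """Pearson's product-moment correlation coefficient for paired data."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

def spearman(xs, ys):
    """Spearman's rank: the PMCC of the two sets of rankings."""
    return pmcc(ranks(xs), ranks(ys))

# Hypothetical data with no tied values
x = [10, 20, 30, 40, 50]
y = [3, 9, 4, 12, 15]
print(ranks(y))        # [1, 3, 2, 4, 5]
r_s = spearman(x, y)
print(round(r_s, 4))   # 0.9
```

    Both variables are ranked the same way (smallest to biggest), as the second step requires.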

  • When calculating Spearman's rank correlation coefficient, how would you assign rankings to equal values?

    For example, how would you rank the data 8, 10, 10, 7?

    When calculating Spearman's rank correlation coefficient, if some values are equal then you would give each value the average of the ranks they would occupy.

    For example, for the data 8, 10, 10, 7, ranking smallest to biggest puts the two values of 10 in 3rd and 4th place. Therefore each of those values is assigned the rank $\frac{3 + 4}{2} = 3.5$.
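
    A minimal sketch of ranking with tied values, ranking smallest to biggest. The function name `average_ranks` is hypothetical. It reproduces the example on this card.

```python
def average_ranks(values):
    """Rank from 1 (smallest) to n (biggest); tied values share the mean of their ranks."""
    ordered = sorted(values)
    result = []
    for v in values:
        first = ordered.index(v) + 1   # first rank position this value occupies
        count = ordered.count(v)       # how many times the value appears
        last = first + count - 1       # last rank position it occupies
        result.append((first + last) / 2)
    return result

print(average_ranks([8, 10, 10, 7]))  # [2.0, 3.5, 3.5, 1.0]
```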

  • True or False?

    Spearman's rank correlation coefficient can only take values in the interval $-1 < r_s < 1$.

    False.

    Spearman's rank correlation coefficient can also take the value of 1 or -1. Therefore it can take values in the interval $-1 \le r_s \le 1$.

  • True or False?

    Spearman's rank correlation coefficient for the graph below is equal to 1.

    A scatter diagram where the points are always going up and to the right.

    True.

    Spearman's rank correlation coefficient for the graph is equal to 1 because the points are always increasing.

  • Is it possible for $r = 1$ (PMCC) but $r_s < 1$ (Spearman's rank)?

    No, it is not possible for $r = 1$ (PMCC) but $r_s < 1$ (Spearman's rank).

    If $r = 1$, then the data lies on a straight line which is always increasing. Therefore $r_s = 1$.

  • Is it possible for $r_s = 1$ (Spearman's rank) but $r < 1$ (PMCC)?

    Yes, it is possible for $r_s = 1$ (Spearman's rank) but $r < 1$ (PMCC).

    If $r_s = 1$, then the points are always increasing. However, the points do not need to lie on a straight line, as shown in the example below.

    A scatter diagram where the points are always increasing.

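
    A numerical version of this situation, using made-up data on the curve y = x³: the points are always increasing, so the rankings match exactly, but they do not lie on a straight line. The helper names are hypothetical and `ranks` assumes no tied values.

```python
from math import sqrt

def pmcc(xs, ys):
    """Pearson's product-moment correlation coefficient for paired data."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

def ranks(values):
    """Rank distinct values from 1 (smallest) to n (biggest). Assumes no ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]            # always increasing, but curved

r = pmcc(x, y)                     # PMCC on the raw values
r_s = pmcc(ranks(x), ranks(y))     # Spearman's rank: PMCC on the rankings
print(round(r, 3), r_s)
```

    Here the rankings give a coefficient of 1, while the PMCC of the raw values is roughly 0.94.
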
  • Which correlation coefficient is most affected by outliers: Pearson's product-moment or Spearman's rank?

    Pearson's product-moment correlation coefficient is most affected by outliers.

  • True or False?

    The least squares regression line of y on x minimises the sum of the squares of the vertical distances between each point and the line.

    True.

    The least squares regression line of y on x ($y = ax + b$) minimises the sum of the squares of the vertical distances between each point and the line.

  • What is the equation for the regression line of y on x?

    The equation for the regression line of y on x is $y = ax + b$.

  • What does the value of a represent in the equation of the regression line of y on x ($y = ax + b$)?

    The value of a in the equation of the regression line of y on x ($y = ax + b$) represents the gradient of the line. It is the amount that y changes by if x is increased by 1 unit.

  • What does the value of b represent in the equation of the regression line of y on x ($y = ax + b$)?

    The value of b in the equation of the regression line of y on x ($y = ax + b$) represents the y-intercept of the line. It is the value that y would take if x were zero.
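
    In the course, a and b come from a GDC. As a sketch of what lies behind them, the block below uses the standard least squares formulas $a = S_{xy} / S_{xx}$ and $b = \bar{y} - a\bar{x}$. The function name `regression_line` and the data are made up for illustration.

```python
def regression_line(xs, ys):
    """Return (a, b) for the least squares regression line y = a*x + b of y on x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    a = s_xy / s_xx          # gradient: change in y per 1 unit increase in x
    b = y_bar - a * x_bar    # y-intercept: predicted y when x is zero
    return a, b

# Hypothetical data, invented for this example only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = regression_line(x, y)
print(round(a, 3), round(b, 3))  # 0.6 2.2

# The mean point (3, 4) lies on the line: a * 3 + b is 4 (up to rounding)
print(round(a * 3 + b, 3))       # 4.0
```

    Because b is defined as $\bar{y} - a\bar{x}$, the mean point $(\bar{x}, \bar{y})$ always satisfies the equation of this line.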

  • True or False?

    The regression line of y on x can reliably predict the value of x for a value of y within the given range of the data.

    False.

    The regression line of y on x cannot reliably predict the value of x for any value of y. The regression line of y on x should only be used to predict values of y.

  • What is extrapolation with a regression line?

    Extrapolation is making a prediction for the value of y for a value of x that is outside of the range of the given data.

  • What is interpolation with a regression line?

    Interpolation is making a prediction for the value of y for a value of x that is within the range of the given data.

  • Which is more reliable: interpolation or extrapolation?

    Interpolation is more reliable than extrapolation.
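
    The difference between the two kinds of prediction can be sketched in code. The function name `predict`, the coefficients and the x-values are all hypothetical. The only rule taken from the cards is whether the x-value lies inside or outside the range of the given data.

```python
def predict(a, b, x_value, x_data):
    """Predict y from y = a*x + b and say which kind of prediction it is."""
    y_value = a * x_value + b
    if min(x_data) <= x_value <= max(x_data):
        kind = "interpolation"   # x is within the range of the data
    else:
        kind = "extrapolation"   # x is outside the range of the data
    return y_value, kind

# Hypothetical regression line y = 0.6x + 2.2 fitted to data with x from 1 to 5
x_data = [1, 2, 3, 4, 5]
y_in, kind_in = predict(0.6, 2.2, 3.5, x_data)
y_out, kind_out = predict(0.6, 2.2, 12, x_data)
print(kind_in)   # interpolation
print(kind_out)  # extrapolation
```

    Only the first prediction would be treated as reliable, since 3.5 lies within the range of the data and 12 does not.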