How do you know if an association of two variables is positive or negative?

In correlation analysis, we estimate a sample correlation coefficient, more specifically the Pearson Product Moment correlation coefficient. The sample correlation coefficient, denoted r, ranges between -1 and +1 and quantifies the direction and strength of the linear association between the two variables. The correlation between two variables can be positive (i.e., higher levels of one variable are associated with higher levels of the other) or negative (i.e., higher levels of one variable are associated with lower levels of the other).

The sign of the correlation coefficient indicates the direction of the association. The magnitude of the correlation coefficient indicates the strength of the association.

For example, a correlation of r = 0.9 suggests a strong, positive association between two variables, whereas a correlation of r = -0.2 suggests a weak, negative association. A correlation close to zero suggests no linear association between two continuous variables.
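For reference, the sample Pearson correlation coefficient for n paired observations (x_i, y_i) is usually written as below. The symbols are standard textbook notation rather than something taken from this text; \bar{x} and \bar{y} denote the sample means.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \; \sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

The denominator is never negative, so the sign of r comes entirely from the numerator: it is positive when above-average x values mostly pair with above-average y values, and negative when they mostly pair with below-average y values.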

It is important to note that there may be a non-linear association between two continuous variables, but computation of a correlation coefficient does not detect this. Therefore, it is always important to evaluate the data carefully before computing a correlation coefficient. Graphical displays are particularly useful to explore associations between variables.
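A small sketch can make this limitation concrete. The data below are hypothetical: y is fully determined by x (y = x squared), yet the Pearson coefficient comes out as zero because the relationship is not linear. The helper name `pearson_r` is an illustrative choice, not something defined in the text.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares of the deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: y depends exactly on x, but along a curve.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r_curved = pearson_r(xs, ys)
print(r_curved)  # prints 0.0: no linear association despite exact dependence
```

A scatter plot of these seven points would show the U-shape immediately, which is why plotting the data before computing r is worthwhile.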

The figure below shows four hypothetical scenarios in which one continuous variable is plotted along the X-axis and the other along the Y-axis.


  • Scenario 1 depicts a strong positive association (r=0.9), similar to what we might see for the correlation between infant birth weight and birth length.
  • Scenario 2 depicts a weaker positive association (r=0.2) that we might expect to see between age and body mass index (which tends to increase with age).
  • Scenario 3 might depict the lack of association (r approximately = 0) between the extent of media exposure in adolescence and age at which adolescents initiate sexual activity.
  • Scenario 4 might depict the strong negative association (r= -0.9) generally observed between the number of hours of aerobic exercise per week and percent body fat.


A study of a random sample of 100 Americans summarizes the relationship between alcohol consumption and age with a correlation coefficient r= 0.03. The value of r tells us:

Example - Correlation of Gestational Age and Birth Weight

A small study is conducted involving 17 infants to investigate the association between gestational age at birth, measured in weeks, and birth weight, measured in grams.

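| Infant ID # | Gestational Age (weeks) | Birth Weight (grams) |
|---|---|---|
| 1 | 34.7 | 1895 |
| 2 | 36.0 | 2030 |
| 3 | 29.3 | 1440 |
| 4 | 40.1 | 2835 |
| 5 | 35.7 | 3090 |
| 6 | 42.4 | 3827 |
| 7 | 40.3 | 3260 |
| 8 | 37.3 | 2690 |
| 9 | 40.9 | 3285 |
| 10 | 38.3 | 2920 |
| 11 | 38.5 | 3430 |
| 12 | 41.4 | 3657 |
| 13 | 39.7 | 3685 |
| 14 | 39.7 | 3345 |
| 15 | 41.1 | 3260 |
| 16 | 38.0 | 2680 |
| 17 | 38.7 | 2005 |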

We wish to estimate the association between gestational age and infant birth weight. In this example, birth weight is the dependent variable and gestational age is the independent variable. Thus y=birth weight and x=gestational age. The data are displayed in a scatter diagram in the figure below.


Each point represents an (x,y) pair (in this case the gestational age, measured in weeks, and the birth weight, measured in grams). Note that the independent variable (gestational age) is on the horizontal axis (or X-axis), and the dependent variable (birth weight) is on the vertical axis (or Y-axis). The scatter plot shows a positive or direct association between gestational age and birth weight. Infants with shorter gestational ages are more likely to be born with lower weights and infants with longer gestational ages are more likely to be born with higher weights.
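The direction seen in the scatter plot can be checked numerically. The sketch below computes the sample Pearson coefficient for the 17 infants listed in this example; the function name `pearson_r` and the variable names are illustrative choices.

```python
import math

# Data for the 17 infants in this example.
gestational_age = [34.7, 36.0, 29.3, 40.1, 35.7, 42.4, 40.3, 37.3, 40.9,
                   38.3, 38.5, 41.4, 39.7, 39.7, 41.1, 38.0, 38.7]   # weeks
birth_weight = [1895, 2030, 1440, 2835, 3090, 3827, 3260, 2690, 3285,
                2920, 3430, 3657, 3685, 3345, 3260, 2680, 2005]      # grams

def pearson_r(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

r_birth = pearson_r(gestational_age, birth_weight)
print(round(r_birth, 2))  # 0.82
```

The coefficient is positive and fairly close to +1, which agrees with the upward trend described for the scatter plot.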

The correlation requires two scores from the same individuals. These scores are normally identified as X and Y. The pairs of scores can be listed in a table or presented in a scatterplot.

Example: We might be interested in the correlation between your SAT-M scores and your GPA at UNC.

Here are the Math SAT scores and the GPA scores of 13 of the students in this class, and the scatterplot for all 41 students:


The scatterplot has the X values (GPA) on the horizontal (X) axis, and the Y values (MathSAT) on the vertical (Y) axis. Each individual is identified by a single point (dot) on the graph which is located so that the coordinates of the point (the X and Y values) match the individual's X (GPA) and Y (MathSAT) scores.

For example, the student named "Obs5" (in the sixth row of the datasheet) has GPA=2.30 and MathSAT=710. This student is represented in the scatterplot by the highlighted and labeled ("5") dot in the upper-left part of the scatterplot. Note that the dot is to the right of the MathSAT value of 710 and above the GPA value of 2.30.

Note that the Pearson correlation (explained below) between these two variables is .32.

Correlations have three important characteristics. They can tell us about the direction of the relationship, the form (shape) of the relationship, and the degree (strength) of the relationship between two variables.

  1. The Direction of a Relationship: The correlation measure tells us about the direction of the relationship between the two variables. The direction can be positive or negative.

    1. Positive: In a positive relationship both variables tend to move in the same direction. If one variable increases, the other also tends to increase; if one decreases, the other also tends to decrease.

      In the example above, GPA and MathSAT are positively related. As GPA (or MathSAT) increases, the other variable also tends to increase.

    2. Negative: In a negative relationship the variables tend to move in opposite directions. If one variable increases, the other tends to decrease, and vice versa.

    The direction of the relationship between two variables is identified by the sign of the correlation coefficient for the variables. Positive relationships have a "plus" sign, whereas negative relationships have a "minus" sign.

  2. The Form (Shape) of a Relationship: The form or shape of a relationship refers to whether the relationship is straight or curved.

    1. Linear: A straight relationship is called linear, because it approximates a straight line. The GPA and MathSAT example shows a relationship that is roughly linear.

    2. Curvilinear: A curved relationship is called curvilinear, because it approximates a curved line. An example is the relationship between miles-per-gallon and engine displacement of various automobiles sold in the USA in 1982, shown below. This relationship is curvilinear (and negative).


    In this course we only deal with correlation coefficients that measure linear relationships. There are other correlation coefficients that measure curvilinear relationships, but they are beyond the introductory level.

  3. The Degree (Strength) of a Relationship: Finally, a correlation coefficient measures the degree (strength) of the relationship between two variables. The measures we discuss only measure the strength of the linear relationship between two variables. Two specific strengths are:

    1. Perfect Relationship: When two variables are exactly (linearly) related, the correlation coefficient is either +1.00 or -1.00. They are said to be perfectly linearly related, either positively or negatively.

    2. No Relationship: When two variables have no relationship at all, their correlation is 0.00.

    Strengths in between are possible as well: a correlation can take any value from -1.00 through 0.00 to +1.00. Note, though, that +1.00 is the largest positive correlation and -1.00 is the largest negative correlation that is possible. Here are three examples:

    Weight and Horsepower


    The relationship between Weight and Horsepower is strong, linear, and positive, though not perfect. The Pearson correlation coefficient is +.92.

    Drive Ratio and Horsepower


    The relationship between drive ratio and Horsepower is weakly negative, though not zero. The Pearson correlation coefficient is -.59.

    Drive Ratio and Miles-Per-Gallon


    The relationship between drive ratio and MPG is weakly positive, though not zero. The Pearson correlation coefficient is .42.
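The limiting cases and the role of the sign can be illustrated with a short sketch. The straight-line data are hypothetical and chosen so that the points fall exactly on a line; `pearson_r` and `direction` are illustrative helper names, not part of the original material.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def direction(r):
    """Direction of a linear association, read from the sign of r."""
    if r > 0:
        return "positive"
    if r < 0:
        return "negative"
    return "none"

# Hypothetical points lying exactly on a rising and on a falling line.
xs = [1, 2, 3, 4, 5]
r_up = pearson_r(xs, [2 * x + 1 for x in xs])     # perfect positive
r_down = pearson_r(xs, [10 - 3 * x for x in xs])  # perfect negative
print(r_up, r_down)  # 1.0 -1.0

# The three coefficients quoted in the examples above.
for r in (0.92, -0.59, 0.42):
    print(r, direction(r))
```

Note that the slope of the line (2 versus -3) does not affect the size of r; only whether the points fall exactly on a line, and whether that line rises or falls.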

  1. Prediction: Correlations can be used to help make predictions. If two variables have been known in the past to correlate, then we can assume they will continue to correlate in the future. We can use the value of one variable that is known now to predict the value that the other variable will take on in the future.

    For example, we require high school students to take the SAT exam because we know that in the past SAT scores correlated well with the GPA scores that the students get when they are in college. Thus, we predict high SAT scores will lead to high GPA scores, and conversely.
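One common way to turn a correlation into a prediction is a least-squares line. The sketch below is an illustration under that assumption, not a method prescribed by this text. It reuses the gestational age and birth weight data from the earlier example, and `fit_line` and `predict_birth_weight` are hypothetical helper names.

```python
# Gestational age (weeks) and birth weight (grams) for the 17 infants
# from the earlier example in this document.
gestational_age = [34.7, 36.0, 29.3, 40.1, 35.7, 42.4, 40.3, 37.3, 40.9,
                   38.3, 38.5, 41.4, 39.7, 39.7, 41.1, 38.0, 38.7]
birth_weight = [1895, 2030, 1440, 2835, 3090, 3827, 3260, 2690, 3285,
                2920, 3430, 3657, 3685, 3345, 3260, 2680, 2005]

def fit_line(xs, ys):
    """Least-squares slope and intercept for predicting y from x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx            # has the same sign as the correlation
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(gestational_age, birth_weight)

def predict_birth_weight(weeks):
    """Predicted birth weight in grams for a given gestational age."""
    return intercept + slope * weeks

predicted_40 = predict_birth_weight(40.0)
print(round(slope, 1), round(predicted_40))
```

Because the correlation in these data is positive, the slope is positive, so a longer gestational age yields a higher predicted weight. Predictions of this kind are only reasonable within the range of the observed data.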

    Is the association positive or negative?

    If one variable tends to increase as the other variable increases, there is said to be a positive association. If one variable tends to increase as the other variable decreases, there is said to be a negative association. If there is no relationship between the variables, the points in the scatterplot show no pattern, and the variables are said to have no association.

    What does it mean to say that two variables are negatively associated?

    A negative, or inverse, correlation between two variables indicates that one variable tends to increase while the other decreases, and vice versa.

    What is a positive association between variables?

    Two variables have a positive association when above-average values of one tend to accompany above-average values of the other, and when below-average values also tend to occur together.
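This "above average together, below average together" description can be checked directly by counting pairs. The sketch below uses made-up numbers purely for illustration; `deviation_counts`, `hours`, and `score` are hypothetical names.

```python
def deviation_counts(xs, ys):
    """Count pairs on the same side vs. opposite sides of the two means."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    products = [(x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)]
    same_side = sum(1 for p in products if p > 0)      # both above or both below
    opposite_side = sum(1 for p in products if p < 0)  # one above, one below
    return same_side, opposite_side

# Hypothetical paired measurements, made up for illustration only.
hours = [1, 2, 3, 4, 5, 6]
score = [52, 55, 61, 60, 68, 70]

same_side, opposite_side = deviation_counts(hours, score)
print(same_side, opposite_side)  # 4 1
```

When same-side pairs dominate, the products of deviations are mostly positive and the association is positive; when opposite-side pairs dominate, it is negative. A pair sitting exactly on one of the means contributes a product of zero and is counted in neither group.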

    How to determine the direction of the association between two variables?

    The direction of the relationship between two variables is identified by the sign of the correlation coefficient for the variables. Positive relationships have a "plus" sign, whereas negative relationships have a "minus" sign.