Correlation

 

In a research project, respondents were given a test that measured their general knowledge of promotion.  The test was scored from zero to 100.  The respondents were then shown a series of well-known ads that had aired on TV over the last ten years.  Their ability to recognize these ads was also scored from zero to 100.  A client wants to know whether promotional knowledge is related to how many ads a person can remember.  Both of these measures are continuous and constitute interval data.  None of the “Stat of the Week” procedures outlined so far can answer the client’s question.

 

What is needed is a measure of association, sometimes called concomitance.  This problem became very important after Darwin’s evolutionary theories were published, but an entire generation of statisticians could not come up with a satisfactory solution.  Karl Pearson solved the problem about 1896, and the statistic is named after him.  Correlation is symbolized by the letter “r” and is formally called the Pearson Product Moment Correlation Coefficient.  The problem turned out to be a simple one once Pearson thought of the key.  If the two scores (called “x” and “y”) from each person are multiplied together and the products are summed across all people (this is called the sum of the cross products), the sum will be at its maximum when the two variables rank the people in the same order.  In other words:

$$\sum xy = \text{maximum}$$

when x and y are in perfect order.  The same sum of cross products will be at its minimum if x and y are in perfect inverse order.  This fact can be used to produce a coefficient of concomitance.  To do so, each score is first standardized, that is, expressed as the number of standard deviations it lies from the mean of its own distribution:

$$z_x = \frac{x - \bar{x}}{s_x}, \qquad z_y = \frac{y - \bar{y}}{s_y}$$
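The ordering property described above (the sum of the cross products is largest when x and y are in the same order and smallest when they are in inverse order) can be checked by brute force.  The short Python sketch below is only an illustration: the scores are made up, and the function name `sum_cross_products` is an assumption, not something from the class materials.

```python
from itertools import permutations

def sum_cross_products(xs, ys):
    """Return the sum of x*y over paired scores."""
    return sum(x * y for x, y in zip(xs, ys))

# Made-up scores for illustration only (not data from any study).
x = [10, 20, 30, 40]
y = [1, 2, 3, 4]

# Pair the fixed x scores with every possible ordering of the y scores.
totals = {perm: sum_cross_products(x, perm) for perm in permutations(y)}

best = max(totals, key=totals.get)    # ordering with the largest sum
worst = min(totals, key=totals.get)   # ordering with the smallest sum

print("largest sum:", totals[best], "with y ordered as", best)
print("smallest sum:", totals[worst], "with y ordered as", worst)
```

The largest sum occurs when y is in the same order as x, and the smallest when y is in the reverse order, which is the fact Pearson's coefficient builds on.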

 

 

The sum of the cross products of the standardized scores then becomes:

$$\sum z_x z_y$$

 

 

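This sum still depends on how many people were measured, so it is divided by the sample size.  The correlation then is simply:

$$r = \frac{\sum z_x z_y}{N}$$

where z_x and z_y are the standardized scores and N is the sample size.  (If the standard deviations are computed with N – 1 in the denominator, the sum is divided by N – 1 instead.)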
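The steps just described (standardize each score, multiply the paired standardized scores, sum the products, and divide by the sample size) can be sketched in a few lines of Python.  This is a minimal sketch, not the procedure SPSS uses internally; the function names and the scores are made up for illustration, and the standard deviation is assumed to be the population form (dividing by N) so that it matches dividing the cross products by N.

```python
from math import sqrt

def z_scores(values):
    """Express each score as standard deviations from its own mean."""
    n = len(values)
    mean = sum(values) / n
    # Population standard deviation (divide by N); assumes scores are not all equal.
    sd = sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]

def pearson_r(xs, ys):
    """Average cross product of the paired standardized scores."""
    zx, zy = z_scores(xs), z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)

# Made-up scores for five respondents (illustration only).
knowledge = [55, 60, 70, 80, 95]
recognition = [50, 65, 60, 85, 90]

r = pearson_r(knowledge, recognition)
print(f"r = {r:.3f}")
```

Scores in perfect order give r = +1 and scores in perfect inverse order give r = –1, so the coefficient is bounded no matter what scale the original tests used.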

 

 

The correlation “r” can range from –1 to +1, with zero indicating no linear association.  If r is squared, it gives the proportion of the variation in one variable that can be accounted for by knowing the other.  The correlation can be either positive or negative.  The sign does not indicate the magnitude of the relationship, only its direction; the magnitude is given by the absolute value of r.  Also note that this measure of association says nothing at all about what “causes” what.  Remember that three conditions are necessary to establish cause, and correlation establishes only one of these.
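As a small worked example of squaring r, take two hypothetical correlations (made-up values, not results from any data) that differ only in sign:

$$r = +0.60 \;\Rightarrow\; r^2 = 0.36 \qquad\qquad r = -0.60 \;\Rightarrow\; r^2 = (-0.60)^2 = 0.36$$

In both cases 36 percent of the variation in one variable can be accounted for by knowing the other.  The two relationships are equally strong; they differ only in direction.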

 

What to do:

 

1. Ask ten people to fill out the questionnaire handed out in class.  It is extremely important that you do not fabricate any of this data.

2. Input the data, combined with the data from all the other members of your research group, into SPSS.  In other words, if there are four members of your group, you will have 40 cases to input.  Name each variable in the dataset and describe how each variable will be translated into a number.  In other words, decide what question 8 will be called and what number will go with a) and b), etc. 

 

3. Do the following for every variable except numbers 8 through 10:

            Analyze

                        Correlate

                                    Bivariate

                                                [put all 9 variables into the “Variables” box]

                                                Run the statistic

 

4. Cut and paste the output into your report.

 

5. Show how you decided to enter every variable into the dataset (its name and the number assigned to each answer).

 

6. Explain how each variable is related to both “class overall” and “the instructor overall.”  Calculate the square of each correlation and explain what it means.
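For anyone who wants to check the SPSS output by hand, the coding and correlation steps above can be sketched in Python.  Everything named here is an assumption made for illustration: the codebook entry `q8_example`, the variable names `workload`, `class_overall`, and `instructor_overall`, and all of the ratings are made up and are not the actual questionnaire items or real responses.

```python
from math import sqrt

# Hypothetical codebook: how a lettered answer is translated into a number.
CODEBOOK = {"q8_example": {"a": 1, "b": 2}}

raw_answers = ["a", "b", "a"]  # made-up answers
coded_answers = [CODEBOOK["q8_example"][answer] for answer in raw_answers]

def pearson_r(xs, ys):
    """Pearson correlation computed from deviations about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up ratings for five respondents (illustration only).
ratings = {
    "workload": [1, 2, 3, 4, 5],
    "class_overall": [2, 1, 4, 3, 5],
    "instructor_overall": [5, 4, 3, 2, 1],
}

targets = ("class_overall", "instructor_overall")
report = {}
for name, scores in ratings.items():
    if name in targets:
        continue
    for target in targets:
        r = pearson_r(scores, ratings[target])
        report[(name, target)] = (r, r ** 2)  # keep r and r squared together
        print(f"{name} vs {target}: r = {r:.2f}, r squared = {r ** 2:.2f}")
```

With your own data, each printed r should match the corresponding cell of the SPSS correlation table, and r squared is the proportion of variation you are asked to interpret in step 6.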