Experimental Linguistics Module – Autumn 2011

Tuesday 2-4pm, Bancroft Building room 102.6

  • Module Description

    The goal of this module is to take students with no prior training in the methods or tools of experimental psychological science and provide them with the theoretical and practical training required to be able to critically engage with the Psycholinguistics literature and to undertake experimental linguistics research themselves. The module will include hands-on training in inferential statistics and hypothesis testing, experimental design, data collection (including training in ethical human subjects research protocols), and data analysis. The module will also engage students in considering strengths and limitations of various kinds of linguistics data, and how multiple sources of data and methods of data collection can be combined to enhance understanding. Students will develop their critical reading skills and gain practice in presenting primary source literature to their peers.

experiment results – it worked!

Posted by Linnaea on December 22, 2011

Finally, I have some results to share with you. Sorry it took longer than anticipated – the data set turned out to be larger and more complicated than I expected, and required a lot of processing to extract the comparisons we were interested in. There are still plenty of analyses I haven’t had time to run, but at least I’ve managed to address the basic questions that motivated this research in the first place.

As you know, we had two hypotheses: first, that English speakers would find pairs like /aka/~/akka/ harder to discriminate than speakers of Hindi or Urdu would, because of the different status of gemination in the grammars of English vs. Hindi/Urdu speakers; and second, that English speakers would find pairs like /aka/~/akka/ harder to discriminate than pairs like /aka/~/aga/, which also differ in only a single feature, while for Hindi/Urdu speakers there would be no difference between discriminating on the basis of consonant length and on the basis of consonant voicing.

To test these two hypotheses, some preprocessing was needed. In our experiment, we measured which of the two response buttons participants pushed on each trial, which told us whether they had correctly perceived the two stimulus tokens as the same or different.

Before beginning the analyses, I collated the data from all of the various log books and determined that our participants had the following characteristics:

    1. we had 140 participants in total: 93 English native speakers and 47 Hindi or Urdu native speakers
    2. 51 of the English participants and 23 of the Hindi/Urdu participants were female (55% and 49% respectively), which is a very even gender balance (impressive given I never actually instructed you to try to run equal numbers of men and women)
    3. the English speakers ranged from 17 to 42 years old (mean 23.4) and the Hindi/Urdu speakers from 19 to 40 (mean 27.1)

So we had slightly fewer than twice as many English speakers as Hindi/Urdu speakers, but otherwise the two populations were very well matched for gender and age.
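If you want to double-check the reported percentages, they follow directly from the raw counts above (a quick sketch; all numbers come straight from the post):

```python
# Checking the reported gender percentages against the raw counts
# (all numbers are the ones given above; nothing new here).
english_total, hindi_urdu_total = 93, 47
english_female, hindi_urdu_female = 51, 23

pct_english = 100 * english_female / english_total           # ~54.8
pct_hindi_urdu = 100 * hindi_urdu_female / hindi_urdu_total  # ~48.9

print(f"{pct_english:.0f}% and {pct_hindi_urdu:.0f}%")  # 55% and 49%
```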

The next step was to look at these participants and see whether any should be excluded from analysis. With a set this large, it would be normal to have a few participants who just didn’t do the task properly, or a few cases where the computer didn’t accurately record responses. To check, I used the ‘same’ trials. All our hypotheses are about how well speakers are able to discriminate pairs of sounds that differ in some feature – we didn’t have any hypotheses about the trials that involved two identical tokens (although you’re welcome to think some up).

So I calculated response rates for each subject on the ‘same’ trials. There were 80 same trials in the experiment, so every subject should have responded to at least 75 of them (even allowing for occasionally missing a trial). Instead, there were 6 subjects for whom we had fewer than 60 responses to these trials. Either these subjects were not bothering to press the response buttons or the computer was not registering their responses; either way, I couldn’t trust that the data from these subjects would be reliable, so I excluded them from further analysis (4 of the excluded subjects were English speakers and 2 were Hindi/Urdu speakers, which is in proportion to the total number of subjects from each group).
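The exclusion step can be sketched roughly like this, assuming a hypothetical `responses` mapping from subject ID to how many of the 80 ‘same’ trials that subject responded to (the real data came from the session logs, and the actual processing was more involved):

```python
# Sketch of the exclusion criterion: drop any subject with fewer
# than 60 recorded responses on the 80 'same' trials.
SAME_TRIALS = 80
MIN_RESPONSES = 60  # the cutoff used in the post

responses = {"S01": 79, "S02": 80, "S03": 42, "S04": 77}  # made-up counts
assert all(n <= SAME_TRIALS for n in responses.values())

excluded = [s for s, n in responses.items() if n < MIN_RESPONSES]
kept = [s for s in responses if s not in excluded]

print("excluded:", excluded)  # excluded: ['S03']
```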

Then, for the remaining 134 subjects, I calculated the accuracy rate per condition on the ‘different’ trials. In the experiment, we had 6 conditions that involved a one-feature mismatch between the two sound tokens. The first two conditions manipulated consonant length:

a. [gem] [sing] e.g. /akka/ /aka/

b. [sing] [gem] e.g. /aka/ /akka/

The next two conditions kept length constant, but manipulated consonant voicing:

c. [voiceless] [voiced] e.g. /aka/ /aga/

d. [voiced] [voiceless] e.g. /aga/ /aka/

And finally the last two conditions manipulated the place of articulation of the consonant:

e. the first consonant was more anterior than the second, e.g. /apa/ /ata/

f. the first consonant was more posterior than the second, e.g. /aka/ /ata/

Analysing the place of articulation manipulation would require a more complicated analysis than the other conditions, since we expect important differences in the perceptibility of the pairwise contrasts depending on whether they involve phonemes of the language or not. English speakers should have more difficulty distinguishing /k/ from /c/ or /d/ from /D/ than Hindi/Urdu speakers do, but should be just as good at distinguishing /p/ from /t/. Since place of articulation was not the primary variable of interest in this experiment, I elected to skip this analysis for now.

Also, since we did not actually have any predictions about the order of the two elements (for example, we have no reason to think that discriminating /g/ from /gg/ should be harder if /g/ is first), I collapsed conditions (a) and (b) together and (c) and (d) together. This then left me with two scores for each subject: their average accuracy in discriminating geminate from singleton stops, and their average accuracy in discriminating voiced from voiceless stops.
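The collapsing step amounts to averaging the two order conditions per subject. A minimal sketch, with made-up scores for one subject (condition labels follow the post: a/b are the length conditions, c/d the voicing conditions):

```python
# Collapse the order conditions: average (a, b) into one length score
# and (c, d) into one voicing score per subject.
subj_scores = {"a": 0.70, "b": 0.74, "c": 0.80, "d": 0.78}  # made-up

length_acc = (subj_scores["a"] + subj_scores["b"]) / 2   # mean of a and b
voicing_acc = (subj_scores["c"] + subj_scores["d"]) / 2  # mean of c and d
```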

I then calculated some basic descriptive statistics for the two language groups:

% of trials correctly discriminated (standard deviation in parentheses)

             Length          Voicing         Average
English      70.53 (19.47)   77.40 (12.52)   73.97 (15.99)
Hindi/Urdu   78.55 (14.88)   81.09 (9.12)    79.82 (12.01)
Average      74.54 (17.17)   79.25 (10.83)   76.89 (14.00)

Even without doing any inferential statistical analysis, we can see that there are differences between the languages and conditions, in the predicted directions. English speakers are 7% more accurate on the voicing discrimination trials than on the length discrimination trials (there is a difference in the same direction for Hindi/Urdu speakers as well, but it’s only 2.5%). In addition, Hindi/Urdu speakers are 8% more accurate than English speakers on the length discrimination trials, but only 3.7% more accurate on the voicing trials.

So far, so good. It looks like Hindi/Urdu speakers did somewhat better than English speakers at the discrimination task across the board (6% better), and that perceiving a voicing contrast is somewhat easier than perceiving a length contrast (4.7%) in both groups, but it also looks like there is an additional cost specifically for English speakers discriminating length.
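That additional cost is readable directly off the cell means in the table above as a difference of differences:

```python
# The "additional cost" as a difference of differences over the cell
# means (values in % correct, copied from the table above).
means = {
    ("English", "length"): 70.53,
    ("English", "voicing"): 77.40,
    ("Hindi/Urdu", "length"): 78.55,
    ("Hindi/Urdu", "voicing"): 81.09,
}

eng_gap = means[("English", "voicing")] - means[("English", "length")]
hu_gap = means[("Hindi/Urdu", "voicing")] - means[("Hindi/Urdu", "length")]
extra_cost = eng_gap - hu_gap  # what the interaction term will pick up

print(f"{eng_gap:.2f} - {hu_gap:.2f} = {extra_cost:.2f}")  # 6.87 - 2.54 = 4.33
```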

And happily, this is confirmed by the inferential statistics.

I performed a linear mixed effects analysis that tested the following model:

accuracy ~ lang * difference + (1 | subj)

This is read as:

dependent measure ~ variable1 * variable2 + (source of additional, random variation)

What this model investigates is the individual effect of our two variables of interest on our dependent measure, AND any possible interaction of effects between these two variables, while accounting for the random variation introduced by differences between subjects.
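For those curious what `lang * difference` expands to on the fixed-effects side: under treatment coding it becomes an intercept, one term per variable, and their product (the interaction). A hand-rolled sketch of one design-matrix row, with English and length as the (arbitrary) reference levels:

```python
# One fixed-effects design-matrix row for `lang * difference`
# under treatment coding.
def design_row(lang, difference):
    is_hindi_urdu = 1 if lang == "Hindi/Urdu" else 0
    is_voicing = 1 if difference == "voicing" else 0
    #       [intercept, lang,       difference, lang:difference]
    return [1, is_hindi_urdu, is_voicing, is_hindi_urdu * is_voicing]

print(design_row("English", "length"))      # [1, 0, 0, 0]
print(design_row("Hindi/Urdu", "voicing"))  # [1, 1, 1, 1]
```

The `(1 | subj)` term then adds, on top of these shared coefficients, a random intercept for each subject, absorbing the fact that some people are simply better at the task overall.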

And what emerges from the analysis is:

a. Both speaker language and stimulus difference type have significant main effects on accuracy rates (both p<0.005)

b. There is a significant interaction between speaker language and difference type (p<0.025)

Remember that p values represent the probability of obtaining data at least as extreme as ours given the null hypothesis that our variables of interest had no effect on our dependent measure. So very small p values mean that data like ours would be very unlikely under the null hypothesis, which gives us confidence to reject the null hypothesis and conclude that our variables DID affect the responses.
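If that definition feels abstract, here is an intuition pump (not the analysis I actually ran): a tiny permutation test on made-up scores, which estimates a p value by asking how often shuffling the group labels produces a difference at least as large as the observed one:

```python
# Permutation-test illustration of what a p value measures:
# the proportion of label shufflings yielding a group difference
# at least as large as the one actually observed.
import random

random.seed(0)  # fixed seed so the sketch is reproducible

group_a = [0.70, 0.65, 0.72, 0.68, 0.74]  # hypothetical accuracies
group_b = [0.80, 0.78, 0.83, 0.79, 0.81]

def mean(xs):
    return sum(xs) / len(xs)

observed = abs(mean(group_a) - mean(group_b))

pooled = group_a + group_b
n_perm = 10_000
hits = 0
for _ in range(n_perm):
    random.shuffle(pooled)
    diff = abs(mean(pooled[:5]) - mean(pooled[5:]))
    if diff >= observed - 1e-12:  # tolerance for float noise
        hits += 1

p = hits / n_perm
print(f"estimated p = {p:.4f}")  # small: groups this separated are rare under the null
```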

I also performed planned pairwise comparisons to investigate these main effects and the interaction. We find significant differences between English and Hindi/Urdu length discrimination accuracy (t=-2.58, p=0.0112) and between English length vs. voicing discrimination accuracy (t=-2.656, p=0.0088), but not between English and Hindi/Urdu voicing discrimination accuracy (t=-1.8889, p=0.061) or Hindi/Urdu length vs. voicing discrimination accuracy (t=-0.979, p=0.331).
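As a sanity check, you can get close to the first of these t values with a back-of-envelope Welch’s t computed from the summary statistics in the table, using the post-exclusion group sizes (93 − 4 = 89 English, 47 − 2 = 45 Hindi/Urdu). It needn’t match exactly, since the reported comparisons presumably came out of the model rather than from raw group means:

```python
# Welch's t for English vs. Hindi/Urdu length discrimination accuracy,
# from the summary statistics and post-exclusion group sizes.
import math

m1, sd1, n1 = 70.53, 19.47, 89  # English, length discrimination
m2, sd2, n2 = 78.55, 14.88, 45  # Hindi/Urdu, length discrimination

se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)  # standard error of the difference
t = (m1 - m2) / se
print(f"t = {t:.2f}")  # about -2.65, near the reported -2.58
```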

Pretty cool, I think. 🙂

There are dozens more analyses that I could do with more time. But I think this is more than enough to give you all something to write about in your results and discussion section, and sometimes it’s better to stop while you’re ahead.

I’m looking forward to reading your interpretations of these results, and your perspectives on what they mean.

If you have questions about these results, or about how to make use of them in your reports, post those questions here. I will not be checking my qmul email until January.


