Transcription conventions
The project has established a set of basic conventions to keep
transcriptions uniform. They must be strictly adhered to if the data are
to be comparable.
We transcribe under PRAAT, on a single tier, regardless of the
number of speakers involved. Interval boundaries are added according to
the logic of turn-taking (a new interval for each new turn). However, if
stretches within boundaries are too long, a true phonemic/phonetic
alignment may prove difficult at a later stage. We therefore request
that interval units should not normally exceed 15 seconds. No carriage
returns are used, and the speaker is identified at the beginning of each
interval.
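
The interval rules above lend themselves to a mechanical check. The
following Python sketch assumes that the intervals of the tier have
already been read into (start, end, label) tuples with times in seconds.
The function name check_interval, the tuple layout, and the treatment of
empty labels as silence are illustrative assumptions, not part of the
project's tools.

```python
MAX_INTERVAL_SECONDS = 15.0  # "should not normally exceed 15 seconds"


def check_interval(start, end, label):
    """Return a list of warnings for one interval (hypothetical helper)."""
    warnings = []
    if not label.strip():
        return warnings  # assumption: an empty interval is silence
    if end - start > MAX_INTERVAL_SECONDS:
        warnings.append("interval longer than 15 seconds")
    if "\n" in label or "\r" in label:
        warnings.append("carriage return inside interval")
    return warnings


# Made-up timings, used only to exercise the check.
intervals = [
    (0.0, 4.2, "F: So, do your parents agree with you?"),
    (4.2, 22.0, "JF: Well, not really."),
]
for start, end, label in intervals:
    for warning in check_interval(start, end, label):
        print(f"{start:.1f}-{end:.1f}: {warning}")
```

Since the 15-second limit is a recommendation ("not normally"), the
sketch reports warnings rather than rejecting the interval.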

Simplified punctuation
The punctuation system is simplified: the full stop, the comma and the
question mark are the only symbols from traditional spelling used for
the transcription of discourse in the project.
JV: I don't know what to do
with it, I mean I've never looked at a language that way, which is sort
of going out and not knowing anything.
Commas indicate a brief pause in the discourse, or a ‘non-final’,
‘continuing’ intonation contour marked by a shift in pitch or other cues.
TB: So I was home. I won
the airline tickets.
Full stops stand for a relatively long pause in the discourse, or for a
‘final’ intonation contour.
DH: How many of these are
you going to have?
A question mark is inserted at the end of a question.
NB:
- Pauses and intonation contours do not always coincide with
  expectations based on syntax.
- Pauses and intonation units are not rigorously distinguished in the
  orthography employed here; a finer suprasegmental transcription
  remains an optional subsequent task.
- Commas are used between repeated words or expressions.
- An exceptionally long pause in an otherwise logically/syntactically
  coherent sequence is indicated by a parenthetical remark:
LC: but overall I’d say,
(silence) a little less than half, of those who apply.

Turn taking
At the beginning of each turn the speaker is identified by his/her
initials, which are followed by a colon (a space is inserted on its
right, but none on its left). The fieldworker is designated by the
letter F.
F: So, do your parents
agree with you?
JF: Well, not really.
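
The speaker prefix described above (initials, a colon with no space on
its left and one space on its right) can be checked with a regular
expression. This is a minimal sketch. The helper name split_speaker is
hypothetical, and the assumption that initials consist only of capital
letters is ours.

```python
import re

# Assumption: initials are one or more capital letters ("F", "JF", "TS").
SPEAKER_PREFIX = re.compile(r"^([A-Z]+): (?=\S)")


def split_speaker(label):
    """Return (initials, text), or None if the prefix is malformed."""
    match = SPEAKER_PREFIX.match(label)
    if match is None:
        return None
    return match.group(1), label[match.end():]


print(split_speaker("JF: Well, not really."))
print(split_speaker("JF : Well, not really."))  # space before the colon
```

A label with a space before the colon, or with no space after it, is
reported as None so that the transcriber can correct it by hand.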
As mentioned above, there is no carriage return to mark the end of a
sentence or paragraph. The discourse of a single speaker is transcribed
continuously under PRAAT (with regularly added interval boundaries, each
unit being headed by the initials of the speaker).

Overlapping turns
Turns often overlap in a conversation; three types of interventions are
distinguished in the transcription:
Background responses (typical fillers such as ‘yeah’ and ‘really’,
laughter, and vocal or other noises uttered by the listener to maintain
interaction) are ignored.
Short interventions – i.e. when the listener interrupts the speaker
but does not initiate a new turn, and the speaker goes on speaking – are
transcribed within angled brackets in the following manner:
LC: So it’s, it’s that the
approach <F: The approach.> is different.
DR: I mean he may get uh,
<F: But Nixon came back. I think if I remember he was beaten once and
then.> yeah, yeah that's pretty unusual, pretty unusual.
F: So it's really your grandparents who are Japanese speakers? <JF: Yeah.>
Your mum and dad are really English speakers <JF: Yeah.> their, their
first language is English?
When a listener interrupts the speaker and then ‘takes over’ the
conversation, his/her words uttered at the same time as those of the
previous speaker are transcribed between angled brackets as indicated
above, and a new turn is marked by a new interval (under PRAAT).
F: Do you feel American above all or what do you feel? <TS: Sure I,>
TS: I guess I don't know what that really means, (laughter) I've,
you know I’m an American but, I don't, I'm not like, ‘yeah I'm an
American’ you know
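
Because interventions are enclosed in angled brackets and headed by the
initials of the intervening speaker, they can be separated from the main
turn automatically. The sketch below is illustrative only. The function
name extract_interventions is hypothetical, and it assumes that
interventions are never nested.

```python
import re

# Assumption: interventions are not nested and contain no '<' or '>'.
INTERVENTION = re.compile(r"<([A-Z]+): ([^<>]*)>")


def extract_interventions(text):
    """Return (main_text, [(speaker, words), ...]) for one turn."""
    interventions = INTERVENTION.findall(text)
    main_text = INTERVENTION.sub("", text)
    # Removing an intervention leaves a double space behind; collapse it.
    main_text = re.sub(r"\s{2,}", " ", main_text).strip()
    return main_text, interventions


turn = "LC: So it's, it's that the approach <F: The approach.> is different."
main_text, interventions = extract_interventions(turn)
print(main_text)
print(interventions)
```

Keeping the interventions as (speaker, words) pairs makes it possible to
count each speaker's words separately at a later stage.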

Truncation of words
A slash (followed by a space) indicates unfinished words:
TS: You think you have this
demo/ democratic freedom but it's, not really there.
LC: the col/ the faculty
are looking for a good fit.
Truncated intonation units (when speakers do not finish their train of
thought, are interrupted, or hesitate, etc.) are marked by a comma or a
full stop:
TS: you know I am an
American but, I don't, I'm not like, ‘yeah I'm an American’ you know
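
The slash convention for unfinished words makes truncations easy to
retrieve. The following sketch (the name find_truncated_words is
hypothetical) lists every word fragment that is immediately followed by
a slash and a space, and leaves forms such as his/her untouched because
no space follows the slash.

```python
import re

# A word fragment, then a slash, then whitespace or the end of the text.
TRUNCATED_WORD = re.compile(r"([A-Za-z']+)/(?=\s|$)")


def find_truncated_words(text):
    """Return the unfinished word fragments found in a transcription."""
    return TRUNCATED_WORD.findall(text)


print(find_truncated_words(
    "TS: You think you have this demo/ democratic freedom but it's, "
    "not really there."
))
```

Truncated intonation units cannot be found this way, since they are
marked only by an ordinary comma or full stop.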

Repetition
Repeated words or expressions are separated by a comma.
DR: I, I like to go skiing
in the snow, but I don't want to have to dig my way out of it every day.
JF: I think it's true that,
that, there is racism in, racism in, in California but it's really
well-hidden.
NB: Commas mark both repetition and short pauses in the discourse. Thus,
in the following example, the first comma stands for a short pause, the
second for a repetition, the third for a repetition that coincides with
a short pause, and the fourth for a short pause:
JF: Uh, it's okay it's
you know it's, it's really, it's really weird teaching you know, I
don't know.

Parentheses
Observations made by the transcriber on non-linguistic aspects of the
interaction (noises, stammering, laughter, etc.) and on the recording
(background action, quality problems) are placed between parentheses.
DG: That's at the beginning
of the week so it's hard to remember. (laughter) Uh, we read a couple of
theoretical texts comparing irony to allegory,
TB: My father, he is from
Canada. (door opens, F returns) Actually he was born in Massachusetts.
Unintelligible words are indicated by the capital letter X in
parentheses. The number of Xs inserted (ideally) corresponds to the
number of incomprehensible syllables:
JV: because not (XX) all the cases are uh, show up in the pronoun
system,
Words are often hard to decipher because of noise or other interference;
in such cases the transcriber's comments are inserted in separate
parentheses:
RF: kicked everyone out of
the airport and made to go you know (noise) (X) shoot the bag and see if
it blows up, and uh,
In cases where the transcriber thinks s/he has probably recognized a word
(or sequence of words) but is not fully sure, the word is put in
parentheses:
JG: Maybe I'll stay in the
technology sector, and uh hopefully do something with creativity, like
maybe product
design, or writing you know (maybe) marketing oriented, something like
that. (laughter)
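
The parenthesis conventions above can be tallied automatically, for
instance to estimate how many syllables of a recording remain
unintelligible. This is a sketch under stated assumptions: the function
name classify_parentheticals is hypothetical, and parentheses are assumed
not to be nested.

```python
import re

# Assumption: parentheses are never nested.
PARENTHETICAL = re.compile(r"\(([^()]*)\)")


def classify_parentheticals(text):
    """Return (syllable counts of unintelligible stretches, other remarks)."""
    unintelligible = []
    other = []
    for content in PARENTHETICAL.findall(text):
        if content and set(content) == {"X"}:
            unintelligible.append(len(content))  # one X per syllable
        else:
            other.append(content)
    return unintelligible, other


print(classify_parentheticals(
    "RF: made to go you know (noise) (X) shoot the bag and see if it blows up"
))
```

The second list mixes transcriber comments such as (noise) with uncertain
words such as (maybe). The notation does not distinguish the two, so they
have to be told apart by hand.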

Reported speech
Reported speech is transcribed between inverted commas (‘ ’):
DR: And then when Bush said
‘read my lips no new taxes’ and then, you know,
TB: And there was a woman
at the other line and she said, ‘oh no message’, and so I was
TB: and she said I had won
the prize and I said ‘didn’t you just call’

Some features of spoken English in relation to spelling
Obviously, many reductions and contractions occur in spontaneous speech.
Contracted forms are used in our transcriptions only in so far as
they are allowed in standard spelling. Note, in the following example,
the co-occurrence of a non-contracted and a contracted form, the former
bearing a slight emphasis.
JG: Yeah I have heard that
and also I've heard that he seems to be very needy of getting votes.
Sometimes non-contracted forms appear in a more formal style:
F: And were your parents
from there?
TS: My mom has lived in Los
Angeles all her life.
Word-internal ellipsis is an equally frequent feature of spoken English.
To save effort at the initial stage of transcription, such deletions are
not transcribed: the word is written in its full standard form. The
examination of these features is left to the phonological/phonetic stage
of the analysis.
LC: Some very, very
intelligent young people, will apply but not do well here because they
needed more structure.
(and not ‘cause)
LC: a portfolio for music,
you know original music compositions
(and not ‘riginal)
But note that we do not reintroduce words (or word sequences) which
appear to have been missed out (in relation to normative grammar). Thus
if what we hear is:
F: Was she there?
LC: Think so.
We do not transcribe:
F: Was she there?
LC: I think so.
Realizations for which standard orthography offers distinctions will be
transcribed accordingly. Thus the distinction between yes and
yeah is systematically respected in the transcription.
TS: I don't know. Yeah. <F:
It's confusing.> It's confusing. (laughter) Yes there, there's a lot
involved and I think, to be, to say a real opinion on it you have, I
have to be, really informed.
F: But do you feel now you're from California? <TS: Yes.> That you're
Californian? <TS: Yes. I guess. (laughter)>
Interjections are another characteristic
feature of conversations, employed to express pain, surprise (ouch,
oops) etc., or simply to provide feedback and to signal active
participation towards the other party in the discourse (uh huh,
oh, ah, hm). For these speech forms, we use the
conventions put forward in the OED.
Most often, however, the speaker is simply using a filler to gain time
while thinking, hesitating, or searching for an expression (hm, uhm,
uh, er, etc.). Regardless of the actual sound pronounced, this type of
filler is always transcribed as ‘er’ for British and ‘uh’ for American
speech.
RM: Er, it's, it's, er,
yeah, it was quite a nice place er, (XX) smelly in some places, the (XX)
particularly, er it's very run down and er
DG: Uh, let's see, uh, I uh, I'm from L.A. and I let's say I've been
moved uh always to magnet schools which are like schools that kind of
specialize in one thing or another

Acronyms
Acronyms – pronounceable words made up from the initial letters of a
multi-word name like, for example, UNESCO for the United Nations
Educational, Scientific, and Cultural Organization – are written in the
usual way: capital letters with no separation of any kind if the word is
pronounced as a unit. If on the other hand it is spelled out letter by
letter, this is indicated by writing a full stop after each letter of
the word: U.N.E.S.C.O.
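
The two spellings can be told apart mechanically. The sketch below
classifies a single token. The function name acronym_reading is
hypothetical, and the requirement of at least two letters is our
assumption.

```python
import re


def acronym_reading(token):
    """Classify one token as a spelled-out or a pronounced acronym."""
    if re.fullmatch(r"(?:[A-Z]\.){2,}", token):
        return "spelled out"  # e.g. U.N.E.S.C.O.
    if re.fullmatch(r"[A-Z]{2,}", token):
        return "pronounced as a word"  # e.g. UNESCO
    return None  # not written as an acronym


for token in ["UNESCO", "U.N.E.S.C.O.", "Unesco"]:
    print(token, acronym_reading(token))
```

The function works on single tokens rather than whole intervals because
speaker initials such as JF would otherwise be mistaken for acronyms.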
Any unexpected form of actual pronunciation will be indicated in
parentheses after the word in SAMPA transcription. SAMPA (Speech
Assessment Methods Phonetic Alphabet) is a machine-readable phonetic
alphabet developed by speech researchers from many different countries
in the late eighties. It is to date the best international collaborative
basis for a standard machine-readable encoding of phonetic notation,
mapping the symbols of the International Phonetic Alphabet onto ASCII
codes. As with the ordinary IPA, a string of SAMPA symbols does not
require spaces between successive symbols.
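
To illustrate how SAMPA maps onto the IPA, here is a minimal conversion
sketch. The table covers only a handful of symbols and is not a complete
SAMPA implementation. The function name sampa_to_ipa is hypothetical.

```python
# Partial SAMPA-to-IPA table: a small subset chosen for illustration.
SAMPA_TO_IPA = {
    "T": "θ", "D": "ð", "S": "ʃ", "Z": "ʒ", "N": "ŋ",
    "I": "ɪ", "E": "ɛ", "{": "æ", "V": "ʌ", "U": "ʊ",
    "@": "ə", "Q": "ɒ", "A": "ɑ", "O": "ɔ", "3": "ɜ", ":": "ː",
}


def sampa_to_ipa(sampa):
    """Convert symbol by symbol; unlisted symbols pass through unchanged."""
    return "".join(SAMPA_TO_IPA.get(symbol, symbol) for symbol in sampa)


print(sampa_to_ipa("hEn"))
```

No spaces are needed between symbols because every symbol in this subset
is a single ASCII character. A fuller table with multi-character symbols
would need longest-match lookup instead.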

Dialectal expressions
Words or expressions that do not belong to either standard British or
American English will be transcribed by using SAMPA symbols:
LC: Dear, a person, is (hEn),
and that's specific to West of Scotland, 'Ho (hEn), how're you doing?'
However, if there is a
longer stretch of discourse in dialectal speech, "normal" spelling will
be employed. If there is a reference dictionary of the dialect being
described, its conventions should be used.

Reference orthographic systems
In our transcriptions,
we apply the spelling system normally used in the country where the
speakers live or come from. Thus, if we transcribe British varieties of
English, we use standard British English conventions (adopted in the
OED). If we transcribe American English, we use the conventions adopted
in Webster's (cf. hesitation). Examples transcribed according to the
British and the American conventions, respectively:
Standard British English:
DR: he can't honour the
guidelines of the debate, for even ninety minutes
RF: when I
was, I think, maybe thirteen, just travelling with my mum, and er
American English:
DR: he can't honor the
guidelines of the debate, for even ninety minutes
RF: when I
was, I think, maybe thirteen, just traveling with my mum, and uh

http://w3.pac.univ-tlse2.fr/