Abstract
When you hear a person speaking in a familiar language, you perceive both the speech sounds uttered and the voice that produces them. How are speech sounds and voice related in a typical auditory experience of hearing speech in a particular voice? And how should we conceive of the objects of such experiences? I propose a conception of the auditory objects of speech perception as temporally structured, mereologically complex individuals. Speech sounds and the voice that produces them commonly appear united in experience. I argue that the metaphysical underpinnings of this experienced unity can be explained in terms of the mereological view of sounds and their sources. I also propose a psychological explanation, the Voice Shaping Speech model, of how we form and individuate the auditory objects of experiences of listening to speech in a particular voice. Voice characteristics enable us to determine the identity of the auditory objects of speech sound perception by making some features of the speech signal stable and predictable.