Abstract
For those who are physically unable to speak, text-to-speech (TTS) software makes it easier to communicate with others. However, the speech generated by such programs often lacks inflection, leaving it emotionless or neutral. It is therefore important to find a means of adding emotion to synthesized or pre-recorded speech. It seems reasonable that, based on theory and through the use of signal processing, changes in characteristics such as pitch and rhythm could produce a desired emotion. The goal of this study is to implement signal processing in a way that applies an emotional filter to a voice recording. This thesis focuses on producing synthetic sarcastic speech, since little research has been done on the generation of sarcasm (though work has been done on identifying it). Sarcastic phrases were read aloud (with the proper inflection) and analyzed for common characteristics. Next, a program was devised to modulate a voice in pitch and time appropriately. Lastly, listeners were asked to listen to voice recordings (processed and unprocessed) and rate the level of perceived sarcasm in each. The hypothesis was that the processed recordings would be rated as more sarcastic than their unprocessed counterparts. Results showed that the synthesized sarcastic sentences did in fact receive higher perceived-sarcasm ratings than the emotionally neutral sentences from which they were derived.