Last update: May 2017

 

Carlos Toshinori Ishi, PhD in Engineering                                                       

Speech Science and Technology Researcher

Group Leader of ATR/HIL Sound Environment Intelligence Research Group

 

Office Address

 

ATR – HIL (Hiroshi Ishiguro Laboratories)

2-2-2 Hikaridai, Seika-cho, Soraku-gun

Kyoto 619-0288, JAPAN

Phone: +81-774-95-2457              Fax:  +81-774-95-1408

E-mail: carlos (at) atr (dot) jp

 

 

Academic background

 

Doctoral course      Oct. 1998 ~ Sep. 2001 (Japan)

University of Tokyo (Graduate School of Engineering – Dept. of Information and Communication Engineering)

PhD dissertation: "Japanese Prosody Analysis and its Application for Computer-Aided Language Learning (CALL) Systems".  With the aim of constructing a CALL system that can reliably detect pronunciation errors, acoustic-prosodic correlates of linguistic features of Japanese, such as tokushuhaku (special morae), mora rhythm, accent and intonation, were investigated from both production and perception viewpoints.

 

Master's course      Jan. 1997 ~ Feb. 1998 (Brazil)

"Instituto Tecnológico de Aeronáutica" (Electronic Engineering – Dept. of Telecommunications)

Master's thesis: "Analysis of Brazilian Portuguese Phonemes for Speech Recognition".  Acoustic properties of Brazilian Portuguese phonemes were analyzed for automatic segmentation purposes.  Neural networks were also implemented to discriminate devoiced vowels, which frequently occur in phrase-final position in Brazilian Portuguese.

 

College                Jan. 1992 ~ Dec. 1996 (Brazil)

"Instituto Tecnológico de Aeronáutica"  (Electronic Engineering)

BA thesis: "DSP Implementation of an Isolated Word Speech Recognition System".  A DTW-based algorithm using mel-cepstral coefficients as features was implemented in DSP assembly language for the recognition of isolated words.
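The thesis itself was written in DSP assembly; purely as an illustration of the technique named above (not the original code), a DTW-based isolated-word matcher over mel-cepstral feature frames can be sketched in Python:

```python
import math

def dtw_distance(seq_a, seq_b):
    """Dynamic time warping distance between two feature sequences.

    Each sequence is a list of feature vectors (e.g. mel-cepstral
    coefficient frames); the local cost is the Euclidean distance.
    """
    n, m = len(seq_a), len(seq_b)
    INF = float("inf")
    # d[i][j] = minimal accumulated cost aligning seq_a[:i] with seq_b[:j]
    d = [[INF] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(seq_a[i - 1], seq_b[j - 1])
            d[i][j] = cost + min(d[i - 1][j],      # insertion
                                 d[i][j - 1],      # deletion
                                 d[i - 1][j - 1])  # match
    return d[n][m]

def recognize(features, templates):
    """Return the label of the template closest to `features` under DTW."""
    return min(templates, key=lambda label: dtw_distance(features, templates[label]))
```

An isolated-word recognizer of this kind stores one reference template per word and picks the label with the smallest warped distance to the input utterance.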

 

Technical high school       Jan. 1987 ~ Dec. 1990 (Brazil)

"Colégio Industrial Liceu de Artes e Ofícios de São Paulo" (Electronic Technician)

 

 

Professional background

 

ATR/HIL (Hiroshi Ishiguro Laboratories)

Apr. 2015 ~  Group leader of ATR/HIL Sound Environment Intelligence Research Group

 

ATR/IRC (Intelligent Robotics and Communication) Labs.

Apr. 2013 ~ Mar. 2015 Group leader of ATR/IRC Sound Environment Intelligence Laboratory

Jan. 2005 ~ Mar. 2013 Researcher

 

Research on Speech Science and Technology for Verbal and Non-Verbal Communication in Human-Robot Interaction and Sound Environment Intelligence:

- Acoustic analysis of vocal fry in pressed voice (“rikimi”):  Proposal of an algorithm for automatic detection of vocal fry.

- Use of prosodic and voice quality parameters for automatic extraction of paralinguistic information (speech acts, attitudes, emotions).

- Use of prosodic and linguistic cues for automatic detection of turn-taking and dialog acts.

- Evaluation of robust speech recognition system for communication robots in real environments (Joint work with ATR/SLC Labs.)

- Acoustic and electro-glottographic analysis of pressed voice and other voice qualities.

- Analysis of head motions and linguistic and paralinguistic information carried by speech.

- Head motion control in humanoid robots (androids)

- Sound source localization and utterance interval detection using microphone arrays; sound environment intelligence.

- Audio-visual speech interval detection.

- Integration of speech recognition and paralinguistic information extraction (speech act recognition).

- Robust F0 extraction.

- Speech-driven lip motion generation for teleoperation of humanoid robots.

- Sound map generation by integration of microphone arrays and laser range finders.
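As background for the microphone-array topics above, the core principle of delay-based sound source localization can be sketched as follows. This is a minimal two-microphone illustration under far-field assumptions, with made-up function names for the sketch; it is not the multi-array system developed at the lab:

```python
import math

def estimate_tdoa(sig_a, sig_b, max_lag):
    """Estimate the delay (in samples) by which sig_b lags sig_a,
    by maximizing the cross-correlation over lags in [-max_lag, max_lag]."""
    def xcorr(lag):
        return sum(sig_a[i] * sig_b[i + lag]
                   for i in range(len(sig_a))
                   if 0 <= i + lag < len(sig_b))
    return max(range(-max_lag, max_lag + 1), key=xcorr)

def direction_of_arrival(tdoa_samples, fs, mic_distance, c=343.0):
    """Convert a time delay into an arrival angle (radians) relative to
    the microphone-pair axis, for a far-field source."""
    delay = tdoa_samples / fs
    # Clamp to the physically possible range before taking the arccosine.
    cos_theta = max(-1.0, min(1.0, c * delay / mic_distance))
    return math.acos(cos_theta)
```

A real array combines many such pairwise delay estimates (or a subspace method such as MUSIC, as in the publications below) to localize several simultaneous sources.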

 

JST/CREST at ATR/HIS Labs.

Feb. 2002 ~ Dec. 2004

- Research in Speech Science and Technology for Expressive Speech Processing

- Acoustic-prosodic analysis of expressive speech: Principal Component Analysis on global acoustic features and impressions about emotional states, attitudes, and speaking styles.

- Analysis focused on pitch movements of phrase finals → Automatic identification of phrase final tones.

- Acoustical analysis of creaky voice → Automatic detection of creaky segments.

- Acoustical analysis of breathy/whispery voices → Automatic detection of aspiration noise segments.

- Development of algorithms for automatic speech utterance detection, pitch extraction algorithms, and software tools for prosodic labeling.
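As an illustration of the pitch-extraction topic above, a basic autocorrelation F0 estimator can be sketched as follows; this is a textbook baseline for a single voiced frame, not the robust F0 extraction algorithm developed in this work:

```python
import math

def estimate_f0(frame, fs, f0_min=60.0, f0_max=400.0):
    """Estimate F0 (Hz) of a voiced frame by picking the lag that
    maximizes the normalized autocorrelation within the search range."""
    # Remove the DC component so the correlation reflects periodicity only.
    mean = sum(frame) / len(frame)
    x = [s - mean for s in frame]
    energy = sum(s * s for s in x)
    if energy == 0.0:
        return None  # silent frame: no F0 to report

    # Candidate lags corresponding to the allowed F0 range.
    lag_min = int(fs / f0_max)
    lag_max = min(int(fs / f0_min), len(x) - 1)

    def score(lag):
        return sum(x[i] * x[i + lag] for i in range(len(x) - lag)) / energy

    best_lag = max(range(lag_min, lag_max + 1), key=score)
    return fs / best_lag
```

Robust extractors add voicing decisions, sub-sample lag interpolation, and continuity constraints across frames on top of this basic peak-picking step.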

 

ITA-LASD

Jan. 1997 ~ Feb. 1998

Implementation of assembly software for digital signal processors:

- ADPCM algorithms for audio compression, using ADSP-21XX processors

- FFT-based algorithms for telephone tone detection, using Motorola ESM processors

 

Matec

Jan. 1991 ~ Jan. 1992

Repair of telephone exchange boards, power supply modules, and telephone devices.

 

 

Grants

 

- MIC (総務省) SCOPE "ICT Innovation Creation" R&D program (July 2015 ~ Mar. 2016): "Research and development of a selective hearing support system based on sound environment intelligence technologies" (Principal Investigator)

- MIC (総務省) SCOPE "ICT Innovation Creation" R&D program (July 2012 ~ Mar. 2015): "Research and development of sound environment intelligence technologies through the cooperation of multiple microphone arrays" (Principal Investigator)

- Grant-in-Aid for Scientific Research, Young Scientists (A) (Apr. 2011 ~ Mar. 2014): "Construction of an utterance intention recognition system considering dynamic features of prosody and voice quality as well as morphemes and parts of speech" (Principal Investigator)

- Grant-in-Aid for Scientific Research, Young Scientists (A) (Apr. 2008 ~ Mar. 2011): "Construction of the relational structure between head motions and facial expressions accompanying speech and linguistic/paralinguistic information" (Principal Investigator)

- Grant-in-Aid for Scientific Research, Young Scientists (A) (Apr. 2006 ~ Mar. 2008): "Construction of a speaking style detection mechanism considering prosody and voice quality, and its application to real environments" (Principal Investigator)

- Japanese Ministry of Education (Monbusho) scholarship for international students (Apr. 1998 ~ Sep. 2001)

 

 

Lectures

 

2015 ~

Part-time Lecturer (非常勤講師) at Doshisha University (同志社大学), in charge of part of "Special Lectures I" (特別講義I)

 

2013 ~ 2014

Visiting Associate Professor (客員准教授) at Kyoto University (京都大学), in charge of part of "Special Research in Intelligence Science and Technology 1" (知能情報学特殊研究1)

 

2012 ~

Visiting Associate Professor (客員准教授) at Kobe University (神戸大学), "Multi-modal Information Processing" (マルチモーダル情報処理)

 

2008 ~ 2016

Part-time Lecturer (非常勤講師) at Osaka Prefecture University (大阪府立大学), in charge of half of "Advanced Intelligent Media Processing" (知能メディア処理特論)

 

 

Language skills

 

- Native language: Brazilian Portuguese

- Second languages: Japanese, English.

 

 

Programming skills

 

- C++, Basic, Pascal

- Visual C++, Visual Basic, Java

- Matlab

- Assembly (Analog Devices ADSP-21XX, Motorola ESM, 386)

 

 

Research interests

 

Prosody and Voice Quality:

- Analysis of laryngeal voice qualities:  automatic detection of creaky voice; automatic detection of aspiration noise in breathy and whispery voices.

- Mapping between prosodic + voice quality features and linguistic and paralinguistic functions (intentions, emotions, and attitudes) in Japanese.

- Transcription of prosodic events: automatic extraction of perceptually meaningful prosodic events for automatic prosody labeling: focus on phrase final prosody and voice quality.

- Pitch perception: Correspondence between acoustically observed F0 and perceived pitch movements.

- Robust F0 extraction.

 

Speech and Gestures:

- Analysis of head motion and speech in spoken dialogue:  automatic generation of head motions from speech.

- Multi-modal dialogue processing.

- Lip motion generation/synchronization for humanoid robots (including androids) based on speech acoustics.

- Head motion generation from speech acoustics and linguistic information.

- Facial expression and motion generation in humanoid robots (including androids) based on speech acoustics.

 

Robot Audition and Sound Environment Intelligence:

- Microphone array for audio source localization and separation.

- Improvement of speech recognition and understanding in noisy environments.

- Utterance interval detection based on sound directivity.

- Utterance interval detection based on audio-visual information.

- Sound environment map generation.

 

Speech Perception and Recognition:

- Auditory representation of speech signals: acoustic parameters related to auditory perception; masking functions.

- Prosodic modeling applied to recognition of linguistic and paralinguistic information.

 

Speech Production and Synthesis:

- Mapping between physiological and acoustic features for laryngeal voice quality control.

- Prosodic control and voice quality control for speech synthesis.

 

 

List of publications

 

Journal Papers

1.        C.T. Ishi, T. Minato, H. Ishiguro. (2017) "Motion analysis in vocalized surprise expressions and motion generation in android robots," IEEE Robotics and Automation Letters, 2017. (IEEE Early Access Articles)

2.        J. Even, J. Furrer, L.Y. Morales Saiki, C.T. Ishi, N. Hagita. (2017) "Probabilistic 3D mapping of sound-emitting structures based on acoustic ray casting," IEEE Transactions on Robotics (T-RO) Vol.33, No.2, 333-345, 2017.

3.        船山智,港隆史,石井カルロス寿憲,石黒浩 (2017)"操作者の笑い声に基づく遠隔操作型アンドロイドの笑い動作生成",情報処理学会論文誌, Vol.58, No.4, 932-944, Apr. 2017.

4.        境くりま,港隆史,石井カルロス寿憲,石黒浩 (2017)"わずかな感情変化を表現可能なアンドロイド動作の生成モデルの提案", 電子情報通信学会論文誌 D, Vol.J100-D, No.3, 310-320, Mar. 2017.

5.        石井カルロス寿憲, エヴァン・イアニ, 萩田紀博 (2016). 複数のマイクロホンアレイによる音源方向情報と人位置情報に基づく音声区間検出および顔の向きの推定の評価, 日本ロボット学会誌,Vol.34 No.3, pp 39-44, April 2016.

6.        境くりま, 石井カルロス寿憲, 港隆史, 石黒浩 (2016). 音声に対応する頭部動作のオンライン生成システムと遠隔操作における効果, 電子情報通信学会和文論文誌A, Vol. J99-A, No.1, pp. 14-24, Jan. 2016.

7.        石井カルロス寿憲 (2015). 人とロボットのコミュニケーションにおける非言語情報の表出−発話に伴う自然な頭部動作生成に焦点を当てて−, 感性工学, Vol. 13, No. 4, pp. 205-210, Dec. 2015.(解説論文)

8.        石井カルロス寿憲 (2015). 音声対話中に出現するパラ言語情報と音響関連量―声質の役割に焦点を当てて―, 日本音響学会誌, Vol. 71, No. 9, pp. 476-483, Sep. 2015. (解説論文)

9.        渡辺敦志, エヴァン・イアニ, モラレス・ルイス洋一, 石井カルロス寿憲 (2015). 人間協調型移動ロボットによるコンクリート打音検査記録システム, 日本ロボット学会誌, Vol. 33, No. 7, 68-74, Sep. 2015.

10.    Ishi, C., Even, J., Hagita, N. (2014) Integration of multiple microphone arrays and use of sound reflections for 3D localization of sound sources. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, Vol.E97-A, No.9, pp.1867-1874, Sep. 2014.

11.    Ishi, C., Ishiguro, H., Hagita, N. (2013). Analysis of relationship between head motion events and speech in dialogue conversations. Speech Communication 57 (2014), 233-243.

12.    石井カルロス寿憲, 劉超然, 石黒浩, 萩田紀博 (2013). 遠隔存在感ロボットのためのフォルマントによる口唇動作生成手法, 日本ロボット学会誌, Vol. 31, No. 4, 83-90, May 2013.

13.    劉超然, 石井カルロス寿憲, 石黒浩, 萩田紀博 (2013). 人型コミュニケーションロボットのための首傾げ生成手法の提案および評価, 人工知能学会論文誌, vol. 28, no. 2, pp. 112-121, January, 2013.

14.    Liu, C., Ishi, C., Ishiguro, H., Hagita, N. (2013). Generation of nodding, head tilting and gazing for human-robot speech interaction. International Journal of Humanoid Robotics (IJHR), vol. 10, no. 1, January, 2013.

15.    P. Heracleous, M. Sato, C. T. Ishi, and N. Hagita. (2013) Analysis of the visual Lombard effect and automatic recognition experiments. Computer Speech and Language 27(1), 288-300, 2013.

16.    P. Heracleous, C.T. Ishi, T. Miyashita, H. Ishiguro and N. Hagita (2013). Using body-conducted acoustic sensors for human-robot communication in noisy environments. International Journal of Advanced Robotic Systems 10(136), pp 1-7, Feb. 2013.

17.    Becker-Asano, C., Kanda, T., Ishi, C., and Ishiguro, H. (2011). Studying laughter combined with two humanoid robots. AI & Society, Vol. 26 (3), pp. 291-300, 2011.

18.    M. Shiomi, D. Sakamoto, T. Kanda, C.T. Ishi, H. Ishiguro, N. Hagita (2011). Field trial of a networked robot at a train station. International Journal of Social Robotics 3(1), 27-40, Jan. 2011.

19.    石井カルロス寿憲 (2010). ATRのコミュニケーションロボットにおける聴覚および音声理解に関する研究課題, 日本ロボット学会誌, Vol. 28, No. 1, pp. 27-30, Jan.2010. (解説論文)

20.    Ishi, C.T., Ishiguro, H., Hagita, N. (2010). Analysis of the roles and the dynamics of breathy and whispery voice qualities in dialogue speech. EURASIP Journal on Audio, Speech, and Music Processing 2010, ID 528193, 1-12 Jan. 2010.

21.    塩見昌裕,坂本大介, 神田崇行,石井カルロス寿憲,石黒浩,萩田紀博 (2009). 半自律型コミュニケーションロボットの開発, 電子情報通信学会論文誌, 人とエージェントのインタラクション特集号, pp.773-783, 2009.

22.    Ishi, C.T., Ishiguro, H., Hagita, N. (2008). Automatic extraction of paralinguistic information using prosodic features related to F0, duration and voice quality. Speech Communication 50(6), 531-543, June 2008.

23.    Ishi, C.T., Matsuda, S., Kanda, T., Jitsuhiro, T., Ishiguro, H., Nakamura, S., Hagita, N. (2008). A robust speech recognition system for communication robots in noisy environments. IEEE Transactions on Robotics, Vol. 24, No. 3, 759-763, June 2008.

24.    Ishi, C.T., Sakakibara, K-I., Ishiguro, H., Hagita, N. (2008). A method for automatic detection of vocal fry. IEEE Transactions on Audio, Speech and Language Processing, Vol. 16, No. 1, 47-56, Jan. 2008.

25.    Ishi, C.T. (2006), The functions of phrase final tones in Japanese: Focus on turn-taking. Journal of Phonetic Society of Japan, Vol. 10 No.3, 18-28, Dec. 2006.

26.    石井カルロス寿憲,榊原健一,石黒浩,萩田紀博 (2006) Vocal Fry発声の自動検出法. 電子情報通信学会論文誌D, Vol. J89-D, No. 12, 2679-2687, Dec. 2006.

27.    石井カルロス寿憲,石黒浩,萩田紀博 (2006) 韻律および声質を表現した音響特徴と対話音声におけるパラ言語情報の知覚との関連. 情報処理学会論文誌, Vol. 47, No. 6, 1782-1793, June 2006.

28.    Ishi, C.T. (2005) Perceptually-related F0 parameters for automatic classification of phrase final tones. IEICE Trans. Inf. & Syst., Vol. E88-D, No. 3, 481-488

29.    Ishi, C.T. (2004). “Analysis of autocorrelation-based parameters in creaky voice,” Acoustical Science and Technology, Vol. 25, No. 4, 299-302.

30.    Ishi, C.T., Hirose, K. & Minematsu, N. (2003). Mora F0 representation for accent type identification in continuous speech and considerations on its relation with perceived pitch values. Speech Communication, Vol. 41, Nos. 2-3, 441-453

 

PhD dissertation

Ishi, C.T. (2001). “Japanese prosody analysis and its applications to Computer-Aided Language Learning systems,” PhD dissertation, University of Tokyo, Sep. 2001.

 

International Conference Papers (refereed)

1.        Ishi, C., Liu, C., Even, J., Hagita, N. (2016). “Hearing support system using environment sensor network,” IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2016), pp. 1275-1280, Oct., 2016.

2.        Ishi, C., Funayama, T., Minato, T., Ishiguro, H. (2016). “Motion generation in android robots during laughing speech,” IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2016), pp. 3327-3332, Oct., 2016.

3.        Ishi, C., Hatano, H., Ishiguro, H. (2016). “Audiovisual analysis of relations between laughter types and laughter motions,” Proc. of the 8th international conference on Speech Prosody (Speech Prosody 2016), pp. 806-810, May, 2016.

4.        Hatano, H., Ishi, C., Komatsubara, T., Shiomi, M., Kanda, T. (2016). “Analysis of laughter events and social status of children in classrooms,” Proc. of the 8th international conference on Speech Prosody (Speech Prosody 2016), pp. 1004-1008, May, 2016.

5.        K. Sakai, T. Minato, C.T. Ishi, and H. Ishiguro, “Speech driven trunk motion generating system based on physical constraint,” Proc. of 25th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN 2016), pp. 232-239, Aug. 2016.

6.        D.F. Glas, T. Minato, C.T. Ishi, T. Kawahara, and H. Ishiguro, “ERICA: The ERATO Intelligent Conversational Android,” Proc. of 25th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN 2016), pp. 22-29, Aug. 2016.

7.        Ishi, C., Even, J., Hagita, N. (2015). “Speech activity detection and face orientation estimation using multiple microphone arrays and human position information,” IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2015), pp. 5574-5579, Sep., 2015.

8.        J. Even, F. Ferreri, A. Watanabe, Y. Morales, C. Ishi and N. Hagita (2015). “Audio augmented point clouds for applications in robotics,” IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2015), pp. 4846-4851, Sep., 2015.

9.        A. Watanabe, J. Even, L.Y. Morales and C. Ishi (2015). “Robot-assisted acoustic inspection of infrastructures - Cooperative hammer sounding inspection -,” IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2015), pp. 5942-5947, Sep., 2015.

10.    K. Sakai, C.T. Ishi, T. Minato, H. Ishiguro (2015) “Online Speech-Driven Head Motion Generating System and Evaluation on a Tele-Operated Robot,” In the 24th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN 2015), Kobe, Hyogo, Japan, pp. 529-534, August, 2015.

11.    Liu, C., Ishi, C.T., Ishiguro, H. (2015) “Bringing the scene back to the tele-operator: auditory scene manipulation for tele-presence systems,” In Proc. of ACM/IEEE International Conference on Human Robot Interaction (HRI 2015). Portland, USA. 279-286, March, 2015.

12.    Ishi, C., Hatano, H., Hagita, N. (2014) "Analysis of laughter events in real science classes by using multiple environment sensor data," Proc. of 15th Annual Conference of the International Speech Communication Association (Interspeech 2014), pp. 1043-1047, Sep. 2014.

13.    J. Even, L. Y. Morales, N. Kallakuri, J. Furrer, C. Ishi, N. Hagita (2014) “Mapping sound emitting structures in 3D”, The 2014 IEEE International Conference on Robotics and Automation (ICRA 2014), June, 2014.

14.    Ishi, C., Hatano, H., and Kiso, M. (2014). “Acoustic-prosodic and paralinguistic analyses of “uun” and “unun”,” Proc. of the 7th international conference on Speech Prosody 2014, pp. 100-104, May, 2014.

15.    Hatano, H., Kiso, M., and Ishi, C. (2014). “Interpersonal factors affecting tones of question-type utterances in Japanese,” Proc. of the 7th international conference on Speech Prosody 2014, pp. 997-1001, May, 2014.

16.    Ishi, C., Even, J., Hagita, N. (2013). “Using multiple microphone arrays and reflections for 3D localization of sound sources,” IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2013), pp. 3937-3942, Nov., 2013.

17.    N. Kallakuri, J. Even, L. Y. Morales, C. Ishi, N. Hagita (2013). “Using Sound Reflections to Detect Moving Entities Out of the Field of View”, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2013), pp. 5201-5206, Nov., 2013.

18.    J. Even, N. Kallakuri, L. Y. Morales, C. Ishi, N. Hagita (2013). “Creation of Radiated Sound Intensity Maps Using Multi-Modal Measurements Onboard an Autonomous Mobile Platform”, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2013), pp. 3433-3438, Nov., 2013.

19.    Hatano, H., Kiso, M., and Ishi, C. (2013) “Analysis of factors involved in the choice of rising or non-rising intonation in question utterances appearing in conversational speech,” Proc. 14th Annual Conference of the International Speech Communication Association (Interspeech 2013), 2564-2568, August, 2013.

20.    Kallakuri, N., Even, J., Morales, Y., Ishi, C., Hagita, N. (2013) “Probabilistic Approach for Building Auditory Maps with a Mobile Microphone Array,” The 2013 IEEE International Conference on Robotics and Automation (ICRA 2013), pp. 2270-2275, May, 2013.

21.    Ishi, C., Liu, C., Ishiguro, H. and Hagita, N. (2012). “Evaluation of formant-based lip motion generation in tele-operated humanoid robots,” In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2012), Vilamoura, Algarve, Portugal, pp. 2377-2382, October, 2012.

22.    Ishi, C., Liu, C., Ishiguro, H. and Hagita, N. (2012). “Evaluation of a formant-based speech-driven lip motion generation,” In 13th Annual Conference of the International Speech Communication Association (Interspeech 2012), Portland, Oregon, pp. P1a.04, September, 2012.

23.    Ishi, C.T., Hatano, H., Hagita, N. (2012) “Extraction of paralinguistic information carried by mono-syllabic interjections in Japanese,” Proceedings of The 6th International Conference on Speech Prosody (Speech Prosody 2012), 681-684.

24.    Liu, C., Ishi, C., Ishiguro, H., Hagita, N. (2012) “Generation of nodding, head tilting and eye gazing for human-robot dialogue interaction,” Proceedings of 7th ACM/IEEE International Conference on Human-Robot Interaction (HRI2012), 285-292.

25.    Heracleous, P., Even, J., Ishi, C.T., Miyashita, T., Hagita, N. (2011). “Fusion of standard and alternative acoustic sensors for robust speech recognition,” Proc. ICASSP 2012.

26.    Ishi, C., Dong, L., Ishiguro, H., and Hagita, N. (2011). “The effects of microphone array processing on pitch extraction in real noisy environments,” Proc. of IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2011), 550-555.

27.    Ishi, C., Liu, C., Ishiguro, H. and Hagita, N. (2011). “Speech-driven lip motion generation for tele-operated humanoid robots,” Proceedings of International Conference on Auditory-Visual Speech Processing (AVSP2011), 131-135.

28.    Ishi, C.T., Ishiguro, H., and Hagita, N. (2011). “Analysis of acoustic-prosodic features related to paralinguistic information carried by interjections in dialogue speech,” Proceedings of The 12th Annual Conference of the International Speech Communication Association (Interspeech’ 2011), 3133-3136.

29.    Ishi, C.T., Ishiguro, H., and Hagita, N. (2011). “Improved acoustic characterization of breathy and whispery voices,” Proceedings of The 12th Annual Conference of the International Speech Communication Association (Interspeech’ 2011), 2965-2968.

30.    Heracleous, P., Sato, M., Ishi, C.T., Ishiguro, H., Hagita, N. (2011). “Speech production in noisy environments and the effect on automatic speech recognition,” Proc. ICPhS 2011.

31.    Even, J., Heracleous, P., Ishi, C., Hagita, N. (2011) “Range based multi microphone array fusion for speaker activity detection in small meetings,” Proc. Interspeech2011, 2737-2740.

32.    Even, J., Heracleous, P., Ishi, C., Hagita, N. (2011) “Multi-modal front-end for speaker activity detection in small meetings,” Proc. IROS2011, 536-541.

33.    Ishi, C., Dong, L., Ishiguro, H., and Hagita, N. (2010). “Sound interval detection of multiple sources based on sound directivity,” Proc. of IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2010), 1982-1987.

34.    Ishi, C., Sato, M., Lao, S., and Hagita, N. (2010). “Real-time audio-visual voice activity detection for speech recognition in noisy environments,” Proc. International Conference on Auditory-Visual Speech Processing (AVSP2010), 81-84.

35.    Heracleous, P., Sato, M., Ishi, C., and Hagita, N. (2010). “Investigating the role of the Lombard reflex in visual and audiovisual speech recognition,” Proc. International Conference on Auditory-Visual Speech Processing (AVSP2010), 69-72.

36.    Even, J., Ishi, C., Saruwatari, H., Hagita, N. (2010). “Close speaker cancellation for suppression of non-stationary background noise for hands-free speech interface” Proc. of The 11th Annual Conference of the International Speech Communication Association (Interspeech2010).

37.    Ishi, C., Ishiguro, H., and Hagita, N. (2010). “Acoustic, electroglottographic and paralinguistic analyses of “rikimi” in expressive speech,” Proceedings of Speech Prosody 2010 (SP2010), ID 100139, 1-4.

38.    Ishi, C.T., Liu, C., Ishiguro, H., and Hagita, N. (2010). “Head motion during dialogue speech and nod timing control in humanoid robots,” Proceedings of 5th ACM/IEEE International Conference on Human-Robot Interaction (HRI 2010), 293-300.

39.    Ishi, C.T., Chatot, O., Ishiguro, H., and Hagita, N. (2009). “Evaluation of a MUSIC-based real-time sound localization of multiple sound sources in real noisy environments,” Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2009), 2027-2032.

40.    Ishi, C.T., Ishiguro, H., and Hagita, N. (2008). “Analysis of inter- and intra-speaker variability of head motions during spoken dialogue,” Proceedings of the International Conference on Auditory-Visual Speech Processing 2008 (AVSP’ 2008), 37-42.

41.    Ishi, C.T., Ishiguro, H., and Hagita, N. (2008). “The meanings of interjections in spontaneous speech,” Proceedings of The 9th Annual Conference of the International Speech Communication Association (Interspeech’ 2008), 1208-1211.

42.    Ishi, C.T., Ishiguro, H., and Hagita, N. (2008). “The roles of breathy/whispery voice qualities in dialogue speech,” Proceedings of Speech Prosody 2008, 45-48.

43.    Ishi, C.T., Haas, J., Wilbers, F.P., Ishiguro, H., and Hagita, N. (2007). “Analysis of head motions and speech, and head motion control in an android,” Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2007), 548-553.

44.    Wilbers, F.P., Ishi, C.T., Ishiguro, H. (2007). “A blendshape model for mapping facial motions to an android,” Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2007), 542-547.

45.    Ishi, C.T., Ishiguro, H., and Hagita, N. (2007). “Analysis of head motions and speech in spoken dialogue,” Proceedings of The 8th Annual Conference of the International Speech Communication Association (Interspeech’ 2007), 670-673.

46.    Ishi, C.T., Ishiguro, H., and Hagita, N. (2007). “Acoustic analysis of pressed phonation,” Proceedings of International Conference on Phonetic Sciences (ICPhS’2007), 2057-2060.

47.    Ishi, C.T., Matsuda, S., Kanda, T., Jitsuhiro, T., Ishiguro, H., Nakamura, S., and Hagita, N. (2006). “Robust speech recognition system for communication robots in real environments,” Proceedings of 2006 IEEE-RAS International Conference on Humanoid Robots (Humanoids’06), 340-345.

48.    Ishi, C.T., Ishiguro, H., and Hagita, N. (2006). “Evaluation of prosodic and voice quality features on automatic extraction of paralinguistic information,” Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2006), 374-379.

49.    Ishi, C.T., Ishiguro, H., and Hagita, N. (2006). “Analysis of prosodic and linguistic cues of phrase finals for turn-taking and dialog acts,” Proceedings of The Ninth International Conference of Speech and Language Processing 2006 (Interspeech’2006 - ICSLP), 2006-2009.

50.    Ishi, C.T., Ishiguro, H., and Hagita, N. (2006). “Using Prosodic and Voice Quality Features for Paralinguistic Information Extraction,” CD-ROM Proceedings of The 3rd International Conference on Speech Prosody (SP2006).

51.    Ishi, C.T., Ishiguro, H., and Hagita, N. (2005). “Proposal of Acoustic Measures for Automatic Detection of Vocal Fry,” Proceedings of The 9th European Conference on Speech Communication and Technology (Interspeech’ 2005 - Eurospeech), 481-484.

52.    Ishi, C.T. (2004). “A New Acoustic Measure for Aspiration Noise Detection,” Proceedings of The 8th International Conference of Speech and Language Processing 2004 (ICSLP 2004), Vol. II, 941-944.

53.    Ishi, C.T. (2004). “Analysis of Autocorrelation-based parameters for Creaky Voice Detection,” Proceedings of The 2nd International Conference on Speech Prosody (SP2004), 643-646.

54.    Ishi, C.T., Mokhtari, P., and Campbell, N. (2003). “Perceptually-related acoustic-prosodic features of phrase finals in spontaneous speech,” Proceedings of The 8th European Conference on Speech Communication and Technology (Eurospeech' 03), 405-408.

55.    Mokhtari, P., Pfitzinger, H. R. and Ishi, C. T. (2003). “Principal components of glottal waveforms: towards parameterisation and manipulation of laryngeal voice-quality,” Proceedings of the ISCA Tutorial and Research Workshop on "Voice Quality: Functions, Analysis and Synthesis" (Voqual'03), 133-138.

56.    Ishi, C.T., Campbell, N. (2002). “Analysis of Acoustic-Prosodic Features of Spontaneous Expressive Speech,” Proceedings of 1st International Congress of Phonetics and Phonology, 19.

57.    Ishi, C.T., Hirose, K., Minematsu, N. (2002). “Using Perceptually-related F0- and Power-based Parameters to Identify Accent Types of Accentual Phrases,” Proceedings of 1st International Conference on Speech Prosody (SP2002), 407-410.

58.    Ishi, C.T., Minematsu, N., Hirose, K., Nishide R. (2001). “Identification of Accent and Intonation in sentences for CALL systems,” Proceedings of The 7th European Conference on Speech Communication and Technology (Eurospeech'01), 2455-2458.

59.    Ishi, C.T., Minematsu, N., Hirose, K. (2001). “Recognition of accent and intonation types of Japanese using F0 parameters related to human pitch perception,” Proceedings of ISCA Tutorial and Research Workshop on Prosody in Speech Recognition and Understanding, 71-76.

60.    Ishi, C.T., Minematsu, N., Hirose, K. (2001). “Investigation on perceived pitch and observed F0 features to represent Japanese pitch accent patterns,” Proceedings of International Conference of Speech Processing, 437-442.

61.    Ishi, C.T., Hirose, K. & Minematsu, N. (2000). “Identification of Japanese Double-Mora Phonemes Considering Speaking Rate for the Use in CALL Systems,” Proceedings of The 6th International Conference of Speech and Language Processing 2000 (ICSLP 2000), Vol. I, 786-789.

62.    Watanabe, M. & Ishi, C.T. (2000). “The distribution of fillers in lectures in the Japanese Language,” Proceedings of The 6th International Conference of Speech and Language Processing 2000 (ICSLP 2000), vol. III, 167-170.

63.    Ishi, C.T. & Hirose, K. (2000). “Influence of speaking rate on segmental duration and its formulation for the use in CALL systems,” Proceedings of Integrating Speech Technology In Language Learning 2000 (InSTiL 2000), 106-108.

64.    Kawai, G. & Ishi, C.T. (1999). “A system for learning the pronunciation of Japanese Pitch Accent,” Proceedings of The 6th European Conference on Speech Communication and Technology (Eurospeech’99), Vol.1, 177-181.

 

 

国内学会・研究会発表論文 (Non-refereed domestic conference and workshop papers; some co-authored papers omitted)

1.        石井カルロス寿憲, 港隆史, 石黒浩, "驚き発話に伴う表情および動作の分析", 日本音響学会2017年春季研究発表会, 343-344, Mar. 2017.

2.        石井カルロス寿憲,Jani Even,萩田紀博. "呼び込み音声の韻律特徴の分析", 日本音響学会2017年春季研究発表会, 315-316, Mar. 2017.

3.        劉超然, 石井カルロス寿憲, 石黒浩, "会話ロボットのための談話機能推定", 日本音響学会2017年春季研究発表会, 153-154, Mar. 2017.

4.        井上昂治, 三村正人, 石井カルロス寿憲, 坂井信輔, 河原達也. "DAEを用いたリアルタイム遠隔音声認識", 日本音響学会2017年春季研究発表会, 99-100, Mar. 2017.

5.        石井カルロス,劉超然,Jani Even (2016) “音環境知能技術を活用した聴覚支援システムの利用効果における予備的評価”,日本音響学会2016年春季研究発表会, 1469-1470, Mar. 2016.

6.        劉超然,石井カルロス,石黒浩 (2016) “言語・韻律情報を用いた話者交替推定の検討”,日本音響学会2016年春季研究発表会, 3-4, Mar. 2016.

7.        波多野博顕,石井カルロス,石黒浩 (2016) “対話相手の違いに応じた発話スタイルの変化:ジェミノイド対話の分析”,日本音響学会2016年春季研究発表会, 343-344, Mar. 2016.

8.        井上昂治, 三村正人, 石井カルロス寿憲, 河原達也 (2016) “自律型アンドロイドERICA のための遠隔音声認識”,日本音響学会2016年春季研究発表会, 1-2, Mar. 2016.

9.        石井カルロス寿憲, 劉超然, Jani Even (2015) “音環境知能技術を活用した聴覚支援システムのプロトタイプの開発”, 第43回人工知能学会AIチャレンジ研究会, Nov. 2015.

10.    境くりま, 港隆史, 石井カルロス寿憲, 石黒浩 (2015) “身体的拘束に基づく音声駆動体幹動作生成システム”, 第43回人工知能学会AIチャレンジ研究会, Nov. 2015.

11.    石井カルロス寿憲, 港隆史, 石黒浩 (2015) “笑い声に伴うアンドロイドロボットの動作生成の検討”, 第33回日本ロボット学会学術講演会.

12.    石井カルロス寿憲, 波多野博顕, 石黒浩 (2015) “笑いの種類と笑いに伴う表情および動作の分析”, 日本音響学会2015年秋季研究発表会.

13.    波多野博顕, 石井カルロス寿憲, 石黒浩 (2015) “相槌の「はい」における丁寧度と音響特徴の関係について”, 日本音響学会2015年秋季研究発表会.

14.    石井カルロス寿憲、Jani Even、萩田紀博 (2015) “音環境知能を利用した家庭内音の識別”,日本音響学会2015年春季研究発表会、Mar. 2015.

15.    波多野博顕, 石井カルロス寿憲, 多胡夏純 (2015) “発話指向性に応じた韻律スタイルの分析−小学校教師の教室発話を対象に−”,日本音響学会2015年春季研究発表会、Mar. 2015.

16.    劉超然,石井カルロス,石黒浩,萩田紀博 (2014) “臨場感の伝わる遠隔操作システムのデザイン:マイクロフォンアレイ処理を用いた音環境の再構築”, 第41回人工知能学会AIチャレンジ研究会, Nov. 2014.

17.    波多野博顕, 石井カルロス寿憲 (2014) “自然対話音声における感動詞先行型質問発話の韻律”, 日本音響学会2014年秋季研究発表会.

18.    石井カルロス寿憲、Jani Even、萩田紀博 (2014) “複数のマイクロホンアレイと人位置情報を組み合わせた音声アクティビティの記録システムの改善”, 第32回日本ロボット学会学術講演会.

19.    渡辺敦志, Jani Even, Luis Yoichi Morales, 石井Carlos 寿憲 (2014) “人間協調型移動ロボットによるコンクリート打音検査記録システム”, 第32回日本ロボット学会学術講演会.

20.    境くりま, 石井カルロス寿憲, 港隆史, 石黒浩 (2014) “発話者の音声に対応する動作生成と遠隔操作ロボットへの動作の付加効果”, 人工知能学会研究会第39回AIチャレンジ研究会 (SIG-Challenge-B303), 7-13, Mar. 2014.

21.    石井カルロス寿憲,波多野博顕,萩田紀博, (2014) "小学校理科室における笑いイベントの分析", 日本音響学会2014年春季研究発表会, 263-264.

22.    石井カルロス寿憲, Jani Even, 塩見昌裕, 萩田紀博, (2013) "複数のマイクロホンアレイを用いた理科室における音源アクティビティの分析", 人工知能学会研究会第38回AIチャレンジ研究会 (SIG-Challenge-B302), 28-33.

23.    石井カルロス寿憲, Jani Even, 塩見昌裕, 小泉智史, 萩田紀博, (2013) "複数のマイクロホンアレイによる音源アクティビティ:小学校理科室におけるデータ分析", 第31回日本ロボット学会学術講演会, RSJ2013AC2D2-01.

24.    石井カルロス寿憲,波多野博顕,萩田紀博,(2013) "「うんうん」と「うーん」の識別における音響特徴の分析", 日本音響学会2013年秋季研究発表会, 265-266.

25.    石井カルロス寿憲、Jani Even、萩田紀博 (2013) “反射音を利用した音源定位と音源の指向性の分析”, 日本音響学会2013年春季研究発表会, Mar. 2013, 887-888.

26.    波多野博顕、新井潤、石井カルロス寿憲 (2013) “自然対話における質問音調の選択に関わる要因の分析”, 日本音響学会2013年春季研究発表会, Mar. 2013, 429-430.

27.    石井カルロス寿憲、Jani Even、萩田紀博 (2012) “複数のマイクロホンアレイおよび空間情報と反射音を利用した音源定位の検討”,人工知能学会AIチャレンジ研究会、Nov. 2012, 64-69.

28.  Ishi, C.T., Ishiguro, H., Hagita, N. (2012) "Types of labels related to the transmission of paralinguistic information in natural dialogue speech," 2012 Autumn Meeting of the Acoustical Society of Japan, 267-268, Sep. 2012.

29.  Hatano, H., Arai, J., Ishi, C.T. (2012) "Toward the construction of speech act labels for natural dialogue speech," 2012 Autumn Meeting of the Acoustical Society of Japan, 265-266, Sep. 2012.

30.  Ishi, C.T., Liu, C., Ishiguro, H., Hagita, N. (2012) "An attempt at lip motion generation based on formants," 2012 Spring Meeting of the Acoustical Society of Japan, 373-374, Mar. 2012.

31.  Ishi, C.T., Ishiguro, H., Hagita, N. (2011) "Real-time sound source localization in 3D space using microphone arrays (demo)," JSAI SIG on AI Challenges, Dec. 2011.

32.  Ishi, C.T., Liu, C., Ishiguro, H., Hagita, N. (2011) "Tele-operating the lip motion of humanoid robots from the operator's voice," 29th Annual Conference of the Robotics Society of Japan, RSJ2011AC1J3-6, Sep. 2011.

33.  Takahashi, T., Nakadai, K., Ishi, C.T., Even, J., Okuno, H.G. (2011) "A study on sound source localization and sound source detection in real environments," 29th Annual Conference of the Robotics Society of Japan, RSJ2011AC1F3-3.

34.  Ishi, C.T., Arai, J., Hagita, N. (2011) "Analysis of voice quality changes within utterances in natural dialogue speech," 2011 Autumn Meeting of the Acoustical Society of Japan, 269-270.

35.  Ishi, C.T., Ishiguro, H., Hagita, N. (2011) "Improvement of the acoustic representation of breathy phonation," 2011 Spring Meeting of the Acoustical Society of Japan, 269-270.

36.  Ishi, C.T., Liang, D., Ishiguro, H., Hagita, N. (2010) "Considerations on pitch extraction in real robot environments," JSAI SIG on AI Challenges (SIG-Challenge-10), 36-40.

37.  Ishi, C.T., Sato, M., Akimoto, T., Hagita, N. (2010) "A study on speech recognition modules for communication intelligence," Annual Conference of the Robotics Society of Japan, RSJ2010AC3P3-6.

38.  Ishi, C.T., Arai, J., Hagita, N. (2010) "An attempt at recognizing the utterance intentions of interjections appearing in dialogue speech," 2010 Autumn Meeting of the Acoustical Society of Japan, Vol. I, 251-252.

39.  Ishi, C.T., Liang, D., Ishiguro, H., Hagita, N. (2010) "A study on speech interval detection of multiple sound sources using sound directivity," 2010 Spring Meeting of the Acoustical Society of Japan, Vol. I, 731-734.

40.  Ishi, C.T., Liang, D., Ishiguro, H., Hagita, N. (2009) "A study on speech interval detection of multiple sound sources using MUSIC spatial spectrograms," 30th JSAI SIG on AI Challenges (SIG-Challenge-09), 8-13.

41.  Ishi, C.T., Ishiguro, H., Hagita, N. (2009) "Analysis of paralinguistic information conveyed by voice quality changes," 2009 Autumn Meeting of the Acoustical Society of Japan, Vol. I, 475-476.

42.  Ishi, C.T., Ishiguro, H., Hagita, N. (2009) "Analysis of acoustic parameters related to voice quality," 2009 Autumn Meeting of the Acoustical Society of Japan, Vol. I, 327-328.

43.  Ishi, C.T., Chatot, O., Ishiguro, H., Hagita, N. (2009) "Evaluation of 3D sound source localization in real environments using the MUSIC method," 28th JSAI SIG on AI Challenges (SIG-Challenge-08).

44.  Ishi, C.T., Chatot, O., Ishiguro, H., Hagita, N. (2009) "Evaluation of 3D sound source direction estimation in real environments and of its real-time performance," 2009 Spring Meeting of the Acoustical Society of Japan, Vol. I, 699-702.

45.  Ishi, C.T., Ishiguro, H., Hagita, N. (2008) "Analysis of speaking styles and functions of interjections appearing in spontaneous speech," 2008 Autumn Meeting of the Acoustical Society of Japan, Vol. I, 269-270.

46.  Ishi, C.T., Ishiguro, H., Hagita, N. (2008) "Acoustic features of breathy/whispery phonation and their roles in speech communication," IEICE Technical Report, Vol. 108, No. 116, 127-132.

47.  Ishi, C.T., Ishiguro, H., Hagita, N. (2008) "The roles of breathy/whispery phonation in speech communication," 2008 Spring Meeting of the Acoustical Society of Japan, Vol. I, 357-358.

48.  Ishi, C.T., Ishiguro, H., Hagita, N. (2007) "Analysis of head motions related to speech and head motion control of an android robot," 26th JSAI SIG on AI Challenges (SIG-Challenge-07), 46-51.

49.  Ishi, C.T., Ishiguro, H., Hagita, N. (2007) "Analysis of head motions accompanying speech utterances," 2007 Autumn Meeting of the Acoustical Society of Japan, Vol. I, 109-110.

50.  Ishi, C.T., Ishiguro, H., Hagita, N. (2007) "Acoustic analysis of 'rikimi' (pressed) phonation using EGG," 2007 Spring Meeting of the Acoustical Society of Japan, Vol. I, 221-222.

51.  Ishi, C.T., Ishiguro, H., Hagita, N. (2006) "Acoustic analysis of pressed voice," Fourth Joint Meeting of the ASA and ASJ, J. Acoust. Soc. Am., Vol. 120, No. 5, Pt. 2, p. 3374, Nov. 2006.

52.  Ishi, C.T., Matsuda, S., Kanda, T., Jitsuhiro, T., Ishiguro, H., Nakamura, S., Hagita, N. (2006) "Evaluation of a communication robot's speech recognition system in real environments," 24th JSAI SIG on AI Challenges (SIG-Challenge-06), 23-28.

53.  Ishi, C.T., Ishiguro, H., Hagita, N. (2006) "Acoustic analysis for automatic detection of 'rikimi' (pressed voice)," IEICE Technical Report, Vol. 106, No. 178, 1-6.

54.  Ishi, C.T., Ishiguro, H., Hagita, N. (2006) "Analysis of acoustic features of laryngeally pressed phonation," 2006 Spring Meeting of the Acoustical Society of Japan, Vol. I, 227-228.

55.  Ishi, C.T., Ishiguro, H., Hagita, N. (2005) "A study on the extraction of paralinguistic information using prosodic and voice quality features in dialogue speech," 22nd JSAI SIG on AI Challenges (SIG-Challenge-05), 71-76.

56.  Ishi, C.T., Ishiguro, H., Hagita, N. (2005) "A study on the extraction of paralinguistic information using acoustic parameters related to prosody and voice quality," 2005 Autumn Meeting of the Acoustical Society of Japan, 233-234.

57.  Ishi, C.T. (2004) "A study of acoustic parameters related to breathiness in vowel segments," 2004 Autumn Meeting of the Acoustical Society of Japan, Vol. I, 295-296.

58.  Ishi, C.T., Campbell, N. (2004) "Functional roles of phrase finals," 2004 Spring Meeting of the Acoustical Society of Japan, Vol. I, 235-236.

59.  Mokhtari, P., Pfitzinger, H.R., Ishi, C.T., Campbell, N. (2004) "Laryngeal voice quality conversion by glottal waveshape PCA," Proceedings of the Spring 2004 Meeting of the Acoustical Society of Japan, Atsugi, Japan, Paper 2-P-6, 341-342.

60.  Ishi, C.T. (2003) "Analysis of acoustic characteristics of creaky phonation," 2003 Autumn Meeting of the Acoustical Society of Japan, Vol. I, 235-236.

61.    Ishi, C.T., Campbell, N. (2003). “Acoustic-Prosodic Analysis of Phrase Finals in Expressive Speech,” Proceedings of The 1st JST/CREST International Workshop on Expressive Speech Processing, 85-88.

62.  Ishi, C.T., Campbell, N. (2003) "Analysis of acoustic-prosodic features of phrase finals in daily conversation," 2003 Spring Meeting of the Acoustical Society of Japan, Vol. I, 311-312.

63.  Ishi, C.T., Campbell, N. (2002) "Analysis of prosodic characteristics of expressive speaking styles," 2002 Autumn Meeting of the Acoustical Society of Japan, Vol. I, 275-276.

64.    Ishi, C.T., Hirose, K., Minematsu, N. (2002). “Investigations on a quantified representation of pitch movements in syllable units,” Proceedings of The 2002 Spring Meeting of the Acoustical Society of Japan, Vol. I, 419-420.

65.  Ishi, C.T., Minematsu, N., Hirose, K. (2001) "Automatic extraction of mora pitch corresponding to pitch perception," Proceedings of the Annual Convention of the Phonetic Society of Japan, 13-18.

66.  Ishi, C.T., Minematsu, N., Hirose, K. (2001) "Automatic extraction of mora pitch corresponding to pitch perception in Japanese accent and intonation," 2001 Autumn Meeting of the Acoustical Society of Japan, Vol. I, 445-446.

67.  Ishi, C.T., Minematsu, N., Hirose, K. (2001) "Accent type determination of continuous Japanese speech considering pitch perception," IEICE Technical Report, Vol. 101, No. 270, 23-30.

68.    Ishi, C.T., Minematsu, N., Hirose, K. (2001). “Relationship between acoustically observed F0 and perceived pitch for Japanese accent and intonation,” Technical Report of Institute of Electronics, Information and Communication Engineers, SP2001-41, 17-22.

69.  Ishi, C.T., Hirose, K., Minematsu, N. (2001) "Automatic classification of intonation for a pronunciation training system," 2001 Spring Meeting of the Acoustical Society of Japan, Vol. I, 327-328.

70.  Nishide, R., Ishi, C.T., Minematsu, N., Hirose, K. (2001) "A study on the construction of a pronunciation training system for Japanese accent," 2001 Spring Meeting of the Acoustical Society of Japan, Vol. I, 269-270.

71.  Ishi, C.T., Nishide, R., Minematsu, N., Hirose, K. (2001) "A study on the construction of a pronunciation training system for Japanese accent and intonation," IEICE Technical Report, Vol. 100, No. 594, 33-40.

72.  Ishi, C.T., Hirose, K., Minematsu, N. (2000) "Considerations on Japanese mora timing from the viewpoint of isochrony," 2000 Autumn Meeting of the Acoustical Society of Japan, Vol. I, 199-200.

73.  Ishi, C.T., Fujimoto, K., Hirose, K. (2000) "Discrimination of Japanese special morae (tokushuhaku) taking speaking rate into account," IEICE Technical Report, Vol. 100, No. 97, 17-24.

74.  Ishi, C.T., Hirose, K., Minematsu, N. (2000) "Analysis of segmental duration changes according to speaking rate," 2000 Spring Meeting of the Acoustical Society of Japan, Vol. I, 235-236.

75.  Ishi, C.T., Kawai, G., Hirose, K. (1999) "A pronunciation learning system for pitch accent types of Japanese words," 1999 Spring Meeting of the Acoustical Society of Japan, Vol. I, 245-246.