Publications

Journals (Peer reviewed)

Coupling a generative model with a discriminative learning framework for speaker verification
X. Lu, P. Shen, Y. Tsao, H. Kawai
In IEEE/ACM Trans. on Audio, Speech and Language Processing, vol. 29, 2021.
Knowledge distillation-based representation learning for short-utterance spoken language identification
Peng Shen, Xugang Lu, Sheng Li, Hisashi Kawai
In IEEE/ACM Trans. on Audio, Speech and Language Processing, vol. 28, pp. 2674-2683, 2020.
Regularization of neural network model with distance metric learning for i-vector based spoken language identification
Xugang Lu, Peng Shen, Yu Tsao, Hisashi Kawai
In Computer Speech and Language, vol. 44, pp. 48–60, July 2017.
Combination of multiple acoustic models with unsupervised adaptation for lecture speech transcription
Peng Shen, Xugang Lu, Xinhui Hu, Naoyuki Kanda, Masahiro Saiko, Chiori Hori, Hisashi Kawai
In Speech Communication, vol. 82, pp. 1-13, Sep. 2016.
Multi-Stream Sparse Representation Features for Noise Robust Audio-Visual Speech Recognition
Peng Shen, Satoshi Tamura and Satoru Hayamizu
In Acoustical Science and Technology, vol. 35, no. 1, 2014.

International Conferences (Peer reviewed)

Generative linguistic representation for spoken language identification
P. Shen, X. Lu, H. Kawai
in Proc. ASRU, 2023. (Accepted)
Cross-modal alignment with optimal transport for CTC-based ASR
X. Lu, P. Shen, Y. Tsao, H. Kawai
in Proc. ASRU, 2023. (Accepted)
Unsupervised neural adaptation model based on optimal transport for spoken language identification
X. Lu, P. Shen, Y. Tsao, H. Kawai
in Proc. ICASSP, 2022.
Pronunciation-aware unique character encoding for RNN Transducer-based Mandarin speech recognition
P. Shen, X. Lu, H. Kawai
in Proc. SLT, 2022.
Transducer-based language embedding for spoken language identification
P. Shen, X. Lu, H. Kawai
in Proc. Interspeech, 2022.
Siamese Neural Network with Joint Bayesian Model Structure for Speaker Verification
X. Lu, P. Shen, Y. Tsao, H. Kawai
in Proc. APSIPA ASC, 2021.
Unsupervised neural adaptation model based on optimal transport for spoken language identification
X. Lu, P. Shen, Y. Tsao, H. Kawai
in Proc. ICASSP, 2021.
Investigation of NICT submission for short-duration speaker verification challenge 2020
Peng Shen, Xugang Lu, and Hisashi Kawai
in Proc. of Interspeech, 2020.
Joint Training End-to-End Speech Recognition Systems with Speaker Attributes
S. Li, X. Lu, R. Dabre, P. Shen and H. Kawai
in Proc. ISCA-Odyssey (The Speaker and Language Recognition Workshop), 2020.
Compensation on x-vector for short utterance spoken language identification
Peng Shen, Xugang Lu, Komei Sugiura, Sheng Li and Hisashi Kawai
in Proc. ISCA-Odyssey (The Speaker and Language Recognition Workshop), 2020.
Class-wise Centroid Distance Metric Learning for Acoustic Event Detection
Xugang Lu, Peng Shen, Sheng Li, Yu Tsao and Hisashi Kawai
in Proc. of Interspeech, 2019.
Investigating Radical-based End-to-End Speech Recognition Systems for Chinese Dialects and Japanese
Sheng Li, Xugang Lu, Chenchen Ding, Peng Shen, Tatsuya Kawahara and Hisashi Kawai
in Proc. of Interspeech, 2019.
End-to-End Articulatory Attribute Modeling for Low-resource Multilingual Speech Recognition
Sheng Li, Chenchen Ding, Xugang Lu, Peng Shen, Tatsuya Kawahara and Hisashi Kawai
in Proc. of Interspeech, 2019.
Improving Transformer-based Speech Recognition Systems with Compressed Structure and Speech Attributes Augmentation
Sheng Li, Raj Dabre, Xugang Lu, Peng Shen, Tatsuya Kawahara and Hisashi Kawai
in Proc. of Interspeech, 2019.
Interactive learning of teacher-student model for short utterance spoken language identification
Peng Shen, Xugang Lu, Sheng Li and Hisashi Kawai
In Proc. of ICASSP, May 2019.
Improving very deep time-delay neural network with vertical-attention for effectively training CTC-based ASR systems
Sheng Li, Xugang Lu, Ryoichi Takashima, Peng Shen, Tatsuya Kawahara and Hisashi Kawai
In Proc. of SLT, Athens, Greece, 18-21, Dec. 2018.
Feature Representation of Short Utterances based on Knowledge Distillation for Spoken Language Identification
Peng Shen, Xugang Lu, Sheng Li and Hisashi Kawai
In Proc. of Interspeech 2018, India, 2-6, Sep. 2018.
Temporal attentive pooling for acoustic event detection
Xugang Lu, Peng Shen, Sheng Li, Yu Tsao and Hisashi Kawai
in Proc. of Interspeech 2018, India, 2-6, Sep. 2018.
Improving CTC-based Acoustic Model with Very Deep Residual Time-delay Neural Networks
Sheng Li, Xugang Lu, Ryoichi Takashima, Peng Shen, Tatsuya Kawahara and Hisashi Kawai
In Proc. of Interspeech, India, 2-6, Sep. 2018.
Incremental training and constructing the very deep convolutional residual network acoustic models
Sheng Li, Xugang Lu, Peng Shen, Ryoichi Takashima, Tatsuya Kawahara and Hisashi Kawai
In Proc. of ASRU, Okinawa, Japan, 16-20, Dec. 2017.
Conditional Generative Adversarial Nets Classifier for Spoken Language Identification
Peng Shen, Xugang Lu, Sheng Li, Hisashi Kawai
In Proc. of Interspeech, Stockholm, Sweden, Aug. 20-24, 2017.
Comparison of Regularization Constraints in Deep Neural Network based Speaker Adaptation
Peng Shen, Xugang Lu, Hisashi Kawai
In The 10th International Symposium on Chinese Spoken Language Processing, Oct. 2016.
Automatic acoustic segmentation in N-best list rescoring for lecture speech recognition
Peng Shen, Xugang Lu, Hisashi Kawai
in The 10th International Symposium on Chinese Spoken Language Processing, Oct. 2016.
A Pseudo-task Design in Multi-task Learning Deep Neural Network for Speaker Recognition
Xugang Lu, Peng Shen, Hisashi Kawai
In The 10th International Symposium on Chinese Spoken Language Processing, Oct. 2016.
Pair-wise Distance Metric Learning of Neural Network Model for Spoken Language Identification
Xugang Lu, Peng Shen, Yu Tsao, and Hisashi Kawai
In Proc. Interspeech, Sep. 2016.
Local fisher discriminant analysis for spoken language identification
Peng Shen, Xugang Lu, Lemao Liu and Hisashi Kawai
In Proc. ICASSP, Mar. 2016.
Sparse representation with temporal max-smoothing for acoustic event detection
Xugang Lu, Peng Shen, Yu Tsao, Chiori Hori and Hisashi Kawai
In Proc. Interspeech, pp. 1176-1180, Sep. 2015.
The NICT ASR System for IWSLT 2014
Peng Shen, Xugang Lu, Xinhui Hu, Naoyuki Kanda, Masahiro Saiko, Chiori Hori
In International Workshop on Spoken Language Translation (IWSLT), Lake Tahoe, USA, pp.113-118, Dec. 2014.
Spectral Patch Based Sparse Coding for Acoustic Event Detection
Xugang Lu, Yu Tsao, Peng Shen, Chiori Hori
In The 9th International Symposium on Chinese Spoken Language Processing (ISCSLP), Singapore, Sep. 2014.
Audio-visual Interaction in Sparse Representation Features for Noise Robust Audio-visual Speech Recognition
Peng Shen, Satoshi Tamura and Satoru Hayamizu
In The 12th International Conference on Auditory-Visual Speech Processing (AVSP), Annecy, France, pp.43-48, Aug. 2013.
Feature Reconstruction using Sparse Imputation for Noise Robust Audio-Visual Speech Recognition
Peng Shen, Satoshi Tamura and Satoru Hayamizu
In Int. Conf. APSIPA ASC, USA, ps.5-sla.18, no.125, pp.1-4, Dec. 2012.
Evaluation of real-time audio-visual speech recognition
Peng Shen, Satoshi Tamura and Satoru Hayamizu
In The 9th International Conference on Auditory-Visual Speech Processing (AVSP), Hakone, Japan, pp.77-80, Oct. 2010.

Domestic Conferences

Investigation on Multi-task Universal Speech Models
P. Shen, X. Lu, H. Kawai
in Autumn Meeting of Acoustical Society of Japan, 2023.
Investigation on sub-character tokenization for RNN-Transducer
P. Shen, X. Lu, H. Kawai
in Autumn Meeting of Acoustical Society of Japan, 2022.
A Study on Language Modeling with BERT-based Word Embedding
T. Ogura, M. Fujimoto, P. Shen, X. Lu, H. Kawai
in Acoustical Society of Japan, Sep, 2021. [In Japanese]
Unsupervised Feature Learning based on wav2vec for Cross-channel Spoken Language Identification
T. Yoshimoto, P. Shen, X. Lu, R. Takashima, T. Takiguchi, H. Kawai
in Acoustical Society of Japan, spring, 2021. [In Japanese]
Improvement of x-vector for short utterance spoken language identification
P. Shen, X. Lu, K. Sugiura, S. Li, H. Kawai
in Acoustical Society of Japan, spring, 2020.
End-to-End Articulatory Attribute Modeling for Low-resource Multilingual Speech Recognition
S. Li, C. Ding, X. Lu, P. Shen and H. Kawai
in Acoustical Society of Japan, spring, 2020.
Joint Training End-to-End Systems for Speech and Speaker Recognition with Speaker Attributes
S. Li, X. Lu, R. Dabre, P. Shen and H. Kawai
in Acoustical Society of Japan, spring, 2020.
Investigation of multi-domain training for speech recognition
Peng Shen, Xugang Lu, Sheng Li and Hisashi Kawai
in 2019 Acoustical Society of Japan, Mar. 2019.
Short utterance-based spoken language identification
Peng Shen, Xugang Lu, Sheng Li and Hisashi Kawai
in 2018 Acoustical Society of Japan, Sep. 2018.
An empirical comparison of sequence training methods for the very deep residual time-delay neural network
Sheng Li, Xugang Lu, Ryoichi Takashima, Peng Shen and Hisashi Kawai
in 2018 Acoustical Society of Japan, Sep. 2018.
Improving CTC-based acoustic model with very deep residual neural network
S. Li, X. Lu, R. Takashima, P. Shen and H. Kawai
In Acoustical Society of Japan, spring, 2018.
cGAN-classifier: Conditional Generative Adversarial Nets for Classification
Peng Shen, Xugang Lu, Sheng Li and Hisashi Kawai
In Acoustical Society of Japan, Sep. 2017.
Very deep convolutional residual network acoustic models for Japanese lecture transcription
Sheng Li, Xugang Lu, Peng Shen and Hisashi Kawai
In Acoustical Society of Japan, Sep. 2017.
Building WFST based Grapheme to Phoneme Conversion for Khmer
Kak Soky, Xugang Lu, Peng Shen, Hiroaki Kato, Hisashi Kawai, Chuon Vanna, Vichet Chea
In KNLP, 2016.
Investigation on nonparametric discriminant analysis for language identification
Peng Shen, Xugang Lu and Hisashi Kawai
In 2016 Spring Meeting of Acoustical Society of Japan, Mar. 2016.
The 2014 NICT Automatic Speech Recognition System
Peng Shen, Xugang Lu, Xinhui Hu, Naoyuki Kanda, Masahiro Saiko, Chiori Hori
In 2015 Spring Meeting of Acoustical Society of Japan, 1-P-20, March, 2015.
Feature Reconstruction using Sparse Imputation for Noise Robust Audio-Visual Speech Recognition
Peng Shen, Satoshi Tamura and Satoru Hayamizu
In 2012 Autumn Meeting of Acoustical Society of Japan, 3-P-8, pp.217-218, Sep. 2012. (In Japanese)
Recent efforts for high-performance multimodal speech recognition
Satoshi Tamura, Peng Shen, Hiroya Okuda, Naoya Ukai, Takuya Kawasaki, Takumi Seko, Satoru Hayamizu
In Technical Reports. Information Processing Society of Japan, vol.112, no.369, pp.41-46, Dec. 2012. (in Japanese)
Development of Real-time Audio-Visual Speech Recognition System
Peng Shen, Satoshi Tamura and Satoru Hayamizu
In 2010 Spring Meeting of Acoustical Society of Japan, 1-P-27, pp.217-218, March, 2010.