Publications

    Journals (Peer reviewed)

  • Coupling a generative model with a discriminative learning framework for speaker verification
    X. Lu, P. Shen, Y. Tsao, H. Kawai
    In IEEE/ACM Trans. on Audio, Speech and Language Processing, vol. 29, 2021.
  • Knowledge distillation-based representation learning for short-utterance spoken language identification
    Peng Shen, Xugang Lu, Sheng Li, Hisashi Kawai
    In IEEE/ACM Trans. on Audio, Speech and Language Processing, vol. 28, pp. 2674-2683, 2020.
  • Regularization of neural network model with distance metric learning for i-vector based spoken language identification
    Xugang Lu, Peng Shen, Yu Tsao, Hisashi Kawai
    In Computer Speech and Language, vol. 44, pp. 48–60, July 2017.
  • Combination of multiple acoustic models with unsupervised adaptation for lecture speech transcription
    Peng Shen, Xugang Lu, Xinhui Hu, Naoyuki Kanda, Masahiro Saiko, Chiori Hori, Hisashi Kawai
    In Speech Communication, vol. 82, pp. 1-13, Sep. 2016.
  • Multi-Stream Sparse Representation Features for Noise Robust Audio-Visual Speech Recognition
    Peng Shen, Satoshi Tamura and Satoru Hayamizu
    In Acoustical Science and Technology, vol. 35, no. 1, 2014.
    International Conferences (Peer reviewed)

  • Generative linguistic representation for spoken language identification
    P. Shen, X. Lu, H. Kawai
    in Proc. ASRU, 2023. (Accepted)
  • Cross-modal alignment with optimal transport for CTC-based ASR
    X. Lu, P. Shen, Y. Tsao, H. Kawai
    in Proc. ASRU, 2023. (Accepted)
  • Unsupervised neural adaptation model based on optimal transport for spoken language identification
    X. Lu, P. Shen, Y. Tsao, H. Kawai
    in Proc. ICASSP, 2022.
  • Pronunciation-aware unique character encoding for RNN Transducer-based Mandarin speech recognition
    P. Shen, X. Lu, H. Kawai
    in Proc. SLT, 2022.
  • Transducer-based language embedding for spoken language identification
    P. Shen, X. Lu, H. Kawai
    in Proc. Interspeech, 2022.
  • Siamese Neural Network with Joint Bayesian Model Structure for Speaker Verification
    X. Lu, P. Shen, Y. Tsao, H. Kawai
    in Proc. APSIPA ASC, 2021.
  • Unsupervised neural adaptation model based on optimal transport for spoken language identification
    X. Lu, P. Shen, Y. Tsao, H. Kawai
    in Proc. ICASSP, 2021.
  • Investigation of NICT submission for short-duration speaker verification challenge 2020
    Peng Shen, Xugang Lu, and Hisashi Kawai
    in Proc. of Interspeech, 2020.
  • Joint Training End-to-End Speech Recognition Systems with Speaker Attributes
    S. Li, X. Lu, R. Dabre, P. Shen and H. Kawai
    in Proc. ISCA-Odyssey (The Speaker and Language Recognition Workshop), 2020.
  • Compensation on x-vector for short utterance spoken language identification
    Peng Shen, Xugang Lu, Komei Sugiura, Sheng Li and Hisashi Kawai
    in Proc. ISCA-Odyssey (The Speaker and Language Recognition Workshop), 2020.
  • Class-wise Centroid Distance Metric Learning for Acoustic Event Detection
    Xugang Lu, Peng Shen, Sheng Li, Yu Tsao and Hisashi Kawai
    in Proc. of Interspeech, 2019.
  • Investigating Radical-based End-to-End Speech Recognition Systems for Chinese Dialects and Japanese
    Sheng Li, Xugang Lu, Chenchen Ding, Peng Shen, Tatsuya Kawahara and Hisashi Kawai
    in Proc. of Interspeech, 2019.
  • End-to-End Articulatory Attribute Modeling for Low-resource Multilingual Speech Recognition
    Sheng Li, Chenchen Ding, Xugang Lu, Peng Shen, Tatsuya Kawahara and Hisashi Kawai
    in Proc. of Interspeech, 2019.
  • Improving Transformer-based Speech Recognition Systems with Compressed Structure and Speech Attributes Augmentation
    Sheng Li, Raj Dabre, Xugang Lu, Peng Shen, Tatsuya Kawahara and Hisashi Kawai
    in Proc. of Interspeech, 2019.
  • Interactive learning of teacher-student model for short utterance spoken language identification
    Peng Shen, Xugang Lu, Sheng Li and Hisashi Kawai
    In Proc. of ICASSP, May 2019.
  • Improving very deep time-delay neural network with vertical-attention for effectively training CTC-based ASR systems
    Sheng Li, Xugang Lu, Ryoichi Takashima, Peng Shen, Tatsuya Kawahara and Hisashi Kawai
    In Proc. of SLT, Athens, Greece, 18-21 Dec. 2018.
  • Feature Representation of Short Utterances based on Knowledge Distillation for Spoken Language Identification
    Peng Shen, Xugang Lu, Sheng Li and Hisashi Kawai
    In Proc. of Interspeech 2018, India, 2-6 Sep. 2018.
  • Temporal attentive pooling for acoustic event detection
    Xugang Lu, Peng Shen, Sheng Li, Yu Tsao and Hisashi Kawai
    in Proc. of Interspeech 2018, India, 2-6 Sep. 2018.
  • Improving CTC-based Acoustic Model with Very Deep Residual Time-delay Neural Networks
    Sheng Li, Xugang Lu, Ryoichi Takashima, Peng Shen, Tatsuya Kawahara and Hisashi Kawai
    In Proc. of Interspeech, India, 2-6 Sep. 2018.
  • Incremental training and constructing the very deep convolutional residual network acoustic models
    Sheng Li, Xugang Lu, Peng Shen, Ryoichi Takashima, Tatsuya Kawahara and Hisashi Kawai
    In Proc. of ASRU, Okinawa, Japan, 16-20 Dec. 2017.
  • Conditional Generative Adversarial Nets Classifier for Spoken Language Identification
    Peng Shen, Xugang Lu, Sheng Li, Hisashi Kawai
    In Proc. of Interspeech, Stockholm, Sweden, Aug. 20-24, 2017.
  • Comparison of Regularization Constraints in Deep Neural Network based Speaker Adaptation
    Peng Shen, Xugang Lu, Hisashi Kawai
    In The 10th International Symposium on Chinese Spoken Language Processing, Oct. 2016.
  • Automatic acoustic segmentation in N-best list rescoring for lecture speech recognition
    Peng Shen, Xugang Lu, Hisashi Kawai
    In The 10th International Symposium on Chinese Spoken Language Processing, Oct. 2016.
  • A Pseudo-task Design in Multi-task Learning Deep Neural Network for Speaker Recognition
    Xugang Lu, Peng Shen, Hisashi Kawai
    In The 10th International Symposium on Chinese Spoken Language Processing, Oct. 2016.
  • Pair-wise Distance Metric Learning of Neural Network Model for Spoken Language Identification
    Xugang Lu, Peng Shen, Yu Tsao, and Hisashi Kawai
    In Proc. Interspeech, Sep. 2016.
  • Local fisher discriminant analysis for spoken language identification
    Peng Shen, Xugang Lu, Lemao Liu and Hisashi Kawai
    In Proc. ICASSP, Mar. 2016.
  • Sparse representation with temporal max-smoothing for acoustic event detection
    Xugang Lu, Peng Shen, Yu Tsao, Chiori Hori and Hisashi Kawai
    In Proc. Interspeech, pp. 1176-1180, Sep. 2015.
  • The NICT ASR System for IWSLT 2014
    Peng Shen, Xugang Lu, Xinhui Hu, Naoyuki Kanda, Masahiro Saiko, Chiori Hori
    In International Workshop on Spoken Language Translation (IWSLT), Lake Tahoe, USA, pp. 113-118, Dec. 2014.
  • Spectral Patch Based Sparse Coding for Acoustic Event Detection
    Xugang Lu, Yu Tsao, Peng Shen, Chiori Hori
    In The 9th International Symposium on Chinese Spoken Language Processing (ISCSLP), Singapore, Sep. 2014.
  • Audio-visual Interaction in Sparse Representation Features for Noise Robust Audio-visual Speech Recognition
    Peng Shen, Satoshi Tamura and Satoru Hayamizu
    In The 12th International Conference on Auditory-Visual Speech Processing (AVSP), Annecy, France, pp. 43-48, Aug. 2013.
  • Feature Reconstruction using Sparse Imputation for Noise Robust Audio-Visual Speech Recognition
    Peng Shen, Satoshi Tamura and Satoru Hayamizu
    In Int. Conf. APSIPA ASC, USA, ps.5-sla.18, no. 125, pp. 1-4, Dec. 2012.
  • Evaluation of real-time audio-visual speech recognition
    Peng Shen, Satoshi Tamura and Satoru Hayamizu
    In The 9th International Conference on Auditory-Visual Speech Processing (AVSP), Hakone, Japan, pp. 77-80, Oct. 2010.
    Domestic Conferences

  • Investigation on Multi-task Universal Speech Models
    P. Shen, X. Lu, H. Kawai
    in Autumn Meeting of Acoustical Society of Japan, 2023.
  • Investigation on sub-character tokenization for RNN-Transducer
    P. Shen, X. Lu, H. Kawai
    in Autumn Meeting of Acoustical Society of Japan, 2022.
  • A Study on Language Modeling with BERT-based Word Embedding
    T. Ogura, M. Fujimoto, P. Shen, X. Lu, H. Kawai
    in Acoustical Society of Japan, Sep. 2021. [In Japanese]
  • Unsupervised Feature Learning based on wav2vec for Cross-channel Spoken Language Identification
    T. Yoshimoto, P. Shen, X. Lu, R. Takashima, T. Takiguchi, H. Kawai
    in Acoustical Society of Japan, spring, 2021. [In Japanese]
  • Improvement of x-vector for short utterance spoken language identification
    P. Shen, X. Lu, K. Sugiura, S. Li, H. Kawai
    in Acoustical Society of Japan, spring, 2020.
  • End-to-End Articulatory Attribute Modeling for Low-resource Multilingual Speech Recognition
    S. Li, C. Ding, X. Lu, P. Shen and H. Kawai
    in Acoustical Society of Japan, spring, 2020.
  • Joint Training End-to-End Systems for Speech and Speaker Recognition with Speaker Attributes
    S. Li, X. Lu, R. Dabre, P. Shen and H. Kawai
    in Acoustical Society of Japan, spring, 2020.
  • Investigation of multi-domain training for speech recognition
    Peng Shen, Xugang Lu, Sheng Li and Hisashi Kawai
    in 2019 Acoustical Society of Japan, Mar. 2019.
  • Short utterance-based spoken language identification
    Peng Shen, Xugang Lu, Sheng Li and Hisashi Kawai
    in 2018 Acoustical Society of Japan, Sep. 2018.
  • An empirical comparison of sequence training methods for the very deep residual time-delay neural network
    Sheng Li, Xugang Lu, Ryoichi Takashima, Peng Shen and Hisashi Kawai
    in 2018 Acoustical Society of Japan, Sep. 2018.
  • Improving CTC-based acoustic model with very deep residual neural network
    S. Li, X. Lu, R. Takashima, P. Shen and H. Kawai
    In Acoustical Society of Japan, spring, 2018.
  • cGAN-classifier: Conditional Generative Adversarial Nets for Classification
    Peng Shen, Xugang Lu, Sheng Li and Hisashi Kawai
    In Acoustical Society of Japan, Sep. 2017.
  • Very deep convolutional residual network acoustic models for Japanese lecture transcription
    Sheng Li, Xugang Lu, Peng Shen and Hisashi Kawai
    In Acoustical Society of Japan, Sep. 2017.
  • Building WFST based Grapheme to Phoneme Conversion for Khmer
    Kak Soky, Xugang Lu, Peng Shen, Hiroaki Kato, Hisashi Kawai, Chuon Vanna, Vichet Chea
    In KNLP, 2016.
  • Investigation on nonparametric discriminant analysis for language identification
    Peng Shen, Xugang Lu and Hisashi Kawai
    In 2016 Spring Meeting of Acoustical Society of Japan, Mar. 2016.
  • The 2014 NICT Automatic Speech Recognition System
    Peng Shen, Xugang Lu, Xinhui Hu, Naoyuki Kanda, Masahiro Saiko, Chiori Hori
    In 2015 Spring Meeting of Acoustical Society of Japan, 1-P-20, March, 2015.
  • Feature Reconstruction using Sparse Imputation for Noise Robust Audio-Visual Speech Recognition
    Peng Shen, Satoshi Tamura and Satoru Hayamizu
    In 2012 Autumn Meeting of Acoustical Society of Japan, 3-P-8, pp. 217-218, Sep. 2012. (In Japanese)
  • Recent efforts for high-performance multimodal speech recognition
    Satoshi Tamura, Peng Shen, Hiroya Okuda, Naoya Ukai, Takuya Kawasaki, Takumi Seko, Satoru Hayamizu
    In Technical Reports, Information Processing Society of Japan, vol. 112, no. 369, pp. 41-46, Dec. 2012. (In Japanese)
  • Development of Real-time Audio-Visual Speech Recognition System
    Peng Shen, Satoshi Tamura and Satoru Hayamizu
    In 2010 Spring Meeting of Acoustical Society of Japan, 1-P-27, pp. 217-218, March, 2010.