Zhouhan Lin (林洲汉)

  • E-mail : lastname[dot]firstname[at]gmail.com
  • Phone : 650-788-4556

Hello! I'm Zhouhan Lin, a visiting scientist at Facebook AI Research. In 2021, I will join Shanghai Jiao Tong University as an assistant professor. I graduated from the Mila lab at the University of Montreal, where I had the honor of being supervised by Yoshua Bengio.
My research interests include machine learning and natural language processing, especially attention mechanisms and their applications, language modeling, question answering, syntactic parsing, and binary networks.

Selected Publications

Please visit my Google Scholar page for a full list of my publications.

Straight to the Tree: Constituency Parsing with Neural Syntactic Distance

Yikang Shen*, Zhouhan Lin*, Athul Paul Jacob, Alessandro Sordoni, Aaron Courville, Yoshua Bengio

ACL 2018 | pdf | slides | code

We propose a novel constituency parsing scheme. The model predicts a vector of real-valued scalars, named syntactic distances, one for each split position in the input sentence. The syntactic distances specify the order in which split points are selected, recursively partitioning the input in a top-down fashion. Compared to traditional shift-reduce parsing schemes, our approach is free from the potential problem of compounding errors, while being faster and easier to parallelize. Our model achieves competitive performance among single-model discriminative parsers on the PTB dataset and outperforms previous models on the CTB dataset.
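The top-down partitioning described above can be sketched as a short recursion (a minimal illustration, not the trained model from the paper; the token list and distance scores are hypothetical inputs):

```python
def build_tree(words, dists):
    """Recursively split a sentence at its highest syntactic distance.

    words: list of tokens; dists: list of len(words) - 1 scores,
    one per split position between adjacent tokens.
    """
    if len(words) == 1:
        return words[0]  # a single token is a leaf
    # The largest distance marks the top-level split point.
    i = max(range(len(dists)), key=dists.__getitem__)
    left = build_tree(words[: i + 1], dists[:i])
    right = build_tree(words[i + 1 :], dists[i + 1 :])
    return (left, right)
```

For example, `build_tree(["the", "cat", "sat"], [0.2, 0.9])` groups "the cat" together before attaching "sat", because the largest distance sits between "cat" and "sat".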

Learning Hierarchical Structures On-The-Fly with a Recurrent-Recursive Model for Sequences

Athul Paul Jacob*, Zhouhan Lin*, Alessandro Sordoni, Yoshua Bengio

ACL 2018 workshop | pdf | poster

We propose a hierarchical model for sequential data that learns a tree on the fly, i.e. while reading the sequence. In the model, a recurrent network adapts its structure and reuses recurrent weights in a recursive manner. This creates adaptive skip-connections that ease the learning of long-term dependencies. The tree structure can either be inferred without supervision through reinforcement learning, or learned in a supervised manner. We provide preliminary experiments on a novel Math Expression Evaluation (MEE) task, which is explicitly crafted to have a hierarchical tree structure that can be used to study the effectiveness of our model. Additionally, we test our model on well-known propositional logic and language modeling tasks. Experimental results show the potential of our approach.
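To illustrate the kind of data involved, here is a minimal sketch of generating bracketed arithmetic expressions with a known tree structure, in the spirit of the MEE task (the exact operators, depths, and vocabulary used in the paper may differ):

```python
import random

def random_expression(depth, rng):
    """Generate a random nested arithmetic expression and its value.

    The bracketing plays the role of the ground-truth tree structure:
    the sequence of symbols is the input, the value is the target.
    """
    if depth == 0:
        v = rng.randint(0, 9)
        return str(v), v  # a leaf is a single digit
    op = rng.choice(["+", "*"])
    left_expr, left_val = random_expression(depth - 1, rng)
    right_expr, right_val = random_expression(depth - 1, rng)
    value = left_val + right_val if op == "+" else left_val * right_val
    return f"({left_expr}{op}{right_expr})", value
```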

Neural Language Modeling by Jointly Learning Syntax and Lexicon

Yikang Shen, Zhouhan Lin, Chin-Wei Huang, Aaron Courville

ICLR 2018 | pdf | slides | code | poster

We propose a neural language model capable of unsupervised syntactic structure induction. The model leverages the structural information to form better semantic representations and better language models. Standard recurrent neural networks are limited by their sequential structure and fail to efficiently use syntactic information. On the other hand, tree-structured recursive networks usually require additional structural supervision, at the cost of human expert annotation. In this paper, we propose a novel neural language model, called the Parsing-Reading-Predict Networks (PRPN), that can simultaneously induce the syntactic structure from unannotated sentences and leverage the inferred structure to learn a better language model. In our model, the gradient can be directly back-propagated from the language model loss into the neural parsing network. Experiments show that the proposed model can discover the underlying syntactic structure and achieve state-of-the-art performance on word- and character-level language modeling tasks.

A Structured Self-attentive Sentence Embedding

Zhouhan Lin, Minwei Feng, Cicero Nogueira dos Santos, Mo Yu, Bing Xiang, Bowen Zhou, and Yoshua Bengio

ICLR 2017 | pdf | slides | code | poster

We propose a new model for extracting an interpretable sentence embedding by introducing self-attention. Instead of using a vector, we use a 2-D matrix to represent the embedding, with each row of the matrix attending to a different part of the sentence. We also propose a self-attention mechanism and a special regularization term for the model. As a side effect, the embedding comes with an easy way of visualizing which specific parts of the sentence are encoded into the embedding. We evaluate our model on three different tasks: author profiling, sentiment classification, and textual entailment. Results show that our model yields a significant performance gain over other sentence embedding methods on all three tasks.
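The matrix embedding can be sketched in a few lines of NumPy. The sketch below follows the paper's notation (hidden states H, weights W_s1 and W_s2, an annotation matrix A with r rows, and the Frobenius-norm penalty ||AA^T - I||_F^2 encouraging the rows to attend to different parts); the toy dimensions are arbitrary:

```python
import numpy as np

def self_attentive_embedding(H, W_s1, W_s2):
    """H: (n, 2u) hidden states for n tokens; W_s1: (d_a, 2u); W_s2: (r, d_a).

    Returns the (r, 2u) sentence embedding matrix M, the annotation
    matrix A, and the penalty term encouraging diverse attention rows.
    """
    scores = W_s2 @ np.tanh(W_s1 @ H.T)          # (r, n) attention logits
    scores -= scores.max(axis=1, keepdims=True)  # numerically stable softmax
    A = np.exp(scores)
    A /= A.sum(axis=1, keepdims=True)            # row-wise softmax over tokens
    M = A @ H                                    # each row is one weighted view
    penalty = np.square(A @ A.T - np.eye(A.shape[0])).sum()
    return M, A, penalty
```

Each row of A can be plotted over the tokens, which is the visualization mentioned in the abstract.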

Neural Networks with Few Multiplications

Zhouhan Lin, Matthieu Courbariaux, Roland Memisevic, and Yoshua Bengio

ICLR 2016 | pdf | slides | code | poster

For most deep learning algorithms, training is notoriously time-consuming. Since most of the computation in training neural networks is typically spent on floating-point multiplications, we investigate an approach to training that eliminates the need for most of them. Our method consists of two parts: first, we stochastically binarize weights to convert the multiplications involved in computing hidden states into sign changes. Second, while back-propagating error derivatives, in addition to binarizing the weights, we quantize the representations at each layer to convert the remaining multiplications into binary shifts. Experimental results across three popular datasets (MNIST, CIFAR10, SVHN) show that this approach not only does not hurt classification performance but can result in even better performance than standard stochastic gradient descent training, paving the way to fast, hardware-friendly training of neural networks.
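The first step, stochastic weight binarization, can be sketched as follows (a simplified illustration assuming weights clipped to [-1, 1]; the full training loop in work of this kind keeps full-precision weights for the gradient updates and binarizes them only for the forward and backward passes):

```python
import numpy as np

def stochastic_binarize(W, rng):
    """Stochastically binarize weights to {-1, +1}.

    Each weight becomes +1 with probability p = clip((w + 1) / 2, 0, 1),
    so the binarized value equals w in expectation for w in [-1, 1].
    Multiplications by the binarized matrix reduce to sign changes.
    """
    p = np.clip((W + 1.0) / 2.0, 0.0, 1.0)  # probability of drawing +1
    return np.where(rng.random(W.shape) < p, 1.0, -1.0)
```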

How far can we go without convolution: Improving fully-connected networks

Zhouhan Lin, Roland Memisevic, and Kishore Konda

ICLR 2016 workshop | pdf | slides | code | poster

We propose ways to improve the performance of fully connected networks. We found that two approaches in particular have a strong effect on performance: linear bottleneck layers and unsupervised pre-training using autoencoders without hidden-unit biases. We show how both approaches can be related to improving gradient flow and reducing sparsity in the network. We show that a fully connected network can yield approximately 70% classification accuracy on the permutation-invariant CIFAR-10 task, which is much higher than the current state-of-the-art. By adding deformations to the training data, the fully connected network achieves 78% accuracy, just 10% short of a decent convolutional network.
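A linear bottleneck layer is simply a low-dimensional linear map inserted between nonlinear layers. A minimal forward-pass sketch (the layer sizes and function name are hypothetical, for illustration only):

```python
import numpy as np

def fc_with_linear_bottleneck(x, W1, Wb, W2):
    """Forward pass of a fully connected block with a linear bottleneck.

    Wb is a low-rank linear layer with no nonlinearity, inserted between
    two ReLU layers; the idea is that the purely linear stage eases
    gradient flow and reduces sparsity in the activations.
    """
    h = np.maximum(0.0, x @ W1)  # first ReLU layer
    z = h @ Wb                   # linear bottleneck (no activation)
    return np.maximum(0.0, z @ W2)
```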

Recurrent Neural Networks with Limited Numerical Precision

Joachim Ott*, Zhouhan Lin*, Ying Zhang, Shih-Chii Liu, and Yoshua Bengio

NIPS 2016 workshop | pdf | slides | poster

Recurrent Neural Networks (RNNs) produce state-of-the-art performance on many machine learning tasks, but their demands on memory and computational power are often high. There is therefore great interest in optimizing the computations performed with these models, especially when considering the development of specialized low-power hardware for deep networks. One way of reducing the computational needs is to limit the numerical precision of the network weights and biases. This has led to several proposed rounding methods, which have so far been applied only to Convolutional Neural Networks and Fully-Connected Networks. This paper addresses the question of how best to reduce weight precision during training in the case of RNNs. We present results from the use of different stochastic and deterministic reduced-precision training methods applied to three major RNN types, which are then tested on several datasets. The results show that the weight binarization methods do not work with RNNs. However, the stochastic and deterministic ternarization and pow2-ternarization methods gave rise to low-precision RNNs that produce similar or even higher accuracy on certain datasets, thereby providing a path towards training more efficient implementations of RNNs in specialized hardware.
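Ternarization restricts each weight to {-1, 0, +1}. A minimal deterministic sketch (the fixed threshold here is an illustrative choice, not the exact scheme from the paper):

```python
import numpy as np

def ternarize(W, threshold=0.5):
    """Deterministically ternarize weights to {-1, 0, +1}.

    Weights with magnitude below `threshold` are zeroed; the rest keep
    only their sign, so multiplications by the weight matrix reduce to
    sign flips and skipped terms.
    """
    return np.where(np.abs(W) < threshold, 0.0, np.sign(W))
```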

Deep learning-based classification of hyperspectral data

Yushi Chen, Zhouhan Lin, Xing Zhao, Gang Wang, and Yanfeng Gu

Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2014 | pdf | code

Classification is one of the most popular topics in hyperspectral remote sensing. In the last two decades, a huge number of methods have been proposed to deal with the hyperspectral data classification problem. However, most of them do not hierarchically extract deep features. In this paper, the concept of deep learning is introduced into hyperspectral data classification for the first time. First, we verify the eligibility of stacked autoencoders by following classical spectral-information-based classification. Second, a new way of classifying with spatial-dominated information is proposed. We then propose a novel deep learning framework to merge the two kinds of features, from which we obtain the highest classification accuracy. The framework is a hybrid of principal component analysis (PCA), a deep learning architecture, and logistic regression. Specifically, as the deep learning architecture, stacked autoencoders are used to extract useful high-level features. Experimental results with widely used hyperspectral data indicate that classifiers built in this deep learning-based framework provide competitive performance. In addition, the proposed joint spectral–spatial deep neural network opens a new window for future research, showcasing the huge potential of deep learning-based methods for accurate hyperspectral data classification.



Education

Mila, University of Montreal

Department of Computer Science and Operational Research

Supervisor: Yoshua Bengio

Harbin Institute of Technology

Department of Electronics and Information Engineering

Supervisor: Yushi Chen

Honored Master's Graduate of HIT (2/36)

Excellent Master's Thesis (2/36)

Harbin Institute of Technology

Department of Electronics and Information Engineering

Honored Graduate of HIT (top 10%)

Excellent Graduation Thesis (top 5%)


Experience

Google Inc., Montreal

Student Researcher

Google Inc., New York

Summer Intern

Microsoft Research, Montreal

Student Researcher

IBM Research, New York

Summer Intern

Chinese Academy of Sciences, Ningbo

Undergrad Intern

Honors and Awards


  • AdeptMind Scholarship, 2018
  • ICLR Travel Award, 2016, 2017, 2018
  • 2nd Place in NIPS Demonstration, 2017
  • Best workshop paper mention, NIPS 2016
  • First-class Scholarship, 2012, 2013
  • People’s Scholarship, 2011
  • Shinchang Corporation Scholarship, 2010
  • Freshman Foundation for Research and Innovation, 2008

Contact Me

  • Address:

    Room 3248, Pav. André-Aisenstadt, Université de Montréal
    Montreal, Quebec, Canada, H3T 1J4

  • Email: