[Paper Study] Paper Index

Optimization / Training Techniques

1

Improving neural networks by preventing co-adaptation of feature detectors (2012), G. Hinton et al. [pdf]
Dropout: A simple way to prevent neural networks from overfitting (2014), N. Srivastava et al. [pdf]
Batch normalization: Accelerating deep network training by reducing internal covariate shift (2015), S. Ioffe and C. Szegedy [pdf]

Extra:

Ba, Jimmy Lei, Jamie Ryan Kiros, and Geoffrey E. Hinton. “Layer normalization.” arXiv preprint arXiv:1607.06450 (2016). [pdf] (Update of Batch Normalization)
Courbariaux, Matthieu, et al. “Binarized Neural Networks: Training Neural Networks with Weights and Activations Constrained to +1 or −1.” [pdf] (New Model,Fast)
Jaderberg, Max, et al. “Decoupled neural interfaces using synthetic gradients.” arXiv preprint arXiv:1608.05343 (2016). [pdf] (Innovation of Training Method,Amazing Work) 🌟🌟🌟🌟🌟
Chen, Tianqi, Ian Goodfellow, and Jonathon Shlens. “Net2net: Accelerating learning via knowledge transfer.” arXiv preprint arXiv:1511.05641 (2015). [pdf] (Modify previously trained network to reduce training epochs) 🌟🌟🌟
Wei, Tao, et al. “Network Morphism.” arXiv preprint arXiv:1603.01670 (2016). [pdf] (Modify previously trained network to reduce training epochs) 🌟🌟🌟

2 Optimization

Adam: A method for stochastic optimization (2014), D. Kingma and J. Ba [pdf]
Training very deep networks (2015), R. Srivastava et al. [pdf]
Delving deep into rectifiers: Surpassing human-level performance on imagenet classification (2015), K. He et al. [pdf]
Random search for hyper-parameter optimization (2012) J. Bergstra and Y. Bengio [pdf]

Extra

Sutskever, Ilya, et al. “On the importance of initialization and momentum in deep learning.” ICML (3) 28 (2013): 1139-1147. [pdf] (Momentum optimizer) 🌟🌟
Andrychowicz, Marcin, et al. “Learning to learn by gradient descent by gradient descent.” arXiv preprint arXiv:1606.04474 (2016). [pdf] (Neural Optimizer,Amazing Work) 🌟🌟🌟🌟🌟
Han, Song, Huizi Mao, and William J. Dally. “Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding.” CoRR, abs/1510.00149 (2015). [pdf] (ICLR best paper, new direction to make NNs run fast,DeePhi Tech Startup) 🌟🌟🌟🌟🌟
Iandola, Forrest N., et al. “SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size.” arXiv preprint arXiv:1602.07360 (2016). [pdf] (Also a new direction to optimize NN,DeePhi Tech Startup) 🌟🌟🌟🌟
Glorot, Xavier, and Yoshua Bengio. “Understanding the difficulty of training deep feedforward neural networks.” Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, PMLR 9:249-256, 2010. [pdf] 🌟🌟🌟🌟

Unsupervised / Generative Models

Pixel recurrent neural networks (2016), A. Oord et al. [pdf]
Improved techniques for training GANs (2016), T. Salimans et al. [pdf]
Unsupervised representation learning with deep convolutional generative adversarial networks (2015), A. Radford et al. [pdf]
DRAW: A recurrent neural network for image generation (2015), K. Gregor et al. [pdf]
Generative adversarial nets (2014), I. Goodfellow et al. [pdf]
Auto-encoding variational Bayes (2013), D. Kingma and M. Welling [pdf]
Building high-level features using large scale unsupervised learning (2013), Q. Le et al. [pdf]

Convolutional Neural Network Models

Rethinking the inception architecture for computer vision (2016), C. Szegedy et al. [pdf]
Inception-v4, inception-resnet and the impact of residual connections on learning (2016), C. Szegedy et al. [pdf]
Identity Mappings in Deep Residual Networks (2016), K. He et al. [pdf]
Deep residual learning for image recognition (2016), K. He et al. [pdf]
Spatial transformer networks (2015), M. Jaderberg et al. [pdf]
Going deeper with convolutions (2015), C. Szegedy et al. [pdf]
Very deep convolutional networks for large-scale image recognition (2014), K. Simonyan and A. Zisserman [pdf]
Return of the devil in the details: delving deep into convolutional nets (2014), K. Chatfield et al. [pdf]
OverFeat: Integrated recognition, localization and detection using convolutional networks (2013), P. Sermanet et al. [pdf]
Maxout networks (2013), I. Goodfellow et al. [pdf]
Network in network (2013), M. Lin et al. [pdf]
ImageNet classification with deep convolutional neural networks (2012), A. Krizhevsky et al. [pdf]

Image: Segmentation / Object Detection

You only look once: Unified, real-time object detection (2016), J. Redmon et al. [pdf]
Fully convolutional networks for semantic segmentation (2015), J. Long et al. [pdf]
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks (2015), S. Ren et al. [pdf]
Fast R-CNN (2015), R. Girshick [pdf]
Rich feature hierarchies for accurate object detection and semantic segmentation (2014), R. Girshick et al. [pdf]
Spatial pyramid pooling in deep convolutional networks for visual recognition (2014), K. He et al. [pdf]
Semantic image segmentation with deep convolutional nets and fully connected CRFs (2015), L. Chen et al. [pdf]
Learning hierarchical features for scene labeling (2013), C. Farabet et al. [pdf]

Image / Video / Etc

Image Super-Resolution Using Deep Convolutional Networks (2016), C. Dong et al. [pdf]
A neural algorithm of artistic style (2015), L. Gatys et al. [pdf]
Deep visual-semantic alignments for generating image descriptions (2015), A. Karpathy and L. Fei-Fei [pdf]
Show, attend and tell: Neural image caption generation with visual attention (2015), K. Xu et al. [pdf]
Show and tell: A neural image caption generator (2015), O. Vinyals et al. [pdf]
Long-term recurrent convolutional networks for visual recognition and description (2015), J. Donahue et al. [pdf]
VQA: Visual question answering (2015), S. Antol et al. [pdf]
DeepFace: Closing the gap to human-level performance in face verification (2014), Y. Taigman et al. [pdf]
Large-scale video classification with convolutional neural networks (2014), A. Karpathy et al. [pdf]
Two-stream convolutional networks for action recognition in videos (2014), K. Simonyan et al. [pdf]
3D convolutional neural networks for human action recognition (2013), S. Ji et al. [pdf]

Natural Language Processing / RNNs

Neural Architectures for Named Entity Recognition (2016), G. Lample et al. [pdf]
Exploring the limits of language modeling (2016), R. Jozefowicz et al. [pdf]
Teaching machines to read and comprehend (2015), K. Hermann et al. [pdf]
Effective approaches to attention-based neural machine translation (2015), M. Luong et al. [pdf]
Conditional random fields as recurrent neural networks (2015), S. Zheng and S. Jayasumana. [pdf]
Memory networks (2014), J. Weston et al. [pdf]
Neural turing machines (2014), A. Graves et al. [pdf]
Neural machine translation by jointly learning to align and translate (2014), D. Bahdanau et al. [pdf]
Sequence to sequence learning with neural networks (2014), I. Sutskever et al. [pdf]
Learning phrase representations using RNN encoder-decoder for statistical machine translation (2014), K. Cho et al. [pdf]
A convolutional neural network for modeling sentences (2014), N. Kalchbrenner et al. [pdf]
Convolutional neural networks for sentence classification (2014), Y. Kim [pdf]
Glove: Global vectors for word representation (2014), J. Pennington et al. [pdf]
Distributed representations of sentences and documents (2014), Q. Le and T. Mikolov [pdf]
Distributed representations of words and phrases and their compositionality (2013), T. Mikolov et al. [pdf]
Efficient estimation of word representations in vector space (2013), T. Mikolov et al. [pdf]
Recursive deep models for semantic compositionality over a sentiment treebank (2013), R. Socher et al. [pdf]
Generating sequences with recurrent neural networks (2013), A. Graves. [pdf]

Speech / Other Domain

End-to-end attention-based large vocabulary speech recognition (2016), D. Bahdanau et al. [pdf]
Deep speech 2: End-to-end speech recognition in English and Mandarin (2015), D. Amodei et al. [pdf]
Speech recognition with deep recurrent neural networks (2013), A. Graves [pdf]
Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups (2012), G. Hinton et al. [pdf]
Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition (2012) G. Dahl et al. [pdf]
Acoustic modeling using deep belief networks (2012), A. Mohamed et al. [pdf]

Reinforcement Learning / Robotics

End-to-end training of deep visuomotor policies (2016), S. Levine et al. [pdf]
Learning Hand-Eye Coordination for Robotic Grasping with Deep Learning and Large-Scale Data Collection (2016), S. Levine et al. [pdf]
Asynchronous methods for deep reinforcement learning (2016), V. Mnih et al. [pdf]
Deep Reinforcement Learning with Double Q-Learning (2016), H. Hasselt et al. [pdf]
Mastering the game of Go with deep neural networks and tree search (2016), D. Silver et al. [pdf]
Continuous control with deep reinforcement learning (2015), T. Lillicrap et al. [pdf]
Human-level control through deep reinforcement learning (2015), V. Mnih et al. [pdf]
Deep learning for detecting robotic grasps (2015), I. Lenz et al. [pdf]
Playing atari with deep reinforcement learning (2013), V. Mnih et al. [pdf]

Applications

OCR

Detecting Text in Natural Image with Connectionist Text Proposal Network [pdf]

Deep Learning "Milestone" Papers

https://github.com/floodsung/Deep-Learning-Papers-Reading-Roadmap

1.1 Survey

[1] LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton. “Deep learning.” Nature 521.7553 (2015): 436-444. [pdf] (Three Giants’ Survey) 🌟🌟🌟🌟🌟

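Machine learning is widely used for images, speech, and recommendation. Traditional machine learning requires carefully hand-crafted features.
Deep learning is a form of representation learning that builds features automatically. Each layer builds features and passes them on to the next, more abstract layer.
Taking images as an example: the first layer detects whether edges are present at particular orientations and positions, the second layer detects motifs, the third layer may combine motifs into larger parts, and so on.
Deep learning is a major breakthrough on some previously hard problems. It is very good at discovering intricate structure in high-dimensional data, so it is widely used in science, business, and government. It has broken records in image recognition and speech recognition, and has beaten other machine learning models at drug molecule modeling, brain science, and predicting the effects of DNA mutations. In NLP, it is used for topic classification, sentiment analysis, question answering, and translation.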

Supervised learning: define an objective (a score or a distance), then use SGD to optimize the parameters.
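The recipe above can be sketched in a few lines. This is a minimal illustration, not taken from the survey: the objective is assumed to be squared distance, the model is assumed to be a line `y = w*x + b`, and the function name `sgd_linear_fit`, the learning rate, and the toy data are all made up for the example.

```python
import random


def sgd_linear_fit(data, lr=0.05, epochs=200, seed=0):
    """Fit y ~ w*x + b with plain SGD on the squared error (hypothetical helper)."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    samples = list(data)
    for _ in range(epochs):
        rng.shuffle(samples)            # visit samples in random order each epoch
        for x, y in samples:
            err = (w * x + b) - y       # signed distance between prediction and target
            w -= lr * 2.0 * err * x     # gradient of err**2 with respect to w
            b -= lr * 2.0 * err         # gradient of err**2 with respect to b
    return w, b


# Toy data generated from y = 3x + 1 (an assumption for the demo).
data = [(k / 10.0, 3.0 * (k / 10.0) + 1.0) for k in range(-10, 11)]
w, b = sgd_linear_fit(data)
print(round(w, 3), round(b, 3))
```

Each update uses a single sample rather than the full dataset, which is what makes the procedure "stochastic".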

Modern neurons mostly use ReLU, which trains faster; historically, sigmoid was more common.
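The note does not say why ReLU trains faster. One commonly cited reason (an assumption added here, not a claim from the note) is that the sigmoid gradient shrinks toward zero for large inputs, while the ReLU gradient stays at 1 for any positive input. A small sketch comparing the two:

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)        # peaks at 0.25 when z == 0, then decays


def relu(z):
    return max(0.0, z)


def relu_grad(z):
    return 1.0 if z > 0 else 0.0   # constant for every positive input


for z in (0.0, 2.0, 10.0):
    print(z, sigmoid_grad(z), relu_grad(z))
```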

People used to think local optima were the problem; more recently, the bigger problem has turned out to be saddle points.

NLP

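In 2006, a group of researchers at Canada's CIFAR introduced an unsupervised procedure, after which the whole system can be fine-tuned with backpropagation.
In 2009, speech recognition models were trained on GPUs, which sped up training by 10 to 20 times and broke records.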

CV. A filter layer can be used to extract features. A pooling layer is also needed, because the features extracted by the filter layer can vary considerably.
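The note breaks off here. One common reading (an assumption, not stated in the note) is that pooling makes the representation tolerant to small shifts in where a feature appears. The sketch below uses a 1D signal and hypothetical helpers `conv1d` and `max_pool1d` to show two inputs whose filter outputs differ but whose pooled outputs match.

```python
def conv1d(signal, kernel):
    """Valid (no padding) sliding dot product, as in a CNN filter layer."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


def max_pool1d(values, size):
    """Non-overlapping max pooling with window and stride equal to `size`."""
    return [max(values[i:i + size])
            for i in range(0, len(values) - size + 1, size)]


edge_kernel = [-1, 1]                      # responds to a rising step
a = [0, 0, 0, 1, 1, 1, 1, 1, 1]            # step after index 2
b = [0, 0, 1, 1, 1, 1, 1, 1, 1]            # same step, shifted left by one

feat_a = conv1d(a, edge_kernel)            # [0, 0, 1, 0, 0, 0, 0, 0]
feat_b = conv1d(b, edge_kernel)            # [0, 1, 0, 0, 0, 0, 0, 0]
pooled_a = max_pool1d(feat_a, 4)           # [1, 0]
pooled_b = max_pool1d(feat_b, 4)           # [1, 0]
print(feat_a != feat_b, pooled_a == pooled_b)
```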

Before word vectors, the common approach was n-grams.
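For readers unfamiliar with n-grams, here is a minimal sketch of a count-based bigram model. The helper names `ngrams` and `bigram_prob` and the toy sentence are illustrative assumptions; real n-gram language models add smoothing and much larger corpora.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bigram_prob(tokens, prev, word):
    """Estimate P(word | prev) from raw bigram counts (no smoothing)."""
    pairs = ngrams(tokens, 2)
    pair_counts = Counter(pairs)
    context_counts = Counter(p[0] for p in pairs)
    if context_counts[prev] == 0:
        return 0.0
    return pair_counts[(prev, word)] / context_counts[prev]


tokens = "the cat sat on the mat".split()
print(ngrams(tokens, 2))
print(bigram_prob(tokens, "the", "cat"))   # "the" is followed by "cat" once out of twice
```

Because each word is treated as an atomic symbol, such counts cannot share information between related words, which is the gap word vectors address.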

RNNs are very powerful dynamic systems, but they are hard to train because of exploding and vanishing gradients.
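A rough way to see the problem, under a strong simplifying assumption: take a scalar linear recurrence `h_t = w * h_(t-1)` with no nonlinearity. Backpropagating through `steps` time steps then multiplies the gradient by `w` each time. The function name `gradient_through_time` is hypothetical.

```python
def gradient_through_time(w, steps):
    """d h_T / d h_0 for the scalar linear recurrence h_t = w * h_(t-1)."""
    grad = 1.0
    for _ in range(steps):
        grad *= w        # one factor of w per time step
    return grad


vanishing = gradient_through_time(0.5, 50)   # |w| < 1: shrinks toward zero
exploding = gradient_through_time(1.5, 50)   # |w| > 1: grows without bound
print(vanishing, exploding)
```

In a real RNN the factor is a Jacobian matrix rather than a scalar, but the same repeated-product behavior drives both failure modes.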

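Translation model: an encoder first condenses the meaning of the whole sentence into a vector, and a decoder then translates that vector into another language.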

1.2 Deep Belief Network(DBN)(Milestone of Deep Learning Eve)

[2] Hinton, Geoffrey E., Simon Osindero, and Yee-Whye Teh. “A fast learning algorithm for deep belief nets.” Neural computation 18.7 (2006): 1527-1554. [pdf](Deep Learning Eve) 🌟🌟🌟

[3] Hinton, Geoffrey E., and Ruslan R. Salakhutdinov. “Reducing the dimensionality of data with neural networks.” Science 313.5786 (2006): 504-507. [pdf] (Milestone, Show the promise of deep learning) 🌟🌟🌟

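With a multilayer neural network, high-dimensional data can be converted to low-dimensional data.
In this "autoencoder", gradient descent can be used for optimization, provided that the initial weights are already close to a good solution.

This paper introduces a weight initialization scheme that makes the autoencoder perform much better than PCA.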

1.3 ImageNet Evolution (Deep Learning broke out from here)

[4] Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. “Imagenet classification with deep convolutional neural networks.” Advances in neural information processing systems. 2012. [pdf] (AlexNet, Deep Learning Breakthrough) 🌟🌟🌟🌟🌟

[5] Simonyan, Karen, and Andrew Zisserman. “Very deep convolutional networks for large-scale image recognition.” arXiv preprint arXiv:1409.1556 (2014). [pdf] (VGGNet,Neural Networks become very deep!) 🌟🌟🌟

[6] Szegedy, Christian, et al. “Going deeper with convolutions.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015. [pdf] (GoogLeNet) 🌟🌟🌟

[7] He, Kaiming, et al. “Deep residual learning for image recognition.” arXiv preprint arXiv:1512.03385 (2015). [pdf] (ResNet,Very very deep networks, CVPR best paper) 🌟🌟🌟🌟🌟

1.4 Speech Recognition Evolution

[8] Hinton, Geoffrey, et al. “Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups.” IEEE Signal Processing Magazine 29.6 (2012): 82-97. [pdf] (Breakthrough in speech recognition)🌟🌟🌟🌟

[9] Graves, Alex, Abdel-rahman Mohamed, and Geoffrey Hinton. “Speech recognition with deep recurrent neural networks.” 2013 IEEE international conference on acoustics, speech and signal processing. IEEE, 2013. [pdf] (RNN)🌟🌟🌟

[10] Graves, Alex, and Navdeep Jaitly. “Towards End-To-End Speech Recognition with Recurrent Neural Networks.” ICML. Vol. 14. 2014. [pdf]🌟🌟🌟

[11] Sak, Haşim, et al. “Fast and accurate recurrent neural network acoustic models for speech recognition.” arXiv preprint arXiv:1507.06947 (2015). [pdf] (Google Speech Recognition System) 🌟🌟🌟

[12] Amodei, Dario, et al. “Deep speech 2: End-to-end speech recognition in english and mandarin.” arXiv preprint arXiv:1512.02595 (2015). [pdf] (Baidu Speech Recognition System) 🌟🌟🌟🌟

[13] W. Xiong, J. Droppo, X. Huang, F. Seide, M. Seltzer, A. Stolcke, D. Yu, G. Zweig “Achieving Human Parity in Conversational Speech Recognition.” arXiv preprint arXiv:1610.05256 (2016). [pdf] (State-of-the-art in speech recognition, Microsoft) 🌟🌟🌟🌟

After reading the papers above, you will have a basic understanding of the history of deep learning, the basic architectures of deep learning models (including CNN, RNN, LSTM), and how deep learning can be applied to image and speech recognition. The following papers give a more in-depth view of deep learning methods, their applications in different areas, and the frontiers. Choose among them based on your interests and research direction.

2 Deep Learning Method

2.3 Unsupervised Learning / Deep Generative Model

[28] Le, Quoc V. “Building high-level features using large scale unsupervised learning.” 2013 IEEE international conference on acoustics, speech and signal processing. IEEE, 2013. [pdf] (Milestone, Andrew Ng, Google Brain Project, Cat) 🌟🌟🌟🌟

[29] Kingma, Diederik P., and Max Welling. “Auto-encoding variational bayes.” arXiv preprint arXiv:1312.6114 (2013). [pdf] (VAE) 🌟🌟🌟🌟

[30] Goodfellow, Ian, et al. “Generative adversarial nets.” Advances in Neural Information Processing Systems. 2014. [pdf] (GAN,super cool idea) 🌟🌟🌟🌟🌟

[31] Radford, Alec, Luke Metz, and Soumith Chintala. “Unsupervised representation learning with deep convolutional generative adversarial networks.” arXiv preprint arXiv:1511.06434 (2015). [pdf] (DCGAN) 🌟🌟🌟🌟

[32] Gregor, Karol, et al. “DRAW: A recurrent neural network for image generation.” arXiv preprint arXiv:1502.04623 (2015). [pdf] (VAE with attention, outstanding work) 🌟🌟🌟🌟🌟

[33] Oord, Aaron van den, Nal Kalchbrenner, and Koray Kavukcuoglu. “Pixel recurrent neural networks.” arXiv preprint arXiv:1601.06759 (2016). [pdf] (PixelRNN) 🌟🌟🌟🌟

[34] Oord, Aaron van den, et al. “Conditional image generation with PixelCNN decoders.” arXiv preprint arXiv:1606.05328 (2016). [pdf] (PixelCNN) 🌟🌟🌟🌟

[34] S. Mehri et al., “SampleRNN: An Unconditional End-to-End Neural Audio Generation Model.” arXiv preprint arXiv:1612.07837 (2016). [pdf] 🌟🌟🌟🌟🌟

2.4 RNN / Sequence-to-Sequence Model

[35] Graves, Alex. “Generating sequences with recurrent neural networks.” arXiv preprint arXiv:1308.0850 (2013). [pdf] (LSTM, very nice generating result, show the power of RNN) 🌟🌟🌟🌟

[36] Cho, Kyunghyun, et al. “Learning phrase representations using RNN encoder-decoder for statistical machine translation.” arXiv preprint arXiv:1406.1078 (2014). [pdf] (First Seq-to-Seq Paper) 🌟🌟🌟🌟

[37] Sutskever, Ilya, Oriol Vinyals, and Quoc V. Le. “Sequence to sequence learning with neural networks.” Advances in neural information processing systems. 2014. [pdf] (Outstanding Work) 🌟🌟🌟🌟🌟

[38] Bahdanau, Dzmitry, KyungHyun Cho, and Yoshua Bengio. “Neural Machine Translation by Jointly Learning to Align and Translate.” arXiv preprint arXiv:1409.0473 (2014). [pdf] 🌟🌟🌟🌟

[39] Vinyals, Oriol, and Quoc Le. “A neural conversational model.” arXiv preprint arXiv:1506.05869 (2015). [pdf] (Seq-to-Seq on Chatbot) 🌟🌟🌟

2.5 Neural Turing Machine

[40] Graves, Alex, Greg Wayne, and Ivo Danihelka. “Neural turing machines.” arXiv preprint arXiv:1410.5401 (2014). [pdf] (Basic Prototype of Future Computer) 🌟🌟🌟🌟🌟

[41] Zaremba, Wojciech, and Ilya Sutskever. “Reinforcement learning neural Turing machines.” arXiv preprint arXiv:1505.00521 (2015). [pdf] 🌟🌟🌟

[42] Weston, Jason, Sumit Chopra, and Antoine Bordes. “Memory networks.” arXiv preprint arXiv:1410.3916 (2014). [pdf] 🌟🌟🌟

[43] Sukhbaatar, Sainbayar, Jason Weston, and Rob Fergus. “End-to-end memory networks.” Advances in neural information processing systems. 2015. [pdf] 🌟🌟🌟🌟

[44] Vinyals, Oriol, Meire Fortunato, and Navdeep Jaitly. “Pointer networks.” Advances in Neural Information Processing Systems. 2015. [pdf] 🌟🌟🌟🌟

[45] Graves, Alex, et al. “Hybrid computing using a neural network with dynamic external memory.” Nature (2016). [pdf] (Milestone,combine above papers’ ideas) 🌟🌟🌟🌟🌟

2.6 Deep Reinforcement Learning

[46] Mnih, Volodymyr, et al. “Playing atari with deep reinforcement learning.” arXiv preprint arXiv:1312.5602 (2013). [pdf] (First Paper named deep reinforcement learning) 🌟🌟🌟🌟

[47] Mnih, Volodymyr, et al. “Human-level control through deep reinforcement learning.” Nature 518.7540 (2015): 529-533. [pdf] (Milestone) 🌟🌟🌟🌟🌟

[48] Wang, Ziyu, Nando de Freitas, and Marc Lanctot. “Dueling network architectures for deep reinforcement learning.” arXiv preprint arXiv:1511.06581 (2015). [pdf] (ICLR best paper,great idea) 🌟🌟🌟🌟

[49] Mnih, Volodymyr, et al. “Asynchronous methods for deep reinforcement learning.” arXiv preprint arXiv:1602.01783 (2016). [pdf] (State-of-the-art method) 🌟🌟🌟🌟🌟

[50] Lillicrap, Timothy P., et al. “Continuous control with deep reinforcement learning.” arXiv preprint arXiv:1509.02971 (2015). [pdf] (DDPG) 🌟🌟🌟🌟

[51] Gu, Shixiang, et al. “Continuous Deep Q-Learning with Model-based Acceleration.” arXiv preprint arXiv:1603.00748 (2016). [pdf] (NAF) 🌟🌟🌟🌟

[52] Schulman, John, et al. “Trust region policy optimization.” CoRR, abs/1502.05477 (2015). [pdf] (TRPO) 🌟🌟🌟🌟

[53] Silver, David, et al. “Mastering the game of Go with deep neural networks and tree search.” Nature 529.7587 (2016): 484-489. [pdf] (AlphaGo) 🌟🌟🌟🌟🌟

2.7 Deep Transfer Learning / Lifelong Learning / especially for RL

[54] Bengio, Yoshua. “Deep Learning of Representations for Unsupervised and Transfer Learning.” ICML Unsupervised and Transfer Learning 27 (2012): 17-36. [pdf] (A Tutorial) 🌟🌟🌟

[55] Silver, Daniel L., Qiang Yang, and Lianghao Li. “Lifelong Machine Learning Systems: Beyond Learning Algorithms.” AAAI Spring Symposium: Lifelong Machine Learning. 2013. [pdf] (A brief discussion about lifelong learning) 🌟🌟🌟

[56] Hinton, Geoffrey, Oriol Vinyals, and Jeff Dean. “Distilling the knowledge in a neural network.” arXiv preprint arXiv:1503.02531 (2015). [pdf] (Godfather’s Work) 🌟🌟🌟🌟

[57] Rusu, Andrei A., et al. “Policy distillation.” arXiv preprint arXiv:1511.06295 (2015). [pdf] (RL domain) 🌟🌟🌟

[58] Parisotto, Emilio, Jimmy Lei Ba, and Ruslan Salakhutdinov. “Actor-mimic: Deep multitask and transfer reinforcement learning.” arXiv preprint arXiv:1511.06342 (2015). [pdf] (RL domain) 🌟🌟🌟

[59] Rusu, Andrei A., et al. “Progressive neural networks.” arXiv preprint arXiv:1606.04671 (2016). [pdf] (Outstanding Work, A novel idea) 🌟🌟🌟🌟🌟

2.8 One Shot Deep Learning

[60] Lake, Brenden M., Ruslan Salakhutdinov, and Joshua B. Tenenbaum. “Human-level concept learning through probabilistic program induction.” Science 350.6266 (2015): 1332-1338. [pdf] (No Deep Learning,but worth reading) 🌟🌟🌟🌟🌟

[61] Koch, Gregory, Richard Zemel, and Ruslan Salakhutdinov. “Siamese Neural Networks for One-shot Image Recognition.”(2015) [pdf] 🌟🌟🌟

[62] Santoro, Adam, et al. “One-shot Learning with Memory-Augmented Neural Networks.” arXiv preprint arXiv:1605.06065 (2016). [pdf] (A basic step to one shot learning) 🌟🌟🌟🌟

[63] Vinyals, Oriol, et al. “Matching Networks for One Shot Learning.” arXiv preprint arXiv:1606.04080 (2016). [pdf] 🌟🌟🌟

[64] Hariharan, Bharath, and Ross Girshick. “Low-shot visual object recognition.” arXiv preprint arXiv:1606.02819 (2016). [pdf] (A step to large data) 🌟🌟🌟🌟

3 Applications

3.1 NLP(Natural Language Processing)

[1] Antoine Bordes, et al. “Joint Learning of Words and Meaning Representations for Open-Text Semantic Parsing.” AISTATS(2012) [pdf] 🌟🌟🌟🌟

[2] Mikolov, et al. “Distributed representations of words and phrases and their compositionality.” ANIPS(2013): 3111-3119 [pdf] (word2vec) 🌟🌟🌟

[3] Sutskever, et al. “Sequence to sequence learning with neural networks.” ANIPS(2014) [pdf] 🌟🌟🌟

[4] Ankit Kumar, et al. “Ask Me Anything: Dynamic Memory Networks for Natural Language Processing.” arXiv preprint arXiv:1506.07285(2015) [pdf] 🌟🌟🌟🌟

[5] Yoon Kim, et al. “Character-Aware Neural Language Models.” NIPS(2015) arXiv preprint arXiv:1508.06615(2015) [pdf] 🌟🌟🌟🌟

[6] Jason Weston, et al. “Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks.” arXiv preprint arXiv:1502.05698(2015) [pdf] (bAbI tasks) 🌟🌟🌟

[7] Karl Moritz Hermann, et al. “Teaching Machines to Read and Comprehend.” arXiv preprint arXiv:1506.03340(2015) [pdf] (CNN/DailyMail cloze style questions) 🌟🌟

[8] Alexis Conneau, et al. “Very Deep Convolutional Networks for Natural Language Processing.” arXiv preprint arXiv:1606.01781(2016) [pdf] (state-of-the-art in text classification) 🌟🌟🌟

[9] Armand Joulin, et al. “Bag of Tricks for Efficient Text Classification.” arXiv preprint arXiv:1607.01759(2016) [pdf] (slightly worse than state-of-the-art, but a lot faster) 🌟🌟🌟

3.2 Object Detection

[1] Szegedy, Christian, Alexander Toshev, and Dumitru Erhan. “Deep neural networks for object detection.” Advances in Neural Information Processing Systems. 2013. [pdf] 🌟🌟🌟

[2] Girshick, Ross, et al. “Rich feature hierarchies for accurate object detection and semantic segmentation.” Proceedings of the IEEE conference on computer vision and pattern recognition. 2014. [pdf] (RCNN) 🌟🌟🌟🌟🌟

[3] He, Kaiming, et al. “Spatial pyramid pooling in deep convolutional networks for visual recognition.” European Conference on Computer Vision. Springer International Publishing, 2014. [pdf] (SPPNet) 🌟🌟🌟🌟

[4] Girshick, Ross. “Fast r-cnn.” Proceedings of the IEEE International Conference on Computer Vision. 2015. [pdf] 🌟🌟🌟🌟

[5] Ren, Shaoqing, et al. “Faster R-CNN: Towards real-time object detection with region proposal networks.” Advances in neural information processing systems. 2015. [pdf] 🌟🌟🌟🌟

[6] Redmon, Joseph, et al. “You only look once: Unified, real-time object detection.” arXiv preprint arXiv:1506.02640 (2015). [pdf] (YOLO,Outstanding Work, really practical) 🌟🌟🌟🌟🌟

[7] Liu, Wei, et al. “SSD: Single Shot MultiBox Detector.” arXiv preprint arXiv:1512.02325 (2015). [pdf] 🌟🌟🌟

[8] Dai, Jifeng, et al. “R-FCN: Object Detection via Region-based Fully Convolutional Networks.” arXiv preprint arXiv:1605.06409 (2016). [pdf] 🌟🌟🌟🌟

[9] He, Gkioxari, et al. “Mask R-CNN” arXiv preprint arXiv:1703.06870 (2017). [pdf] 🌟🌟🌟🌟

[10] Bochkovskiy, Alexey, et al. “YOLOv4: Optimal Speed and Accuracy of Object Detection.” arXiv preprint arXiv:2004.10934 (2020). [pdf] 🌟🌟🌟🌟

[11] Tan, Mingxing, et al. “EfficientDet: Scalable and Efficient Object Detection.” arXiv preprint arXiv:1911.09070 (2019). [pdf] 🌟🌟🌟🌟🌟

3.3 Visual Tracking

[1] Wang, Naiyan, and Dit-Yan Yeung. “Learning a deep compact image representation for visual tracking.” Advances in neural information processing systems. 2013. [pdf] (First Paper to do visual tracking using Deep Learning,DLT Tracker) 🌟🌟🌟

[2] Wang, Naiyan, et al. “Transferring rich feature hierarchies for robust visual tracking.” arXiv preprint arXiv:1501.04587 (2015). [pdf] (SO-DLT) 🌟🌟🌟🌟

[3] Wang, Lijun, et al. “Visual tracking with fully convolutional networks.” Proceedings of the IEEE International Conference on Computer Vision. 2015. [pdf] (FCNT) 🌟🌟🌟🌟

[4] Held, David, Sebastian Thrun, and Silvio Savarese. “Learning to Track at 100 FPS with Deep Regression Networks.” arXiv preprint arXiv:1604.01802 (2016). [pdf] (GOTURN,Really fast as a deep learning method,but still far behind un-deep-learning methods) 🌟🌟🌟🌟

[5] Bertinetto, Luca, et al. “Fully-Convolutional Siamese Networks for Object Tracking.” arXiv preprint arXiv:1606.09549 (2016). [pdf] (SiameseFC,New state-of-the-art for real-time object tracking) 🌟🌟🌟🌟

[6] Martin Danelljan, Andreas Robinson, Fahad Khan, Michael Felsberg. “Beyond Correlation Filters: Learning Continuous Convolution Operators for Visual Tracking.” ECCV (2016) [pdf] (C-COT) 🌟🌟🌟🌟

[7] Nam, Hyeonseob, Mooyeol Baek, and Bohyung Han. “Modeling and Propagating CNNs in a Tree Structure for Visual Tracking.” arXiv preprint arXiv:1608.07242 (2016). [pdf] (VOT2016 Winner,TCNN) 🌟🌟🌟🌟

3.4 Image Caption

[1] Farhadi, Ali, et al. “Every picture tells a story: Generating sentences from images”. In Computer Vision - ECCV 2010. Springer Berlin Heidelberg: 15-29, 2010. [pdf] 🌟🌟🌟

[2] Kulkarni, Girish, et al. “Baby talk: Understanding and generating image descriptions”. In Proceedings of the 24th CVPR, 2011. [pdf]🌟🌟🌟🌟

[3] Vinyals, Oriol, et al. “Show and tell: A neural image caption generator”. In arXiv preprint arXiv:1411.4555, 2014. [pdf]🌟🌟🌟

[4] Donahue, Jeff, et al. “Long-term recurrent convolutional networks for visual recognition and description”. In arXiv preprint arXiv:1411.4389 ,2014. [pdf]

[5] Karpathy, Andrej, and Li Fei-Fei. “Deep visual-semantic alignments for generating image descriptions”. In arXiv preprint arXiv:1412.2306, 2014. [pdf]🌟🌟🌟🌟🌟

[6] Karpathy, Andrej, Armand Joulin, and Li Fei-Fei. “Deep fragment embeddings for bidirectional image sentence mapping”. In Advances in neural information processing systems, 2014. [pdf]🌟🌟🌟🌟

[7] Fang, Hao, et al. “From captions to visual concepts and back”. In arXiv preprint arXiv:1411.4952, 2014. [pdf]🌟🌟🌟🌟🌟

[8] Chen, Xinlei, and C. Lawrence Zitnick. “Learning a recurrent visual representation for image caption generation”. In arXiv preprint arXiv:1411.5654, 2014. [pdf]🌟🌟🌟🌟

[9] Mao, Junhua, et al. “Deep captioning with multimodal recurrent neural networks (m-rnn)”. In arXiv preprint arXiv:1412.6632, 2014. [pdf]🌟🌟🌟

[10] Xu, Kelvin, et al. “Show, attend and tell: Neural image caption generation with visual attention”. In arXiv preprint arXiv:1502.03044, 2015. [pdf]🌟🌟🌟🌟🌟

3.5 Machine Translation

Some milestone papers are listed in RNN / Seq-to-Seq topic.

[1] Luong, Minh-Thang, et al. “Addressing the rare word problem in neural machine translation.” arXiv preprint arXiv:1410.8206 (2014). [pdf] 🌟🌟🌟🌟

[2] Sennrich, et al. “Neural Machine Translation of Rare Words with Subword Units”. In arXiv preprint arXiv:1508.07909, 2015. [pdf]🌟🌟🌟

[3] Luong, Minh-Thang, Hieu Pham, and Christopher D. Manning. “Effective approaches to attention-based neural machine translation.” arXiv preprint arXiv:1508.04025 (2015). [pdf] 🌟🌟🌟🌟

[4] Chung, et al. “A Character-Level Decoder without Explicit Segmentation for Neural Machine Translation”. In arXiv preprint arXiv:1603.06147, 2016. [pdf]🌟🌟

[5] Lee, et al. “Fully Character-Level Neural Machine Translation without Explicit Segmentation”. In arXiv preprint arXiv:1610.03017, 2016. [pdf]🌟🌟🌟🌟🌟

[6] Wu, Schuster, Chen, Le, et al. “Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation”. In arXiv preprint arXiv:1609.08144v2, 2016. [pdf] (Milestone) 🌟🌟🌟🌟

3.6 Robotics

[1] Koutník, Jan, et al. “Evolving large-scale neural networks for vision-based reinforcement learning.” Proceedings of the 15th annual conference on Genetic and evolutionary computation. ACM, 2013. [pdf] 🌟🌟🌟

[2] Levine, Sergey, et al. “End-to-end training of deep visuomotor policies.” Journal of Machine Learning Research 17.39 (2016): 1-40. [pdf] 🌟🌟🌟🌟🌟

[3] Pinto, Lerrel, and Abhinav Gupta. “Supersizing self-supervision: Learning to grasp from 50k tries and 700 robot hours.” arXiv preprint arXiv:1509.06825 (2015). [pdf] 🌟🌟🌟

[4] Levine, Sergey, et al. “Learning Hand-Eye Coordination for Robotic Grasping with Deep Learning and Large-Scale Data Collection.” arXiv preprint arXiv:1603.02199 (2016). [pdf] 🌟🌟🌟🌟

[5] Zhu, Yuke, et al. “Target-driven Visual Navigation in Indoor Scenes using Deep Reinforcement Learning.” arXiv preprint arXiv:1609.05143 (2016). [pdf] 🌟🌟🌟🌟

[6] Yahya, Ali, et al. “Collective Robot Reinforcement Learning with Distributed Asynchronous Guided Policy Search.” arXiv preprint arXiv:1610.00673 (2016). [pdf] 🌟🌟🌟🌟

[7] Gu, Shixiang, et al. “Deep Reinforcement Learning for Robotic Manipulation.” arXiv preprint arXiv:1610.00633 (2016). [pdf] 🌟🌟🌟🌟

[8] A Rusu, M Vecerik, Thomas Rothörl, N Heess, R Pascanu, R Hadsell. “Sim-to-Real Robot Learning from Pixels with Progressive Nets.” arXiv preprint arXiv:1610.04286 (2016). [pdf] 🌟🌟🌟🌟

[9] Mirowski, Piotr, et al. “Learning to navigate in complex environments.” arXiv preprint arXiv:1611.03673 (2016). [pdf] 🌟🌟🌟🌟

3.7 Art

[1] Mordvintsev, Alexander; Olah, Christopher; Tyka, Mike (2015). “Inceptionism: Going Deeper into Neural Networks”. Google Research. [html] (Deep Dream) 🌟🌟🌟🌟

[2] Gatys, Leon A., Alexander S. Ecker, and Matthias Bethge. “A neural algorithm of artistic style.” arXiv preprint arXiv:1508.06576 (2015). [pdf] (Outstanding Work, most successful method currently) 🌟🌟🌟🌟🌟

[3] Zhu, Jun-Yan, et al. “Generative Visual Manipulation on the Natural Image Manifold.” European Conference on Computer Vision. Springer International Publishing, 2016. [pdf] (iGAN) 🌟🌟🌟🌟

[4] Champandard, Alex J. “Semantic Style Transfer and Turning Two-Bit Doodles into Fine Artworks.” arXiv preprint arXiv:1603.01768 (2016). [pdf] (Neural Doodle) 🌟🌟🌟🌟

[5] Zhang, Richard, Phillip Isola, and Alexei A. Efros. “Colorful Image Colorization.” arXiv preprint arXiv:1603.08511 (2016). [pdf] 🌟🌟🌟🌟

[6] Johnson, Justin, Alexandre Alahi, and Li Fei-Fei. “Perceptual losses for real-time style transfer and super-resolution.” arXiv preprint arXiv:1603.08155 (2016). [pdf] 🌟🌟🌟🌟

[7] Vincent Dumoulin, Jonathon Shlens and Manjunath Kudlur. “A learned representation for artistic style.” arXiv preprint arXiv:1610.07629 (2016). [pdf] 🌟🌟🌟🌟

[8] Gatys, Leon and Ecker, et al. “Controlling Perceptual Factors in Neural Style Transfer.” arXiv preprint arXiv:1611.07865 (2016). [pdf] (control style transfer over spatial location,colour information and across spatial scale)🌟🌟🌟🌟

[9] Ulyanov, Dmitry and Lebedev, Vadim, et al. “Texture Networks: Feed-forward Synthesis of Textures and Stylized Images.” arXiv preprint arXiv:1603.03417(2016). [pdf] (texture generation and style transfer) 🌟🌟🌟🌟

[10] Yijun Li, Ming-Yu Liu, Xueting Li, Ming-Hsuan Yang, Jan Kautz (NVIDIA). “A Closed-form Solution to Photorealistic Image Stylization.” arXiv preprint arXiv:1802.06474(2018). [pdf] (Very fast and ultra realistic style transfer) 🌟🌟🌟🌟

3.8 Object Segmentation

[1] J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation.” in CVPR, 2015. [pdf] 🌟🌟🌟🌟🌟

[2] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille. “Semantic image segmentation with deep convolutional nets and fully connected crfs.” In ICLR, 2015. [pdf] 🌟🌟🌟🌟🌟

[3] Pinheiro, P.O., Collobert, R., Dollar, P. “Learning to segment object candidates.” In: NIPS. 2015. [pdf] 🌟🌟🌟🌟

[4] Dai, J., He, K., Sun, J. “Instance-aware semantic segmentation via multi-task network cascades.” in CVPR. 2016 [pdf] 🌟🌟🌟

[5] Dai, J., He, K., Sun, J. “Instance-sensitive Fully Convolutional Networks.” arXiv preprint arXiv:1603.08678 (2016). [pdf] 🌟🌟🌟