Skypoint Observation Deck looking north © Phalinn Ooi on Wikimedia Commons

DICTA 2016 Conference

Tutorial

Title:

Deep learning and its applications in computer vision.

Abstract:

Deep learning has become the most popular technology in computer vision and has been applied across many research areas: from image-level recognition to pixel-level labeling, from image-only modeling to joint visual-language modeling, and from supervised to unsupervised learning. This tutorial covers the basic concepts and models of deep learning and discusses their applications in computer vision. It consists of four parts. The first part introduces the basic concepts of deep learning and the convolutional neural network (CNN), then discusses recent work that uses convolutional neural networks for image classification and retrieval. The second part introduces the applications of deep learning to semantic segmentation, or pixel labeling. Specifically, it covers fully convolutional neural networks for dense prediction and how to transfer a network pre-trained for image classification to the task of semantic segmentation. The third part discusses the joint modeling of images and language, a new area in computer vision that has seen an upsurge of interest. In particular, it covers image captioning and Visual Question Answering (VQA): image captioning requires a machine to describe an image in human-readable sentences, while VQA asks a machine to answer language-based questions using visual information. One key factor in the success of deep learning is the availability of very large human-annotated datasets such as ImageNet. The downside of most of these approaches, however, is the need for expensive labeling. The last part of the tutorial introduces unsupervised feature learning, a methodology for alleviating the burden of expensive labeling, and focuses on recent work that trains CNNs in an unsupervised manner for different vision tasks.

Presenters:

Part 1: Lingqiao Liu (University of Adelaide, Australia)

Part 2: Guosheng Lin (University of Adelaide, Australia)

Part 3: Qi Wu (University of Adelaide, Australia)

Part 4: Vijay Kumar (University of Adelaide, Australia)

Location:

G42 1.02, Gold Coast Campus of Griffith University.

Travel Information:

1. Transportation to Griffith: https://www.griffith.edu.au/about-griffith/campuses-and-facilities/gold-coast/transport-and-parking

2. Parking option: G55, $6 per day for visitors, pay by credit card.

3. Campus map: https://www162.griffith.edu.au/public/campus-maps/visitor-parking-map-gcc.pdf