Wednesday, 8 March 2017

The Matrix Encoded: Matrices as first-class citizens

One thing as popular as hydrogen in the universe is the vector. Most mathematical and data-analytic methods ask for this fundamental structure. PCA, ICA, SVM, GMM, t-SNE and neural nets, to name a few, all implicitly assume a vector representation of data. The power of the vector should not be underestimated. The so-called distributed representation, which is rocking the machine learning and cognitive science worlds, is nothing but a vector representation of thought (in Geoff Hinton's words, referring to Skip-Thought vectors).

The current love for distributed representation of things (yes, THINGS, as in Internet-of-Things) has gone really far. There is a huge line of work on [X]2vec, where you can substitute [X] by [word], [sentence], [paragraph], [document], [node], [tweet], [edge] and [subgraph]. I won't be surprised to see thing2vec very soon.

But can you really compress structured things like sentences into vectors? I bet you can, given that the vector is long enough. After all, although the space of all possible sentences in a language is theoretically infinite, most language usage is tightly packed, and in practice the sentence space can be mapped into a linear space of a few thousand dimensions.

However, compressing data raises the question of decompressing it, e.g., to generate a target sentence in another language, as in machine translation. Surprisingly, the simplistic seq2seq trick works well in translation. But since the linguistic structures have been lost to vectorization, generating language from a single vector is more difficult. A better way is to treat each sentence as a matrix, where each column is a word embedding. This gives rise to the attention scheme in machine translation, which has turned out to be a huge success, as in Google's current Neural Machine Translation system.
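To make the attention idea concrete, here is a minimal sketch (my own toy code, not Google's NMT) of dot-product attention over a sentence stored as a matrix with one word-embedding per column:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(source, query):
    """Dot-product attention: weight source columns by similarity to the query."""
    # source: (d, n) sentence matrix, one word embedding per column; query: (d,)
    scores = source.T @ query    # (n,) similarity of each word to the query
    weights = softmax(scores)    # normalized attention weights
    return source @ weights      # (d,) context vector, a weighted mix of columns

rng = np.random.default_rng(0)
S = rng.normal(size=(8, 5))   # a 5-word "sentence" with 8-dim embeddings
q = rng.normal(size=8)        # a decoder state acting as the query
context = attend(S, q)
print(context.shape)          # (8,)
```

Note that the sentence never collapses to a single vector: the decoder re-reads the whole matrix at every step.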

Indeed, it is well recognized that vectors alone are not enough to memorize distant events. The idea is to augment a vector-based RNN with an external memory, giving rise to the recent memory-augmented RNNs. The external memory is nothing but a matrix.

Enter the world of matrices

Matrices in vector spaces are used for linear transformations, that is, to map a vector in one space to a vector in another space. As mathematical objects, matrices have a life of their own, just like vectors, e.g., matrix calculus.

In NLP, it has been suggested that a noun is a vector while an adjective is really a matrix. The idea is cute, because an adjective "acts" on a noun, transforming the noun's meaning.
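A toy illustration of this noun-as-vector, adjective-as-matrix composition (random numbers standing in for learned embeddings):

```python
import numpy as np

rng = np.random.default_rng(1)
noun = rng.normal(size=4)            # a noun as a 4-dim embedding (toy numbers)
adjective = rng.normal(size=(4, 4))  # an adjective as a 4x4 matrix, i.e. a linear map

phrase = adjective @ noun            # the adjective "acts" on the noun's meaning
print(phrase.shape)                  # (4,) -- still a vector in the noun space
```

The composed phrase stays in the noun space, so another adjective matrix can act on it again.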

Matrices also form the basis for parameterizing neural layers. Hence the space of multilayered neural nets is a joint space of matrices.

Our recent paper, "Matrix-centric neural networks" (co-authored with my PhD student Kien Do and my boss, Professor Svetha Venkatesh), pushes this line of matrix thinking to the extreme. That is, matrices are first-class citizens; they are no longer mere collections of vectors. The input, the hidden layers, and the output are all matrices. The RNN is now a model of a sequence of input matrices and a sequence of output matrices. The internal memory (as in LSTM) is also a matrix, making the model resemble memory-augmented RNNs.
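A minimal sketch of what a matrix-centric recurrence might look like. This is my own simplified toy update with two-sided linear maps, not the exact parameterization from the paper:

```python
import numpy as np

def matrix_rnn_step(H, X, U, V, W, B):
    """One step of a toy matrix RNN: state and input are matrices, not vectors.
    H: (k, m) hidden-state matrix; X: (d, n) input matrix (e.g. a sentence)."""
    return np.tanh(U @ X @ V + W @ H + B)   # two-sided map of the input plus recurrence

rng = np.random.default_rng(2)
d, n, k, m = 6, 5, 4, 3
U, V = rng.normal(size=(k, d)), rng.normal(size=(n, m))
W, B = rng.normal(size=(k, k)), rng.normal(size=(k, m))

H = np.zeros((k, m))
for _ in range(7):                   # run over a sequence of 7 input matrices
    H = matrix_rnn_step(H, rng.normal(size=(d, n)), U, V, W, B)
print(H.shape)                       # (4, 3): the state stays a matrix throughout
```

The point is that the hidden state never gets flattened: it is mapped from both sides, so row and column structure survive across time.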

To rephrase Geoff Hinton, we want a matrix representation of thought. Somehow, our neocortex looks like a matrix -- it is really a huge thin sheet of grey matter.

Maybe one day we will live in a space created by matrices.


Saturday, 25 February 2017

Column bundle: a single model for multiple multi-[X] problems

Supervised machine learning has a few recurring concepts: data instance, feature set and label. Often, a data instance has one feature set and one label. But there are situations where you have multi-[X], where X = instance, view (feature subset), or label. For example, in multiple instance learning, you have more than one instance but only one label.

Things get interesting when you have multiple instances, multiple views and multiple labels at the same time. For example, a video clip can be considered a set of video segments (instances), each of which has several views (audio, visual frames and maybe textual subtitles), and the clip has many tags (labels).

Enter Column Bundle (CLB), the latest invention in my group.

CLB makes use of the concept of columns in the neocortex. In the brain, neurons are arranged in thin mini-columns, each of which is thought to cover a small sensory area called a receptive field. Mini-columns are bundled into super-columns, which are inter-connected to form the entire neocortex. In our previous work, this cute concept was exploited to build a network of columns for collective classification. For CLB, columns are arranged in a special way:

  • There is one central column that serves as the main processing unit (CPU).
  • There are input mini-columns to read inputs for multiple parts (Input).
  • There are output mini-columns to generate labels (Output).
  • Mini-columns are connected only to the central column.
Columns are recurrent neural nets with skip-connections (e.g., Highway Net, Residual Net or LSTM). Input parts can be instances or views; the difference lies only in the feature mapping, as different views are first mapped into the same space.
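The wiring above can be sketched roughly as follows. All shapes, names and the residual-style update are hypothetical simplifications of mine; the actual CLB columns are recurrent nets such as Highway Nets:

```python
import numpy as np

rng = np.random.default_rng(3)
relu = lambda x: np.maximum(x, 0.0)
d_part, d_central, n_parts, n_labels = 5, 8, 3, 2

# Hypothetical parameters: one input map per part, one output map per label.
W_in  = [rng.normal(size=(d_central, d_part)) for _ in range(n_parts)]
W_rec = rng.normal(size=(d_central, d_central)) * 0.1
W_out = [rng.normal(size=(1, d_central)) for _ in range(n_labels)]

parts = [rng.normal(size=d_part) for _ in range(n_parts)]  # instances or views

# The central column integrates one part per step; a skip-connection keeps
# a highway open, in the spirit of Highway/Residual nets.
h = np.zeros(d_central)
for W, x in zip(W_in, parts):
    h = h + relu(W @ x + W_rec @ h)

logits = np.array([(W @ h).item() for W in W_out])  # one output mini-column per label
labels = (logits > 0).astype(int)
print(labels.shape)                                  # (2,): one decision per label
```

Note that the mini-columns talk only to the central column, never to each other, exactly as in the bullet list above.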

In a sense, it looks like a neural computer without RAM.


Sunday, 19 February 2017

Living in the future: AI for healthcare

In a not-so-distant future, it will be routine to chat with a machine and receive medical advice from it. In fact, many of us have already done this - seeking advice from healthcare sites, asking questions online and being pointed to known answers by algorithms. The current wave of AI will only accelerate this trend.

Medicine is by and large a discipline of information, where knowledge is highly asymmetric between doctors and patients. Doctors do the job well because humans are much alike, so cases can be documented in medical textbooks and findings can be shared in journal articles and validated by others. In other words, medical knowledge is statistical, leading to the so-called evidence-based medicine (EBM). And this is exactly why the current breed of machine learning - deep learning - will do well in the majority of cases.

Predictive medicine

In Yann LeCun's words, the future of AI rests on predictive learning, which is basically another way to say unsupervised learning. Technically, this is the capability to fill in missing slots. For those familiar with probabilistic graphical models, it is akin to computing a pseudo-likelihood, i.e., estimating the values of some variables given the rest.
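As a toy example of this slot-filling view, the pseudo-likelihood of a fully visible Boltzmann model is just a sum of conditional log-probabilities, each predicting one variable from all the rest (toy random weights, not a fitted model):

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def pseudo_log_likelihood(x, W, b):
    """Sum_i log p(x_i | x_rest) for a fully visible Boltzmann model.
    Each conditional is just a logistic regression on the other variables."""
    total = 0.0
    for i in range(len(x)):
        z = W[i] @ x + b[i]           # W has zero diagonal: no self-connection
        p_i = sigmoid(z)              # p(x_i = 1 | the rest)
        total += np.log(p_i if x[i] == 1 else 1.0 - p_i)
    return total

rng = np.random.default_rng(4)
W = rng.normal(size=(3, 3))
W = (W + W.T) / 2                     # symmetric pairwise interactions
np.fill_diagonal(W, 0.0)
b = rng.normal(size=3)

x = np.array([1, 0, 1])
print(pseudo_log_likelihood(x, W, b))  # a negative number: sum of log-probabilities
```

Predicting each slot from the rest is exactly the "fill in the missing value" capability the paragraph describes.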

A significant part of medicine is inherently predictive. One part is diagnosis - finding out what is happening now; the other is prognosis - figuring out what will happen if an action is taken (or not taken). While it is fair to say diagnosis is quite advanced, prognosis has a long way to go.

To my surprise as a machine learning practitioner, doctors are unreasonably poor at predicting the future, especially when it comes to mental health and genomics. Doctors are, however, excellent at explaining results after the fact. In machine learning terms, their models can practically fit anything but do not generalize well. This must come from the know-it-all culture, where medical knowledge is limited to a handful of people, and doctors are obliged to explain what has happened to the poor patients.

Physical health

The human body is a physical (and to some extent, statistical) system. Hence it follows physical laws. Physiological processes can, in theory, be fully understood and predicted - at least in a closed environment. What is hard to predict are the (results of) interactions with the open environment: viral infections and car accidents, for example, are hardly predictable. Hence, physical health is predictable only up to an accuracy limit, beyond which computers have no hope of predicting. So don't expect performance close to what we have seen in object recognition.

Mental health

Mental health is hard. No one can really tell what happens inside your brain, even with it opened up. With tens of billions of neurons and hundreds of trillions of connections between them giving rise to mental processes, the complexity of the brain is beyond human reach at present. But mental health never goes alone; it goes hand-in-hand with physical health. A poor physical condition is likely to worsen a mental condition, and vice versa.

A good sign is that mental health is going computational. There is an emerging field called Computational Psychiatry, and its practitioners are surprisingly open to new technological ideas.

The future

AI is also taking over the healthcare stage, with hundreds of startups popping up each month around the world. So what can we expect within the next 5 years?
  • Medical imaging diagnosis. This is perhaps the most ready space, due to the availability of affordable imaging options (CT scan, ultrasound, fMRI, etc.) and recent advances in computer vision, thanks to convolutional nets. One interesting form is microscopy imaging diagnosis, since getting images from microscopes can be quite cheap. Another is facial diagnosis -- it turns out that many diseases manifest through facial expression.
  • Medical text will be better understood. There are several types of text: doctor narratives in medical records, user-generated medical text online, social health forums, and medical research articles. This field will take more time to take off, but given the high concentration of NLP talent at present, we have reason to hope.
  • Cheap, fast sequencing techniques. Sequencing cost has recently come down to the historic milestone of $1,000, and we have reason to believe it will go down to $100 in the not-too-distant future. For example, nanopore sequencing is emerging, and sequencing based on signal processing will improve significantly.
  • Faster and better understanding of genomics. Once sequencing reaches a critical mass, our understanding of it will be accelerated by AI. Check out, for example, the work of the Toronto professor Brendan Frey.
  • Clinical data sharing will remain a bottleneck for years to come. Unless we have access to massive clinical databases, things will move very slowly in clinical settings. Machine learning will have to work in data-efficient regimes, too.

Beyond 5 years, it is far more difficult to predict. Some are still in the realm of sci-fi.
  • Automation of drug discovery. Drug chemical and biological properties will be estimated accurately by machines. The search for a drug with a desired function will be accelerated a hundredfold.
  • A full dialog system for diagnosis and treatment recommendation. You won't need to see a doctor and pay $100 for a 10-minute consultation; you will get a thorough consultation for free.
  • M-health, with remote robotic surgery.
  • Brain-machine interfacing, where humans will rely on machines for high-bandwidth communication. This idea is from my favorite technologist, Elon Musk.
  • Nano chips will enter the body in the millions, kill the nasty bugs, fix the damage and get out without being attacked by the immune system. This idea is from the 2005 book The Singularity is Near by my favorite futurist, Ray Kurzweil.
  • Robot doctors will be licensed, just like self-driving cars now.
  • Patients will be in control. No more know-it-all doctors. Patients will have full knowledge of their own health. This implies that things must be explainable, and patients must be educated about their own biology and mental states.

However, like everything else, it is easier imagined than done. Don't forget that AI in Medicine (AIIM) is a very old journal, and nothing truly magical has happened yet.

What we do

At PRaDA (Deakin University, Australia), we have our own share in this space. Our most recent contributions include:
  • Symbolic ICU (2017), where we figure out a way to deal with ICU time series, which are irregular and mostly missing. The work will be in the public domain soon.
  • Matrix-LSTM (2017) for EEG, where we capture the tensor-like nature of EEG signals over time. This work will also be in the public domain soon.
  • DeepCare (2016), where we model the course of a health trajectory, which is occasionally intervened upon at irregular times.
  • Deepr (2016), where we aim to discover explainable predictive motifs through CNNs.
  • Anomaly detection (2016), where we discover outliers in healthcare data, which are inherently mixed-type.
  • Stable risk discovery through autoencoders (2016), where we discover structure among risk factors.
  • Generating stable prediction rules (2016), where we demonstrate that simple, statistically stable rules can be uncovered from large amounts of administrative data for preterm-birth prediction at 25 weeks of gestation.
  • eNRBM (2015): understanding the group formation of medical concepts through competitive learning and prior medical knowledge.

Thursday, 12 January 2017

On expressiveness, learnability and generalizability of deep learning

It is a coincidence that Big Data and Deep Learning popped up at roughly the same time, around 2012. And it is said that data is to deep learning what fuel is to rockets (a line often attributed to Andrew Ng, co-founder of Coursera and Chief Scientist at Baidu).

It is true that current deep learning flourishes as it leverages big, complex data better than existing techniques. Equipped with advances in hardware (GPU, HPC), deep learning applications are more powerful and useful than ever. However, without theoretical advances, big data might have remained a big pile of junk artifacts.

Let us examine three key principles of any learning system - expressiveness, learnability and generalizability - and see how deep learning fits in.


Expressiveness

This requires a learning system that can:

  • Represent the complexity of the world. It was proved in the early 1990s that feedforward nets are universal function approximators, meaning that any function imaginable can be represented by a suitable neural network. Note that convolutional nets are also feedforward nets, representing a function that maps an image to some target values.
  • Compute anything computable. Around the same time, it was proved that recurrent nets are Turing-complete: any program written for a standard computer can be represented by a suitable recurrent neural network (RNN). It has even been suggested that Turing machines (and perhaps human brains) are really RNNs.
These two theoretical guarantees are powerful enough to enable any computable application, from object recognition to video understanding, automated translation, conversational agents and automated programmers. For example, one of the biggest challenges set out by OpenAI is to write a program that wins all programming challenges.
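As a tiny taste of expressiveness, a hand-wired feedforward net with one hidden layer computes XOR, a function no single linear layer can represent (weights chosen by hand for illustration):

```python
import numpy as np

step = lambda z: (z > 0).astype(float)  # threshold activation

def xor_net(x):
    """A hand-wired feedforward net computing XOR of two binary inputs."""
    W1 = np.array([[1.0, 1.0], [1.0, 1.0]])
    b1 = np.array([-0.5, -1.5])          # hidden units: (x1 OR x2), (x1 AND x2)
    W2 = np.array([1.0, -1.0])
    b2 = -0.5                            # output: OR and not AND
    h = step(W1 @ x + b1)
    return int(step(W2 @ h + b2))

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(np.array([a, b], dtype=float)))  # last column is a XOR b
```

The universal approximation theorems generalize this trick: with enough hidden units, the same recipe covers any continuous function.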


Learnability

But merely proving that there exists a neural net that can do a job does not mean we can find that net within a time budget, unless there are efficient ways to do so. In a supervised learning setting, learnability means at least three things:

  • Have a correct computational graph that enables effective and efficient passing of information and gradients between inputs and outputs. Finding a near-optimal graph is the job of architecture engineering, which is more an art than a science, because the space of architectures is exponentially large, if not infinite. A right graph helps in at least two ways: (i) essential information is captured, and (ii) information passing is much easier. For example, convolutional nets provide translation invariance, which is often seen in images, speech and signals. With parameter sharing and the pyramid structure, training signals are distributed evenly between layers, and even weak signals at each image patch can multiply, enabling easier learning. And today's skip-connections allow much easier passing of information across hundreds of layers.
  • Have flexible optimizers to navigate the rugged landscape of objective functions. Complex computational graphs are generally non-convex, meaning it is usually impossible to find the global optimum in limited time. Fortunately, adaptive stochastic gradient descent methods - including Adam, AdaDelta and RMSProp - are fairly efficient. They can find good local minima in fewer than a hundred passes through the data.
  • Have enough data to statistically cover all the small variations in reality. Practically, this means hundreds of thousands of data points for moderate problems, and millions for complex ones. An immediate corollary is the need for very powerful compute, which usually means lots of GPUs, RAM, time and patience.
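The adaptive optimizers mentioned above fit in a few lines. Here is plain Adam (the standard update rule; the quadratic objective and the hyperparameters are arbitrary toy choices) minimizing a simple function:

```python
import numpy as np

def adam_minimize(grad, x0, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    """Plain Adam: adaptive step sizes from running moments of the gradient."""
    x = np.asarray(x0, dtype=float)
    m = np.zeros_like(x)
    v = np.zeros_like(x)
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g          # first moment (momentum)
        v = beta2 * v + (1 - beta2) * g * g      # second moment (scale)
        m_hat = m / (1 - beta1 ** t)             # bias correction
        v_hat = v / (1 - beta2 ** t)
        x = x - lr * m_hat / (np.sqrt(v_hat) + eps)
    return x

# Toy objective: f(x) = (x - 3)^2, with gradient 2(x - 3); minimum at x = 3.
x_star = adam_minimize(lambda x: 2 * (x - 3), x0=[0.0])
print(x_star)   # close to [3.]
```

The per-coordinate scaling by the second moment is what lets these optimizers cope with the rugged, badly conditioned landscapes of deep nets.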

Generalizability

Having the capacity to learn any function or program is not enough. The learnt program must generalize to unseen data as expected. Overfitting occurs easily in modern models, where millions of parameters are common. Fortunately, with lots of data, overfitting is less of a problem. Recent advances have also introduced Dropout (and cousins such as Maxout, DropConnect and stochastic layers) and Batch-Norm, which together reduce overfitting significantly.
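As an illustration, inverted dropout is only a few lines; the layer size and drop rate below are arbitrary:

```python
import numpy as np

def dropout(h, p_drop, rng, train=True):
    """Inverted dropout: randomly zero activations during training, rescale to
    keep the expected activation unchanged, and do nothing at test time."""
    if not train:
        return h
    mask = rng.random(h.shape) >= p_drop
    return h * mask / (1.0 - p_drop)

rng = np.random.default_rng(5)
h = np.ones(10_000)
h_train = dropout(h, p_drop=0.5, rng=rng)
print(h_train.mean())                             # close to 1.0 despite half the units being zeroed
print(dropout(h, 0.5, rng, train=False).mean())   # exactly 1.0 at test time
```

Zeroing a random half of the units each step forces redundancy into the representation, which is why it curbs overfitting.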

This is evidenced by deep net systems that work in the wild (self-driving cars, Google Translate/Voice, AlphaGo).

Of course, these three concepts are not enough to make deep learning work in practice. There are hundreds of models, techniques, and programming frameworks out there to make things happen.

Tuesday, 27 December 2016

Deep learning as new electronics

It is hard to imagine modern life without electronics: radios, TVs, microwaves, mobile phones and many more gadgets. Dumb or smart, they are all based on the principles of semiconductors and electromagnetism. Nowadays we take these devices for granted without worrying about the underlying laws of physics. Most people do not care about the circuits that run in chips and carry out most of the devices' functions.

For the past 5 years, a new breed of human-like functionalities has emerged through advances in a new field called deep learning: self-driving cars, voice commands on mobile phones, translation between hundreds of language pairs and a new kind of art. In 2016, ten years after its revival, deep learning took over the Internet. People use deep learning-powered products in daily life without worrying about how the underlying neural nets work.

These two fields free us from many physical and psychological constraints:

  • Electronic devices give us freedom of communication over distance, new kinds of experience with augmented reality, and many more.
  • Deep learning frees us from having to make tedious and error-prone decisions (e.g., driving a car), and gives freedom of information access (personalization), of hands (e.g., voice commands), of finance (automated trading), of feature extraction (through representation learning), and many more.
It is worth noting that electronics and deep learning differ in principle:
  • Electronic devices are designed with great precision for specific functions in mind; imprecision comes from the quantum uncertainty principle and thermal fluctuations.
  • Neural nets, on the other hand, are designed to learn to perform functions of their own, with data (and sometimes model) uncertainty built in.
However, it is also striking that they are quite similar in many ways.

Super-city of interconnected simple parts

Modern electronic devices are truly super-cities built out of just a few kinds of primitive building blocks. The same holds for deep neural nets:
  • Electronic primitives: resistor, capacitor, transistor, coil, diode, logic gate and switch.
  • Neural net primitives: integrate-and-fire neuron, multiplicative gating, differentiable logic gate, switch and attention module. Interestingly, one of the most recent ideas is called "Highway Networks", borrowing the idea that highway traffic is free of traffic lights.
These primitives are connected in graphs:
  • Electronic devices work by moving electrons in the correct order and number; the force that moves them is a potential difference. A circuit design captures all the necessary information.
  • In neural nets, the flow of activations is like the electric current; the main difference is that the magnitude of the "current" in neural nets can be learnt. A computational graph is all that is needed for model execution.
Not just an analogy: a two-way relationship
  • Electronics → deep learning: At present, advances in electronics have given a huge boost to the efficiency of deep learning, with GPUs, TPUs and other initiatives. It is interesting to ask whether we can learn from electronics in designing deep nets. For example, will there be something analogous to integrated circuits in deep architectures?
  • Deep learning → electronics: I predict that soon the reverse will also hold: deep learning will play a great role in improving the efficiency and functionalities of electronic devices. Stay tuned.

Sunday, 25 December 2016

Making a dent in machine learning, or how to play a fast ball game

Neil Lawrence had an interesting observation about the current state of machine learning, and linked it to fast ball games:
“[…] the dynamics of the game will evolve. In the long run, the right way of playing football is to position yourself intelligently and to wait for the ball to come to you. You’ll need to run up and down a bit, either to respond to how the play is evolving or to get out of the way of the scrum when it looks like it might flatten you.”
Neil Lawrence is known for his work on Gaussian Processes and is a proponent of data efficiency. He was a professor at the University of Sheffield and is now with Amazon. Apparently the strategy works: the ball has come to him.

I once heard about a professor who said he would come to top conferences just to learn what others were busy doing and tried to do something else.

I also read somewhere from a top physicist that students who applied to work with him often expressed the wish to study shiny-and-clean fields. Some other fields were too messy and seemed unsexy. The professor insisted that the messy fields were exactly the best to work on.

In "Letters to a Young Scientist", Edward Osborne Wilson tells his life story. He spent his entire life cataloging ants, starting in childhood, at a time when ant ecology wasn't a shiny field. He is considered a father of biodiversity.

Wonder what to do in deep learning now?

It is an extremely fast ball game with thousands of top players. You will either be crushed, with your ideas being scooped weekly, or run out of steam pretty quickly.

It looks like most of the low hanging fruits have been picked.

Then ask yourself: what is your unique position? What strengths and advantages do you have that others do not? Can you move faster than others? It may be access to data, access to expertise in your neighborhood, or angles borrowed from outside the field. Sometimes digging up old ideas is highly beneficial, too.

Alternatively, just calm down and do boring-but-important stuff. Important problems are like the goal areas in ball games. The ball will surely come.

30 years of a Swiss army knife: Restricted Boltzmann machines

I read somewhere - I cannot recall exactly where - that in the ancient world, 30 years was long enough for a new generation to settle down with a new system, regime or ideology. As there are only a few days left before 2017, I would like to look back at the history of a 30-year-old model which has captured my research attention for the past 10 years.

To some of you, the restricted Boltzmann machine (RBM) may be a familiar name, especially those who have followed the deep learning literature from the beginning. But the RBM has also passed its prime, so you may only have heard of it in passing.

I was attracted to the RBM for several reasons. First, when I was studying conditional random fields in 2004 and looking for a fast way to train models with arbitrary structures, Contrastive Divergence (CD) appeared to be an interesting option; while CD is a generic technique, it was derived especially for RBMs. Second, the RBM has "Boltzmann" in its name, which is kind of interesting, because physicists are kind of sexy :)

Needless to say, another big reason is that the RBM, together with its cousin the autoencoder, is a building block of unsupervised deep nets, which started the current revolution -- deep learning.

The greatest reason is that I think the RBM is one of the most important classes of data models known to date, perhaps comparable in usefulness to PCA in dimensionality reduction and k-means in clustering.

The RBM was first introduced in 1986 by Paul Smolensky under the name Harmonium, in the classic two-volume book known as PDP (Parallel Distributed Processing), co-edited by Rumelhart and McClelland. RBMs were subsequently popularised by Geoff Hinton in the 2000s, especially in 2001 with the introduction of Contrastive Divergence (CD), and in 2006 with the introduction of a deep version known as Deep Belief Nets (DBN).

Statistically, an RBM is a probabilistic model of data, i.e., it assigns a probability (or density) to multivariate data. Initially, RBMs were limited to binary data (the Bernoulli-Bernoulli RBM), but they were subsequently extended to Gaussian data (the Gaussian-Bernoulli RBM) and to mixed types (the Mixed-variate RBM, or the Thurstonian Boltzmann machine).


The RBM is a special case of the Boltzmann machine, which is in turn a special case of a Markov random field. It has two layers: one for the observed data, the other for the latent representation. Due to its special bipartite structure, MCMC inference can be implemented in a block-wise fashion. Learning is relatively fast with CD or its persistent version. Estimating the latent representation is very fast, needing only a single matrix operation. The RBM is also a powerful model in the sense that it can represent any distribution given enough hidden units. As a Markov random field, it has a log-linear parameterization, which makes it easy to incorporate a variety of domain knowledge.
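The bipartite structure and CD learning described above fit in a short sketch. This is the generic CD-1 recipe for a tiny Bernoulli-Bernoulli RBM; the sizes, learning rate and random data are arbitrary toy choices:

```python
import numpy as np

rng = np.random.default_rng(6)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

n_vis, n_hid, lr = 6, 4, 0.1
W = 0.01 * rng.normal(size=(n_vis, n_hid))
a, b = np.zeros(n_vis), np.zeros(n_hid)   # visible and hidden biases

def cd1_update(v0):
    """One CD-1 step: a single block-Gibbs sweep (possible thanks to the
    bipartite structure), then a gradient-like parameter update."""
    global W, a, b
    ph0 = sigmoid(v0 @ W + b)                     # p(h | v): all hiddens at once
    h0 = (rng.random(n_hid) < ph0).astype(float)  # sample the hidden layer
    pv1 = sigmoid(h0 @ W.T + a)                   # p(v | h): reconstruction
    ph1 = sigmoid(pv1 @ W + b)
    W += lr * (np.outer(v0, ph0) - np.outer(pv1, ph1))  # positive - negative phase
    a += lr * (v0 - pv1)
    b += lr * (ph0 - ph1)

data = (rng.random((50, n_vis)) < 0.5).astype(float)
for _ in range(20):
    for v in data:
        cd1_update(v)
print(W.shape)   # (6, 4)
```

Note how both Gibbs half-steps are single matrix operations over a whole layer: exactly the block-wise inference the bipartite structure buys.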

With all of these advantages, RBMs have been used successfully in many applications, including density modelling, feature extraction, dimensionality reduction, clustering, topic modelling, imputation, classification, retrieval and anomaly detection.

A biased selection of developments
  • 1986: first introduced as Harmonium.
  • 2001: fast approximate biased learning introduced as Contrastive Divergence (CD)
  • 2004: generalized Harmonium introduced
  • 2006: used successfully in Deep Belief Networks
  • 2007: demonstrated with great success on a very large-scale task within the Netflix challenge
  • 2007: temporal RBM
  • 2008: recurrent temporal RBM
  • 2008: classification RBM
  • 2008: persistent CD introduced, essentially a variant of Younes' method.
  • 2008: convolutional RBMs
  • 2008: universality property proved
  • 2009: topic models with Replicated Softmax
  • 2009: matrix modelling with non i.i.d. RBMs, ordinal data, semi-restricted RBM
  • 2009: implicit mixtures of RBMs
  • 2010: factored high-order RBM
  • 2010: mean-covariance RBM
  • 2010: rectified linear unit RBM
  • 2010: deep BM
  • 2011: mixed-variate RBM
  • 2012: a proper modeling of ordinal matrix data
  • 2013: Thurstonian BM for joint modeling of most known data types
  • 2013: nonnegative RBMs for parts-based representation
  • 2015: trained with graph priors, demonstrating better generalization
  • 2015: extended to tensor-objects
  • 2016: infinite RBM
In short, most of the work has been on extending the representational power of RBMs to suit problem structures. The rest is about analysing the theoretical properties of RBMs, making deep nets out of RBMs, and improving training speed and accuracy. For the past few years, research on RBMs has slowed significantly, mostly because of the superb accuracy of supervised deep nets and the ease of deploying deterministic nets on large-scale problems.

Some of our own work
  • Multilevel Anomaly Detection for Mixed Data, K Do, T Tran, S Venkatesh, arXiv preprint arXiv:1610.06249.
  • Learning deep representation of multityped objects and tasks, Truyen Tran, D. Phung, and S. Venkatesh, arXiv preprint arXiv:1603.01359.
  • Outlier Detection on Mixed-Type Data: An Energy-based Approach, K Do, T Tran, D Phung, S Venkatesh, International Conference on Advanced Data Mining and Applications (ADMA 2016).
  • Graph-induced restricted Boltzmann machines for document modeling, Tu D. Nguyen, Truyen Tran, D. Phung, and S. Venkatesh, Information Sciences. doi:10.1016/j.ins.2015.08.023.
  • Learning vector representation of medical objects via EMR-driven nonnegative restricted Boltzmann machines (e-NRBM), Truyen Tran, Tu D. Nguyen, D. Phung, and S. Venkatesh, Journal of Biomedical Informatics, 2015, pii: S1532-0464(15)00014-3. doi: 10.1016/j.jbi.2015.01.012. 
  • Tensor-variate Restricted Boltzmann Machines, Tu D. Nguyen, Truyen Tran, D. Phung, and S. Venkatesh, AAAI 2015
  • Thurstonian Boltzmann machines: Learning from multiple inequalities, Truyen Tran, D. Phung, and S. Venkatesh, In Proc. of 30th International Conference in Machine Learning (ICML’13), Atlanta, USA, June, 2013.
  • Learning parts-based representations with Nonnegative Restricted Boltzmann Machine, Tu D. Nguyen, Truyen Tran, D. Phung, and S. Venkatesh, Journal of Machine Learning Research (JMLR) Workshop and Conference Proceedings, Vol. 29, Proc. of 5th Asian Conference on Machine Learning, Nov 2013.
  • Latent patient profile modelling and applications with Mixed-Variate Restricted Boltzmann Machine, Tu D. Nguyen, Truyen Tran, D. Phung, and S. Venkatesh,  In Proc. of 17th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD’13), Gold Coast, Australia, April 2013.
  • Learning sparse latent representation and distance metric for image retrieval, Tu D. Nguyen, Truyen Tran, D. Phung, and S. Venkatesh, In Proc. of IEEE International Conference on Multimedia and Expo (ICME), San Jose, California, USA, July 2013.
  • Learning from Ordered Sets and Applications in Collaborative Ranking, Truyen Tran, Dinh Phung and Svetha Venkatesh, in Proc. of. the 4th Asian Conference on Machine Learning (ACML2012), Singapore, Nov 2012.
  • Cumulative Restricted Boltzmann Machines for Ordinal Matrix Data Analysis, Truyen Tran, Dinh Phung and Svetha Venkatesh, in Proc. of. the 4th Asian Conference on Machine Learning (ACML2012), Singapore, Nov 2012.
  • Embedded Restricted Boltzmann Machines for Fusion of Mixed Data Types and Applications in Social Measurements Analysis, Truyen Tran, Dinh Phung, Svetha Venkatesh, in Proc. of 15-th International Conference on Information Fusion (FUSION-12), Singapore, July 2012.
  • Learning Boltzmann Distance Metric for Face Recognition, Truyen Tran, Dinh Phung, Svetha Venkatesh, in Proc. of IEEE International Conference on Multimedia & Expo (ICME-12), Melbourne, Australia, July 2012.
  • Mixed-Variate Restricted Boltzmann Machines, Truyen Tran, Dinh Phung and Svetha Venkatesh, in Proc. of. the 3rd Asian Conference on Machine Learning (ACML2011), Taoyuan, Taiwan, Nov 2011.
  • Ordinal Boltzmann Machines for Collaborative Filtering. Truyen Tran, Dinh Q. Phung and Svetha Venkatesh. In Proc. of 25th Conference on Uncertainty in Artificial Intelligence, June, 2009, Montreal, Canada. Runner-up for the best paper award.