New AI Method Wins Coveted NeurIPS Award

ODE network is an innovative deep neural network model.

Posted Jan 08, 2019

Source: pixabay/geralt

Recent breakthroughs in artificial intelligence (AI) are largely due to deep learning, a machine learning technique that enables a computer to learn from data passed through multiple processing layers rather than to follow explicitly hardcoded rules. Most deep learning models are artificial neural networks with architectural concepts somewhat inspired by the biological neurons of the human brain. Last month at the NeurIPS conference, a team of AI researchers from the University of Toronto and the Vector Institute of Toronto, Canada, won a “Best Paper Award” for “Neural Ordinary Differential Equations,” one of just four papers selected from the many thousands of scientific papers submitted to one of the largest conferences focused on artificial intelligence.

Training a deep neural network with many layers is far more difficult than training a shallow architecture that contains one or two layers of computation. One of the challenges of gradient-based training of deep supervised neural networks is that, as layers of computation are added, accuracy degrades and it becomes harder to arrive at a good generalization. Kaiming He and his team at Microsoft Research addressed the degradation problem by reformulating the layers as learning residual functions with reference to the layer inputs. Residual networks work by defining a discrete sequence of finite transformations. The researchers found that their residual networks could gain accuracy with increased network depth and were also easier to optimize.
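To make the idea of a "discrete sequence of finite transformations" concrete, here is a minimal Python sketch of the residual update, in which each layer adds a correction to its input instead of replacing it. The function names and the scaled tanh used as a stand-in for a learned layer are assumptions for illustration only, not the architecture from the residual network paper.

```python
import math

def residual_function(h, weight):
    # Hypothetical stand-in for a learned layer: a scaled tanh.
    return [weight * math.tanh(x) for x in h]

def residual_network(h, weights):
    # Each layer adds a correction to its input: h <- h + f(h).
    for w in weights:
        update = residual_function(h, w)
        h = [x + u for x, u in zip(h, update)]
    return h

# Three "layers", each with its own (made-up) weight.
hidden = residual_network([0.5, -1.0], weights=[0.1, 0.2, 0.3])
```

If a layer's correction is zero, the input passes through unchanged, which is one intuition for why adding such layers need not hurt a deep network.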

However, this approach could prove problematic for AI systems where data arrives at random times rather than at discrete intervals. Traditional recurrent neural network architectures for time series require data to be input at discrete intervals. Take automobiles, for example. A well-functioning vehicle typically visits the dealer for regularly scheduled maintenance. But what happens when there is a car accident, a recall, or an unexpected malfunction? In real life, data points often occur at random times, and forcing the data into discrete intervals may reduce accuracy.

The AI research team of David Duvenaud, Jesse Bettencourt, Ricky T.Q. Chen, and Yulia Rubanova debuted a new type of scalable deep neural network model that is both memory and parameter efficient. Rather than using a discrete sequence of layers that each apply a finite transformation, they applied principles of calculus to create a continuous-depth model called an ODE (Ordinary Differential Equation) Network.

The research team parameterized the “continuous dynamics of hidden units using an ordinary differential equation (ODE) specified by a neural network.” The ODE Network computes its output using a black-box differential equation solver, and the gradients needed for training are computed with the adjoint method.
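The following Python sketch illustrates the forward pass only: a function defines how the hidden state changes over time, and a solver integrates it from a start time to an end time. It uses a simple fixed-step Euler method as a stand-in for the black-box solver, and the names `dynamics` and `odeint_euler` and the scaled tanh are hypothetical choices made for illustration. It does not implement the adjoint method or any training.

```python
import math

def dynamics(h, t, weight):
    # Hypothetical stand-in for the neural network that defines dh/dt.
    return [weight * math.tanh(x) for x in h]

def odeint_euler(h0, t0, t1, weight, steps=100):
    # Fixed-step Euler integration of dh/dt = dynamics(h, t) from t0 to t1.
    dt = (t1 - t0) / steps
    h, t = list(h0), t0
    for _ in range(steps):
        dh = dynamics(h, t, weight)
        h = [x + dt * d for x, d in zip(h, dh)]
        t += dt
    return h

# The output is the hidden state at time t1.
output = odeint_euler([0.5, -1.0], t0=0.0, t1=1.0, weight=0.3)
```

With `steps=1` over a unit time interval, the loop reduces to a single residual update of the form h + f(h), which is one way to see the continuous model as the limit of ever smaller residual steps.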

This structural approach may have several advantages. Their model does not store the intermediate quantities of the forward pass, so it is memory efficient. The solution is also parameter efficient. For supervised learning tasks, fewer parameters are needed because the parameters of nearby layers are automatically tied together when the hidden unit dynamics are parameterized as a continuous function of time. The ODE Network can also serve as a continuous time-series model designed to incorporate the random timing of input data.
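As a rough illustration of why continuous dynamics suit irregularly timed data, the Python sketch below reads off the state of a simple system at arbitrary, unevenly spaced times. The exponential decay dynamics, the function names, and the sample times are all assumptions chosen so the result can be checked against a known formula. This is not the time-series model from the paper.

```python
import math

def decay(h, rate):
    # Hypothetical dynamics dh/dt = -rate * h, standing in for a learned network.
    return -rate * h

def state_at_times(h0, times, rate, max_step=0.001):
    # Integrate forward from t = 0 with small Euler steps and record the
    # state at each requested time. Times must be in increasing order.
    states, h, t = [], h0, 0.0
    for target in times:
        n = max(1, math.ceil((target - t) / max_step))
        step = (target - t) / n
        for _ in range(n):
            h += step * decay(h, rate)
        t = target
        states.append(h)
    return states

# Observation times do not need to be evenly spaced.
irregular_times = [0.1, 0.15, 0.9, 2.3]
states = state_at_times(1.0, irregular_times, rate=1.0)
```

Because the state is defined at every instant, no observation has to be rounded to the nearest fixed interval. For this toy system, the recorded values stay close to the exact solution exp(-t).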

With these benefits, the ODE Network has the potential to disrupt deep neural networks across many areas where time-series events may not occur at regular intervals, such as health care patient monitoring, manufacturing, personalized medicine, scientific research, autonomous vehicles, pharmacogenomics, asset tracking systems, financial trading, customer service, business intelligence, and many more applications. It is a new model for deep neural networks that has the potential to take artificial intelligence to the next level.

References

Chen, Ricky T.Q., Rubanova, Yulia, Bettencourt, Jesse, Duvenaud, David. “Neural Ordinary Differential Equations.” arXiv: 1806.07366. 19 Jun 2018.

Bengio, Yoshua. “Learning Deep Architectures for AI.” Foundations and Trends in Machine Learning. Vol.2, no.1 (2009).

He, Kaiming, Zhang, Xiangyu, Ren, Shaoqing, Sun, Jian. “Deep Residual Learning for Image Recognition.” arXiv: 1512.03385v1. 10 Dec 2015.