Overcoming Priors for Visual Question Answering

Abstract

A number of studies have found that today’s Visual Question Answering (VQA) models are heavily driven by superficial correlations in the training data and lack sufficient image grounding. To encourage development of models geared towards the latter, we propose a new setting for VQA where for every question type, train and test sets have different prior distributions of answers. Specifically, we present new splits of the VQA v1 and VQA v2 datasets, which we call Visual Question Answering under Changing Priors (VQA-CP v1 and VQA-CP v2 respectively). First, we evaluate several existing VQA models under this new setting and show that their performance degrades significantly compared to the original VQA setting. Second, we propose a novel Grounded Visual Question Answering model (GVQA) that contains inductive biases and restrictions in the architecture specifically designed to prevent the model from ‘cheating’ by primarily relying on priors in the training data. Specifically, GVQA explicitly disentangles the recognition of visual concepts present in the image from the identification of a plausible answer space for a given question, enabling the model to more robustly generalize across different distributions of answers. GVQA is built off an existing VQA model – Stacked Attention Networks (SAN). Our experiments demonstrate that GVQA significantly outperforms SAN on both the VQA-CP v1 and VQA-CP v2 datasets. Interestingly, it also outperforms more powerful VQA models such as Multimodal Compact Bilinear Pooling (MCB) in several cases. GVQA offers strengths complementary to SAN when trained and evaluated on the original VQA v1 and VQA v2 datasets. Finally, GVQA is more transparent and interpretable than existing VQA models.

Aishwarya Agrawal, Dhruv Batra, Devi Parikh, Aniruddha Kembhavi
[Facebook AI Research]

Download the full paper here
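To make the abstract’s disentanglement idea concrete, here is a minimal, hypothetical Python sketch (not the authors’ implementation): one branch recognizes visual concepts in the image, a separate branch predicts the plausible answer space from the question alone, and the final answer is constrained to their intersection, so the model cannot simply reproduce training-set answer priors.

```python
# Hypothetical sketch of GVQA-style disentanglement (not the authors' code).
# A visual branch proposes concepts found in the image; a question branch
# predicts the plausible answer space from the question type alone; the
# final answer must lie in their intersection, so the model cannot fall
# back on priors such as answering "tennis" to every "What sport ...?".

def visual_concepts(image):
    # Stand-in for a visual recognizer (e.g. attention over image regions).
    return {"banana": 0.9, "yellow": 0.8, "table": 0.3}

def plausible_answers(question):
    # Stand-in for a question classifier mapping the question to an
    # answer cluster (colors, sports, yes/no, counts, ...).
    if question.lower().startswith("what color"):
        return {"yellow", "green", "red"}
    return {"yes", "no"}

def answer(image, question):
    concepts = visual_concepts(image)
    allowed = plausible_answers(question)
    # Choose the highest-scoring visual concept within the allowed space.
    candidates = {a: s for a, s in concepts.items() if a in allowed}
    return max(candidates, key=candidates.get) if candidates else None

print(answer("img.jpg", "What color is the banana?"))  # -> "yellow"
```

Because the allowed answers come from the question type rather than from memorized answer frequencies, a shift in answer priors between train and test (the VQA-CP setting) is less damaging to this design.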

The importance of single directions for generalization

ABSTRACT

Despite their ability to memorize large datasets, deep neural networks often achieve good generalization performance. However, the differences between the learned solutions of networks which generalize and those which do not remain unclear. Additionally, the tuning properties of single directions (defined as the activation of a single unit or some linear combination of units in response to some input) have been highlighted, but their importance has not been evaluated. Here, we connect these lines of inquiry to demonstrate that a network’s reliance on single directions is a good predictor of its generalization performance, across networks trained on datasets with different fractions of corrupted labels, across ensembles of networks trained on datasets with unmodified labels, across different hyperparameters, and over the course of training. While dropout only regularizes this quantity up to a point, batch normalization implicitly discourages single direction reliance, in part by decreasing the class selectivity of individual units. Finally, we find that class selectivity is a poor predictor of task importance, suggesting not only that networks which generalize well minimize their dependence on individual units by reducing their selectivity, but also that individually selective units may not be necessary for strong network performance.

Ari S. Morcos, David G.T. Barrett, Neil C. Rabinowitz, & Matthew Botvinick
@DeepMind

Download the full paper here
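The paper’s central quantity, reliance on single directions, is measured by ablating directions and watching performance degrade. Below is a toy, self-contained sketch of the idea (an assumed setup, not the authors’ code): zero out one hidden unit at a time in a small random network and rank units by the resulting accuracy drop.

```python
import numpy as np

# Toy sketch of single-unit ablation: zero one hidden unit at a time in a
# small two-layer network and measure the accuracy drop. Networks whose
# accuracy collapses after few ablations rely heavily on single directions.

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))           # toy inputs
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # toy binary labels
W1 = rng.normal(size=(10, 32))           # untrained weights, for illustration
W2 = rng.normal(size=(32, 2))

def accuracy(ablated=()):
    h = np.maximum(X @ W1, 0)            # ReLU hidden layer
    h[:, list(ablated)] = 0.0            # ablate chosen units (zero activations)
    return float((np.argmax(h @ W2, axis=1) == y).mean())

base = accuracy()
drops = sorted(((base - accuracy({u}), u) for u in range(32)), reverse=True)
print("baseline accuracy:", base)
print("most important units:", [u for _, u in drops[:5]])
```

The paper’s observation is that networks which memorize corrupted labels are far more sensitive to such ablations than networks that generalize.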

SingularityNET: A decentralized, open market and inter-network for AIs

ABSTRACT

The value and power of Artificial Intelligence is growing dramatically every year, and will soon dominate the internet – and the economy as a whole. However, AI tools today are fragmented by a closed development environment; most are developed by one company to perform one task, and there is no way to plug two tools together. SingularityNET aims to become the key protocol for networking AI and machine learning tools to form a coordinated Artificial General Intelligence. SingularityNET is an open-source protocol and collection of smart contracts for a decentralized market of coordinated AI services. Within this framework, the benefits of AI become a global commons infrastructure for the benefit of all; anyone can access AI tech or become a stakeholder in its development. Anyone can add an AI/machine learning service to SingularityNET for use by the network, and receive network payment tokens in exchange. SingularityNET is backed by the SingularityNET Foundation, which operates on a belief that the benefits of AI should not be dominated by any small set of powerful institutions, but shared by all. A key goal of SingularityNET is to ensure the technology is benevolent according to human standards, and the network is designed to incentivize and reward beneficial players.

Ben Goertzel

Download the full paper here

What is Offline AI?

[Originally posted by Jason Hadjioannou on Medium – 30th June 2017]

Offline AI refers to Artificial Intelligence programs that run on-device, as opposed to server-side APIs that perform AI tasks remotely. Why is this a thing? Well, there are three big benefits to using Offline AI.

Speed

The first is operation speed. If a device has all the data it needs and possesses the ability to perform intelligent tasks such as image recognition and natural language processing without needing to send and receive data processed on a remote server somewhere, then the speed of the operation is greatly improved, since it no longer depends on network connectivity or server hardware performance.

An on-device AI program can run trained Machine Learning models and Neural Networks using nothing but the device hardware and software. Not having to rely on network connectivity greatly improves the speed of operation and has a positive impact on user experience. (Core ML for macOS and iOS is a framework by Apple that will allow such programs to run on a Mac or an iOS device. I’ll be talking more about Core ML in future posts).
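As a concrete (and hedged) example of the workflow, Apple’s coremltools Python package can convert a model trained in a framework like Keras into a Core ML model that runs entirely on-device. The file name, labels and converter arguments below are illustrative placeholders, not a definitive recipe:

```python
# Hedged sketch: converting a trained Keras model to Core ML with Apple's
# coremltools package, so it can run fully on-device in an iOS/macOS app.
# "model.h5" and the input/output names below are placeholders.
import coremltools

coreml_model = coremltools.converters.keras.convert(
    "model.h5",                    # trained Keras model on disk (placeholder)
    input_names="image",
    image_input_names="image",     # treat the input as an image
    class_labels=["cat", "dog"],   # placeholder class labels
)
coreml_model.short_description = "On-device image classifier (sketch)"
coreml_model.save("Classifier.mlmodel")
```

The resulting .mlmodel file can then be dropped into an Xcode project, where Core ML generates a native interface for on-device inference.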

Cost

The second benefit to using Offline AI comes hand in hand with the first. If Online is False, then Network costs are Zero! To give you an example of how much of an impact this can have on an AI business…

The company I work for is an Artificial Intelligence and Augmented Reality company with a consumer-facing mobile App, and it’s not uncommon for even medium-sized tech companies to spend millions of dollars per month on the server-side technology needed to perform the AI tasks that make an App’s feature-set possible.

The majority of this huge cost goes towards server hosting and data bandwidth fees incurred whenever the App sends image data from a user’s device camera up to our online Neural Nets for processing. If you want really fast image-recognition performance, for example, you’ll need to send up a lot of image data, multiple times per second. Offline AI promises to eliminate this process altogether.

Privacy

The third benefit is one of increasing importance to society: as consumer technology and the social media industry mature, so does the responsibility to protect people’s data. User data privacy is an ethically important practice made possible by Offline AI.

Processing all data on-device means that it is sandboxed and better protected against data abuse and server hacking. Yes, the device could still be hacked, or stolen for that matter, but the risk of user data abuse is greatly reduced because the data is never sent to a remote network or stored server-side. User data can be processed, used for current tasks and then purged when no longer needed, without leaving digital breadcrumbs.

In time, as AI applications become more intertwined with our daily lives, the need for this type of responsibility will increase, and the onus is on us, the program developers, software engineers and computer scientists, to build applications that behave respectfully towards the personal security of the people that use them.

For more talks on Offline AI and specifically the use of Core ML in iOS mobile Apps, check out my posts on Medium: https://medium.com/@jason.io

WaveNet: A Generative Model for Raw Audio

ABSTRACT

This paper introduces WaveNet, a deep neural network for generating raw audio waveforms. The model is fully probabilistic and autoregressive, with the predictive distribution for each audio sample conditioned on all previous ones; nonetheless we show that it can be efficiently trained on data with tens of thousands of samples per second of audio. When applied to text-to-speech, it yields state-of-the-art performance, with human listeners rating it as significantly more natural sounding than the best parametric and concatenative systems for both English and Mandarin. A single WaveNet can capture the characteristics of many different speakers with equal fidelity, and can switch between them by conditioning on the speaker identity. When trained to model music, we find that it generates novel and often highly realistic musical fragments. We also show that it can be employed as a discriminative model, returning promising results for phoneme recognition.

Aaron van den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew Senior, Koray Kavukcuoglu
{avdnoord, sedielem, heigazen, simonyan, vinyals, gravesa, nalk, andrewsenior, korayk}@google.com
Google DeepMind

Download the full paper here
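The core property the abstract describes, each audio sample conditioned on all previous ones, can be illustrated with a toy autoregressive sampling loop. This is a sketch of the idea only; the real model implements the predictive distribution with stacks of dilated causal convolutions, which the stand-in function below merely mimics.

```python
import numpy as np

# Toy illustration of autoregressive audio sampling: each new sample is
# drawn from a distribution conditioned on all previous samples. WaveNet
# models p(x_t | x_1..x_{t-1}) with dilated causal convolutions; here a
# hand-written stand-in function plays that role.

rng = np.random.default_rng(0)

def predictive_distribution(history):
    # Stand-in for the network: probabilities over 256 mu-law levels,
    # biased toward continuing the most recent sample smoothly.
    last = history[-1] if history else 128
    logits = -np.abs(np.arange(256) - last) / 20.0
    p = np.exp(logits)
    return p / p.sum()

samples = []
for t in range(16000):                        # one second at 16 kHz
    p = predictive_distribution(samples)
    samples.append(int(rng.choice(256, p=p))) # sample the next value

print(samples[:10])
```

This strictly sequential dependency is what makes naive generation slow; training, where all the conditioning samples are already known, can still be parallelized, which is how the paper trains efficiently on tens of thousands of samples per second.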

AI XPRIZE – AI competition with IBM Watson

The IBM Watson AI XPRIZE is a $5 million AI and cognitive computing competition challenging teams globally to develop and demonstrate how humans can collaborate with powerful AI technologies to tackle the world’s grand challenges. This prize will focus on creating advanced and scalable applications that benefit consumers and businesses across a multitude of disciplines. The solutions will contribute to the enrichment of available tools and data sets for the usage of innovators everywhere. The goal is also to accelerate the understanding and adoption of AI’s most promising breakthroughs.

Every year leading up to TED 2020, the teams will compete for interim prizes and the opportunity to advance to the next year’s competition. The three finalist teams will take the TED stage in 2020 to deliver jaw-dropping, awe-inspiring TED Talks demonstrating what they have achieved.

Typical of all XPRIZE competitions, the IBM Watson AI XPRIZE will crowdsource solutions from some of the most brilliant thinkers and entrepreneurs around the world, creating true exponential impact.

To compete in the IBM Watson AI XPRIZE you must be a fully registered team. To complete your registration, you must create a Team profile, sign the Competitor’s Agreement and pay the registration fee.

AI XPRIZE Timeline

PRIZE PURSE

Grand Prizes

The $3,000,000 Grand Prize, $1,000,000 2nd Place, and $500,000 3rd Place purses will be awarded at the end of the competition at TED 2020, for a total of $4.5 million.

Milestone and Special Prizes

Two Milestone Competition prize purses will be awarded at the end of each of the first two rounds of the competition, and the Judges may award additional special prizes to recognize notable accomplishments. A total of $500,000 will be allocated by the Judges across these prizes.

THE NEED FOR THE PRIZE

The progress in AI research and applications over the past 20 years makes it timely to focus attention not only on making AI more capable, but also on maximizing the societal benefit of AI. The democratization of exponential technology puts AI and cognitive computing into the hands of innovators everywhere. Driven by the long-term potential of AI’s impact, and to better understand the prospects of human-AI collaboration, the IBM Watson AI XPRIZE provides an interdisciplinary platform for domain experts, developers and innovators to push the boundaries of AI to new heights through collaboration. The competition will bring the AI community together and accelerate the development of scalable, hybrid solutions and audacious breakthroughs to address humanity’s grandest challenges.

You can register for the competition at: https://aiportal.xprize.org/en/registration

Watch AlphaGo take on Lee Sedol, the world’s top Go player

Watch AlphaGo take on Lee Sedol, the world’s top Go player, in the final match of the Google DeepMind challenge.

Match score going into game five: AlphaGo 3 – Lee Sedol 1.
[Game five: Seoul, South Korea, 15th March at 13:00 KST (04:00 GMT; in the US, 21:00 PT on the 14th, 00:00 ET on the 15th).]

The Game of Go 

The game of Go originated in China more than 2,500 years ago. The rules of the game are simple: Players take turns to place black or white stones on a board, trying to capture the opponent’s stones or surround empty space to make points of territory. As simple as the rules are, Go is a game of profound complexity. There are more possible positions in Go than there are atoms in the universe. That makes Go a googol times more complex than chess. Go is played primarily through intuition and feel, and because of its beauty, subtlety and intellectual depth it has captured the human imagination for centuries. AlphaGo is the first computer program to ever beat a professional human player. Read more about the game of Go and how AlphaGo is using machine learning to master this ancient game.

Match Details 

In October 2015, the program AlphaGo won 5-0 in a formal match against the reigning three-time European Champion, Fan Hui, to become the first program to ever beat a professional Go player in an even game. Now AlphaGo will face its ultimate challenge: a 5-game challenge match in Seoul against the legendary Lee Sedol, the top Go player in the world over the past decade, for a $1M prize. For full details, see the press release.

The matches were held at the Four Seasons Hotel, Seoul, South Korea, starting at 13:00 local time (04:00 GMT; day before 20:00 PT, 23:00 ET) on March 9th, 10th, 12th, 13th and 15th.

The matches were livestreamed on DeepMind’s YouTube channel, as well as broadcast on TV throughout Asia through Korea’s Baduk TV, and in China, Japan, and elsewhere.

Match commentators included Michael Redmond, the only Western professional Go player to achieve 9 dan status. Redmond commentated in English, while Yoo Changhyuk (professional 9 dan), Kim Sungryong (professional 9 dan), Song Taegon (professional 9 dan), and Lee Hyunwook (professional 8 dan) alternated commentary in Korean.

The matches were played under Chinese rules with a komi of 7.5 (the compensation points the player who goes second receives at the end of the match). Each player received two hours per match, with three lots of 60-second byoyomi (countdown periods after they have finished their allotted time).

Singularity Or Bust [Documentary]

In 2009, film-maker and former AI programmer Raj Dye spent his summer following futurist AI researchers Ben Goertzel and Hugo DeGaris around Hong Kong and Xiamen, documenting their doings and gathering their perspectives. The result, after some work by crack film editor Alex MacKenzie, was the 45 minute documentary Singularity or Bust — a uniquely edgy, experimental Singularitarian road movie, featuring perhaps the most philosophical three-foot-tall humanoid robot ever, a glance at the fast-growing Chinese research scene in the late aughts, and even a bit of a real-life love story. The film was screened in theaters around the world, and won the Best Documentary award at the 2013 LA Cinema Festival of Hollywood and the LA Lift Off Festival. And now it is online, free of charge, for your delectation.

Singularity or Bust is a true story pertaining to events occurring in the year 2009. It captures a fascinating slice of reality, but bear in mind that things move fast these days. For more recent updates on Goertzel and DeGaris’s quest for transhuman AI, you’ll have to consult the Internet, or your imagination.

[Full Documentary]

Machine Learning With Stanford University

Stanford University is offering a Machine Learning course on coursera.org.

Machine learning is the science of getting computers to act without being explicitly programmed. In the past decade, machine learning has given us self-driving cars, practical speech recognition, effective web search, and a vastly improved understanding of the human genome. Machine learning is so pervasive today that you probably use it dozens of times a day without knowing it. Many researchers also think it is the best way to make progress towards human-level AI. In this class, you will learn about the most effective machine learning techniques, and gain practice implementing them and getting them to work for yourself. More importantly, you’ll not only learn about the theoretical underpinnings of learning, but also gain the practical know-how needed to quickly and powerfully apply these techniques to new problems. Finally, you’ll learn about some of Silicon Valley’s best practices in innovation as it pertains to machine learning and AI.

This course provides a broad introduction to machine learning, data mining, and statistical pattern recognition. Topics include: (i) Supervised learning (parametric/non-parametric algorithms, support vector machines, kernels, neural networks). (ii) Unsupervised learning (clustering, dimensionality reduction, recommender systems, deep learning). (iii) Best practices in machine learning (bias/variance theory; innovation process in machine learning and AI). The course will also draw from numerous case studies and applications, so that you’ll also learn how to apply learning algorithms to building smart robots (perception, control), text understanding (web search, anti-spam), computer vision, medical informatics, audio, database mining, and other areas.

Can I earn a Course Certificate if I completed this course before they were available?
In order to verify one’s identity and maintain academic integrity, learners who completed assignments or quizzes for Machine Learning prior to November 1st will need to redo and resubmit these assessments in order to earn a Course Certificate. To clarify, both quizzes and programming assignments need to be resubmitted. Though your deadlines may have technically passed, please be assured that you may resubmit both types of assessments at any time. We apologise for the inconvenience and appreciate your patience as we strive to ensure the integrity and value of our certificates.

Please note that, in order to earn a Course Certificate, you must complete the course within 180 days of payment, or by May 1, 2016, whichever is earlier.

Enrolment ends February 27

The State of Artificial Intelligence – Davos 2016 Talk

How close are technologies to simulating or overtaking human intelligence and what are the implications for industry and society?

This talk took place on 20th January 2016 at the World Economic Forum Annual Meeting, and was developed in partnership with Arirang.

Moderated by:
Connyoung Jennifer Moon, Chief Anchor and Editor-in-Chief, Arirang TV & Radio, Republic of Korea

Panellists:

Matthew Grob, Executive Vice-President and Chief Technology Officer, Qualcomm, USA
Andrew Moore, Dean, School of Computer Science, Carnegie Mellon University, USA
Stuart Russell, Professor of Computer Science, University of California, Berkeley, USA
Ya-Qin Zhang, President, Baidu.com, People’s Republic of China