Monthly Roundup - July 2018

What a great month! Read all about what we’ve been doing in sunny Brisbane, how we are using simulated environments for machine learning, and some of the exciting developments in the field of AI & ML.

Events

useR! 2018 - Brisbane, July 10-13

useR! 2018 was a fantastic conference. A big thank you to all the organisers (in particular Dianne Cook and Steph De Silva) for putting on such a great event. We use open source technologies like R in our day-to-day work and think it’s important to support community events like this. One of my personal highlights was getting to chat with Max Kuhn; it sounds like he’s got some great stuff in the pipeline.

We also presented deckard - our R package for large-scale geospatial data visualisation. If you couldn’t make it, you can check out the presentation below, along with other talks from the conference.



Breakfast Meetup

We spoke about reinforcement learning at the breakfast meetup, covering a few of the topics we’ll be teaching at our first training event in two weeks. There are only a few tickets left, so make sure you get in before sales close next Monday.



Natural Language Processing Mini-Conference

Thanks to Ben Hachey, Adam Schuck and Will Radford for inviting us to talk at the Sydney NLP mini-conference.

It showcased some of the interesting work being done by a very active and strong NLP community in Sydney.

Upcoming

Insert AI to Continue

Learn the theory and application of reinforcement learning, a key approach behind self-driving cars, autonomous drones and game-playing agents.

??? (Watch this space)

We’ve got something big planned in the next month or so. Stay in touch for more details.

From the Verge Labs Blog

Intro to OpenAI Gym - presentation and code

How to get started with reinforcement learning and the OpenAI Gym.
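
As a taste of what the post covers, here is a minimal sketch of the standard Gym interaction loop: a random agent on the classic CartPole-v0 environment. The choice of environment and the random policy are ours for illustration; the post may use different examples.

```python
import gym

# Create a classic control environment; CartPole-v0 is a common first example.
env = gym.make("CartPole-v0")

for episode in range(5):
    observation = env.reset()  # start a new episode
    total_reward, done = 0.0, False
    while not done:
        action = env.action_space.sample()  # placeholder agent: act randomly
        observation, reward, done, info = env.step(action)
        total_reward += reward
    print(f"episode {episode}: total reward = {total_reward}")

env.close()
```

Swapping the random sample for a learned policy is where the reinforcement learning comes in; the loop itself stays the same.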

Introducing Deckard for large scale data visualisation

Why we decided to build a package linking R and deck.gl.

Interesting articles we’ve read

How I find stories in data
In this article, Edmund Tadros, Accounting and Consulting editor and Data Editor at The Australian Financial Review, links data and journalism by offering practical tips on how to use the copious datasets available online to extract newsworthy findings and back up reporting. Tadros speaks specifically about Australian web resources, although his insights could equally apply to any publicly available data. His point is that so much data is out there that you only need to know where to look to weave a story backed by numbers.

The data sources journalists can use come from public organisations, for instance the Department of Immigration and Border Protection and the Australian Taxation Office, and from private organisations, such as the pharmaceutical industry body Medicines Australia. By analysing such data, Tadros was able to report that suicide is the leading cause of death among Australians aged 25-44, and that doctors spent $40,000 over a six-month period on travel to present research findings.

The Natural Language Decathlon
Salesforce Research introduced the Natural Language Decathlon (DecaNLP), a challenge for developing a deep learning architecture that is not task-specific and can work across a variety of natural language processing tasks. The team - Bryan McCann, Nitish Shirish Keskar, Caiming Xiong, and Richard Socher - built DecaNLP around ten NLP tasks: question answering, machine translation, summarisation, natural language inference, sentiment analysis, semantic role labelling, relation extraction, goal-oriented dialogue, database query generation, and pronoun resolution. The goal was to see how a single generalised model would perform compared with task-specific models. Performance is measured with an aggregate DecaScore, the sum of the ten task-specific metrics, each of which falls between 0 and 100, so the maximum score is 1,000. All tasks are cast as question answering and learned jointly by a multitask question answering network (MQAN).
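
To make the everything-as-question-answering framing concrete, here is an illustrative sketch of how a task is cast as a (question, context, answer) triple and how the DecaScore aggregates per-task metrics. The example strings and scores below are made up for illustration, not taken from the paper.

```python
# Each DecaNLP task is expressed as a (question, context, answer) triple.
# These example strings are illustrative, not drawn from the actual datasets.
sentiment_example = {
    "question": "Is this review positive or negative?",
    "context": "The film was a delight from start to finish.",
    "answer": "positive",
}

translation_example = {
    "question": "What is the translation from English to German?",
    "context": "The house is blue.",
    "answer": "Das Haus ist blau.",
}

# The DecaScore is the sum of the ten task-specific metrics, each of which
# falls between 0 and 100. The numbers below are invented for illustration.
task_scores = {
    "question_answering": 74.3,
    "machine_translation": 21.0,
    "summarisation": 24.5,
    "natural_language_inference": 72.8,
    "sentiment_analysis": 86.4,
    "semantic_role_labelling": 75.1,
    "relation_extraction": 40.9,
    "goal_oriented_dialogue": 47.0,
    "database_query_generation": 62.0,
    "pronoun_resolution": 48.4,
}

deca_score = sum(task_scores.values())  # maximum possible: 1,000
print(f"DecaScore: {deca_score:.1f} / 1000")
```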

Although the MQAN was designed for general question answering, it also performed well on single tasks, for example on the WikiSQL semantic parsing task, in goal-oriented dialogue, and on SQuAD for a model without direct span supervision. As a generalised model, the MQAN demonstrated improvements in transfer learning for machine translation and named entity recognition, in domain adaptation for sentiment analysis and natural language inference, and in zero-shot capabilities for text classification.

An important conclusion drawn from the baseline approaches in this research is that keeping the question and context representations separate is beneficial, and that the question pointer plays a key role in boosting the MQAN’s performance. All code and the full text of the research paper are publicly available from the Salesforce Research page.

Building AI-first products
David Bessis, the CEO of Tinyclues, talks about bridging the gap from AI-inside to AI-first products by drawing an interesting parallel with the first cross-Atlantic sailing boats, whose builders set out to cross the Atlantic without yet having the technology to do it. Bessis claims AI products are currently at the level of that first boat: mostly AI-inside, meaning they are built upon existing technology with AI features added. An AI-inside product could exist without its AI feature, and at a core level that wouldn’t make a significant difference. In contrast, AI-first products apply AI at the core, which makes them look simple to users but invariably complex from the perspective of the AI tools inside.

Bessis mentions that current agile frameworks don’t work well for developing minimum viable AI products. To make the shift from AI-inside to AI-first, he suggests looking for solutions to people’s problems in the sense of removing something that used to be part of their jobs; such a solution must also leave something people will keep doing, in order to turn it into a viable market product. Another challenge for AI-first products is the lack of systemic stability or suitability for long-term scaling. At Tinyclues, the long-term plan is based on three elements: antifragile design, modular architecture and instrumentation, and a real science team.

Finally, an AI-first product startup must solve the market problem and the technology problem at the same time. More precisely, it needs to reliably solve a massive-scale problem in a way users are comfortable seeing as chiefly supported by AI. Such market/technology alignment is critical for the success of AI products.

AI at Google: our principles

In this recent blog post, Google CEO Sundar Pichai sets out the major implications AI will have on our lives and the seven principles that will govern Google’s use of AI technology. Pichai also spells out what the company will and won’t do in terms of AI applications.

Google’s seven AI principles are:

  1. Wide social benefits of AI technologies, including industries such as healthcare, security, energy, transportation, manufacturing, and entertainment. Benefits must surpass the downsides, be locally appropriate and carefully introduced on a wider scale.
  2. Avoiding unfair bias, especially unjust impact in terms of sensitive traits such as race, ethnicity, gender, nationality, income, sexual orientation, ability, and political or religious belief.
  3. Designing cautious, safe, and secure AI systems initially tested in constrained AI environments.
  4. Accountability to people, including options for feedback, comments, explanations, and appeal.
  5. An adequate balance between privacy, transparency, and control in AI technologies.
  6. AI scientific research must be based on rigorous, multidisciplinary, collaborative, and inquisitive standards, which will be made publicly available to benefit vital areas, such as biology, chemistry, medicine, and environmental sciences.
  7. Actively working to prevent potentially dangerous and harmful AI applications, assessed against four factors - primary purpose and use, nature and uniqueness, scale, and the nature of Google’s involvement.

Google won’t work on AI technologies that could:

  • Cause harm
  • Be used in weapons
  • Enable illegitimate surveillance
  • Violate human rights and international law

If you want to stay up to date with what we are doing, you can follow us on Twitter.

