Sunday, December 17, 2017

My NIPS write up

Just as a quick disclaimer, this post is about my personal experience and opinions at NIPS 2017, and I'm not an AI researcher; I work as a data scientist in industry. For a more technical summary of the talks and papers presented, you may want to check this document by David Abel.

Deep learning rigor and interpretability

This is quite a controversial topic, but this is how I see it. There are two main approaches to the idea of statistics/learning:
  1. Understand how learning works, and replicate it based on this understanding
  2. Focus on results, even at the cost of poor understanding
I think these two approaches first divided statisticians and machine learning practitioners, as Leo Breiman describes in "The Two Cultures". In a similar way, today they divide the deep learning school, which is somehow winning in terms of results, from other techniques.

My view on deep learning is that we've managed to understand, in a general way, how the human brain works. Not why, but thanks to the research of people like Santiago Ramon y Cajal, Camilo Golgi, Donald Hebb..., we know that it's a network of neurons, and that the "intelligence" is in how the neurons connect, not in the neurons themselves.

With the research of Warren McCulloch, Walter Pitts, John Hopfield, Geoffrey Hinton..., we can replicate this structure of neurons artificially, with just a set of connected linear regressions plus activation functions to break the linearity. And with current computing power, including optimized hardware like GPUs, we can implement networks of neurons at a huge scale. We know the model works, because it works for the human brain, and we're confident it's the same. But we don't know how each neuron is connected in the brain (how much signal it needs to receive from other neurons to activate), so we're missing the weights of the linear regressions.
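To make "connected linear regressions with activation functions" concrete, here is a minimal pure-Python sketch of a forward pass. The weights are hand-picked for illustration (they make this tiny network compute XOR). The point is that every neuron is identical, and everything the network "knows" lives in the weights. The function names and the `XOR_PARAMS` values are hypothetical, chosen only for this example.

```python
import math

def neuron(inputs, weights, bias):
    """One artificial neuron: a linear regression followed by a nonlinear activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias  # the linear-regression part
    return 1.0 / (1.0 + math.exp(-z))                        # sigmoid breaks the linearity

def layer(inputs, weight_rows, biases):
    """A layer is just several neurons reading the same inputs."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]

def network(x, params):
    """Feed the output of each layer into the next one."""
    for weight_rows, biases in params:
        x = layer(x, weight_rows, biases)
    return x

# Hypothetical hand-picked weights: the hidden neurons act roughly as OR and AND,
# and the output neuron computes "OR and not AND", which is XOR.
XOR_PARAMS = [
    ([[20.0, 20.0], [20.0, 20.0]], [-10.0, -30.0]),  # hidden layer
    ([[20.0, -20.0]], [-10.0]),                      # output layer
]
```

Changing only `XOR_PARAMS` would make the same code compute a different function. This is why the missing piece is the weights.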

With techniques like backpropagation and stochastic gradient descent, we can optimize the weights to do useful things, like image or sound recognition and generation.
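As a minimal illustration of stochastic gradient descent, the sketch below trains a single sigmoid neuron (that is, logistic regression) on the OR function. It visits one example at a time, in random order. The gradient comes from the chain rule, which is what backpropagation applies layer by layer in a deeper network. A single neuron keeps it short, so this is the simplest case of the idea, not a full backpropagation implementation. The names and hyperparameters are illustrative assumptions.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def train_neuron_sgd(data, lr=0.5, epochs=1000, seed=0):
    """Fit weights w and bias b with plain SGD on the cross-entropy loss."""
    rng = random.Random(seed)
    w = [0.0] * len(data[0][0])
    b = 0.0
    data = list(data)  # copy, so shuffling doesn't reorder the caller's list
    for _ in range(epochs):
        rng.shuffle(data)  # "stochastic": one example at a time, random order
        for x, y in data:
            p = predict(w, b, x)
            # Chain rule: for cross-entropy through a sigmoid, dLoss/dz = p - y
            grad_z = p - y
            w = [wi - lr * grad_z * xi for wi, xi in zip(w, x)]
            b -= lr * grad_z
    return w, b

OR_DATA = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
w, b = train_neuron_sgd(OR_DATA)
```

In a multi-layer network, the same `p - y` error signal would be propagated backwards through each layer's weights. This is where the name "backpropagation" comes from.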

So, how I see it, the main question is:
  • Does rigor matter: how much we understand about what we do, about our models and their predictions? Or do we just care about minimizing the out-of-sample error?
This may be a loose interpretation of what was discussed at NIPS, for example in Ali Rahimi's talk or at the interpretability debate. It was interesting to see how excited people were about the debate, and about the "celebrities" on stage.

I think something important was missing from the debate: what Chris Olah and Shan Carter describe as research debt. As in software, what matters is not only what you have today, but also what you will have in the future. The better the internal quality of your software, the easier it will be to improve it and add new features later. I think every good software engineer is aware of how important it is to keep technical debt under control. But I don't think most researchers are aware that how well we understand today's research is key for future research.

So, in my opinion, it's not that important that deep learning gives us state-of-the-art results in many areas. I don't think we'll get much better results in the future unless we focus on quality research, instead of just trying random things to get a small increase in model accuracy.

Generative Adversarial Networks

I think Generative Adversarial Networks were by far the most popular topic at NIPS. I'm not sure how many talks Ian Goodfellow gave, but I don't think it was far from one every day. There were all sorts of applications of GANs, including many for creativity and design. We're not yet at the point of being able to generate arbitrary high-definition images, but it doesn't seem it will take long to get even more impressive results than what we've already seen. One of the most discussed papers was the GAN that generates celebrity faces.
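For readers who haven't seen GANs before, here is the objective from the original formulation (Goodfellow et al., 2014), given for reference. A generator $G$ maps noise $z$ to samples, and a discriminator $D$ outputs the probability that its input is real. The two are trained against each other:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right]
```

$D$ tries to tell real samples from generated ones, while $G$ tries to make $D(G(z))$ close to 1. Many of the variants shown at the conference change this objective or the architectures of $G$ and $D$.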

Bayesian statistics

Bayesian statistics was also very present throughout NIPS, often together with deep learning, as in the Bayesian deep learning and deep Bayesian learning talk, the Bayesian deep learning workshop, or the Bayesian GAN paper. Gaussian processes and Bayesian optimization were also present, from the tutorials to the workshops.

Surprisingly to me, most of the papers presented about multi-armed bandit problems were based on frequentist statistics. I say surprisingly because I think industry is mostly adopting Bayesian methods for A/B testing, one of the main applications. In my opinion Bayesian methods are much simpler and more intuitive, and tend to offer better results. One of the hot topics in this area is lowering the false discovery rate in repeated tests. Many papers about contextual bandits were also presented, an area I discovered at NIPS.
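For readers unfamiliar with the Bayesian approach to A/B testing and bandits, here is a minimal pure-Python sketch. It assumes binary outcomes (e.g. conversions) and uniform Beta(1, 1) priors. `prob_b_beats_a` estimates the posterior probability that variant B converts better than A. `thompson_sampling` is the classic Bayesian bandit strategy for the same Beta-Bernoulli model. Both function names and the simulated conversion rates are hypothetical illustrations, not any particular library's API.

```python
import random

def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=20_000, seed=0):
    """Monte Carlo estimate of P(rate_B > rate_A), assuming uniform Beta(1, 1) priors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # With a Beta(1, 1) prior, the posterior of a conversion rate is
        # Beta(1 + conversions, 1 + non-conversions)
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += rate_b > rate_a
    return wins / draws

def thompson_sampling(true_rates, rounds=5_000, seed=0):
    """Bernoulli bandit: each round, play the arm whose posterior sample is highest."""
    rng = random.Random(seed)
    successes = [0] * len(true_rates)
    failures = [0] * len(true_rates)
    for _ in range(rounds):
        samples = [rng.betavariate(1 + s, 1 + f) for s, f in zip(successes, failures)]
        arm = max(range(len(samples)), key=samples.__getitem__)
        # Simulated reward; the algorithm itself never looks at true_rates
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return [s + f for s, f in zip(successes, failures)]  # number of pulls per arm
```

For example, `prob_b_beats_a(100, 1000, 150, 1000)` asks how likely a 15% conversion rate is to be genuinely better than a 10% one after 1,000 visitors each. The answer is a direct probability, which is one reason the Bayesian framing can feel more intuitive than a p-value.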

Reinforcement learning

RL was the last of the main topics that kept coming up throughout NIPS, if I'm not missing any, both with classic Q-learning and with deep learning representations.
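For reference, classic tabular Q-learning keeps a table Q(s, a) and, after each step, moves it towards r + γ·max Q(s', ·). Below is a minimal sketch on a hypothetical five-state corridor where only reaching the right end gives a reward. The environment, hyperparameters and names are all illustrative assumptions. The deep RL variants replace the table with a neural network.

```python
import random

N_STATES = 5       # states 0..4; state 4 is the terminal goal
ACTIONS = (0, 1)   # 0 = move left, 1 = move right

def step(state, action):
    """Hypothetical corridor: reward 1 only when the goal state is reached."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap the episode length
            # Epsilon-greedy behaviour policy, breaking ties at random
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[state][a])
            nxt, reward, done = step(state, action)
            # Q-learning target bootstraps from the best action in the next state
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
            if done:
                break
    return q

q = q_learning()
```

Q-learning is off-policy, so the table converges to the values of the greedy policy ("always move right") even though the agent sometimes explores at random.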

Other topics

There were a couple of other topics that I found interesting and that were new to me:
  • Optimal transportation
  • Distribution regression
A great talk, though not because of its technical content, was "Improvised Comedy as a Turing Test", where two researchers and comedians performed improvised comedy with a robot they had implemented.

About the conference

It was the first time I attended an academic conference, and some things weren't very intuitive for someone used to open source or business conferences. This is a random list of my thoughts:
  • I found the location quite good:
    • Near a major airport, so I could fly directly from London
    • Good temperature
    • Many hotels nearby
    • English speaking country
    • The only problem with the location was that people from several countries (e.g. Iran) were banned from attending, as the organizers mentioned on the conference home page
  • I found the use of an app to communicate during the conference quite convenient. Even if the app had some obvious flaws, like the mess with the list of discussions, it added a lot of value
  • I found it difficult to know what to expect about food. I think in all the previous conferences I attended (and there have been quite a few), breakfast and lunch were provided. At #NIPS, the schedule said that breakfast wasn't offered first thing in the morning, with no other mention of food. Then breakfast was provided later in the morning (one day the breakfast was obviously decided by an algorithm). Lunch wasn't provided, while dinner was, but at a different, undisclosed location in the venue. One day dinner was provided twice (the regular one, plus a voucher for a food truck, only valid for dinner that day).
  • The sponsors were quite interesting. Not only because I managed to get up to 10 t-shirts (including one with Thomas Bayes' face), but because I had very interesting conversations with many people at the booths. I found the diversity of countries represented in the sponsor area interesting. While one could expect Silicon Valley companies to eclipse the rest, the number of Chinese and English companies was comparable, and other countries, like Canada or Germany, were also represented. One of the fun things in the sponsor section was the live cameras performing predictions or style transfer.

  • Compared to open source conferences, I found the atmosphere at NIPS very different. Maybe it's due to the nature of research and open source, but my experience is that open source conferences have a very collaborative environment. You don't necessarily need to like or use someone else's project to have a friendly discussion or appreciate their contribution. But research felt like quite a competitive environment. More than once I saw people at presentations or posters addressing the presenter in a not very nice way, challenging their research and trying to show that they knew better. I think providing constructive feedback is always great, but I was saddened by my impression (which may be biased by the few examples I saw) that researchers see each other more as rivals than as part of a community that delivers together.


On the systems side (mainly at the workshop), it was very interesting to see the talks about the main tensor software from the big Silicon Valley companies.
On the fun side, TensorFlow presented their eager mode, and Soumith Chintala mentioned that "PyTorch implements the eager mode, before the eager mode existed". Some time later, he mentioned that PyTorch will soon implement distributions, the way TensorFlow does. So the main innovation of each project is copied from the competitor. :)

Tensors aside, the star of the ML Systems workshop was Jeff Dean. He discussed TPUs, and how Google is building the infrastructure for training deep learning models. The interest in Google, deep learning and Jeff Dean was at its peak, and the room was as crowded as a room can be. Some time before the talk, I had the honor of meeting Jeff Dean.

On the more pragmatic side, it was interesting to see the poster about CatBoost, Yandex's version of gradient boosted trees. I found the ideas in the paper quite interesting; it has several novel parts compared to xgboost. I spent a bit of time testing whether the results were as good as presented, but the documentation is not yet as good as it could be and the API is a bit confusing, so I finally gave up.
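One of CatBoost's ideas is how it encodes categorical features. It uses target statistics computed in an "ordered" way. The rows are put in a random permutation, and each row's category is replaced by a smoothed mean of the targets of the *earlier* rows in that permutation that share its category. This way a row's own label never leaks into its encoding. Below is a simplified pure-Python sketch of that idea. It is not CatBoost's code, and real CatBoost combines it with several permutations and other details. The function name and the `prior` and `a` smoothing parameters are illustrative assumptions.

```python
import random
from collections import defaultdict

def ordered_target_statistics(categories, targets, prior, a=1.0, seed=0):
    """Encode each row's category using only rows that come before it in a random order."""
    rng = random.Random(seed)
    order = list(range(len(categories)))
    rng.shuffle(order)
    sums = defaultdict(float)   # running sum of targets per category
    counts = defaultdict(int)   # running number of rows per category
    encoded = [0.0] * len(categories)
    for i in order:
        c = categories[i]
        # Smoothed mean of the targets seen so far for this category;
        # the prior dominates when few earlier rows share the category.
        encoded[i] = (sums[c] + a * prior) / (counts[c] + a)
        # Only now add this row's target, so it never influences its own encoding
        sums[c] += targets[i]
        counts[c] += 1
    return encoded
```

The first row of each category in the permutation simply gets the prior. Later rows get increasingly data-driven estimates.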

One of the most interesting insights from NIPS wasn't actually presented. It came from a discussion with Gael Varoquaux, core contributor of scikit-learn. I wanted to talk with him about scikit-learn, and see if we could help with its development as part of the London Python Sprints group. But given the current state and the nature of the project, that doesn't seem very useful at this point (see this comment for clarification). What was interesting about the conversation was discovering the new ColumnTransformer. It's not merged yet, but a pull request already exists that makes it possible to apply sklearn transformers to a subset of columns. At the moment sklearn doesn't provide an easy way to do this (or one that lets you understand your models later), and I think most of us were implementing it ourselves in our own projects.
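The kind of helper many of us wrote ourselves looks roughly like the minimal pure-Python sketch below. The classes `Standardize` and `ApplyToColumns` are hypothetical and are not scikit-learn's API; they only mimic its `fit`/`transform` convention. Rows are plain dicts for simplicity, and one transformer is fitted per selected column, while the other columns pass through untouched.

```python
from statistics import mean, pstdev

class Standardize:
    """Hypothetical minimal scaler with a scikit-learn-like fit/transform interface."""
    def fit(self, values):
        self.mean_ = mean(values)
        self.std_ = pstdev(values) or 1.0  # avoid dividing by zero on constant columns
        return self

    def transform(self, values):
        return [(v - self.mean_) / self.std_ for v in values]

class ApplyToColumns:
    """Hypothetical helper: fit one transformer per selected column, pass the rest through."""
    def __init__(self, make_transformer, columns):
        self.make_transformer = make_transformer
        self.columns = columns

    def fit(self, rows):
        self.fitted_ = {
            c: self.make_transformer().fit([row[c] for row in rows]) for c in self.columns
        }
        return self

    def transform(self, rows):
        out = [dict(row) for row in rows]  # copy, so the input rows are left untouched
        for c, transformer in self.fitted_.items():
            column = transformer.transform([row[c] for row in rows])
            for row, value in zip(out, column):
                row[c] = value
        return out

rows = [
    {"age": 20, "income": 1000, "city": "London"},
    {"age": 40, "income": 3000, "city": "Madrid"},
]
scaled = ApplyToColumns(Standardize, ["age", "income"]).fit(rows).transform(rows)
```

Keeping the fitted transformers in a dict keyed by column name is what makes it possible to inspect the model later. In released versions of scikit-learn, the class from that pull request lives at `sklearn.compose.ColumnTransformer` and takes a list of `(name, transformer, columns)` tuples.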

A sad story

To conclude, I want to mention something that I didn't experience myself at NIPS, but that many of us read about later: Kristian Lum's story about sexual harassment in research. Hopefully this whole wave of scandals, from English politicians to Hollywood, is the beginning of the end. It may not be fair, but while it's as disgusting as all the other cases, I found it more surprising in research. That the brightest minds in their fields have been abusing and abused is something I find more shocking than in an industry like Hollywood.

The second part of the story, this one with names, came not much later, in this Bloomberg article.

On a positive note, I think the problem is not that difficult to solve. In the Python community, I think we've got all the mechanisms in place to avoid these problems as much as possible: strict codes of conduct, whistleblower channels at conferences like EuroPython, and a friendly and inclusive environment. The paradox is that the proportion of female attendees at Python conferences is much smaller than what I saw at NIPS. I'd bet a larger number of women would make these cases less likely.

I hope Kristian's example is useful not only to fix this specific case, but also to make it easier for other people to speak up, and to put an end to this forever.


  1. Thanks Marc, this write up was super interesting for those of us who couldn't make it to NIPS this year!

    "But given the current state and the nature of the project, that doesn't seem very useful at this point." — what do you mean by that?


    1. Hehe, that wasn't very specific, sorry.

      By the nature of sklearn, I meant that it's a very academic project. To write sklearn code, it's not enough to have excellent Python skills; you also need a very good understanding of the algorithms. In general, participants in our sprints are very good at Python, but in most cases we don't know the details of the projects, or the theory behind them.

      And the current state of sklearn, as described by Gael, is that pull requests are not their bottleneck. At the time of writing this comment there are 572 open pull requests in the sklearn repository. So their bottleneck is reviewers and core developers, something we can't help much with in our sprints.

      Maybe in the future there will be some tasks we can help with, but so far it doesn't seem feasible. That's what I meant with that imprecise comment; thanks for pointing it out.