The Machine Learning Salon


What's New?


The Machine Learning Salon provides free information about Machine Learning and Artificial Intelligence. Its aim is to deepen the understanding of Machine Learning theory and its applications by offering a first set of useful websites to people who are interested in Machine Learning. All descriptions come from the websites themselves.



Pint of Science UK 2016

From the 23rd to the 25th of May 2016, Pint of Science held its annual festival, which spanned 12 countries and 107 cities, with 935 events, 2,058 speakers and 48,580 attendees.

Explore attendance of the 320 events across the UK.

Use the map to filter topics, and click on a city in the chart for more information.



CS 229 Machine Learning, Course Materials by Professor John Duchi, Stanford University


CS 229 Machine Learning, Final Projects, Spring 2016, Stanford University


Wekinator by Rebecca Fiebrink

The Wekinator is free, open source software originally created in 2009 by Rebecca Fiebrink. It allows anyone to use machine learning to build new musical instruments, gestural game controllers, computer vision or computer listening systems, and more.

The Wekinator allows users to build new interactive systems by demonstrating human actions and computer responses, instead of writing programming code.


Machine Learning as Creative Tool for Designing Real-Time Expressive Interactions by Rebecca Fiebrink, Microsoft Research Youtube channel

Supervised learning algorithms can be understood not only as a set of techniques for building accurate models of data, but also as design tools that can enable rapid prototyping, iterative refinement, and embodied engagement, all activities that are crucial in the design of new musical instruments and other embodied interactions. Realising the creative potential of these algorithms requires a rethinking of the interfaces through which people provide data and build models, providing for tight interaction-feedback loops and efficient mechanisms for people to steer and explore algorithm behaviours. In this talk, I will discuss my research on better enabling composers, musicians, and developers to employ supervised learning in the design of new real-time systems. I will show a live demo of tools that I have created for this purpose, centering around the Wekinator software toolkit for interactive machine learning. I'll discuss some of the outcomes from six years of employing and observing others using machine learning in creative contexts. These include a better understanding of how machine learning can be used as a tool for design by end users and developers, and how using machine learning as a design tool differs from more conventional application contexts.
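The "demonstrate rather than program" workflow Fiebrink describes can be pictured as a tiny supervised-learning loop: the user records sensor-input/desired-output pairs, and the system interpolates between them to map new readings to responses. The sketch below uses a k-nearest-neighbour average in NumPy purely for illustration; it is not Wekinator's actual code, and the sensor values and mapping are invented:

```python
import numpy as np

# Hypothetical demonstration data: each row is a sensor reading (say, three
# accelerometer axes), paired with the synthesis parameter the user demonstrated.
demos_x = np.array([[0.0, 0.1, 0.9],
                    [0.1, 0.0, 1.0],
                    [0.9, 0.8, 0.1],
                    [1.0, 0.9, 0.0]])
demos_y = np.array([0.2, 0.25, 0.8, 0.9])  # e.g. a filter cutoff, normalised 0-1

def knn_predict(x, k=2):
    """Map a new sensor reading to an output by averaging the k nearest demonstrations."""
    dists = np.linalg.norm(demos_x - x, axis=1)
    nearest = np.argsort(dists)[:k]
    return demos_y[nearest].mean()

# A reading close to the first two demonstrations yields a low output value.
print(round(knn_predict(np.array([0.05, 0.05, 0.95])), 3))  # prints 0.225
```

Adding a new gesture is just appending another demonstration row, which is what makes this interaction loop so fast to iterate on.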



New free Starter Kit coming soon ... I'll tweet from @TheMLSalon to let you know.

I'm always looking for relevant Worldwide Machine Learning information such as:

- websites of Machine Learning departments at universities or institutions (anywhere in the world);

- relevant non-commercial Machine Learning information (as I don't do any advertising);

- information that the author has agreed to publish.

I've got quite a few links in my free 300-page starter kit, but please let me know if I missed something. Many thanks in advance!




The risk of machine learning by Alberto Abadie & Maximilian Kasy, Assistant Professor of Economics, Harvard University

Applied economists often simultaneously estimate a large number of parameters of interest, for instance treatment effects for many treatment values (teachers, locations...), treatment effects for many treated groups, and prediction models with many regressors. To avoid overfitting in such settings, machine learning estimators (such as ridge, lasso, pre-testing) combine (i) regularized estimation, and (ii) data-driven choice of regularization parameters. We aim to provide guidance for applied researchers to choose between such estimators, assuming they are interested in precise estimates of many parameters. We characterize the risk (mean squared error) of regularized estimators and analytically compare their performance, assuming regularization parameters are chosen optimally. We furthermore show that data-driven choices of regularization parameters (using Stein's unbiased risk estimate or cross-validation) yield estimators with risk uniformly close to the optimal choice. We apply our results using data from the literature, on the causal effect of locations on intergenerational mobility, on illegal trading by arms companies with conflict countries under an embargo, and on series regression estimates of the effect of education on earnings. In these applications the relative performance of alternative estimators corresponds to that suggested by our results.
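The estimators in this abstract pair (i) regularized estimation with (ii) a data-driven choice of the regularization parameter. As an illustration only (not the authors' code; the simulated data, penalty grid, and fold count below are invented), a closed-form ridge estimator with its penalty chosen by K-fold cross-validation can be sketched in a few lines of NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated "many parameters" setting: 40 regressors, only the first 5 matter.
n, p = 200, 40
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:5] = 1.0
y = X @ beta + rng.standard_normal(n)

def ridge(X, y, lam):
    """Closed-form ridge estimate: (X'X + lam*I)^{-1} X'y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def cv_risk(X, y, lam, folds=5):
    """Estimate risk (mean squared prediction error) of ridge(lam) by K-fold CV."""
    idx = np.arange(len(y))
    errs = []
    for f in range(folds):
        test = idx % folds == f
        b = ridge(X[~test], y[~test], lam)
        errs.append(np.mean((y[test] - X[test] @ b) ** 2))
    return np.mean(errs)

# Data-driven choice of the regularization parameter over a small grid.
lams = [0.01, 0.1, 1.0, 10.0, 100.0]
best = min(lams, key=lambda lam: cv_risk(X, y, lam))
print("chosen lambda:", best)
```

The paper's point, roughly, is that choices made this way (or via Stein's unbiased risk estimate) achieve risk uniformly close to the infeasible optimal choice of lambda.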


César A. Hidalgo, Macro Connections group at The MIT Media Lab (great talks & research!)

César A. Hidalgo leads the Macro Connections group at the MIT Media Lab and is also an Associate Professor of Media Arts and Sciences at MIT. Hidalgo's work focuses on understanding the evolution of information in natural, social, and economic systems, and on the development of big data visualization engines that make unwieldy volumes of data available. Hidalgo's academic publications have been cited more than 5,000 times and his visualization engines have received more than 8 million visits. He is the author of Why Information Grows (Basic Books, 2015) and the co-author of The Atlas of Economic Complexity (MIT Press, 2014). He lives in Somerville, Massachusetts with his wife Anna and their daughter Iris.



In 2014, Deloitte, Datawheel, and Cesar Hidalgo, Professor at the MIT Media Lab and Director of MacroConnections, came together to embark on an ambitious journey: to understand and visualize the critical issues facing the United States in areas like jobs, skills and education across industry and geography, and to use this knowledge to inform decision making among executives, policymakers and citizens.







MIT Tech Review: Will Machines Eliminate Us? Interview with Professor Yoshua Bengio

People who worry that we’re on course to invent dangerously intelligent machines are misunderstanding the state of computer science.



Google DeepMind: Ground-breaking AlphaGo masters the game of Go

In a paper published in Nature on 28th January 2016, we describe a new approach to computer Go. This is the first time ever that a computer program, "AlphaGo", has defeated a human professional player.

The game of Go is widely viewed as an unsolved “grand challenge” for artificial intelligence. Games are a great testing ground for inventing smarter, more flexible algorithms that have the ability to tackle problems in ways similar to humans. The first classic game mastered by a computer was noughts and crosses (also known as tic-tac-toe) in 1952. But until now, one game has thwarted A.I. researchers: the ancient game of Go.

Despite decades of work, the strongest computer Go programs only played at the level of human amateurs. AlphaGo has won over 99% of games against the strongest other computer Go programs. It also defeated the human European champion by 5-0 in tournament games, a feat previously believed to be at least a decade away. In March 2016, AlphaGo will face its ultimate challenge: a 5-game challenge match in Seoul against the legendary Lee Sedol, the top Go player in the world over the past decade.

This video tells the story so far...

With Demis Hassabis, Google DeepMind

Google AI in landmark victory over Go grandmaster



Math and ML Videos by Ritvik Kharkar, UCLA

The Undergraduate Mathematics Students Association

The Undergraduate Mathematics Students Association is a student group sponsored by the UCLA Mathematics Department. It is open to all people that are interested in mathematics, particularly catering to the UCLA pure and applied mathematics communities.

UMSA focuses on helping undergraduates gain a sense of community among their fellow UCLA math enthusiasts. This is achieved through regular social events, guest speakers, and info sessions. Our organization recognizes the academic concerns of our members, whether it's applying to graduate school, finding a career path, or even choosing the right major. As a group we will do our best to meet these needs, so we appreciate your ideas and feedback!

Mentorship Director: Ritvik Kharkar

Year: 3rd

Major: Math of Computation + Economics

Career Interests: Actuary

Extracurricular Activities: Running, Boxing, Eating

Fun Fact: I make math videos on YouTube!



Papers We Love (Awesome idea!)

Papers We Love is a repository of academic computer science papers and a community who loves reading them.


She Started It, a documentary on women tech founders: five young women will stop at nothing to pursue their start-up dreams (A great idea, and so true! PS: I hope I can soon launch my startup too!)

She Started It is a feature length documentary film on women tech founders, shot on location in Silicon Valley, NYC, Europe, Vietnam, Mississippi & more, that aims to highlight successful role models for young women, to encourage more girls to develop technical and entrepreneurial skills.


Entrepreneurs are the new rock stars. The names that come to mind are usually Bill Gates, Mark Zuckerberg or Steve Jobs. Rarely does one think of a young woman. And why would they? Only 3 percent of all tech start-ups are founded by women. She Started It is a debut documentary film by Nora Poggi & Insiyah Saeed aiming to dispel those myths and spotlight the women that are starting companies. She Started It profiles five of these driven women: Stacey Ferreira, Thuy Truong, Brienne Ghafourifar, Agathe Molinar and Sheena Allen. The film follows a few of these young entrepreneurs over the course of 2+ years and illuminates the ups and downs they face as they attempt to build their new businesses, for the first time on screen in an exciting, narrative, story-driven film. By showcasing these young role models, we hope to ignite that spark: "If she could do it, I could do it!"


OpenAI by Greg Brockman, Ilya Sutskever, and the OpenAI team (So happy that this company was created!)

OpenAI is a non-profit artificial intelligence research company. Our goal is to advance digital intelligence in the way that is most likely to benefit humanity as a whole, unconstrained by a need to generate financial return.

Since our research is free from financial obligations, we can better focus on a positive human impact. We believe AI should be an extension of individual human wills and, in the spirit of liberty, as broadly and evenly distributed as possible.

The outcome of this venture is uncertain and the work is difficult, but we believe the goal and the structure are right. We hope this is what matters most to the best in the field.



Machine Learning Courses by Professor Hsuan-Tien Lin (Videos in Mandarin), NTU

Prof. Hsuan-Tien Lin received a B.S. in Computer Science and Information Engineering from National Taiwan University in 2001, an M.S. and a Ph.D. in Computer Science from California Institute of Technology in 2005 and 2008, respectively. He joined the Department of Computer Science and Information Engineering at National Taiwan University as an assistant professor in 2008, and has been an associate professor since August 2012. He is a consultant to Appier, a startup company that specializes in online advertisement.

Prof. Lin received the Distinguished Teaching Award from the university in 2011, and the Outstanding Mentoring Award from the university in 2013. He co-authored the introductory machine learning textbook Learning from Data and offered two popular Mandarin-taught MOOCs, Machine Learning Foundations and Machine Learning Techniques, based on the textbook. His research interests include theoretical foundations of machine learning, studies on new learning problems, and improvements on learning algorithms. He received the 2012 K.-T. Li Young Researcher Award from the ACM Taipei Chapter, and the 2013 D.-Y. Wu Memorial Award from the National Science Council of Taiwan. He co-led the teams that took third place in the KDDCup 2009 slow track, won KDDCup 2010, won both tracks of KDDCup 2011, won track 2 of KDDCup 2012, and won both tracks of KDDCup 2013. He served as the Secretary General of the Taiwanese Association for Artificial Intelligence between 2013 and 2014.


Professor Hsuan-Tien Lin Homepage, National Taiwan University



Summer Arctic Sea Ice Extent 1980 - 2012 by Tyler Reid and Paul Tarantino, Stanford University (video published on Dec 6, 2015)

This shows the change in the September summer minimum Arctic sea ice extent from 1980 - 2013.

This was prepared for a paper presented at the 2014 International Conference on Machine Learning Applications (ICMLA) in Detroit, Michigan by Tyler Reid and Paul Tarantino. The paper can be found here:


NASA | Ask a Climate Scientist: Thinning Ice Sheets by Dr Kelly Brunt

How can Greenland's ice sheets still be more than 10,000 feet thick, if carbon dioxide is warming the planet?

This was the question posed to polar scientist Kelly Brunt as part of NASA's Ask A Climate Scientist series. According to Dr. Brunt, the concept of thickness is very important to polar scientists. It is easy to see that the ice sheets and sea ice are changing horizontally, covering less and less of the polar regions. But the ice is also thinning, getting smaller vertically.

Greenland's ice is very thick and only the outer layer is experiencing warming temperatures. But that layer is melting more often, and a little bit of melting over such a large area produces a lot of extra water, contributing to sea level rise.


NASA | Arctic Sea Ice Reaches 2015 Minimum Extent


NASA | Arctic Sea Ice Reaches 2014 Minimum Extent


NASA | Sea Ice Max 2013: An Interesting Year for Arctic Sea Ice



Machine Learning FAQ by Sebastian Raschka


Sebastian Raschka Homepage

This is the personal website of a data scientist and machine learning enthusiast with a big passion for Python and open source. Born and raised in Germany, now living in East Lansing, Michigan.


07-Nov-2015, Latest Submissions is the first online service that lets you generate images in the style of your favorite artist!

We apply an algorithm based on convolutional neural networks to combine the content of one image with the style of another image. The algorithm was introduced by Leon A. Gatys, Alexander S. Ecker, and Matthias Bethge in the research paper 'A Neural Algorithm of Artistic Style'.

We are two researchers trying to give a larger audience access to novel machine learning techniques. Unfortunately, the server costs a lot and we cannot afford many... Thus, if you like our work, please consider helping us with a small donation towards covering server costs.
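The Gatys et al. algorithm separates a picture's "style" from its "content" by matching Gram matrices of convolutional feature maps. The sketch below is illustrative only: random arrays stand in for real network activations, and the full method goes on to optimize image pixels against losses like this one. It shows just the Gram-matrix style loss at the heart of the paper:

```python
import numpy as np

def gram(features):
    """Gram matrix of a feature map: channel-by-channel correlations.
    `features` has shape (channels, height, width), like a conv-layer activation."""
    c, h, w = features.shape
    f = features.reshape(c, h * w)
    return f @ f.T / (h * w)

def style_loss(gen_feats, style_feats):
    """Squared distance between Gram matrices: matches style while
    discarding the spatial layout (the 'content')."""
    return np.sum((gram(gen_feats) - gram(style_feats)) ** 2)

rng = np.random.default_rng(1)
a = rng.standard_normal((8, 16, 16))  # stand-in for one layer's activations
print(style_loss(a, a))  # identical features: loss is exactly 0.0
```

Because the Gram matrix sums over all spatial positions, two images with the same textures in different places score the same, which is exactly why it works as a style descriptor.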



Generating Stories about Images by Samin

Recurrent neural network for generating stories about images

Stories are a fundamental human tool that we use to communicate thought. Creating a story about an image is a difficult task that many struggle with. New machine-learning experiments are enabling us to generate stories based on the content of images. This experiment explores how to generate little romantic stories about images (incl. guest star Taylor Swift).


Neural-storyteller by Ryan Kiros, Machine Learning Group, University of Toronto

neural-storyteller is a recurrent neural network that generates little stories about images. This repository contains code for generating stories with your own images, as well as instructions for training new models.


Ryan Kiros, PhD student, Machine Learning Group, University of Toronto

I am a 3rd year PhD student under the supervision of Dr. Ruslan Salakhutdinov and Dr. Richard Zemel.

My research interests are in Statistical Machine Learning, Computer Vision and Natural Language Processing.


MIT EmTech Videos

Discover the technologies and innovators changing our world

MIT Technology Review's mission is to equip audiences with the intelligence to understand a world shaped by technology. We identify the technologies and innovators who have the greatest potential to change business, society, and the world for the better.

EmTech MIT brings this journalism to life. It's an opportunity to discover future trends and begin to understand the technologies that will drive the new global economy. It's where tech, business, and culture converge, and where you gain access to the most innovative people and companies in the world.

From AI and robotics to data-driven health care and future cities, the 15th annual EmTech MIT explores technologies highlighted in the recent 10 Breakthrough Technologies list and celebrates the 2015 Innovators Under 35.

November 2–4, 2015, MIT Media Lab, Cambridge, MA



ScienceTake | Making Faces, NYTimes

Computer scientists have a way to manipulate videos to change a person’s facial expressions in real time.



Large Scale Machine Learning by Professor Yoshua Bengio, IBM Research Youtube Channel

Dr. Yoshua Bengio’s current interests are centered on a quest for AI through machine learning, and include fundamental questions on deep learning and representation learning, the geometry of generalization in high-dimensional spaces, manifold learning, biologically inspired learning algorithms, and challenging applications of statistical machine learning. He is the author of two books and more than 200 publications, with the most influential being from the areas of deep learning, recurrent neural networks, probabilistic learning algorithms, natural language processing, and manifold learning. Dr. Bengio received a Ph.D. from McGill University in 1991, before completing two post-doctoral years at M.I.T. and AT&T Bell Laboratories. He is the Canada Research Chair in Statistical Learning Algorithms.


IBM Research Youtube Channel



Google Street View Hyperlapse (2 years old but so awesome!)

Hyper-lapse photography – a technique combining time-lapse and sweeping camera movements typically focused on a point-of-interest – has been a growing trend on video sites. It's not hard to find stunning examples on Vimeo. Creating them requires precision and many hours stitching together photos taken from carefully mapped locations. We aimed to make the process simpler by using Google Street View as an aid, but quickly discovered that it could be used as the source material. It worked so well, we decided to design a very usable UI around our engine and release Google Street View Hyperlapse.

The site settings are purposely low (like having a maximum of 60 frames per animation) for greater accessibility. However, all the source code is available on Github (including examples and documentation) so developers can play with higher frame rates, better image quality, and more complicated camera movements.



Interactive Data Lab

The mission of the Interactive Data Lab is to enhance people's ability to understand and communicate data through the design of new interactive systems for data visualization and analysis. We study the perceptual, cognitive and social factors affecting data analysis in order to improve the efficiency and scale at which expert analysts work, and to lower barriers for non-experts.

Motivating questions include: How might we enable users to transform and integrate data with minimal programming? How can we support expressive and effective visualization designs? Can we build systems to query and visualize massive data sets at interactive rates? How might we enable domain experts to guide machine learning methods to produce better models?

Advances in computing and statistics provide new opportunities for data-driven discovery. However, breakthroughs in science and industry ultimately lie with the ability of empowered investigators to pursue questions, uncover domain-specific patterns, identify errors, and assess model outputs. Though voiced nearly 50 years ago, the sentiments of Tukey & Wilk ring true today: to facilitate effective human involvement at all stages of data analysis remains a grand challenge.



Machine Learning CMU Fall 2015 10-715 by Alex Smola and Barnabas Poczos

The rapid improvement of sensory techniques and processor speed, and the availability of inexpensive massive digital storage, have led to a growing demand for systems that can automatically comprehend and mine massive and complex data from diverse sources. Machine Learning is becoming the primary mechanism by which information is extracted from Big Data, and a primary pillar that Artificial Intelligence is built upon.

This course is designed for Ph.D. students whose primary field of study is machine learning, or who intend to make machine learning methodological research a main focus of their thesis. It will give students a thorough grounding in the algorithms, mathematics, theories, and insights needed to do in-depth research and applications in machine learning. The topics of this course will in part parallel those covered in the general graduate machine learning course (10-701), but with a greater emphasis on depth in theory and algorithms. The course will also include additional advanced topics such as RKHS and representer theory, Bayesian nonparametrics, additional material on graphical models, manifolds and spectral graph theory, reinforcement learning and online learning, etc.


The Michigan Institute for Data Science (MIDAS), University of Michigan Data Science Initiative

The Michigan Institute for Data Science (MIDAS) is the focal point for the new multidisciplinary area of data science at the University of Michigan. This area covers a wide spectrum of scientific pursuits (development of concepts, methods, and technology) for data collection, management, analysis, and interpretation as well as their innovative use to address important problems in science, engineering, business, and other areas.

Data science is now widely accepted as the fourth mode of scientific discovery, on par with theory, physical experimentation and computational analysis. Techniques based on Big Data are showing promise not only in scientific research, but also in education, health, policy, and business.

Active research in Data Science at U-M ranges from data management, data curation and data-sharing incentives to statistics, machine learning, and data visualization, addressing problems in astronomy, evolutionary biology, disease model discovery, health policy, materials synthesis, personalized medicine, social sciences, and teaching and learning. Since 2010, the university has been running an annual symposium that has drawn participants from over 30 departments across the university.

The Michigan Institute for Data Science (MIDAS) was created in July 2015 as part of the University of Michigan Data Science Initiative.


MIDAS kickoff symposium, Advanced Research Computing at the University of Michigan


NELL: Never-Ending Language Learning, a Research Project at Carnegie Mellon University

Browse the Knowledge Base!

Can computers learn to read? We think so. "Read the Web" is a research project that attempts to create a computer system that learns over time to read the web. Since January 2010, our computer system called NELL (Never-Ending Language Learner) has been running continuously, attempting to perform two tasks each day:

First, it attempts to "read," or extract facts from text found in hundreds of millions of web pages (e.g., playsInstrument(George_Harrison, guitar)).

Second, it attempts to improve its reading competence, so that tomorrow it can extract more facts from the web, more accurately.

So far, NELL has accumulated over 50 million candidate beliefs by reading the web, and it is considering these at different levels of confidence. NELL has high confidence in 2,560,970 of these beliefs — these are displayed on this website. It is not perfect, but NELL is learning. You can track NELL's progress below or @cmunell on Twitter, browse and download its knowledge base, read more about our technical approach, or join the discussion group.
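NELL's browsable knowledge base boils down to candidate beliefs, i.e. relation triples carrying confidence scores, of which only the high-confidence ones are promoted and displayed. A toy sketch of that idea (the relation names, scores, and threshold below are illustrative, not NELL's actual internals):

```python
# Candidate beliefs in the spirit of NELL: (relation, subject, object, confidence).
candidate_beliefs = [
    ("playsInstrument", "George_Harrison", "guitar", 0.98),
    ("playsInstrument", "George_Harrison", "sitar", 0.91),
    ("cityInCountry", "Pittsburgh", "USA", 0.99),
    ("playsInstrument", "Pittsburgh", "guitar", 0.12),  # a low-confidence bad extraction
]

def promoted(beliefs, threshold=0.9):
    """Keep only beliefs at or above the confidence threshold, dropping the score."""
    return [(r, s, o) for r, s, o, conf in beliefs if conf >= threshold]

for triple in promoted(candidate_beliefs):
    print(triple)
```

Each day's reading adds new candidates and adjusts confidences, so the promoted set grows and self-corrects over time.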


What machine learning teaches us about the brain | Tom Mitchell, World Economic Forum

Tom Mitchell introduces us to Carnegie Mellon’s Never Ending learning machines: intelligent computers that learn continuously with little need for human input. He says big data and machine learning will transform decision making across all aspects of life.


Early detection of epidemics with predictive analytics | Aarti Singh

Machine learning tools can revolutionize the speed and accuracy of diagnosing brain disease, says Aarti Singh from Carnegie Mellon University.


Machine learning for hospitals and health insurers | Daniel Neill

Daniel Neill turns digital detective as he explores the future of Public Health Surveillance. The scientist from Carnegie Mellon University says monitoring twitter feeds can help track disease outbreaks.


MLSS Sydney 2015


RE.WORK Deep Learning Summit, London 2015

The 3rd global Deep Learning Summit took place in London on 24-25 September 2015, to showcase the opportunities of advancing trends in deep learning and their impact on business & society.



Future directions of machine learning (Part 2) at the Royal Society

Published on Oct 12, 2015

On 22 May 2015, the Royal Society launched a new series of high-level conferences on major scientific and technical challenges of the next decade. 'Breakthrough Science and Technologies: Transforming our Future' conferences feature cutting-edge science from industry and academia and bring together leading experts from the wider scientific community, industry, government, funding bodies and charities.

The first conference was on the topic of machine learning and was organised by Dr Hermann Hauser KBE FREng FRS and Dr Robert Ghanea-Hercock.

This video session features:

- Professor Steve Furber CBE FREng FRS, 'Building Brains' (00:00)

- Simon Knowles, 'Machines for intelligence' (28:58)

- Professor Simon Benjamin, 'Machine learning as a near-future application of emerging quantum technologies' (56:05)

- Dr Demis Hassabis, 'General learning algorithms' (1:25:55)


Future directions of machine learning (Part 1) at the Royal Society

Uploaded on Jul 21, 2015

On 22 May 2015, the Royal Society launched a new series of high-level conferences on major scientific and technical challenges of the next decade. 'Breakthrough Science and Technologies: Transforming our Future' conferences feature cutting-edge science from industry and academia and bring together leading experts from the wider scientific community, industry, government, funding bodies and charities.

The first conference was on the topic of machine learning and was organised by Dr Hermann Hauser KBE FREng FRS and Dr Robert Ghanea-Hercock.

This video session features:

- Professor Zoubin Ghahramani FRS, 'Future directions in probabilistic machine learning and the automatic statistician' (00:00)

- Professor Chris Bishop FREng, 'Model-based machine learning' (30:00)


Machine learning - breakthrough science and technologies at the Royal Society

Uploaded on Jul 22, 2015

On 22 May 2015, the Royal Society launched a new series of high-level conferences on major scientific and technical challenges of the next decade. 'Breakthrough Science and Technologies: Transforming our Future' conferences feature cutting-edge science from industry and academia and bring together leading experts from the wider scientific community, industry, government, funding bodies and charities.

The first conference was on the topic of machine learning and was organised by Dr Hermann Hauser KBE FREng FRS and Dr Robert Ghanea-Hercock.

This video session features the keynote speaker Professor Geoff Hinton FRS, “Deep Learning”.


Applications and impacts of machine learning at the Royal Society

Uploaded on Jul 21, 2015

On 22 May 2015, the Royal Society launched a new series of high-level conferences on major scientific and technical challenges of the next decade. 'Breakthrough Science and Technologies: Transforming our Future' conferences feature cutting-edge science from industry and academia and bring together leading experts from the wider scientific community, industry, government, funding bodies and charities.

The first conference was on the topic of machine learning and was organised by Dr Hermann Hauser KBE FREng FRS and Dr Robert Ghanea-Hercock.

This video session features:

- Dr Miranda Mowbray, 'Machine learning for enterprise network security' (00:00)

- Professor Andrew Davison, 'Visual SLAM and scene understanding for robots' (18:45)

- Dr Simon Thomson, 'The promise (and problems and pitfalls) of machine learning in telecoms' (51:45)


Socioeconomic impacts of machine learning at the Royal Society

Uploaded on Jul 21, 2015

On 22 May 2015, the Royal Society launched a new series of high-level conferences on major scientific and technical challenges of the next decade. 'Breakthrough Science and Technologies: Transforming our Future' conferences feature cutting-edge science from industry and academia and bring together leading experts from the wider scientific community, industry, government, funding bodies and charities.

The first conference was on the topic of machine learning and was organised by Dr Hermann Hauser KBE FREng FRS and Dr Robert Ghanea-Hercock.

This video session features:

- Sir Malcolm Grant CBE, 'Potential of machine learning in support of diagnosis and treatment in the era of human genomics' (00:00)

- Professor Nick Bostrom, 'Superintelligence' (18:14)

- Professor Nick Jennings FREng, 'Machine learning and the internet of things' (33:32)

- Panel discussion (45:35)


Royal Society Machine Learning Playlist


IPAM Videos

Machine Learning for Many-Particle Systems, 2015

Graduate Summer School: Computer Vision, 2013



Association for Computing Machinery (ACM)

ACM, the Association for Computing Machinery, is the world's largest educational and scientific computing society with 100,000+ members, and unites computing professionals, educators, and researchers from industry, academia, and government. ACM is dedicated to advancing computing as a science and a profession. ACM inspires dialogue, shares resources and addresses the field's challenges through its programs, publications, and policy initiatives. ACM strengthens the profession's collective voice by promoting the highest standards, supporting members' professional development, and fostering policies and research that benefit society.


Christine Doig

Christine Doig is a Data Scientist at Continuum Analytics. She holds an M.S. in Industrial Engineering from UPC, Barcelona, where she also started a Masters in Innovation and Research in Informatics, Data Mining and BI, before joining Continuum Analytics.

She is interested in Data Science and Python and loves to share her knowledge with others. She has taught tutorials and presented talks on conda, Blaze, Bokeh, scikit-learn and Data Science at PyCon, PyTexas, PyCon Spain, PyData Dallas, ScipyConf and local meetup groups like PyBCN, PyladiesBCN, APUG and ACM SIGKDD. Blog posts, talks, slides and videos can be found on her site.


Beginner's Guide to Machine Learning Competitions by Christine Doig at PyTexas

Published on Oct 9, 2015

This tutorial will offer a hands-on introduction to machine learning and the process of applying these concepts in a Kaggle competition. We will introduce attendees to machine learning concepts, examples and flows, while building up their skills to solve an actual problem. At the end of the tutorial attendees will be familiar with a real data science flow: feature preparation, modeling, optimization and validation.

Packages used in the tutorial will include: IPython notebook, scikit-learn, pandas and NLTK. We’ll use IPython notebook for interactive exploration and visualization, in order to gain a basic understanding of what’s in the data. From there, we’ll extract features and train a model using scikit-learn. This will bring us to our first submission. We’ll then learn how to structure the problem for offline evaluation and use scikit-learn’s clean model API to train many models simultaneously and perform feature selection and hyperparameter optimization.

At the end of session, attendees will have time to work on their own to improve their models and make multiple submissions to get to the top of the leaderboard, just like in a real competition. Hopefully attendees will not only leave the tutorial having learned the core data science concepts and flow, but also having had a great time doing it.
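
The flow the tutorial describes (feature preparation, modeling, optimization, validation) can be sketched with scikit-learn's pipeline and grid-search API. The dataset and parameter grid below are illustrative stand-ins, not the tutorial's actual competition data:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Illustrative sketch of the tutorial's flow: prepare features, train a
# model, and tune hyperparameters with cross-validation on held-out data.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

pipe = Pipeline([("scale", StandardScaler()),               # feature preparation
                 ("clf", LogisticRegression(max_iter=1000))])

grid = GridSearchCV(pipe, {"clf__C": [0.1, 1.0, 10.0]}, cv=5)  # optimization
grid.fit(X_train, y_train)                                  # modeling
score = grid.score(X_test, y_test)                          # offline validation
```

Structuring the problem this way (a held-out test set plus cross-validated grid search) is exactly the "offline evaluation" step the tutorial recommends before making leaderboard submissions.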


NYC Data Science Academy Videos


Computerphile Videos


Videos all about computers and computer stuff. Sister channel of Numberphile.


Rabbits, Faces & Hyperspaces - Computerphile by Robert Miles

Hyperspace was hijacked by science fiction, but what is a space? Robert Miles explains with the use of small red rabbits and human faces.


Machine Learning and Gesture Recognition: New Frontiers for Music | Bruno Zamborlin | TEDxPadova

Published on Oct 8, 2015

Everyday objects become musical instruments by converting the vibrations we produce when we touch them into sounds, all thanks to a smartphone app that turns the acoustic properties of objects into music. The performance also features an exceptional contribution from the Orchestra Sperimentale.

Bruno Zamborlin, 30, comes from Lonigo (Vicenza) and has a dream: to contribute to the evolution of contemporary music. Mogees, his creation, is a sensor that, paired with an app, lets you "play" everyday objects.

Bruno began his project during his studies at IRCAM/Pompidou Centre in Paris and at Goldsmiths, University of London, two institutions internationally recognized as incubators of innovation and talent.



"An Overview of Probabilistic Programming" by Vikash K. Mansinghka, MIT

Published on Sep 28, 2015

Probabilistic inference is a widely-used, rigorous approach for processing ambiguous information based on models that are uncertain or incomplete. However, models and inference algorithms can be difficult to specify and implement, let alone design, validate, or optimize. Additionally, inference often appears to be intractable. Probabilistic programming is an emerging field that aims to address these challenges by formalizing modeling and inference using key ideas from probability theory, programming languages, and Turing-universal computation.

This talk will illustrate the common underlying principles of probabilistic programming using three research platforms:

BayesDB, a Bayesian database that enables users to directly query the probable implications of data tables without training in statistics. It provides BQL, an SQL-like language for Bayesian data analysis, and MML, a minimal language for building generative population models by combining automatic model-building techniques with qualitative constraints and custom statistical code. BayesDB has been applied to problems such as cleaning and exploring a public database of Earth satellites and assessing the evidence for microbial biomarkers of Kwashiorkor, a form of severe malnutrition.

Picture, an imperative probabilistic language for 3D scene perception. Picture uses deep neural networks and statistical learning to invert generative models based on computer graphics. 50-line Picture programs can infer 3D models of human poses, faces, and other object classes from single images.

Venture, an integrated platform that aims to be sufficiently expressive, efficient, and extensible for general-purpose use. It provides VentureScript, a language that gives users fine-grained control over both modeling and inference, and defines a common interface for integrating components written in other probabilistic languages. Recent applications include structure discovery from time-series via Gaussian processes and reflective AI techniques such as Bayesian optimization.
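
None of the three platforms' actual syntax is shown here, but the core idea the talk formalizes, running a generative program forward and conditioning on observed data, can be illustrated in plain Python with (deliberately naive) rejection sampling:

```python
import random

# Generative model: draw an unknown coin bias p from a uniform prior, then
# flip the coin 10 times. We "observe" 9 heads and infer p by the simplest
# inference strategy a probabilistic language can fall back on: run the
# program forward many times and keep only the runs matching the data.
def model():
    p = random.random()                                  # prior over the bias
    heads = sum(random.random() < p for _ in range(10))  # simulate 10 flips
    return p, heads

random.seed(0)
accepted = [p for p, heads in (model() for _ in range(200_000)) if heads == 9]
posterior_mean = sum(accepted) / len(accepted)  # close to Beta(10, 2) mean
```

Real engines like Venture replace this brute-force conditioning with far more efficient inference, but the separation of model (the program) from inference (the conditioning strategy) is the same.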

Vikash K. Mansinghka


Vikash Mansinghka is a postdoctoral researcher at MIT, where he leads the Probabilistic Computing Project. Vikash holds S.B. degrees in Mathematics and in Computer Science from MIT, as well as an M.Eng. in Computer Science and a PhD in Computation. He also held graduate fellowships from the National Science Foundation and MIT's Lincoln Laboratory. His PhD dissertation on natively probabilistic computation won the MIT George M. Sprowls dissertation award in computer science, and his research on the Picture probabilistic programming language won an award at CVPR. He co-founded a venture-backed startup based on this research that was later acquired, and has been an advisor to Google DeepMind. He served on DARPA's Information Science and Technology advisory board from 2010-2012, and currently serves on the editorial boards of the Journal of Machine Learning Research and the journal Statistics and Computing.


"How machine learning helps cancer research" by Evelina Gabasova, University of Cambridge

Published on Sep 26, 2015

Machine learning methods are being applied in many different areas - from analyzing financial stock markets to movie recommender engines. But the same methods can be applied to other areas that also deal with big messy data. In bioinformatics I use similar machine learning, only this time to help find the underlying mechanisms of cancer.

The problems in bioinformatics might seem opaque and confusing - sequencing, DNA, methylation, ChIP-seq, motifs etc. But underneath, the same algorithms that are used to find groups of customers based on their buying behavior can be used to find subtypes of cancer that respond differently to treatments. Algorithms for text analysis can be used to find important patterns in DNA strands. And software verification tools can help analyze biological systems.

In this talk, I'll show you the exciting world of machine learning applications in bioinformatics. No knowledge of biology is required, the talk will be mostly in developer-speak.
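
The customer-segmentation analogy in the description can be made concrete: the same k-means algorithm that groups customers by buying behavior groups samples by "expression profile". The data below are synthetic stand-ins for real sequencing output, not anything from the talk:

```python
import numpy as np
from sklearn.cluster import KMeans

# Two groups of samples drawn around different mean profiles play the role
# of two tumour subtypes; k-means recovers the grouping exactly as it would
# recover customer segments from purchase histories.
rng = np.random.default_rng(0)
subtype_a = rng.normal(0.0, 1.0, size=(50, 20))   # 50 samples, 20 "genes"
subtype_b = rng.normal(3.0, 1.0, size=(50, 20))   # shifted mean profile
X = np.vstack([subtype_a, subtype_b])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
# The first 50 and last 50 samples should fall into two clean clusters
```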

Evelina Gabasova



Evelina is a machine learning researcher working in bioinformatics, trying to reverse-engineer cancer at University of Cambridge. Her background is mainly in computer science, statistics and machine learning. Evelina is a big fan of F# and uses it frequently for data manipulation and exploratory analysis in her research. Outside of academia, she also speaks at developer conferences and user groups about using F# for data science. She writes a blog at


"The Gamma: Programming Tools for Data Journalism" by Tomas Petricek, University of Cambridge

Published on Sep 28, 2015

Computer programming may not be the new literacy, but it is finding its way into many areas of modern society. In this submission, we look at data journalism, which is a discipline combining programming, data analysis and traditional journalism. In short, data journalism turns articles from a mix of text and images into something that is much closer to a computer program.

Most data journalists today use a wide range of tools that involve a number of manual steps. This makes the analysis error-prone and hard to reproduce. In this video, we explore the idea of treating a data-driven article as an executable program. We look at how ideas from programming language research can be used to provide better tools for writing (or programming) such articles, but also to enable novel interactive experiences for the reader.

The project also makes data journalism more accountable and reproducible. We let the reader verify exactly how the visualizations are generated, what the data sources are, and how they are combined.

Tomas Petricek


Tomas is a computer scientist, book author and open-source developer. He wrote a popular book called "Real-World Functional Programming" and is a lead developer of several F# open-source libraries, but he also contributed to the design of the F# language as an intern and consultant at Microsoft Research. He is a partner at fsharpWorks, where he provides training and consulting services. Tomas recently submitted his PhD thesis at the University of Cambridge, focused on types for understanding context usage in programming languages, but his most recent work also includes two essays that attempt to understand programming through the perspective of philosophy of science.


"How to run Neural Nets on GPUs' by Melanie Warrick

Published on Sep 27, 2015

This talk is just what the title says. I will demonstrate how to run a neural net on a GPU because neural nets are solving some interesting problems and GPUs are a good tool to use.

Neural networks have regained popularity in the last decade-plus because there are real-world applications we are finally able to apply them to (e.g. Siri, self-driving cars, facial recognition). This is due to significant improvements in computational power and in the amount of data available for building the models. However, neural nets still face a barrier to entry as a useful tool in companies because obtaining value from them and implementing them can be computationally expensive.

GPUs are popular processors in gaming and research due to their computational speed. Deep neural nets' parallel structures (millions of identical nodes that perform the same operation on different data) are ideal for GPUs. Depending on the neural net, using a single server with GPUs instead of a CPU cluster can improve communication latency as well as reduce size and power consumption. Running an optimization method (training algorithm) like Stochastic Gradient Descent on a GPU rather than a CPU can be up to 40 times faster.

This talk will briefly explain what neural nets are and why they're important, as well as give context about GPUs. Then I will walk through the code and actually launch a neural net on a GPU. I will cover key pitfalls you may hit and techniques to diagnose and troubleshoot. You will walk away understanding how to approach using GPUs on your own and have some resources to dive into for further understanding.
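
As a rough illustration of why neural nets map so well to GPUs, here is a minibatch SGD loop for a toy softmax model in plain NumPy (CPU-only, synthetic data, not the talk's code): every step is a dense matrix operation applied identically across examples, which is precisely the structure GPU array libraries parallelize.

```python
import numpy as np

# Minibatch SGD on a single-layer softmax model. Each update is a batched
# matrix multiply and an identical per-example gradient rule, so swapping
# NumPy for a GPU array library parallelizes it with no change in logic.
rng = np.random.default_rng(0)
X = rng.normal(size=(512, 20))           # toy inputs
y = (X[:, 0] > 0).astype(int)            # toy labels: sign of first feature
W = np.zeros((20, 2))

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

lr, batch = 0.5, 64
for epoch in range(20):
    for i in range(0, len(X), batch):
        xb, yb = X[i:i + batch], y[i:i + batch]
        probs = softmax(xb @ W)
        probs[np.arange(len(yb)), yb] -= 1.0   # cross-entropy gradient
        W -= lr * xb.T @ probs / len(yb)       # averaged SGD step

accuracy = ((X @ W).argmax(axis=1) == y).mean()
```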

Melanie Warrick



Deep Learning Engineer at Skymind. Previous experience includes data science and engineering work and a comprehensive consulting career. I have a passion for AI and for working on machine learning problems at scale.


Strange Loop Videos


NASA Earth Exchange (NEX): Big Data Challenges, High-Performance Computing, and ML Innovations by Sangram Ganguly, Senior Research Scientist, NASA

Published on Oct 1, 2015

BIDS Data Science Lecture Series | September 25, 2015 | 1:00-2:30 p.m. | 190 Doe Library, UC Berkeley

Speaker: Sangram Ganguly, Senior Research Scientist, NASA

Sponsors: Berkeley Institute for Data Science, Data, Society and Inference Seminar

NASA Earth Exchange (NEX) provides a unique collaborative platform for scientists and researchers around the world to do research in a scientifically complex area. NEX provides customized open source tools, scientific workflows, access to petabytes of satellite and climate data, models, and computing power. Over the past three years, NEX has evolved in terms of handling projects that deal with data complexity, model integration, and high-performance computing. Another unique aspect of NEX is its collaboration with Amazon Web Services (AWS) to create the OpenNEX platform, which leverages the full stack of AWS's cloud computing platform to demonstrate scientifically relevant projects for government agencies, commercial companies, and other stakeholders. OpenNEX provides access to a wide variety of data through AWS's public datasets program and virtual machines that replicate a certain workflow capturing data access, search, analysis, computation, and visualization. OpenNEX collaborated with Berkeley's Geospatial Innovation Facility (GIF) to create an open source visualization dashboard for visualizing the downscaled climate projections dataset. A pressing need in both initiatives is how to deal with large image datasets and efficiently analyze these images using high-performance and cloud computing infrastructures. With funding from several NASA program elements (e.g., AIST, ACCESS, CMS), NEX has showcased activities in which new machine learning algorithms can be deployed and scaled across these computer architectures to process very high-resolution imagery datasets for object classification, segmentation, and feature extraction. One example is processing a quarter-million image scenes from the 1-m multispectral NAIP dataset to estimate tree cover for the continental United States, given the large complexity and heterogeneity in land cover types. New computational techniques using open source tools and cloud architectures are a must in achieving performance efficiency in some of the heritage scientific research domains and analyses.


Berkeley Institute for Data Science (BIDS)


Why Unsupervised (Deep) Learning is Important - Deep Learning Summit London, by Max Welling, Professor of Computer Science at the University of Amsterdam and the University of California, Irvine


Perpetual Learning Machines: Deep Neural Networks with Brain-like On-The-Fly Learning & Forgetting by Andrew Simpson



Building Intelligent Applications with Machine Learning at Scale by Carlos Guestrin (the talk starts at 23:00), University of Washington

Streamed live on Sep 17, 2015

Machine learning (ML) has become the hottest topic in computing. Industries are being disrupted by intelligent applications that use ML at their core. From e-commerce, through movie streaming, to taxis, new companies that rely on ML are displacing old incumbents.

Today, implementing ML-infused applications, especially with real data and at scale, is a slow and complex process, requiring much of the system to be built from scratch, in a bottom-up fashion.

In this talk, we propose an alternative approach that combines high-level machine learning toolkits to easily build intelligent applications. We will scale these methods to huge datasets, even on one machine, by using SFrames, an open-source, out-of-core data frame. We will illustrate these methods through several live demos and implementations, showing that these techniques are accessible to non-ML experts who want to build exciting applications and potentially disrupt new markets.

About the Speaker

Carlos is the CEO of Dato, and the Amazon Professor of Machine Learning in Computer Science and Engineering at the University of Washington. A world-recognized leader in the field of machine learning, Carlos was named one of the "Brilliant 10" by Popular Science Magazine. He received the IJCAI Computers and Thought Award for his contributions to artificial intelligence, and a Presidential Early Career Award for Scientists and Engineers (PECASE) from President Obama.


Carlos Guestrin Homepage, University of Washington

Amazon Professor of Machine Learning

Associate Professor in Computer Science & Engineering

Adjunct Professor in Statistics Department

Co-director of the MODE Lab with Emily Fox and Ben Taskar


Spark Machine Learning, Paco Nathan


Paco Nathan Homepage



Deep Learning Summer School, Montreal 2015

Deep neural networks that learn to represent data in multiple layers of increasing abstraction have dramatically improved the state-of-the-art for speech recognition, object recognition, object detection, predicting the activity of drug molecules, and many other tasks. Deep learning discovers intricate structure in large datasets by building distributed representations, either via supervised, unsupervised or reinforcement learning.

The Deep Learning Summer School 2015 is aimed at graduate students and industrial engineers and researchers who already have some basic knowledge of machine learning (and possibly but not necessarily of deep learning) and wish to learn more about this rapidly growing field of research.


Fall 2015 Machine Learning by Bert Huang (new lectures posted), Virginia Tech University

I am an assistant professor in the Virginia Tech Department of Computer Science. I investigate machine learning with a special focus on models and data with structure stemming from natural networks. Within this focus, my work addresses open questions on theory, algorithms, and applications.


UofU Data Youtube Channel, School of Computing, University of Utah

Machine Learning (Fall 2015), Visualization, Clustering, etc.


ODSC Boston 2015 - From BigData to Data Science

From BigData to Data Science: Predictive Analytics in a Changing Data Landscape


Gael Varoquaux at ODSC Boston 2015 - Scikit-Learn for Easy Machine Learning

Scikit-learn is a popular machine learning tool. What can it do for you? Why would you want to use it? What can you do with it? Where is it going? In this talk, I will discuss why and how scikit-learn became popular. I will argue that it is successful because of its vision: it fills an important slot in the rich ecosystem of data science. I will demonstrate how scikit-learn makes predictive analysis easy and yet versatile. I will shed some light on our development process: how do we, as a community, ensure the quality and the growth of scikit-learn? What are the new exciting developments? What may we expect in the near future?
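
The "easy yet versatile" point can be shown in one loop: every scikit-learn estimator shares the same fit/predict interface, so swapping models is a one-line change. The dataset and model choices below are illustrative, not from the talk:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Three very different algorithms, one interface: cross_val_score accepts
# any of them unchanged because they all implement fit/predict.
X, y = load_breast_cancer(return_X_y=True)
results = {}
for model in (DecisionTreeClassifier(random_state=0),
              RandomForestClassifier(n_estimators=100, random_state=0),
              SVC(gamma="scale")):
    results[type(model).__name__] = cross_val_score(model, X, y, cv=5).mean()
```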

Presenter Bio:

Gaël Varoquaux is an INRIA faculty researcher working on data science for brain imaging in the Neurospin brain research institute (Paris, France). His research focuses on modeling and mining brain activity in relation to cognition. Years before the NSA, he was hoping to make bleeding-edge data processing available across new fields, and he has been working on a mastermind plan building easy-to-use open-source software in Python. He is a core developer of scikit-learn, joblib, Mayavi and nilearn, a nominated member of the PSF, and often teaches scientific computing with Python using the scipy lecture notes.



Coursera: Machine Learning Specialization, University of Washington

About This Specialization

Extract insights from data, build self-improving applications, and apply algorithms to real-world problems.

This Specialization provides a case-based introduction to the exciting, high-demand field of machine learning. You’ll learn to analyze large and complex datasets, build applications that can make predictions from data, and create systems that adapt and improve over time. In the final Capstone Project, you’ll apply your skills to solve an original, real-world problem through implementation of machine learning algorithms.



Perception Lab Online Experiments, St Andrews University

The Perception Lab

The Perception Lab is run by Dave Perrett. We are based in the University of St Andrews in Fife, Scotland. Our research is funded by the ESRC, BBSRC, S.I.N.A.P.S.E, and the EPSRC.

In the lab, we investigate the many facets of face perception... what makes one person appear more trustworthy and cooperative than another? What is the relationship between health and attractiveness, and which physiological factors influence this relationship? How big do differences between facial characteristics have to be for us to perceive them? These are just a few of the topics we are interested in.


Some Applications of Data Analysis and Machine Learning by Alexey Chervonenkis, Royal Holloway


Intelligent Learning: Similarity Control and Knowledge Transfer by Vladimir Vapnik, Royal Holloway


The Vapnik-Chervonenkis Theory


Commemorative Talks In Memory of Alexey Chervonenkis Chaired by Vladimir Vapnik


Science at Royal Holloway


The Conscious Phenotype by Professor Geraint Rees

Professor Geraint Rees, Institute of Cognitive Neuroscience and the Wellcome Trust Centre for Neuroimaging, University College London, delivered Royal Holloway's keynote psychology lecture on 11 February. This lecture, which is suitable for a non-specialist audience, explores the nature of individual differences in conscious perception and their neural basis, focusing on both the structure and function of the human brain.


Facial cues to health and attractiveness by Professor David Perrett

Learn what makes a person attractive to others! Professor David Perrett, University of St Andrews, presents this Psychology keynote lecture explaining how our face provides a health certificate for others to read.



Fall 2015 Machine Learning Videos by Bert Huang, Virginia Tech Department of Computer Science


Bert Huang Homepage

About Me

I am an assistant professor in the Virginia Tech Department of Computer Science. I investigate machine learning with a special focus on models and data with structure stemming from natural networks. Within this focus, my work addresses open questions on theory, algorithms, and applications.



Brains, Minds & Machines Summer School 2015


Center for Brains, Minds and Machines (CBMM) Youtube Channel


The Center for Brains, Minds and Machines (CBMM) is supported by the National Science Foundation (NSF) under a Science and Technology Centers (STC): Integrative Partnerships award, Grant No. CCF-1231216.


Prof. Surya Ganguli - The statistical physics of deep learning

The statistical physics of deep learning: on infant category learning, dynamic criticality, random landscapes, and the reversal of time


Prof. Lorenzo Rosasco (part 1) - Machine Learning: A basic toolkit


Prof. Lorenzo Rosasco (part 2) - Machine Learning: A basic toolkit


Prof. Lorenzo Rosasco (part 3) - Machine Learning: A basic toolkit



XLDBConf 2015

The Extremely Large Databases (XLDB) conference focuses on practical issues related to extremely large (petascale) databases.

XLDB2015: On the Practice of Predictive Modeling with Big Data

XLDB2015: Statistical Tools and Machine Learning (Panel Discussion)

XLDB2015: Accelerating Deep Learning at Facebook


JuliaCon 2015

Julia is a high-level, high-performance dynamic programming language for technical computing, with syntax that is familiar to users of other technical computing environments. It provides a sophisticated compiler, distributed parallel execution, numerical accuracy, and an extensive mathematical function library. The library, largely written in Julia itself, also integrates mature, best-of-breed C and Fortran libraries for linear algebra, random number generation, signal processing, and string processing. In addition, the Julia developer community is contributing a number of external packages through Julia's built-in package manager at a rapid pace.



Julia Cheat-sheet by Steven G. Johnson, MIT


Steven G. Johnson Homepage, Teaching, Exercises & Solutions (18.335: Introduction to Numerical Methods (Fall 2008, 2009, 2010, 2011, 2012, Spring 2015))

Research Interests

The influence of complex geometries (particularly in the nanoscale) on solutions of partial differential equations, especially for wave phenomena and electromagnetism — analytical theory, numerics, and design of devices and phenomena. (See, for example, photonic crystals.)

High-performance computation, such as fast Fourier transforms, solvers for numerical electromagnetism, and large-scale optimization.



Sahand Negahban: Individualized rank aggregation using nuclear norm regularization

In recent years rank aggregation has received significant attention from the machine learning community. The goal of such a problem is to combine the (partially revealed) preferences over objects of a large population into a single, relatively consistent ordering of those objects. However, in many cases, we might not want a single ranking and instead opt for individual rankings. We study a version of the problem known as collaborative ranking. In this problem we assume that individual users provide us with pairwise preferences (for example, purchasing one item over another). From those preferences we wish to obtain rankings on items that the users have not had an opportunity to explore. The results here have a very interesting connection to the standard matrix completion problem. We provide a theoretical justification for a nuclear norm regularized optimization procedure, and provide high-dimensional scaling results that show how the error in estimating user preferences behaves as the number of observations increases.
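
The nuclear-norm idea behind this work can be sketched on the related matrix completion problem the abstract mentions: recover a low-rank preference matrix from partial observations by iterative SVD soft-thresholding. Everything below is a synthetic simplification (the talk's actual estimator works on pairwise comparisons, not raw ratings), but the low-rank recovery machinery is the same:

```python
import numpy as np

# Soft-impute-style matrix completion: alternate between shrinking the
# singular values (which penalizes the nuclear norm) and re-imposing the
# observed entries. Sizes, rank, and the threshold 0.5 are illustrative.
rng = np.random.default_rng(0)
M_true = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 30))  # rank-2 "preferences"
mask = rng.random(M_true.shape) < 0.5                         # observed entries

X = np.where(mask, M_true, 0.0)
for _ in range(100):
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    X = U @ np.diag(np.maximum(s - 0.5, 0.0)) @ Vt  # shrink singular values
    X[mask] = M_true[mask]                          # keep the observations

err = np.abs((X - M_true)[~mask]).mean()            # error on unseen entries
baseline = np.abs(M_true[~mask]).mean()             # error of predicting zero
```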

Recorded during the "Meeting in mathematical statistics: new procedures for new data" on December 16, 2014, at the Centre International de Rencontres Mathématiques (Marseille, France)


Sahand Negahban Homepage

I am currently an Assistant Professor in the Statistics Department at Yale University. Prior to that I worked with Prof. Devavrat Shah at MIT as a postdoc and Prof. Martin J. Wainwright at UC Berkeley as a graduate student.

Research Focus

The focus of my research is to develop theoretically sound methods, which are both computationally and statistically efficient, for extracting information from large datasets. A salient feature of my work has been to understand how hidden low-complexity structure in large datasets can be used to develop computationally and statistically efficient methods for extracting meaningful information for high-dimensional estimation problems. My work borrows from and improves upon tools of statistical signal processing, machine learning, probability and convex optimization.


A Unified Framework for High-Dimensional Analysis of M-Estimators with Decomposable Regularizers by Sahand N. Negahban, Pradeep Ravikumar, Martin J. Wainwright and Bin Yu

Abstract. High-dimensional statistical inference deals with models in which the number of parameters p is comparable to or larger than the sample size n. Since it is usually impossible to obtain consistent procedures unless p/n → 0, a line of recent work has studied models with various types of low-dimensional structure, including sparse vectors, sparse and structured matrices, low-rank matrices, and combinations thereof. In such settings, a general approach to estimation is to solve a regularized optimization problem, which combines a loss function measuring how well the model fits the data with some regularization function that encourages the assumed structure. This paper provides a unified framework for establishing consistency and convergence rates for such regularized M-estimators under high-dimensional scaling. We state one main theorem and show how it can be used to re-derive some existing results, and also to obtain a number of new results on consistency and convergence rates, in both ℓ2-error and related norms. Our analysis also identifies two key properties of loss and regularization functions, referred to as restricted strong convexity and decomposability, that ensure corresponding regularized M-estimators have fast convergence rates and which are optimal in many well-studied cases.
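
The paper's template is to minimize a loss plus a structure-encouraging regularizer, and its best-known instance is the Lasso (squared loss plus an ℓ1 penalty). A small sketch of that instance, with illustrative sizes, noise level, and regularization strength not taken from the paper:

```python
import numpy as np
from sklearn.linear_model import Lasso

# High-dimensional regime: more parameters than samples (p > n), but the
# true coefficient vector is sparse, so the decomposable l1 regularizer
# recovers the low-dimensional structure the abstract describes.
rng = np.random.default_rng(0)
n, p = 100, 200
theta_true = np.zeros(p)
theta_true[:5] = 2.0                               # 5-sparse signal
X = rng.normal(size=(n, p))
y = X @ theta_true + 0.1 * rng.normal(size=n)      # noisy observations

theta_hat = Lasso(alpha=0.1).fit(X, y).coef_       # regularized M-estimator
support = np.flatnonzero(np.abs(theta_hat) > 0.5)  # recovered support
```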


CIRM's Audiovisual Mathematics Library

Find this video and other talks given by worldwide mathematicians on CIRM's Audiovisual Mathematics Library.

And discover all its functionalities:

- Chapter markers and keywords to watch the parts of your choice in the video

- Videos enriched with abstracts, bibliographies, Mathematics Subject Classification

- Multi-criteria search by author, title, tags, mathematical area



Artificial Intelligence in Korea by Jiwon Kim (Huge list of ML resources both in Korean & English, Thanks!)


PyCon Australia 2015, 99 Videos (uploaded yesterday)

PyCon Australia is the national conference for users of the Python programming language


Sebastian Raschka Post on Kaggle: Neural Network in vanilla Python/NumPy for Digit Recognizer

Hi, if someone is interested, I wanted to post my solution here. It's a Multi-layer Perceptron (feedforward Nnet with backprop, minibatch learning, adaptive learning rate with momentum, and regularization). I only got ~95%, but it was also the first pass with some naive parameter choices -- more like an experiment ;)

I also attached the IPython notebook if you'd like to run it. I thought it's probably better to post it here in the forum rather than in the Script section since I can add additional comments and images.
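
A stripped-down version of the ingredients the post lists (a feedforward net trained by backprop with minibatch learning; momentum, adaptive learning rate, and regularization omitted for brevity) on a toy nonlinear problem. This is an independent sketch, not the post's notebook:

```python
import numpy as np

# One-hidden-layer MLP on an XOR-like problem that no linear model can
# solve: the label is the sign of x1 * x2. Forward pass, backprop, and
# plain minibatch gradient descent, all in NumPy.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(1000, 2))
y = ((X[:, 0] * X[:, 1]) > 0).astype(int)

W1 = rng.normal(0, 0.5, size=(2, 16)); b1 = np.zeros(16)
W2 = rng.normal(0, 0.5, size=(16, 1)); b2 = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr, batch = 0.5, 50
for epoch in range(200):
    perm = rng.permutation(len(X))
    for i in range(0, len(X), batch):
        idx = perm[i:i + batch]
        xb, yb = X[idx], y[idx, None]
        h = np.tanh(xb @ W1 + b1)               # forward: hidden layer
        out = sigmoid(h @ W2 + b2)              # forward: output layer
        d_out = out - yb                        # backprop: cross-entropy grad
        d_h = (d_out @ W2.T) * (1 - h ** 2)     # backprop through tanh
        W2 -= lr * h.T @ d_out / len(idx); b2 -= lr * d_out.mean(axis=0)
        W1 -= lr * xb.T @ d_h / len(idx); b1 -= lr * d_h.mean(axis=0)

h = np.tanh(X @ W1 + b1)
accuracy = ((sigmoid(h @ W2 + b2) > 0.5).astype(int).ravel() == y).mean()
```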



EuroPython Conference 2015 Videos (Just posted online)


The official YouTube Channel of the EuroPython conferences. Current edition: EuroPython 2015.

EuroPython is the official European conference for the Python programming language.


Valerio Maggio - Machine Learning Under Test

One point usually underestimated or omitted when dealing with machine learning algorithms is how to write *good quality* code. The obvious way to face this issue is to apply automated testing, which aims at implementing (likely) less-buggy and higher quality code.


However, testing machine learning code introduces additional concerns that have to be considered. On the one hand, some constraints are imposed by the domain and by the risks intrinsic to machine learning methods, such as handling unstable data or avoiding under-/overfitting. On the other hand, testing scientific code requires additional testing tools (e.g., `numpy.testing`), specifically suited to handling numerical data.

In this talk, some of the most famous machine learning techniques will be discussed and analysed from the `testing` point of view, emphasizing that testing also allows for a better understanding of how the whole learning model works under the hood.

The talk is intended for an *intermediate* audience. The content is intended to be mostly practical and code-oriented, so good proficiency with the Python language is **required**. Conversely, **no prior knowledge** about testing or machine learning algorithms is necessary to attend this talk.
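
One concrete idiom from the `numpy.testing` toolbox the abstract mentions: numerical code should be checked with tolerance-aware assertions, never exact equality on floats. The function under test here is a hypothetical example, not code from the talk:

```python
import numpy as np
import numpy.testing as npt

def standardize(X):
    """Scale columns to zero mean and unit variance (common preprocessing)."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def test_standardize():
    X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
    Z = standardize(X)
    # Tolerance-aware checks: floats are never compared with ==
    npt.assert_allclose(Z.mean(axis=0), 0.0, atol=1e-12)
    npt.assert_allclose(Z.std(axis=0), 1.0, atol=1e-12)

test_standardize()  # passes silently, as it would under a pytest run
```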


The leaking battery, A privacy analysis of the HTML5 Battery Status API by Lukasz Olejnik, Gunes Acar, Claude Castelluccia, and Claudia Diaz

Abstract. We highlight the privacy risks associated with the HTML5 Battery Status API. We put special focus on its implementation in the Firefox browser. Our study shows that websites can discover the capacity of users' batteries by exploiting the high-precision readouts provided by Firefox on Linux. The capacity of the battery, as well as its level, expose a fingerprintable surface that can be used to track web users in short time intervals. Our analysis shows that the risk is much higher for old or used batteries with reduced capacities, as the battery capacity may potentially serve as a tracking identifier. The fingerprintable surface of the API could be drastically reduced without any loss in the API's functionality by reducing the precision of the readings. We propose minor modifications to the Battery Status API and its implementation in the Firefox browser to address the privacy issues presented in the study. Our bug report for Firefox was accepted and a fix is deployed.
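
A back-of-the-envelope illustration (not the paper's code) of the proposed fix: a high-precision battery level is one of millions of distinguishable values, while a level rounded to two decimals can take at most 101 values, which shrinks the fingerprinting surface accordingly.

```python
# 1000 users whose battery levels differ only at the seventh decimal place:
# with full double precision each one is distinguishable, but rounding the
# readout to two decimals collapses them into a single indistinct value.
precise_levels = {0.93 + i * 1e-7 for i in range(1000)}   # high-precision readouts
rounded_levels = {round(v, 2) for v in precise_levels}    # coarse readout
distinct_before = len(precise_levels)   # 1000 distinguishable values
distinct_after = len(rounded_levels)    # all collapse to 0.93
```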




An account for Deep Learning-related news, papers, software, and reading materials, as well as other machine learning news and facts


2nd Multidisciplinary Conference on Reinforcement Learning and Decision Making (RLDM), Edmonton 2015

Over the last few decades, reinforcement learning and decision making have been the focus of an incredible wealth of research spanning a wide variety of fields including psychology, artificial intelligence, machine learning, operations research, control theory, animal and human neuroscience, economics and ethology. Key to many developments in the field has been interdisciplinary sharing of ideas and findings, yet there has been no single conference that brings all these communities together. The idea of RLDM is to become that conference.

The focus of the new meeting can be broadly construed as “decision making over time to achieve a goal”. Our aim is to create a recurring meeting characterized by the multidisciplinarity of the presenters and attendees, with cross-disciplinary conversations and teaching and learning being central objectives along with the dissemination of novel theoretical and experimental results.


Machine Learning at Quora by Xavier Amatriain

At Quora our mission is to share and grow the world’s knowledge. In order to accomplish this we need to build a complex ecosystem where we value issues such as content quality, engagement, demand, interests, or reputation. On the other hand, the ecosystem itself generates lots of very high-quality data on which to build machine learning solutions that can help address all of our requirements. In this talk I will describe uses of machine learning at Quora that range from different recommendation approaches, such as personalized ranking, to classifiers built to detect duplicate questions or spam. I will describe some of the modeling and feature engineering approaches that go into building these systems. I will also share some of the challenges faced when building such a large-scale base of human-generated knowledge.
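The duplicate-question task mentioned above can be caricatured in a few lines. This is not Quora's classifier, just a hypothetical baseline assuming scikit-learn is available: TF-IDF vectors compared with cosine similarity against a hand-picked threshold.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Candidate questions; the first two are near-duplicates.
questions = [
    "How do I learn machine learning?",
    "What is the best way to learn machine learning?",
    "What is the capital of France?",
]
vectors = TfidfVectorizer().fit_transform(questions)
sims = cosine_similarity(vectors)

def is_duplicate(i, j, threshold=0.3):
    """Flag a pair as duplicates when their TF-IDF cosine similarity
    clears a hand-picked threshold."""
    return bool(sims[i, j] >= threshold)

print(is_duplicate(0, 1), is_duplicate(0, 2))
```

A production system would add feature engineering on top of (or instead of) raw lexical overlap, which is exactly the kind of modeling work the talk describes.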


Scaling Instagram with Mike Krieger

In the three years since Instagram joined forces with Facebook, some things have stayed the same and some have changed, including the rapid development of Instagram's infrastructure. In this talk Mike Krieger, co-founder of Instagram, will share what he's learned on this journey. He’ll also showcase some upcoming developments in the machine learning and recommendation algorithms that power Instagram's search and discovery tools.


Innovation at Netflix by Carlos Gomez-Uribe, VP of Product Innovation at Netflix

Carlos Gomez-Uribe took the stage at OpenAir to describe the innovation process Netflix uses to improve its product, particularly the recommendation and search algorithms that help members find movies and TV shows to watch. Netflix's recommendation and search algorithms are key to retaining members, so improving them has been a top priority for over a decade. The innovation process, at its core, is an application of the scientific method, relying heavily on experimentation, very large data sets, carefully chosen metrics, and advanced statistical and mathematical models. Carlos discusses the challenges and limitations of this innovation process, and the remarkable evolution in Netflix's recommendation and search algorithms that it has enabled.


Machine Learning as the Key to Personalized Curation - Kamelia Aryafar, Data Scientist, Etsy

Personalization is becoming increasingly important in creating a curated e-commerce experience for consumers. Machine learning methods are the cornerstone of the recommendation engines used to personalize browsing and shopping in an online marketplace. Etsy is an online marketplace for artisans selling unique handcrafted goods and vintage wares that can’t be found elsewhere. Kamelia has spent the last two years building recommendations at Etsy. She’ll discuss the machine learning models behind the user, shop and listing recommendations that create a personalized experience for Etsy users.


Much more on Airbnb Youtube Channel


Eyeo Festival 2015 Talk, Workshop, etc.

Since its inception in 2011, the team behind the Eyeo Festival has been inspired by the notion that this decade presents an exceptionally exciting time to be interested in art, interaction, and information. The way we experience all three is changing. The way all three interact and overlap is quickly evolving. Easier access to powerful tools and technologies continues to increase. What data is, where it comes from, and how we utilize it, looks different than ever before.

What can we do with it all? What can’t we do? Artists, designers and coders build and bend technology to see what’s possible. What’s next with interaction, what’s revealed by the data. Eyeo brings together the most intriguing and exciting people in these arenas today.


Chris Sugrue

What stories can be created with light, code and a bit of playfulness? The focus of my work over the past years has been on playing with light, movement, and clever bits of code to create unique poetic experiences. From bugs that crawl out of the computer screen, to uncanny hand distortions, to animated optical illusions, we will journey though these endeavors, the tools used in their creation, and some of my inspirations.

Chris Sugrue is an artist, designer and programmer. She develops creative digital works including interactive installations, audiovisual performances and algorithmic animations.


Amanda Cox

One of the first rules of journalism is: don’t make anything up. Using examples from the New York Times graphics department, I want to claim that, for some types of data problems, it’s better if we do.



Simply Statistics Blog

We are three biostatistics professors (Jeff Leek, Roger Peng, and Rafa Irizarry) who are fired up about the new era where data are abundant and statisticians are scientists. The views represented here are our own and do not represent the views of Johns Hopkins University, Harvard University or the Dana-Farber Cancer Institute.

About this blog: We’ll be posting ideas we find interesting, contributing to discussion of science/popular writing, linking to articles that inspire us, and sharing advice with up-and-coming statisticians.

Why “Simply Statistics”: We needed a title. Plus, we like the idea of using simple statistics to solve real, important problems. We aren’t fans of unnecessary complication -- that just leads to lies, damn lies and something else.


Model-based Machine Learning by John Winn and Christopher Bishop with Thomas Diethe (Early access)

This is an early access version of the book, so that we can get feedback on the book as we write it. The completed book will have an additional four chapters, along with source code for every chapter. Please do send feedback, since it will help us to shape and improve the book!

How can machine learning solve my problem?

During the last few years, the field of machine learning has moved to centre stage in the world of technology. Today thousands of scientists and engineers are applying machine learning to an extraordinarily broad range of domains. However, making effective use of machine learning in practice can be daunting, especially for newcomers to the field. Here are some of the principal challenges encountered when trying to solve real-world problems using machine learning:

“I am overwhelmed by the choice of machine learning methods and techniques. There’s too much to learn!”

“I don’t know which algorithm to use or why one would be better than another for my problem.”

“My problem doesn’t seem to fit with any standard algorithm.”

Machine learning can seem daunting to newcomers.

In this book we look at machine learning from a fresh perspective which we call model-based machine learning. This viewpoint helps to address all of these challenges, and makes the process of creating effective machine learning solutions much more systematic. It is applicable to the full spectrum of machine learning techniques and application domains, and will help guide you towards building successful machine learning solutions without requiring that you master the huge literature on machine learning.


CS224d: Deep Learning for Natural Language Processing Lecture notes and Videos by Richard Socher, Stanford University, Spring 2015

Natural language processing (NLP) is one of the most important technologies of the information age. Understanding complex language utterances is also a crucial part of artificial intelligence. Applications of NLP are everywhere because people communicate almost everything in language: web search, advertisement, emails, customer service, language translation, radiology reports, etc. There are a large variety of underlying tasks and machine learning models powering NLP applications. Recently, deep learning approaches have obtained very high performance across many different NLP tasks. These models can often be trained with a single end-to-end model and do not require traditional, task-specific feature engineering. In this spring quarter course students will learn to implement, train, debug, visualize and invent their own neural network models. The course provides a deep excursion into cutting-edge research in deep learning applied to NLP. The final project will involve training a complex recurrent neural network and applying it to a large-scale NLP problem. On the model side we will cover word vector representations, window-based neural networks, recurrent neural networks, long short-term memory models, recursive neural networks, convolutional neural networks, as well as some very novel models involving a memory component. Through lectures and programming assignments students will learn the necessary engineering tricks for making neural networks work on practical problems.

Previous Years Project Reports



Statistical Machine Learning Videos, Assignments & Solutions by Ryan Tibshirani and Larry Wasserman, CMU, Spring 2015

Statistical Machine Learning is a second graduate level course in advanced machine learning, assuming students have taken Machine Learning (10-715) and Intermediate Statistics (36-705). The term “statistical” in the title reflects the emphasis on statistical theory and methodology.

The course combines methodology with theoretical foundations and computational aspects. It treats both the “art” of designing good learning algorithms and the “science” of analyzing an algorithm’s statistical properties and performance guarantees. Theorems are presented together with practical aspects of methodology and intuition to help students develop tools for selecting appropriate methods and approaches to problems in their own research.

The course includes topics in statistical theory that are important for researchers in machine learning, including nonparametric theory, consistency, minimax estimation, and concentration of measure.


A Visual Introduction to Machine Learning by Stephanie Yee and Tony Chu (Great visualization! Congratulations!)

Stephanie interprets R²

Stephanie is a business person with statistical inclinations. She is finishing up a master's degree in statistics at Stanford and was the first business hire at Sift Science. Despite time at Google, her benchmarks for work-life balance are strategy consulting and private equity.

Tony visualizes with D3

Tony is a designer who loves data visualizations and information design. He is currently the lead designer at Sift Science. Prior to Sift Science, he tried to change congress with a fancy infographic, and earned an MFA in Interaction Design at the School of Visual Arts in New York City.


Tony Chu's Master Thesis



New PDF and ePub Starter Kit! (more than 300 pages)

I'm still working on the Starter Kit, but most links should work fine. I hope you will enjoy the Starter Kit and I'm looking forward to your feedback.




Kristen Grauman's Homepage (Publications, Code, Student theses, assignments with solutions, etc.)

I am an Associate Professor in the Department of Computer Science at the University of Texas at Austin, where I lead the UT-Austin Computer Vision Group.  I received my Ph.D. from MIT in the Computer Science and Artificial Intelligence Laboratory in 2006.

My research interests are in computer vision and machine learning.  In general, the goal of computer vision is to develop the algorithms and representations that will allow a computer to autonomously analyze visual information.   I am especially interested in learning and recognizing visual object categories, and scalable methods for content-based retrieval and visual search.

Large amounts of interconnected visual data (images, videos) are readily available---but we don’t yet have the tools to easily access and analyze them.  My group’s research aims to remove this disparity, and transform how we retrieve and evaluate visual information.  This requires robust methods to recognize objects, actions, and scenes, and to automatically organize and search images and videos based on their content.  Key research issues that we are exploring are scalable search for meaningful similarity metrics, unsupervised visual discovery, and cooperative learning between machine and human vision systems.


SFrame and SGraph: Scalable External Memory Data Frame and Graph Structures for Machine Learning by Jay Gu

BIDS Guest Lecture | July 22, 2015 | 1:30-3:00 p.m. | 190 Doe Library, UC Berkeley

Speaker: Jay Gu, Co-Founder and Software Engineer, Dato

A good machine learning platform requires not just robust implementations of statistical models and algorithms but also the right data structures for efficient and scalable feature engineering and data cleaning. In this talk, we discuss SFrame and SGraph, two scalable data structures designed with machine learning tasks in mind. These external memory structures make efficient use of disks and utilize a whole bag of tricks for speed. On a single machine, SFrame supports real-time interactive query on terabytes of data. When used in a distributed setting, SGraph supports iterative graph analytics tasks at unparalleled speed. On a graph with 100 billion edges, SGraph computes PageRank at 30 seconds per iteration with only 16 EC2 machines. We walk through the architectural design and discuss tricks for scale and speed. SFrame and SGraph are the backbone of a new Python machine learning platform called GraphLab Create. Both are available for download as open source projects or as part of the GraphLab Create binary.
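For readers unfamiliar with the workload, the PageRank computation that SGraph distributes is, per iteration, a single pass over the edge list. A toy in-memory sketch in plain Python (no external-memory tricks; dangling nodes are ignored for simplicity):

```python
def pagerank(edges, num_nodes, damping=0.85, iters=20):
    """Power-iteration PageRank on an edge list. Each iteration is one
    pass over the edges; nodes with no out-edges are not handled, so
    total rank is only conserved when there are none."""
    out_degree = [0] * num_nodes
    for src, _ in edges:
        out_degree[src] += 1
    rank = [1.0 / num_nodes] * num_nodes
    for _ in range(iters):
        # Base rank from random jumps, plus rank flowing along edges.
        new_rank = [(1.0 - damping) / num_nodes] * num_nodes
        for src, dst in edges:
            new_rank[dst] += damping * rank[src] / out_degree[src]
        rank = new_rank
    return rank

# A 3-node cycle: by symmetry every node ends up with rank 1/3.
ranks = pagerank([(0, 1), (1, 2), (2, 0)], num_nodes=3)
print([round(r, 3) for r in ranks])  # [0.333, 0.333, 0.333]
```

SGraph's contribution is running exactly this kind of edge pass out-of-core and across machines at the 100-billion-edge scale quoted above.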


Official YouTube account from the Centre for Intelligent Sensing, Queen Mary University of London


Centre for Intelligent Sensing, Queen Mary University of London

The Centre for Intelligent Sensing is a focal point for research in Intelligent Sensing at Queen Mary University of London. The Centre focuses on breakthrough innovations in computational intelligence that are expected to have a major impact in transforming human and machine utilisation of multiple sensor inputs for interpretation and decision making.

The Centre facilitates sharing of resources, exchange of ideas and results among researchers in the areas of theory and application of signal acquisition, processing, communication, abstraction, control and visualisation.

The expertise in the Centre includes camera and sensor networks, image and signal processing, computer vision, pattern recognition and learning, coding, 3D imaging, reconstruction and rendering, 3D graphics, bio-inspired computing, human-computer interaction, face and gesture recognition, affective computing and social signal processing, and data mining.

The centre provides post-graduate research and teaching in Intelligent Sensing, and is responsible for the MSc programme in Computer Vision.


IETF 93 - Thursday Lunch Presentation - Recent Advances in Machine Learning by David Meyer (Video starts at 6:15)

Topic: Recent Advances in Machine Learning and Their Application to Networking

Speaker:David Meyer, CTO and Chief Scientist at Brocade Communications

The recent progress of Machine Learning, and Deep Learning in particular, has been nothing short of spectacular. Machine learning applications now span a wide variety of application spaces, including perceptual tasks such as image search, object and scene recognition and captioning, voice and natural language recognition and generation, self-driving cars and automated assistants such as Siri, as well as various engineering, financial, medical and scientific applications.

More generally, cutting-edge startups, established technology companies and Universities are increasingly finding new, novel, and exciting ways to apply powerful machine learning tools such as neural networks to new and existing problems in many different industries.

The network domain, however, has been virtually untouched by all of this activity. This talk outlines recent advances in Machine Learning with an eye towards network applications. In addition, it outlines a few "Machine Learning for DevOps" applications focused on next-generation network automation.


Identifying the needle in the Internet of Things haystack | Richard Skeggs | TEDxUniversityofEssex

This talk highlights why the Internet should be viewed as a utility for all. The advent of the Internet of Things (IoT) is slowly opening up a can of worms. How can we best process the sheer volume of data generated by millions of devices connected to the internet? In a highly connected world, is the computer itself best placed to analyze this vast quantity of machine-generated data? Along with infographics and data science, machine learning has the potential to identify the hidden needle in the IoT haystack. So what can machine learning bring to the table, and how can it be utilized to monetize the potential of IoT?

Richard Skeggs is ‎ESRC Business Data Development Manager



Jake VanderPlas Homepage


After completing my undergraduate studies in Physics at Calvin College, I took a few years off before returning to academia. In those years I lived in Sendai, Japan, where I taught English at a non-profit student center, then returned to the US and spent two years as an outdoor educator with Mount Hermon Outdoor Science School (just outside Santa Cruz, CA) and Summit Adventure (just outside Yosemite). Those two years working and teaching outdoors among the Redwoods, Sequoias, granite peaks, and dark skies rekindled my interest in science in general, and Astronomy in particular. I began my doctoral studies at the UW in 2006. When not working on my research, I enjoy growing food in my garden, training and competing in local triathlons, and playing bluegrass mandolin.

Research Interests

Machine Learning/Open Source Software

In 2008 I began exploring a nonlinear dimensionality reduction technique called Locally Linear Embedding (LLE). LLE has been broadly studied in various fields relating to computer perception, and we showed that it is useful in processing galaxy spectra. In particular, the dimensionality reduction is sensitive to nonlinear effects that are lost by more familiar techniques such as PCA. I ended up submitting a version of this code to the open source scikit-learn project, and have remained involved. I've since invested a lot of energy into other open source Python projects in the realm of machine learning, data mining, and visualization, and have co-authored a textbook on these subjects, focusing on applications to astronomy and astrophysics.
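The LLE code he contributed lives on in scikit-learn as `sklearn.manifold.LocallyLinearEmbedding`. A minimal usage sketch on scikit-learn's built-in swiss-roll dataset (the parameter values here are illustrative, not taken from his galaxy-spectra work):

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

# A 3-D "swiss roll": points on a curled 2-D sheet, a nonlinear
# structure that linear methods such as PCA cannot unroll.
X, _ = make_swiss_roll(n_samples=500, random_state=0)

# LLE preserves each point's local neighbourhood geometry while
# flattening the sheet into 2 dimensions.
lle = LocallyLinearEmbedding(n_neighbors=12, n_components=2)
X_2d = lle.fit_transform(X)
print(X_2d.shape)  # (500, 2)
```

The number of neighbours controls how "local" the preserved geometry is; too few disconnects the manifold, too many reintroduces the global curvature PCA would see.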


Jake VanderPlas Videos


PyData Seattle 2015 Scikit-Learn Tutorial by Jake VanderPlas

The following links are to notebooks containing the tutorial materials. Note that many of these require files that are in the directory structure of the github repository in which they are contained. There is not time during the tutorial to cover all of this material, but I left it in, in case attendees would like to go deeper on their own.

1. Preliminaries


2. Introduction to Machine Learning with Scikit-Learn



3. Supervised Learning In-Depth



4. Unsupervised Learning In-Depth




5. Model Validation In-Depth



CERN Open Data Portal



International Society for Bayesian Analysis (ISBA)

Welcome to ISBA!

The International Society for Bayesian Analysis (ISBA) was founded in 1992 to promote the development and application of Bayesian analysis in the solution of theoretical and applied problems in science, industry and government. By sponsoring and organizing meetings, publishing Bayesian Analysis, the electronic journal of Bayesian statistics, and through other activities, ISBA provides a focal point for those interested in Bayesian analysis and its applications. For an interesting historical perspective, Arnold Zellner recaps the beginnings of ISBA.



Emily B. Fox Lectures, University of Washington

Amazon Professor of Machine Learning

Assistant Professor in Statistics

Adjunct Assistant Professor in Computer Science & Engineering

Adjunct Assistant Professor in Electrical Engineering

Data Science Fellow of the eScience Institute

Co-director of the MODE Lab with Carlos Guestrin and Jeff Bilmes


Bayesian Dynamic Modeling: Sharing Information Across Time and Space by Emily B. Fox

This talk will highlight some of the benefits and challenges associated with harnessing the temporal structure present in many datasets. The focus is on Bayesian dynamic modeling approaches, and in particular the idea of sharing information across time and "space," where space generically refers to the dimensions of the time series. Emily Fox, UW Assistant Professor of Statistics, discusses how to exploit nonparametric and hierarchical models to capture repeated patterns in time and similar structure in space, enabling the modeling of complex and high-dimensional time series. Applications of such approaches are quite diverse, and she demonstrates this by touching upon work on the tasks of speaker diarization, analyzing human motion, detecting changes in volatility of stock indices, parsing EEG, word classification from MEG, and predicting rates of violent crime in DC and influenza rates in the US.


Emily B. Fox Lecture at the IHME seminar - October 9, 2013

The University of Washington’s Amazon Assistant Professor of Machine Learning Emily Fox speaks in a seminar titled “Bayesian dynamic modeling: sharing information across time and space” at IHME on October 9, 2013.


Emily Fox - Gaussian Processes on the Brain


In this talk, we focus on a set of modeling challenges associated with Magnetoencephalography (MEG) recordings of brain activity: (i) the time series are high dimensional with long-range dependencies, (ii) the recordings are extremely noisy, and (iii) gathering multiple trials for a given stimulus is costly. Our goal then is to harness shared structure both within and between trials. Correlations between sensors arise based on spatial proximity, but also from coactivation patterns of brain activity that change with time. Capturing this unknown and changing correlation structure is crucial in effectively sharing information.

Motivated by the structure of our high-dimensional time series, we propose a Bayesian nonparametric dynamic latent factor model based on a sparse combination of Gaussian processes (GPs). Key to the formulation is a time-varying mapping from the lower dimensional embedding of the dynamics to the full observation space, thus capturing time-varying correlations between sensors.

Finally, we turn to another challenge: in addition to long-range dependencies, there are abrupt changes in the MEG signal. We propose a multiresolution GP that hierarchically couples GPs over a random nested partition. Long-range dependencies are captured by the top-level GP while the partition points define the abrupt changes in the time series. The inherent conjugacy of the GPs allows for efficient inference of the hierarchical partition, for which we employ graph-theoretic techniques.


UWTV, University of Washington


ICML 2015 Tutorial Slides

ICML 2015 will present 6 invited tutorials.  Note that these tutorials will take place in 3 sessions (1 in the morning and 2 in the afternoon).  During each session, 2 tutorials will be running in parallel.


BBC Click Youtube Channel + Iplayer (55 episodes, only available in the UK I guess)

Click visits Silicon Valley, the heart of tech innovation, to see Google's new AI app and meet the startups trying to compete. Plus VR's scariest game.


NASA Visible Earth Dataset

A catalog of NASA images and animations of our home planet



MIT Review Technology: Google's Deep Learning Machine Learns to Synthesize Real World Images

Give Google’s DeepStereo algorithm two images of a scene and it will synthesize a third image from a different point of view.


The Next Web (TNW), Machine Learning publications

Founded in 2006, The Next Web is one of the world’s largest online publications that delivers an international perspective on the latest news about Internet technology, business and culture. With an active, influential audience consisting of more than 7.2 million monthly visits and 9.5 million monthly page views, The Next Web continues to expand its global presence on its website with the addition of new channels and content partnerships, as well as through events in North America and Europe.


Enthought's Videos: SciPy 2015: Scientific Computing with Python Conference

SciPy 2015, the fourteenth annual Scientific Computing with Python conference, was held July 6-12, 2015 in Austin, Texas. SciPy is a community dedicated to the advancement of scientific computing through open source Python software for mathematics, science, and engineering. The annual SciPy Conference allows participants from all types of organizations to showcase their latest projects, learn from skilled users and developers, and collaborate on code development.



Kaggle's Academic Machine Learning Competitions

Kaggle hosts free projects for hundreds of universities around the globe. Engage students with an opportunity to apply machine learning to real problems.


2015 Hadoop Summit Videos



Human learning vs. Machine learning | Josep Marc Mingot | TEDxYouth@Barcelona (in Spanish)

Machines are now learning many different things very quickly, from self-driving cars to automatic medical diagnosis. This unstoppable progress has ever more ambitious goals, and machines seem to be getting closer to human intelligence. Will a machine ever replace us in our jobs? What are the intrinsically human capabilities that (for now) make us more valuable?

Josep Marc is the co-founder of Arcvi, an analytic and strategic consulting firm focused on Big Data and Analytics projects. He is also co-founder of BCNAnalytics, a non-profit group of like-minded individuals that aim to turn Barcelona into a European hub for analytics and the co-founder of, an online community around investment and innovation in the banking industry.

Josep Marc worked at MIT (Boston, USA) as a research assistant in the Computer Vision Lab at the Computer Science and Artificial Intelligence Laboratory. He was also a member of the MIT Global Startup Initiative in Colombia, working as an organizer and teacher, whose aim is to promote entrepreneurship in developing countries.

Josep Marc has a Master’s in both Mathematics and Telecommunications Engineering from the UPC with the CFIS program. During his studies, he also co-founded Academia Bruc and worked as a teacher’s assistant at one of the world’s most prestigious business schools, ESADE.


Download Terabyte Click Logs by Criteo Labs


PURDUE Machine Learning Summer School 2011

Founded in 1869, Purdue is Indiana's land-grant university. It is one of the nation's premier institutions with more than 200 areas of undergraduate study and renowned research initiatives. Purdue's programs in a wide variety of undergraduate and graduate disciplines consistently rank among the best in the country. Twenty-two of America's astronauts hold Purdue degrees. Students from all 50 states and more than 130 countries bring rich diversity to the main campus in West Lafayette. Although a large university, Purdue maintains an intimate atmosphere that highly values individual needs and achievements.


MarI/O: Machine Learning for Video Games by SethBling


MarI/O Followup: Super Mario Bros, Donut Plains 4, and Yoshi's Island 1 by SethBling



An Infinite Restricted Boltzmann Machine by Marc-Alexandre Côté, Hugo Larochelle

We present a mathematical construction for the restricted Boltzmann machine (RBM) that doesn't require specifying the number of hidden units. In fact, the hidden layer size is adaptive and can grow during training. This is obtained by first extending the RBM to be sensitive to the ordering of its hidden units. Then, thanks to a carefully chosen definition of the energy function, we show that the limit of infinitely many hidden units is well defined. As with the RBM, approximate maximum likelihood training can be performed, resulting in an algorithm that naturally and adaptively adds trained hidden units during learning. We empirically study the behaviour of this infinite RBM, showing that its performance is competitive with that of the RBM, while not requiring tuning of the hidden layer size.
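For context, the baseline the paper extends is the standard finite RBM trained with contrastive divergence. A bare-bones CD-1 sketch with NumPy (a toy illustration of that baseline, not the infinite-RBM construction, whose hidden layer grows during training):

```python
import numpy as np

rng = np.random.default_rng(0)

# A minimal *finite* RBM: n_hidden is fixed up front, which is
# exactly the choice the infinite RBM removes.
n_visible, n_hidden, lr = 6, 4, 0.1
W = rng.normal(0.0, 0.01, (n_visible, n_hidden))  # weights
b_v = np.zeros(n_visible)                         # visible biases
b_h = np.zeros(n_hidden)                          # hidden biases

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0):
    """One CD-1 parameter update on a batch of binary visible vectors."""
    global W, b_v, b_h
    p_h0 = sigmoid(v0 @ W + b_h)                        # hidden probabilities
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)  # sampled hidden states
    p_v1 = sigmoid(h0 @ W.T + b_v)                      # reconstruction
    p_h1 = sigmoid(p_v1 @ W + b_h)
    W += lr * (v0.T @ p_h0 - p_v1.T @ p_h1) / len(v0)
    b_v += lr * (v0 - p_v1).mean(axis=0)
    b_h += lr * (p_h0 - p_h1).mean(axis=0)

batch = rng.integers(0, 2, (8, n_visible)).astype(float)
for _ in range(100):
    cd1_step(batch)
print(W.shape)  # (6, 4)
```

In the paper's construction the energy function is modified so that hidden units are ordered, and new trained units are added adaptively during this same kind of learning loop.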


The Fourth Paradigm: Data-Intensive Scientific Discovery by Microsoft Research, 2009

Increasingly, scientific breakthroughs will be powered by advanced computing capabilities that help researchers manipulate and explore massive datasets.

The speed at which any given scientific discipline advances will depend on how well its researchers collaborate with one another, and with technologists, in areas of eScience such as databases, workflow management, visualization, and cloud computing technologies.

In The Fourth Paradigm: Data-Intensive Scientific Discovery, the collection of essays expands on the vision of pioneering computer scientist Jim Gray for a new, fourth paradigm of discovery based on data-intensive science and offers insights into how it can be fully realized.


Google Flu Trends

How does this work?

We've found that certain search terms are good indicators of flu activity. Google Flu Trends uses aggregated Google search data to estimate current flu activity around the world in near real-time.
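The idea can be sketched as a regression from search-term frequency to officially reported flu activity. The numbers below are invented for illustration; Google's actual model was reportedly a comparable linear fit (on transformed values) against historical CDC surveillance data.

```python
import numpy as np

# Invented toy numbers: weekly flu-related search frequency vs. the
# reported rate of influenza-like illness (ILI).
query_freq = np.array([0.8, 1.1, 1.9, 2.7, 3.4, 2.2, 1.3])  # searches per 1000
ili_rate = np.array([1.0, 1.4, 2.3, 3.3, 4.1, 2.7, 1.6])    # % of doctor visits

# Ordinary least squares fit: ili_rate ~ a * query_freq + b
A = np.vstack([query_freq, np.ones_like(query_freq)]).T
(a, b), *_ = np.linalg.lstsq(A, ili_rate, rcond=None)

# "Nowcast" the current ILI rate from today's search volume alone,
# ahead of the official surveillance reports.
print(round(a * 3.0 + b, 2))
```

The appeal is latency: search volumes are available in near real time, while surveillance reports lag by a week or more.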


Tech entrepreneurs are using innovation to tackle some of the world’s biggest challenges. We invest in teams with bold ideas that create lasting global impact.


Information Theory, Pattern Recognition and Neural Networks Videos by David MacKay, 2012 (I missed them)



Model-Based Machine Learning by John Winn and Christopher Bishop with Thomas Diethe

This is an early access version of the book, so that we can get feedback on the book as we write it. The completed book will have an additional four chapters, along with source code for every chapter. Please do send feedback, since it will help us to shape and improve the book!

What is model-based machine learning?

Over the last five decades, researchers have created literally thousands of machine learning algorithms. Traditionally, an engineer wanting to solve a problem using machine learning must choose one or more of these algorithms to try, or otherwise try to invent a new one. In practice, their choice of algorithm may be constrained by the algorithms they happen to be familiar with, or by the availability of specific software, and may not be the best choice for their problem.



Report: Learning from Machine Intelligence: The next wave of Digital Transformation by Orange Silicon Valley

It’s a very exciting time for technology. In just the past 5 years we’ve surrounded ourselves with connected mobile devices, moved our businesses to the cloud, and watched as every aspect of the content industry has been radically disrupted. Now, just as we’re starting to accept all these changes, another big shift is getting underway. Software is starting to become intelligent and autonomous.

Research into deep learning, statistical modeling, and neural networks has been advancing in the halls of academia for decades but suddenly there’s a Cambrian explosion of solutions impacting every sector. These tools finally have enough computation and enough data to really perform – and a large amount of talent and capital is pouring into the fast-moving sector...


Report: Who cares ?! Transforming the connection with customers by Orange Silicon Valley


Orange Silicon Valley Publications

These stories are about a changing world where legacy systems and incumbent organizations are continuously challenged by a new wave of innovators and entrepreneurs. At Orange Silicon Valley, we actively participate in the ongoing dialog between technology and humanity that is transforming the way we live, work, play, and communicate.



Introducing Letters

At Medium, we are building a new kind of media product — one centered around conversation and reaction, one where audience engagement is not an afterthought, but something fundamentally new and participatory at its core. The benefits of writing on Medium — into a thriving network, being connected to many others — have always been a big part of what we’re all about. But we’ve been missing an important ingredient, which we’re adding today: continuity.

We’re very excited to announce the launch of a new feature for Publications. We’re calling it Letters. We think it’s valuable, especially in today’s world of endless information streams, to remember the act of letter writing as an intimate form of communication.



Beyond Temporal Pooling: Recurrence and Temporal Convolutions for Gesture Recognition in Video by Lionel Pigou, Aäron van den Oord, Sander Dieleman, Mieke Van Herreweghe, Joni Dambre

Gesture Recognition using Convolutional Neural Networks, Ghent University by Lionel Pigou


Tech Notes: A Technical Blog by Dave Miller


AAAS, The American Association for the Advancement of Science

About AAAS

The American Association for the Advancement of Science is an international non-profit organization dedicated to advancing science for the benefit of all people.

AAAS Mission

The AAAS seeks to "advance science, engineering, and innovation throughout the world for the benefit of all people." To fulfill this mission, the AAAS Board has set the following broad goals:

  • Enhance communication among scientists, engineers, and the public;
  • Promote and defend the integrity of science and its use;
  • Strengthen support for the science and technology enterprise;
  • Provide a voice for science on societal issues;
  • Promote the responsible use of science in public policy;
  • Strengthen and diversify the science and technology workforce;
  • Foster education in science and technology for everyone;
  • Increase public engagement with science and technology; and
  • Advance international cooperation in science.

2015 Annual Meeting Videos


France Culture: Facebook, Intel, research centres in France, and then what? Will digital technology save the French economy? (For those who want to listen to some French and get some insight into the French point of view on the digital economy...)

The technological revolution under way is upending our economy, our practices, and our few certainties at top speed. While the government fine-tunes its digital strategy and drafts a bill that will be the big project of the autumn, and while the European Union draws the borders of its digital single market, digital-sector lobbies and entrepreneurs are selling us their industry as a remedy for the crisis, even as economists play down its virtues in a France starved of jobs.

Will digital technology really push back unemployment?

This week brought a double announcement from the sector, with job creation to boot: Facebook is opening an artificial intelligence laboratory in Paris, and Intel a Big Data research centre. A triumph for France? After the celebrations come the doubts: why was France chosen? Should we fear a brain drain, or our researchers' minds being put to work for the American Internet giants?


The Platform

The Platform is a new publication, which formally launched February 23, 2015, in partnership with The Register. It will offer in-depth coverage of high-end computing at large enterprises, supercomputing centers, hyperscale data centers, and public clouds.


SiliconANGLE Blog

About

SiliconANGLE was founded by Silicon Valley entrepreneur John Furrier to provide news and analysis on the technology industry, with a focus on innovation, emerging companies, the enterprise, cloud, mobile, social, startups, and venture capital.

Our focus since 2009 has been “where computer science intersects social science.”

SiliconANGLE, founded in 2009, offers real-time news with a fresh, optimistic perspective – "the Angles" – that can drive innovation and invention. Its mandate is to provide the highest-quality and fastest news, analysis, and information coverage of innovation, entrepreneurship, the tech athletes, and the latest inventions.


Wall Street Journal Blog


"The Machine Learning Salon Meetup" Project

I am a co-organiser of the London Machine Learning Meetup and have so far organised two meetups in London. The last one was hosted by UCL, thanks to Professor David Barber, and Yoshua Bengio was the guest (many thanks to him for coming). I don't earn any money from these events, and they are free of charge for attendees.

I'd like to create a new Meetup that will organise one or two events per year and invite highly recognised professors or researchers well known for their great achievements in Machine Learning. London is full of great iconic places, such as the Royal Institution (the Faraday theatre is amazing!), and as the "Salon" of my website refers to the French salons of the 17th to 19th centuries, that could be a nice alliance. I won't earn any money from it, and I want the event to be free of charge for attendees.

I'm looking for sponsors willing to fund the theatre booking and the catering. In exchange, their names will appear on the event screensaver, the Meetup website, and the event video.

Please contact me at if you're interested.


PS: I'd like to do the same in New York and Seoul, where my sons live. Sponsors for New York and Seoul are welcome too.




Notebook Gallery by Fabian ?  (Great resources, not to be missed!)

Links to the best IPython and Jupyter Notebooks.

What is this website ?

This website is a collection of links to IPython/Jupyter notebooks. Unlike other galleries (such as the one on nbviewer and the Wakari gallery), this collection is continuously updated with notebooks submitted by users. It also uses the Twitter API to fetch new notebooks daily. Please note that this website neither contains nor hosts any notebooks; it only offers links to relevant notebooks.

Why did you make this website ?

Have you seen the amazing things people are making with IPython/Jupyter notebooks? They will blow your mind! So I needed a place where I could find more of these amazing notebooks. For now it's a simple website that displays the latest and most-viewed notebooks, but in the future I would like it to have search and categorisation features.

If you enjoy this website, you can buy me a beer

Can I say something ?

Sure! I'd love to hear some feedback. If it's an issue with the website, feel free to open an issue here. You can also email me at


Shashi Gowda's GitHub

I dabble in code, design, and web development.

Right now, I am a GSoC student for The Julia Language.


Functional Geometry by Shashi Gowda




OpenDeep is a deep learning framework for Python built from the ground up in Theano with a focus on flexibility and ease of use for both industry data scientists and cutting-edge researchers. OpenDeep is a modular and easily extensible framework for constructing any neural network architecture to solve your problem.

Use OpenDeep to:

  • Quickly prototype complex networks through a focus on complete modularity and containers similar to Torch.
  • Configure and train existing state-of-the-art models.
  • Write your own models from scratch in Theano and plug into OpenDeep for easy training and dataset integration.
  • Use visualization and debugging tools to see exactly what is happening with your neural net architecture.

Plug into your existing Numpy/Scipy/Pandas/Scikit-learn pipeline.

Run on the CPU or GPU.

This library is currently undergoing rapid development and is in its alpha stages.


Intro to Deep Learning with Theano and OpenDeep by Markus Beissinger

Deep learning currently provides state-of-the-art performance in computer vision, natural language processing, and many other machine learning tasks. In this talk, we will learn when deep learning is useful (and when it isn't!), how to implement some simple neural networks in Python using Theano, and how to build more powerful systems using the OpenDeep package.

Our first model will be the 'hello world' of deep learning - the multilayer perceptron. This model generalizes logistic regression as your typical feed-forward neural net for classification.
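That relationship can be sketched in plain NumPy (an illustrative stand-in, not the talk's Theano code; all names and shapes here are my own): the only difference between the two models is the learned hidden layer inserted before the same sigmoid output.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_regression(x, W, b):
    # Plain logistic regression: one affine map into a sigmoid.
    return sigmoid(x @ W + b)

def mlp(x, W1, b1, W2, b2):
    # MLP: a tanh hidden layer feeding the same logistic output layer.
    hidden = np.tanh(x @ W1 + b1)
    return sigmoid(hidden @ W2 + b2)

x = rng.normal(size=(4, 3))          # batch of 4 inputs, 3 features each
W1 = rng.normal(size=(3, 5)); b1 = np.zeros(5)   # hidden layer, 5 units
W2 = rng.normal(size=(5, 1)); b2 = np.zeros(1)   # sigmoid output

probs = mlp(x, W1, b1, W2, b2)
print(probs.shape)  # (4, 1) class probabilities
```

With the hidden layer removed (or its activation made linear), the MLP collapses back to logistic regression, which is the sense in which it generalizes it.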

Our second model will be an introduction to unsupervised learning with neural nets - the denoising auto-encoder. This model attempts to reconstruct corrupted inputs, learning a useful representation of your input data distribution that can deal with missing values.
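The training signal just described can also be sketched in NumPy (again an illustrative stand-in for the talk's Theano/OpenDeep code, with invented shapes): corrupt the input, encode and decode it, and score the reconstruction against the clean original.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def corrupt(x, level, rng):
    # Masking noise: zero out a fraction `level` of the entries.
    mask = rng.random(x.shape) > level
    return x * mask

def reconstruct(x, W, b_hidden, b_visible):
    hidden = sigmoid(x @ W + b_hidden)        # encode
    return sigmoid(hidden @ W.T + b_visible)  # decode (tied weights)

x = rng.random((8, 20))                   # batch of 8 clean inputs
W = rng.normal(scale=0.1, size=(20, 10))  # 20 visible, 10 hidden units
b_h, b_v = np.zeros(10), np.zeros(20)

noisy = corrupt(x, level=0.3, rng=rng)
x_hat = reconstruct(noisy, W, b_h, b_v)

# Squared reconstruction error against the CLEAN input: this is the
# quantity a denoising auto-encoder drives down during training.
loss = np.mean((x_hat - x) ** 2)
print(x_hat.shape, float(loss) > 0)
```

Training (omitted here) would update W and the biases by gradient descent on this loss, forcing the hidden code to capture enough of the input distribution to fill in the zeroed-out values.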

Finally, we will explore the modularity of neural nets by implementing an image-captioning system using the OpenDeep package.

Markus Beissinger

Recent graduate from the Jerome Fisher Program in Management and Technology dual degree program at the University of Pennsylvania (The Wharton School and the School of Engineering and Applied Science), and current Master's student in computer science. Focus on machine learning, startups, and management.


Thunder, Large-scale analysis of neural data with Spark

Thunder is a library for analyzing large-scale neural data. It's fast to run, easy to develop for, and can be used interactively. It is built on Spark, a new framework for cluster computing.

Thunder includes utilities for loading and saving different formats, classes for working with distributed spatial and temporal data, and modular functions for time series analysis, factorization, and model fitting. Analyses can easily be scripted or combined. It is written in Spark's Python API (PySpark), making use of scipy, numpy, and scikit-learn.
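Thunder's own API is not reproduced here, but the shape of the computation it distributes can be sketched in plain Python: each record is a (key, time series) pair, and a per-record analysis is just a map, which Spark would run in parallel across the cluster.

```python
import statistics

# Toy stand-in for a distributed neural-data analysis: every record is
# (key, time series), and the analysis is applied record by record.
records = [
    ("neuron-0", [1.0, 2.0, 3.0, 2.0, 1.0]),
    ("neuron-1", [0.5, 0.5, 0.5, 0.5, 0.5]),
    ("neuron-2", [3.0, 1.0, 4.0, 1.0, 5.0]),
]

def summarise(record):
    key, series = record
    return key, {"mean": statistics.mean(series),
                 "stdev": statistics.pstdev(series)}

# With PySpark this would be rdd.map(summarise); locally, map() suffices.
stats = dict(map(summarise, records))
print(stats["neuron-1"]["mean"])  # 0.5
```

Because each record is summarised independently, the same function scales from a laptop to a cluster without changing the analysis code, which is the point of building on Spark.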

Thunder is a community effort, and thus far features contributions from the following individuals:

Andrew Osheroff, Ben Poole, Chris Stock, Davis Bennett, Jascha Swisher, Jason Wittenbach, Jeremy Freeman, Josh Rosen, Kunal Lillaney, Logan Grosenick, Matt Conlen, Michael Broxton, Noah Young, Ognen Duzlevski, Richard Hofer, Owen Kahn, Ted Fujimoto, Tom Sainsbury, Uri Laseron

If you have ideas or want to contribute, submit an issue or pull request, or reach out to us on gitter, twitter, or the mailing list.


Google's R Style Guide

R is a high-level programming language used primarily for statistical computing and graphics. The goal of the R Programming Style Guide is to make our R code easier to read, share, and verify. The rules below were designed in collaboration with the entire R user community at Google.

Other Style Guides

Our C++ Style Guide, Objective-C Style Guide, Java Style Guide, Python Style Guide, Shell Style Guide, HTML/CSS Style Guide, JavaScript Style Guide, AngularJS Style Guide, Common Lisp Style Guide, and Vimscript Style Guide are now available. We have also released cpplint, a tool to assist with style guide compliance, and google-c-style.el, an Emacs settings file for Google style.


The Auton Lab, CMU

The Auton Lab, part of Carnegie Mellon University's School of Computer Science, researches new approaches to Statistical Data Mining. It is directed by Artur Dubrawski and Jeff Schneider. We are very interested in the underlying computer science, mathematics, statistics and AI of detection and exploitation of patterns in data.

We build practical large-scale deployments of very highly autonomous self-improving systems. We gratefully acknowledge funding support from NSF, DARPA, NASA, USDA, CDC, FDA, DHS, DoD, the State of Pennsylvania, other agencies, and over a dozen Fortune 500 companies with whom we have collaborated.


Statistical Data Mining Tutorials by Andrew Moore, CMU

I am the Dean of the School of Computer Science at Carnegie Mellon University. My background is in statistical machine learning, artificial intelligence, robotics, and statistical computation for large volumes of data. I love algorithms and statistics. In the case of robotics, which I also love, I only have expertise in decision and control algorithms. I suck at hardware and mechanical design. When I stand near a robot, it breaks.

I have worked in the areas of robot control, manufacturing, reinforcement learning, algorithms for astrophysics, algorithms for detection and surveillance of terror threats, internet advertising, internet click-through prediction, ecommerce, and logistics for same day delivery.

I am passionate about the impact of technology (algorithms, cloud architectures, statistics, robotics, language technologies, machine learning, computational biology, artificial intelligence and software development processes) on the future of society. We are lucky to live in such an exciting time of change. I am adamant that the Pittsburgh region in general, and Carnegie Mellon more specifically, are right in the center of all this change.



MIT cheetah robot lands the running jump

In a leap for robotic development, the MIT researchers who built a robotic cheetah have now trained it to see and jump over hurdles as it runs — making this the first four-legged robot to run and jump over obstacles autonomously.



Jean-Baptiste Mouret Homepage & Videos, INRIA, France

I study machine learning and evolutionary computation as a means to design highly adaptive robots.

Awards and nominations

Best video, AAAI video competition 2014: Joost Huizinga, Jean-Baptiste Mouret, Jeff Clune. Evolving Neural Networks That Are Both Modular and Regular


Ninth AAAI Video Competition

AAAI is pleased to announce the continuation of the AAAI Video Competition, now entering its ninth year. The video competition will be held in conjunction with the AAAI-15 conference in Austin, Texas, USA, January 25-29, 2015. At the main AAAI-15 awards ceremony, authors of award-winning videos will be presented with "Shakeys", trophies named in honour of SRI's Shakey robot and its pioneering video. Top videos will be screened during the award ceremony and during a dedicated session. All videos will also be visible throughout the conference.

The goal of the competition is to show the world how much fun AI is by documenting exciting artificial intelligence advances in research, education, and application. View previous entries and award winners at or watch last year's videos on YouTube.



HOW OLD DO I LOOK? by Microsoft

The #HowOldRobot guesses how old you look using Machine Learning.™

About in 100 Words™

A leading web-based science, research and technology news service covering a full range of topics, including physics, earth science, medicine, nanotechnology, electronics, space, biology, chemistry, computer sciences, engineering, mathematics, and other sciences and technologies. Launched in 2004, its readership has grown steadily to include 1.75 million scientists, researchers, and engineers every month. The site publishes approximately 100 quality articles every day, offering some of the most comprehensive coverage of sci-tech developments worldwide. Quantcast 2009 includes it in its list of the Global Top 2,000 Websites. Community members enjoy access to many personalized features such as social networking, a personal home page, RSS/XML feeds, article comments and ranking, the ability to save favorite articles, a daily newsletter, and other options.


TEDx London Business School 2015

Edwina Dunn Curiosity & Collaboration

Edwina Dunn talks about market research in the digital age, how to extract value and truly understand your customers.

Udayan Goyal Financial inclusion in the information age

Udayan Goyal talks about why success lies in cooperation and collaboration, why start-ups should work with incumbents, and what factors make for a successful collaboration.

Jessica Butcher Visual search for generation curious

Jessica Butcher talks about how visual search has the potential to enhance and deepen appreciation of the physical world in everyday life.

Robert Diamond Love your data

Robert Diamond tells us why sharing our personal data could be the best thing we do. He examines how individuals, companies, and government all must play a role to leverage personal data to truly change the way we interact with the world.



CS231n: Convolutional Neural Networks for Visual Recognition by Andrej Karpathy, Stanford University

Andrej Karpathy's Publications, Stanford Computer Science Ph.D. student (Great resources, not to be missed)

Andrej Karpathy's Blog

Automated Image Captioning with ConvNets and Recurrent Nets by Andrej Karpathy


Frequently searched words on the Internet by Bloopish

Why a new Search Engine in Town?

Bloopish is a Real-Time Search Engine, meaning the results you get are really fresh and hard to find so quickly with other search engines.

Briefly speaking, Bloopish = Google + Twitter + Privacy

You can add a webpage to Bloopish index now and find it in your results one second later. The index is interactive and Real-Time as well.

We are crawling and updating our index constantly with pages created seconds or minutes ago.
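As a toy illustration of how an index can be both interactive and real-time (nothing here is from Bloopish's actual implementation), an in-memory inverted index makes a page searchable the instant it is added, with no batch rebuild:

```python
from collections import defaultdict

class TinyIndex:
    """Minimal inverted index: token -> set of page ids."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add_page(self, page_id, text):
        # Indexing is just updating the postings sets in place, so the
        # page is findable the moment this method returns.
        for token in text.lower().split():
            self.postings[token].add(page_id)

    def search(self, query):
        # Return the pages containing every query token.
        sets = [self.postings[t] for t in query.lower().split()]
        return set.intersection(*sets) if sets else set()

index = TinyIndex()
index.add_page("p1", "real time search engine")
print(index.search("search engine"))  # {'p1'}
```

A production engine adds ranking, persistence, and sharding on top, but the "add now, find one second later" property comes from exactly this kind of incrementally updated structure.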

Bloopish is free, no ads, no sponsored links, no spam, no virus.

Moreover, it is fast, easy, and safe for children to use.

By the way, we still believe in SEO, so don't leave the keywords meta tag blank on your pages, OK?



IA*LAB: a citizen-led, transdisciplinary Artificial Intelligence research laboratory, La Paillasse, France

The IA*Lab brings together those who want to explore the vast world of Artificial Intelligence.

Created in 2014 within La Paillasse, this community gathers anyone motivated by the subject who wants to discuss, research, create, hack, or learn about AI.

The IA*Lab is open to everyone through regular gatherings and the participatory, community-driven tools presented below.


CogLab, La Paillasse, France

The CogLab is a programme for exploring the Cognitive Sciences at the crossroads of digital art, artificial intelligence, and open science. Started as a FRESCO project, it is now powered by La Paillasse and is built around two strands:

  • The Open Lab: open-access equipment and collaborative projects in a DIY/Maker spirit, so that every citizen can take ownership of a science that is still little known
  • The Think Lab: forward-looking discussion groups on how technological hybridisation is changing our habits, the better to grasp the ethical stakes

The CogLab popularises the new challenges that draw on the Cognitive Sciences and showcases projects at interactive, artistic events for the general public.

An ever-growing community gathers at its Meetups and other events, helping to launch many projects.



Reinforcement Learning by David Silver, UCL (already in the ML Kit)

Lecture Videos (new)

Reinforcement Learning by Richard S. Sutton and Andrew G. Barto (already in the ML Kit)



MIT Technology Review

Machine-Learning Algorithm Calculates Fair Distance for a Race Between Usain Bolt and Long-Distance Runner Mo Farah

In an entirely new model of athletic performance, three numbers characterize an athlete's capability over short, middle, and long-distance races.
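The article's three-number model isn't spelled out here, but the flavour of such time-prediction models can be illustrated with the classic Riegel power law, t2 = t1 * (d2/d1)^1.06 (a well-known rule of thumb, not the model from the article):

```python
# Riegel's power law: predict a race time over distance d2 from a
# known time t1 over distance d1.
def riegel(t1, d1, d2, exponent=1.06):
    return t1 * (d2 / d1) ** exponent

# Sanity check against real data: Mo Farah's 5000 m personal best
# (12:53.11 = 773.11 s), extrapolated to 10000 m.
predicted = riegel(773.11, 5000, 10000)
actual = 1606.57  # Farah's actual 10000 m best, 26:46.57
print(round(predicted, 1), round(predicted - actual, 1))
```

The prediction lands within a handful of seconds of Farah's real 10000 m best. Where Riegel uses one fixed exponent for everybody, the model in the article instead characterizes each athlete with three individual numbers, which is what lets it compare runners as different as Bolt and Farah.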



Archives ouvertes (French website but English abstracts)


Mining of Massive Datasets by Jure Leskovec, Anand Rajaraman, Jeff Ullman

The 2nd edition of the book (v2.1)

The book

The book is based on Stanford Computer Science course CS246: Mining Massive Datasets (and CS345A: Data Mining).

The book, like the course, is designed at the undergraduate computer science level with no formal prerequisites. To support deeper explorations, most of the chapters are supplemented with further reading references.

The Mining of Massive Datasets book has been published by Cambridge University Press. You can get a 20% discount here.

By agreement with the publisher, you can download the book for free from this page. Cambridge University Press does, however, retain copyright on the work, and we expect that you will obtain their permission and acknowledge our authorship if you republish parts or all of it.

We welcome your feedback on the manuscript.





Gapminder is a non-profit venture promoting sustainable global development and achievement of the United Nations Millennium Development Goals by increased use and understanding of statistics and other information about social, economic and environmental development at local, national and global levels.

We are a modern “museum” that helps make the world understandable, using the Internet.

Gapminder was founded in Stockholm by Ola Rosling, Anna Rosling Rönnlund and Hans Rosling on February 25, 2005. Gapminder is registered as a Foundation at Stockholm County Administration Board (Länstyrelsen i Stockholm) with registration number (organisationsnummer) 802424-7721.



Stanford: Center for Professional Development

The Stanford Center for Professional Development makes it possible for today's best and brightest professionals to enroll in Stanford University courses and programs while they maintain their careers.  Courses and programs from the School of Engineering and related Stanford departments are delivered online, at Stanford, at company work sites and international locations—providing a global community of learners with flexibility and convenience and enabling them to apply their education to their work.



The Machine Learning Salon's iBook in progress... available here (150-210 pages so far)



By the end of May, The Machine Learning Salon will celebrate its first year!

28,700 visitors so far with only two posts: one on, another one on and just a few tweets.

Looking for further developments, any idea is welcome!


Silicon Milkroundabout (I'll be there tomorrow!)



CVonline: The Evolving, Distributed, Non-Proprietary, On-Line Compendium of Computer Vision by Professor Bob Fisher, University of Edinburgh


Professor Bob Fisher Homepage, University of Edinburgh

Prof. Robert Fisher has been an academic in the School of Informatics (originally in the former Department of Artificial Intelligence) at the University of Edinburgh since 1984 and a full Professor since 2003. He received his PhD from the University of Edinburgh (1987), investigating computer vision in the former Department of Artificial Intelligence. His previous degrees are a BS with honors (Mathematics) from the California Institute of Technology (1974) and an MS (Computer Science) from Stanford University (1978). He worked as a software engineer for 5 years before returning to study for his PhD.

He has been researching 3D scene understanding since 1982, and has worked on model based object recognition, range image analysis and parallel vision algorithms. The main topics of his recent research are:

  • variations on the interpretation tree model matching algorithm,
  • automatic model acquisition applied to engineering objects and buildings,
  • surface model-based object recognition,
  • range image analysis,
  • iconic image analysis, and
  • humpback whalesong analysis.

This research is conducted in the Machine Vision Unit.


Advanced Vision Module by Professor Bob Fisher, University of Edinburgh


Advanced Vision MATLAB Code by Professor Bob Fisher, University of Edinburgh

The subdirectories contain the MATLAB source and demo images for the programs described and/or presented in the lectures.


Computer Vision Online by Amin Sarafraz and Steven Cadavid

Our goal at Computer Vision Online is to create an online community for people working in the field of Computer Vision, so they can share information, build on other researchers' achievements, and avoid reinventing the wheel. To accomplish this goal, we currently list source code, commercial and free software, and datasets. We also provide information about upcoming events and job openings related to computer vision. In addition, we have started writing online books that cover different areas of Computer Vision; their aim is to provide general knowledge about its different aspects.


See Machine Learning Salon Starter Kit page for more resources ...