Well, what he actually talks about is a ‘phase transition’ in computer science, made possible by two things: (1) very large amounts of data and (2) processing speed.
One of the nicest examples he gives is this: a learning algorithm X performs best on a data set of 1 million examples while another algorithm Y comes third, but when the same algorithms are run on a data set of 1 billion examples, Y becomes the best one.
Another good example is scene completion. The algorithm did not produce meaningful results with 10,000 images. The researchers kept trying with 100,000 images (again no good results), then with 1,000,000 images (still nothing), but with 10,000,000 images it worked very well! So some kind of phase transition, or quantum leap, is going on here. The situation is similar to Google Image Search, where they were trying to find canonical images, e.g. the image that best represents ‘Mona Lisa’ rather than some variation of it. They took pairs of images, compared their features, calculated a distance for each pair, arranged the data as a graph and ran a PageRank-like algorithm on that graph. This let them find the images that best represent a given set of keywords.
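The talk only says "a PageRank-like algorithm", so here is a minimal sketch of the general idea, not Google's actual method. It assumes each image has already been reduced to a numeric feature vector. The Euclidean distance, the `threshold` cutoff, the edge weight `1 / (1 + distance)` and the helper names `similarity_graph` and `pagerank` are all hypothetical choices made for illustration. Images that are close to many other candidates collect the most rank, which is what makes them "canonical".

```python
import math

def similarity_graph(features, threshold):
    """Build an undirected weighted graph: connect two images when the
    Euclidean distance between their (hypothetical) feature vectors is at
    most `threshold`. Closer pairs get heavier edges: weight = 1 / (1 + d)."""
    n = len(features)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(features[i], features[j])
            if d <= threshold:
                w = 1.0 / (1.0 + d)
                graph[i][j] = w
                graph[j][i] = w
    return graph

def pagerank(graph, damping=0.85, iterations=50):
    """Weighted PageRank computed by power iteration."""
    n = len(graph)
    rank = {v: 1.0 / n for v in graph}
    for _ in range(iterations):
        # Teleportation share that every node receives.
        new = {v: (1.0 - damping) / n for v in graph}
        for v, edges in graph.items():
            total = sum(edges.values())
            if total == 0:
                # Node with no edges: spread its rank evenly over all nodes.
                for u in graph:
                    new[u] += damping * rank[v] / n
            else:
                # Pass rank to neighbours in proportion to edge weight.
                for u, w in edges.items():
                    new[u] += damping * rank[v] * w / total
        rank = new
    return rank

# Toy data: three near-identical "Mona Lisa" images and one unrelated image.
features = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
ranks = pagerank(similarity_graph(features, threshold=1.0))
best = max(ranks, key=ranks.get)
print("most canonical image:", best)
```

Image 0 sits closest to both of the other copies, so it ends up with the highest rank, and the unrelated image 3 ends up with the lowest. With millions of images one would use sparse matrices and approximate nearest-neighbour search instead of comparing every pair.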
It is always fun and revealing to listen to Norvig. If you are interested in cutting-edge research in machine learning, pattern recognition and machine translation, I enthusiastically recommend this video. I especially recommend the parts where Norvig shows single-page Python programs for word segmentation and typo correction. The first is about 97% correct, running on a laptop with a data set of about 1.7 billion words. The second is about 75% correct, again running on a not-very-high-end laptop. He also covers the MapReduce programming paradigm and some wrong claims that have been made about it, showing how it helps with parallel programming over very large amounts of data.
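The post doesn't reproduce Norvig's code, so here is a minimal sketch of the general idea behind probabilistic word segmentation: score every possible first word with a unigram model, recurse on the rest, and memoize the subproblems. The word counts in `COUNTS` are a tiny hypothetical dictionary standing in for the ~1.7-billion-word corpus mentioned in the talk. The unknown-word penalty, which shrinks sharply with word length, is also an assumption of this sketch.

```python
import math
from functools import lru_cache

# Hypothetical toy word counts (a real system would use a huge corpus).
COUNTS = {"choose": 80, "chooses": 10, "spain": 40, "pain": 30,
          "a": 500, "in": 400, "is": 300, "it": 300, "this": 200}
N = sum(COUNTS.values())

def log_p_word(word):
    """Log10 unigram probability of a word. Unseen words get a penalty
    that falls off with length, so long unknown 'words' are disfavoured."""
    if word in COUNTS:
        return math.log10(COUNTS[word] / N)
    return math.log10(10.0 / (N * 10 ** len(word)))

@lru_cache(maxsize=None)
def segment(text):
    """Return (log10 probability, words) for the most probable split of
    `text`, assuming words are independent (unigram model)."""
    if not text:
        return 0.0, ()
    candidates = []
    for i in range(1, len(text) + 1):
        first, rest = text[:i], text[i:]
        rest_score, rest_words = segment(rest)
        candidates.append((log_p_word(first) + rest_score, (first,) + rest_words))
    return max(candidates)

print(segment("choosespain")[1])
print(segment("thisisit")[1])
```

Memoization keeps the search polynomial: each suffix of the input is segmented only once, instead of exploring all 2^(n-1) possible splits.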