Tuesday, May 29, 2012

The R Language for Machine Learning

The other day I bought a book called "Machine Learning for Hackers," which uses R to show off some machine learning algorithms. There is a lot I don't know about machine learning, as I've only studied a subfield and not really the main statistical methods that are useful for the vast majority of problems. One reason I bought this book (in addition to having lots of gift cards) is that a friend of mine mentioned learning R recently, and I would like to know more about it. I know almost nothing about R, so beware: this post is just my initial impressions and misunderstandings.

My understanding is that R has a hodgepodge of properties that make it a bit strange. The basic data type is a vector (which is pretty cool and probably allows an efficient implementation of a lot of primitive operators on vectors), all data seems to have metadata attached, it is dynamically typed (properties of data are determined by metadata modifiable at runtime), there is an object system built on that metadata, it has higher-order functions (terribly important, if you ask me), it is lazy (function arguments are evaluated only when needed, although common operations seem to force evaluation), it is call-by-value in the sense of copying (although it looks like copying is done only when necessary), and it allows mutation of structures (although it limits the propagation of effects on data through copying).
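A few of these properties can be seen in a short R session. This is only a sketch of my understanding, not anything from the book, and the names here (reading, describe, ignore_second, set_first) are made up for illustration.

```r
# Vectors are the basic data type: arithmetic applies element-wise.
x <- c(1, 2, 3)
doubled <- x * 2

# Metadata (attributes) can be attached and changed at runtime.
names(x) <- c("a", "b", "c")
attr(x, "units") <- "cm"

# The S3 object system dispatches on the "class" attribute.
reading <- structure(list(value = 42), class = "reading")
describe <- function(obj) UseMethod("describe")
describe.reading <- function(obj) paste("reading:", obj$value)
described <- describe(reading)

# Higher-order functions: functions are ordinary values.
squares <- sapply(1:3, function(n) n^2)

# Lazy arguments: b is never used, so the call to stop() never runs.
ignore_second <- function(a, b) a
lazy_result <- ignore_second(1, stop("never evaluated"))

# Copy semantics: changing v inside the function leaves the caller's vector alone.
set_first <- function(v) {
  v[1] <- 100
  v
}
original <- c(1, 2, 3)
modified <- set_first(original)
```

After running this, original still starts with 1 while modified starts with 100, which is the "mutation limited by copying" behavior described above.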

Overall it's probably a good language for its domain: a lot of attention has gone into convenience, it's designed to manipulate lots of complex data, and it supports some functional programming. I expect that it could get very messy with edge cases, odd interactions between mechanisms, possibly clumsy support for more regular programming tasks (I don't know either way about this one), and undefined semantics. The last one is a shame. A language should really have well-defined semantics, or its programmers won't know what their programs really mean, and it's hard to come up with variations of the language.

I'm pretty excited to learn these techniques, although this book does not go deep into theory. It's more about getting interested and involved in these algorithms through case studies and practical concerns. I think this is perfect, since I can learn whatever theory I want somewhere else. For now, I'm just glad to get a feel for the broader reach of this field that I like so much.
