Saturday, December 27, 2008

NIPS Report

After attending NIPS 2008, I figure I should write up my impressions. For me, some of the highlights were:

  1. Shai Ben-David's student, Margareta Ackerman, gave a keynote presentation on the quality of clustering. After hearing her presentation and talking to her at her poster, I was unimpressed. I also think that a lot of the clustering quality stuff is BS. They create a set of axioms to evaluate the quality of a clustering algorithm. I find all of them to be somewhat questionable. Comparison of unsupervised algorithms, such as clustering, can be done via comparisons of the marginal likelihood. It seemed that many of the ideas involved in Bayesian model comparison were a foreign language to Ackerman. A full review of the topic deserves its own post.
  2. Han Liu, John Lafferty and Larry Wasserman presented a paper on joint sparsity regression. It builds on a previous paper where they modify L1 regularization for joint sparsity in multiple regression, causing certain factors to have zero influence in all of the different regressions. They extend this to the non-parametric case. Each regression is an additive model of nonlinear functions of each input. The joint sparsity model causes certain functions to be zero everywhere for each regression. The regularization is quite strange. L1 regularization is equivalent to a MAP solution with a Laplacian prior. I am not sure what equivalent priors these regularization methods have. In the non-parametric single regression case, I think the regularizer is equivalent to a GP prior on the function where the covariance function is a Kronecker delta. I have not proven that, however. A degeneracy of this model is that it causes the functions to go to zero for every input that has not been observed. Searching through the paper, I found that they used Gaussian kernel smoothing on the function afterwards to smooth it out, which seems like a bit of a hack to me. A full review of the topic deserves its own post.
  3. Byron Yu and John P. Cunningham presented a paper on Gaussian Process Factor Analysis. They used independent GPs over time as the latent factors and then used a factor-analysis-like linear transformation to explain the observed time series. They applied it to some neural spike data and got quite interesting results. They were able to visualize the activity of a monkey's motor cortex in 2D when throwing a ball.
  4. Ben Calderhead and Mark Girolami presented a paper on Accelerating Bayesian Inference over Nonlinear Differential Equations with Gaussian Processes. They were trying to infer the parameters of a nonlinear ODE. It was a little kludgy, as their model setup violated the likelihood principle. They modeled the time series of the system state with GPs. They also modeled the derivatives of the system state via the ODE. So, they had two models for the data. They inferred the parameters by trying to minimize the difference between the two. I think they modeled the difference as a Gaussian centered at zero. By inspecting the graphical model, however, you notice that the observed variables are independent of the parameters of the ODE. So, if one wanted to be completely principled, one could not use the model to infer anything about the parameters.
  5. Rebecca Saxe gave a presentation that I found quite interesting. She showed a certain brain region that is involved in thinking about what other people are thinking. It is therefore involved in moral judgement, because moral judgements often hinge on intent. She first correlated the activity in the region with people's moral judgements about hypothetical scenarios. She later showed how she was able to use transcranial magnetic stimulation (TMS) on subjects and change their moral judgements about the hypothetical scenarios.
  6. There was a lot of hype about Infer.NET. It will make exploring different models much easier. It seems much more powerful than VIBES or BUGS.
  7. I attended the causality workshop, which explored methods for inferring causation from observational data or from experiments where you don't have direct control over the variables you'd like to control. Judea Pearl gave a very enthusiastic presentation; I am not sure I would consider it a good presentation, however. There was some tension in the air between Philip Dawid and Judea Pearl over their views on causation, which have created two camps in the field. I don't think they are as far apart as they think. The divide is not as big as the one between Bayesians and frequentists, for example. Judea Pearl presented his do-calculus for inferring causation in causal graphs, which is derived using a set of axioms. Dawid gave a presentation highlighting what I hope most people already know: conditional independence in graphical models is not necessarily the same thing as causation, and nothing is as good as a randomized experiment. However, Kevin Murphy, in Dawid's camp, showed that one can prove all of the do-calculus rules using the IDAG. If one sets up a graphical model using input variables for causation, one can derive the do-calculus rules using the standard conditional independence properties of graphical models. Wrapping one's mind around what the correct approach to causation is turns out to be much more difficult and subtle than it is for prediction. I believe this is related to the fact that it is much more difficult to get a ground truth when testing causal inference methods. Guyon highlighted this fact in relation to her causality challenge.
  8. Shakir Mohamed presented a paper extending PCA to other data types with distributions in the exponential family. Normal PCA works under an assumption of Gaussianity in the data. EPCA can assume a Bernoulli distribution, for example.
  9. Jurgen Van Gael presented a paper where he extended the iHMM to a factorial iHMM. He basically went from an HMM to an FHMM, but with iHMMs. An iHMM is an HMM with an infinite number of latent states. The transition matrix comes from a hierarchical DP.
  10. Sebastian Seung gave a presentation on decoding the connectome. The connectome is basically the connection matrix between all the neurons in the brain. It is like summarizing a brain as a graph with each neuron as a node and each synapse as an edge. The difficulty of the task is converting images of 20-30 nm thick brain slices to a connection matrix. So far they have only done C. elegans, which has a mere 300 neurons. With that, scientists have reverse engineered a lot of C. elegans behaviour. They are currently working on decoding a cubic mm of mouse brain. They are using computer vision algorithms to automate and speed up the process. He alluded to the massive amounts of data involved. By my calculations, merely storing the connectome of the human brain would require 432 TB. The imagery would be vastly more. If one had the connectome matrix, it would open up tons of possibilities for analysis. I would like to run spectral clustering on the graph and see how closely graph clusters correspond to anatomical structures. Of course, I don't know how one would run spectral clustering (i.e. do an eigendecomposition) on a matrix that large. Sebastian showed a video with 3D graphics illustrating the imaging process, which seemed like it was made for the Discovery Channel. The Star Wars music in the background was a bit much ;)
  11. There was a paper on bootstrapping the ROC curve. Basically, they are trying to get confidence bounds on an ROC curve. It is important to get a sense of confidence in your performance, to be sure that it was not due to random chance. It is interesting to me because I have looked into model-based approaches to estimating the ROC curve.
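
To make item 11 concrete, here is a minimal sketch of one way to put bootstrap confidence bands on an ROC curve. It is not the method from the paper, just the plain percentile bootstrap under some assumptions of mine: labels are 0/1, resampling is stratified by class (so every resample contains both classes), the bands are pointwise rather than simultaneous, and tied scores are not treated specially. The function names `roc_on_grid` and `bootstrap_roc_band` are made up for illustration.

```python
import numpy as np

def roc_on_grid(scores, labels, fpr_grid):
    """Empirical ROC curve as a step function: best TPR achievable at each FPR in fpr_grid."""
    order = np.argsort(-scores, kind="stable")   # sort by descending score
    labels = labels[order]
    tps = np.cumsum(labels)                      # true positives as the threshold is lowered
    fps = np.cumsum(1 - labels)                  # false positives as the threshold is lowered
    tpr = np.concatenate([[0.0], tps / tps[-1]])
    fpr = np.concatenate([[0.0], fps / fps[-1]])
    # Index of the last ROC point whose FPR does not exceed each grid value.
    idx = np.searchsorted(fpr, fpr_grid, side="right") - 1
    return tpr[idx]

def bootstrap_roc_band(scores, labels, n_boot=1000, alpha=0.05, n_grid=101, seed=0):
    """Pointwise (1 - alpha) percentile-bootstrap band for the ROC curve."""
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    fpr_grid = np.linspace(0.0, 1.0, n_grid)
    pos = np.flatnonzero(labels == 1)
    neg = np.flatnonzero(labels == 0)
    curves = np.empty((n_boot, n_grid))
    for b in range(n_boot):
        # Stratified resampling with replacement (an assumption of this sketch).
        idx = np.concatenate([
            rng.choice(pos, size=pos.size, replace=True),
            rng.choice(neg, size=neg.size, replace=True),
        ])
        curves[b] = roc_on_grid(scores[idx], labels[idx], fpr_grid)
    lower = np.quantile(curves, alpha / 2, axis=0)
    upper = np.quantile(curves, 1 - alpha / 2, axis=0)
    point = roc_on_grid(scores, labels, fpr_grid)
    return fpr_grid, point, lower, upper
```

Calling `bootstrap_roc_band(scores, labels)` returns the FPR grid, the ROC curve on the original data, and the lower and upper band at each grid point. A wide band at the operating point you care about is a hint that the measured performance could be due to chance.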

Obviously, this is only a small subset of NIPS. However, it will give me a lot of material when it is my turn to present in my group's weekly meetings. The list of proceedings is here
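
As an aside on item 2, the claim that L1 regularization is a MAP solution under a Laplacian prior can be written out in a few lines. This is the standard textbook argument for the plain linear-regression case, not anything taken from the paper; the symbols (design matrix X, weights w, noise variance sigma^2, Laplace scale b, number of weights d) are my own notation.

```latex
% Assumed model: y = Xw + \epsilon,  \epsilon \sim \mathcal{N}(0, \sigma^2 I),
% with independent Laplace priors p(w_j) = \frac{1}{2b} \exp(-|w_j| / b).
\begin{align*}
\hat{w}_{\mathrm{MAP}}
  &= \arg\max_w \; \log p(y \mid X, w) + \sum_{j=1}^{d} \log p(w_j) \\
  &= \arg\max_w \; -\frac{1}{2\sigma^2} \lVert y - Xw \rVert_2^2
       - \frac{1}{b} \sum_{j=1}^{d} \lvert w_j \rvert + \text{const} \\
  &= \arg\min_w \; \frac{1}{2} \lVert y - Xw \rVert_2^2 + \lambda \lVert w \rVert_1,
  \qquad \lambda = \frac{\sigma^2}{b}.
\end{align*}
```

So the L1 penalty weight is the ratio of the noise variance to the prior scale. What the corresponding prior is for the joint sparsity regularizers is exactly the part I am not sure about.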

Teamwork and Machine Learning

In many engineering programs there is a focus on how to work in teams and how to divide projects into parts. In embedded systems it may start with the division between hardware and software. Then it may be further divided by different subsystems and software libraries. I haven't seen much emphasis on the different roles in machine learning. I see the different categories as being

  1. Acquiring the data and getting it into a database.
  2. Extracting the data from the database into .csv and .mat files and into the form that can be sent directly into an algorithm.
  3. Designing new models, coding up the inference methods, and testing the algorithms on synthetic data.
  4. Creating a test bed to divide the data into training and test, evaluate different methods, and report results.
  5. Determining what feature matrices and models to use and putting everything together.
  6. Implementing libraries that can be used in actual applications.
  7. Testing the real world libraries.
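
To show how self-contained role 4 can be, here is a minimal sketch of a test bed: it splits the data once, fits every candidate method on the training part, and reports test error for each. Everything here is an assumption made for illustration: the function names, the convention that a method is a `fit(X, y)` function returning a `predict` function, the use of RMSE as the score, and the two toy methods.

```python
import numpy as np

def train_test_split(X, y, test_fraction=0.25, seed=0):
    """Randomly divide the data into training and test sets."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(y))
    n_test = int(round(test_fraction * len(y)))
    test, train = perm[:n_test], perm[n_test:]
    return X[train], y[train], X[test], y[test]

def fit_mean(X, y):
    """Baseline method: always predict the training mean."""
    mean = y.mean()
    return lambda X_new: np.full(len(X_new), mean)

def fit_least_squares(X, y):
    """Ordinary least squares with an intercept term."""
    A = np.column_stack([np.ones(len(X)), X])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda X_new: np.column_stack([np.ones(len(X_new)), X_new]) @ w

def evaluate(methods, X, y, test_fraction=0.25, seed=0):
    """Fit each method on the same training split and return its test RMSE."""
    X_tr, y_tr, X_te, y_te = train_test_split(X, y, test_fraction, seed)
    results = {}
    for name, fit in methods.items():
        predict = fit(X_tr, y_tr)
        results[name] = float(np.sqrt(np.mean((predict(X_te) - y_te) ** 2)))
    return results

if __name__ == "__main__":
    # Synthetic data standing in for whatever role 2 hands over.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=200)
    methods = {"mean baseline": fit_mean, "least squares": fit_least_squares}
    for name, rmse in evaluate(methods, X, y).items():
        print(f"{name}: test RMSE = {rmse:.3f}")
```

The point of the `fit`/`predict` convention is that the person designing models (role 3) only has to supply a function with that shape, and the person maintaining the test bed never has to touch the inference code.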

From what I've seen, not enough emphasis is placed on the division of these tasks, in academia or in industry. I think it is most efficient to divide these tasks among different people who can specialize. It is somewhat wasteful to take a person who is an expert in designing inference algorithms and have them spend most of their time setting up a database.