## Sunday, July 4, 2010

### Why LaTeX is superior to MS Word

I recently submitted a paper to Interdisciplinary Graduate Conference (IGC) 2010. I prepared well formatted 8 page LaTeX document. However, the conference was organized by humanities students who had never heard of LaTeX. They wanted a .doc file. I then had to go through the painful process of converting my LaTeX document into a Word one. After that painful experience, I could not resist writing a rant on the process. I have experienced both forms of writing: I used word for every (lab) report in my under-grad. Once in my PhD program I was converted to LaTeX and have not looked back since.

1. Speed: Anything that requires a mouse and clicking through menus will be slower than one where you can write it out in a few key strokes. This means that writing characters with accents, special symbols, and especially equations will be much faster to write in LaTeX.

1. MS Word (and even worse open office) can get sluggish when editing large documents with lots of equations and figures.

2. Security: stored in plain text

1. MS Word stores its files in a bloated binary form. If a file gets corrupted for whatever reason you could be locked out of your file and many hours of work. Likewise if some bug in MS Word is causing it to crash when opening your file. With plain text source files, if all else fails you can always open and edit the file in a simple text editor.

3. Separation of context and formatting

1. The plain text style of LaTeX simply using section and subsection etc allows the writer to simply think about the logical flow of the document without worrying about superficial details such as font sizes and styles

2. You know all your sections and subsections headings are in the correct font size. This is much harder to check using MS Word.

4. Integratable with SVN

1. Being plain text it is easier for SVN (and other revision control systems) to merge files being edited

2. It also takes up less space on the server storing revisions

3. It is also possible to use any diff tool to compare revisions

1. You also know the diff tool will show you ALL changes in the state of the file. There is no such guarantee when using features such as track changes in MS Word.

5. Control

1. MS Word often tries to outsmart you. It will automatically capitalize, automatically try to select whole sentences, automatically insert bullet points, and try to infer when you’re done with a sub/superscript when writing equations. Software that tries to out-smart you will often out-dumb you. It tries too hard to infer what you want and often gets it wrong.

1. A good example is the use of - vs. -- vs. ---. Microsoft assumes most users are not smart enough to infer which type of dash to use in which situation. So MS Word tries to figure it out automatically. It can be really annoying when it gets it wrong. With LaTeX you just write what you want in 1 to 3 key strokes!

2. There is also the quote directions  ''. In LaTeX it is specified manually while in word it is automatic, and can be annoying if word infers it wrong.

2. Notion of state: every change in the file is visible. Nothing is hidden from you in a plain text source file.

1. There is no hidden meta information

6. Flexibility

1. It is easy to search for $x^2$ or \footnote in LaTeX the is no easy way to do the analogous searches using CTRL-F in MS Word

2. You can also search the file using regular expressions

3. Macros for a more semantic representation can be written in just 1 line with a few key strokes

1. For example, I often define \field{} using \mathcal{} and then \R using \field{R}

4. In MS Word macros are often full blown VB scripts

1. They should be disabled any way since they are a security risk

7. More easily scriptable

1. For instance, I have MATLAB code that exports a matrix of results in MATLAB to a LaTeX table

2. It would require a full blown C++/VB program in visual studio using all sorts of crazy APIs (and therefore reading tons of documentation) to do the same thing in MS Word

1. Because MS Word has to reformat the document every keystroke it has to use inferior typesetting methods to prevent the GUI from becoming glacially slow

2. Since you only recompile after significant changes in LaTeX, it can afford to use more expensive type-setting algorithms (especially for equations) that might take 10 seconds to run.

9. Interoperable

1. Since things like bibtex are also plain text it is much easier for third parties to create applications such as Jabref. You don't get stuck using one particular reference manager. If you don't like one there are others to use instead. And if all else fails you can always edit it in notepad.

2. Therefore, there is no vendor lock-in and you can't get stuck using one piece of software for backward compatibility reasons that may in the future become inferior to the alternatives.

10. Misc

1. Equations always come out looking crappy

2. Figure captions aren't proper; MS Word will let them cross page boundaries for instance

3. Footnotes are a pain, especially if you have multiple footnotes on the same page

4. Equation numbering is a pain in Word

5. It is possible to embed pdf figures in LaTeX which allows for vector graphics and avoids file bloat

1. However, I beleive new versions of word allow for the insertion of eps figures

11. Cost

1. LaTeX is free while MS Office can cost a few hundred dollars

1. There is open office but that is even worse

2. I am not a "free-tard" so this is not my top concern

1. MS Word has a grammar checker. To my knowledge, none of the LaTeX editors have a grammar checker.

1. Of course, the grammar checker should usually be taken with a grain of salt. However, it is good at catching typos such as interchanging it/is/if and they/then, which a spell checker will not find and are easy to glance over when proof reading.

2. The equation editor is better than it used to be. Writing an equation heavy document in word used to be almost impossible. However, it is now doable, but still much slower than LaTeX.

### ICML 2010 Highlights

636: Sparse Gaussian Process Regression via L1 Penalization
Feng Yan (Purdue University); Yuan Qi (Purdue University)
They introduced a way to do sparse GPs for large amounts of data by adding a L1 penalization to the influence of data points. It effectively removes irrelevant data points using a convex optimization. It avoids the local optima problems of normal sparse GPs.

I liked all the papers in the application track:
901: Web-Scale Bayesian Click-Through Rate Prediction for Sponsored Search Advertising in Microsoft's Bing Search Engine
Thore Graepel (Microsoft Research); Joaquin Quiñonero Candela (Microsoft Research); Thomas Borchert (Microsoft Research); Ralf Herbrich (Microsoft Research)

902: Detecting Large-Scale System Problems by Mining Console Logs
Wei Xu (UC Berkeley); Ling Huang (Intel Labs Berkeley); Armando Fox (UC Berkeley); David Patterson (UC Berkeley); Michael I. Jordan (UC Berkeley)
I liked this since it is somewhat related to my project.

903: The Role of Machine Learning in Business Optimization
Chid Apte (IBM T. J. Watson Research Center)
IBM is increasing the efficiency of collecting back taxes in NY state using machine learning (which some people found scary).

374: Local Minima Embedding
Minyoung Kim (CMU); Fernando De la Torre (CMU)
The idea is to visualize the a high dimensional objective function in a lower dimension that can be visualized while preserving the local optima. Its a really good idea, but it is not good enough to help with the hard problems we would want to solve with it (ie visualizing local optima in high dimensional neural network optimization).

495: Hilbert Space Embeddings of Hidden Markov Models
Le Song (Cmu); Byron Boots (Carnegie Mellon University); Sajid Siddiqi (Google); Geoffrey Gordon (Carnegie Mellon University); Alex Smola (Yahoo! Research)

551: Distance Dependent Chinese Restaurant Processes
David Blei (Princeton University); Peter Frazier (Cornell)

### AISTATS 2010 Highlights

The conference kicked off with
Forensic Statistics: Where are We and Where are We Going?
Richard Gill
To append to Sebastien's comment the statistical flaws introduced by the doctors and lawyers in the case included: confusion that P(data|guilty) = P(guilty|data), multiplying p-vals over multiple tests, the post-hoc problem in frequentist hypothesis testing, arbitrary starting and stopping rules, violated iid assumptions, and fitting the protocol to define events to maximize significance.

I liked
Reduced-rank hidden Markov models
S. Siddiqi, B. Boots and G. Gordon
some cool stuff about alternatives to EM for training HMMs that are closed form with bounds on the loss of statistical efficiency compared to EM and without EM's local optima problem.

Gaussian processes with monotonicity information Jaakko Riihimäki, Aki Vehtari ; 9:645-652, 2010.

I learned what Phil Hennig is up to in
Coherent inference on optimal play in game trees
P. Hennig, D. Stern and T. Graepel

Zoubin pulled in another best paper award in
Learning the structure of deep sparse graphical models
R. Adams, H. Wallach and Z. Ghahramani
Adams and Wallach used some of the MacKay magic.

On the Sunday after the conference there was the active learning workshop. Don Rubin (co-inventor of EM and co-author on Gelman) was the invited speaker who talked about experimental design and causality. He is definetly on the stats side of the ai-stats. I asked him about measuring test set performance and he asked why I would want to divide my data set in half (definitely out of sync with the ML culture!). It was also interesting to see the philosophical cracks between him and Phil Dawid and the division between the Dawid/Rubin/Pearl views on causal inference.

### NIPS 2009 highlights

Reading Tea Leaves: How Humans Interpret Topic Models, Jonathan Chang, Jordan Boyd-Graber, Sean Gerrish, Chong Wang and David Blei, Princeton University Really interesting idea for evaluating an unsupervised method. They use human subjects on Amazon Turk for controlled experiments to determine the interpretability of topic model outputs. They find that higher perplexity doesn't always correspond to higher interpretability by humans. The model with better perplexity may have found more structure in the data, but that may or may not correspond to our intuitive notions of a "topic". After talking to them there are some big open questions: 1) Do correlated topic models have better perplexity than LDA, but lower interpretability, because people don't think about words being assigned to multiple topics OR because there are unknown models, which better explain the data, and are better on both measures? 2) Is it possible to use the feedback from human subjects to do semi-supervised or active learning to improve topic models? I really liked this paper, because many machine learning researchers would find a contrived automatic measure for interpretability, which may not reflect interpretability from human subjects, which is what you really care about. This is especially true in the empirical risk minimization (ERM) community where a contrived, but automatic, measure is preferred because it is easier to optimize. Semi-supervised learning using feedback from humans in complex experiments makes it much harder to operate in terms if risk.

Making Very Large-Scale Linear Algebraic Computations Possible Via Randomization (Tutorial) Gunnar Martinsson Encouraging results on doing large scale matrix computations.

Sequential Monte-Carlo Methods (Tutorial) Arnaud Doucet and Nando de Freitas Really clear description of bootstrap particle filters.

Gaussian process regression with Student-t likelihood, Jarno Vanhatalo, Pasi Jylanki and Aki Vehtar they claim their Laplace approximation is better than EP and MCMC.

On Stochastic and Worst-case Models for Investing, Elad Hazan, IBM, and Satyen Kale, Yahoo! Research Tried to create bounds on worst-case scenario's for finding a portfolio. Interesting idea, but after talking to the authors, it seems they made the implicit assumption that the market can't drop more than 50% in a day, which they didn't mention.

Fast subtree kernels on graphs, Nino Shervashidze and Karsten Borgwardt, MPIs Tuebingen They use a kernel that can be used for regression/classification when the inputs are graphs (such as a molecular structure). They used SVMs in the paper, but it could be easily used in the GP context (which of course excites me more).

Invited Talk: Bayesian Analysis of Markov Chains, Persi Diaconis, Stanford I was excited to finally see the much talked of Diaconis in the flesh.

Machine Learning for Sustainability, J. Zico Kolter, Stanford University, Thomas Dietterich, Oregon State University, and Andrew Ng, Stanford University (mini-symposium) They really put their money where their mouth is. One of the speakers did his presentation over Skype to avoid the CO2 necessary to fly from San Francisco to Vancouver.

Improving Existing Fault Recovery Policies, Guy Shani and
Christopher Meek, Microsoft Research
Interesting since its similar to what I do ;)