Software Engineer, Google, Inc., Mountain View, California
My research interests lie in machine learning, computational statistics, and information visualization. My application areas are large-scale data problems, primarily text analysis; I am also interested in modeling social, image, biological, and financial data. Broadly speaking, my work addresses two themes:
These themes have natural interplay—they parallel the iterative process typical in analyzing and modeling data. Central to this interplay is a fundamental question: How can resources be efficiently expended to achieve desirable levels of accuracy?
My thesis, “Stochastic m-Estimators for Controlling Accuracy-Cost Tradeoffs,” addresses this question by developing a mathematical framework that spans a continuum of explicit and implicit tradeoffs present in machine learning. Abstractly, these tradeoffs consist of exchanging finite physical resources for improved accuracy. More concretely, the limiting factors, or costs, may be computational, such as time-limited cluster access for parameter learning, or financial, such as purchasing human-labeled training data under a fixed budget. This work explores these accuracy-cost tradeoffs by proposing a family of estimators that maximize a stochastic variation of the traditional m-estimator.
These “stochastic m-estimators” (SMEs) are constructed by stitching together different m-estimators at random. Each such instantiation resolves the accuracy-cost tradeoff differently, and taken together they span a continuous spectrum of accuracy-cost resolutions. My thesis proves the consistency of these estimators and provides formulas for their asymptotic variance and statistical robustness. I also demonstrate their usefulness for:
as well as for a variety of other tradeoffs, e.g., active learning, robust loss functions, and random variate generation.
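The construction described above can be sketched schematically. The notation here is mine, a hedged illustration rather than the thesis's own formulation:

```latex
% Classical m-estimator: maximize one fixed objective over the sample.
\hat{\theta}_n = \operatorname*{arg\,max}_{\theta} \sum_{i=1}^{n} \rho(X_i, \theta)

% Stochastic m-estimator (SME): the component objective applied to each
% observation is selected at random, stitching different m-estimators together.
\hat{\theta}_n = \operatorname*{arg\,max}_{\theta} \sum_{i=1}^{n} \rho_{Z_i}(X_i, \theta),
\qquad Z_i \sim q
```

Here $q$ (hypothetical in this sketch) is a selection distribution over component objectives; different choices of $q$ resolve the accuracy-cost tradeoff differently.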
My advisor is Professor Guy Lebanon, and I frequently collaborate with Dr. Kevyn Collins-Thompson at Microsoft Research.
Dissertation / Defense (handout-4up, handout-6up)
Ph.D., Computational Science & Engineering,
Georgia Institute of Technology, Atlanta, Georgia. January 2009 – December 2011.
M.S., Electrical & Computer Engineering,
Purdue University, West Lafayette, Indiana. August 2005 – December 2008.
B.S., Computer Engineering & Electrical Engineering,
Michigan Technological University, Houghton, Michigan. August 2001 – April 2005.
Mountain View, California. October 2011 – Present.
Hancock, Michigan. Spring 2005.
Rochester, Minnesota. Summer 2004.
Rochester, Minnesota. Summer 2003.
Michigan Department of Transportation, Cass City, Michigan. Summer 2002.
Georgia Institute of Technology, Atlanta, Georgia. January 2009 – December 2011.
Redmond, Washington. Summer 2009.
West Lafayette, Indiana. January 2006 – December 2008.
DOE Joint Genome Institute and Lawrence Livermore National Laboratory, Walnut Creek, California. Summer 2006.
Extreme Blue Intern, Austin, Texas. Summer 2005.
Purdue University. Spring, Fall 2008.
Purdue University. Fall 2007.
Marshall Sherfield Postdoctoral Fellowship, Marshall Aid Commemoration Commission, 2011–13.
DHS Fellowship in Data Analysis and Visual Analytics, Department of Homeland Security, 2010–12.
Ross Graduate Fellowship, Purdue University, 2005–06.
Board of Control—Full Tuition, Michigan Technological University, 2001–05.
US delegate, 57th Lindau Meeting of Nobel Laureates and Students. Germany, 2007.
Summa Cum Laude, Dept. of ECE, Michigan Technological University, 2005.
Award of Excellence, Dept. of Mathematics, Michigan Technological University, 2002.
Eta Kappa Nu ECE Honor Society, Beta Chapter, Purdue University, 2006.
Eta Kappa Nu ECE Honor Society, Beta Gamma Chapter, Michigan Technological University, 2005.
Phi Kappa Phi Honor Society, Michigan Technological University, 2004.
Tau Beta Pi Engineering Honor Society, Michigan Beta Chapter, Michigan Technological University, 2004.
Valedictorian, Cass City High School, Cass City, Michigan, 2001.
J. Dillon and G. Lebanon. Stochastic Composite Likelihood. Journal of Machine Learning Research (JMLR), in press, 2010.
G. Lebanon, Y. Mao, and J. Dillon. The Locally Weighted Bag of Words Framework for Document Representation. Journal of Machine Learning Research (JMLR), 8(Oct):2405–2441, 2007.
Y. Mao, J. Dillon, and G. Lebanon. Sequential Document Visualization. IEEE Transactions on Visualization and Computer Graphics (INFOVIS), 13(6), 2007.
J. Dillon, K. Balasubramanian, and G. Lebanon. Asymptotic Analysis of Generative Semi-Supervised Learning. Proc. of the International Conference on Machine Learning (ICML), 2010.
J. Dillon and K. Collins-Thompson. A Unified Optimization Framework for Finding Reliable Pseudo-Relevance Feedback Models. Proc. of the 19th International Conference on Information and Knowledge Management (CIKM), 2010.
J. Dillon and G. Lebanon. Statistical and Computational Tradeoffs in Stochastic Composite Likelihood. Proc. of the 12th International Conference on Artificial Intelligence and Statistics (AISTATS), 2009.
J. Dillon, Y. Mao, G. Lebanon, and J. Zhang. Statistical Translation, Heat Kernels, and Expected Distance. Proc. of the 23rd Conference on Uncertainty in Artificial Intelligence (UAI), 2007.
K. Collins-Thompson and J. Dillon. Controlling the Search for Expanded Query Representations by Constrained Optimization in Latent Variable Space. SIGIR Workshop on Query Representation and Understanding, 2010.
J. Dillon, Y. Mao, G. Lebanon, and J. Zhang. Statistical Translation, Heat Kernels, and Expected Distance. NIPS Workshop on Learning to Compare Examples, 2006.
S. Kim, J. Dillon, and G. Lebanon. Visualizing Version Controlled Documents. Manuscript available upon request, 2010.
Matlab Central. August 2011.
Matlab interface to POSIX semaphore functionality.
Matlab Central. August 2010.
Allows a 2D full/sparse matrix or 2D cell array to be shared between multiple Matlab sessions, provided they have access to the same shared-memory resources, i.e., the processes run on the same physical system. The program uses the shared-memory functions specified by POSIX and therefore requires no disk I/O for sharing. It should work out of the box on Linux (tested on Ubuntu) and will likely work when compiled with Cygwin.
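A minimal sketch of the same mechanism using Python's standard library (`multiprocessing.shared_memory` wraps POSIX shared memory on Linux); this illustrates the no-disk-I/O sharing described above, not the Matlab implementation itself:

```python
import struct
from multiprocessing import shared_memory

# Writer: pack a 2x3 matrix of doubles into a POSIX shared-memory segment.
data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
shm = shared_memory.SharedMemory(create=True, size=8 * len(data))
struct.pack_into("6d", shm.buf, 0, *data)

# Reader (in practice a second process on the same machine) attaches to the
# segment by name; the bytes are shared directly, with no disk I/O.
reader = shared_memory.SharedMemory(name=shm.name)
readback = list(struct.unpack_from("6d", reader.buf, 0))

reader.close()
shm.close()
shm.unlink()  # remove the segment once all sessions are done
```

Because both handles map the same segment, `readback` equals the original matrix data.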
Copyright © 2010, Joshua V. Dillon. All rights reserved.