I am a Research Engineer in Deep Learning at Magic Leap, currently focused on pioneering new learning-based methods for Visual SLAM, co-advised by Tomasz Malisiewicz and Andrew Rabinovich. I received my Master's and Bachelor's degrees at the University of Michigan, where I focused on Machine Learning, Computer Vision, and Robotics. During this time I worked on various small projects on topics including person tracking, outdoor SLAM, scene text detection, 3D voxel convnets, robotic path planning, and text summarization, advised by great mentors like Matthew Johnson-Roberson, Edwin Olson, Silvio Savarese, and Homer Neal. During my Master's studies I did an internship with Occipital, where I helped release the Structure SDK, which runs RGB-D SLAM on mobile devices.

2015-now: Senior Research Engineer at Magic Leap Deep Learning, Visual SLAM, Mixed Reality
2013-2015: University of Michigan Master's Student Computer Vision, Machine Learning, Robotics
Fall 2014: Graduate Student Instructor Computer Vision (EECS 442)
Summer 2014: Occipital Internship RGBD SLAM, Augmented Reality
2013: Vice President of Student AI Lab Natural Language Processing, Computer Vision
2008-2013: University of Michigan Bachelor's Student Robotics, Computer Science, International Studies

December 2018: Two new arXiv papers: Deep ChArUco: Dark ChArUco Marker Pose Estimation and Self-Improving Visual Odometry.
November 2018: Gave a talk at the Berkeley Artificial Intelligence Research Lab (BAIR).
October 2018: Gave a keynote at the Bay Area Multimedia Forum (BAMMF) series in Palo Alto.
August 2018: Magic Leap One started shipping to creators worldwide.
July 2018: Attended ICVSS 2018 in stunning Sicily.
June 2018: Released a pre-trained net and demo code in PyTorch for SuperPoint. Get up and running in 5 minutes or your money back!
April 2018: SuperPoint selected as an oral at the 1st International Workshop on Deep Learning for Visual SLAM at CVPR in Salt Lake City.


SuperPoint: Self-Supervised Interest Point Detection and Description
This work presents a self-supervised framework for training interest point detectors and descriptors suitable for a large number of multiple-view geometry problems in computer vision. As opposed to patch-based neural networks, our fully-convolutional model operates on full-sized images and jointly computes pixel-level interest point locations and associated descriptors in one forward pass. Our model, when trained on the MS-COCO image dataset, is able to repeatedly detect a rich set of interest points and stably track them over time.
Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich
CVPR 2018 Deep Learning for Visual SLAM Workshop
Toward Geometric Deep SLAM
We present a point tracking system powered by two deep convolutional neural networks. The first network, MagicPoint, operates on single images and extracts salient 2D points. As transformation estimation is simpler when the detected points are geometrically stable, we designed a second network, MagicWarp, which operates on pairs of point images and estimates the homography that relates the inputs. Both networks are trained with synthetic data.
Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich
arXiv 2017
Deep Image Homography Estimation
We present a deep convolutional neural network called HomographyNet for estimating the relative homography between a pair of images. We use a 4-point homography parameterization which maps the four corners from one image into the second image. The network is trained end-to-end using warped MS-COCO images, allowing the use of large-scale training without time-consuming data collection. The HomographyNet does not require separate local feature detection and transformation estimation stages and outperforms a traditional homography estimator based on ORB.
Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich
RSS 2016 Workshop: Limits and Potentials of Deep Learning in Robotics
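To make the 4-point parameterization concrete, here is a minimal NumPy sketch of how four corner offsets can be converted back into a 3x3 homography. It is an illustration only, not the HomographyNet code: the function names, the 128x128 patch size, and the offset values are assumptions chosen for the example, and the conversion uses the standard direct linear transform (DLT).

```python
import numpy as np

def four_point_to_homography(corners, offsets):
    """Recover the 3x3 homography that maps `corners` to `corners + offsets`.

    Hypothetical helper for illustration: solves the standard DLT system
    built from exactly four point correspondences.
    """
    src = np.asarray(corners, dtype=float)
    dst = src + np.asarray(offsets, dtype=float)
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two linear constraints on H.
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.array(rows)
    # The homography is the null vector of A (last right-singular vector).
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)
    # Normalize so H[2, 2] == 1 (assumes a non-degenerate warp).
    return H / H[2, 2]

def apply_homography(H, points):
    """Map 2D points through H using homogeneous coordinates."""
    pts = np.asarray(points, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = homog @ H.T
    return mapped[:, :2] / mapped[:, 2:3]

# Corners of an assumed 128x128 patch and made-up corner displacements.
corners = [(0, 0), (127, 0), (127, 127), (0, 127)]
offsets = [(5, -3), (-8, 4), (10, 6), (-2, -7)]
H = four_point_to_homography(corners, offsets)
warped_corners = apply_homography(H, corners)
```

The eight offset values carry the same information as the eight free parameters of the homography, which is why the two representations can be converted into each other.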
We are building a fast calorie tracking and fitness guidance framework that leverages deep learning and other predictive algorithms.
3D Spatial Convnets for Semantic Segmentation
By training a 3D spatial convnet to recognize 127,915 CAD Models in 662 different categories, we can develop a rich feature hierarchy for performing 3D semantic segmentation.
Daniel DeTone, Matthew Johnson-Roberson
Winter 2015
Structure Sensor SDK
We built an SDK for developers to use with the Structure Sensor that includes sample code for 3D object capture, 3D room mapping, and augmented reality gaming.
Summer 2014
Simultaneous Environment Discovery & Annotation
SEDA is a project for enhancing human learning using state-of-the-art techniques from AI. The goal, setting technical constraints aside, is to create an overlay on human vision that helps with tasks humans are inherently bad at, such as memory, calculations, and abstractions, and speeds up tasks such as looking up information and referencing material.
Michigan Student AI Lab (MSAIL)
Winter 2014
Scene Text Detection and Recognition
We built an end-to-end scene text detection and recognition framework that builds on recently published work by Lukas Neumann, using an extremal region (ER) classifier and efficient exhaustive search.
Michigan Student AI Lab (MSAIL)
Winter 2014
Robust Locally Weighted Regression for Aesthetically Pleasing Region-of-Interest Video Generation
We provide a method that takes the output from an object tracker and creates a smoothed RoI to be viewed as the final output video. To accomplish this, we use a variation of linear regression, namely, robust locally weighted linear regression (rLWLR-Smooth).
ATLAS Collaboratory Project
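As a rough illustration of robust locally weighted linear regression applied to a tracker trajectory, the sketch below smooths a single RoI coordinate over frame index. It is not the rLWLR-Smooth implementation: the Gaussian kernel, the bisquare robustness weights (in the style of Cleveland's LOWESS), the bandwidth, and the function name are all assumptions made for the example.

```python
import numpy as np

def robust_lwlr(t, y, bandwidth=5.0, robust_iters=2):
    """Smooth y(t) with locally weighted linear fits, down-weighting outliers.

    Hypothetical sketch: Gaussian kernel plus bisquare robustness weights.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(t), t])  # local model: a + b * t
    robust_w = np.ones_like(y)
    smoothed = np.empty_like(y)
    for _ in range(robust_iters + 1):
        for i, t0 in enumerate(t):
            # Kernel weight: frames near t0 count more than distant ones.
            k = np.exp(-0.5 * ((t - t0) / bandwidth) ** 2)
            sw = np.sqrt(k * robust_w)
            # Weighted least squares via a rescaled ordinary lstsq problem.
            beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
            smoothed[i] = beta[0] + beta[1] * t0
        resid = y - smoothed
        s = np.median(np.abs(resid))
        if s < 1e-8:
            break  # residuals are already negligible
        # Bisquare weights: residuals beyond 6 * s get zero weight.
        u = np.clip(resid / (6.0 * s), -1.0, 1.0)
        robust_w = (1.0 - u ** 2) ** 2
    return smoothed

# Example: a linearly drifting RoI center with one spurious detection.
frames = np.arange(50)
center_x = 2.0 * frames + 1.0
center_x[25] += 100.0  # simulated tracker glitch
smooth_x = robust_lwlr(frames, center_x)
plain_x = robust_lwlr(frames, center_x, robust_iters=0)
```

With `robust_iters=0` the glitch still pulls the smoothed path toward it; the robustness iterations are what suppress it.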
Parallel Tracking and Mapping for Outdoor Localization
By removing some of the long term pose optimizations and by limiting the allowed number of bundle adjustment iterations, I was able to modify PTAM to work in an outdoor localization setting. This work was used to help improve the accuracy of a multi-target tracking system.
Daniel DeTone, Yu Xiang, Silvio Savarese
Summer 2013
Robotics Competition for Autonomous SLAM and Path Planning
We entered a mobile robot, equipped with a fisheye camera and laser pointer, in a robotics competition. To win, the robot had to autonomously map a small area, shoot green triangles, and return to a starting point. We implemented a fast agglomerative line fitting algorithm, a graph-based SLAM algorithm, and a memory-efficient quad-tree for map storage. Our team finished 2nd out of 8 teams.
Daniel DeTone, Ibrahim Musba, Jonathan Bendes, Andrew Segavac
Winter 2013
Projectile Prediction and Robotic Retrieval using Kinect RGBD Video
We developed a fully automated projectile-catching robot by affixing a small basket to a mobile robot and predicting the projectile's landing position in real time. We implemented a detection algorithm using RGBD video from a Kinect and an estimation algorithm using linear regression. Once the landing position was calculated, we used dead-reckoning and a PID controller to navigate the mobile robot.
Daniel DeTone, Rohan Thomare, Max Keener
Winter 2013
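One way a landing position can be estimated with linear regression is sketched below. This is a guess at the general idea, not the project's code: it assumes the detections have already been converted to a world frame with z pointing up, fits straight lines to x(t) and y(t) and a quadratic to z(t) by least squares, and solves for the time the fitted height reaches the ground. The function name and sample values are made up for the example.

```python
import numpy as np

def predict_landing(t, xyz, ground_z=0.0):
    """Estimate when and where a projectile reaches height `ground_z`.

    Hypothetical sketch: least-squares fits that are linear in their
    coefficients (lines for x and y, a parabola for z).
    """
    t = np.asarray(t, dtype=float)
    xyz = np.asarray(xyz, dtype=float)
    px = np.polyfit(t, xyz[:, 0], 1)  # x(t) = vx * t + x0
    py = np.polyfit(t, xyz[:, 1], 1)  # y(t) = vy * t + y0
    pz = np.polyfit(t, xyz[:, 2], 2)  # z(t) = a * t^2 + b * t + c
    # Solve z(t) = ground_z and keep real roots after the last observation.
    shifted = pz.copy()
    shifted[2] -= ground_z
    roots = np.roots(shifted)
    real = roots[np.isreal(roots)].real
    future = real[real > t[-1]]
    if future.size == 0:
        return None  # the fitted arc never reaches the ground
    t_land = future.min()
    return t_land, np.polyval(px, t_land), np.polyval(py, t_land)

# Synthetic observations of the first half second of a throw.
times = np.linspace(0.0, 0.5, 10)
points = np.column_stack([
    1.0 + 2.0 * times,                       # x
    -0.5 + 1.0 * times,                      # y
    1.0 + 5.0 * times - 4.905 * times ** 2,  # z under gravity
])
landing = predict_landing(times, points)
```

In a live system the fit would be repeated as each new detection arrives, so the landing estimate tightens while the robot is already moving.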
Tracking-by-detection in a Lecture Hall Setting
We present a framework for tracking a single human (person-of-interest) in a lecture hall environment. It is a tracking-by-detection framework that uses a generic person detector, a novel scoring function to solve the data association problem, and a Kalman filter that provides reliable state estimation. In our scoring function, we introduce two novel subcomponents: a subscore based on the target's width and a subscore based on the target's color histogram at the first time step.
ATLAS Collaboratory Project
Fall 2013
Particle Filter Tracking in a Lecture Hall Setting
Proof of concept for using a deformable parts model in conjunction with a particle filter and efficient MCMC sampling.
ATLAS Collaboratory Project
Fall 2013
Linear array of photodiodes to track a human speaker for video recording
We present a human lecturer tracking and recording system that consists of a pan/tilt/zoom (PTZ) color video camera, a necklace of infrared LEDs, and a linear photodiode array detector. Electronic output from the photodiode array is processed to generate the location of the LED necklace, which is worn by a human speaker. The LED necklace is flashed at 70 Hz at a 50% duty cycle to provide noise-filtering capability.
Daniel DeTone, Homer Neal, Bob Lougheed
JoP:CS 2012


EECS 477: Algorithms
EECS 551: Matrix Methods for Machine Learning, Signal Processing, and Data Analysis
EECS 592: Advanced Topics in Artificial Intelligence
EECS 545: Machine Learning
EECS 501: Probability and Random Processes


I maintain the Michigan Men's Ultimate Frisbee webpage.
Find me on Twitter, LinkedIn, and GitHub.