Attitude estimation on iPhone
Stand up and close your eyes. If you have superpowers and managed to memorize this text before doing so, you'll now be reading it in your mind. Now, try to walk straight for two meters towards somewhere with no walls or obstacles. This is not an experiment about pain. Stop, turn around 180 degrees, walk two meters again and open your eyes. If you don't suffer from a proprioceptive disorder, you should be close to your initial position. Obvious, right? Perhaps it is for humans but, until a few years ago, it was not easy for robots.
In order to know their spatial orientation and keep balance, robots use Inertial Measurement Units (IMU). They are composed of accelerometers, gyroscopes and, sometimes, magnetometers that measure linear accelerations, angular rates and magnetic fields, respectively. The measurements of these components are fused together with a Kalman filter or a complementary filter to obtain a fairly accurate estimate of the IMU attitude. By only measuring accelerations and rates, it would not be possible for an IMU to give an absolute attitude measure. However, accelerometers can sense the gravitational field and magnetometers can sense the direction of the Earth's magnetic field, so the IMU can build a reference frame for the attitude measurements.
Thanks to MicroElectroMechanical Systems (MEMS), the components in an IMU today can be embedded into chip-like casings and consume very little power. This opened the door some years ago to the inclusion of IMUs in portable devices. In fact, I would dare say that absolutely all smartphones today have an IMU. If only they also had powerful processors, cameras, data connections, low power consumption and affordable prices, they would do great as basic robot controllers... Oh, wait!
A few days ago, I was wondering how good the performance of the IMU in my iPhone 4S would be. Would it be accurate enough for robotic applications? Would it help with self-localization? Would it be good enough to put on a quadrotor and do this? I wrote a simple app to check it out and recorded some tests.
Reading the attitude with Core Motion
Core Motion is the iOS framework that provides access to the IMU. Before starting to read the documentation, I was expecting something rough. Something like having to directly read the accelerometers, then fusing with the gyros and so forth. Nothing could be further from the truth. Apple packed the whole process in the Core Motion framework, so you can do everything with a few lines of code.
1. Create a reference to a CMMotionManager object in your class interface:
CMMotionManager *motionManager;
2. Initialize the motion manager and tell it to periodically sample the device motion:
motionManager = [[CMMotionManager alloc] init];
if (motionManager.isDeviceMotionAvailable) {
    // This device supports DeviceMotion events. Configure the sampling rate and start the IMU with an available reference frame
    motionManager.deviceMotionUpdateInterval = 1.0 / 30.0; // Put your sampling rate here, up to 100 Hz
    // Start sampling with an orthonormal reference frame in which:
    //  the Z vector is vertical
    //  we don't care where the X vector points to, but the magnetometer is used to reduce angular drift
    [motionManager startDeviceMotionUpdatesUsingReferenceFrame:CMAttitudeReferenceFrameXArbitraryCorrectedZVertical];
} else {
    // Tell the user there's no device motion available and commit a crime
}
3. Get the current attitude at any point:
CMDeviceMotion *currentMotion = motionManager.deviceMotion;
CMRotationMatrix currentAttitude = currentMotion.attitude.rotationMatrix;
However, you still can do it the hard way if you want because Core Motion also gives lower level access to the IMU components. A potential scenario where low level access could be useful would be the fusion of the IMU information with the GPS data in a tightly coupled Kalman filter.
About the sample app visuals
The dice is rendered with OpenGL. The camera is captured with OpenCV and each frame is copied to a texture covering a quad in the background that is always parallel to the camera image plane. Before being copied to the texture, the camera frames are cropped to the same aspect ratio as the phone's screen to avoid distortion. The background texture dimensions are chosen as the smallest powers of two that enclose the cropped frame. The texture coordinates for the background quad are set so that the part of the texture containing the camera frame covers the full screen.
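The cropping and texture sizing described above can be sketched in plain C as follows. This is not the code from the sample app: the struct, the function names and the choice of a centered crop with integer truncation are assumptions made for illustration. The function crops the frame to the screen aspect ratio, rounds each dimension up to a power of two, and returns the maximum texture coordinates that the background quad should use so that only the filled part of the texture is shown.

```c
#include <assert.h>

/* Smallest power of two that is >= n (n must be positive). */
static unsigned next_pow2(unsigned n)
{
    assert(n > 0);
    unsigned p = 1;
    while (p < n) p <<= 1;
    return p;
}

/* Hypothetical result type: cropped size, texture size and the texture
   coordinates of the corner opposite to (0, 0). */
typedef struct {
    unsigned crop_w, crop_h;
    unsigned tex_w, tex_h;
    float max_s, max_t;
} BackgroundLayout;

static BackgroundLayout layout_background(unsigned frame_w, unsigned frame_h,
                                          unsigned screen_w, unsigned screen_h)
{
    assert(frame_w > 0 && frame_h > 0 && screen_w > 0 && screen_h > 0);
    BackgroundLayout b;
    /* Compare frame_w / frame_h with screen_w / screen_h without dividing. */
    if ((unsigned long long)frame_w * screen_h > (unsigned long long)frame_h * screen_w) {
        /* Frame is wider than the screen: keep the height, trim the width. */
        b.crop_h = frame_h;
        b.crop_w = (unsigned)((unsigned long long)frame_h * screen_w / screen_h);
    } else {
        /* Frame is taller than (or matches) the screen: keep the width. */
        b.crop_w = frame_w;
        b.crop_h = (unsigned)((unsigned long long)frame_w * screen_h / screen_w);
    }
    b.tex_w = next_pow2(b.crop_w);
    b.tex_h = next_pow2(b.crop_h);
    /* The quad samples s in [0, max_s] and t in [0, max_t]. */
    b.max_s = (float)b.crop_w / (float)b.tex_w;
    b.max_t = (float)b.crop_h / (float)b.tex_h;
    return b;
}
```

For instance, a 480x640 frame shown on a 320x480 screen (sizes picked only as an example) is cropped to 426x640 and stored in a 512x1024 texture, so the quad uses texture coordinates up to roughly (0.83, 0.625).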
The rest of the post will only cover the aspects related to the linear algebra required to produce the "steady dice" effect from the attitude measurements. Other issues, like setting up the OpenGL context, framebuffers, models, textures, etc. will not be described, as they were not my main concern for this experiment. Feel free to fork the source code instead. All the visuals are generated with the good ol' OpenGL ES 1.1 pipeline. Although it is easy and fast for simple applications, I would recommend learning the much more flexible OpenGL ES 2.0 pipeline with one of the many good tutorials on the Internet.
Matrix transformations
As you have seen in the video, my sample app makes the dice look like its attitude stays aligned with the world reference, no matter how the iPhone moves. As the dice attitude is "fixed", you can explore its faces by moving the phone around it. This is achieved by rendering the object as if it were viewed from a camera that moves exactly like the iPhone in the world frame.
In OpenGL, there are two basic matrices to configure: the projection matrix and the modelview matrix. As its name suggests, the projection matrix projects 3D points on the image plane. These 3D points must be expressed in the camera reference frame or "view" reference frame. So, in order to transform points in the model reference frame to points in the view reference frame, we have the modelview matrix. Then, every vertex in the model is projected on the screen as:
s_{i} = ^{s}T_{v} · ^{v}T_{m} · m_{i}  (Eq. I) 
Where:
m_{i} is the i-th model 3D vertex
s_{i} is the i-th 2D projection on the screen
^{v}T_{m} is the modelview matrix
^{s}T_{v} is the projection matrix
Please observe my matrix notation. I write every transformation matrix as a T with a superscript and a subscript. The subscript denotes the source reference frame and the superscript indicates the target reference frame. Hence, the projection matrix is the transformation matrix from the view reference frame (v) to the screen reference frame (s). And the modelview matrix is the transformation matrix from the model reference frame (m) to the view reference frame (v). Notice that I am using screen instead of viewport so the abbreviation is not confused with the view reference frame but, in fact, viewport would be a more correct nomenclature, as the projections can live in a window or other screen subspace.
The projection matrix
When programming with OpenGL on iOS, we can take advantage of GLKit. It is a framework with many helpers that make life easier when using OpenGL. If you have programmed OpenGL desktop software, you will see some similarities with GLUT. GLKit contains many functions for matrix manipulation that I used to set up the projection and modelview matrices. The projection matrix can be set with GLKMatrix4MakePerspective():
GLKMatrix4 projectionMatrix = GLKMatrix4MakePerspective(GLKMathDegreesToRadians(60.0f), fabsf(vpWidth / vpHeight), 0.1f, 100.0f);
This function is called with the angle of the field of view in the vertical direction (in radians), the aspect ratio of the viewport and the distances to the near and far clipping planes. The output is a transformation matrix that describes something similar to a depth-bounded pinhole camera model. It is equivalent to gluPerspective() in GLU, except that gluPerspective() takes the angle in degrees.
The modelview matrix
By reading the device motion with Core Motion, we get a rotation matrix that expresses the device attitude in the selected IMU reference frame. This reference frame, for our case, has the Z vector pointing up towards the sky. The X and Y vector directions are not specified, but I have experimentally verified that the Y vector points towards where the phone top was when the app started, whereas the X vector points towards where the phone right side was at that same moment.
On the other hand, the phone camera reference frame is an orthonormal system in which the Z vector points off the screen, towards the user; in portrait mode, the Y vector points up and the X vector points right. This means that the rotation matrix given by Core Motion is directly the camera rotation in the IMU reference frame:

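^{IMU}T_{cam} = \begin{pmatrix} R & 0 \\ 0^{T} & 1 \end{pmatrix}  (Eq. II)
where R is the 3x3 rotation matrix given by Core Motion.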
We want our virtual camera to rotate as if it actually were the iPhone camera. Therefore, our OpenGL view must be "attached" to the real phone camera. However, we want the view to be some distance away from the object, so we can see it from afar (not from the inside). Thus, we translate the view along the Z axis of the real camera reference frame. Each dice side is 1 unit long, so we pick 4 units to see the full object:

^{cam}T_{v} = \begin{pmatrix} I & d \\ 0^{T} & 1 \end{pmatrix}  (Eq. III)
d = (0, 0, 4)^{T}
The object model has been defined from a reference frame that is equal to the IMU frame, therefore:
^{m}T_{IMU} = I  (Eq. IV) 
Finally, we can get the view-model transformation by combining the transformations in equations II, III and IV. The inverse of the view-model transformation is the modelview transformation that we are looking for:
^{v}T_{m} = (^{m}T_{v})^{-1} = (^{m}T_{IMU} · ^{IMU}T_{cam} · ^{cam}T_{v})^{-1}  (Eq. V)
We can get rid of ^{m}T_{IMU} because it is the identity:
^{v}T_{m} = (^{IMU}T_{cam} · ^{cam}T_{v})^{-1} = (^{cam}T_{v})^{-1} · (^{IMU}T_{cam})^{-1}  (Eq. VI)
Now, taking into account that ^{cam}T_{v} is a pure translation matrix and ^{IMU}T_{cam} is a pure rotation matrix, the result can be rewritten as:

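^{v}T_{m} = \begin{pmatrix} I & -d \\ 0^{T} & 1 \end{pmatrix} · \begin{pmatrix} R^{T} & 0 \\ 0^{T} & 1 \end{pmatrix} = \begin{pmatrix} R^{T} & -d \\ 0^{T} & 1 \end{pmatrix}  (Eq. VII)
Here R is the rotation block of ^{IMU}T_{cam} and d is the translation vector of ^{cam}T_{v}.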
The advantage of this expression is that the computationally expensive inverse operation can be skipped by simply transposing the rotation submatrix and negating the translation vector.
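As a sanity check of that shortcut, the following plain C sketch inverts a general rigid transform [R | t] as [R^T | -R^T t] and can be used to confirm that, for a rotation followed by a translation d along the rotated Z axis, the result has exactly R^T as rotation and -d as translation. The function names are hypothetical and the matrices are plain column-major float arrays rather than GLKit types.

```c
#include <assert.h>
#include <math.h>

/* out = a * b for 4x4 column-major matrices (element (row, col) is m[col * 4 + row]). */
static void mat4_mul(const float a[16], const float b[16], float out[16])
{
    for (int col = 0; col < 4; ++col) {
        for (int row = 0; row < 4; ++row) {
            float sum = 0.0f;
            for (int k = 0; k < 4; ++k)
                sum += a[k * 4 + row] * b[col * 4 + k];
            out[col * 4 + row] = sum;
        }
    }
}

/* Inverse of a rigid transform (pure rotation plus translation).
   No general 4x4 inversion is needed: transpose the rotation block
   and rotate the negated translation. */
static void rigid_inverse(const float t[16], float out[16])
{
    /* Transposed 3x3 rotation block. */
    for (int col = 0; col < 3; ++col)
        for (int row = 0; row < 3; ++row)
            out[col * 4 + row] = t[row * 4 + col];
    /* New translation: -R^T * t. */
    for (int row = 0; row < 3; ++row)
        out[12 + row] = -(out[row] * t[12] + out[4 + row] * t[13] + out[8 + row] * t[14]);
    /* Bottom row stays (0, 0, 0, 1). */
    out[3] = out[7] = out[11] = 0.0f;
    out[15] = 1.0f;
}
```

If view_model is built as a rotation times a translation by (0, 0, 4), rigid_inverse() returns a matrix whose last column is (0, 0, -4, 1), and multiplying it by view_model gives the identity up to rounding.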
Matrix operations with GLKit
Why use GLKit matrix operations? Can't you do it with OpenGL? In fact, you can, but I just wanted to point out that we can use some useful GLKit functions for the same purpose without the drawbacks of working with the OpenGL matrix stack. The code will be easier to read and more flexible, as you won't need to take care of OpenGL's state by pushing and popping the matrices every time you want to operate on them. Is it for free? Nope, you will pay in lack of portability because GLKit is not available outside iOS. Anyway, it's good to know that it's there.
For instance, you could write equation V with GLKit functions like this:
CMRotationMatrix r = motionManager.deviceMotion.attitude.rotationMatrix;
GLKMatrix4 camFromIMU = GLKMatrix4Make(r.m11, r.m12, r.m13, 0,
                                       r.m21, r.m22, r.m23, 0,
                                       r.m31, r.m32, r.m33, 0,
                                       0, 0, 0, 1);
GLKMatrix4 viewFromCam = GLKMatrix4Translate(GLKMatrix4Identity, 0, 0, 4);
GLKMatrix4 imuFromModel = GLKMatrix4Identity;
GLKMatrix4 viewModel = GLKMatrix4Multiply(imuFromModel, GLKMatrix4Multiply(camFromIMU, viewFromCam));
bool isInvertible;
GLKMatrix4 modelView = GLKMatrix4Invert(viewModel, &isInvertible);
Now that we have seen the power of GLKit, let's go back to the simplified expression in equation VII. In the code, it simply translates to:
CMRotationMatrix r = motionManager.deviceMotion.attitude.rotationMatrix;
GLKMatrix4 modelView = GLKMatrix4Make(r.m11, r.m21, r.m31, 0.0f,
                                      r.m12, r.m22, r.m32, 0.0f,
                                      r.m13, r.m23, r.m33, 0.0f,
                                      0.0f, 0.0f, -4.0f, 1.0f);
Your first impression of the modelview matrix definition above might be "the element order is wrong". It is a very common mistake and it means that you are thinking in row-major order. Just notice that matrices in OpenGL, GLKit and Core Motion are treated as column-major. The first four coefficients passed to GLKMatrix4Make() are the first column, the next four coefficients are the second column, and so on. In the same way, each member m_{ij} of the CMRotationMatrix is the element in column i and row j. If you take a look again thinking column-major, you will see that the matrix definition in the code above actually represents equation VII.
There is a transposed version of GLKMatrix4Make() called GLKMatrix4MakeAndTranspose(), in case you still prefer to enter the matrix coefficients in row-major fashion. The following code would be equivalent to the previous matrix definition:
GLKMatrix4 modelView = GLKMatrix4MakeAndTranspose(r.m11, r.m12, r.m13, 0.0f,
                                                  r.m21, r.m22, r.m23, 0.0f,
                                                  r.m31, r.m32, r.m33, -4.0f,
                                                  0.0f, 0.0f, 0.0f, 1.0f);
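If the two argument orders still feel confusing, this small plain C sketch mimics them with hypothetical stand-ins (Mat4, mat4_from_columns and mat4_from_rows are not GLKit names). Both functions end up with the same column-major storage, where element (row, col) lives at index col * 4 + row.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for a 4x4 matrix stored the way OpenGL expects:
   element (row, col) is m[col * 4 + row]. */
typedef struct { float m[16]; } Mat4;

/* Takes 16 values column by column, so they are stored exactly as given. */
static Mat4 mat4_from_columns(const float v[16])
{
    Mat4 r;
    memcpy(r.m, v, sizeof r.m);
    return r;
}

/* Takes 16 values row by row and transposes them into column-major storage. */
static Mat4 mat4_from_rows(const float v[16])
{
    Mat4 r;
    for (int row = 0; row < 4; ++row)
        for (int col = 0; col < 4; ++col)
            r.m[col * 4 + row] = v[row * 4 + col];
    return r;
}

/* Reads element (row, col) from column-major storage. */
static float mat4_at(const Mat4 *a, int row, int col)
{
    assert(row >= 0 && row < 4 && col >= 0 && col < 4);
    return a->m[col * 4 + row];
}
```

Entering a translation of (0, 0, -4) as the last four values for mat4_from_columns(), or as the last value of the third row for mat4_from_rows(), yields the same array, and mat4_at(&m, 2, 3) returns -4 in both cases.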
Once a GLKMatrix4 is generated with the GLKit functions, it can be set as the modelview matrix by just telling OpenGL to load its contents. It is that simple because GLKit matrices are in column-major order, too:
glMatrixMode(GL_MODELVIEW);
glLoadMatrixf(modelView.m);
Closing words
We have explored how to read the device attitude on iOS and how to do something meaningful with it thanks to linear algebra, the GLKit framework and OpenGL. After doing this experiment, I am convinced that the IMU in the iPhone 4S would do a good job in many robotic applications. Even when the magnetometer is enabled, I have noticed a slight drift in the yaw angle. However, without going into quantitative results, the accuracy and the response time figures that I have experienced are comparable to those offered by the IMUs in some quadcopters that I have used in the past. In conclusion, I guess that I will give the iPhone a try as a robot controller as soon as I can.
Have you used your smartphone's IMU for some project? Did you try it on Android? Do you have any tips or want to share your experience? You are very welcome to comment below.