Thanks for the details, jewels.
If the practical camera is not moving, then camera tracking is of course not required. The only parts of a scene that could require tracking are moving objects. I’m not aware that this is an option in Skybox. I haven’t tried it, but if I’m not mistaken, Syntheyes can do VR object tracking.
When you shoot a 360º × 180º image, an equirectangular, the camera sits at the center point while the image is mapped onto a sphere, for example. The six-sided cube works the same way, but somehow I’m happier with a sphere: turning images into an equirectangular already requires sub-pixel movement, which lowers the quality. Converting that image into a six-sided view does further harm to the image quality. Rendering it out again in a new format for use on an HMD (head-mounted display) would then be the third conversion already. So my idea is simply to skip one step. Cinema 4D supports this natively: Spherical Projection. All of that assumes the first generation of footage/images was equirectangular. That is where the discussion starts, because the format created at that stage largely determines the later steps and, with that, the quality. Since we might be talking about tutorials, this needs to be discussed.
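To illustrate the sphere mapping mentioned above, here is a minimal sketch of how an equirectangular pixel corresponds to a direction on the sphere. The axis convention (y up, z forward) and the function name are my own assumptions, not anything from Cinema 4D:

```python
import math

def equirect_to_direction(u, v, width, height):
    """Map a pixel (u, v) in an equirectangular image to a unit
    direction vector on the sphere (convention: y up, z forward)."""
    lon = (u / width - 0.5) * 2.0 * math.pi   # longitude: -pi .. +pi
    lat = (0.5 - v / height) * math.pi        # latitude: +pi/2 (top) .. -pi/2
    x = math.cos(lat) * math.sin(lon)
    y = math.sin(lat)
    z = math.cos(lat) * math.cos(lon)
    return (x, y, z)

# The image center looks straight ahead along +z:
# equirect_to_direction(2048, 1024, 4096, 2048) -> (0.0, 0.0, 1.0)
```

Every conversion between such formats resamples along this mapping, which is why each extra step costs a little quality.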
Once we have a sphere and the needed projection is set up, we already know that the camera sits in the center. Well, kind of: since VR camera set-ups are based on at least two cameras, the nodal point is not in the center, so anything that is too close will not match. The typical rule of thumb is that one millimeter of nodal-point shift needs a meter of camera-to-object distance to balance the problem; more distance is better. With multi-camera set-ups past a foot in diameter, this can cause problems.
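The rule of thumb above is simple arithmetic; here is a small sketch of it applied to a foot-wide rig (the function name and the 1 mm : 1 m ratio taken literally are my assumptions, and real rigs vary):

```python
def min_object_distance_m(nodal_shift_mm):
    """Rule of thumb from the text: each millimeter of nodal-point
    shift needs roughly one meter of camera-to-object distance."""
    return nodal_shift_mm * 1.0

# A rig about a foot (~305 mm) in diameter puts each lens
# roughly 152 mm off center:
# min_object_distance_m(152) -> 152.0 meters
```

Taken at face value, that is why close subjects are so problematic for larger multi-camera rigs.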
Having said all of that, the center point and the projection are quite easy to track manually, as the center is known and so is the point it projects to. The only thing that isn’t clear is the distance to the camera. When shooting such footage/images, a good survey on location is needed. If the object moves, or its size is known, things might be simpler.
The stitching of these images is normally supposed to take all lens distortion out. The reality is not that perfect, but depending on the equipment and methods used it can come very close. Which means one could set up a sphere and a camera and take 2D images from it, to be used in the Camera Calibrator for example, or for object tracking in other 3D tracking apps. If the object itself provides enough parallax, things get simpler.
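For the “take 2D images from the sphere” step, here is a rough sketch of where a virtual pinhole view would sample an equirectangular image. The function name, FOV handling, and axis convention (y up, z forward) are my own assumptions, not the workflow of any specific app:

```python
import math

def rectilinear_sample_coords(px, py, out_w, out_h, fov_deg, eq_w, eq_h):
    """For pixel (px, py) of a virtual pinhole view with horizontal
    FOV fov_deg, looking down +z, return the (u, v) pixel in the
    equirectangular source to sample from."""
    f = (out_w / 2.0) / math.tan(math.radians(fov_deg) / 2.0)  # focal, px
    x = px - out_w / 2.0          # offset from view center
    y = out_h / 2.0 - py
    z = f
    lon = math.atan2(x, z)                     # -pi .. +pi
    lat = math.atan2(y, math.hypot(x, z))      # -pi/2 .. +pi/2
    u = (lon / (2.0 * math.pi) + 0.5) * eq_w
    v = (0.5 - lat / math.pi) * eq_h
    return (u, v)

# The center of the pinhole view lands on the center of the
# equirectangular image:
# rectilinear_sample_coords(512, 384, 1024, 768, 90, 4096, 2048)
# -> (2048.0, 1024.0)
```

Looping this over an output frame (with interpolation) is essentially how a flat, calibrator-friendly view is pulled out of stitched 360º material.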
Anyway, yes, a longer and more detailed tutorial series would be nice. Even though things are currently changing in the practical camera area alone, and certainly a lot in the tracking and stitching software, the number of projects in this area might justify such material.
How the six-sided format will change is also under discussion:
https://blog.google/products/google-vr/bringing-pixels-front-and-center-vr-video/
All the best