FAQ
Signature error
If a signature error occurs or a cloud_manager request fails, the user's AKSK may be incorrect. Troubleshoot as follows.
- Log in to the Fusion cloud console and verify that an AKSK has been created.
- Check the local configuration file ~/.config/xiaomi/config and verify that the AKSK is configured correctly.
- Print the exported environment variables with echo $XIAOMI_ACCESS_KEY_ID, echo $XIAOMI_SECRET_ACCESS_KEY and echo $XIAOMI_CLOUDML_ENDPOINT, and verify that they are set correctly.
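The local checks above can be scripted. The following is a minimal sketch, not an official tool: the function name check_credentials is hypothetical, and the script only reports what is missing. It cannot tell whether the AKSK values themselves are valid, and it makes no assumption about the format of the configuration file.

```python
import os
from pathlib import Path

# Variable names and config path are the ones listed in the steps above.
REQUIRED_VARS = (
    "XIAOMI_ACCESS_KEY_ID",
    "XIAOMI_SECRET_ACCESS_KEY",
    "XIAOMI_CLOUDML_ENDPOINT",
)
CONFIG_PATH = Path.home() / ".config" / "xiaomi" / "config"


def check_credentials(env=None, config_path=CONFIG_PATH):
    """Return a list of problems found; an empty list means nothing obvious is missing.

    Hypothetical helper: it checks presence only, not correctness of the keys.
    """
    env = os.environ if env is None else env
    problems = []
    for name in REQUIRED_VARS:
        # Treat unset and whitespace-only values the same way.
        if not env.get(name, "").strip():
            problems.append(f"environment variable {name} is unset or empty")
    if not Path(config_path).is_file():
        problems.append(f"config file {config_path} not found")
    return problems


if __name__ == "__main__":
    for problem in check_credentials():
        print("WARNING:", problem)
```

Whether the environment variables or the configuration file takes precedence is not stated here, so the sketch reports both independently.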
Does logging into the DevEnv environment affect other users?
No. Each user's development environment is isolated in its own Docker container, so TensorFlow versions and user files do not interfere with one another.
Will software installed in the DevEnv environment remain there?
After logging out, a user can log back in and access existing files. However, once the user deletes the instance, the cluster clears the container, and user data and files are not preserved. To avoid reinstalling software at every startup, users can build their own Docker image.
Is it necessary to configure the AKSK when using one's own container image at startup?
Downloading container images does not currently support AKSK configuration. All images are publicly available for download, so configuring the AKSK is unnecessary.
How do I specify the training data for the environment?
Users can manually download training data from an external network to local storage and read it from there during training. TensorFlow already supports HDFS and FDS, so users can read from the corresponding distributed storage directly without downloading the data first. For frameworks such as Caffe that can only read local data, consider using FDS Fuse or S3 Fuse. The use of GPU clusters is somewhat restricted at present.
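The decision described above can be summarized in a few lines. This is an illustrative sketch only: the function data_access_plan, the "hdfs://" and "fds://" URI schemes, and the set of frameworks with direct access are assumptions made for the example, not part of the CLOUD-ML API.

```python
from urllib.parse import urlparse

# Assumption: remote data is addressed with "hdfs://" or "fds://" URIs.
REMOTE_SCHEMES = {"hdfs", "fds"}
# Assumption: only frameworks listed here can read remote storage directly.
DIRECT_ACCESS_FRAMEWORKS = {"tensorflow"}


def data_access_plan(framework, data_uri):
    """Suggest how a framework should reach its training data (hypothetical helper)."""
    scheme = urlparse(data_uri).scheme.lower()
    if scheme not in REMOTE_SCHEMES:
        # Plain paths have an empty scheme and are treated as local data.
        return "read local path directly"
    if framework.lower() in DIRECT_ACCESS_FRAMEWORKS:
        return f"read {scheme.upper()} directly"
    # Local-only frameworks need the data exposed as local files first.
    return "download first or mount through a FUSE layer, then read locally"


if __name__ == "__main__":
    print(data_access_plan("tensorflow", "hdfs://namenode/data/train"))
    print(data_access_plan("caffe", "fds://bucket/data/train"))
    print(data_access_plan("caffe", "/home/user/data/train"))
```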
Can the development environment's GPU be used by other users?
No, the development environment's GPU is exclusive. It cannot be shared with other users.
What are the contents of the basic framework / version on the page, and what precautions should be taken?
CLOUD-ML builds several images in advance. If you do not specify an image when submitting a task, the system defaults to the CPU image of TensorFlow 1.0.0. To use a different image, specify it in one of two ways. The first is to give the image name, such as "cr.d.xiaomi.net/cloud-ml/train-caffe2-cpu:0.6.0". The image can be a standard image provided by CLOUD-ML or a user-defined one; either way, it must be a valid Docker image, i.e. it must download successfully with the "docker pull" command. The second is to give the framework name and version number, in which case CLOUD-ML loads the corresponding image based on that information. This method works only with the standard images provided by the platform.
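The selection rules above can be written out as a small function. This is a sketch under stated assumptions rather than the platform's actual logic: select_image is a hypothetical name, the mapping from framework and version to a concrete image is internal to CLOUD-ML and is not reproduced, and rejecting a request that mixes both methods is a choice made for this example.

```python
# Default stated in the text: the CPU image of TensorFlow 1.0.0.
DEFAULT_FRAMEWORK = ("tensorflow", "1.0.0")


def select_image(image=None, framework=None, version=None):
    """Describe which image a task would use (hypothetical helper)."""
    if image and (framework or version):
        # Assumption of this sketch: the two methods are mutually exclusive.
        raise ValueError("specify either an image name or a framework/version, not both")
    if image:
        # Method 1: explicit image name, standard or user-defined.
        return {"method": "image", "image": image}
    if framework or version:
        if not (framework and version):
            raise ValueError("framework name and version must be given together")
        # Method 2: framework and version, standard platform images only.
        return {"method": "framework", "framework": framework, "version": version}
    name, default_version = DEFAULT_FRAMEWORK
    return {"method": "default", "framework": name, "version": default_version}


if __name__ == "__main__":
    print(select_image())
    print(select_image(image="cr.d.xiaomi.net/cloud-ml/train-caffe2-cpu:0.6.0"))
    print(select_image(framework="caffe2", version="0.6.0"))
```

A user-defined image can be checked beforehand by running "docker pull" on its name from a machine with access to the registry.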