Use front-end and back-end commands

Brief introduction

Xiaomi Cloud-ML supports front-end and back-end commands, which let users run custom shell commands before and after training.

Typical uses include downloading data, initializing Kerberos credentials, and mounting FUSE directories.

Usage example

Pass the commands with the command-line parameters -pc and -fc when submitting a job. Afterwards, view the training log to verify whether the commands executed successfully.

cloudml jobs submit -n linear -m trainer.task -u fds://cloud-ml/trainer-1.0.0.tar.gz -pc "ls /tmp" -fc "ls /tmp"
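To make the commands easier to find in the training log, each one can be wrapped in marker lines. The following is a minimal sketch, not part of the cloudml CLI: wrap_with_markers is a hypothetical helper, and it assumes that the output of the front-end and back-end commands ends up in the training log. If the "done" line is missing from the log, the wrapped command did not finish successfully.

```shell
#!/bin/sh
# Hypothetical helper (not part of the cloudml CLI): wrap a command so that it
# prints a start line before running and a done line only if it succeeds.
wrap_with_markers() {
    label=$1   # e.g. "pre" or "post"
    cmd=$2     # the shell command to wrap
    printf "echo '[%s] start' && %s && echo '[%s] done'" "$label" "$cmd" "$label"
}

pre_cmd=$(wrap_with_markers pre "ls /tmp")
post_cmd=$(wrap_with_markers post "ls /tmp")

# Dry run: print the strings that would be passed to -pc and -fc.
printf '%s\n' "-pc: $pre_cmd"
printf '%s\n' "-fc: $post_cmd"

# Actual submission, using the same job parameters as the example above:
# cloudml jobs submit -n linear -m trainer.task \
#     -u fds://cloud-ml/trainer-1.0.0.tar.gz -pc "$pre_cmd" -fc "$post_cmd"
```

Searching the training log for "[pre]" and "[post]" then shows at a glance whether each command started and completed.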

Access HDFS

Using the front-end command, we can authenticate against a secure (Kerberos-enabled) HDFS cluster. Before doing so, make sure that the configuration files of the Hadoop cluster are already in the container image and that the corresponding training data has been uploaded to HDFS. Then submit the job with the following command.

cloudml jobs submit -n deep -m trainer.task -u fds://cloud-ml/trainer-1.0.tar.gz -pc "echo rdKxxxxxxTrnyYU | kinit u_chendihao@XIAOMI.HADOOP" -a "--train_file hdfs://namenode:port/deep_recommend_system/data/cancer_train.csv.tfrecords --validate_file hdfs://namenode:port/deep_recommend_system/data/cancer_test.csv.tfrecords"
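The front-end command can also check its own prerequisites before training starts. The sketch below builds such a command as a string. build_hdfs_pre_cmd is a hypothetical helper, and the sketch assumes that kinit, klist, and the hdfs client are all installed in the container image. Whether a failing front-end command stops the job is not stated here, so treat the extra checks as log output to inspect rather than as a guaranteed safeguard.

```shell
#!/bin/sh
# Hypothetical helper: build a front-end command that obtains a Kerberos
# ticket, lists it with klist, and checks that the training file exists on HDFS.
# Assumes the password contains no shell metacharacters.
build_hdfs_pre_cmd() {
    password=$1
    principal=$2
    train_file=$3
    printf "echo %s | kinit %s && klist && hdfs dfs -test -e %s" \
        "$password" "$principal" "$train_file"
}

# Placeholder values copied from the example above.
pre_cmd=$(build_hdfs_pre_cmd \
    "rdKxxxxxxTrnyYU" \
    "u_chendihao@XIAOMI.HADOOP" \
    "hdfs://namenode:port/deep_recommend_system/data/cancer_train.csv.tfrecords")

# Dry run: print the string that would be passed to -pc.
printf '%s\n' "$pre_cmd"
```

The klist output in the training log confirms that a ticket was issued, and hdfs dfs -test -e returns a non-zero status if the file is missing.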

Mount FDSFuse

We can also use FDSFuse to map the FDS data to the local container before executing training tasks, so that all Cloud-ML frameworks can access distributed storage as if it were local.

cloudml jobs submit -n deep -m trainer.task -u fds://cloud-ml/trainer-1.0.tar.gz -pc "export XIAOMI_ACCESS_KEY_ID=AKJUDVxxxxxxxx43UI && export XIAOMI_SECRET_ACCESS_KEY=15xzSfTO2qYMmxxxxxxxxxxxxxsUbx96959ky && export XIAOMI_FDS_ENDPOINT=cnbj1-fds.api.xiaomi.net && fdsfuse testfdsfuse /fds -o use_cache=/fdscache" -a "--train_file=/fds/deep_recommend_system/data/cancer/cancer_train.csv.tfrecords --validate_file=/fds/deep_recommend_system/data/cancer/cancer_test.csv.tfrecords"
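Because the whole front-end command is passed as one quoted string, long commands are easy to get wrong on a single line. One way to keep the quoting manageable is to assemble the string in a shell variable first. This is only a sketch: the variable names ACCESS_KEY, SECRET_KEY, ENDPOINT, and pre_cmd are illustrative, and the values are the placeholders from the example above.

```shell
#!/bin/sh
# Placeholder credentials copied from the example above; replace with your own.
ACCESS_KEY="AKJUDVxxxxxxxx43UI"
SECRET_KEY="15xzSfTO2qYMmxxxxxxxxxxxxxsUbx96959ky"
ENDPOINT="cnbj1-fds.api.xiaomi.net"

# Build the front-end command step by step. The values are expanded here, so
# the resulting string contains no nested double quotes.
pre_cmd="export XIAOMI_ACCESS_KEY_ID=$ACCESS_KEY"
pre_cmd="$pre_cmd && export XIAOMI_SECRET_ACCESS_KEY=$SECRET_KEY"
pre_cmd="$pre_cmd && export XIAOMI_FDS_ENDPOINT=$ENDPOINT"
pre_cmd="$pre_cmd && fdsfuse testfdsfuse /fds -o use_cache=/fdscache"

# Dry run: print the string that would be passed to -pc.
printf '%s\n' "$pre_cmd"

# Actual submission, with the same job parameters as the example above:
# cloudml jobs submit -n deep -m trainer.task -u fds://cloud-ml/trainer-1.0.tar.gz \
#     -pc "$pre_cmd" \
#     -a "--train_file=/fds/deep_recommend_system/data/cancer/cancer_train.csv.tfrecords --validate_file=/fds/deep_recommend_system/data/cancer/cancer_test.csv.tfrecords"
```

This approach only works as written when the values contain no spaces or shell metacharacters, which holds for the key and endpoint formats shown here.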

Mount s3fs

s3fs works much like FDSFuse. To access training data in AWS S3, you only need to replace the front-end command with the following.

echo AKIAJxxxxxxxxxxxxx5CTQ:j0m+Xwe8jBQyCxxxxxxxxxxxxxxxxxxAQ3piA4 > /tmp/passwd && chmod 600 /tmp/passwd && s3fs tobebucket /s3 -o passwd_file=/tmp/passwd

Parameters

  • -pc specifies the front-end command, which executes before training begins.
  • -fc specifies the back-end command, which executes after training has completed.
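The two parameters can be paired so that the back-end command cleans up whatever the front-end command set up. The sketch below reuses the s3fs mount shown earlier as the front-end command. The back-end command is an assumption rather than something Cloud-ML requires: it unmounts the FUSE directory with fusermount -u and deletes the credentials file, and it assumes fusermount is available in the container image.

```shell
#!/bin/sh
# Front-end command: the s3fs mount from the section above (placeholder credentials).
pre_cmd="echo AKIAJxxxxxxxxxxxxx5CTQ:j0m+Xwe8jBQyCxxxxxxxxxxxxxxxxxxAQ3piA4 > /tmp/passwd && chmod 600 /tmp/passwd && s3fs tobebucket /s3 -o passwd_file=/tmp/passwd"

# Back-end command (an assumption, not taken from the Cloud-ML docs): unmount
# the FUSE directory and delete the credentials file once training has finished.
post_cmd="fusermount -u /s3 && rm -f /tmp/passwd"

# Dry run: print the strings that would be passed to -pc and -fc.
printf 'pre:  %s\npost: %s\n' "$pre_cmd" "$post_cmd"

# Both strings would then be passed to the same submission:
# cloudml jobs submit ... -pc "$pre_cmd" -fc "$post_cmd"
```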