TrainJob process

Usage

To use TrainJob, first initialize the Xiaomi Cloud-ML client environment.

cloudml init

Then package the model code and upload it to FDS.

mkdir trainer

touch trainer/__init__.py

curl "https://raw.githubusercontent.com/XiaoMi/cloud-ml-sdk/master/cloud_ml_samples/tensorflow/linear_regression/trainer/task.py" > trainer/task.py

cat << EOF > setup.py
import setuptools
setuptools.setup(name='trainer', version='1.0', packages=['trainer'])
EOF

python setup.py sdist --format=gztar
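Before uploading, it can be useful to confirm that the archive actually contains the trainer module. The following is a minimal sketch using Python's standard-library tarfile module; the helper name sdist_contains is hypothetical and not part of Cloud-ML.

```python
import tarfile


def sdist_contains(archive_path, suffix):
    """Return True if any member of the gzipped tar archive ends with `suffix`."""
    with tarfile.open(archive_path, "r:gz") as archive:
        # Members of an sdist are prefixed with "<name>-<version>/",
        # so match on the path suffix rather than the full name.
        return any(name.endswith(suffix) for name in archive.getnames())
```

For example, sdist_contains("dist/trainer-1.0.tar.gz", "trainer/task.py") should return True, assuming sdist wrote the archive to its default dist directory.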

Finally, submit the training task with the cloudml command.

cloudml jobs submit -n linear -m trainer.task -u fds://cloud-ml/linear/trainer-1.0.tar.gz -a "--model_path fds://cloud-ml/linear_model --output_path fds://cloud-ml/linear_tensorboard"
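When submissions are scripted, the same command line can be assembled programmatically. The sketch below only builds the argument list from the flags used in the command above and does not call Cloud-ML itself; the function name build_submit_command is hypothetical.

```python
import shlex


def build_submit_command(name, module, package_uri, job_args=None):
    """Assemble the `cloudml jobs submit` argument list from its four flags."""
    command = ["cloudml", "jobs", "submit",
               "-n", name, "-m", module, "-u", package_uri]
    if job_args:
        # -a receives all user-defined arguments as one single string.
        command += ["-a", job_args]
    return command


def to_shell_string(command):
    """Render the argument list as a safely quoted shell command (Python 3.8+)."""
    return shlex.join(command)
```

Keeping the command as a list avoids quoting mistakes around the -a string, which itself contains spaces. The list can be passed to subprocess.run directly, or rendered with to_shell_string for logging.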

Once the training task has been submitted, you can immediately view its status and log information.

cloudml jobs events linear

cloudml jobs logs linear

Parameters

  • -n is a mandatory parameter that sets the name of the task.
  • -m is a mandatory parameter that must match the name of the Python module in the user's package.
  • -u is a mandatory parameter that must match the FDS path to which the user uploaded the package.
  • -a is an optional parameter that lets users pass any user-defined arguments to the task at submission time.
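On the receiving side, the training module has to read the arguments supplied through -a. The sketch below assumes that they reach the module as ordinary command-line arguments and parses them with argparse. The function name parse_job_args and the default values are hypothetical, and the sample task.py may use a different mechanism.

```python
import argparse


def parse_job_args(argv=None):
    """Parse the user-defined arguments forwarded through -a."""
    parser = argparse.ArgumentParser(description="Hypothetical trainer entry point")
    # Flag names mirror the submit example; the defaults are placeholders.
    parser.add_argument("--model_path", default="./model",
                        help="where the trained model is written")
    parser.add_argument("--output_path", default="./tensorboard",
                        help="where summaries for TensorBoard are written")
    # parse_known_args ignores flags this parser does not define
    # instead of exiting with an error.
    args, _unknown = parser.parse_known_args(argv)
    return args
```

Calling parse_job_args() with no argument reads sys.argv, so the same function works both when the module is run by the platform and when it is tested locally with an explicit list.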

Additional features

TrainJob also supports features such as GPU training and hyper-parameter auto-tuning; see the later documents for details.