  sagemaker-containers 2.6.0

    https://pypi.org/project/sagemaker-containers/

    Project description
    SageMaker Containers
    SageMaker Containers gives you tools to create SageMaker-compatible Docker containers, and has additional tools for letting you create Frameworks (SageMaker-compatible Docker containers that can run arbitrary Python or shell scripts).
    
    Currently, this library is used by the following containers: TensorFlow Script Mode, MXNet, PyTorch, Chainer, and Scikit-learn.
    
    Contents
    
    SageMaker Containers
    Getting Started
    Creating a container using SageMaker Containers
    The Dockerfile
    Building the container
    Training with Local Mode
    How a script is executed inside the container
    Mapping hyperparameters to script arguments
    Reading additional information from the container
    IMPORTANT ENVIRONMENT VARIABLES
    SM_MODEL_DIR
    SM_CHANNELS
    SM_CHANNEL_{channel_name}
    SM_HPS
    SM_HP_{hyperparameter_name}
    SM_CURRENT_HOST
    SM_HOSTS
    SM_NUM_GPUS
    List of provided environment variables by SageMaker Containers
    SM_NUM_CPUS
    SM_LOG_LEVEL
    SM_NETWORK_INTERFACE_NAME
    SM_USER_ARGS
    SM_INPUT_DIR
    SM_INPUT_CONFIG_DIR
    SM_OUTPUT_DATA_DIR
    SM_RESOURCE_CONFIG
    SM_INPUT_DATA_CONFIG
    SM_TRAINING_ENV
    Getting Started
    Creating a container using SageMaker Containers
    Here we’ll demonstrate how to create a Docker image using SageMaker Containers in order to show the simplicity of using this library.
    
    Let’s suppose we need to train a model with the following training script train.py using TF 2.0 in SageMaker:
    
    import tensorflow as tf
    
    mnist = tf.keras.datasets.mnist
    
    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0
    
    model = tf.keras.models.Sequential([
      tf.keras.layers.Flatten(input_shape=(28, 28)),
      tf.keras.layers.Dense(128, activation='relu'),
      tf.keras.layers.Dropout(0.2),
      tf.keras.layers.Dense(10, activation='softmax')
    ])
    
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    
    model.fit(x_train, y_train, epochs=1)
    
    model.evaluate(x_test, y_test)
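    The script above trains and evaluates the model but never saves it. A minimal, hypothetical addition that persists the trained model to the directory SageMaker exposes as SM_MODEL_DIR (described later in this document) could look like this:

    import os

    # assumption: SM_MODEL_DIR is set by SageMaker Containers; fall back to the default path
    model_dir = os.environ.get('SM_MODEL_DIR', '/opt/ml/model')
    model.save(os.path.join(model_dir, 'model.h5'))  # save the trained Keras model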
    The Dockerfile
    We then create a Dockerfile with our dependencies and define the program that will be executed in SageMaker:
    
    FROM tensorflow/tensorflow:2.0.0a0
    
    RUN pip install sagemaker-containers
    
    # Copies the training code inside the container
    COPY train.py /opt/ml/code/train.py
    
    # Defines train.py as script entry point
    ENV SAGEMAKER_PROGRAM train.py
    More documentation on how to build a Docker container can be found here
    
    Building the container
    We then build the Docker image using docker build:
    
    docker build -t tf-2.0 .
    Training with Local Mode
    We can use Local Mode to test the container locally:
    
    from sagemaker.estimator import Estimator
    
    estimator = Estimator(image_name='tf-2.0',
                          role='SageMakerRole',
                          train_instance_count=1,
                          train_instance_type='local')
    
    estimator.fit()
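    Local Mode can also read training data from the local file system instead of S3 by passing file:// URIs to fit (the path below is hypothetical):

    # assumption: the training data lives in /tmp/my-training-data on the local machine
    estimator.fit({'training': 'file:///tmp/my-training-data'})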
    After using Local Mode, we can push the image to ECR and run a SageMaker training job. To see a complete example of how to create a container using SageMaker Containers, including pushing it to ECR, see the example notebook tensorflow_bring_your_own.ipynb.
    
    How a script is executed inside the container
    The training script must be located under the folder /opt/ml/code and its relative path is defined in the environment variable SAGEMAKER_PROGRAM. The following scripts are supported:
    
    Python scripts: any script with a .py suffix is executed with the Python interpreter
    Shell scripts: any other script is executed with the shell interpreter
    When training starts, the interpreter executes the entry point. For the example above, that is:
    
    python train.py
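    Conceptually, the dispatch picks an interpreter based on the file suffix. A simplified illustration (not the library's actual implementation) might look like this:

    import os
    import subprocess

    def run_entry_point(code_dir='/opt/ml/code'):
        # SAGEMAKER_PROGRAM holds the entry point path relative to /opt/ml/code
        entry_point = os.environ['SAGEMAKER_PROGRAM']
        script = os.path.join(code_dir, entry_point)
        if entry_point.endswith('.py'):
            cmd = ['python', script]   # Python scripts run with the Python interpreter
        else:
            cmd = ['/bin/sh', script]  # any other script runs with the shell interpreter
        subprocess.check_call(cmd)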
    Mapping hyperparameters to script arguments
    Any hyperparameters provided by the training job are passed by the interpreter to the entry point as script arguments. For example, the training job hyperparameters:
    
    {"HyperParameters": {"batch-size": 256, "learning-rate": 0.0001, "communicator": "pure_nccl"}}
    Will be executed as:
    
    ./user_script.sh --batch-size 256 --learning-rate 0.0001 --communicator pure_nccl
    The entry point is responsible for parsing these script arguments. For example, in a Python script:
    
    import argparse
    
    if __name__ == '__main__':
      parser = argparse.ArgumentParser()
    
    parser.add_argument('--learning-rate', type=float, default=1)
      parser.add_argument('--batch-size', type=int, default=64)
      parser.add_argument('--communicator', type=str)
      parser.add_argument('--frequency', type=int, default=20)
    
      args = parser.parse_args()
      ...
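    For intuition, the translation from the hyperparameter dictionary to the argument list shown above is roughly the following (a simplified sketch, not the library's exact code):

    def hyperparameters_to_args(hyperparameters):
        # {"batch-size": 256} becomes ["--batch-size", "256"]
        args = []
        for name, value in hyperparameters.items():
            args += ['--' + name, str(value)]
        return args

    hyperparameters_to_args({'batch-size': 256, 'learning-rate': 0.0001})
    # ['--batch-size', '256', '--learning-rate', '0.0001']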
    Reading additional information from the container
    Very often, an entry point needs additional information from the container that is not available in hyperparameters. SageMaker Containers writes this information as environment variables that are available inside the script. For example, the training job below includes the channels training and testing:
    
    from sagemaker.pytorch import PyTorch
    
    estimator = PyTorch(entry_point='train.py', ...)
    
    estimator.fit({'training': 's3://bucket/path/to/training/data',
                   'testing': 's3://bucket/path/to/testing/data'})
    The environment variable SM_CHANNEL_{channel_name} provides the path where the channel is located:
    
    import argparse
    import os
    
    if __name__ == '__main__':
      parser = argparse.ArgumentParser()
    
      ...
    
      # reads input channels training and testing from the environment variables
      parser.add_argument('--training', type=str, default=os.environ['SM_CHANNEL_TRAINING'])
      parser.add_argument('--testing', type=str, default=os.environ['SM_CHANNEL_TESTING'])
    
      args = parser.parse_args()
      ...
    When training starts, SageMaker Containers will print all available environment variables.
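    You can also inspect them yourself from inside the entry point, for example:

    import os

    # print every SageMaker-provided environment variable visible to the script
    for name, value in sorted(os.environ.items()):
        if name.startswith('SM_'):
            print('{}={}'.format(name, value))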
    
    IMPORTANT ENVIRONMENT VARIABLES
    These environment variables are those that you’re likely to use when writing a user script. A full list of environment variables is given below.
    
    SM_MODEL_DIR
    SM_MODEL_DIR=/opt/ml/model
    When the training job finishes, the container and its file system are deleted, with the exception of the /opt/ml/model and /opt/ml/output folders. Use /opt/ml/model to save the model checkpoints. These checkpoints will be uploaded to the default S3 bucket. Usage example:
    
    import os
    
    # using it in argparse
    parser.add_argument('--model-dir', type=str, default=os.environ['SM_MODEL_DIR'])
    
    # using it as variable
    model_dir = os.environ['SM_MODEL_DIR']
    
    # saving checkpoints to model dir in chainer
    serializers.save_npz(os.path.join(os.environ['SM_MODEL_DIR'], 'model.npz'), model)
    For more information, see: How Amazon SageMaker Processes Training Output.
    
    SM_CHANNELS
    SM_CHANNELS='["testing","training"]'
    Contains the list of input data channels in the container.
    
    When you run training, you can partition your training data into different logical “channels”. Depending on your problem, some common channel ideas are: “training”, “testing”, “evaluation” or “images” and “labels”.
    
    SM_CHANNELS includes the name of the available channels in the container as a JSON encoded list. Usage example:
    
    import os
    import json
    
    # using it in argparse
    parser.add_argument('--channel-names', default=json.loads(os.environ['SM_CHANNELS']))
    
    # using it as variable
    channel_names = json.loads(os.environ['SM_CHANNELS'])
    SM_CHANNEL_{channel_name}
    SM_CHANNEL_TRAINING='/opt/ml/input/data/training'
    SM_CHANNEL_TESTING='/opt/ml/input/data/testing'
    Contains the directory where the channel named channel_name is located in the container. Usage examples:
    
    import os
    import numpy as np
    
    parser.add_argument('--train', type=str, default=os.environ['SM_CHANNEL_TRAINING'])
    parser.add_argument('--test', type=str, default=os.environ['SM_CHANNEL_TESTING'])
    
    
    args = parser.parse_args()
    
    train_file = np.load(os.path.join(args.train, 'train.npz'))
    test_file = np.load(os.path.join(args.test, 'test.npz'))
    SM_HPS
    SM_HPS='{"batch-size": "256", "learning-rate": "0.0001","communicator": "pure_nccl"}'
    Contains a JSON encoded dictionary with the user provided hyperparameters. Example usage:
    
    import os
    import json
    
    hyperparameters = json.loads(os.environ['SM_HPS'])
    # {"batch-size": "256", "learning-rate": "0.0001", "communicator": "pure_nccl"}
    SM_HP_{hyperparameter_name}
    SM_HP_LEARNING-RATE=0.0001
    SM_HP_BATCH-SIZE=10000
    SM_HP_COMMUNICATOR=pure_nccl
    Contains the value of the hyperparameter named hyperparameter_name. Usage examples:
    
    import os
    
    learning_rate = float(os.environ['SM_HP_LEARNING-RATE'])
    batch_size = int(os.environ['SM_HP_BATCH-SIZE'])
    communicator = os.environ['SM_HP_COMMUNICATOR']
    SM_CURRENT_HOST
    SM_CURRENT_HOST=algo-1
    The name of the current container on the container network. Usage example:
    
    import os
    
    # using it in argparse
    parser.add_argument('--current-host', type=str, default=os.environ['SM_CURRENT_HOST'])
    
    # using it as variable
    current_host = os.environ['SM_CURRENT_HOST']
    SM_HOSTS
    SM_HOSTS='["algo-1","algo-2"]'
    A JSON-encoded list containing all the hosts on the container network. Usage example:
    
    import os
    import json
    
    # using it in argparse
    parser.add_argument('--hosts', default=json.loads(os.environ['SM_HOSTS']))
    
    # using it as variable
    hosts = json.loads(os.environ['SM_HOSTS'])
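    A common pattern in distributed training, assuming both SM_HOSTS and SM_CURRENT_HOST are set, is to let the lexicographically first host act as the master:

    import json
    import os

    hosts = json.loads(os.environ['SM_HOSTS'])
    current_host = os.environ['SM_CURRENT_HOST']
    is_master = current_host == sorted(hosts)[0]  # e.g. 'algo-1' on a two-node cluster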
    SM_NUM_GPUS
    SM_NUM_GPUS=1
    The number of GPUs available in the current container. Usage example:
    
    import os
    
    # using it in argparse
    parser.add_argument('--num-gpus', type=int, default=os.environ['SM_NUM_GPUS'])
    
    # using it as variable
    num_gpus = int(os.environ['SM_NUM_GPUS'])
    List of provided environment variables by SageMaker Containers
    SM_NUM_CPUS
    SM_NUM_CPUS=32
    The number of CPUs available in the current container. Usage example:
    
    import os
    
    # using it in argparse
    parser.add_argument('--num-cpus', type=int, default=os.environ['SM_NUM_CPUS'])
    
    # using it as variable
    num_cpus = int(os.environ['SM_NUM_CPUS'])
    SM_LOG_LEVEL
    SM_LOG_LEVEL=20
    The current log level in the container. Usage example:
    
    import os
    import logging
    
    logger = logging.getLogger(__name__)
    
    logger.setLevel(int(os.environ.get('SM_LOG_LEVEL', logging.INFO)))
    SM_NETWORK_INTERFACE_NAME
    SM_NETWORK_INTERFACE_NAME=ethwe
    Name of the network interface, useful for distributed training. Usage example:
    
    import os
    
    # using it in argparse
    parser.add_argument('--network-interface', type=str, default=os.environ['SM_NETWORK_INTERFACE_NAME'])
    
    # using it as variable
    network_interface = os.environ['SM_NETWORK_INTERFACE_NAME']
    SM_USER_ARGS
    SM_USER_ARGS='["--batch-size","256","--learning_rate","0.0001","--communicator","pure_nccl"]'
    A JSON-encoded list with the script arguments that were provided for training.
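    A minimal usage sketch, assuming the variable is set:

    import json
    import os

    user_args = json.loads(os.environ['SM_USER_ARGS'])
    # e.g. ['--batch-size', '256', '--learning_rate', '0.0001', '--communicator', 'pure_nccl']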
    
    SM_INPUT_DIR
    SM_INPUT_DIR=/opt/ml/input/
    The path of the input directory, e.g. /opt/ml/input/. This is the directory where SageMaker saves input data and configuration files before and during training.
    
    SM_INPUT_CONFIG_DIR
    SM_INPUT_CONFIG_DIR=/opt/ml/input/config
    The path of the input configuration directory, e.g. /opt/ml/input/config/, where the standard SageMaker configuration files are located.
    
    SageMaker training creates the following files in this folder when training starts:
    
    hyperparameters.json: Amazon SageMaker makes the hyperparameters in a CreateTrainingJob request available in this file.
    inputdataconfig.json: You specify data channel information in the InputDataConfig parameter in a CreateTrainingJob request. Amazon SageMaker makes this information available in this file.
    resourceconfig.json: the name of the current host and of all host containers in the training job.
    More information about these files can be found here: https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-training-algo.html
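    For example, the raw hyperparameters can be read straight from this directory (equivalent in content to SM_HPS); a minimal sketch:

    import json
    import os

    config_dir = os.environ.get('SM_INPUT_CONFIG_DIR', '/opt/ml/input/config')
    with open(os.path.join(config_dir, 'hyperparameters.json')) as f:
        hyperparameters = json.load(f)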
    
    SM_OUTPUT_DATA_DIR
    SM_OUTPUT_DATA_DIR=/opt/ml/output/data/algo-1
    The directory to which non-model training artifacts (e.g. evaluation results) should be written, e.g. /opt/ml/output/data. These artifacts are retained by SageMaker.
    
    As your algorithm runs in a container, it generates output, including the status of the training job and model and output artifacts. Your algorithm should write this information to this directory.
    
    SM_RESOURCE_CONFIG
    SM_RESOURCE_CONFIG='{"current_host":"algo-1","hosts":["algo-1","algo-2"]}'
    The contents from /opt/ml/input/config/resourceconfig.json. It has the following keys:
    
    current_host: The name of the current container on the container network. For example, 'algo-1'.
    hosts: The list of names of all containers on the container network, sorted lexicographically. For example, ['algo-1', 'algo-2', 'algo-3'] for a three-node cluster.
    For more information about resourceconfig.json: https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-training-algo.html#your-algorithms-training-algo-running-container-dist-training
    
    SM_INPUT_DATA_CONFIG
    SM_INPUT_DATA_CONFIG='{
        "testing": {
            "RecordWrapperType": "None",
            "S3DistributionType": "FullyReplicated",
            "TrainingInputMode": "File"
        },
        "training": {
            "RecordWrapperType": "None",
            "S3DistributionType": "FullyReplicated",
            "TrainingInputMode": "File"
        }
    }'
    Input data configuration from /opt/ml/input/config/inputdataconfig.json.
    
    For more information about inputdataconfig.json: https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-training-algo.html#your-algorithms-training-algo-running-container-dist-training
    
    SM_TRAINING_ENV
    SM_TRAINING_ENV='
    {
        "channel_input_dirs": {
            "test": "/opt/ml/input/data/testing",
            "train": "/opt/ml/input/data/training"
        },
        "current_host": "algo-1",
        "framework_module": "sagemaker_chainer_container.training:main",
        "hosts": [
            "algo-1",
            "algo-2"
        ],
        "hyperparameters": {
            "batch-size": 10000,
            "epochs": 1
        },
        "input_config_dir": "/opt/ml/input/config",
        "input_data_config": {
            "test": {
                "RecordWrapperType": "None",
                "S3DistributionType": "FullyReplicated",
                "TrainingInputMode": "File"
            },
            "train": {
                "RecordWrapperType": "None",
                "S3DistributionType": "FullyReplicated",
                "TrainingInputMode": "File"
            }
        },
        "input_dir": "/opt/ml/input",
        "job_name": "preprod-chainer-2018-05-31-06-27-15-511",
        "log_level": 20,
        "model_dir": "/opt/ml/model",
        "module_dir": "s3://sagemaker-{aws-region}-{aws-id}/{training-job-name}/source/sourcedir.tar.gz",
        "module_name": "user_script",
        "network_interface_name": "ethwe",
        "num_cpus": 4,
        "num_gpus": 1,
        "output_data_dir": "/opt/ml/output/data/algo-1",
        "output_dir": "/opt/ml/output",
        "resource_config": {
            "current_host": "algo-1",
            "hosts": [
                "algo-1",
                "algo-2"
            ]
        }
    }'
    Provides the entire training information as a JSON-encoded dictionary.
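    This is convenient when a script wants all of the settings above in one place. A minimal usage sketch:

    import json
    import os

    env = json.loads(os.environ['SM_TRAINING_ENV'])
    model_dir = env['model_dir']              # same value as SM_MODEL_DIR
    hyperparameters = env['hyperparameters']  # same content as SM_HPS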