  • OpenStack Snapshot Analysis (1): Offline Snapshots of Instances Booted from an Image

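    In OpenStack, snapshotting an instance actually turns the instance's disk into a new image, so the operation is effectively an image creation. It can be triggered from the dashboard or from the command line, both of which call the corresponding API. The basic flow of creating a snapshot is: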

    . Obtain a token (token API)

    . Query the instance status (query API)

    . Create the instance snapshot

    A snapshot can be started from the OpenStack Dashboard or with the nova CLI. The command format is:

    nova image-create {server} {name}

    The following command snapshots the instance with id=814a8ad8-9217-4c45-91c7-c2be2016e5da and names the snapshot snapshot1:

    nova image-create 814a8ad8-9217-4c45-91c7-c2be2016e5da snapshot1

    The corresponding API can also be called directly with curl:

    curl -i http://186.100.8.214:8774/v2/814a8ad8-9217-4c45-91c7-c2be2016e5da/servers/6c2504f4-efa-47ec-b6f4-06a9fde8a00b/action -X POST -H "X-Auth-Project-Id: admin" -H "User-Agent: python-novaclient" -H "Content-Type: application/json" -H "Accept: application/json" -H "X-Auth-Token: " -d '{"createImage": {"name": " snapshot1", "metadata": {}}}'

    As the request body shows, the keyword for the snapshot operation is createImage.
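To make the shape of that request easier to see, here is a minimal Python sketch that only assembles the URL, headers and JSON body of a createImage action, without sending anything. The helper name build_create_image_request and the placeholder endpoint, project id, server id and token are assumptions made for illustration; they do not come from the original environment.

```python
import json


def build_create_image_request(endpoint, project_id, server_id, token,
                               name, metadata=None):
    """Return (url, headers, body) for a createImage server action.

    Hypothetical helper: it only formats the request and does no I/O.
    """
    url = "{}/v2/{}/servers/{}/action".format(
        endpoint.rstrip("/"), project_id, server_id)
    headers = {
        "Content-Type": "application/json",
        "Accept": "application/json",
        "X-Auth-Token": token,
    }
    # The action keyword "createImage" is the top-level key of the body.
    body = json.dumps(
        {"createImage": {"name": name, "metadata": metadata or {}}})
    return url, headers, body


# Placeholder values, not taken from a real deployment.
url, headers, body = build_create_image_request(
    "http://nova.example:8774", "PROJECT_ID", "SERVER_ID", "TOKEN",
    "snapshot1")
print(url)
print(body)
```

Any HTTP client can then POST the body to the URL; the sketch stops short of that so it runs without a cloud.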

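    Offline snapshot of an instance booted from an image

    1.1. nova-api

    From the API routes that nova-api exposes, together with the curl call above, it is easy to see that the entry point of the snapshot operation is nova/api/openstack/compute/servers.py/ServersController._action_create_image. The code is as follows: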

    @wsgi.response(202)
    @extensions.expected_errors((400, 403, 404, 409))
    @wsgi.action('createImage')
    @common.check_snapshots_enabled
    @validation.schema(schema_servers.create_image, '2.0', '2.0')
    @validation.schema(schema_servers.create_image, '2.1')
    def _action_create_image(self, req, id, body):
        """Snapshot a server instance.

        Input parameters in this example:
        req = Request object carrying the context of this request
        id = 814a8ad8-9217-4c45-91c7-c2be2016e5da
        body = {u'createImage': {u'name': u'snapshot1', u'metadata': {}}}
        """
        # Get the request context and run the policy check
        context = req.environ['nova.context']
        context.can(server_policies.SERVERS % 'create_image')

        # Read the snapshot name and related attributes from the body
        entity = body["createImage"]
        image_name = common.normalize_name(entity["name"])
        metadata = entity.get('metadata', {})
        snapshot_id = entity.get("snapshot_id", None)

        # Starting from microversion 2.39 we don't check quotas on createImage
        if api_version_request.is_supported(req, max_version=api_version_request.MAX_IMAGE_META_PROXY_API_VERSION):
            common.check_img_metadata_properties_quota(context, metadata)

        # Load the instance from the nova database, including metadata,
        # system_metadata, security_groups, info_cache, flavor and
        # pci_devices, and return an Instance V2 object
        instance = self._get_server(context, req, id)

        snapshot = snapshot_current(context, instance, self.compute_rpcapi)
        if snapshot:  # if there are snapshots, then create an image with snapshots.
            if not snapshot_id:
                snapshot_id = snapshot["id"]
            image = snapshot_create_image(context, snapshot_id, instance,
                                          self.compute_rpcapi, entity)
        else:
            # Load all block devices attached to the instance from the
            # database; returns a BlockDeviceMappingList object
            bdms = objects.BlockDeviceMappingList.get_by_instance_uuid(context, instance.uuid)
            try:
                # If the root disk is a volume, the instance was booted
                # from a volume
                if compute_utils.is_volume_backed_instance(context, instance, bdms):
                    context.can(server_policies.SERVERS %
                                'create_image:allow_volume_backed')
                    image = self.compute_api.snapshot_volume_backed(
                        context,
                        instance,
                        image_name,
                        extra_properties=metadata
                    )
                else:
                    # Image-booted instances take this branch, which calls
                    # snapshot() of nova/compute/api.py/API
                    image = self.compute_api.snapshot(context, instance, image_name,
                                                      extra_properties=metadata)
            except exception.InstanceUnknownCell as e:
                raise exc.HTTPNotFound(explanation=e.format_message())
            except exception.InstanceInvalidState as state_error:
                common.raise_http_conflict_for_instance_invalid_state(
                    state_error, 'createImage', id)
            except exception.Invalid as err:
                raise exc.HTTPBadRequest(explanation=err.format_message())
            except exception.OverQuota as e:
                raise exc.HTTPForbidden(explanation=e.format_message())

        # Starting with microversion 2.45 we return a response body containing
        # the snapshot image id without the Location header.
        if api_version_request.is_supported(req, '2.45'):
            return {'image_id': image['id']}

        # build location of newly-created image entity
        image_id = str(image['id'])
        # Build the image URL from the glance configuration; in my example:
        # http://$glance_host:$glance_port/images/'ffb841fd-d5f8-4146-bb29-b12eb5bbf6b2'
        image_ref = glance.generate_image_url(image_id)

        resp = webob.Response(status_int=202)
        resp.headers['Location'] = image_ref
        return resp
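The tail of this handler returns two different response shapes depending on the requested microversion. The following is a minimal sketch of that branching with plain dicts instead of webob objects; the function name create_image_response and the glance base URL are assumptions for illustration only.

```python
def create_image_response(microversion, image_id, glance_url):
    """Mimic the two response shapes of the createImage action.

    microversion is a (major, minor) tuple such as (2, 45).
    Hypothetical helper that returns a plain dict, not a webob.Response.
    """
    if microversion >= (2, 45):
        # 2.45 and later: image id in the body, no Location header.
        return {"status": 202, "headers": {}, "body": {"image_id": image_id}}
    # Before 2.45: empty body, image URL in the Location header.
    location = "{}/images/{}".format(glance_url.rstrip("/"), image_id)
    return {"status": 202, "headers": {"Location": location}, "body": None}


old = create_image_response((2, 1), "ffb841fd", "http://glance.example:9292")
new = create_image_response((2, 45), "ffb841fd", "http://glance.example:9292")
print(old["headers"])
print(new["body"])
```

Clients that parse the Location header to find the new image therefore need a different code path once they request 2.45 or later.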

    For an image-booted instance, the snapshot request then reaches the API.snapshot method in nova/compute/api.py. The code is analyzed below:

    @check_instance_cell
    @check_instance_state(vm_state=[vm_states.ACTIVE, vm_states.STOPPED,
                                    vm_states.PAUSED, vm_states.SUSPENDED])
    def snapshot(self, context, instance, name, extra_properties=None):
        """Snapshot the given instance.

        :param context: request context
        :param instance: InstanceV2 object
        :param name: snapshot name, 'snapshot1' in this example
        :param extra_properties: dict of extra image properties to include
                                 when creating the image ({} in this example).
        :returns: A dict containing image metadata
        """
        # _create_image adds one entry of type 'snapshot' to the glance
        # database (images table); every item in properties becomes one row
        # in the image_properties table. The returned image_meta looks like:
        # {
        #  'status': u'queued',
        #  'name': u'snapshot1',
        #  'deleted': False,
        #  'container_format': u'bare',
        #  'created_at': datetime.datetime(2018,9,26,7,26,29,tzinfo=<iso8601.Utc>),
        #  'disk_format': u'raw',
        #  'updated_at': datetime.datetime(2018,9,26,7,26,29,tzinfo=<iso8601.Utc>),
        #  'id': u'ffb841fd-d5f8-4146-bb29-b12eb5bbf6b2',
        #  'owner': u'25520b29dce346d38bc4b055c5ffbfcb',
        #  'min_ram': 0,
        #  'checksum': None,
        #  'min_disk': 20,
        #  'is_public': False,
        #  'deleted_at': None,
        #  'properties': {
        #      u'image_type': u'snapshot',
        #      u'instance_uuid': u'814a8ad8-9217-4c45-91c7-c2be2016e5da',
        #      u'user_id': u'b652f9bd65844f739684a20ed77e9a0f',
        #      u'base_image_ref': u'e0cc468f-6501-4a85-9b19-70e782861387'
        #  },
        #  'size': 0
        # }

        # Call the glance API to create the image entry, in preparation for
        # uploading the snapshot as an image later. Both regular images and
        # snapshots can be uploaded to glance and used to boot instances; to
        # tell them apart, glance marks them with different types:
        # type=image and type=snapshot
        image_meta = self._create_image(context, instance, name, 'snapshot',
                                        extra_properties=extra_properties)

        # NOTE(comstud): Any changes to this method should also be made
        # to the snapshot_instance() method in nova/cells/messaging.py
        # Set the instance task state to "image snapshot pending"
        instance.task_state = task_states.IMAGE_SNAPSHOT_PENDING
        # Exception handling omitted here
        instance.save(expected_task_state=[None])

        # Cast the 'snapshot_instance' message onto the message queue via
        # RPC; nova-compute receives it and does the actual work
        self.compute_rpcapi.snapshot_instance(context, instance, image_meta['id'])

        return image_meta

    When execution reaches self.compute_rpcapi.snapshot_instance(context, instance, image_meta['id']), an RPC call sends a snapshot message to the message queue. The RPC code is:

    def snapshot_instance(self, ctxt, instance, image_id):
        version = '4.0'
        cctxt = self.router.client(ctxt).prepare(
            server=_compute_host(None, instance), version=version)
        cctxt.cast(ctxt, 'snapshot_instance',
                   instance=instance,
                   image_id=image_id)

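    To recap the flow so far:

    1. The user issues a create snapshot request;

    2. The nova-api service receives the request and does the preliminary processing, i.e. the snapshot method in the compute API;

    3. The real snapshot work has to run on a nova-compute node, so nova-api must send a message to nova-compute.

    An OpenStack environment usually has many nova-compute services, so server=_compute_host(None, instance) is used to look up the host the instance runs on, and the message is sent to that host only.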
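The routing idea in step 3 can be pictured with a toy model: one queue per compute host, and a fire-and-forget cast that enqueues the message only on the queue of the host that owns the instance. This is a sketch built on the standard library, not on oslo.messaging; host_queues, cast and the host names are hypothetical.

```python
import queue

# Hypothetical in-memory stand-in for per-host message queues.
host_queues = {"compute-1": queue.Queue(), "compute-2": queue.Queue()}


def cast(host, method, **kwargs):
    """Enqueue a message for one host and return without waiting for a reply."""
    host_queues[host].put({"method": method, "args": kwargs})


# The instance record tells us which host currently runs the instance.
instance = {"uuid": "814a8ad8-9217-4c45-91c7-c2be2016e5da",
            "host": "compute-2"}
cast(instance["host"], "snapshot_instance",
     instance=instance["uuid"],
     image_id="ffb841fd-d5f8-4146-bb29-b12eb5bbf6b2")

# Only the owning host sees the message; the other queue stays empty.
message = host_queues["compute-2"].get_nowait()
print(message["method"])
```

Because a cast does not wait for a result, nova-api can return 202 to the user while the compute host is still working on the snapshot.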

    1.2. nova-compute

    After nova-compute receives the "snapshot_instance" request sent by nova-api, it handles the request in nova/compute/manager.py/ComputeManager.snapshot_instance, shown below:

    @wrap_exception()
    @reverts_task_state
    @wrap_instance_fault
    @delete_image_on_error
    def snapshot_instance(self, context, image_id, instance):
        """Snapshot an instance on this host.

        :param context: security context
        :param image_id: glance.db.sqlalchemy.models.Image.Id
        :param instance: a nova.objects.instance.Instance object

        The method is simple: it sets the instance task state and then
        hands the request over to _snapshot_instance.
        """
        try:
            # Set the instance task state to "snapshotting"
            instance.task_state = task_states.IMAGE_SNAPSHOT
            instance.save(
                expected_task_state=task_states.IMAGE_SNAPSHOT_PENDING)
        except exception.InstanceNotFound:
            # possibility instance no longer exists, no point in continuing
            LOG.debug("Instance not found, could not set state %s for instance.",
                      task_states.IMAGE_SNAPSHOT, instance=instance)
            return

        except exception.UnexpectedDeletingTaskStateError:
            LOG.debug("Instance being deleted, snapshot cannot continue",
                      instance=instance)
            return

        self._snapshot_instance(context, image_id, instance,
                                task_states.IMAGE_SNAPSHOT)

    After this basic handling, snapshot_instance calls self._snapshot_instance(context, image_id, instance, task_states.IMAGE_SNAPSHOT) to do the real work, shown below with exception handling removed:

    def _snapshot_instance(self, context, image_id, instance, expected_task_state):
        context = context.elevated()

        # Get the power state of the instance
        instance.power_state = self._get_power_state(context, instance)
        instance.save()
        LOG.info('instance snapshotting', instance=instance)

        # Log a warning if the instance is not running
        if instance.power_state != power_state.RUNNING:
            state = instance.power_state
            running = power_state.RUNNING
            LOG.warning('trying to snapshot a non-running instance: '
                        '(state: %(state)s expected: %(running)s)',
                        {'state': state, 'running': running},
                        instance=instance)

        # Send the "snapshot.start" notification through the notifier;
        # this message is presumably consumed by ceilometer
        self._notify_about_instance_usage(context, instance, "snapshot.start")
        compute_utils.notify_about_instance_action(
            context, instance, self.host,
            action=fields.NotificationAction.SNAPSHOT,
            phase=fields.NotificationPhase.START)

        # Helper that updates the instance task state
        def update_task_state(task_state, expected_state=expected_task_state):
            instance.task_state = task_state
            instance.save(expected_task_state=expected_state)

        # Call LibvirtDriver.snapshot to do the actual snapshot
        self.driver.snapshot(context, instance, image_id, update_task_state)

        # Reset the instance task state to None
        instance.task_state = None
        instance.save(expected_task_state=task_states.IMAGE_UPLOADING)

        # Send the "snapshot.end" notification through the notifier to tell
        # ceilometer that the snapshot has finished
        self._notify_about_instance_usage(context, instance, "snapshot.end")
        compute_utils.notify_about_instance_action(
            context, instance, self.host,
            action=fields.NotificationAction.SNAPSHOT,
            phase=fields.NotificationPhase.END)
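Every save(expected_task_state=...) call in the code so far acts as a compare-and-swap: the new task state is only written if the stored state still equals the expected one. The sketch below imitates that behaviour with a tiny in-memory class so the whole sequence can be replayed. FakeInstance, UnexpectedTaskState and transition are hypothetical names, and the lower-case state strings are assumed to match the values behind the task_states constants.

```python
class UnexpectedTaskState(Exception):
    """Raised when the stored task state is not the expected one."""


class FakeInstance:
    """Hypothetical stand-in for a Nova instance record."""

    def __init__(self):
        self.task_state = None   # value the caller wants to write
        self._stored = None      # value currently "in the database"
        self.history = [None]

    def save(self, expected_task_state):
        # Compare-and-swap: refuse the write if someone changed the state.
        if self._stored != expected_task_state:
            self.task_state = self._stored
            raise UnexpectedTaskState(
                "expected %r, found %r" % (expected_task_state, self._stored))
        self._stored = self.task_state
        self.history.append(self._stored)


def transition(instance, new_state, expected):
    instance.task_state = new_state
    instance.save(expected_task_state=expected)


inst = FakeInstance()
transition(inst, "image_snapshot_pending", None)              # nova-api
transition(inst, "image_snapshot", "image_snapshot_pending")  # manager
transition(inst, "image_pending_upload", "image_snapshot")    # driver
transition(inst, "image_uploading", "image_pending_upload")   # driver
transition(inst, None, "image_uploading")                     # manager
print(inst.history)
```

Replaying a stale transition after the sequence has finished raises UnexpectedTaskState, which is the property that keeps two concurrent operations from both believing they own the instance.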

    The code above shows that the snapshot itself is carried out by the libvirt driver, i.e. by calling self.driver.snapshot (code location: nova/virt/libvirt/driver.py/LibvirtDriver.snapshot):

    def snapshot(self, context, instance, image_id, update_task_state):
        """Create snapshot from a running VM instance.
        This command only works with qemu 0.14+
        """
        try:
            # Get the virDomain object of the instance through libvirt
            guest = self._host.get_guest(instance)
            virt_dom = guest._domain
        except exception.InstanceNotFound:
            raise exception.InstanceNotRunning(instance_id=instance.uuid)

        # Fetch the snapshot record from the glance database; it was already
        # written there during the nova-api call
        snapshot = self._image_api.get(context, image_id)

        # Parse the disk information of the instance from its libvirt XML:
        # the disk path (disk_path) and the disk format.
        # source_format is an on-disk format, e.g. raw
        disk_path, source_format = libvirt_utils.find_disk(guest)
        # source_type is a backend type: the backend storage type derived
        # from disk_path, e.g. rbd, or Sihua (思华) flexblock
        source_type = libvirt_utils.get_disk_type_from_path(disk_path)
        LOG.info('disk_path: %s', disk_path)
        # Adjust the backend storage type and the snapshot disk format:
        # if the backend type cannot be derived from the disk path, use the
        # on-disk format as the backend type; use 'snapshot_image_format' or
        # the backend type as the snapshot disk format; if that format is
        # lvm, rbd or flexblock, change it to raw
        if source_type is None:
            source_type = source_format
        image_format = CONF.libvirt.snapshot_image_format or source_type
        if image_format == 'lvm' or image_format == 'rbd' or image_format == 'flexblock':
            image_format = 'raw'

        # Build the snapshot property dict from the root disk image
        # properties, the snapshot properties and the snapshot disk format.
        # It is used to update the glance database entry when the snapshot
        # file is uploaded. The dict looks like:
        # {
        #  'status': 'active',
        #  'name': u'snapshot1',
        #  'container_format': u'bare',
        #  'disk_format': 'raw',
        #  'is_public': False,
        #  'properties': {
        #      'kernel_id': u'',
        #      'image_location': 'snapshot',
        #      'image_state': 'available',
        #      'ramdisk_id': u'',
        #      'owner_id': u'25520b29dce346d38bc4b055c5ffbfcb'
        #  }
        # }
        metadata = self._create_snapshot_metadata(instance.image_meta, instance,
                                                  image_format,
                                                  snapshot['name'])
        # Name of the local temporary snapshot file
        snapshot_name = uuid.uuid4().hex
        # Get the instance power state, used to decide between a live and a
        # cold (offline) snapshot
        state = guest.get_power_state(self._host)

        # Decide between a live and a cold (offline) snapshot. A live
        # snapshot requires all of the following:
        #   1. QEMU >= 1.3 && libvirt >= 1.0.0
        #   2. the nova backend storage is not lvm or rbd
        #   3. ephemeral storage encryption is off:
        #      ephemeral_storage_encryption = False
        #   4. live snapshots are not disabled:
        #      disable_libvirt_livesnapshot = False
        if (self._host.has_min_version(hv_type=host.HV_DRIVER_QEMU)
                and source_type not in ('lvm', 'rbd', 'flexblock')
                and not CONF.ephemeral_storage_encryption.enabled
                and not CONF.workarounds.disable_libvirt_livesnapshot):
            live_snapshot = True
            # Abort is an idempotent operation, so make sure any block
            # jobs which may have failed are ended. This operation also
            # confirms the running instance, as opposed to the system as a
            # whole, has a new enough version of the hypervisor (bug 1193146).
            try:
                guest.get_block_device(disk_path).abort_job()
            except libvirt.libvirtError as ex:
                error_code = ex.get_error_code()
                if error_code == libvirt.VIR_ERR_CONFIG_UNSUPPORTED:
                    live_snapshot = False
                else:
                    pass
        else:
            # For example, with ceph RBD as backend storage the snapshot is
            # always a cold (offline) snapshot
            live_snapshot = False

        # NOTE(rmk): We cannot perform live snapshots when a managedSave
        #            file is present, so we will use the cold/legacy method
        #            for instances which are shutdown.
        # Use a cold (offline) snapshot when the instance is shut down
        if state == power_state.SHUTDOWN:
            live_snapshot = False

       
        # With non-LXC virtualization, a cold snapshot of an instance that is
        # running or paused requires detaching the PCI devices and SR-IOV
        # ports before the snapshot is taken
        self._prepare_domain_for_snapshot(context, live_snapshot, state, instance)
        # _prepare_domain_for_snapshot checks the virtualization type and
        # takes care of the instance devices. Its body is:
        #
        #   def _prepare_domain_for_snapshot(self, context, live_snapshot, state, instance):
        #       if CONF.libvirt.virt_type != 'lxc' and not live_snapshot:
        #           if state == power_state.RUNNING or state == power_state.PAUSED:
        #               self.suspend(context, instance)
        #
        # It calls suspend, which detaches the PCI devices and SR-IOV ports:
        #
        #   def suspend(self, context, instance):
        #       """Suspend the specified instance."""
        #       guest = self._host.get_guest(instance)
        #       self._detach_pci_devices(guest, pci_manager.get_instance_pci_devs(instance))
        #       self._detach_direct_passthrough_ports(context, instance, guest)
        #       guest.save_memory_state()

        root_disk = self.image_backend.by_libvirt_path(instance, disk_path, image_type=source_type)
        LOG.info('root_disk: %s', root_disk)

        # Log which kind of snapshot is being taken
        if live_snapshot:
            LOG.info("Beginning live snapshot process", instance=instance)
        else:
            LOG.info("Beginning cold snapshot process", instance=instance)
        # driver.snapshot receives the helper function update_task_state from
        # its caller. It is invoked here to move the instance task state to
        # IMAGE_PENDING_UPLOAD and then IMAGE_UPLOADING, after which the
        # image metadata is updated.
        update_task_state(task_state=task_states.IMAGE_PENDING_UPLOAD)

        try:
            update_task_state(task_state=task_states.IMAGE_UPLOADING,
                              expected_state=task_states.IMAGE_PENDING_UPLOAD)
            metadata['location'] = root_disk.direct_snapshot(
                context, snapshot_name, image_format, image_id,
                instance.image_ref)
            self._snapshot_domain(context, live_snapshot, virt_dom, state, instance)
            self._image_api.update(context, image_id, metadata, purge_props=False)
       
        except (NotImplementedError, exception.ImageUnacceptable, exception.Forbidden) as e:
            if type(e) != NotImplementedError:
                LOG.warning('Performing standard snapshot because direct '
                            'snapshot failed: %(error)s', {'error': e})
            failed_snap = metadata.pop('location', None)
            if failed_snap:
                failed_snap = {'url': str(failed_snap)}
            root_disk.cleanup_direct_snapshot(failed_snap,
                                              also_destroy_volume=True,
                                              ignore_errors=True)
            update_task_state(task_state=task_states.IMAGE_PENDING_UPLOAD,
                              expected_state=task_states.IMAGE_UPLOADING)
            # TODO(nic): possibly abstract this out to the root_disk
            if source_type in ('rbd', 'flexblock') and live_snapshot:
                # The direct snapshot raised an exception, so downgrade the
                # live snapshot to a cold (offline) one.
                # Standard snapshot uses qemu-img convert from RBD which is
                # not safe to run with live_snapshot.
                live_snapshot = False
                # Suspend the guest, so this is no longer a live snapshot
                self._prepare_domain_for_snapshot(context, live_snapshot, state, instance)
            # Read the directory for local snapshot files from the
            # configuration, e.g. /opt/nova/data/nova/instances/snapshots
            snapshot_directory = CONF.libvirt.snapshots_directory
            fileutils.ensure_tree(snapshot_directory)
            # Then create a temporary directory inside it
            with utils.tempdir(dir=snapshot_directory) as tmpdir:
                try:
                    # Build the full path of the snapshot file
                    out_path = os.path.join(tmpdir, snapshot_name)
                    LOG.info('out_path: %s', out_path)
                    if live_snapshot:
                        # NOTE(xqueralt): libvirt needs o+x in the tempdir
                        # A live snapshot needs mode 701 on the temporary
                        # directory
                        os.chmod(tmpdir, 0o701)
                        self._live_snapshot(context, instance, guest,
                                            disk_path, out_path, source_format,
                                            image_format, instance.image_meta)
                    else:
                        # Ask the backend storage driver to extract the
                        # snapshot (Rbd.snapshot_extract). Internally it runs
                        # 'qemu-img convert' to copy the root disk into
                        # out_path, with a command like:
                        #   qemu-img convert -O raw \
                        #     rbd:vms/814a8ad8-9217-4c45-91c7-c2be2016e5da_disk:id=cinder:conf=/etc/ceph/ceph.conf \
                        #     /opt/stack/data/nova/instances/snapshots/tmptR6hog/e44639af86434069b38f835847083697
                        root_disk.snapshot_extract(out_path, image_format)
                finally:
                    # The PCI devices and SR-IOV ports were detached above;
                    # re-attach them once the snapshot is done
                    self._snapshot_domain(context, live_snapshot, virt_dom,
                                          state, instance)
                    LOG.info("Snapshot extracted, beginning image upload",
                             instance=instance)
                # Upload that image to the image service
                # Call the helper passed in by the caller again to set the
                # instance task state to IMAGE_UPLOADING
                update_task_state(task_state=task_states.IMAGE_UPLOADING,
                                  expected_state=task_states.IMAGE_PENDING_UPLOAD)
                # Last step: upload the snapshot file to the backend store
                # through the glance API, much like uploading an image
                with libvirt_utils.file_open(out_path, 'rb') as image_file:
                    self._image_api.update(context, image_id, metadata, image_file)
       
        except Exception:
            with excutils.save_and_reraise_exception():
                LOG.exception(_("Failed to snapshot image"))
                failed_snap = metadata.pop('location', None)
                if failed_snap:
                    failed_snap = {'url': str(failed_snap)}
                root_disk.cleanup_direct_snapshot(
                    failed_snap, also_destroy_volume=True, ignore_errors=True)

        LOG.info("Snapshot image upload complete", instance=instance)

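The live-versus-cold decision inside LibvirtDriver.snapshot boils down to a handful of boolean checks. Here is a minimal sketch of that logic as a pure function. The name choose_snapshot_mode and its parameters are assumptions for illustration, and the sketch leaves out the abort_job() probe that the driver uses to confirm hypervisor support on the running guest.

```python
def choose_snapshot_mode(hypervisor_ok, source_type, encryption_enabled,
                         live_snapshot_disabled, is_shutdown,
                         cold_only_backends=("lvm", "rbd")):
    """Return "live" or "cold" following the checks described above.

    Hypothetical helper: every argument is a plain value supplied by the
    caller instead of being read from libvirt or the nova configuration.
    """
    live = (hypervisor_ok
            and source_type not in cold_only_backends
            and not encryption_enabled
            and not live_snapshot_disabled)
    # A shut-down instance always takes the cold path.
    if is_shutdown:
        live = False
    return "live" if live else "cold"


print(choose_snapshot_mode(True, "qcow2", False, False, False))
print(choose_snapshot_mode(True, "rbd", False, False, False))
print(choose_snapshot_mode(True, "qcow2", False, False, True))
```

Keeping the decision in one pure function makes it easy to see that a single failing condition is enough to force the cold path.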
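    That completes the analysis of the offline snapshot of an instance booted from an image. To summarize:

    •  A snapshot is first written to a local temporary file and then uploaded to glance, which is fairly inefficient

    •  During the snapshot the instance task state goes through these transitions: (None) -> image snapshot pending (IMAGE_SNAPSHOT_PENDING) -> snapshotting (IMAGE_SNAPSHOT) -> pending image upload (IMAGE_PENDING_UPLOAD) -> uploading image (IMAGE_UPLOADING) -> None

    •  If nova uses LVM or ceph RBD as backend storage, live snapshots are not supported under any circumstances

    •  Instance snapshots in OpenStack are stored in glance as images, which differs from the usual notion of a snapshot used for data recovery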
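The first point of the summary, a local temporary file followed by an upload, can be sketched with the standard library alone. In the sketch below the three callables extract, upload and resume are hypothetical stand-ins for the backend disk copy, the glance upload and the guest resume; none of them is a real Nova or glance API.

```python
import os
import tempfile
import uuid


def cold_snapshot(extract, upload, resume, snapshots_dir):
    """Write the disk copy to a temp file, resume the guest, then upload.

    extract(out_path), upload(fileobj) and resume() are hypothetical
    callables supplied by the caller.
    """
    os.makedirs(snapshots_dir, exist_ok=True)
    with tempfile.TemporaryDirectory(dir=snapshots_dir) as tmpdir:
        out_path = os.path.join(tmpdir, uuid.uuid4().hex)
        try:
            extract(out_path)
        finally:
            # The guest is resumed even if the extraction fails.
            resume()
        with open(out_path, "rb") as image_file:
            upload(image_file)
    # Leaving the "with" block removes the temporary directory and file.


events, uploaded = [], []


def fake_extract(path):
    events.append("extract")
    with open(path, "wb") as f:
        f.write(b"disk-bytes")


def fake_upload(fileobj):
    events.append("upload")
    uploaded.append(fileobj.read())


def fake_resume():
    events.append("resume")


base = tempfile.mkdtemp()
snap_dir = os.path.join(base, "snapshots")
cold_snapshot(fake_extract, fake_upload, fake_resume, snap_dir)
print(events)
```

The full disk is copied twice in this scheme, once into the temporary file and once into the image store, which is where the inefficiency noted in the summary comes from.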

  • Original article: https://www.cnblogs.com/qianyeliange/p/9712853.html