Paper Reading_Distributed System

Lately (and, I suspect, for the coming year) I will be reading a great many papers, so I might as well start a post to record my notes on them.

Bookmarking a paper notebook from the Dongyue Network Studio (东岳网络工作室) at Shanghai Jiao Tong University.

Bookmarking the notes of an expert.

Edge computing

111

Cloud

Rethinking Adaptability in Wide-Area Stream Processing Systems

111

Multi-Query Optimization in Wide-Area Streaming Analytics

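This is a SoCC 2018 paper. It looks at how to run data analysis in a geo-distributed setting while using WAN bandwidth more efficiently and improving performance.

Many streaming analysis queries execute in similar ways: they use the same input dataset or perform the same data processing procedure. For example, everyone uses Twitter data but runs a different analysis task on it; some do sentiment analysis, others do topic analysis. This is what the paper calls multi-query. The idea of the paper is optimizing multiple queries by applying multi-query optimization in a WAN-aware manner. There are two key points: multi-query optimization and WAN awareness (the latter is required to guarantee that multi-query execution performs well).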

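By computational model, streaming systems fall into two classes: dataflow and bulk-synchronous parallel. The paper focuses on the dataflow model first, in which data streams flow continuously from data sources into the system and are transformed by a set of stream operators. A streaming query can be thought of as a query statement much like SQL. In addition, the system in this paper is geo-distributed: data may initially be stored at site A, be processed by servers at site B, and then be delivered to users at site C, so it travels back and forth across the WAN. If the system knows something about the WAN topology, it can design a better execution plan. That is why it needs to be WAN-aware.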
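To make the WAN-aware point concrete, here is a minimal sketch of placing a single operator, assuming we know the bandwidth between every pair of sites. The site names, bandwidth numbers, and the `best_site` helper are all hypothetical and are not taken from the paper; the cost model (total transfer time from source to operator to sink) is a deliberate simplification.

```python
# Hypothetical WAN bandwidth between sites in Mbps (illustrative numbers only).
BANDWIDTH_MBPS = {
    ("A", "B"): 100,
    ("A", "C"): 20,
    ("B", "C"): 50,
}


def bandwidth(src, dst):
    """Bandwidth between two sites; transfers within one site are treated as free."""
    if src == dst:
        return float("inf")
    return BANDWIDTH_MBPS.get((src, dst)) or BANDWIDTH_MBPS[(dst, src)]


def transfer_seconds(megabits, src, dst):
    """Time to move `megabits` of data from src to dst over the WAN."""
    return megabits / bandwidth(src, dst)


def best_site(source, sink, input_mb, output_mb, sites):
    """Pick the site for one operator that minimizes source -> operator -> sink transfer time."""
    def cost(site):
        return (transfer_seconds(input_mb, source, site)
                + transfer_seconds(output_mb, site, sink))
    return min(sites, key=cost)


# A filter that shrinks 1000 Mb of input to 10 Mb of output is best run at the source.
filter_site = best_site("A", "C", input_mb=1000, output_mb=10, sites=["A", "B", "C"])
# An operator that expands its input is best run next to the sink instead.
expand_site = best_site("A", "C", input_mb=10, output_mb=1000, sites=["A", "B", "C"])
print(filter_site, expand_site)
```

Even this toy model shows why the placement depends on both the topology and how much each operator reduces its data.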

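The concept of Multi-Query Optimization is also borrowed from the database community. Its goal is to identify the commonality between queries and potentially combine their executions to mitigate redundant executions. In this setting, Multi-Query Optimization additionally needs to be done in an online manner as new queries arrive by sharing any common execution incrementally, because streaming analysis queries are typically deployed once and run indefinitely; stopping the system midway to change things is not realistic. The second half of Section 2.2 gives an example: although Query1 and Query2 do not have identical business logic, they still have a lot that can be shared (both queries partially share common input streams (US and EU) and perform similar data processing (e.g., filtering user info)). If that shared work is executed only once, a lot of bandwidth can be saved.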
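The incremental sharing described above can be sketched as a prefix tree of operators: a new query reuses every leading operator that an already deployed query has, and only creates what is missing. This is a simplified illustration and not the paper's algorithm; the `SharedPlan` class and the operator labels are hypothetical, and it assumes operators can be compared by a plain string label.

```python
class SharedPlan:
    """Prefix tree of operators; queries with the same leading operators share nodes."""

    def __init__(self):
        self.children = {}  # operator label -> SharedPlan
        self.queries = []   # names of queries whose pipeline ends at this node

    def add_query(self, name, operators):
        """Insert a query online; return how many new operators had to be deployed."""
        node, created = self, 0
        for op in operators:
            if op not in node.children:
                node.children[op] = SharedPlan()
                created += 1
            node = node.children[op]
        node.queries.append(name)
        return created


plan = SharedPlan()
# Hypothetical pipelines: same input streams and filter, different final analysis.
q1 = ["source:US+EU", "filter:user_info", "sentiment_analysis"]
q2 = ["source:US+EU", "filter:user_info", "topic_analysis"]

new_ops_q1 = plan.add_query("Query1", q1)  # nothing deployed yet, so all operators are new
new_ops_q2 = plan.add_query("Query2", q2)  # source and filter are reused
print(new_ops_q1, new_ops_q2)
```

Here Query2 only adds its final operator, so the shared source and filter run once and their output crosses the WAN once.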

The two components are described in more detail below:
- Multi-Query Optimization:
- WAN-aware Optimization:

Wiera: Policy-Driven Multi-Tiered Geo-Distributed Cloud Storage System

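This paper is about cloud storage systems. It focuses on how to do data placement when the storage system has many tiers (different kinds of storage systems, each optimized for different applications, such as ElasticCache/S3/...) and is spread across many locations (multi-DC). The goal is to achieve desired fault tolerance or to serve a dispersed set of end-users.

The paper proposes a storage system called Wiera, which abstracts the data placement problem as a constrained optimization problem (for example, minimize total cost) and then uses algorithms to optimize it. It also has to handle issues such as fault tolerance.

This looks a lot like PA3 from 5105... even the consistency protocol is the same: Quorum.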
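As a rough illustration of data placement as a constrained optimization problem, the sketch below picks the cheapest set of storage instances subject to a replica count (for fault tolerance) and a read-latency bound. It is not Wiera's actual algorithm or policy language; the candidate names, prices, latencies, and the `cheapest_placement` helper are made up for the example, and it uses brute-force enumeration for clarity.

```python
from itertools import combinations

# Hypothetical candidates: (tier@location, cost per GB-month in USD, read latency in ms).
CANDIDATES = [
    ("memory@us-east", 0.50, 5),
    ("disk@us-east", 0.10, 20),
    ("object@us-west", 0.02, 80),
    ("object@eu-west", 0.02, 120),
]


def cheapest_placement(candidates, replicas, max_latency_ms):
    """Minimize total cost subject to two constraints:
    exactly `replicas` copies, and the fastest copy serves reads within max_latency_ms.
    Returns (total_cost, [names]) or None if no placement is feasible."""
    best = None
    for combo in combinations(candidates, replicas):
        fastest = min(latency for _, _, latency in combo)
        if fastest > max_latency_ms:
            continue  # violates the latency constraint
        cost = sum(price for _, price, _ in combo)
        if best is None or cost < best[0]:
            best = (cost, [name for name, _, _ in combo])
    return best


relaxed = cheapest_placement(CANDIDATES, replicas=2, max_latency_ms=25)
strict = cheapest_placement(CANDIDATES, replicas=2, max_latency_ms=10)
print(relaxed, strict)
```

Tightening the latency bound forces the expensive memory tier into the placement, which is the kind of cost versus performance trade-off the optimization is meant to capture.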
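For reference, the classic quorum condition is that with N replicas, a read quorum of size R and a write quorum of size W always overlap when R + W > N, so every read sees at least one replica holding the latest write. The sketch below checks this by brute force; it illustrates the general rule only and makes no claim about how Wiera configures its quorums. The function name is hypothetical.

```python
from itertools import combinations


def always_intersect(n, r, w):
    """True if every read quorum of size r overlaps every write quorum of size w among n replicas."""
    replicas = range(n)
    return all(
        set(read_q) & set(write_q)
        for read_q in combinations(replicas, r)
        for write_q in combinations(replicas, w)
    )


# With 3 replicas, R=2 and W=2 always overlap; R=1 and W=2 can miss each other.
print(always_intersect(3, 2, 2), always_intersect(3, 1, 2))
```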

Streaming

TTL-based Approach for Data Aggregation in Geo-Distributed Streaming Analytics

This is an OSDI poster that was then promptly accepted at SIGMETRICS... orz

A TTL-based Approach for Data Aggregation in Geo-distributed Streaming Analytics

This is the full paper corresponding to the poster above.

Consensus

Flexible Paxos: Quorum intersection revisited

https://www.bilibili.com/video/av70763749

https://www.zhihu.com/question/320838210

Original post: https://www.cnblogs.com/pdev/p/12617042.html