Swift/Fixing rebalance and Golang

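Symptoms

Rebalance is slow, especially on dense servers
Latency of end-user requests is non-deterministic
Hard to monitor, and recovering from bad situations (e.g. a full cluster) takes a lot of intervention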

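Problems

Swift is not in the data path of rsync transfers
Too much disk scanning
Job scheduling and finding work to do are inefficient
The eventlet hub cannot handle disk IO (disk calls block the whole process)
- Mitigation: use lots of processes. This is "easy" in Python, but makes coordinating work hard
- Solution: use non-blocking IO. This is a "hard rewrite", but it effectively solves the problem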
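
The eventlet problem above can be illustrated with a small sketch. It uses the standard-library asyncio loop as a stand-in for the eventlet hub (an assumption made so the example is self-contained; Swift itself uses eventlet), and `slow_disk_read`, `heartbeat`, `measure` and `compare` are hypothetical names. A blocking call made directly on the loop starves every other task, while handing the same call to a worker thread, in the spirit of the "threads instead of eventlet" item under the fixes in progress, keeps the loop responsive.

```python
import asyncio
import time


def slow_disk_read(seconds=0.2):
    """Stand-in for a blocking disk call (hypothetical; no real disk is touched)."""
    time.sleep(seconds)
    return b"data"


async def heartbeat(counter, interval=0.01):
    # Represents other requests that share the same event loop.
    while True:
        counter["ticks"] += 1
        await asyncio.sleep(interval)


async def measure(offload):
    """Count how often the heartbeat runs while one slow read is in flight."""
    counter = {"ticks": 0}
    beat = asyncio.create_task(heartbeat(counter))
    await asyncio.sleep(0)  # give the heartbeat a chance to start
    if offload:
        # Run the blocking call on a worker thread; the loop stays free.
        loop = asyncio.get_running_loop()
        await loop.run_in_executor(None, slow_disk_read)
    else:
        # Blocking call on the loop itself: nothing else can run meanwhile.
        slow_disk_read()
    beat.cancel()
    return counter["ticks"]


def compare():
    """Return (ticks while blocked on the loop, ticks while offloaded)."""
    blocked = asyncio.run(measure(offload=False))
    offloaded = asyncio.run(measure(offload=True))
    return blocked, offloaded
```

Calling `compare()` returns two tick counts; the second should be noticeably larger than the first, since the heartbeat keeps running only when the slow call is kept off the loop.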

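Fixes in progress

tsync protocol for data movement
- Puts Swift in the data path (more efficient than rsync for the actual transfer and for writing to disk)
- Uses an external, supported data transfer and wire protocol instead of one we invent ourselves (http2+grpc vs. repconn or ssync)
- See also https://etherpad.openstack.org/p/swift-rebalance
Improve work scheduling in the reconstructor and replicator
- Threads instead of eventlet
- More concurrency == faster (up to hardware limits)
- Identify the work to do (rebuild vs. rebalance; includes backpressure from tsync)
Fix the proxy<->storage protocol (it cannot rely on bespoke features of our current framework)
The Golang object server itself, to receive network data and write it to disk more efficiently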
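
The scheduling items above (threads, tunable concurrency, backpressure) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the replicator's or reconstructor's actual design: `run_jobs`, `workers` and `max_pending` are hypothetical names, and a job here is any value other than `None`.

```python
import queue
import threading


def run_jobs(jobs, handle, workers=4, max_pending=8):
    """Run handle(job) for every job on `workers` threads (hypothetical helper).

    The bounded queue provides backpressure: put() blocks once
    `max_pending` jobs are waiting, so whatever produces the jobs
    cannot run arbitrarily far ahead of the workers.
    """
    pending = queue.Queue(maxsize=max_pending)
    results = []
    results_lock = threading.Lock()

    def worker():
        while True:
            job = pending.get()
            if job is None:  # sentinel: no more work for this thread
                return
            try:
                outcome = handle(job)
            except Exception as exc:
                # Record the failure instead of killing the worker thread.
                outcome = exc
            with results_lock:
                results.append((job, outcome))

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for job in jobs:
        pending.put(job)  # blocks while the queue is full
    for _ in threads:
        pending.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    return results
```

For example, `run_jobs(range(20), lambda p: p * 2, workers=4)` returns twenty `(job, outcome)` pairs in completion order. Raising `workers` is the kind of single-value tuning the plan aims for, with the hardware as the eventual limit.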

How to make it happen (subject to change)

   0. hummingbird branch is an interesting R&D reference but not going to be merged (done)
   1. make replication/reconstruction tolerable to the point that we can make it fast by changing a config value (more workers, more connections, etc) (nearly done)
   2. build a better scheduler for consistency engine work
   2. build the tsync protocol
   now: build a feature-complete golang object server (might or might not borrow from hummingbird)
   now: infra/devstack CI work (i.e. swift consumable in the gate)
   now: ask other deployment projects what needs to be done to make them happy with swift as a golang thing (e.g. kolla, ansible, tripleo, etc.)
