  • Cassandra writes: the CommitLog

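    Cassandra can sync the commit log to disk in two ways. Background reading:

    Cassandra-ArchitectureCommitLog

    Cassandra persistence - Durability

    One way is to set commitlog_sync to periodic (periodic mode). The other is batch mode.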

    The default (Cassandra 1.2.19 / 3.0.0) is periodic, with a sync every 10000 ms:

    #commitlog_sync: batch
    #commitlog_sync_batch_window_in_ms: 50
    commitlog_sync: periodic
    commitlog_sync_period_in_ms: 10000
    

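    Periodic mode carries a potential risk of data loss. To see why, let's look at how the two modes are implemented. The rough call sequence is: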

    StorageProxy. -> WritePerformer.apply() -> counterWriteTask()/sendToHintedEndpoints() -> (CounterMutation/mutation).apply() -> Mutation.apply() -> Keyspace.apply() -> CommitLog.instance.add(mutation). The call that matters here is CommitLog.instance.add(mutation).
    

    CommitLog.instance.add(mutation)

    public ReplayPosition add(Mutation mutation)
    {
        assert mutation != null;
    
        long size = Mutation.serializer.serializedSize(mutation, MessagingService.current_version);
    
        long totalSize = size + ENTRY_OVERHEAD_SIZE;
        if (totalSize > MAX_MUTATION_SIZE)
        {
            throw new IllegalArgumentException(String.format("Mutation of %s bytes is too large for the maximum size of %s",
                                                             totalSize, MAX_MUTATION_SIZE));
        }
    
        Allocation alloc = allocator.allocate(mutation, (int) totalSize);
        try
        {
            ICRC32 checksum = CRC32Factory.instance.create();
            final ByteBuffer buffer = alloc.getBuffer();
            BufferedDataOutputStreamPlus dos = new DataOutputBufferFixed(buffer);
    
            // checksummed length
            dos.writeInt((int) size);
            checksum.update(buffer, buffer.position() - 4, 4);
            buffer.putInt(checksum.getCrc());
    
            int start = buffer.position();
            // checksummed mutation
            Mutation.serializer.serialize(mutation, dos, MessagingService.current_version);
            checksum.update(buffer, start, (int) size);
            buffer.putInt(checksum.getCrc());
        }
        catch (IOException e)
        {
            throw new FSWriteError(e, alloc.getSegment().getPath());
        }
        finally
        {
            alloc.markWritten();
        }
        // mode-specific: may block until the record is synced (batch) or return at once (periodic)
        executor.finishWriteFor(alloc);
        return alloc.getReplayPosition();
    }
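    You can reproduce the record layout that add() writes outside Cassandra. This is a minimal sketch that assumes only the layout visible in the excerpt above. Each record has a 4-byte length, then a CRC of that length, then the serialized mutation. Last comes a CRC that keeps accumulating over both the length and the mutation, because the excerpt never resets the checksum between the two update() calls. The class CommitLogEntrySketch and the ENTRY_OVERHEAD constant are hypothetical. So is the use of java.util.zip.CRC32 in place of Cassandra's ICRC32/CRC32Factory. The payload is plain bytes, not a real serialized Mutation.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical sketch of the entry layout only; not Cassandra's own classes.
public class CommitLogEntrySketch {
    // 4-byte length + 4-byte CRC of the length + 4-byte CRC of length and payload
    static final int ENTRY_OVERHEAD = 4 + 4 + 4;

    static ByteBuffer encode(byte[] payload) {
        ByteBuffer buf = ByteBuffer.allocate(payload.length + ENTRY_OVERHEAD);
        CRC32 crc = new CRC32();

        buf.putInt(payload.length);
        crc.update(buf.array(), 0, 4);                   // checksum the length field
        buf.putInt((int) crc.getValue());

        int start = buf.position();
        buf.put(payload);
        crc.update(buf.array(), start, payload.length); // not reset: now covers length + payload
        buf.putInt((int) crc.getValue());

        buf.flip();                                      // switch to reading mode
        return buf;
    }

    // Assumes a heap buffer whose array offset is 0, as produced by encode().
    static byte[] decode(ByteBuffer buf) {
        CRC32 crc = new CRC32();
        int base = buf.position();
        int size = buf.getInt();
        crc.update(buf.array(), base, 4);
        if (buf.getInt() != (int) crc.getValue()) {     // verify length before trusting it
            throw new IllegalStateException("length checksum mismatch");
        }
        int start = buf.position();
        byte[] payload = new byte[size];
        buf.get(payload);
        crc.update(buf.array(), start, size);
        if (buf.getInt() != (int) crc.getValue()) {
            throw new IllegalStateException("payload checksum mismatch");
        }
        return payload;
    }

    public static void main(String[] args) {
        byte[] mutation = "k1=v1".getBytes(StandardCharsets.UTF_8);
        ByteBuffer entry = encode(mutation);
        System.out.println(entry.remaining());                                  // 17
        System.out.println(new String(decode(entry), StandardCharsets.UTF_8)); // k1=v1
    }
}
```

    The length has its own checksum so that a reader replaying the log can reject a corrupted length before allocating or reading that many bytes.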
    

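    This method only writes into the buffer. Nothing has been flushed to disk yet. What happens next depends on the two modes mentioned earlier, periodic and batch. The key call is executor.finishWriteFor(alloc), which calls maybeWaitForSync(). That method is abstract and is implemented in BatchCommitLogService and PeriodicCommitLogService.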

    public void finishWriteFor(Allocation alloc)
    {
        maybeWaitForSync(alloc);
        written.incrementAndGet();
    }
    protected abstract void maybeWaitForSync(Allocation alloc);
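    finishWriteFor() is a template method. The shared bookkeeping lives in the base class, and only the waiting policy differs. Below is a minimal sketch of that shape. The classes SyncServiceSketch, BatchLike and PeriodicLike are hypothetical stand-ins for the abstract service, BatchCommitLogService and PeriodicCommitLogService. The subclasses only log what they would do instead of actually waiting.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical names; mirrors only the shape of finishWriteFor()/maybeWaitForSync().
abstract class SyncServiceSketch {
    final AtomicLong written = new AtomicLong();
    final List<String> log = new ArrayList<>();

    final void finishWriteFor(long position) {
        maybeWaitForSync(position);   // mode-specific: may block the writer
        written.incrementAndGet();    // counted only after the wait has returned
        log.add("written " + position);
    }

    protected abstract void maybeWaitForSync(long position);
}

class BatchLike extends SyncServiceSketch {
    @Override
    protected void maybeWaitForSync(long position) {
        log.add("wait for sync of " + position); // a real batch service would block here
    }
}

class PeriodicLike extends SyncServiceSketch {
    @Override
    protected void maybeWaitForSync(long position) {
        // returns immediately unless the periodic sync is lagging
    }
}

public class TemplateMethodDemo {
    public static void main(String[] args) {
        SyncServiceSketch batch = new BatchLike();
        SyncServiceSketch periodic = new PeriodicLike();
        batch.finishWriteFor(100);
        periodic.finishWriteFor(100);
        System.out.println(batch.log);    // [wait for sync of 100, written 100]
        System.out.println(periodic.log); // [written 100]
    }
}
```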
    

    The implementation in BatchCommitLogService is below. alloc.awaitDiskSync() delegates to the segment's waitForSync():

    protected void maybeWaitForSync(CommitLogSegment.Allocation alloc)
    {
        // wait until record has been safely persisted to disk
        pending.incrementAndGet();
        alloc.awaitDiskSync(commitLog.metrics.waitingOnCommit);
        pending.decrementAndGet();
    }
    void waitForSync(int position, Timer waitingOnCommit)
    {
        while (lastSyncedOffset < position)
        {
            WaitQueue.Signal signal = waitingOnCommit != null ?
                                      syncComplete.register(waitingOnCommit.time()) :
                                      syncComplete.register();
            if (lastSyncedOffset < position)
                signal.awaitUninterruptibly();
            else
                signal.cancel();
        }
    }
    

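    As long as lastSyncedOffset < position, the writer keeps waiting. It continues only when lastSyncedOffset >= position, that is, once the buffer region belonging to the current alloc has been flushed.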
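    In batch mode, the wait comes down to one rule: block the writer until the synced offset reaches its position. Below is a minimal sketch of that handshake. It uses a plain Java monitor (synchronized/wait/notifyAll) instead of Cassandra's WaitQueue. SyncedOffsetWaiter and markSynced() are hypothetical names. Unlike awaitUninterruptibly(), this version lets InterruptedException propagate.

```java
// Hypothetical class; a plain Java monitor stands in for Cassandra's WaitQueue.
public class SyncedOffsetWaiter {
    private long lastSyncedOffset = 0;

    // Writer side: block until the syncer reports an offset >= position.
    public synchronized void awaitDiskSync(long position) throws InterruptedException {
        while (lastSyncedOffset < position) {   // re-check after every wakeup (spurious or not)
            wait();
        }
    }

    // Syncer side: call after the flush covering everything up to offset has finished.
    public synchronized void markSynced(long offset) {
        if (offset > lastSyncedOffset) {        // never move the synced offset backwards
            lastSyncedOffset = offset;
        }
        notifyAll();
    }

    public synchronized long lastSyncedOffset() {
        return lastSyncedOffset;
    }

    public static void main(String[] args) throws InterruptedException {
        SyncedOffsetWaiter waiter = new SyncedOffsetWaiter();
        Thread syncer = new Thread(() -> {
            try {
                Thread.sleep(50);               // pretend the disk flush takes a while
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            waiter.markSynced(4096);
        });
        syncer.start();
        waiter.awaitDiskSync(1024);             // returns only after markSynced(4096)
        System.out.println("durable up to " + waiter.lastSyncedOffset()); // durable up to 4096
        syncer.join();
    }
}
```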

    The implementation in PeriodicCommitLogService is below. The key here is waitForSyncToCatchUp():

    protected void maybeWaitForSync(CommitLogSegment.Allocation alloc)
    {
        if (waitForSyncToCatchUp(Long.MAX_VALUE))
        {
            // wait until periodic sync() catches up with its schedule
            long started = System.currentTimeMillis();
            pending.incrementAndGet();
            while (waitForSyncToCatchUp(started))
            {
                WaitQueue.Signal signal = syncComplete.register(commitLog.metrics.waitingOnCommit.time());
                if (waitForSyncToCatchUp(started))
                    signal.awaitUninterruptibly();
                else
                    signal.cancel();
            }
            pending.decrementAndGet();
        }
    }
    

    waitForSyncToCatchUp()

    private boolean waitForSyncToCatchUp(long started)
    {
        return started > lastSyncedAt + blockWhenSyncLagsMillis;
    }
    

    Here blockWhenSyncLagsMillis is 1.5 × commitlog_sync_period_in_ms:

    blockWhenSyncLagsMillis = (int) (DatabaseDescriptor.getCommitLogSyncPeriod() * 1.5);
    

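    Why 1.5×? My understanding is that it budgets roughly 0.5 × commitlog_sync_period for a flush to complete. That is not guaranteed, though: a flush may take more or less than half a period, and this is where the potential data loss comes in. If a flush really takes longer than 0.5 × commitlog_sync_period, data whose write has already returned is not guaranteed to be on disk. In periodic mode, a write only blocks once the last sync started more than 1.5 periods ago. Otherwise it returns as soon as the record is in the buffer.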
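    The periodic check is simple arithmetic. Below is a minimal sketch that assumes the default commitlog_sync_period_in_ms of 10000. PeriodicLagCheck and mustBlock() are hypothetical names. They reproduce the comparison in waitForSyncToCatchUp(), with lastSyncedAt as the time the last sync started.

```java
// Hypothetical helper reproducing the comparison in waitForSyncToCatchUp().
public class PeriodicLagCheck {
    final long blockWhenSyncLagsMillis;

    PeriodicLagCheck(long syncPeriodMillis) {
        // same 1.5x factor as blockWhenSyncLagsMillis in PeriodicCommitLogService
        this.blockWhenSyncLagsMillis = (long) (syncPeriodMillis * 1.5);
    }

    // true: the writer must wait; false: it returns without waiting for any sync
    boolean mustBlock(long now, long lastSyncedAt) {
        return now > lastSyncedAt + blockWhenSyncLagsMillis;
    }

    public static void main(String[] args) {
        PeriodicLagCheck check = new PeriodicLagCheck(10_000);
        System.out.println(check.blockWhenSyncLagsMillis); // 15000
        System.out.println(check.mustBlock(12_000, 0));    // false
        System.out.println(check.mustBlock(16_000, 0));    // true
    }
}
```

    With these numbers, a write that arrives 12 s after the last sync started is acknowledged at once, without waiting for any sync. Writers only start waiting once more than 15 s have passed. Roughly speaking, acknowledged writes from up to about 1.5 periods can still be sitting in unsynced buffers if the node crashes.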
    The actual flush code lives in the start() method of AbstractCommitLogService:

    long syncStarted = System.currentTimeMillis();
    commitLog.sync(shutdown);
    lastSyncedAt = syncStarted;
    syncComplete.signalAll();
    

    The full chain is commitLog.sync() -> segment.sync() -> write(startMarker, sectionEnd). write() is implemented in CompressedSegment and MemoryMappedSegment, and both ultimately call channel.force().
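    The sync side can be approximated with the JDK alone. Below is a minimal sketch that assumes a single append-only file. It uses a ScheduledExecutorService in place of the dedicated sync thread started in AbstractCommitLogService.start(). PeriodicSyncerSketch, append() and syncOnce() are hypothetical names. Like the excerpt above, it stores the time the sync started as lastSyncedAt, not the time force() returned. It calls FileChannel.force(true), which flushes both file content and metadata.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a periodic syncer; not Cassandra's implementation.
public class PeriodicSyncerSketch {
    private final FileChannel channel;
    private volatile long lastSyncedAt = 0;

    PeriodicSyncerSketch(FileChannel channel) {
        this.channel = channel;
    }

    // Writer side: returns as soon as the bytes are handed to the OS, before any fsync.
    synchronized void append(String record) throws IOException {
        channel.write(ByteBuffer.wrap(record.getBytes(StandardCharsets.UTF_8)));
    }

    // Sync side: force everything written so far, then record when this sync *started*.
    void syncOnce() throws IOException {
        long syncStarted = System.currentTimeMillis();
        channel.force(true);          // flush file content and metadata to the device
        lastSyncedAt = syncStarted;
    }

    long lastSyncedAt() {
        return lastSyncedAt;
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("commitlog-sketch", ".log");
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
            PeriodicSyncerSketch syncer = new PeriodicSyncerSketch(ch);
            ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
            ses.scheduleAtFixedRate(() -> {
                try {
                    syncer.syncOnce();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }, 0, 100, TimeUnit.MILLISECONDS);

            syncer.append("mutation-1\n");   // acknowledged before it is necessarily durable
            Thread.sleep(250);               // let a couple of periodic syncs run
            ses.shutdown();
            ses.awaitTermination(1, TimeUnit.SECONDS);
        }
        System.out.println(Files.readAllLines(file)); // [mutation-1]
        Files.delete(file);
    }
}
```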

  • Original article: https://www.cnblogs.com/donganwangshi/p/4530841.html