  • Spark repartition

    repartitionByRange


    repartitionByRange(numPartitions, *cols) method of pyspark.sql.dataframe.DataFrame instance
        Returns a new :class:`DataFrame` partitioned by the given partitioning expressions. The
        resulting DataFrame is range partitioned.
        
        :param numPartitions:
            can be an int to specify the target number of partitions or a Column.
            If it is a Column, it will be used as the first partitioning column. If not specified,
            the default number of partitions is used.
        
        At least one partition-by expression must be specified.
        When no explicit sort order is specified, "ascending nulls first" is assumed.
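    import time

    begin = time.time()
    df = merge_data  # DataFrame built earlier (not shown here)
    # Range-partition into 10 partitions by probeset_id, then append to the Delta table at path f
    df.repartitionByRange(10, "probeset_id").write.format("delta").mode("append").save(f)
    print(time.time() - begin)  # elapsed time in seconds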
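
To make "range partitioned" in the docstring above more concrete, here is a minimal pure-Python sketch of the idea: choose boundaries from the sorted keys, then send each row to the partition whose range contains its key. This is only an illustration, not Spark's implementation. Spark estimates the boundaries by sampling the data, whereas this sketch sorts every key. The names `range_partition` and `_sort_key` are hypothetical helpers, and `None` is ordered first to mimic the default "ascending nulls first".

```python
from bisect import bisect_left


def _sort_key(key):
    # Hypothetical helper: order None before every real value
    # ("ascending nulls first"), then ascending by the value itself.
    return (key is not None, 0 if key is None else key)


def range_partition(keys, num_partitions):
    """Split keys into num_partitions lists by contiguous key ranges.

    Illustration only: boundaries come from the fully sorted keys,
    not from a sample as Spark does.
    """
    partitions = [[] for _ in range(num_partitions)]
    if not keys:
        return partitions

    ordered = sorted(_sort_key(k) for k in keys)
    n = len(ordered)
    # Upper bound of partition i is the key at roughly the (i+1)/p quantile.
    bounds = [
        ordered[max(((i + 1) * n) // num_partitions - 1, 0)]
        for i in range(num_partitions - 1)
    ]

    for k in keys:
        # A key equal to a bound stays in the lower partition.
        idx = bisect_left(bounds, _sort_key(k))
        partitions[idx].append(k)
    return partitions


parts = range_partition([5, None, 1, 9, 3, 7, 2, 8], 3)
for i, p in enumerate(parts):
    print(i, p)
```

Each partition ends up holding one contiguous range of keys, which is why writing after `repartitionByRange(10, "probeset_id")` tends to place rows with nearby `probeset_id` values in the same output file.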
  • Original article: https://www.cnblogs.com/similarface/p/13267075.html