  • ELK Study Notes (4-2): Importing Data

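    Bulk inserts through the REST API's _bulk endpoint can reach 50,000 to 100,000 records per second.

    Write the data to JSON files, then run a batch script that inserts each file:

    1. First create JSON files in the required format. A file must not be too large, or the request fails with an error.

    2. Then use curl to call Elasticsearch's _bulk endpoint and insert the files in bulk.

    Generating files of about 10 MB each and then submitting these small files one by one works well.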
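    The size limit above can also be enforced by bytes rather than by record count. The following is a minimal sketch, not the author's tool: `BulkSplitter` and `Split` are hypothetical names, and it assumes every document is exactly two lines (an action line followed by a source line), as in the format used in this post.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class BulkSplitter
{
    // Groups NDJSON lines into chunks that stay under maxBytes where possible.
    // A pair is never split; a single pair larger than maxBytes gets a chunk of its own.
    public static List<string> Split(IList<string> lines, int maxBytes)
    {
        if (lines.Count % 2 != 0)
            throw new ArgumentException("Expected action/source line pairs.", nameof(lines));

        var chunks = new List<string>();
        var current = new StringBuilder();
        int currentBytes = 0;

        for (int i = 0; i < lines.Count; i += 2)
        {
            // Every line, including the last one in a chunk, ends with a newline.
            string pair = lines[i] + "\n" + lines[i + 1] + "\n";
            int pairBytes = Encoding.UTF8.GetByteCount(pair);

            // Close the current chunk before this pair would push it over the limit.
            if (currentBytes > 0 && currentBytes + pairBytes > maxBytes)
            {
                chunks.Add(current.ToString());
                current.Clear();
                currentBytes = 0;
            }

            current.Append(pair);
            currentBytes += pairBytes;
        }

        if (currentBytes > 0)
            chunks.Add(current.ToString());

        return chunks;
    }
}
```

    Calling `BulkSplitter.Split(lines, 10 * 1024 * 1024)` would give chunks of roughly 10 MB, each of which can be written to its own file.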

     

    Format of the JSON data file

    {"index":{"_index":"meterdata","_type":"autoData"}}
    {"Mfid ":1,"TData":172170,"TMoney":209,"HTime":"2016-05-17T08:03:00"}
    {"index":{"_index":"meterdata","_type":"autoData"}}
    {"Mfid ":1,"TData":172170,"TMoney":209,"HTime":"2016-05-17T08:04:00"}
    {"index":{"_index":"meterdata","_type":"autoData"}}
    {"Mfid ":1,"TData":172170,"TMoney":209,"HTime":"2016-05-17T08:05:00"}
    {"index":{"_index":"meterdata","_type":"autoData"}}
    {"Mfid ":1,"TData":172170,"TMoney":209,"HTime":"2016-05-17T08:06:00"}
    {"index":{"_index":"meterdata","_type":"autoData"}}
    {"Mfid ":1,"TData":172170,"TMoney":209,"HTime":"2016-05-17T08:07:00"}
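    Each document in the file is a pair of lines: an action line naming the index and type, then the source line with the fields. As an illustration only, such a pair could be built as below; `BulkLines` and `BuildPair` are hypothetical names, the field names are copied verbatim from the sample above, and the index and type names are assumed to need no JSON escaping.

```csharp
using System;
using System.Globalization;

public static class BulkLines
{
    // Returns the two-line _bulk entry (action line + source line) for one meter reading.
    public static string BuildPair(string index, string type, int mfid, int tData, int tMoney, DateTime hTime)
    {
        string action = "{\"index\":{\"_index\":\"" + index + "\",\"_type\":\"" + type + "\"}}";

        // Invariant culture keeps digits and the time separator stable on any locale.
        string source =
            "{\"Mfid \":" + mfid.ToString(CultureInfo.InvariantCulture) +
            ",\"TData\":" + tData.ToString(CultureInfo.InvariantCulture) +
            ",\"TMoney\":" + tMoney.ToString(CultureInfo.InvariantCulture) +
            ",\"HTime\":\"" + hTime.ToString("yyyy-MM-dd'T'HH:mm:ss", CultureInfo.InvariantCulture) + "\"}";

        // Both lines are newline-terminated, including the last one.
        return action + "\n" + source + "\n";
    }
}
```

    With the values of the first sample record (point 1, 2016-05-17 08:03:00), the result is the first two lines of the sample file.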
     
    Contents of the batch file
    cd E:\curl-7.50.3-win64-mingw\bin
    curl 172.17.1.15:9200/_bulk?pretty --data-binary @E:\Bin\Debug\testdata\437714060.json
    curl 172.17.1.15:9200/_bulk?pretty --data-binary @E:\Bin\Debug\testdata\743719428.json
    curl 172.17.1.15:9200/_bulk?pretty --data-binary @E:\Bin\Debug\testdata\281679894.json
    curl 172.17.1.15:9200/_bulk?pretty --data-binary @E:\Bin\Debug\testdata\146257480.json
    curl 172.17.1.15:9200/_bulk?pretty --data-binary @E:\Bin\Debug\testdata\892018760.json
    pause
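    The same request can be issued from C# instead of curl. This is a rough sketch under assumptions, not part of the original tool: `BulkRequest` and `Create` are hypothetical names, and the explicit `application/x-ndjson` content type is added because Elasticsearch 6.0 and later reject bulk requests without a JSON content type (the versions current when this post was written did not enforce it).

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;

public static class BulkRequest
{
    // Builds a POST to http://{host}/_bulk carrying the NDJSON body as raw bytes,
    // the equivalent of curl's --data-binary.
    public static HttpRequestMessage Create(string host, string ndjsonBody)
    {
        var content = new ByteArrayContent(Encoding.UTF8.GetBytes(ndjsonBody));
        content.Headers.ContentType = new MediaTypeHeaderValue("application/x-ndjson");

        return new HttpRequestMessage(HttpMethod.Post, "http://" + host + "/_bulk")
        {
            Content = content
        };
    }
}
```

    A caller would read one data file with `File.ReadAllText`, pass it to `BulkRequest.Create("172.17.1.15:9200", body)`, and send the result with `HttpClient.SendAsync`. The bulk response body contains a top-level `errors` field that is worth checking, since individual documents can fail even when the HTTP request succeeds.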
     

    Generator tool code

    private void button1_Click(object sender, EventArgs e)
    {
        // Generate on a background thread so the UI stays responsive.
        Task.Run(() => { CreateDataToFile(); });
    }

    public void CreateDataToFile()
    {
        StringBuilder sb = new StringBuilder();      // bulk body for the current data file
        StringBuilder sborder = new StringBuilder(); // contents of order.bat
        Random random = new Random();                // one shared instance; a new Random() per call can repeat values
        string dir = Application.StartupPath + @"\testdata\";
        int count = 0;                               // records currently buffered in sb
        sborder.Append(@"cd E:\curl-7.50.3-win64-mingw\bin" + Environment.NewLine);
        DateTime endDate = DateTime.Parse("2016-10-22");
        for (int i = 1; i <= 10000; i++) // 10,000 points
        {
            DateTime startDate = endDate.AddYears(-1);
            this.Invoke(new Action(() => { label1.Text = "Generating point " + i; }));

            while (startDate <= endDate) // one year of data per point, one record per minute
            {
                sb.Append("{\"index\":{\"_index\":\"meterdata\",\"_type\":\"autoData\"}}" + Environment.NewLine);
                sb.Append("{\"Mfid \":" + i + ",\"TData\":" + random.Next(1067500) + ",\"TMoney\":" + random.Next(1300) + ",\"HTime\":\"" + startDate.ToString("yyyy-MM-ddTHH:mm:ss") + "\"}" + Environment.NewLine);
                count++;
                if (count >= 100000) // start a new file every 100,000 records
                {
                    WriteDataFile(dir, sb, sborder);
                    count = 0;
                }
                startDate = startDate.AddMinutes(1);
            }
        }
        if (count > 0) // flush the records left over after the last full file
        {
            WriteDataFile(dir, sb, sborder);
        }
        sborder.Append("pause");
        // FileMode.Create truncates an existing file; OpenOrCreate would leave stale bytes behind.
        using (FileStream fs1 = new FileStream(dir + "order.bat", FileMode.Create))
        using (StreamWriter sw1 = new StreamWriter(fs1, Encoding.GetEncoding("GBK")))
        {
            sw1.WriteLine(sborder.ToString());
        }
        MessageBox.Show("Generation complete");
    }

    // Writes the buffered records to a randomly named .json file
    // and appends the matching curl command to the batch script.
    private static void WriteDataFile(string dir, StringBuilder sb, StringBuilder sborder)
    {
        string filename = new Random(GetRandomSeed()).Next(900000000) + ".json";
        using (FileStream fs3 = new FileStream(dir + filename, FileMode.Create))
        using (StreamWriter sw = new StreamWriter(fs3, Encoding.GetEncoding("GBK")))
        {
            sw.Write(sb.ToString()); // sb already ends with a newline, which _bulk requires
        }
        sb.Clear();
        sborder.Append("curl 172.17.1.15:9200/_bulk?pretty --data-binary @" + dir + filename + Environment.NewLine);
    }

    static int GetRandomSeed()
    {
        // Cryptographically random seed, so generated file names are unlikely to collide.
        byte[] bytes = new byte[4];
        System.Security.Cryptography.RNGCryptoServiceProvider rng = new System.Security.Cryptography.RNGCryptoServiceProvider();
        rng.GetBytes(bytes);
        return BitConverter.ToInt32(bytes, 0);
    }

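    Summary

    In testing, Elasticsearch searches proved quite fast. Partway through generation, with 1.7 billion records loaded, a query filtering by Mid and a time range of several months returned ten records in a little over two seconds.

    Repeating the same query made it faster each time, presumably because Elasticsearch caches the results.

    The 5.2 billion records take up roughly 500 GB of disk space, which is still fairly large.

    That is about three times the size of the same data stored with Protocol Buffers, but the search speed is satisfactory.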
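    The record count and per-record footprint follow from the generator's parameters. A small sketch of the arithmetic is below; `VolumeEstimate` and its methods are hypothetical names, and the byte figure assumes "500 GB" means 500 × 10^9 bytes.

```csharp
using System;

public static class VolumeEstimate
{
    // One record per minute from start to end, with both endpoints included.
    public static long RecordsPerPoint(DateTime start, DateTime end)
    {
        return (long)(end - start).TotalMinutes + 1;
    }

    // Average on-disk bytes per record for a given index size.
    public static double BytesPerRecord(double totalBytes, long records)
    {
        return totalBytes / records;
    }
}
```

    For 2015-10-22 through 2016-10-22 the span is 366 days (it includes 29 February 2016), so `RecordsPerPoint` gives 527,041 records per point and about 5.27 billion for 10,000 points, in line with the 5.2 billion quoted above. `BytesPerRecord(500e9, 5200000000)` comes to roughly 96 bytes per record.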

  • Original article: https://www.cnblogs.com/lexiaofei/p/6673319.html