The official documentation provides the following sample for flattening nested JSON:
{ "timestamp": "2015-09-12T12:10:53.155Z", "dim1": "qwerty", "dim2": "asdf", "dim3": "zxcv", "ignore_me": "ignore this", "metrica": 9999, "foo": {"bar": "abc"}, "foo.bar": "def", "nestmet": {"val": 42}, "hello": [1.0, 2.0, 3.0, 4.0, 5.0], "mixarray": [1.0, 2.0, 3.0, 4.0, {"last": 5}], "world": [{"hey": "there"}, {"tree": "apple"}], "thing": {"food": ["sandwich", "pizza"]} }
I generated this sample in bulk and sent it to Kafka. Here is a short excerpt:
{"timestamp": "2018-12-20T14:12:39","dim1": "qwerty","dim2": "asdf","dim3": "zxcv","ignore_me": "ignore this","metrica": 9999,"foo": {"bar": "abc"},"foo.bar": "def","nestmet":{"val": 42},"hello": [1.0, 2.0, 3.0, 4.0, 5.0],"mixarray": [1.0, 2.0, 3.0, 4.0, {"last": 5}],"world": [{"hey": "there"}, {"tree": "apple"}],"thing": {"food": ["sandwich", "pizza"]}}
{"timestamp": "2018-12-20T14:12:39","dim1": "qwerty","dim2": "asdf","dim3": "zxcv","ignore_me": "ignore this","metrica": 9999,"foo": {"bar": "abc"},"foo.bar": "def","nestmet":{"val": 42},"hello": [1.0, 2.0, 3.0, 4.0, 5.0],"mixarray": [1.0, 2.0, 3.0, 4.0, {"last": 5}],"world": [{"hey": "there"}, {"tree": "apple"}],"thing": {"food": ["sandwich", "pizza"]}}
{"timestamp": "2018-12-20T14:12:40","dim1": "qwerty","dim2": "asdf","dim3": "zxcv","ignore_me": "ignore this","metrica": 9999,"foo": {"bar": "abc"},"foo.bar": "def","nestmet":{"val": 42},"hello": [1.0, 2.0, 3.0, 4.0, 5.0],"mixarray": [1.0, 2.0, 3.0, 4.0, {"last": 5}],"world": [{"hey": "there"}, {"tree": "apple"}],"thing": {"food": ["sandwich", "pizza"]}}
{"timestamp": "2018-12-20T14:12:40","dim1": "qwerty","dim2": "asdf","dim3": "zxcv","ignore_me": "ignore this","metrica": 9999,"foo": {"bar": "abc"},"foo.bar": "def","nestmet":{"val": 42},"hello": [1.0, 2.0, 3.0, 4.0, 5.0],"mixarray": [1.0, 2.0, 3.0, 4.0, {"last": 5}],"world": [{"hey": "there"}, {"tree": "apple"}],"thing": {"food": ["sandwich", "pizza"]}}
{"timestamp": "2018-12-20T14:12:40","dim1": "qwerty","dim2": "asdf","dim3": "zxcv","ignore_me": "ignore this","metrica": 9999,"foo": {"bar": "abc"},"foo.bar": "def","nestmet":{"val": 42},"hello": [1.0, 2.0, 3.0, 4.0, 5.0],"mixarray": [1.0, 2.0, 3.0, 4.0, {"last": 5}],"world": [{"hey": "there"}, {"tree": "apple"}],"thing": {"food": ["sandwich", "pizza"]}}
{"timestamp": "2018-12-20T14:12:41","dim1": "qwerty","dim2": "asdf","dim3": "zxcv","ignore_me": "ignore this","metrica": 9999,"foo": {"bar": "abc"},"foo.bar": "def","nestmet":{"val": 42},"hello": [1.0, 2.0, 3.0, 4.0, 5.0],"mixarray": [1.0, 2.0, 3.0, 4.0, {"last": 5}],"world": [{"hey": "there"}, {"tree": "apple"}],"thing": {"food": ["sandwich", "pizza"]}}
{"timestamp": "2018-12-20T14:12:41","dim1": "qwerty","dim2": "asdf","dim3": "zxcv","ignore_me": "ignore this","metrica": 9999,"foo": {"bar": "abc"},"foo.bar": "def","nestmet":{"val": 42},"hello": [1.0, 2.0, 3.0, 4.0, 5.0],"mixarray": [1.0, 2.0, 3.0, 4.0, {"last": 5}],"world": [{"hey": "there"}, {"tree": "apple"}],"thing": {"food": ["sandwich", "pizza"]}}
{"timestamp": "2018-12-20T14:12:41","dim1": "qwerty","dim2": "asdf","dim3": "zxcv","ignore_me": "ignore this","metrica": 9999,"foo": {"bar": "abc"},"foo.bar": "def","nestmet":{"val": 42},"hello": [1.0, 2.0, 3.0, 4.0, 5.0],"mixarray": [1.0, 2.0, 3.0, 4.0, {"last": 5}],"world": [{"hey": "there"}, {"tree": "apple"}],"thing": {"food": ["sandwich", "pizza"]}}
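The bulk generation described above could look roughly like the following. This is a minimal sketch, not the script actually used: the function name `make_records`, the `per_second` parameter, and the choice of three records per second are assumptions for illustration. It only prints one JSON record per line; delivering those lines to Kafka is left to whatever producer you already use.

```python
import json
from datetime import datetime, timedelta

# Body of the official sample record; only the timestamp varies per record.
TEMPLATE = {
    "dim1": "qwerty", "dim2": "asdf", "dim3": "zxcv",
    "ignore_me": "ignore this", "metrica": 9999,
    "foo": {"bar": "abc"}, "foo.bar": "def", "nestmet": {"val": 42},
    "hello": [1.0, 2.0, 3.0, 4.0, 5.0],
    "mixarray": [1.0, 2.0, 3.0, 4.0, {"last": 5}],
    "world": [{"hey": "there"}, {"tree": "apple"}],
    "thing": {"food": ["sandwich", "pizza"]},
}

def make_records(start, count, per_second=3):
    """Yield `count` JSON strings; `per_second` records share each timestamp (hypothetical helper)."""
    for i in range(count):
        ts = start + timedelta(seconds=i // per_second)
        record = {"timestamp": ts.strftime("%Y-%m-%dT%H:%M:%S"), **TEMPLATE}
        yield json.dumps(record)

lines = list(make_records(datetime(2018, 12, 20, 14, 12, 39), 8))
for line in lines:
    print(line)  # one Kafka message per line
```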
The parse spec given in the official documentation:
"parseSpec": { "format": "json", "flattenSpec": { "useFieldDiscovery": true, "fields": [ { "type": "root", "name": "dim1" }, "dim2", { "type": "path", "name": "foo.bar", "expr": "$.foo.bar" }, { "type": "root", "name": "foo.bar" }, { "type": "path", "name": "path-metric", "expr": "$.nestmet.val" }, { "type": "path", "name": "hello-0", "expr": "$.hello[0]" }, { "type": "path", "name": "hello-4", "expr": "$.hello[4]" }, { "type": "path", "name": "world-hey", "expr": "$.world[0].hey" }, { "type": "path", "name": "worldtree", "expr": "$.world[1].tree" }, { "type": "path", "name": "first-food", "expr": "$.thing.food[0]" }, { "type": "path", "name": "second-food", "expr": "$.thing.food[1]" }, { "type": "jq", "name": "first-food-by-jq", "expr": ".thing.food[1]" }, { "type": "jq", "name": "hello-total", "expr": ".hello | sum" } ] }, "dimensionsSpec" : { "dimensions" : [], "dimensionsExclusions": ["ignore_me"] }, "timestampSpec" : { "format" : "auto", "column" : "timestamp" } }
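After creating the datasource, I found that it was not reading any data from Kafka. On inspection, the task that pulls the data had failed because the same field name appears twice: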
{ "type": "path", "name": "foo.bar", "expr": "$.foo.bar" }, { "type": "root", "name": "foo.bar" },
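A duplicate like this can be caught before the spec is submitted. The sketch below is one possible check, not part of Druid: `field_name` and `duplicate_names` are hypothetical helpers, and the code assumes that a bare string entry such as "dim2" counts as a field with that name.

```python
import json
from collections import Counter

def field_name(field):
    # Assumption: a bare string entry such as "dim2" names a field directly.
    return field if isinstance(field, str) else field["name"]

def duplicate_names(flatten_spec):
    """Return the sorted field names that occur more than once in flattenSpec.fields."""
    counts = Counter(field_name(f) for f in flatten_spec.get("fields", []))
    return sorted(name for name, n in counts.items() if n > 1)

# Shortened copy of the official flattenSpec, keeping the conflicting entries.
spec = json.loads("""
{
  "useFieldDiscovery": true,
  "fields": [
    { "type": "root", "name": "dim1" },
    "dim2",
    { "type": "path", "name": "foo.bar", "expr": "$.foo.bar" },
    { "type": "root", "name": "foo.bar" }
  ]
}
""")

print(duplicate_names(spec))  # ['foo.bar']
```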
I changed it to the following and restarted:
{ "type": "path", "name": "foo-bar", "expr": "$.foo.bar" }, { "type": "root", "name": "foo.bar" },
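The two entries need distinct names because they read different values: the path expression `$.foo.bar` walks into the nested `foo` object, while the root field `foo.bar` reads the top-level key whose name happens to contain a dot. The sketch below illustrates this with a hand-rolled resolver. `resolve_path` is a hypothetical helper that supports only dotted child keys and integer indexes; it is not the JsonPath library that Druid uses.

```python
import re

# Reduced copy of the sample record, keeping the keys used below.
RECORD = {
    "foo": {"bar": "abc"},
    "foo.bar": "def",
    "hello": [1.0, 2.0, 3.0, 4.0, 5.0],
    "world": [{"hey": "there"}, {"tree": "apple"}],
}

# Matches either ".key" or "[index]" at the current position.
_TOKEN = re.compile(r"\.([A-Za-z_][\w-]*)|\[(\d+)\]")

def resolve_path(record, expr):
    """Resolve a simple path such as $.a.b[0]; return None if any step is missing."""
    if not expr.startswith("$"):
        raise ValueError("expression must start with '$'")
    current, pos = record, 1
    while pos < len(expr):
        match = _TOKEN.match(expr, pos)
        if match is None:
            raise ValueError(f"unsupported syntax at position {pos}: {expr!r}")
        key, index = match.groups()
        try:
            current = current[key] if key is not None else current[int(index)]
        except (KeyError, IndexError, TypeError):
            return None
        pos = match.end()
    return current

print(resolve_path(RECORD, "$.foo.bar"))        # abc  (nested value)
print(RECORD["foo.bar"])                        # def  (top-level key containing a dot)
print(resolve_path(RECORD, "$.world[1].tree"))  # apple
```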
After the restart the log still reported an error, this time because jq has no sum function:
{ "type": "jq", "name": "hello-total", "expr": ".hello | sum" }
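After removing that entry and restarting, ingestion returned to normal.
Having found the cause, I also tested 3-level and 4-level nesting, and both can be flattened. I did not find a way to add field keys for arrays of unknown length.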
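One partial workaround for arrays is to generate the indexed path entries up to a chosen maximum length instead of writing them by hand. This is only a sketch under that assumption: `indexed_path_fields` is a hypothetical helper, elements beyond `max_len` are simply not captured, and how Druid treats an index that is missing from a shorter array was not verified here.

```python
import json

def indexed_path_fields(array_key, max_len):
    """Build flattenSpec 'path' entries for array_key[0] .. array_key[max_len - 1]."""
    return [
        {"type": "path", "name": f"{array_key}-{i}", "expr": f"$.{array_key}[{i}]"}
        for i in range(max_len)
    ]

fields = indexed_path_fields("hello", 5)
# Print the entries so they can be pasted into flattenSpec.fields.
print(json.dumps(fields, indent=2))
```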
I also could not find a jq function that sums an array.
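Two possible directions, neither tested against Druid here. First, the jq manual documents an `add` filter that sums the elements of an array, so an expression such as ".hello | add" may be worth trying; whether the jackson-jq version bundled with Druid accepts it is an assumption. Second, the total can be computed on the producer side before the record is sent to Kafka, as in the sketch below. `add_array_total` and its parameter names are hypothetical.

```python
import json

def add_array_total(line, array_key="hello", total_key="hello-total"):
    """Parse one JSON record, add the sum of its numeric array elements, and re-serialize it."""
    record = json.loads(line)
    values = record.get(array_key) or []
    # Sum only real numbers; skip nested objects, strings, and booleans.
    record[total_key] = sum(
        v for v in values
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    )
    return json.dumps(record)

enriched = add_array_total('{"hello": [1.0, 2.0, 3.0, 4.0, 5.0]}')
print(enriched)
```

With the total present as a top-level key, it can be read like any other root field instead of being derived by a jq expression.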
jackson-jq GitHub: https://github.com/eiiches/jackson-jq
jq official site: https://stedolan.github.io/jq/
json-path GitHub: https://github.com/json-path/JsonPath
Official Druid documentation: http://druid.io/docs/latest/ingestion/flatten-json.html