Once, I wrote a query that joined two tables.
One is a big partitioned table; the other is used to filter that big table.
The query then joins the two tables.
After partition pruning, the big table still has several million rows, and the small table has about 170 thousand rows.
The query ran for a very long time,
and the big data cluster even went into safe mode because of it.
I killed the job.
How can I monitor a long-running Hive job like this one?
Why did the NameNode go into safe mode because of this query?
The system administrator (SA) found the root cause: the parent process was killed by a Java OutOfMemoryError.
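Another issue: pay attention to split(field, separator).
If the separator is |, you must write it as [|] or \|, because | has a special meaning (alternation) in regular expressions. Inside a HiveQL string literal the backslash itself also needs escaping, so write split(field, '\\|') or split(field, '[|]').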
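Hive documents the second argument of split(str, pat) as a regular expression, so the pitfall can be reproduced outside Hive. The sketch below uses Java's String.split only as an approximation of that regex behaviour, not as Hive's actual implementation. The class name SplitSeparatorDemo and the sample row "a|b|c" are hypothetical, made up for illustration.

```java
import java.util.Arrays;

// Hypothetical demo class: shows why a literal "|" separator must be escaped
// when the separator is treated as a regular expression, as in Hive's split().
// Java's String.split is used here as an approximation of that behaviour.
final class SplitSeparatorDemo {

    // Split a value on a regex pattern (the pattern is NOT a literal string).
    static String[] splitOn(String value, String regex) {
        return value.split(regex);
    }

    public static void main(String[] args) {
        String row = "a|b|c"; // hypothetical sample row

        // Unescaped: "|" is alternation of two empty patterns, so it matches
        // the empty string between every character instead of the pipes.
        System.out.println(Arrays.toString(splitOn(row, "|")));   // [a, |, b, |, c] on Java 8+

        // Backslash-escaped pipe (written "\\|" in Java source code).
        System.out.println(Arrays.toString(splitOn(row, "\\|"))); // [a, b, c]

        // Character class containing only the pipe.
        System.out.println(Arrays.toString(splitOn(row, "[|]"))); // [a, b, c]
    }
}
```

The unescaped pattern does not fail loudly. It returns single characters, including the pipes themselves, so the mistake can go unnoticed until downstream values look wrong.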