Error message:
insert overwrite table t_mobile_mid_use_p_tmp4_rcf
select '201411' as month_id,
a.prov_id, a.city, a.client_imsi, a.os_version,
b.install_status, b.install_date, b.unstall_status, b.unstall_date,
a.label_name, a.package_name, a.app_version, a.app_type_id, a.type_label_name,
b.run_time, monthSpace(b.install_date) as install_days,
a.flow, a.use_time, a.run_count, a.active_days, a.is_from_plugin,
from_unixtime(unix_timestamp(),'yyyy-MM-dd HH:mm:ss') as load_date
from t_mobile_mid_use_p_tmp3_1_rcf a
join t_mobile_client_p_rcf b on (a.client_imsi = b.client_imsi and a.label_name = b.label_name);
Query ID = ca_20141218152020_9e4ebfa2-f663-47b8-a0cf-5303b9c0e482
Total jobs = 1
14/12/18 15:21:02 WARN conf.Configuration: file:/tmp/ca/hive_2014-12-18_15-20-54_155_1926187970964040123-1/-local-10005/jobconf.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
14/12/18 15:21:02 WARN conf.Configuration: file:/tmp/ca/hive_2014-12-18_15-20-54_155_1926187970964040123-1/-local-10005/jobconf.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
Execution log at: /tmp/ca/ca_20141218152020_9e4ebfa2-f663-47b8-a0cf-5303b9c0e482.log
2014-12-18 03:21:03 Starting to launch local task to process map join; maximum memory = 1065484288
2014-12-18 03:21:08 Processing rows: 200000 Hashtable size: 199999 Memory usage: 112049704 percentage: 0.105
2014-12-18 03:21:09 Processing rows: 300000 Hashtable size: 299999 Memory usage: 160367688 percentage: 0.151
2014-12-18 03:21:10 Processing rows: 400000 Hashtable size: 399999 Memory usage: 209294088 percentage: 0.196
2014-12-18 03:21:11 Processing rows: 500000 Hashtable size: 499999 Memory usage: 257089944 percentage: 0.241
2014-12-18 03:21:12 Processing rows: 600000 Hashtable size: 599999 Memory usage: 305440536 percentage: 0.287
2014-12-18 03:21:14 Processing rows: 700000 Hashtable size: 699999 Memory usage: 347305664 percentage: 0.326
2014-12-18 03:21:14 Processing rows: 800000 Hashtable size: 799999 Memory usage: 403916624 percentage: 0.379
2014-12-18 03:21:16 Processing rows: 900000 Hashtable size: 899999 Memory usage: 452238592 percentage: 0.424
2014-12-18 03:21:16 Processing rows: 1000000 Hashtable size: 999999 Memory usage: 499593552 percentage: 0.469
2014-12-18 03:21:18 Processing rows: 1100000 Hashtable size: 1099999 Memory usage: 547966320 percentage: 0.514
2014-12-18 03:21:19 Processing rows: 1200000 Hashtable size: 1199999 Memory usage: 593792800 percentage: 0.557
2014-12-18 03:21:21 Processing rows: 1300000 Hashtable size: 1299999 Memory usage: 641564688 percentage: 0.602
2014-12-18 03:21:21 Processing rows: 1400000 Hashtable size: 1399999 Memory usage: 690130432 percentage: 0.648
2014-12-18 03:21:21 Processing rows: 1500000 Hashtable size: 1499999 Memory usage: 737340976 percentage: 0.692
2014-12-18 03:21:24 Processing rows: 1600000 Hashtable size: 1599999 Memory usage: 793258352 percentage: 0.745
2014-12-18 03:21:25 Processing rows: 1700000 Hashtable size: 1699999 Memory usage: 841009952 percentage: 0.789
2014-12-18 03:21:25 Processing rows: 1800000 Hashtable size: 1799999 Memory usage: 887464680 percentage: 0.833
2014-12-18 03:21:28 Processing rows: 1900000 Hashtable size: 1899999 Memory usage: 934581288 percentage: 0.877
2014-12-18 03:21:28 Processing rows: 2000000 Hashtable size: 1999999 Memory usage: 984062056 percentage: 0.924
Execution failed with exit status: 3
Obtaining error information
Task failed!
Task ID:
Stage-5
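The log is consistent with Hive's memory guard on the local map-join task. The hash table is built in a local JVM (maximum memory = 1065484288 bytes here), and the task is aborted once the hash table's share of that heap passes the fraction set by hive.mapjoin.localtask.max.memory.usage. That fraction is 0.90 by default in the Hive releases this note assumes, and the last progress line reports 0.924. A minimal sketch for inspecting the settings involved, assuming a Hive CLI session where `set name;` without a value only prints the current value:

```sql
-- Whether Hive may convert a common join into a map join automatically.
set hive.auto.convert.join;
-- File-size threshold in bytes below which a table is treated as "small".
set hive.mapjoin.smalltable.filesize;
-- Fraction of the local task's heap the hash table may use before the task aborts.
set hive.mapjoin.localtask.max.memory.usage;
-- Memory setting for the local task.
set hive.mapred.local.mem;
```

None of these statements changes anything; they only show what the session is currently using.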
Official FAQ explanation:
Hive converted a join into a locally running and faster 'mapjoin', but ran
out of memory while doing so. There are two bugs responsible for this.
Bug 1: Hive's metric for converting joins miscalculates the required amount
of memory. This is especially true for compressed files and ORC files,
because Hive uses the file size as its metric, but compressed tables require
more memory in their uncompressed in-memory representation.
The latter option may lead to bug number two if you happen to have an affected Hadoop version.
Bug 2: Hive/Hadoop ignores 'hive.mapred.local.mem'! More exactly, there is a
bug in Hadoop 2.2 where hadoop-env.cmd sets the -Xmx parameter multiple times,
effectively overriding the user-set hive.mapred.local.mem setting. See:
======select count(1) from t_mobile_mid_use_p_tmp3_1_rcf;
/**
*MapReduce Jobs Launched:
*Job 0: Map: 14 Reduce: 1 Cumulative CPU: 102.42 sec HDFS Read: 172923550 HDFS Write: 9 SUCCESS
*Total MapReduce CPU Time Spent: 1 minutes 42 seconds 420 msec
*OK
*34304843
*Time taken: 33.022 seconds, Fetched: 1 row(s)
*/
======select count(*) from t_mobile_client_p_rcf;
/**
*MapReduce Jobs Launched:
*Job 0: Map: 5 Reduce: 1 Cumulative CPU: 62.47 sec HDFS Read: 116257926 HDFS Write: 10 SUCCESS
*Total MapReduce CPU Time Spent: 1 minutes 2 seconds 470 msec
*OK
*165830880
*Time taken: 37.75 seconds, Fetched: 1 row(s)
*/
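Neither table looks small enough for an in-memory hash table. The log shows about 492 bytes per hash-table row (984062056 bytes for 2000000 rows). If that per-row cost held, the 34304843-row table alone would need roughly 17 GB, against a local heap of 1065484288 bytes. The row counts do not show what Hive's size-based metric sees, though. One way to compare on-disk size with uncompressed size is sketched below, assuming table statistics can be gathered in this environment (rawDataSize may be missing or zero for some storage formats):

```sql
-- Gather basic table statistics; this launches a job that scans the table.
analyze table t_mobile_client_p_rcf compute statistics;
-- The Table Parameters section then lists numRows, totalSize (bytes on disk) and rawDataSize.
describe formatted t_mobile_client_p_rcf;

-- Same check for the other side of the join.
analyze table t_mobile_mid_use_p_tmp3_1_rcf compute statistics;
describe formatted t_mobile_mid_use_p_tmp3_1_rcf;
```

A totalSize far below rawDataSize would match the FAQ's point that file size understates the memory a compressed table needs.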
Solution:
-- Disable automatic conversion to MapJoin (default: true).
set hive.auto.convert.join=false;
-- Stop ignoring mapjoin hints so that hints take effect (default: true, meaning hints are ignored).
set hive.ignore.mapjoin.hint=false;
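As a rough sketch of how to apply and verify this, the settings can be issued in the same session before the plan is checked with EXPLAIN. The shortened select list is only for illustration; the tables and join condition are those of the failing query. With automatic conversion off, the plan is expected to show a common (reduce-side) join with no local map-join stage, though the exact plan text depends on the Hive version.

```sql
-- Session-level settings; they apply only to the current Hive session.
set hive.auto.convert.join=false;
set hive.ignore.mapjoin.hint=false;

-- Print the effective values to confirm they were applied.
set hive.auto.convert.join;
set hive.ignore.mapjoin.hint;

-- Inspect the plan before re-running the full INSERT OVERWRITE.
explain
select a.client_imsi, a.label_name, b.install_status
from t_mobile_mid_use_p_tmp3_1_rcf a
join t_mobile_client_p_rcf b
  on (a.client_imsi = b.client_imsi and a.label_name = b.label_name);
```

The comments sit on their own lines because some Hive CLI versions do not accept a trailing `--` comment after a semicolon.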