最近在做通过sqoop 将oracle数据库当中的数据导入的HDFS上面,但是当我串行的时候是没有一点问题的。但是为了达到集群当中资源的额最大的使用率。想让导入数据做成并行去处理。
在做并行的时候,有时候是好的,有时候就出错,这样不稳定的系统真的头大。出现的问题如下:
8/10/29 15:01:03 ERROR manager.SqlManager: Error executing statement: java.sql.SQLRecoverableException: IO Error: Connection reset java.sql.SQLRecoverableException: IO Error: Connection reset at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:682) at oracle.jdbc.driver.PhysicalConnection.<init>(PhysicalConnection.java:711) at oracle.jdbc.driver.T4CConnection.<init>(T4CConnection.java:385) at oracle.jdbc.driver.T4CDriverExtension.getConnection(T4CDriverExtension.java:30) at oracle.jdbc.driver.OracleDriver.connect(OracleDriver.java:558) at java.sql.DriverManager.getConnection(DriverManager.java:571) at java.sql.DriverManager.getConnection(DriverManager.java:215) at org.apache.sqoop.manager.OracleManager.makeConnection(OracleManager.java:329) at org.apache.sqoop.manager.GenericJdbcManager.getConnection(GenericJdbcManager.java:52) at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:763) at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:786) at org.apache.sqoop.manager.SqlManager.getColumnInfoForRawQuery(SqlManager.java:289) at org.apache.sqoop.manager.SqlManager.getColumnTypesForRawQuery(SqlManager.java:260) at org.apache.sqoop.manager.SqlManager.getColumnTypesForQuery(SqlManager.java:253) at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:336) at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1858) at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1657) at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:106) at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:494) at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:621) at org.apache.sqoop.Sqoop.run(Sqoop.java:147) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183) at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234) at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243) at org.apache.sqoop.Sqoop.main(Sqoop.java:252) Caused by: java.net.SocketException: Connection reset at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:118) at java.net.SocketOutputStream.write(SocketOutputStream.java:159) at oracle.net.ns.DataPacket.send(DataPacket.java:209) at oracle.net.ns.NetOutputStream.flush(NetOutputStream.java:215) at oracle.net.ns.NetInputStream.getNextPacket(NetInputStream.java:302) at oracle.net.ns.NetInputStream.read(NetInputStream.java:249) at oracle.net.ns.NetInputStream.read(NetInputStream.java:171) at oracle.net.ns.NetInputStream.read(NetInputStream.java:89) at oracle.jdbc.driver.T4CSocketInputStreamWrapper.readNextPacket(T4CSocketInputStreamWrapper.java:123) at oracle.jdbc.driver.T4CSocketInputStreamWrapper.read(T4CSocketInputStreamWrapper.java:79) at oracle.jdbc.driver.T4CMAREngineStream.unmarshalUB1(T4CMAREngineStream.java:426) at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:390) at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:249) at oracle.jdbc.driver.T4CTTIoauthenticate.doOSESSKEY(T4CTTIoauthenticate.java:435) at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:580) ... 25 more 18/10/29 15:01:03 ERROR tool.ImportTool: Import failed: java.io.IOException: No columns to generate for ClassWriter at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1663) at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:106) at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:494) at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:621) at org.apache.sqoop.Sqoop.run(Sqoop.java:147) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183) at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234) at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243) at org.apache.sqoop.Sqoop.main(Sqoop.java:252) Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.ShellMain], exit code [1]
看到上面的问题,第一反应应该是oracle 驱动的问题,然后就修改oracle的驱动包,然后从ojdbc6.jar的jar包开始替换,然后换成ojdbc7.jar但是问题依然是存在的。然后就翻墙到国外的网站。
然后查看到相同的问题。提供的解决方法
上麦这段话的意思是,当我们在执行map任务的时候,在主机上缺少一个快速随机产生的设备。然后去了sqoop的官网看到也是同样的这句话:
Solution: This problem occurs primarily due to the lack of a fast random number generation device on the host where the map tasks execute. On typical Linux systems this can be addressed by setting the following property in the java.security
file:
在这里在我们的执行脚本当中加入这两句话:
export HADOOP_OPTS=-Djava.security.egd=file:/dev/../dev/urandom sqoop import -D mapred.child.java.opts="-Djava.security.egd=file:/dev/../dev/urandom"
至此我的问题得到了解决。整个并行的sqoop脚本执行成功。
下面是参考的文章:
http://stackoverflow.com/questions/2327220/oracle-jdbc-intermittent-connection-issue/
https://community.oracle.com/thread/943911?tstart=0&messageID=3793101
https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_oracle_connection_reset_errors