hadoop - Job fails to read from one ORC file and write a subset to another
Working in the Apache Pig interactive shell in HDP 2.3 on Windows, I've got an existing ORC file in /path/to/file. If I load it and then save it using:
a = LOAD '/path/to/file' USING OrcStorage('');
STORE a INTO '/path/to/second_file' USING OrcStorage('');
then it works. However, if I try:
a = LOAD '/path/to/file' USING OrcStorage('');
b = LIMIT a 10;
STORE b INTO '/path/to/third_file' USING OrcStorage('');
then I get the following error traceback in the logs of the second job (out of the 2 that are scheduled):
2015-08-25 16:03:42,161 FATAL [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
java.lang.NoClassDefFoundError: org/apache/hadoop/hive/ql/io/orc/OrcNewOutputFormat
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:657)
    at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:726)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getStoreFunc(POStore.java:251)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.getCommitters(PigOutputCommitter.java:88)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.<init>(PigOutputCommitter.java:71)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.getOutputCommitter(PigOutputFormat.java:289)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.call(MRAppMaster.java:476)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.call(MRAppMaster.java:458)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.callWithJobClassLoader(MRAppMaster.java:1560)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.createOutputCommitter(MRAppMaster.java:458)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceInit(MRAppMaster.java:377)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$4.run(MRAppMaster.java:1518)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1515)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1448)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.io.orc.OrcNewOutputFormat
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
I suspect the classpath for the 2 jobs is different, causing the ClassNotFoundException. Is that the case? If so, how can I fix it? (Bonus question: why has this happened?)
Check that the library OrcStorage depends on is present on all nodes.
- The first script spawns a single job.
- The second script spawns multiple jobs, which may run on a different machine that doesn't have the dependent library on its classpath.
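One rough way to carry out that check is to scan a node's jar directories for the class named in the stack trace. The sketch below is only an illustration: the helper name `jars_containing` and the directory you pass in are assumptions, not part of Pig or HDP. The class is typically shipped in the hive-exec jar, but verify that against your own install.

```python
import sys
import zipfile
from pathlib import Path

# Class named in the stack trace, written as a path inside a jar.
TARGET = "org/apache/hadoop/hive/ql/io/orc/OrcNewOutputFormat.class"


def jars_containing(root, entry=TARGET):
    """Return sorted paths of jars under `root` that contain `entry`.

    Hypothetical helper for illustration; not part of Pig or Hadoop.
    """
    hits = []
    for jar in Path(root).rglob("*.jar"):
        try:
            with zipfile.ZipFile(jar) as zf:
                if entry in zf.namelist():
                    hits.append(str(jar))
        except (zipfile.BadZipFile, OSError):
            # Skip unreadable or corrupt archives instead of aborting the scan.
            continue
    return sorted(hits)


if __name__ == "__main__":
    # The directory to scan is an assumption: pass the node's Hive or Pig lib directory.
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for path in jars_containing(root):
        print(path)
```

Run it on each node against that node's library directories. A node that prints nothing does not have the class in any jar under that directory. That is consistent with the ClassNotFoundException above, although the jar could still be present elsewhere and simply missing from the job's classpath.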