hadoop - Job fails to read from one ORC file and write a subset to another


I'm working in the Apache Pig interactive shell on HDP 2.3 for Windows, and I have an existing ORC file at /path/to/file. If I load it and save it using:

a = LOAD '/path/to/file' USING OrcStorage('');
STORE a INTO '/path/to/second_file' USING OrcStorage('');

then it works. However, if I try:

a = LOAD '/path/to/file' USING OrcStorage('');
b = LIMIT a 10;
STORE b INTO '/path/to/third_file' USING OrcStorage('');

then I get the following error traceback in the logs of the second job (out of the 2 that are scheduled):

2015-08-25 16:03:42,161 FATAL [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
java.lang.NoClassDefFoundError: org/apache/hadoop/hive/ql/io/orc/OrcNewOutputFormat
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:657)
    at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:726)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getStoreFunc(POStore.java:251)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.getCommitters(PigOutputCommitter.java:88)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.<init>(PigOutputCommitter.java:71)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.getOutputCommitter(PigOutputFormat.java:289)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.call(MRAppMaster.java:476)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.call(MRAppMaster.java:458)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.callWithJobClassLoader(MRAppMaster.java:1560)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.createOutputCommitter(MRAppMaster.java:458)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceInit(MRAppMaster.java:377)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$4.run(MRAppMaster.java:1518)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1515)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1448)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.io.orc.OrcNewOutputFormat
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)

I suspect that the classpaths of the 2 jobs are different, which causes the ClassNotFoundException. Is that the case? If so, how can I fix it? (Bonus question: why has this happened?)

Check that the libraries OrcStorage depends on are present on all nodes.

  • The first script spawns a single job.
  • The second script spawns multiple jobs, which may run on a different machine that does not have the dependent library on its classpath.
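One way to carry out that check is to scan a node's library directory for the class named in the stack trace. The sketch below is only an illustration in Python, not part of Pig or Hadoop: the function name `jars_containing` is hypothetical, and the directory to inspect is whatever your installation uses. In Hive distributions this class normally ships in the hive-exec jar, but treat that as an assumption to verify.

```python
import zipfile
from pathlib import Path

# Class named in the stack trace, written as an entry path inside a jar.
TARGET = "org/apache/hadoop/hive/ql/io/orc/OrcNewOutputFormat.class"


def jars_containing(lib_dir, entry=TARGET):
    """Return the sorted file names of jars under lib_dir that contain entry."""
    hits = []
    for jar in sorted(Path(lib_dir).rglob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                if entry in zf.namelist():
                    hits.append(jar.name)
        except zipfile.BadZipFile:
            # Skip files that end in .jar but are not valid archives.
            continue
    return sorted(hits)
```

Running `jars_containing(...)` against the library directory on each node shows which nodes have a jar that provides the class. An empty list on a node would be consistent with the ClassNotFoundException above, although it does not prove that the directory scanned is the one actually on the job's classpath.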
