Local Spark talking to remote HDFS?
I have a file in HDFS inside the Hortonworks HDP 2.3_1 VirtualBox VM.
If I go into the guest's spark-shell and refer to the file like this, it works fine:
val words = sc.textFile("hdfs:///tmp/people.txt")
words.count
However, if I try to access the file from a local Spark app on the Windows host, it doesn't work:
val conf = new SparkConf().setMaster("local").setAppName("My App")
val sc = new SparkContext(conf)
val words = sc.textFile("hdfs://localhost:8020/tmp/people.txt")
words.count
This emits:
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-452094660-10.0.2.15-1437494483194:blk_1073742905_2098 file=/tmp/people.txt
    at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:838)
    at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:526)
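For what it's worth, my understanding is that listing a directory only involves NameNode metadata RPCs, not datanode block transfers, so a check like the following (just a rough sketch on my part, the paths are only illustrative) should succeed from the host even while block reads fail:

import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Metadata-only check against the NameNode on the forwarded port 8020;
// no datanode data transfer is involved in a directory listing.
val fs = FileSystem.get(new URI("hdfs://localhost:8020"), new Configuration())
fs.listStatus(new Path("/tmp")).foreach(status => println(status.getPath))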
Port 8020 is open, and if I choose a wrong file name, it tells me:
Input path does not exist: hdfs://localhost:8020/tmp/people.txt
localhost:8020 should be correct, since the guest HDP VM has NAT port forwarding of that port to the Windows host box.
And, as above, it does give me the appropriate exception when I supply a wrong file name, so the NameNode connection itself seems to be working.
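My current reading, and it is only a guess, is that the NameNode calls succeed through the forwarded port, but the client is then told to fetch the block from the datanode's VM-internal address (the 10.0.2.15 that appears in the block ID above), which is not reachable from the host. As a sketch of what I might try next (not something I have confirmed works), one commonly suggested client setting is to resolve datanodes by hostname rather than by the IP the NameNode reports, which would also require the datanode's hostname and data transfer port (50010 by default) to be reachable from Windows:

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setMaster("local").setAppName("My App")
val sc = new SparkContext(conf)

// Sketch: ask the HDFS client to connect to datanodes via their hostname instead of
// the VM-internal IP returned by the NameNode. Assumes the datanode hostname is
// resolvable from Windows (e.g. a hosts-file entry) and its data port is forwarded.
sc.hadoopConfiguration.set("dfs.client.use.datanode.hostname", "true")

val words = sc.textFile("hdfs://localhost:8020/tmp/people.txt")
words.count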
My pom.xml has:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>1.4.1</version>
    <scope>provided</scope>
</dependency>
What am I doing wrong, and what is the BlockMissingException trying to tell me?