org.apache.spark.shuffle.FetchFailedException
I am running a query on a data size of 4 billion rows and I am getting an
org.apache.spark.shuffle.FetchFailedException error.
select adid, position, userid, price
from (
    select adid, position, userid, price,
           dense_rank() over (partition by adlocationid order by price desc) as rank
    from traininfo
) tmp
where rank <= 2
I have attached the error logs from the spark-sql terminal. Please suggest what causes these kinds of errors and how I can resolve them.
The problem here is the lost executor:
15/08/25 10:08:13 WARN HeartbeatReceiver: Removing executor 1 with no recent heartbeats: 165758 ms exceeds timeout 120000 ms
15/08/25 10:08:13 ERROR TaskSchedulerImpl: Lost executor 1 on 192.168.1.223: Executor heartbeat timed out after 165758 ms
The exception occurs when Spark tries to read shuffle data from that node and fails. The node may be stuck in a long GC pause (maybe try using a smaller heap size for the executors), or there may be a network failure, or an outright crash. Spark should recover from lost nodes like this one, and indeed it starts resubmitting the first stage to another node. Depending on how big your cluster is, that retry may or may not succeed.
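A minimal sketch of the mitigations described above, as Spark configuration properties. The property names are real Spark settings, but the values are illustrative assumptions, not tuned for this workload:

```
# spark-defaults.conf (illustrative values)

# Raise the network timeout so a long GC pause does not get the executor
# marked as lost (the log shows 165758 ms exceeding the 120000 ms default):
spark.network.timeout              300s
spark.executor.heartbeatInterval   30s

# Alternatively, shrink the executor heap so individual GC pauses stay
# short, and compensate with more executors:
spark.executor.memory              4g
spark.executor.instances           8
```

The same properties can be passed on the command line via `spark-sql --conf spark.network.timeout=300s ...`. Raising the timeout only hides long pauses; shrinking the heap (or tuning GC) addresses the pause itself.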