org.apache.spark.shuffle.FetchFailedException
I am running a query on a data size of 4 billion rows and I am getting an
org.apache.spark.shuffle.FetchFailedException error.
select adid, position, userid, price
from (
    select adid, position, userid, price,
           dense_rank() over (partition by adlocationid order by price desc) as rank
    from traininfo
) tmp
where rank <= 2
I have attached the error logs from the spark-sql terminal. Please suggest what causes these kinds of errors and how I can resolve them.
The problem here is the lost executor:
15/08/25 10:08:13 WARN HeartbeatReceiver: Removing executor 1 with no recent heartbeats: 165758 ms exceeds timeout 120000 ms
15/08/25 10:08:13 ERROR TaskSchedulerImpl: Lost executor 1 on 192.168.1.223: Executor heartbeat timed out after 165758 ms
The exception occurs when Spark tries to read shuffle data from that node and fails. The node may be stuck in a long GC pause (maybe try using a smaller heap size for the executors), or there may be a network failure, or an outright crash. Spark should recover from lost nodes like this one, and indeed it starts resubmitting the first stage to another node. Depending on how big your cluster is, that retry may or may not succeed.
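A minimal sketch of the mitigations described above, as Spark configuration properties. The property names are real Spark settings, but the values are illustrative assumptions, not tuned for this workload:

```
# spark-defaults.conf (illustrative values)

# Raise the network timeout so a long GC pause does not get the executor
# marked as lost (the log shows 165758 ms exceeding the 120000 ms default):
spark.network.timeout              300s
spark.executor.heartbeatInterval   30s

# Alternatively, shrink the executor heap so individual GC pauses stay
# short, and compensate with more executors:
spark.executor.memory              4g
spark.executor.instances           8
```

The same properties can be passed on the command line via `spark-sql --conf spark.network.timeout=300s ...`. Raising the timeout only hides long pauses; shrinking the heap (or tuning GC) addresses the pause itself.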