Partial and duplicate records during Sqoop import
Sqoop import produces duplicate or partial records with the following settings:
- --query: a custom query
- --split-by: a non-integer (char) column
- --num-mappers: more than 2
Verified source data count: 1000 records.
Verified imported data count: 1923 records.
When the --split-by column is non-integer, Sqoop uses TextSplitter, which logs the following warnings:
WARN db.TextSplitter: If your database sorts in a case-insensitive order, this may result in a partial import or duplicate records.
WARN db.TextSplitter: You are strongly encouraged to choose an integral split column.

Possible solutions:
- Solution 1: use a single mapper (or 2 mappers).
- Solution 2: use a rank function in the query, and use --split-by on the rank field.
- Solution 3: sort the --split-by field in ascending order in the query.
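The mechanism behind the warning can be sketched with a toy Python model. This is not Sqoop's code: the helpers `text_bounds`, `integer_bounds` and `rows_per_split` are hypothetical. The model makes two simplifications. First, split boundaries are interpolated on the first character's code point only, which is a simplification of what TextSplitter does. Second, a case-insensitive database collation is approximated with `str.lower`. The boundaries are generated in case-sensitive code-point order, but each mapper's range predicate is evaluated by a case-insensitive database, so the ranges can overlap. The last part illustrates the idea behind solution 2 by splitting on a unique integer rank instead.

```python
from collections import Counter
from collections.abc import Callable, Sequence


def text_bounds(min_val: str, max_val: str, num_splits: int) -> list[str]:
    """Hypothetical, simplified text splitter: interpolate the first character's
    code point between MIN and MAX, i.e. in case-sensitive code-point order."""
    lo, hi = ord(min_val[0]), ord(max_val[0])
    step = (hi - lo) / num_splits
    inner = [chr(int(lo + step * i)) for i in range(1, num_splits)]
    return [min_val, *inner, max_val]


def integer_bounds(lo: int, hi: int, num_splits: int) -> list[int]:
    """Hypothetical integer splitter: evenly spaced boundaries between lo and hi."""
    step = (hi - lo) / num_splits
    return [lo, *(int(lo + step * i) for i in range(1, num_splits)), hi]


def rows_per_split(values: Sequence, bounds: Sequence, key: Callable) -> Counter:
    """Count how many mappers select each value.

    Assumes each mapper filters with lo <= col < hi, and the last one with
    lo <= col <= hi. `key` models how the database compares values:
    identity = case-sensitive, str.lower = a case-insensitive collation."""
    counts: Counter = Counter()
    last = len(bounds) - 2
    for i in range(len(bounds) - 1):
        lo, hi = key(bounds[i]), key(bounds[i + 1])
        for v in values:
            k = key(v)
            if lo <= k and (k < hi or (i == last and k <= hi)):
                counts[v] += 1
    return counts


if __name__ == "__main__":
    fruits = ["Apple", "banana", "Cherry", "kiwi", "Lemon", "Olive", "plum", "zebra"]
    # MIN/MAX as a case-insensitive database would report them.
    bounds = text_bounds(min(fruits, key=str.lower), max(fruits, key=str.lower), 4)
    print("text boundaries:", bounds)
    for label, key in [("case-sensitive", lambda s: s), ("case-insensitive", str.lower)]:
        counts = rows_per_split(fruits, bounds, key)
        print(f"{label}: {sum(counts.values())} rows imported from {len(fruits)}")

    # Solution 2 idea: give each row a unique integer rank and split on that.
    ranks = list(range(1, len(fruits) + 1))
    counts = rows_per_split(ranks, integer_bounds(1, len(fruits), 4), lambda r: r)
    print(f"rank-based: {sum(counts.values())} rows imported from {len(fruits)}")
```

On this toy input the boundaries are ['Apple', 'O', ']', 'k', 'zebra']. With case-sensitive comparison the 8 rows are selected once each. With case-insensitive comparison 13 rows come back, because once case is ignored the boundary 'O' sorts after ']' and 'k', and the first and third ranges overlap. The rank-based split selects each of the 8 rows exactly once.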