Partial and duplicate records during Sqoop import
A Sqoop import can produce duplicate or partial records when the following settings are used together:
- --query: a custom query
- --split-by: a non-integer (char) column
- --num-mappers: more than 2
Verified source data count: 1000 records
Verified import data count: 1923 records
When --split-by is given a non-integer field, Sqoop uses TextSplitter to compute the split boundaries and logs the following warnings:
WARN db.TextSplitter: If your database sorts in a case-insensitive order, this may result in a partial import or duplicate records.
WARN db.TextSplitter: You are strongly encouraged to choose an integral split column.
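To see why a case-insensitive sort order can duplicate rows, here is a minimal Python sketch. It is a simplified model, not Sqoop's actual TextSplitter code: the sample rows, the boundary strings, and the helper names `in_split` and `run_import` are all made up for illustration. It assumes each mapper runs a range filter of the form `col >= lo AND col < hi`, with boundaries that were computed in code point (case-sensitive) order.

```python
def in_split(value, lo, hi, last, key):
    """Mimic WHERE col >= lo AND col < hi (<= hi for the final split).

    `key` models how the database compares strings:
    identity for a binary collation, str.lower for a case-insensitive one.
    """
    v, l, h = key(value), key(lo), key(hi)
    return l <= v and (v <= h if last else v < h)


def run_import(rows, boundaries, key):
    """Run one range filter per mapper and concatenate the results."""
    imported = []
    num_splits = len(boundaries) - 1
    for i in range(num_splits):
        lo, hi = boundaries[i], boundaries[i + 1]
        last = i == num_splits - 1
        imported.extend(r for r in rows if in_split(r, lo, hi, last, key))
    return imported


# Hypothetical source rows and split boundaries (4 mappers).
rows = ["Apple", "apple", "Mango", "mango", "Zebra", "zebra"]
boundaries = ["A", "N", "a", "n", "{"]  # ascending in code point order

binary_result = run_import(rows, boundaries, key=lambda s: s)
ci_result = run_import(rows, boundaries, key=str.lower)

print(len(rows), len(binary_result), len(ci_result))
```

Under the binary comparison every row lands in exactly one split. Under the case-insensitive comparison the ranges starting at "A" and "a" select the same rows, so those rows are imported twice, which is the same kind of inflation as the 1000 versus 1923 counts above.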
- Solution 1: use a single mapper, or 2.
- Solution 2: use a rank function in the query, and use --split-by on the rank field.
- Solution 3: sort the --split-by field in ascending order in the query.
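The idea behind splitting on a rank field can be sketched in Python as follows. This is an illustration only, under stated assumptions: `integer_splits` is a hypothetical helper, not Sqoop's integer splitter, and `enumerate` stands in for a rank or row-number column computed in the query. The point is that contiguous integer ranges do not depend on how the database sorts text, so each row falls into exactly one range.

```python
def integer_splits(lo, hi, num_splits):
    """Cut the inclusive range [lo, hi] into contiguous, non-overlapping ranges."""
    size = hi - lo + 1
    step, extra = divmod(size, num_splits)
    ranges, start = [], lo
    for i in range(num_splits):
        width = step + (1 if i < extra else 0)
        if width == 0:
            continue  # more splits requested than values available
        ranges.append((start, start + width - 1))
        start += width
    return ranges


# Hypothetical source rows; the integer stands in for the rank column.
rows = ["Apple", "apple", "Mango", "mango", "Zebra", "zebra"]
ranked = list(enumerate(rows, start=1))

splits = integer_splits(1, len(ranked), 4)

# Each "mapper" selects rows with lo <= rank <= hi.
imported = [name for lo, hi in splits for rank, name in ranked if lo <= rank <= hi]

print(splits)
print(len(rows), len(imported))
```

Because the ranges are disjoint and cover every rank value, the imported count matches the source count regardless of the text collation.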