Partial and duplicate records during sqoop import


A sqoop import can produce duplicate or partial records when the following settings are combined:

  • --query - custom query
  • --split-by - non-integer column (char)
  • --num-mappers - more than 2
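A minimal sketch of an invocation that combines the three settings above. The connection string, user, password file, table, columns and target directory are all hypothetical placeholders, not values from the original import. The command is printed with echo rather than executed, since it needs a real database and cluster.

```shell
# All names below are hypothetical; adjust them to your environment.
# Single quotes keep $CONDITIONS literal so that Sqoop, not the shell, substitutes it.
QUERY='SELECT customer_code, name FROM customers WHERE $CONDITIONS'

# customer_code is assumed to be a CHAR/VARCHAR column, which makes Sqoop
# fall back to TextSplitter. Newer Sqoop releases may also require
# -Dorg.apache.sqoop.splitter.allow_text_splitter=true for this to be accepted.
# Remove "echo" to actually run the import on a host with Sqoop installed.
echo sqoop import \
  --connect 'jdbc:postgresql://dbhost/sales' \
  --username etl_user \
  --password-file /user/etl/db.password \
  --query "$QUERY" \
  --split-by customer_code \
  --num-mappers 4 \
  --target-dir /data/customers_import
```

With --query, Sqoop expects the $CONDITIONS placeholder in the WHERE clause and an explicit --target-dir; each mapper replaces the placeholder with its own range condition on the --split-by column.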

Verified source row count: 1000 records

Verified imported row count: 1923 records, so at least 923 of the imported rows are duplicates

When the --split-by field is not an integer, Sqoop uses TextSplitter, which logs the following warnings:

WARN db.TextSplitter: If your database sorts in a case-insensitive order, this may result in a partial import or duplicate records.
WARN db.TextSplitter: You are strongly encouraged to choose an integral split column.

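To see why case-insensitive ordering can produce duplicates, here is a small simulation. The split boundaries A, Z, a, z are made up for illustration and are not what TextSplitter actually generates; the point is only that boundaries which are increasing in byte order need not be increasing in the order the database uses for comparisons. The helper count_matches is hypothetical as well.

```shell
# Hypothetical split boundaries, increasing in byte order: A < Z < a < z
# giving three mapper ranges: [A,Z)  [Z,a)  [a,z]
count_matches() {
  # $1 = column value, $2 = "cs" (case-sensitive) or "ci" (case-insensitive)
  # Prints how many mapper ranges would select the value.
  LC_ALL=C awk -v v="$1" -v mode="$2" 'BEGIN {
    n = split("A Z a z", b, " ")
    if (mode == "ci") {
      # Emulate a database that ignores case when comparing strings
      v = tolower(v)
      for (i = 1; i <= n; i++) b[i] = tolower(b[i])
    }
    hits = 0
    for (i = 1; i < n; i++) {
      # The last range is closed on the right, the others are half-open
      upper_ok = (i == n - 1) ? (v <= b[i + 1]) : (v < b[i + 1])
      if (v >= b[i] && upper_ok) hits++
    }
    print hits
  }'
}

count_matches mango cs   # prints 1: exactly one mapper reads the row
count_matches mango ci   # prints 2: two mappers read the same row
```

Under case-insensitive comparison the first and last ranges both collapse to "a through z", so the row is selected twice. Ranges can just as well collapse to nothing, which is how rows go missing.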
  • Solution 1: use a single mapper (--num-mappers 1), or no more than 2
  • Solution 2: add a rank function to the query and use --split-by on the rank field
  • Solution 3: sort the --split-by field in ascending order in the query
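Solutions 1 and 2 could look roughly like this. It is a sketch under assumptions: the connection details, table, columns and paths are hypothetical, the database is assumed to support window functions, and customer_code is assumed to be unique so that ROW_NUMBER() assigns the same number to the same row every time the query runs. The commands are printed with echo rather than executed.

```shell
# Hypothetical connection, table, column and path names; adjust to your setup.
CONNECT='jdbc:postgresql://dbhost/sales'

# Solution 1: a single mapper needs no --split-by, so TextSplitter is not involved.
# $CONDITIONS is still required by Sqoop in a free-form query.
QUERY_SINGLE='SELECT customer_code, name FROM customers WHERE $CONDITIONS'
echo sqoop import \
  --connect "$CONNECT" \
  --username etl_user \
  --password-file /user/etl/db.password \
  --query "$QUERY_SINGLE" \
  --num-mappers 1 \
  --target-dir /data/customers_single

# Solution 2: compute an integer rank in a subquery and split on it.
# The rank lives in the inner query so the outer WHERE can filter on it.
QUERY_RANKED='SELECT * FROM (SELECT c.customer_code, c.name, ROW_NUMBER() OVER (ORDER BY c.customer_code) AS rn FROM customers c) ranked WHERE $CONDITIONS'
echo sqoop import \
  --connect "$CONNECT" \
  --username etl_user \
  --password-file /user/etl/db.password \
  --query "$QUERY_RANKED" \
  --split-by rn \
  --num-mappers 4 \
  --target-dir /data/customers_ranked
```

Two things to keep in mind with the ranked variant: the rn column is written to the output along with the real columns, and every mapper runs the query on its own, so an ORDER BY on a non-unique column inside ROW_NUMBER() could hand out different ranks per mapper and bring the duplicates back.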
