hadoop - Extracting Column name from Twitter JSON File

I am trying to analyse Twitter data using Hadoop. I created a Hive table based on the tweets I had previously. I have now downloaded Twitter data again, and the problem is that new columns appear in these tweets that were not present in the previous tweet data. My question: is there a way to find the maximum set of columns a tweet can have, so that I can create a Hive table for it? I have been helpless so far; kindly help.

If you have the tweets in JSON format, you can create a table in Hive using the query below:

create external table tweets (
   id bigint,
   created_at string,
   source string,
   favorited boolean,
   retweet_count int,
   retweeted_status struct<
      text:string,
      user:struct<screen_name:string,name:string>>,
   entities struct<
      urls:array<struct<expanded_url:string>>,
      user_mentions:array<struct<screen_name:string,name:string>>,
      hashtags:array<struct<text:string>>>,
   text string,
   user struct<
      screen_name:string,
      name:string,
      friends_count:int,
      followers_count:int,
      statuses_count:int,
      verified:boolean,
      utc_offset:int,
      time_zone:string>,
   in_reply_to_screen_name string
)
row format serde 'com.cloudera.hive.serde.JSONSerDe'
location '/user/hive/warehouse/tweets';

Download the JAR from http://files.cloudera.com/samples/hive-serdes-1.0-snapshot.jar and add it to your Hive session so that the SerDe class is available:

add jar /home/kishore/hive-0.9.0/lib/hive-serdes-1.0-snapshot.jar;
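
To see which fields actually occur in a newly downloaded file, and so which columns the table definition might need, you could scan the tweets and collect the union of all key paths. The following is only a minimal sketch, assuming the file holds one JSON tweet per line; the function name collect_field_paths and the "[]" marker for array elements are hypothetical choices made for this illustration, not part of Hive or the SerDe.

```python
import json


def collect_field_paths(lines):
    """Return the sorted union of dotted key paths seen across all tweets.

    `lines` is any iterable of strings, each assumed to hold one JSON tweet.
    """
    paths = set()

    def walk(value, prefix):
        if isinstance(value, dict):
            for key, child in value.items():
                path = f"{prefix}.{key}" if prefix else key
                paths.add(path)
                walk(child, path)
        elif isinstance(value, list):
            # Array elements are recorded under the parent's path plus "[]".
            for item in value:
                walk(item, prefix + "[]")

    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        try:
            tweet = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip records that are not valid JSON
        walk(tweet, "")
    return sorted(paths)
```

Passing an open file handle, for example collect_field_paths(open(path, encoding="utf-8")), yields paths such as user.screen_name or entities.hashtags[].text. Comparing that list with the columns and struct members in the create table statement shows which fields are new in the latest download.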
