apache kafka - Suggestions for a Hadoop project -


I'm thinking of building a project using big data. Ideally, the pipeline would be:

Take a .csv file, put it into Flume and then into Kafka, perform an ETL step, put the result back into Kafka, and then move it from Kafka through Flume into HDFS. Once the data is in HDFS, run MapReduce jobs or Hive queries and chart whatever I want.

How can I put the .csv file into Flume and save it to Kafka? I have this piece of configuration, but I'm not sure it works:

# Name the agent's components
myagent.sources = r1
myagent.sinks = k1
myagent.channels = c1

# Spooling-directory source: picks up files dropped into /home/xyz/source
myagent.sources.r1.type = spooldir
myagent.sources.r1.spoolDir = /home/xyz/source
myagent.sources.r1.fileHeader = true

# Kafka sink
myagent.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink

# Memory channel buffering events between source and sink
myagent.channels.c1.type = memory
myagent.channels.c1.capacity = 1000
myagent.channels.c1.transactionCapacity = 100

# Bind source and sink to the channel
myagent.sources.r1.channels = c1
myagent.sinks.k1.channel = c1

Any ideas or suggestions? And if this piece of configuration is correct, how do I move on from there?

Thanks everyone!!

Your sink config is incomplete. Try this:

a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.topic = mytopic
a1.sinks.k1.brokerList = localhost:9092
a1.sinks.k1.requiredAcks = 1
a1.sinks.k1.batchSize = 20
a1.sinks.k1.channel = c1
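
Putting that sink together with your spooldir source and memory channel (and using your agent name, myagent, in place of a1), a complete first-leg config might look like the sketch below. The topic name mytopic and the broker address are assumptions, and the sink properties follow the Flume 1.6-era Kafka sink shown above:

myagent.sources = r1
myagent.sinks = k1
myagent.channels = c1

# Watch a directory for new .csv files
myagent.sources.r1.type = spooldir
myagent.sources.r1.spoolDir = /home/xyz/source
myagent.sources.r1.fileHeader = true

# Publish each event to a Kafka topic
myagent.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
myagent.sinks.k1.topic = mytopic
myagent.sinks.k1.brokerList = localhost:9092
myagent.sinks.k1.requiredAcks = 1
myagent.sinks.k1.batchSize = 20

myagent.channels.c1.type = memory
myagent.channels.c1.capacity = 1000
myagent.channels.c1.transactionCapacity = 100

myagent.sources.r1.channels = c1
myagent.sinks.k1.channel = c1

You would then start the agent with something like:

flume-ng agent --conf conf --conf-file myagent.conf --name myagent

Once it is running, any file dropped into /home/xyz/source should show up as messages on the mytopic topic.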

https://flume.apache.org/FlumeUserGuide.html#kafka-sink
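
For the Kafka-to-HDFS leg of your pipeline, a second agent can pair Flume's Kafka source with the HDFS sink. This is only a sketch: the agent name a2, the topic, and the HDFS path are placeholders, and the Kafka source properties shown (zookeeperConnect, topic) are the Flume 1.6 names; later releases renamed them (e.g. kafka.bootstrap.servers, kafka.topics):

a2.sources = r1
a2.sinks = k1
a2.channels = c1

# Consume messages from the Kafka topic
a2.sources.r1.type = org.apache.flume.source.kafka.KafkaSource
a2.sources.r1.zookeeperConnect = localhost:2181
a2.sources.r1.topic = mytopic
a2.sources.r1.batchSize = 100

# Write events to HDFS as plain text files, rolling every 5 minutes
a2.sinks.k1.type = hdfs
a2.sinks.k1.hdfs.path = /user/xyz/flume/kafka
a2.sinks.k1.hdfs.fileType = DataStream
a2.sinks.k1.hdfs.rollInterval = 300
a2.sinks.k1.hdfs.rollSize = 0
a2.sinks.k1.hdfs.rollCount = 0

a2.channels.c1.type = memory
a2.channels.c1.capacity = 1000
a2.channels.c1.transactionCapacity = 100

a2.sources.r1.channels = c1
a2.sinks.k1.channel = c1

Once files start landing under hdfs.path, you can run your MapReduce jobs or point Hive queries at that directory, as you planned.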

