java - How to For Each RDD Spark Streaming
I have a CSV file, queries.txt, which I am reading like this:
JavaRDD<String> distFile = sc.textFile("queries.txt");
The schema of the queries.txt file is: uniq_id, ...some numeric values in CSV...
For each line I need to create a HashMap, where the key is the first column of the queries.txt file (uniq_id) and the value holds the other columns of that line.
Example (this is not a real, working example; I just want to convey the essence):
HashMap<Integer, NumericValues> totalMap = new HashMap<Integer, NumericValues>();
for (int i = 0; i < distFile.size(); i++) {
    String line = distFile[i].getColumns();
    for (int y = 0; y < line.size(); y++) {
        totalMap.put(line.getFirstColumn, line.getRemainingColumns);
    }
}
Here NumericValues is a custom class that has variables mapping to the columns in the file.
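(For illustration only, a minimal sketch of what such a class might look like; the field names below are invented placeholders, and the real class would mirror the actual columns. Making it Serializable matters if it is used inside Spark operations.)

// Hypothetical sketch -- the field names are invented placeholders.
public class NumericValues implements java.io.Serializable {
    private final double metric1;
    private final double metric2;
    // ...one field per numeric column in queries.txt...

    public NumericValues(double metric1, double metric2) {
        this.metric1 = metric1;
        this.metric2 = metric2;
    }
}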
Any other suggestions are also helpful.
I guess this is what you are looking for; the example doesn't parse the CSV line itself.
JavaRDD<String> distFile = sc.textFile("queries.txt");
HashMap<Integer, NumericValues> totalMap = new HashMap<Integer, NumericValues>();
distFile.foreach(new VoidFunction<String>() {
    public void call(String line) {
        totalMap.put(yourCSVParser(line)); // this is a dummy function call
    }
});
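Note that on a real cluster, mutating a driver-local HashMap inside foreach will not behave as expected: the closure is serialized and run on the executors, so the driver's map is never updated. A common alternative is to map each line to a key/value pair and collect the result. Below is a minimal sketch assuming comma-separated lines with an integer uniq_id in the first column; parseColumns is a hypothetical helper that builds a NumericValues from the remaining columns.

import java.util.Map;

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.PairFunction;

import scala.Tuple2;

// ...assuming an existing JavaSparkContext sc and the NumericValues class from the question...
JavaRDD<String> distFile = sc.textFile("queries.txt");

// Turn each CSV line into a (uniq_id, NumericValues) pair on the executors.
JavaPairRDD<Integer, NumericValues> pairs = distFile.mapToPair(
        new PairFunction<String, Integer, NumericValues>() {
            public Tuple2<Integer, NumericValues> call(String line) {
                String[] cols = line.split(",");
                Integer id = Integer.parseInt(cols[0]);
                // parseColumns is a hypothetical helper that builds a
                // NumericValues object from the remaining columns.
                return new Tuple2<Integer, NumericValues>(id, parseColumns(cols));
            }
        });

// collectAsMap() brings the pairs back to the driver as a single java.util.Map.
Map<Integer, NumericValues> totalMap = pairs.collectAsMap();

Keep in mind that collectAsMap() pulls the whole dataset into driver memory, so this only works if the file is small enough to fit there.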