python - Error with training logistic regression model on Apache Spark. SPARK-5063
I am trying to build a logistic regression model with Apache Spark. Here is my code:
    parsedData = raw_data.map(mapper)  # the mapper function generates a pair of label and feature vector as a LabeledPoint object
    featureVectors = parsedData.map(lambda point: point.features)  # feature vectors from the parsed data
    scaler = StandardScaler(True, True).fit(featureVectors)  # this creates a standardization model to scale the features
    scaledData = parsedData.map(lambda lp: LabeledPoint(lp.label, scaler.transform(lp.features)))  # transform features to zero mean and unit std deviation
    modelScaledSGD = LogisticRegressionWithSGD.train(scaledData, iterations=10)
But I get this error:
Exception: It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transformation. SparkContext can only be used on the driver, not in code that runs on workers. For more information, see SPARK-5063.
I am not sure how to work around this. Any help would be greatly appreciated.
The problem you see is pretty much the same as the one I've described in How to use Java/Scala function from an action or a transformation? To transform, you have to call a Scala function, and that requires access to the SparkContext. Hence the error you see.
The standard way to handle this is to process only the required part of the data on the driver and then zip the results back together.
    labels = parsedData.map(lambda point: point.label)
    featuresTransformed = scaler.transform(featureVectors)

    scaledData = (labels
        .zip(featuresTransformed)
        .map(lambda p: LabeledPoint(p[0], p[1])))

    modelScaledSGD = LogisticRegressionWithSGD.train(...)
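For intuition, here is a small pure-Python sketch (no Spark required) of what this pipeline computes: each feature column is standardized to zero mean and unit standard deviation, and the result is re-paired with its label via zip. The helper name and toy data below are illustrative, not part of the MLlib API.

```python
import statistics

def standardize(column):
    """Scale one feature column to mean 0 and (sample) std 1,
    mimicking StandardScaler(withMean=True, withStd=True)."""
    mu = statistics.mean(column)
    sigma = statistics.stdev(column)
    return [(x - mu) / sigma for x in column]

labels = [0.0, 1.0, 1.0, 0.0]
features = [1.0, 3.0, 5.0, 7.0]  # a single feature column, for brevity

scaled = standardize(features)
# Re-pair each label with its transformed feature, as the zip() above does.
scaled_data = list(zip(labels, scaled))
```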
If you don't plan to implement your own methods based on MLlib components, it could be easier to use the high-level ML API.
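For reference, a minimal sketch of that approach with the spark.ml Pipeline API. The column names and the input DataFrame `df` are assumptions for illustration; they are not taken from the question.

```python
from pyspark.ml import Pipeline
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.feature import StandardScaler

# Assumes a DataFrame `df` with "label" and "features" columns already exists.
scaler = StandardScaler(inputCol="features", outputCol="scaledFeatures",
                        withMean=True, withStd=True)
lr = LogisticRegression(featuresCol="scaledFeatures", labelCol="label",
                        maxIter=10)

pipeline = Pipeline(stages=[scaler, lr])
model = pipeline.fit(df)  # scaling happens inside the pipeline; no manual zip needed
```

Because the scaler is a pipeline stage, fitting and transforming are handled on the executors by Spark itself, so the SPARK-5063 issue does not arise.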
Edit:

There are two possible problems here.

- At this point LogisticRegressionWithSGD supports only binomial classification (thanks to eliasah for pointing that out). If you need multi-label classification, you can replace it with LogisticRegressionWithLBFGS.
- StandardScaler supports only dense vectors, so it has limited applications.
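To see why centering limits StandardScaler to dense vectors: subtracting a non-zero mean turns every zero entry into a non-zero one, so a sparse vector becomes dense. A minimal plain-Python illustration (not Spark code):

```python
sparse_column = [0.0, 0.0, 0.0, 8.0]  # mostly zeros: cheap to store sparsely

mean = sum(sparse_column) / len(sparse_column)  # 2.0
centered = [x - mean for x in sparse_column]    # every entry is now non-zero

nonzero_before = sum(1 for x in sparse_column if x != 0.0)
nonzero_after = sum(1 for x in centered if x != 0.0)
```

After centering, all four entries are non-zero, so the sparse representation saves nothing; this is why centering is only practical for dense vectors.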