Friday, 15 April 2016

Frequent Issues Encountered during Spark Development

While coding, we face many issues, be it at compilation or execution time. So I have tried to collate some frequently faced issues in Spark development here.

  •    When we run Spark on Windows, sometimes the following error is displayed:
Caused by: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rwxrwxr-x
at org.apache.hadoop.hive.ql.session.SessionState.createRootHDFSDir(SessionState.java:529)
at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:478)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:430)
... 7 more

            Solution:
You need to give 777 permissions to this directory.
Let's say /tmp/hive is present on your D: drive; run the following command:

D:\winutils\bin\winutils.exe chmod 777 D:\tmp\hive
For complete installation steps, you can refer to the previous post.
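As a quick sanity check, you can list the directory again with winutils itself (assuming winutils.exe sits at D:\winutils\bin as above and your build supports the ls command); the listing should now show drwxrwxrwx for the directory:

D:\winutils\bin\winutils.exe ls D:\tmp\hive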


  •    How to launch the Master and Worker on Windows manually?
            Solution:
Open a command prompt and go to the %SPARK_HOME%/bin folder. Run the following commands:

spark-class org.apache.spark.deploy.master.Master     <= for master node
spark-class org.apache.spark.deploy.worker.Worker spark://masternode:7077  <= for worker node
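Once both are running, you can verify the setup by pointing an application at the standalone master; for example, a quick smoke test with spark-shell (assuming the master is running on localhost with the default port):

spark-shell --master spark://localhost:7077

The master's web UI (http://localhost:8080 by default) should also show the worker as registered.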


  •     How to get rid of the “A master URL must be set in your configuration” error?
            Solution:
From command line:

Set -Dspark.master=spark://hostname:7077 as a JVM parameter

From code, use the SparkConf.setMaster() method:
SparkConf conf = new SparkConf().setAppName("App_Name").setMaster("spark://hostname:7077");
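Putting it together, a minimal sketch of a driver class is shown below (the class name, app name and master URL are just placeholders; use "local[*]" if you only want to run on the local machine):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class SimpleApp {
    public static void main(String[] args) {
        // Use "local[*]" to run on all local cores, or a
        // spark://hostname:7077 URL to run against a standalone master.
        SparkConf conf = new SparkConf()
                .setAppName("App_Name")
                .setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // ... build and run your job here ...

        sc.stop();
    }
}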


  •     How to solve the following “System memory ... Please use a larger heap size” error?
Exception in thread "main" java.lang.IllegalArgumentException: System memory 259522560 must be at least 4.718592E8. Please use a larger heap size.
at org.apache.spark.memory.UnifiedMemoryManager$.getMaxMemory(UnifiedMemoryManager.scala:193)
at org.apache.spark.memory.UnifiedMemoryManager$.apply(UnifiedMemoryManager.scala:175)
at org.apache.spark.SparkEnv$.create(SparkEnv.scala:354)
at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:193)
at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:288)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:457)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:59)
at com.spark.example.SimpleApp.main(SimpleApp.java:18)

            Solution:
Add -Xmx1024m -Xms512m to the VM arguments
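These flags go into your IDE's run configuration (VM arguments) when you launch the driver from Eclipse or IntelliJ. If you submit the job with spark-submit instead, the equivalent is the --driver-memory option; the class and jar names below are only placeholders:

spark-submit --driver-memory 1g --class com.spark.example.SimpleApp simple-app.jar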


             Stay tuned for further updates..!!! 

Comments:

  1. Any luck with sharing RDDs in the same context for spark-jobserver in Java?

    Regards,
    Vishal
    gavishal@gmail.com

  2. No, I didn't find any solution for sharing an RDD using NamedObject or NamedRDD, but it works fine when you use it as a simple RDD.
