Debugging Hadoop Task tracker, Job tracker, Data Node or Name Node

Hadoop's conf/hadoop-env.sh defines the following environment variables:
  1. HADOOP_NAMENODE_OPTS
  2. HADOOP_SECONDARYNAMENODE_OPTS
  3. HADOOP_DATANODE_OPTS
  4. HADOOP_BALANCER_OPTS
  5. HADOOP_JOBTRACKER_OPTS
  6. HADOOP_TASKTRACKER_OPTS

You can use them to pass remote-debugging options to the JVM of the corresponding daemon, so that you can connect a debugger to any of the above servers. Unfortunately, the task tracker launches each Hadoop task in its own child JVM, so you cannot use this method to debug your map or reduce functions.
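If you want to debug more than one daemon on the same machine, each one needs its own debug port. Here is a minimal sketch of what that could look like in conf/hadoop-env.sh. The helper function jdwp_opts and the port numbers 5001 to 5003 are my own illustrative choices, not something Hadoop defines; pick any free ports.

```shell
# Sketch for conf/hadoop-env.sh: one JDWP port per daemon.
# jdwp_opts and the port numbers are illustrative assumptions.
jdwp_opts() {
  # $1 = port on which this daemon's JVM listens for a debugger
  echo "-Xdebug -Xrunjdwp:transport=dt_socket,address=$1,server=y,suspend=n"
}

# Prepend the debug options and keep whatever was already set.
export HADOOP_NAMENODE_OPTS="$(jdwp_opts 5001) ${HADOOP_NAMENODE_OPTS:-}"
export HADOOP_DATANODE_OPTS="$(jdwp_opts 5002) ${HADOOP_DATANODE_OPTS:-}"
export HADOOP_JOBTRACKER_OPTS="$(jdwp_opts 5003) ${HADOOP_JOBTRACKER_OPTS:-}"
```

Two JVMs cannot listen on the same address, so reusing one port for several daemons on a single host would prevent the second daemon from opening its debug socket.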
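To debug the task tracker, follow these steps.
1. Edit conf/hadoop-env.sh to contain the following (use plain double quotes, not typographic ones, or the shell will not parse the line correctly):
export HADOOP_TASKTRACKER_OPTS="-Xdebug -Xrunjdwp:transport=dt_socket,address=5000,server=y,suspend=n"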
2. Start Hadoop (bin/start-dfs.sh and bin/start-mapred.sh)
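3. With suspend=n the task tracker starts normally and listens on port 5000 for a debugger. If you want it to block until a debugger connects, use suspend=y instead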
4. Connect to the server from Eclipse using a “Remote Java Application” entry in the Debug configurations (host of the task tracker, port 5000) and add your breakpoints
5. Run a MapReduce job
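
If Eclipse is not at hand, the JDK's command-line debugger jdb can attach to the same socket. The sketch below only builds and prints the attach command. The helper name attach_cmd and the default host localhost are assumptions of mine; the port 5000 is the one from step 1.

```shell
# Hypothetical helper: build the jdb command that attaches to a daemon
# started with the JDWP options shown in step 1.
attach_cmd() {
  host="${1:-localhost}"   # machine running the task tracker (assumed local)
  port="${2:-5000}"        # must match address= in HADOOP_TASKTRACKER_OPTS
  echo "jdb -connect com.sun.jdi.SocketAttach:hostname=${host},port=${port}"
}

# Print the command for a task tracker on this machine.
attach_cmd localhost 5000
```

Run the printed command in a terminal once the task tracker is up. Inside jdb you can then set a breakpoint with "stop in" or "stop at" before submitting the job.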
