We are currently migrating our infrastructure from dedicated file servers to the Hadoop Distributed File System (HDFS, an open-source file system similar to Google’s). As part of that, the Ant task that deletes old files on the developers’ computers had to be changed to delete those files in HDFS instead. After some extensive research (“ant hadoop” is a really bad search term), we found that the Hadoop distribution already ships with some predefined Ant tasks. This is how they can be used:

    <!-- Classpath containing the Hadoop Ant tasks and their dependencies -->
    <path id="ant.classpath">
        <fileset dir="${libs.dir}">
            <include name="hadoop-0.18.3-ant.jar" />
            <include name="hadoop-0.18.3-core.jar" />
            <include name="commons-cli-2.0-SNAPSHOT.jar" />
        </fileset>
    </path>

    <!-- Register Hadoop's DfsTask under the name "hdfs" -->
    <taskdef name="hdfs" classname="org.apache.hadoop.ant.DfsTask" classpathref="ant.classpath" />

    <!-- Create a directory in HDFS -->
    <target name="createHDFSdirectory">
        <hdfs cmd="mkdir" args="hdfs://localhost:54310/testDir" />
    </target>

    <!-- Recursively delete a directory in HDFS -->
    <target name="deleteHDFSdirectory">
        <hdfs cmd="rmr" args="hdfs://localhost:54310/testDir" />
    </target>


‘localhost:54310’ is the host and port of fs.default.name as configured in conf/hadoop-site.xml.
Note that you have to include the special commons-cli JAR (commons-cli-2.0-SNAPSHOT.jar in the classpath above), which you can find in the lib directory of your downloaded Hadoop distribution. Otherwise you’ll get a NoSuchMethodError.
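
For reference, the fs.default.name entry mentioned above would look roughly like this in conf/hadoop-site.xml. This is only a minimal sketch: the host and port are assumed to match the URLs used in the build file, and a real configuration will usually contain further properties.

```xml
<?xml version="1.0"?>
<!-- conf/hadoop-site.xml (sketch; host and port assumed to match the build file) -->
<configuration>
  <property>
    <!-- URI of the default file system; the Ant targets address the same host:port -->
    <name>fs.default.name</name>
    <value>hdfs://localhost:54310</value>
  </property>
</configuration>
```

If your cluster uses a different host or port, the hdfs:// URLs in the args attributes of the Ant targets need to be adjusted accordingly.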

Unfortunately, it is not possible to remove and re-create the same directory in one task (not even with sequential execution and a sleep in between), so we decided to write our own simple wrapper task for the HDFS operation we needed.
