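Calling a MapReduce Job from a Java Web Application (Servlet)
You can invoke a MapReduce job from a web application using the Hadoop Java API. Below is a small example of calling a MapReduce job from a servlet. The steps are as follows:
Step 1: First, create a servlet class that acts as the MapReduce driver, and write the Mapper and Reducer classes the job will use. Here is a sample code snippet: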
CallJobFromServlet.java
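import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class CallJobFromServlet extends HttpServlet {
    @Override
    protected void doPost(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "Job Name");
        // Replace CallJobFromServlet.class with your own servlet class
        job.setJarByClass(CallJobFromServlet.class);
        job.setMapperClass(Map.class);     // Replace Map.class with your Mapper class
        job.setReducerClass(Reduce.class); // Replace Reduce.class with your Reducer class
        job.setNumReduceTasks(30);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        // Job input path
        FileInputFormat.addInputPath(job, new Path("hdfs://localhost:54310/user/hduser/input/"));
        // Job output path
        FileOutputFormat.setOutputPath(job, new Path("hdfs://localhost:54310/user/hduser/output"));
        try {
            // Blocks the request until the job finishes; true reports progress while it runs
            job.waitForCompletion(true);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new ServletException("MapReduce job was interrupted", e);
        } catch (ClassNotFoundException e) {
            throw new ServletException("MapReduce job class not found", e);
        }
    }
}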
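One practical detail with the driver above: Hadoop's FileOutputFormat refuses to start a job whose output directory already exists, so triggering the job a second time with the same fixed output path fails. Below is a minimal sketch of one way around this, assuming it is acceptable for each run to write to its own subdirectory. The class OutputPaths, the method uniqueOutputPath, and the "run-" prefix are hypothetical names chosen for illustration; they are not part of Hadoop.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper (not a Hadoop class): derives a per-run output directory.
final class OutputPaths {
    private OutputPaths() {}

    // Appends "run-<UTC timestamp>" to baseDir so each run gets a fresh directory.
    static String uniqueOutputPath(String baseDir, long epochMillis) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMdd-HHmmss-SSS", Locale.ROOT);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        // Drop a single trailing slash so the result never contains "//run-"
        String base = baseDir.endsWith("/")
                ? baseDir.substring(0, baseDir.length() - 1)
                : baseDir;
        return base + "/run-" + fmt.format(new Date(epochMillis));
    }
}
```

In the servlet, the output path line could then become FileOutputFormat.setOutputPath(job, new Path(OutputPaths.uniqueOutputPath("hdfs://localhost:54310/user/hduser/output", System.currentTimeMillis()))). Two requests arriving in the same millisecond would still collide, so this is only a starting point.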
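Step 2: Place all the required JAR files (the Hadoop JARs and any application-specific JARs) in the lib folder of the web server (for example, Tomcat). This is mandatory for accessing the Hadoop configuration (the Hadoop 'conf' folder holds the configuration XML files, i.e. core-site.xml, hdfs-site.xml, etc.). Simply copy the JARs from the Hadoop lib folder to the web server's (Tomcat's) lib directory. The list of JAR names is as follows: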
1. commons-beanutils-1.7.0.jar
2. commons-beanutils-core-1.8.0.jar
3. commons-cli-1.2.jar
4. commons-collections-3.2.1.jar
5. commons-configuration-1.6.jar
6. commons-httpclient-3.0.1.jar
7. commons-io-2.1.jar
8. commons-lang-2.4.jar
9. commons-logging-1.1.1.jar
10. hadoop-client-1.0.4.jar
11. hadoop-core-1.0.4.jar
12. jackson-core-asl-1.8.8.jar
13. jackson-mapper-asl-1.8.8.jar
14. jersey-core-1.8.jar
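A missing JAR usually shows up only at request time, as a ClassNotFoundException or NoClassDefFoundError. As a rough sanity check, the sketch below compares the list above against the file names found in a directory. JarCheck and its methods are hypothetical helper names, and the exact versions are taken from the list above; adjust them if your Hadoop distribution ships different ones.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: reports which JARs from the list above are absent.
final class JarCheck {
    static final List<String> REQUIRED = Collections.unmodifiableList(Arrays.asList(
            "commons-beanutils-1.7.0.jar", "commons-beanutils-core-1.8.0.jar",
            "commons-cli-1.2.jar", "commons-collections-3.2.1.jar",
            "commons-configuration-1.6.jar", "commons-httpclient-3.0.1.jar",
            "commons-io-2.1.jar", "commons-lang-2.4.jar",
            "commons-logging-1.1.1.jar", "hadoop-client-1.0.4.jar",
            "hadoop-core-1.0.4.jar", "jackson-core-asl-1.8.8.jar",
            "jackson-mapper-asl-1.8.8.jar", "jersey-core-1.8.jar"));

    private JarCheck() {}

    // Returns the required JAR names that do not appear in 'present'.
    static List<String> missing(Collection<String> present) {
        Set<String> have = new HashSet<String>(present);
        List<String> result = new ArrayList<String>();
        for (String jar : REQUIRED) {
            if (!have.contains(jar)) {
                result.add(jar);
            }
        }
        return result;
    }

    // Same check against a directory; a nonexistent directory counts as empty.
    static List<String> missingIn(File libDir) {
        String[] names = libDir.list();
        return missing(names == null
                ? Collections.<String>emptyList()
                : Arrays.asList(names));
    }
}
```

For example, JarCheck.missingIn(new File("/opt/tomcat/lib")) returns the names still to be copied, where /opt/tomcat/lib is an assumed location for the Tomcat lib directory.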
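Step 3: Deploy the web application to the web server (into Tomcat's "webapps" folder).
Step 4: Create a JSP file and point the form's action attribute at the servlet class (CallJobFromServlet.java). Here is a sample code snippet:
index.jsp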
<form id="trigger_hadoop" name="trigger_hadoop" method="post" action="./CallJobFromServlet">
<span class="back">Trigger Hadoop Job from Web Page </span>
<input type="submit" name="submit" value="Trigger Job" />
</form>
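Because the servlet calls job.waitForCompletion(true), the browser request stays open until the whole job finishes, which can take a long time. A minimal sketch of one alternative is to hand the work to a background thread and return immediately, then look up the status later. BackgroundJobRunner and all of its members are hypothetical names, the sketch assumes Java 8 or later, and the Callable stands in for the Hadoop driver code shown in Step 1.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper: runs long jobs off the request thread and tracks their status.
final class BackgroundJobRunner {
    enum Status { RUNNING, SUCCEEDED, FAILED }

    // One daemon worker thread, so jobs run one at a time and never block JVM exit
    private final ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "mapreduce-launcher");
        t.setDaemon(true);
        return t;
    });
    private final Map<Long, CompletableFuture<Status>> jobs = new ConcurrentHashMap<>();
    private final AtomicLong nextId = new AtomicLong(1);

    // Queues the work and returns an id at once; 'work' returns true on success.
    long submit(Callable<Boolean> work) {
        long id = nextId.getAndIncrement();
        jobs.put(id, CompletableFuture.supplyAsync(() -> {
            try {
                return work.call() ? Status.SUCCEEDED : Status.FAILED;
            } catch (Exception e) {
                return Status.FAILED; // any exception from the job counts as failure
            }
        }, executor));
        return id;
    }

    // Non-blocking status lookup, e.g. for a status page polled by the browser.
    Status status(long id) {
        CompletableFuture<Status> f = lookup(id);
        return f.isDone() ? f.join() : Status.RUNNING;
    }

    // Blocks until the job has finished; mainly useful in tests.
    Status await(long id) {
        return lookup(id).join();
    }

    void shutdown() {
        executor.shutdown();
    }

    private CompletableFuture<Status> lookup(long id) {
        CompletableFuture<Status> f = jobs.get(id);
        if (f == null) {
            throw new IllegalArgumentException("Unknown job id: " + id);
        }
        return f;
    }
}
```

In doPost, the servlet could call runner.submit(() -> job.waitForCompletion(true)) and write the returned id into the response. A real application would also need to call shutdown() when the web application stops and to evict finished entries from the map.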