You want to run Hadoop jobs on your development workstation for testing before you submit them to your production cluster. Which mode of operation in Hadoop allows you to most closely simulate a production cluster while using a single machine?
Answer : C
You need to run the same job many times with minor variations. Rather than hardcoding all job configuration options in your driver code, you've decided to have your Driver subclass org.apache.hadoop.conf.Configured and implement the org.apache.hadoop.util.Tool interface.
Identify which invocation correctly passes mapred.job.name with a value of Example to Hadoop.
Answer : C
Explanation: Configure the property using the -D key=value notation:
-D mapred.job.name='Example'
You can list all of the supported options by calling the streaming jar with just the -info argument.
Reference: Python hadoop streaming : Setting a job name
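For illustration, a minimal Tool/Configured driver sketch (the class name and the print statement are ours, not from the question) shows how ToolRunner strips generic options such as -D mapred.job.name=Example from the command line and applies them to the Configuration before run() is invoked:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Illustrative driver: ToolRunner parses generic options like
// -D mapred.job.name=Example and applies them to the Configuration
// before run() is called.
public class ExampleDriver extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        Configuration conf = getConf();
        // The -D option set on the command line is already visible here.
        System.out.println("Job name: " + conf.get("mapred.job.name"));
        // ... normal job setup and submission would go here ...
        return 0;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new ExampleDriver(), args));
    }
}

It would be invoked along the lines of: hadoop jar myjob.jar ExampleDriver -D mapred.job.name=Example <in> <out> (jar name and paths are placeholders).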
You have user profile records in your OLTP database that you want to join with web logs you have already ingested into the Hadoop file system. How will you obtain these user records?
Answer : C
Reference: Hadoop and Pig for Large-Scale Web Log Analysis
Your client application submits a MapReduce job to your Hadoop cluster. Identify the Hadoop daemon on which the Hadoop framework will look for an available slot to schedule a MapReduce operation.
Answer : D
Explanation: JobTracker is the daemon service for submitting and tracking MapReduce jobs in Hadoop. Only one JobTracker process runs on any Hadoop cluster, in its own JVM; in a typical production cluster it runs on a separate machine, and each slave node is configured with the JobTracker node's location. The JobTracker is a single point of failure for the Hadoop MapReduce service: if it goes down, all running jobs are halted. The JobTracker performs the following actions (from the Hadoop wiki):
Client applications submit jobs to the Job tracker.
The JobTracker talks to the NameNode to determine the location of the data.
The JobTracker locates TaskTracker nodes with available slots at or near the data.
The JobTracker submits the work to the chosen TaskTracker nodes.
The TaskTracker nodes are monitored. If they do not submit heartbeat signals often enough, they are deemed to have failed and the work is scheduled on a different TaskTracker.
A TaskTracker will notify the JobTracker when a task fails. The JobTracker decides what to do then: it may resubmit the job elsewhere, it may mark that specific record as something to avoid, and it may even blacklist the TaskTracker as unreliable.
When the work is completed, the JobTracker updates its status.
Client applications can poll the JobTracker for information.
Reference: 24 Interview Questions & Answers for Hadoop MapReduce developers, What is a JobTracker in Hadoop? How many instances of JobTracker run on a Hadoop Cluster?
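As a sketch of where this flow starts, a classic-API (org.apache.hadoop.mapred) submission is what hands the job to the JobTracker in the first place; the class name is illustrative, and mapper/reducer setup is deliberately elided:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

// Sketch of a classic (org.apache.hadoop.mapred) job submission.
// JobClient.runJob() hands the job to the JobTracker, which performs
// the scheduling and monitoring steps listed above.
public class SubmitExample {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(SubmitExample.class);
        conf.setJobName("example");
        // Mapper/reducer and key/value type setup is elided; the
        // defaults (identity mapper and reducer) would apply here.
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf); // blocks until the JobTracker reports completion
    }
}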
Which one of the following statements describes the relationship between the ResourceManager and the ApplicationMaster?
Answer : A
You want to perform analysis on a large collection of images. You want to store this data in HDFS and process it with MapReduce, but you also want to give your data analysts and data scientists the ability to process the data directly from HDFS with an interpreted high-level programming language like Python. Which format should you use to store this data in HDFS?
Answer : B
Reference: Hadoop binary files processing introduced by image duplicates finder
Review the following 'data' file and Pig code.
Answer : A
In Hadoop 2.2, which one of the following statements is true about a standby NameNode?
The Standby NameNode:
Answer : B
Which best describes what the map method accepts and emits?
Answer : D
Explanation: public class Mapper<KEYIN,VALUEIN,KEYOUT,VALUEOUT> extends Object
Maps input key/value pairs to a set of intermediate key/value pairs.
Maps are the individual tasks which transform input records into intermediate records.
The transformed intermediate records need not be of the same type as the input records. A given input pair may map to zero or many output pairs.
Reference: org.apache.hadoop.mapreduce, Class Mapper<KEYIN,VALUEIN,KEYOUT,VALUEOUT>
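To make the four type parameters concrete, here is a minimal word-count mapper sketch (a conventional example, assuming TextInputFormat's LongWritable/Text input pairs; not tied to any particular answer choice):

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// KEYIN=LongWritable (byte offset), VALUEIN=Text (the line),
// KEYOUT=Text (a word), VALUEOUT=IntWritable (the literal count 1).
public class WordCountMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // One input pair may map to zero or many output pairs.
        for (String token : value.toString().split("\\s+")) {
            if (!token.isEmpty()) {
                word.set(token);
                context.write(word, ONE);
            }
        }
    }
}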
Which Hadoop component is responsible for managing the distributed file system metadata?
Answer : A
Examine the following Hive statements:
Answer : A
You want to count the number of occurrences of each unique word in the supplied input data. You've decided to implement this by having your mapper tokenize each word and emit a literal value 1, and then have your reducer increment a counter for each literal 1 it receives. After successfully implementing this, it occurs to you that you could optimize this by specifying a combiner. Will you be able to reuse your existing reducer as your combiner in this case, and why or why not?
Answer : A
Explanation: Combiners are used to increase the efficiency of a MapReduce program.
They aggregate intermediate map output locally on individual mapper nodes.
Combiners can help you reduce the amount of data that needs to be transferred across to the reducers. You can use your reducer code as a combiner if the operation performed is commutative and associative. The execution of the combiner is not guaranteed: Hadoop may or may not execute a combiner, and, if required, may execute it more than once.
Therefore your MapReduce jobs should not depend on the combiner's execution.
Reference: 24 Interview Questions & Answers for Hadoop MapReduce developers, What are combiners? When should I use a combiner in my MapReduce Job?
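A minimal sketch of such a reducer (the class name is illustrative): because integer addition is commutative and associative, the same class can safely serve as both combiner and reducer.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Summing is commutative and associative, so this class can be
// registered as both the combiner and the reducer.
public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}

In the driver, both job.setCombinerClass(SumReducer.class) and job.setReducerClass(SumReducer.class) would point at this one class.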
You are developing a combiner that takes as input Text keys, IntWritable values, and emits Text keys, IntWritable values. Which interface should your class implement?
Answer : D
A NameNode in Hadoop 2.2 manages ______________.
Answer : B
In the reducer, the MapReduce API provides you with an iterator over Writable values. What does calling the next() method return?
Answer : C
Explanation: Calling Iterator.next() will always return the same IntWritable instance, with its contents replaced by the next value.
Reference: manipulating iterator in mapreduce
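The practical consequence, as a sketch (the class name and the count written out at the end are illustrative): values must be copied if they are needed after the iterator advances, because the framework reuses one instance.

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Illustrative reducer: the framework hands back the same IntWritable
// instance on every iteration, so values must be copied if they are
// needed after the loop advances.
public class CopyingReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        List<IntWritable> kept = new ArrayList<>();
        for (IntWritable value : values) {
            // Storing 'value' directly would leave N references to one
            // object holding only the last value; copy the contents.
            kept.add(new IntWritable(value.get()));
        }
        context.write(key, new IntWritable(kept.size()));
    }
}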