Hadoop MCQs
1. PCollection, PTable, and PGroupedTable all support a __________ operation.
a) intersection
b) union
c) OR
d) None of the mentioned
Answer: b
Explanation: The union operation takes a series of distinct PCollections that all have the same data type and treats them as a single virtual PCollection.
2. Point out the correct statement.
a) StreamPipeline executes the pipeline in-memory on the client
b) MemPipeline executes the pipeline by converting it to a series of Spark pipelines
c) MapReduce framework approach makes it easy for the framework to serialize data from the client to the cluster
d) All of the mentioned
Answer: c
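Explanation: Options a and b name the wrong classes. MemPipeline executes the pipeline in-memory on the client, and SparkPipeline executes the pipeline by converting it to a series of Spark pipelines.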
3. Crunch uses Java serialization to serialize the contents of all of the ______ in a pipeline definition.
a) Transient
b) DoFns
c) Configuration
d) All of the mentioned
Answer: b
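Explanation: Crunch serializes every DoFn in the pipeline definition with Java serialization so that it can be shipped from the client to the cluster. A DoFn and all of its non-transient fields must therefore be serializable; a field that cannot be serialized can be marked transient and created in the DoFn's initialize method.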
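Question 3 comes down to how Java serialization treats the fields of a function object. The sketch below uses only the JDK, not the Crunch API. `SplitFn` is a hypothetical stand-in for a DoFn, `Tokenizer` is a hypothetical helper that is not Serializable, and `roundTrip` imitates shipping the object from the client to a worker. It shows that a transient field is dropped during serialization and has to be rebuilt in an initialize step.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical helper that is NOT Serializable.
class Tokenizer {
    String[] tokens(String line) {
        return line.trim().split("\\s+");
    }
}

// Hypothetical stand-in for a DoFn: a Serializable function object.
class SplitFn implements Serializable {
    private static final long serialVersionUID = 1L;

    private final boolean lowerCase;        // ordinary field: serialized
    private transient Tokenizer tokenizer;  // transient: skipped by serialization

    SplitFn(boolean lowerCase) {
        this.lowerCase = lowerCase;
    }

    // Rebuilds the non-serializable helper; meant to run before any processing.
    void initialize() {
        tokenizer = new Tokenizer();
    }

    boolean isInitialized() {
        return tokenizer != null;
    }

    String[] process(String line) {
        return tokenizer.tokens(lowerCase ? line.toLowerCase() : line);
    }

    // Serializes and deserializes fn, imitating a client-to-worker transfer.
    static SplitFn roundTrip(SplitFn fn) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(fn);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (SplitFn) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SplitFn fn = new SplitFn(true);
        fn.initialize();
        SplitFn copy = roundTrip(fn);
        System.out.println("initialized after transfer: " + copy.isInitialized());
        copy.initialize();
        System.out.println(String.join(",", copy.process("Hello Crunch")));
    }
}
```

The `lowerCase` flag survives the round trip because it is an ordinary field, while `tokenizer` comes back as null until `initialize` runs again.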
4. The inline DoFn that splits a line up into words is an inner class of ____________
a) Pipeline
b) MyPipeline
c) ReadPipeline
d) WritePipe
Answer: b
Explanation: Inner classes contain references to their parent outer classes, so unless MyPipeline implements the Serializable interface, the NotSerializableException will be thrown when Crunch tries to serialize the inner DoFn.
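The inner-class problem from question 4 can be reproduced with the JDK alone. In this sketch `LineSplitter` is a hypothetical stand-in for a DoFn, and this `MyPipeline` is an illustrative class, not the one from the Crunch documentation. The anonymous inner class reads a field of its enclosing instance, so serializing it also tries to serialize `MyPipeline`, which does not implement Serializable. The static nested class holds no such reference.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for a DoFn: a Serializable function object.
interface LineSplitter extends Serializable {
    String[] split(String line);
}

// Illustrative outer class; note that it does NOT implement Serializable.
class MyPipeline {
    private final String delimiterRegex;

    MyPipeline(String delimiterRegex) {
        this.delimiterRegex = delimiterRegex;
    }

    // Anonymous inner class: it reads a field of the enclosing MyPipeline,
    // so it carries a hidden reference to that instance.
    LineSplitter innerSplitter() {
        return new LineSplitter() {
            @Override
            public String[] split(String line) {
                return line.split(delimiterRegex);
            }
        };
    }

    // Static nested class: no reference to any MyPipeline instance.
    static class StaticSplitter implements LineSplitter {
        private static final long serialVersionUID = 1L;

        @Override
        public String[] split(String line) {
            return line.split("\\s+");
        }
    }

    // Returns true if Java serialization of the object succeeds.
    static boolean canSerialize(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        MyPipeline pipeline = new MyPipeline("\\s+");
        System.out.println("inner splitter serializable:  " + canSerialize(pipeline.innerSplitter()));
        System.out.println("static splitter serializable: " + canSerialize(new StaticSplitter()));
    }
}
```

Running `main` reports false for the inner splitter and true for the static one. Making the function a static nested or top-level class, or making the outer class Serializable, avoids the exception.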
5. Point out the wrong statement.
a) DoFns also have a number of helper methods for working with Hadoop Counters, all named increment
b) The Crunch APIs contain a number of useful subclasses of DoFn that handle common data processing scenarios and are easier to write and test
c) FilterFn class defines a single abstract method
d) None of the mentioned
Answer: d
Explanation: All three statements are correct. Counters are a useful way of keeping track of the state of long-running data pipelines and detecting any exceptional conditions that occur during processing.
6. DoFns provide direct access to the __________ object that is used within a given Map or Reduce task via the getContext method.
a) TaskInputContext
b) TaskInputOutputContext
c) TaskOutputContext
d) All of the mentioned
Answer: b
Explanation: There are also a number of helper methods for working with the objects associated with the TaskInputOutputContext, such as getConfiguration and progress.
7. The top-level ___________ package contains three of the most important specializations in Crunch.
a) org.apache.scrunch
b) org.apache.crunch
c) org.apache.kcrunch
d) All of the mentioned
Answer: b
Explanation: The three specializations are FilterFn, MapFn, and CombineFn. Each of these specialized DoFn implementations has associated methods on the PCollection, PTable, and PGroupedTable interfaces to support common data processing steps.
8. The Avros class also has a _____ method for creating PTypes for POJOs using Avro’s reflection-based serialization mechanism.
a) spot
b) reflects
c) gets
d) All of the mentioned
Answer: b
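Explanation: There are a couple of restrictions on the structure of the POJO: it must have a default, no-argument constructor, and its fields must be Avro primitive types or collection types that have Avro equivalents.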
9. The ______________ class defines a configuration parameter named LINES_PER_MAP that controls how the input file is split.
a) NLineInputFormat
b) InputLineFormat
c) LineInputFormat
d) None of the mentioned
Answer: a
Explanation: The value of this parameter can be set via the Source interface’s inputConf method.
10. The ________ class allows developers to exercise precise control over how data is partitioned, sorted, and grouped by the underlying execution engine.
a) Grouping
b) GroupingOptions
c) RowGrouping
d) None of the mentioned
Answer: b
Explanation: The GroupingOptions class is immutable; new instances are created with GroupingOptions.Builder.