Engineering Questions with Answers - Multiple Choice Questions

MCQs on Crunch with Hadoop – 1

1 - Question

The Apache Crunch Java library provides a framework for writing, testing, and running ___________ pipelines.
a) MapReduce
b) Pig
c) Hive
d) None of the mentioned

Answer: a
Explanation: The goal of Crunch is to make pipelines that are composed of many user-defined functions simple to write, easy to test, and efficient to run.




2 - Question

Point out the correct statement.
a) Scrunch’s Java API is centered around three interfaces that represent distributed datasets
b) All of the other data transformation operations supported by the Crunch APIs are implemented in terms of three primitives
c) A number of common Aggregator<V> implementations are provided in the Aggregators class
d) All of the mentioned

Answer: c
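Explanation: The Aggregators class ships a number of common Aggregator<V> implementations. PGroupedTable provides a combineValues operation that applies a commutative and associative Aggregator to the values of the PGroupedTable instance on both the map and reduce sides of the shuffle. Statement a is wrong because the three interfaces that represent distributed datasets (PCollection, PTable, and PGroupedTable) belong to Crunch's Java API; Scrunch is the Scala API.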
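To see why the Aggregator has to be commutative and associative, here is a plain-Java sketch with no Crunch dependency. It compares folding all values for a key in one pass with a two-phase plan that pre-combines inside each simulated map task and then combines the partial results on the reduce side. The names CombineSketch, Pair, sample, combineOnce, and combineTwoPhase are hypothetical illustrations, not Crunch's API; in Crunch itself the operation is PGroupedTable.combineValues with an Aggregator<V>.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BinaryOperator;

// Hypothetical toy model of a two-phase combine; NOT the Apache Crunch API.
class CombineSketch {

    // One key/value record.
    static final class Pair {
        final String key;
        final long value;

        Pair(String key, long value) {
            this.key = key;
            this.value = value;
        }
    }

    // Sample input: each inner list simulates the records seen by one map task.
    static List<List<Pair>> sample() {
        return Arrays.asList(
            Arrays.asList(new Pair("a", 1), new Pair("b", 2), new Pair("a", 3)),
            Arrays.asList(new Pair("a", 4), new Pair("b", 5), new Pair("a", 6)));
    }

    // Reduce side only: fold every value for a key in a single left-to-right pass.
    static Map<String, Long> combineOnce(List<List<Pair>> partitions, BinaryOperator<Long> agg) {
        Map<String, Long> out = new TreeMap<>();
        for (List<Pair> partition : partitions) {
            for (Pair p : partition) {
                out.merge(p.key, p.value, agg);
            }
        }
        return out;
    }

    // "Map side" pre-combines within each partition, "reduce side" combines the partials.
    static Map<String, Long> combineTwoPhase(List<List<Pair>> partitions, BinaryOperator<Long> agg) {
        List<Map<String, Long>> partials = new ArrayList<>();
        for (List<Pair> partition : partitions) {
            Map<String, Long> local = new TreeMap<>();
            for (Pair p : partition) {
                local.merge(p.key, p.value, agg);
            }
            partials.add(local);
        }
        Map<String, Long> out = new TreeMap<>();
        for (Map<String, Long> local : partials) {
            local.forEach((k, v) -> out.merge(k, v, agg));
        }
        return out;
    }

    public static void main(String[] args) {
        BinaryOperator<Long> sum = Long::sum;          // commutative and associative
        BinaryOperator<Long> minus = (x, y) -> x - y;  // neither
        System.out.println(combineOnce(sample(), sum) + " " + combineTwoPhase(sample(), sum));
        // {a=14, b=7} {a=14, b=7}
        System.out.println(combineOnce(sample(), minus) + " " + combineTwoPhase(sample(), minus));
        // {a=-12, b=-3} {a=0, b=-3}
    }
}
```

With a sum, both plans agree, so a framework is free to pre-combine on the map side. Subtraction gives -12 for key a in one pass but 0 when the values are grouped per map task first, which is exactly the kind of disagreement the commutative-and-associative contract rules out.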




3 - Question

For Scala users, there is the __________ API, which is built on top of the Java APIs.
a) Prunch
b) Scrunch
c) Hivench
d) All of the mentioned

Answer: b
Explanation: Scrunch includes a REPL (read-eval-print loop) for creating MapReduce pipelines.




4 - Question

The Crunch APIs are modeled after _________ which is the library that Google uses for building data pipelines on top of their own implementation of MapReduce.
a) FlagJava
b) FlumeJava
c) FlakeJava
d) All of the mentioned

Answer: b
Explanation: Crunch's APIs are modeled after Google's FlumeJava. The Apache Crunch project develops and supports Java APIs that simplify the process of creating data pipelines on top of Apache Hadoop.




5 - Question

Point out the wrong statement.
a) A Crunch pipeline written by the development team sessionizes a set of user logs, and its output is then processed by a diverse collection of Pig scripts and Hive queries
b) Crunch pipelines provide a thin veneer on top of MapReduce
c) Developers have access to low-level MapReduce APIs
d) None of the mentioned

Answer: d
Explanation: All three statements are correct. Because Crunch pipelines are only a thin veneer on top of MapReduce, they run only slightly slower than a hand-tuned pipeline developed directly with the MapReduce APIs.




6 - Question

Crunch was designed for developers who understand __________ and want to use MapReduce effectively.
a) Java
b) Python
c) Scala
d) Javascript

Answer: a
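Explanation: Crunch's core API is written in Java and targets Java developers who want to write fast, reliable MapReduce applications. Crunch is often used in conjunction with Hive and Pig.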




7 - Question

Hive, Pig, and Cascading all use a _________ data model.
a) value centric
b) columnar
c) tuple-centric
d) none of the mentioned

Answer: c
Explanation: Crunch, by contrast, gives developers considerable flexibility in how they represent their data, which makes it a strong fit for pipelines over complex records that do not map neatly onto tuples.




8 - Question

A __________ represents a distributed, immutable collection of elements of type T.
a) PCollect<T>
b) PCollection<T>
c) PCol<T>
d) All of the mentioned

Answer: b
Explanation: PCollection<T> provides a method, parallelDo, that applies a DoFn to each element in the PCollection<T> in parallel and returns a new PCollection as its result.
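A plain-Java, in-memory sketch can make the parallelDo idea concrete. ToyPCollection, ToyDoFn, ToyEmitter, and ParallelDoSketch are hypothetical stand-ins that only mimic the shape of Crunch's PCollection, DoFn, and Emitter. The real parallelDo also takes a PType describing how to serialize the output and runs the DoFn across the cluster, whereas this toy runs sequentially in a loop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical in-memory toy mirroring the *shape* of Crunch's
// PCollection.parallelDo / DoFn / Emitter; it is not the real library.
interface ToyEmitter<T> {
    void emit(T value);
}

abstract class ToyDoFn<S, T> {
    // Called once per input element; may emit zero, one, or many outputs.
    abstract void process(S input, ToyEmitter<T> emitter);
}

final class ToyPCollection<T> {
    private final List<T> elements;

    ToyPCollection(List<T> elements) {
        // Defensive, unmodifiable copy: the collection is immutable.
        this.elements = Collections.unmodifiableList(new ArrayList<>(elements));
    }

    // Apply fn to every element and return a NEW collection; this one is untouched.
    // Runs sequentially here; a real pipeline would distribute the work.
    <U> ToyPCollection<U> parallelDo(ToyDoFn<T, U> fn) {
        List<U> out = new ArrayList<>();
        for (T element : elements) {
            fn.process(element, out::add);
        }
        return new ToyPCollection<>(out);
    }

    List<T> asList() {
        return elements;
    }
}

class ParallelDoSketch {
    // Splits each line into whitespace-separated words.
    static final ToyDoFn<String, String> SPLIT = new ToyDoFn<String, String>() {
        @Override
        void process(String line, ToyEmitter<String> emitter) {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    emitter.emit(word);
                }
            }
        }
    };

    // Maps each word to its length.
    static final ToyDoFn<String, Integer> LENGTH = new ToyDoFn<String, Integer>() {
        @Override
        void process(String word, ToyEmitter<Integer> emitter) {
            emitter.emit(word.length());
        }
    };

    public static void main(String[] args) {
        ToyPCollection<String> lines = new ToyPCollection<>(Arrays.asList("hello crunch", "map reduce"));
        ToyPCollection<Integer> lengths = lines.parallelDo(SPLIT).parallelDo(LENGTH);
        System.out.println(lengths.asList()); // [5, 6, 3, 6]
    }
}
```

Chaining two parallelDo calls is a miniature version of a pipeline composed of several user-defined functions, and because each call returns a new collection, the input lines are never modified.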




9 - Question

___________ executes the pipeline as a series of MapReduce jobs.
a) SparkPipeline
b) MRPipeline
c) MemPipeline
d) None of the mentioned

Answer: b
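Explanation: Every Crunch data pipeline is coordinated by an instance of the Pipeline interface. MRPipeline runs it as a series of MapReduce jobs, MemPipeline runs it in memory (useful for testing), and SparkPipeline runs it on Apache Spark.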




10 - Question

__________ represent the logical computations of your Crunch pipelines.
a) DoFns
b) DoFn
c) ThreeFns
d) None of the mentioned

Answer: a
Explanation: DoFns are designed to be easy to write, easy to test, and easy to deploy within the context of a MapReduce job.
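The "easy to test" point can be sketched in plain Java: a DoFn's logic only talks to an emitter, so a test can hand it an emitter that simply records its output, with no cluster or pipeline involved. ToyEmitter, ToyDoFn, ListEmitter, ValidUserFn, and DoFnTestSketch are hypothetical names that imitate the shape of Crunch's DoFn and Emitter rather than the real classes, and the "name,count" record format is invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy types mirroring the shape of Crunch's DoFn/Emitter; not the real API.
interface ToyEmitter<T> {
    void emit(T value);
}

abstract class ToyDoFn<S, T> {
    abstract void process(S input, ToyEmitter<T> emitter);
}

// Emitter that just records what it receives, so a test can inspect the output.
final class ListEmitter<T> implements ToyEmitter<T> {
    final List<T> emitted = new ArrayList<>();

    @Override
    public void emit(T value) {
        emitted.add(value);
    }
}

// Logic under test: keep lines of the form "name,count" with a
// non-negative integer count, and emit the name. Format is hypothetical.
final class ValidUserFn extends ToyDoFn<String, String> {
    @Override
    void process(String line, ToyEmitter<String> emitter) {
        String[] parts = line.split(",");
        if (parts.length != 2) {
            return; // malformed record: emit nothing
        }
        try {
            if (Integer.parseInt(parts[1].trim()) >= 0) {
                emitter.emit(parts[0].trim());
            }
        } catch (NumberFormatException e) {
            // non-numeric count: emit nothing
        }
    }
}

class DoFnTestSketch {
    // Runs the DoFn over a few inputs with no cluster, no pipeline, and no I/O.
    static List<String> run(ToyDoFn<String, String> fn, String... inputs) {
        ListEmitter<String> emitter = new ListEmitter<>();
        for (String input : inputs) {
            fn.process(input, emitter);
        }
        return emitter.emitted;
    }

    public static void main(String[] args) {
        List<String> out = run(new ValidUserFn(), "alice,3", "bob,-1", "broken", "carol, 7", "dave,x");
        System.out.println(out); // [alice, carol]
    }
}
```

Because the function's only dependency is the emitter it is handed, edge cases such as malformed or negative records can be checked in a fast unit test before the function is ever deployed inside a MapReduce job.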
