Cloudera Certified Administrator for Apache Hadoop (CCAH) CCA-500 Question # 7 Topic 1 Discussion

CCA-500 Exam Topic 1 Question 7 Discussion:
Question #: 7
Topic #: 1

You need to analyze 60,000,000 images stored in JPEG format, each of which is approximately 25 KB. Because your Hadoop cluster isn’t optimized for storing and processing many small files, you decide to take the following actions:

1. Group the individual images into a set of larger files.

2. Use the set of larger files as input for a MapReduce job that processes them directly with Python using Hadoop Streaming.

Which data serialization system gives the flexibility to do this?


A. CSV

B. XML

C. HTML

D. Avro

E. SequenceFiles

F. JSON
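
To make the scenario concrete, here is a minimal Python sketch of the two ideas in the question: how many larger files the images would collapse into, and what "grouping" binary images into one file as key/value records can look like. The 128 MB target size is an assumption (the question does not state a block size), and the length-prefixed record layout and the function names are hypothetical, chosen only for illustration. It is not the on-disk layout of any of the formats listed above.

```python
import io
import struct

# Figures from the question.
IMAGE_COUNT = 60_000_000
IMAGE_SIZE_KB = 25

# Assumption: target size of each grouped file (not given in the question).
TARGET_FILE_MB = 128

# Hypothetical record header: two big-endian unsigned 32-bit lengths.
HEADER_FORMAT = ">II"


def container_file_count(image_count, image_size_kb, target_file_mb):
    """Return how many grouped files are needed to hold all images."""
    total_kb = image_count * image_size_kb
    target_kb = target_file_mb * 1024
    return -(-total_kb // target_kb)  # ceiling division


def pack_records(records, out):
    """Write (name, payload) pairs as length-prefixed key/value records."""
    for name, payload in records:
        key = name.encode("utf-8")
        out.write(struct.pack(HEADER_FORMAT, len(key), len(payload)))
        out.write(key)
        out.write(payload)


def unpack_records(inp):
    """Yield (name, payload) pairs back from a stream written by pack_records."""
    header_size = struct.calcsize(HEADER_FORMAT)
    while True:
        header = inp.read(header_size)
        if len(header) < header_size:
            return  # end of stream
        key_len, value_len = struct.unpack(HEADER_FORMAT, header)
        key = inp.read(key_len).decode("utf-8")
        yield key, inp.read(value_len)


if __name__ == "__main__":
    print(container_file_count(IMAGE_COUNT, IMAGE_SIZE_KB, TARGET_FILE_MB))

    # Round-trip two fake "images" through an in-memory container.
    buffer = io.BytesIO()
    pack_records([("a.jpg", b"\xff\xd8one"), ("b.jpg", b"\xff\xd8two")], buffer)
    buffer.seek(0)
    for name, payload in unpack_records(buffer):
        print(name, len(payload))
```

Under these assumptions, 60,000,000 files of about 25 KB (roughly 1.4 TiB in total) shrink to a little over eleven thousand grouped files. The key/value shape matters because JPEG payloads are arbitrary binary data, so a line-oriented text layout cannot hold them without extra encoding.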

