Databricks Certified Generative AI Engineer Associate Databricks-Generative-AI-Engineer-Associate Question # 4 Topic 1 Discussion

Databricks-Generative-AI-Engineer-Associate Exam Topic 1 Question 4 Discussion:
Question #: 4
Topic #: 1

A Generative AI Engineer has successfully ingested unstructured documents and chunked them by document sections. They would like to store the chunks in a Vector Search index. The current format of the dataframe has two columns: (i) the original document file name, and (ii) an array of text chunks for each document.

What is the most performant way to store this dataframe?


A. Split the data into train and test set, create a unique identifier for each document, then save to a Delta table

B. Flatten the dataframe to one chunk per row, create a unique identifier for each row, and save to a Delta table

C. First create a unique identifier for each document, then save to a Delta table

D. Store each chunk as an independent JSON file in Unity Catalog Volume. For each JSON file, the key is the document section name and the value is the array of text chunks for that section
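To make the "one chunk per row" reshaping mentioned in option B concrete, here is a minimal sketch in plain Python. The sample documents, the column names (`chunk_id`, `chunk_index`, `chunk_text`), and the helper `flatten_chunks` are all hypothetical and chosen only for illustration. The sketch also assumes that the file name combined with the chunk position is enough to identify a chunk uniquely.

```python
import hashlib

# Hypothetical input: one row per document, holding an array of text chunks.
documents = [
    {"file_name": "guide.pdf", "chunks": ["Intro text", "Setup steps"]},
    {"file_name": "faq.pdf", "chunks": ["What is it?"]},
]


def flatten_chunks(rows):
    """Return one record per chunk, each with a deterministic unique id."""
    flat = []
    for row in rows:
        for position, text in enumerate(row["chunks"]):
            # Assumption: file name plus chunk position identifies a chunk.
            key = f"{row['file_name']}:{position}"
            chunk_id = hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]
            flat.append(
                {
                    "chunk_id": chunk_id,
                    "file_name": row["file_name"],
                    "chunk_index": position,
                    "chunk_text": text,
                }
            )
    return flat


flat_rows = flatten_chunks(documents)
for record in flat_rows:
    print(record["chunk_id"], record["file_name"], record["chunk_index"], record["chunk_text"])
```

On a Spark dataframe, the same reshaping would typically be done with `pyspark.sql.functions.explode` (or `posexplode`, which also returns the position) before writing the result to a Delta table. The plain-Python version above only illustrates the resulting row shape.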

