Pass the Databricks Certification Databricks-Certified-Professional-Data-Scientist Questions and Answers with CertsForce

Viewing page 3 out of 5 pages
Viewing questions 21-30 out of questions
Questions # 21:

Of all the smokers in a particular district, 40% prefer brand A and 60% prefer brand B. Of those smokers who prefer brand A, 30% are female, and of those who prefer brand B, 40% are female. What is the probability that a randomly selected smoker prefers brand A, given that the person selected is female?

Which of the following is the best way to solve this problem?

Options:

A.

Bayes' Theorem


B.

Poisson Distribution


C.

Binomial Distribution


D.

None of the above
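
The conditional probability asked for in the question can be computed directly from the stated percentages. The following is a minimal sketch, assuming brands A and B are the only two choices (as the 40%/60% split implies); the function name `posterior` and its parameter names are illustrative and not part of the original question.

```python
from fractions import Fraction

def posterior(prior_a, like_a, prior_b, like_b):
    """Return P(A | evidence) for two exhaustive, mutually exclusive groups A and B."""
    joint_a = prior_a * like_a   # P(A and evidence)
    joint_b = prior_b * like_b   # P(B and evidence)
    return joint_a / (joint_a + joint_b)

# Numbers taken from the question: 40% prefer A, 60% prefer B,
# 30% of A-smokers are female, 40% of B-smokers are female.
p_brand_a_given_female = posterior(
    prior_a=Fraction(40, 100), like_a=Fraction(30, 100),
    prior_b=Fraction(60, 100), like_b=Fraction(40, 100),
)
print(p_brand_a_given_female)  # 1/3
```

Exact fractions are used here so that the result is not blurred by floating-point rounding.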


Questions # 22:

You are studying the behavior of a population, and you are provided with multidimensional data at the individual level. You have identified four specific individuals who are valuable to your study, and would like to find all users who are most similar to each individual. Which algorithm is the most appropriate for this study?

Options:

A.

Association rules


B.

Decision trees


C.

Linear regression


D.

K-means clustering


Questions # 23:

Marie is getting married tomorrow, at an outdoor ceremony in the desert. In recent years, it has rained only 5 days each year. Unfortunately, the weatherman has predicted rain for tomorrow. When it actually rains, the weatherman correctly forecasts rain 90% of the time. When it doesn't rain, he incorrectly forecasts rain 10% of the time. Which of the following will you use to calculate the probability that it will rain on the day of Marie’s wedding?

Options:

A.

Naive Bayes


B.

Logistic Regression


C.

Random Decision Forests


D.

All of the above
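
Independently of which option is chosen, the probability of rain given the forecast can be worked out from the figures in the question using Bayes' rule. This is a rough sketch that assumes a 365-day year (the question only says "5 days each year"); the variable names are illustrative.

```python
from fractions import Fraction

# Assumption: a 365-day year, so the prior chance of rain on any given day is 5/365.
p_rain = Fraction(5, 365)
p_forecast_given_rain = Fraction(9, 10)   # correct rain forecast when it rains
p_forecast_given_dry = Fraction(1, 10)    # false rain forecast when it stays dry

# Total probability that rain is forecast on a given day.
p_forecast = (p_forecast_given_rain * p_rain
              + p_forecast_given_dry * (1 - p_rain))

# Bayes' rule: P(rain | rain forecast).
p_rain_given_forecast = p_forecast_given_rain * p_rain / p_forecast
print(p_rain_given_forecast)  # 1/9
```

Under these assumptions the forecast raises the chance of rain from about 1.4% to about 11%, which is still far from certain because dry days vastly outnumber rainy ones.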


Questions # 24:

While working with Netflix, the movie rating website, you have developed a recommender system whose rating predictions are consistently exactly 1 higher than the ratings given in your dataset for every user-item pair. There are n items in the dataset. What will be the calculated RMSE of your recommender system on the dataset?

Options:

A.

1


B.

2


C.

0


D.

n/2
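
Root mean squared error is the square root of the average squared difference between predictions and actual values. The sketch below checks what happens when every prediction is off by exactly 1; the `ratings` list is hypothetical sample data, and the helper name `rmse` is illustrative.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(p - a) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

ratings = [1, 3, 4, 5, 2]                  # hypothetical observed ratings
predictions = [r + 1 for r in ratings]     # each prediction is exactly 1 too high

print(rmse(ratings, predictions))  # 1.0
```

Because every squared error equals 1, the mean is 1 regardless of how many ratings there are, so the result does not depend on n.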


Questions # 25:

Select the correct statement that applies to supervised learning.

Options:

A.

We ask the machine to learn from our data when we specify a target variable.


B.

It reduces the machine's task to only divining some pattern from the input data to get the target variable.


C.

Instead of telling the machine "Predict Y for our data X," we're asking "What can you tell me about X?"


Questions # 26:

What describes a true limitation of the logistic regression method?

Options:

A.

It does not handle redundant variables well.


B.

It does not handle missing values well.


C.

It does not handle correlated variables well.


D.

It does not have explanatory values.


Questions # 27:

You are using an approach to classification that teaches the agent not by giving explicit categorizations, but by using some sort of reward system to indicate success, where agents might be rewarded for doing certain actions and punished for doing others. Which kind of learning is this?

Options:

A.

Supervised


B.

Unsupervised


C.

Regression


D.

None of the above


Questions # 28:

Which analytical method is considered unsupervised?


Options:

A.

Naive Bayesian classifier


B.

Decision tree


C.

Linear regression


D.

K-means clustering
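
K-means, listed among the options, groups points using only their coordinates, with no target labels supplied. The following is a minimal pure-Python sketch on hypothetical 1-D data; the function `kmeans_1d`, the sample points, and the starting centroids are all illustrative assumptions rather than anything from the original question.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Plain k-means on 1-D data: assign each point to its nearest centroid, then re-average."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for x in points:
            # Index of the centroid closest to this point.
            nearest = min(range(len(centroids)), key=lambda i: abs(x - centroids[i]))
            clusters[nearest].append(x)
        # Move each centroid to the mean of its cluster (keep it in place if the cluster is empty).
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

points = [1.0, 1.5, 2.0, 9.0, 10.0, 11.0]   # hypothetical unlabeled data
centroids, clusters = kmeans_1d(points, centroids=[0.0, 5.0])
print(centroids)  # [1.5, 10.0]
print(clusters)   # [[1.0, 1.5, 2.0], [9.0, 10.0, 11.0]]
```

Note that the input contains no labels at any point: the grouping emerges purely from distances between the points.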


Questions # 29:

Refer to the exhibit.

[Exhibit: Question # 29]

You are asked to write a report on how specific variables impact your client's sales, using a data set provided to you by the client. The data includes 15 variables that the client views as directly related to sales, and you are restricted to these variables only. After a preliminary analysis of the data, the following findings were made:

1. Multicollinearity is not an issue among the variables.
2. Only three variables (A, B, and C) have significant correlation with sales.

You build a linear regression model on the dependent variable of sales with the independent variables A, B, and C. The results of the regression are seen in the exhibit. You cannot request additional data. What is a way that you could try to increase the R² of the model without artificially inflating it?

Options:

A.

Create clusters based on the data and use them as model inputs


B.

Force all 15 variables into the model as independent variables


C.

Create interaction variables based only on variables A, B, and C


D.

Break variables A, B, and C into their own univariate models
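
One of the options above mentions interaction variables. As a hedged illustration of what that term means, the sketch below builds pairwise product terms from three existing predictors; the helper `add_interactions`, the `AxB`-style column names, and the sample row are hypothetical and not taken from the client's data or the exhibit.

```python
from itertools import combinations

def add_interactions(row, names):
    """Return a copy of `row` with a product term added for every pair of the named variables."""
    expanded = dict(row)
    for left, right in combinations(names, 2):
        expanded[f"{left}x{right}"] = row[left] * row[right]
    return expanded

row = {"A": 2.0, "B": 3.0, "C": 5.0}        # hypothetical observation
expanded = add_interactions(row, ["A", "B", "C"])
print(expanded)
# {'A': 2.0, 'B': 3.0, 'C': 5.0, 'AxB': 6.0, 'AxC': 10.0, 'BxC': 15.0}
```

The expanded rows could then be fed to a regression in place of the original three columns, letting the model capture effects that depend on two variables jointly.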


Questions # 30:

Which of the following are advantages of support vector machines?

Options:

A.

Effective in high dimensional spaces.


B.

It is memory efficient.


C.

It is possible to specify custom kernels.


D.

Effective in cases where the number of dimensions is greater than the number of samples.


E.

When the number of features is much greater than the number of samples, the method still gives good performance.


F.

SVMs directly provide probability estimates.

