Databricks Databricks-Certified-Professional-Data-Scientist Exam Questions Free Practice Test

Viewing page 3 out of 5 pages

Viewing questions 21-30

Question # 21:

Of all the smokers in a particular district, 40% prefer brand A and 60% prefer brand B. Of those smokers who prefer brand A, 30% are female, and of those who prefer brand B, 40% are female. What is the probability that a randomly selected smoker prefers brand A, given that the person selected is female?

Which of the following is the best way to solve this problem?

Options:

Bayes' Theorem

Poisson Distribution

Binomial Distribution

None of the above
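
As a worked sketch of the conditional probability asked for in this question, the snippet below applies Bayes' theorem to the stated percentages. The function name `posterior` and its parameter names are illustrative assumptions, not part of the original question.

```python
from fractions import Fraction

def posterior(prior_a, like_a, prior_b, like_b):
    """P(A | evidence) for two mutually exclusive, exhaustive hypotheses A and B."""
    # Law of total probability: overall chance of observing the evidence.
    evidence = prior_a * like_a + prior_b * like_b
    return prior_a * like_a / evidence

# Figures from the question: 40% prefer A, 60% prefer B;
# 30% of brand A smokers and 40% of brand B smokers are female.
p_a_given_female = posterior(Fraction(40, 100), Fraction(30, 100),
                             Fraction(60, 100), Fraction(40, 100))
print(p_a_given_female)  # 1/3
```

Exact fractions are used here only to avoid floating-point rounding in the printed result.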

Question # 22:

You are studying the behavior of a population, and you are provided with multidimensional data at the individual level. You have identified four specific individuals who are valuable to your study, and would like to find all users who are most similar to each individual. Which algorithm is the most appropriate for this study?

Options:

Association rules

Decision trees

Linear regression

K-means clustering

Question # 23:

Marie is getting married tomorrow, at an outdoor ceremony in the desert. In recent years, it has rained only 5 days each year. Unfortunately, the weatherman has predicted rain for tomorrow. When it actually rains, the weatherman correctly forecasts rain 90% of the time. When it doesn't rain, he incorrectly forecasts rain 10% of the time. Which of the following will you use to calculate the probability that it will rain on the day of Marie’s wedding?

Options:

Naive Bayes

Logistic Regression

Random Decision Forests

All of the above
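
The numbers in this scenario can be plugged into Bayes' rule directly. The sketch below does so under the assumption of a 365-day year (the question only says "5 days each year"); all variable names are illustrative.

```python
from fractions import Fraction

# Prior: 5 rainy days per year (a 365-day year is an assumption).
p_rain = Fraction(5, 365)
# Forecast accuracy figures from the question.
p_forecast_given_rain = Fraction(9, 10)
p_forecast_given_dry = Fraction(1, 10)

# Total probability that rain is forecast on any given day.
p_forecast = (p_rain * p_forecast_given_rain
              + (1 - p_rain) * p_forecast_given_dry)

# Bayes' rule: P(rain | rain was forecast).
p_rain_given_forecast = p_rain * p_forecast_given_rain / p_forecast
print(p_rain_given_forecast, float(p_rain_given_forecast))  # 1/9, roughly 0.111
```

Even with a 90% accurate forecaster, the low prior keeps the posterior probability of rain small.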

Question # 24:

While working with Netflix, the movie rating website, you have developed a recommender system whose rating predictions are consistently exactly 1 higher than the ratings given in the dataset for every user-item pair. There are n items in the dataset. What will be the calculated RMSE of your recommender system on the dataset?

Options:

n/2
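
To see how RMSE behaves when every prediction is off by the same constant, the sketch below computes it on a small made-up ratings list. The ratings and the helper name `rmse` are hypothetical and are not taken from any real dataset.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(p - a) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

ratings = [3, 5, 1, 4, 2]                # hypothetical true ratings
predictions = [r + 1 for r in ratings]   # every prediction exactly 1 too high
error = rmse(ratings, predictions)
print(error)  # 1.0
```

Because each squared error is 1, the mean is 1 and so is its square root, regardless of how many items the dataset holds.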

Question # 25:

Select the correct statement that applies to supervised learning.

Options:

We ask the machine to learn from our data when we specify a target variable.

This reduces the machine's task to only divining some pattern from the input data to get the target variable.

Instead of telling the machine "Predict Y for our data X", we're asking "What can you tell me about X?"

Question # 26:

What describes a true limitation of the logistic regression method?

Options:

It does not handle redundant variables well.

It does not handle missing values well.

It does not handle correlated variables well.

It does not have explanatory values.

Question # 27:

You are using an approach to classification in which you teach the agent not by giving explicit categorizations, but by using some sort of reward system to indicate success, where agents might be rewarded for doing certain actions and punished for doing others. Which kind of learning is this?

Options:

Supervised

Unsupervised

Regression

None of the above

Question # 28:

Which analytical method is considered unsupervised?

Options:

Naive Bayesian classifier

Decision tree

Linear regression

K-means clustering
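
For contrast with the methods that need a labelled target, here is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. It groups points using only their coordinates. The sample points, the starting centroids, and the function name `kmeans` are illustrative assumptions.

```python
def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm on 2-D points with caller-supplied starting centroids."""
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for each point.
        labels = [
            min(range(len(centroids)),
                key=lambda k: (x - centroids[k][0]) ** 2 + (y - centroids[k][1]) ** 2)
            for x, y in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append((sum(p[0] for p in members) / len(members),
                                      sum(p[1] for p in members) / len(members)))
            else:
                new_centroids.append(centroids[k])  # keep an empty cluster's centroid
        centroids = new_centroids
    return labels, centroids

# Hypothetical data: two visibly separated groups, no target labels supplied.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centroids = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The function never sees a target variable; the cluster labels it returns are derived entirely from distances between points.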

Question # 29:

Refer to the exhibit.

You are asked to write a report on how specific variables impact your client's sales using a data set provided to you by the client. The data includes 15 variables that the client views as directly related to sales, and you are restricted to these variables only. After a preliminary analysis of the data, the following findings were made:

1. Multicollinearity is not an issue among the variables.

2. Only three variables (A, B, and C) have significant correlation with sales.

You build a linear regression model on the dependent variable of sales with the independent variables of A, B, and C. The results of the regression are seen in the exhibit. You cannot request additional data. What is a way that you could try to increase the R² of the model without artificially inflating it?

Options:

Create clusters based on the data and use them as model inputs

Force all 15 variables into the model as independent variables

Create interaction variables based only on variables A, B, and C

Break variables A, B, and C into their own univariate models
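
One of the options above mentions interaction variables. As a rough sketch of what that means in practice, the snippet below derives pairwise product columns from A, B, and C. The sample rows and the helper name `add_interactions` are hypothetical; the client's actual data is not shown in the question.

```python
from itertools import combinations

def add_interactions(rows, names):
    """Return copies of the rows with pairwise products of the named columns added."""
    expanded_rows = []
    for row in rows:
        extended = dict(row)  # copy so the input rows are left unchanged
        for a, b in combinations(names, 2):
            extended[f"{a}*{b}"] = row[a] * row[b]
        expanded_rows.append(extended)
    return expanded_rows

# Hypothetical observations of the three significant variables.
rows = [{"A": 2.0, "B": 3.0, "C": 0.5},
        {"A": 1.0, "B": 4.0, "C": 2.0}]
expanded = add_interactions(rows, ["A", "B", "C"])
print(expanded[0])
# {'A': 2.0, 'B': 3.0, 'C': 0.5, 'A*B': 6.0, 'A*C': 1.0, 'B*C': 1.5}
```

The new columns would then be offered to the regression alongside A, B, and C. Whether they genuinely improve the fit would still need checking, for example with adjusted R².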

Question # 30:

Which of the following are advantages of support vector machines?

Options:

They are effective in high-dimensional spaces.

They are memory efficient.

It is possible to specify custom kernels.

They are effective in cases where the number of dimensions is greater than the number of samples.

When the number of features is much greater than the number of samples, the method still gives good performance.

SVMs directly provide probability estimates.
