Databricks-Machine-Learning-Professional問題集、Databricks実際の試験問題

質問 1

A data scientist has developed and logged a Spark ML random forest model model, and then they ended their Spark session and terminated their cluster. After starting a new cluster, they want to review the featureImportances of the original model object. Which lines of code can be used to restore the model object so that featureImportances is available?

A. This can only be viewed in the MLflow Experiments UI

B. mlflow.sklearn.load_model(model_uri)

C. mlflow.load_model(model_uri)

D. client.pyfunc.load_model(model_uri)

E. client.list_artifacts(run_id)["feature-importances.csv"]

正解: B

質問 2

A machine learning engineer wants to deploy a model for real-time serving using MLflow Model Serving. For the model, the machine learning engineer currently has one model version in each of the stages in the MLflow Model Registry. The engineer wants to know which model versions can be queried once Model Serving is enabled for the model. Which of the following lists all of the MLflow Model Registry stages whose model versions are automatically deployed with Model Serving?

A. Staging, Production

B. Staging, Production, Archived

C. None, Staging, Production

D. Production

E. None, Staging, Production, Archived

正解: A

質問 3

A machine learning engineering team wants to build a continuous pipeline for data preparation of a machine learning application. The team would like the data to be fully processed and made ready for inference in a series of equal-sized batches. Which tool can be used to provide this type of continuous processing?

A. Delta Lake

B. MLflow

C. Spark UDFs

D. Structured Streaming

正解: C

質問 4

Which of the following MLflow operations can be used to automatically calculate and log a Shapley feature importance plot?

A. mlflow.shap

B. mlflow.shap.log_explanation

C. None of these operations can accomplish the task.

D. client.log_artifact

E. mlflow.log_figure

正解: A

質問 5

A Machine Learning Engineer needs to deploy a production ML workflow that includes an MLflow experiment for tracking model training runs, a registered model in Unity Catalog for version management, and a model serving endpoint for real-time inference. The team requires a unified configuration approach that ensures consistent deployment across development and production environments while adhering to infrastructure-as-code best practices. Which approach should the Machine Learning Engineer use to define all three components together?

A. Use Terraform to provision the infrastructure and then manually configure each ML component through the Databricks UI.

B. Define experiments, registered_models, and model_serving_endpoints resources in a Databricks Asset Bundle (DAB) configuration file.

C. Use MLflow's deployment tools to create individual deployment configurations for each component, then orchestrate them using Databricks Jobs.

D. Create separate REST API calls for each component and run them sequentially using a shell script.

正解: B

解説: (PassTest メンバーにのみ表示されます)

質問 6

A machine learning engineer is monitoring categorical input variables for a production machine learning application. The engineer believes that missing values are becoming more prevalent in more recent data for a particular value in one of the categorical input variables. Which of the following tools can the machine learning engineer use to assess their theory?

A. One-way Chi-squared Test

B. None of these

C. Two-way Chi-squared Test

D. Kolmogorov-Smirnov (KS) test

E. Jenson-Shannon distance

正解: A

質問 7

How can you save a trained Spark ML PipelineModel?

A. pipeline.save()

B. pipelineModel.write().save()

C. pipeline.persist()

D. pipeline.store()

正解: B

解説: (PassTest メンバーにのみ表示されます)

質問 8

After a data scientist noticed that a column was missing from a production feature set stored as a Delta table, the machine learning engineering team has been tasked with determining when the column was dropped from the feature set. Which SQL command can be used to accomplish this task?

A. TIMESTAMP

B. DESCRIBE

C. DESCRIBE HISTORY

D. HISTORY

E. VERSION

正解: C

質問 9

A data scientist has developed a scikit-learn model sklearn_model and they want to log the model using MLflow.
They write the following incomplete code block:

Which lines of code can be used to fill in the blank so the code block can successfully complete the task?

A. mlflow.sklearn.log_model(sklearn_model, "model")

B. mlflow.sklearn.track_model(sklearn_model, "model")

C. mlflow.sklearn.load_model("model")

D. mlflow.spark.log_model(sklearn_model, "model")

E. mlflow.spark.track_model(sklearn_model, "model")

正解: E

質問 10

A Machine Learning Engineer has a real-time fraud detection model deployed that approves or blocks millions of transactions daily. They need to deploy a new version of the model with improved detection accuracy to this high-traffic, business-critical application. Because any model downtime could result in lost revenue or customer dissatisfaction, the engineer must ensure zero downtime and minimal disruption for end users. Leadership also requires that any rollback to the previous version be immediate if issues are detected with the new model in production. Which deployment strategy meets these requirements?

A. Use a blue-green deployment for the new model, maintaining two separate production environments (one "blue," one "green") and switching user traffic to the new version only after confirming it is healthy, enabling instant rollback if needed.

B. Deploy the new model in parallel with the old version, monitor both for a set period, and then notify all stakeholders to manually switch over to the new version at a coordinated time.

C. Use a canary deployment by initially routing a small percentage of user traffic to the new model version, monitoring results, and gradually increasing exposure until all traffic uses the new version if no problems are detected.

D. Replace the current production model with the new version during a scheduled maintenance window, notify affected users of potential brief disruptions, and prepare to reroute requests to a backup if failures occur after deployment.

正解: A

解説: (PassTest メンバーにのみ表示されます)

質問 11

A Machine Learning Engineer has deployed a fraud detection model that processes 10,000 transactions per hour. The model was trained on data from Q1 2024, but it's now Q4 2024. The ML team notices three concerning trends: (1) the model's precision has dropped from 92% to
78% over the past month, (2) the average transaction amount in recent data has increased from
$150 to $220 and (3) the relationship between transaction frequency and fraud likelihood has weakened significantly due to new payment methods being introduced. The engineer needs to implement a monitoring solution that can detect the root cause of the performance degradation, identify why the precision dropped, and be able to do this at the scale needed. Which monitoring pipeline component will do this?

A. Logging and Monitoring Pipeline because it captures all prediction requests and can identify patterns in the fraud predictions.

B. Model Health Monitoring Pipeline because it ensures the infrastructure can handle the 10,000 transactions per hour load.

C. Model Performance Monitoring Pipeline because it directly tracks precision metrics and can alert when thresholds are breached.

D. Drift Detection Pipeline because it can identify both the data drift (transaction amount changes) and concept drift (weakened relationship between features and target).

正解: D

解説: (PassTest メンバーにのみ表示されます)

Databricks Certified Machine Learning Professional - Databricks-Machine-Learning-Professional 模擬練習