DEA-C02 Question Bank: Snowflake Actual Exam Questions

Question 1

You are implementing a row access policy on the 'SALES_DATA' table to restrict access based on the 'REGION' column. Each user is allowed to see data only for specific regions. You have a mapping table, 'USER_REGION_MAP', with columns 'USERNAME' and 'REGION'. You want to create a row access policy that dynamically filters 'SALES_DATA' based on the current user and that user's allowed regions. Which of the following options represents a correct approach to creating and applying this row access policy?

A. Option A

B. Option C

C. Option E

D. Option B

E. Option D

Correct answer: D
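
The filtering rule behind a mapping-table row access policy can be modeled outside Snowflake. In Snowflake itself, such a policy is commonly written along the lines of CREATE ROW ACCESS POLICY region_policy AS (row_region VARCHAR) RETURNS BOOLEAN -> EXISTS (SELECT 1 FROM USER_REGION_MAP m WHERE m.USERNAME = CURRENT_USER() AND m.REGION = row_region) and attached with ALTER TABLE SALES_DATA ADD ROW ACCESS POLICY region_policy ON (REGION), where region_policy and row_region are hypothetical names. The sketch below is only a stand-in: it uses Python's built-in sqlite3 module, binds the user name where Snowflake would call CURRENT_USER(), and uses hypothetical rows and user names.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for SALES_DATA and USER_REGION_MAP (hypothetical rows)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE sales_data (id INTEGER, region TEXT, amount REAL);
        CREATE TABLE user_region_map (username TEXT, region TEXT);
        INSERT INTO sales_data VALUES (1, 'EMEA', 100.0), (2, 'APAC', 250.0), (3, 'AMER', 75.0);
        INSERT INTO user_region_map VALUES ('ALICE', 'EMEA'), ('ALICE', 'AMER'), ('BOB', 'APAC');
        """
    )
    return conn


def visible_sales(conn: sqlite3.Connection, username: str) -> list:
    """Return the rows a mapping-table row access policy would let `username` see.

    The EXISTS predicate mirrors a typical policy body; `username` is bound as a
    parameter here in place of Snowflake's CURRENT_USER().
    """
    query = """
        SELECT s.id, s.region, s.amount
        FROM sales_data AS s
        WHERE EXISTS (
            SELECT 1 FROM user_region_map AS m
            WHERE m.username = ? AND m.region = s.region
        )
        ORDER BY s.id
    """
    return conn.execute(query, (username,)).fetchall()


if __name__ == "__main__":
    db = build_demo_db()
    print(visible_sales(db, "ALICE"))  # rows 1 (EMEA) and 3 (AMER)
    print(visible_sales(db, "CAROL"))  # no mapping row, so no rows
```

A user with no row in the mapping table sees nothing, which is the usual default-deny behavior of an EXISTS-based policy.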

Question 2

You have a Snowflake table, 'CUSTOMER_ORDERS', with columns such as 'CUSTOMER_ID', 'ORDER_DATE', 'ORDER_AMOUNT', and 'REGION'. A BI dashboard relies on a query that aggregates data from this table, but the query is consistently slow. The query frequently filters by 'ORDER_DATE' and groups by 'REGION'. Based on the following 'EXPLAIN' output, which combination of techniques should be considered to improve performance the most?

A. Increase the virtual warehouse size to 'LARGE' or 'XLARGE'.

B. Cluster the 'CUSTOMER_ORDERS' table on 'ORDER_DATE' and 'REGION'.

C. Create a materialized view that pre-aggregates the data by 'ORDER_DATE' and 'REGION'.

D. Redesign the dashboard to minimize the data being displayed at once to the user.

E. Create an index on the 'ORDER_DATE' column.

Correct answer: B,C
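
Why clustering on ORDER_DATE helps can be seen with a toy model of micro-partition pruning: Snowflake keeps min/max metadata per micro-partition and can skip partitions whose range cannot satisfy the filter. The Python below is a hypothetical simplification (fixed 1,000-row "partitions", only the ORDER_DATE dimension, synthetic data), not Snowflake's storage format. It compares how many partitions a March filter must scan when rows arrive in random order versus sorted by date.

```python
import random
from datetime import date, timedelta
from typing import List, Tuple

Partition = Tuple[date, date]  # (min ORDER_DATE, max ORDER_DATE) metadata only


def make_partitions(dates: List[date], rows_per_partition: int) -> List[Partition]:
    """Split rows into fixed-size partitions and keep only their min/max metadata."""
    parts = []
    for start in range(0, len(dates), rows_per_partition):
        chunk = dates[start:start + rows_per_partition]
        parts.append((min(chunk), max(chunk)))
    return parts


def partitions_scanned(parts: List[Partition], lo: date, hi: date) -> int:
    """Count partitions whose [min, max] range overlaps the filter range [lo, hi]."""
    return sum(1 for pmin, pmax in parts if pmax >= lo and pmin <= hi)


def demo(seed: int = 42) -> Tuple[int, int]:
    rng = random.Random(seed)
    start = date(2023, 1, 1)
    # Hypothetical data: 100 orders for every day of 2023.
    dates = [start + timedelta(days=d) for d in range(365) for _ in range(100)]
    rng.shuffle(dates)                                 # unclustered: random arrival order
    unclustered = make_partitions(dates, 1000)
    clustered = make_partitions(sorted(dates), 1000)   # clustered on ORDER_DATE
    lo, hi = date(2023, 3, 1), date(2023, 3, 31)
    return partitions_scanned(unclustered, lo, hi), partitions_scanned(clustered, lo, hi)


if __name__ == "__main__":
    print(demo())  # (unclustered scans, clustered scans)
```

With sorted data only the four partitions covering March overlap the filter, while in the shuffled layout nearly every partition does. A materialized view attacks the same query from the other side, by reading pre-aggregated rows instead of raw ones.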

Question 3

You're designing a near real-time data pipeline for clickstream data using Snowpipe Streaming. The data volume is extremely high, with bursts exceeding 1 million events per second. Your team reports intermittent ingestion failures and latency spikes. Considering the constraints of Snowpipe Streaming, which of the following strategies would be MOST effective in mitigating these issues, assuming the data format is optimized and network latency is minimal?

A. Implement a message queue (e.g., Kafka) in front of Snowpipe Streaming to buffer incoming events and smooth out the traffic spikes.

B. Switch from Snowpipe Streaming to Classic Snowpipe, as it is more resilient to high data volumes.

C. Reduce the size of each micro-batch being sent to Snowpipe Streaming to minimize the impact of individual failures.

D. Increase the number of Snowflake virtual warehouses to handle the increased load.

E. Implement client-side retry logic with exponential backoff and jitter to handle transient errors and avoid overwhelming the service.

Correct answer: A,E
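
Option E's retry strategy can be sketched generically. This is not Snowpipe Streaming client code: TransientError is a hypothetical stand-in for whatever retryable exception your ingestion client raises, and the delay values are arbitrary defaults. The "full jitter" variant shown here picks a random delay between zero and a capped exponential ceiling.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for retryable failures such as throttling or timeouts."""


def retry_with_backoff(
    op: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> T:
    """Call `op`, retrying TransientError with capped exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    if rng is None:
        rng = random.Random()
    for attempt in range(max_attempts):
        try:
            return op()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            # Full jitter: wait a random time in [0, min(max_delay, base_delay * 2**attempt)],
            # so many clients that failed together do not all retry at the same moment.
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            sleep(rng.uniform(0, ceiling))
    raise AssertionError("unreachable")
```

Passing sleep and rng as parameters keeps the function testable without real waiting. A buffer such as Kafka (option A) complements this by absorbing bursts before they reach the client at all.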

Question 4

You have a table 'EMPLOYEE_DATA' containing Personally Identifiable Information (PII), including 'salary' and 'email'. You need to implement column-level security such that: 1) the 'salary' column is visible only to users in 'FINANCE_ROLE'; 2) the 'email' column is masked with a SHA256 hash for all users except those in 'HR_ROLE'. You create the following masking policies:

Which of the following SQL statements correctly applies these masking policies to the 'EMPLOYEE_DATA' table?

A. ALTER TABLE EMPLOYEE_DATA MODIFY COLUMN salary SET MASKING POLICY mask_salary; ALTER TABLE EMPLOYEE_DATA MODIFY COLUMN email SET MASKING POLICY mask_email;

B. ALTER TABLE EMPLOYEE_DATA ALTER COLUMN salary SET MASKING POLICY mask_salary; ALTER TABLE EMPLOYEE_DATA ALTER COLUMN email SET MASKING POLICY mask_email;

C. ALTER TABLE EMPLOYEE_DATA APPLY MASKING POLICY mask_salary ON COLUMN salary; ALTER TABLE EMPLOYEE_DATA APPLY MASKING POLICY mask_email ON COLUMN email;

D. ALTER TABLE EMPLOYEE_DATA MODIFY COLUMN salary SET MASKING POLICY = mask_salary; ALTER TABLE EMPLOYEE_DATA MODIFY COLUMN email SET MASKING POLICY = mask_email;

E. CREATE OR REPLACE TAG employee_data.salary VALUE 'mask_salary'; CREATE OR REPLACE TAG employee_data.email VALUE 'mask_email';

Correct answer: B
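
The masking policies themselves are not reproduced above, so the following is an assumption about what they do, written as plain Python functions that mirror the CASE logic of each policy body. The SQL in the docstrings is a rough equivalent, and returning NULL for masked salaries is an illustrative choice (a policy could return a constant instead). Role names come from the question. Snowflake's SHA2(val, 256) also returns a 64-character hex digest, but treat this as illustrating the branching rather than reproducing Snowflake's output byte for byte.

```python
import hashlib
from typing import Optional


def mask_salary(value: Optional[float], current_role: str) -> Optional[float]:
    """Salary policy logic: real value for FINANCE_ROLE, NULL (None) for everyone else.

    Roughly: CASE WHEN CURRENT_ROLE() = 'FINANCE_ROLE' THEN val ELSE NULL END
    """
    return value if current_role == "FINANCE_ROLE" else None


def mask_email(value: Optional[str], current_role: str) -> Optional[str]:
    """Email policy logic: clear text for HR_ROLE, SHA-256 hex digest for everyone else.

    Roughly: CASE WHEN CURRENT_ROLE() = 'HR_ROLE' THEN val ELSE SHA2(val, 256) END
    """
    if value is None or current_role == "HR_ROLE":
        return value
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    for role in ("FINANCE_ROLE", "HR_ROLE", "ANALYST_ROLE"):
        print(role, mask_salary(120000.0, role), mask_email("jane@example.com", role))
```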

Question 5

You are designing a data pipeline that involves unloading large amounts of data (hundreds of terabytes) from Snowflake to AWS S3 for archival purposes. To optimize cost and performance, which of the following strategies should you consider? (Select ALL that apply)

A. Use a large Snowflake warehouse size to parallelize the unload operation and reduce the overall unload time.

B. Choose a file format such as Parquet or ORC with compression enabled to reduce storage costs and improve query performance in S3.

C. Partition the data during the unload operation based on a high-cardinality column to maximize parallelism in S3.

D. Enable client-side encryption with KMS in S3 and specify the encryption key in the 'COPY INTO' command to enhance security.

E. Use the 'MAX_FILE_SIZE' parameter in the 'COPY INTO' command to control the size of individual files unloaded to S3. Smaller files generally improve query performance in S3.

Correct answer: A,B,D
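
The file-size parameter in option E interacts with archive size in a simple way that is worth estimating before an unload. The arithmetic below computes only a lower bound on file count. The 200 TiB figure is a hypothetical stand-in for "hundreds of terabytes", and the 16 MiB case reflects Snowflake's documented MAX_FILE_SIZE default of 16 MB (check the current documentation for the maximum allowed on S3).

```python
MiB = 1024 ** 2
TiB = 1024 ** 4


def min_file_count(total_bytes: int, max_file_size: int) -> int:
    """Lower bound on the number of unloaded files when each file is capped at max_file_size.

    Actual unloads usually write more files than this, because parallel threads
    each produce their own files and compressed sizes are only approximate.
    """
    if total_bytes < 0 or max_file_size <= 0:
        raise ValueError("total_bytes must be non-negative and max_file_size positive")
    return -(-total_bytes // max_file_size)  # integer ceiling division


if __name__ == "__main__":
    archive = 200 * TiB  # hypothetical compressed output size
    for cap in (16 * MiB, 256 * MiB):
        print(f"MAX_FILE_SIZE={cap // MiB} MiB -> at least {min_file_count(archive, cap):,} files")
```

Millions of small objects mean millions of S3 PUT requests during the unload and GET requests for any later reader, which is one reason to weigh larger files for archival data.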

Question 6

You are using Snowpark to perform a complex join operation between two large tables: 'ORDERS' (100 GB) and 'CUSTOMER' (50 GB). The join is performed on 'ORDERS.CUSTOMER_ID = CUSTOMER.ID'. The query is running slower than expected. You have already confirmed that the warehouse size is adequate. Which of the following strategies, applied in combination, would most likely improve the join performance within a Snowpark context?

A. Increase the 'AUTO RESIZE' setting on the warehouse to automatically scale up the warehouse size when the load increases.

B. Use 'session.add_import()' to add external JAR dependencies. This would enable the use of external libraries and improve performance.

C. Analyze the query profile in Snowflake's web UI to identify the specific bottleneck (e.g., excessive data spilling, high CPU utilization) and address it directly.

D. Use Snowpark's 'hint' function to force a broadcast join, assuming the 'CUSTOMER' table can fit into memory on the worker nodes.

E. Ensure both tables are clustered on the join keys ('ORDERS.CUSTOMER_ID' and 'CUSTOMER.ID').

Correct answer: C,E

Question 7

You have a table 'CUSTOMERS' with columns 'CUSTOMER_ID', 'FIRST_NAME', 'LAST_NAME', and 'EMAIL'. You need to transform this data into a semi-structured JSON format and store it in a VARIANT column named 'CUSTOMER_DATA' in a table called 'CUSTOMER_JSON'. The desired JSON structure should include a root element 'customer' containing 'id', 'name', and 'contact' fields. Which of the following SQL statements, used in conjunction with a CREATE TABLE and INSERT INTO statement for 'CUSTOMER_JSON', correctly transforms the data?

A. Option A

B. Option C

C. Option E

D. Option B

E. Option D

Correct answer: A
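
A plain-Python sketch of the target shape may help when comparing the options. In Snowflake SQL, nested OBJECT_CONSTRUCT calls are one common way to build such an object, for example OBJECT_CONSTRUCT('customer', OBJECT_CONSTRUCT('id', CUSTOMER_ID, 'name', FIRST_NAME || ' ' || LAST_NAME, 'contact', OBJECT_CONSTRUCT('email', EMAIL))). The question does not define what 'name' and 'contact' contain, so concatenating the two name columns and nesting the email under 'contact' are assumptions.

```python
import json
from typing import Any, Dict


def to_customer_json(row: Dict[str, Any]) -> Dict[str, Any]:
    """Shape one CUSTOMERS row as {"customer": {"id", "name", "contact"}}.

    Assumptions (not specified in the question): 'name' concatenates first
    and last name, and 'contact' is an object holding the email address.
    """
    return {
        "customer": {
            "id": row["CUSTOMER_ID"],
            "name": f'{row["FIRST_NAME"]} {row["LAST_NAME"]}',
            "contact": {"email": row["EMAIL"]},
        }
    }


if __name__ == "__main__":
    sample = {"CUSTOMER_ID": 1, "FIRST_NAME": "Ada", "LAST_NAME": "Lovelace", "EMAIL": "ada@example.com"}
    print(json.dumps(to_customer_json(sample)))
```

One behavioral difference to keep in mind: OBJECT_CONSTRUCT omits keys whose value is NULL, whereas this function would emit null for a missing email.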

Question 8

You are planning to monetize a dataset on the Snowflake Marketplace. You want to provide potential customers with sample data to evaluate before they purchase a full subscription. Which of the following strategies are valid and recommended for offering a free sample of your data within the Snowflake Marketplace? (Select all that apply)

A. Create a view that filters the dataset based on a sampling algorithm (e.g., 'SAMPLE ROW' clause) and share the view through the Marketplace.

B. Upload a sample CSV file to a publicly accessible S3 bucket and provide the link in the Marketplace listing description. Consumers can download and load this data into their own Snowflake account for evaluation.

C. Create a separate share containing a subset (e.g., a smaller number of rows or columns) of the full dataset and offer this share as a free trial listing on the Marketplace.

D. Offer a 'free trial' subscription on the primary listing that automatically expires after a set period (e.g., 7 days), allowing customers to access the full dataset during the trial period. You will need to write custom code to manage trial expiration and data access restrictions based on the trial status.

E. Provide the consumer with the script to create a database link to your data, allowing them read-only access to a pre-defined sample table, and then revoke the access after a set period.

Correct answer: A,C

Question 9

A Snowflake data warehouse contains a table named 'SALES_TRANSACTIONS' with the following columns: 'TRANSACTION_ID', 'PRODUCT_ID', 'CUSTOMER_ID', 'TRANSACTION_DATE', and 'SALES_AMOUNT'. You need to optimize a query that calculates the total sales amount per product for a given month. The 'SALES_TRANSACTIONS' table is very large (billions of rows), and queries are slow. Given the following initial query:

SELECT PRODUCT_ID, SUM(SALES_AMOUNT) AS TOTAL_SALES FROM SALES_TRANSACTIONS WHERE TRANSACTION_DATE BETWEEN '2023-01-07' AND '2023-01-31' GROUP BY PRODUCT_ID;

Which of the following actions, when combined, would MOST effectively improve the performance of this query?

A. Increase the virtual warehouse size to the largest available size.

B. Create a clustering key on the 'PRODUCT_ID' and 'TRANSACTION_DATE' columns in the 'SALES_TRANSACTIONS' table.

C. Create a temporary table with the results of the query and query that table instead.

D. Create a materialized view that pre-aggregates the total sales amount per product and month.

E. Convert the column to a VARCHAR data type.

Correct answer: B,D
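
The benefit of the materialized view in option D can be illustrated with a small pre-aggregation sketch: totals are computed once per (product, month), and the monthly query then reads a handful of rollup rows instead of billions of transactions. The rows below are hypothetical, and Snowflake maintains a real materialized view automatically; this only models the data shape. Note that a month-grained rollup can answer only whole-month ranges, so the example query's range as written, starting at '2023-01-07', would still need the base table or a finer-grained rollup.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, Tuple

Row = Tuple[str, date, float]  # (PRODUCT_ID, TRANSACTION_DATE, SALES_AMOUNT)


def build_monthly_rollup(rows: Iterable[Row]) -> Dict[Tuple[str, str], float]:
    """Pre-aggregate rows to {(product_id, 'YYYY-MM'): total}, like an MV grouped by product and month."""
    rollup: Dict[Tuple[str, str], float] = defaultdict(float)
    for product_id, tx_date, amount in rows:
        rollup[(product_id, tx_date.strftime("%Y-%m"))] += amount
    return dict(rollup)


def monthly_sales(rollup: Dict[Tuple[str, str], float], month: str) -> Dict[str, float]:
    """Answer 'total sales per product for one month' by reading only the small rollup."""
    return {pid: total for (pid, m), total in rollup.items() if m == month}


if __name__ == "__main__":
    rows = [
        ("P1", date(2023, 1, 3), 10.0),
        ("P1", date(2023, 1, 20), 5.5),
        ("P2", date(2023, 1, 15), 7.0),
        ("P1", date(2023, 2, 1), 3.0),
    ]
    print(monthly_sales(build_monthly_rollup(rows), "2023-01"))
```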

Question 10

A data engineer is tasked with optimizing a Snowflake data pipeline that ingests data from multiple external sources, transforms it, and loads it into a reporting table. The pipeline uses a series of Snowflake tasks orchestrated with a root task and child tasks. Performance monitoring shows inconsistent execution times for the transformation tasks. Which of the following strategies would provide the MOST granular insights into the performance bottlenecks within the pipeline and allow for targeted optimization?

A. Enable query profiling for all queries executed within the transformation tasks using ALTER SESSION SET QUERY_PROFILE = 'ON', then analyze the query profiles for performance bottlenecks after each task run.

B. Leverage Snowflake's event tables such as QUERY_HISTORY and TASK_HISTORY in the ACCOUNT_USAGE schema, joined with custom metadata tags, to correlate specific transformation steps with execution times and resource usage. Also set up alerting based on defined performance thresholds.

C. Use Snowflake's Resource Monitors to track overall warehouse consumption and assume that high consumption during transformation tasks indicates a bottleneck within those tasks.

D. Implement a custom logging mechanism within the transformation tasks to record execution times for each stage of the transformation process, and store these logs in a Snowflake table for analysis.

E. Rely solely on the Snowflake web UI's Task History view to identify slow-running tasks.

Correct answer: B
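
The correlate-then-alert idea in option B can be sketched as follows, assuming each transformation step sets its own QUERY_TAG so that its queries can be grouped. In Snowflake the input rows would come from a query over SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY (QUERY_TAG, and TOTAL_ELAPSED_TIME in milliseconds); here they are hypothetical tuples, and both the median statistic and the 5-second threshold are illustrative choices.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, List, Tuple


def slow_steps(records: Iterable[Tuple[str, float]], threshold_ms: float) -> Dict[str, float]:
    """Group (query_tag, elapsed_ms) records by tag; return steps whose median exceeds threshold_ms."""
    by_tag: Dict[str, List[float]] = defaultdict(list)
    for tag, elapsed_ms in records:
        by_tag[tag].append(elapsed_ms)
    medians = {tag: median(times) for tag, times in by_tag.items()}
    return {tag: m for tag, m in sorted(medians.items()) if m > threshold_ms}


if __name__ == "__main__":
    history = [  # hypothetical rows, as if selected from QUERY_HISTORY
        ("load_raw", 1200), ("load_raw", 1300),
        ("transform_orders", 9000), ("transform_orders", 15000), ("transform_orders", 8000),
        ("publish", 400),
    ]
    for step, ms in slow_steps(history, threshold_ms=5000).items():
        print(f"ALERT: {step} median {ms} ms")
```

Grouping by a per-step tag is what makes the result granular: a slow task run is traced to the specific transformation step rather than to the task as a whole.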

Question 11

You are building a data pipeline in Snowflake using Snowpark Python. As part of the pipeline, you need to create a dynamic SQL query to filter records from a table named 'PRODUCT_REVIEWS' based on a list of product categories. The list of categories is passed to a stored procedure as a single string argument of comma-separated values. The filtered data needs to be further processed within the stored procedure. Which of the following approaches are the MOST efficient and secure ways to construct and execute this dynamic SQL query using Snowpark?

A. Using Python's string formatting along with the and 'session.sql()' functions to build and execute the SQL query securely, avoiding SQL injection vulnerabilities.

B. Using Python's string formatting to build the SQL query directly, and then executing it using 'session.sql()'.

C. Using the Snowpark 'functions.lit()' function to create literal values from the list of product categories and incorporating them into the SQL query, then using 'session.sql()' to run it.

D. Using Snowpark's on the list of product categories after converting them into a Snowflake array, and then using 'session.sql()' to execute the query.

E. Constructing the SQL query using 'session.sql()' and string concatenation, ensuring proper escaping of single quotes within the product categories string.

Correct answer: A,C
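
The secure pattern this question is after is to generate only placeholders and bind the values. The sketch uses Python's built-in sqlite3 as a stand-in because it supports the same '?' placeholder style. In Snowpark, comparable options are the DataFrame API (for example session.table("PRODUCT_REVIEWS").filter(col("CATEGORY").isin(categories))) or bind parameters, which recent Snowpark Python releases accept through a params argument to session.sql(); confirm both against your Snowpark version. The table contents below are hypothetical.

```python
import sqlite3
from typing import List


def parse_categories(arg: str) -> List[str]:
    """Split the comma-separated argument, trimming whitespace and dropping empty items."""
    return [c.strip() for c in arg.split(",") if c.strip()]


def filter_reviews(conn: sqlite3.Connection, categories_arg: str) -> list:
    """Return reviews whose category is in the list, using bound parameters only."""
    categories = parse_categories(categories_arg)
    if not categories:
        return []
    # Only the '?' placeholders are generated; the category values are bound,
    # never spliced into the SQL text, so quotes in the input cannot alter the query.
    placeholders = ", ".join("?" for _ in categories)
    sql = (
        "SELECT review_id, category FROM product_reviews "
        f"WHERE category IN ({placeholders}) ORDER BY review_id"
    )
    return conn.execute(sql, categories).fetchall()


def build_demo_db() -> sqlite3.Connection:
    """Hypothetical PRODUCT_REVIEWS stand-in."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE product_reviews (review_id INTEGER, category TEXT)")
    conn.executemany(
        "INSERT INTO product_reviews VALUES (?, ?)",
        [(1, "Books"), (2, "Toys"), (3, "Garden")],
    )
    return conn
```

An input such as "Books' OR '1'='1" is treated as one literal category name, so it simply matches nothing.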

Question 12

You are designing a data pipeline using Snowpipe to ingest data from multiple S3 buckets into a single Snowflake table. Each S3 bucket represents a different data source and contains files in JSON format. You want to use Snowpipe's auto-ingest feature and a single Snowpipe object for all buckets to simplify management and reduce overhead. However, each data source has a different JSON schema. How can you best achieve this goal while ensuring data is loaded correctly and efficiently into the target table?

A. Create a separate Snowpipe for each S3 bucket. Although this creates more Snowpipe objects, it allows you to specify a different FILE FORMAT and transformation logic for each data source.

B. Use a single Snowpipe with a generic FILE FORMAT that can handle all possible JSON schemas. Implement a VIEW on top of the target table to transform and restructure the data based on the source bucket.

C. Use a single Snowpipe and leverage Snowflake's ability to call a user-defined function (UDF) within the 'COPY INTO' statement to transform the data based on the S3 bucket path. The UDF can parse the bucket path and apply the appropriate JSON schema transformation.

D. Use a single Snowpipe and leverage Snowflake's VARIANT data type to store the raw JSON data. Create separate external tables, each pointing to a specific S3 bucket, and use SQL queries to transform and load the data into the target table.

E. Since Snowpipe cannot handle multiple schemas with a single pipe, pre-process the data in S3 using an AWS Lambda function to transform all files into a common schema before they are ingested by the Snowpipe.

Correct answer: C
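
The routing idea in option C, choosing a transformation from the file's path, can be modeled in plain Python. This is not a Snowflake UDF: in a COPY statement the path would come from METADATA$FILENAME, and which functions a Snowpipe COPY transformation may call should be checked in the Snowflake documentation. The source names, path layout, and both JSON schemas below are hypothetical.

```python
import json
from typing import Any, Callable, Dict

Doc = Dict[str, Any]


def _from_crm(doc: Doc) -> Doc:
    # Hypothetical CRM schema: {"customerId": ..., "total": ...}
    return {"id": doc["customerId"], "amount": doc["total"]}


def _from_web(doc: Doc) -> Doc:
    # Hypothetical web schema: {"user": {"id": ...}, "order": {"value": ...}}
    return {"id": doc["user"]["id"], "amount": doc["order"]["value"]}


TRANSFORMS: Dict[str, Callable[[Doc], Doc]] = {"crm": _from_crm, "web": _from_web}


def normalize(file_path: str, raw_json: str) -> Doc:
    """Route one JSON record to its source-specific mapping based on the file path.

    Assumes a hypothetical layout where the first path segment names the source,
    e.g. 'crm/2024/01/part-0001.json'.
    """
    source = file_path.lstrip("/").split("/", 1)[0]
    transform = TRANSFORMS.get(source)
    if transform is None:
        raise ValueError(f"no mapping for source {source!r}")
    return {"source": source, **transform(json.loads(raw_json))}
```

Keeping the per-source mappings in a lookup table means that adding a new bucket is one new entry rather than a new branch in the routing logic.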

Question 13

You are tasked with setting up a Kafka Connector to ingest data into Snowflake. You need to ensure fault tolerance. Which of the following Kafka Connect configurations are essential for enabling fault tolerance and ensuring minimal data loss during connector failures? Select all that apply.

A. Enable Kafka Connect's internal offset storage by configuring 'offset.storage.topic' and 'config.storage.topic'.

B. Configure 'errors.tolerance' to 'all'.

C. Configure 'errors.deadletterqueue.topic.name' to specify a Dead Letter Queue (DLQ) topic.

D. Utilize Snowflake's auto-ingest feature alongside the Kafka Connector.

E. Set 'tasks.max' to a value greater than 1.

Correct answer: A,C,E
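
A small checker makes the split in these settings explicit: the offset, config and status storage topics are worker-level settings for distributed Kafka Connect, whereas errors.tolerance, errors.deadletterqueue.topic.name and tasks.max belong in the connector configuration. It also encodes a related Kafka Connect rule: records are routed to the dead letter queue only when errors.tolerance is set to all. The topic names in the example are hypothetical.

```python
from typing import Dict, List

WORKER_STORAGE_KEYS = ("offset.storage.topic", "config.storage.topic", "status.storage.topic")


def fault_tolerance_gaps(worker: Dict[str, str], connector: Dict[str, str]) -> List[str]:
    """List fault-tolerance gaps in a distributed worker config and a sink connector config."""
    gaps = []
    # Distributed-mode workers keep offsets, connector configs and status in Kafka topics,
    # so a restarted worker can resume from the last committed offsets.
    for key in WORKER_STORAGE_KEYS:
        if not worker.get(key):
            gaps.append(f"worker: {key} is not set")
    tolerance = connector.get("errors.tolerance", "none")  # Kafka Connect default is 'none'
    dlq = connector.get("errors.deadletterqueue.topic.name")
    if dlq and tolerance != "all":
        gaps.append("connector: the DLQ only receives records when errors.tolerance=all")
    if tolerance == "all" and not dlq:
        gaps.append("connector: errors.tolerance=all without a DLQ skips bad records")
    if int(connector.get("tasks.max", "1")) < 2:
        gaps.append("connector: tasks.max=1 runs a single task for all partitions")
    return gaps


if __name__ == "__main__":
    worker = {
        "offset.storage.topic": "connect-offsets",
        "config.storage.topic": "connect-configs",
        "status.storage.topic": "connect-status",
    }
    connector = {
        "connector.class": "com.snowflake.kafka.connector.SnowflakeSinkConnector",
        "tasks.max": "4",
        "errors.tolerance": "all",
        "errors.deadletterqueue.topic.name": "snowflake-dlq",
    }
    print(fault_tolerance_gaps(worker, connector) or "no gaps found")
```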

Question 14

A financial services company, 'Acme Finance', wants to share aggregated, anonymized transaction data with a research firm, 'Data Insights', through a Snowflake Data Clean Room. Acme Finance needs to ensure that Data Insights can only analyze the data using pre-defined aggregate functions and cannot access the raw, underlying transactional details. Acme Finance has already created a secure view to share the aggregated data. Which of the following steps are necessary to grant Data Insights access to the data securely while enforcing the required restrictions?

A. Grant SELECT privilege on the secure view directly to the role used by Data Insights' Snowflake account.

B. Create an external function that Data Insights can call to execute pre-approved aggregate functions on the underlying data. Grant USAGE on the function to Data Insights' role and create a secure view that uses that function.

C. Create a row access policy that restricts the rows returned based on the role used by Data Insights. Then, grant SELECT privilege on the secure view directly to the role used by Data Insights' Snowflake account.

D. Create a share object and grant USAGE privilege on the database containing the secure view to the share. Then, grant SELECT privilege on the secure view to the share. Finally, share the share with Data Insights' Snowflake account using their account identifier.

E. Create a masking policy that only allows aggregate functions to be executed by Data Insights' role and apply it to the relevant columns in the underlying table. Then, grant SELECT privilege on the secure view directly to the role used by Data Insights' Snowflake account.

Correct answer: D

Question 15

You have created a masking policy called which redacts salary information based on the user's role. You have applied this policy to the 'SALARY' column in the 'EMPLOYEES' table. However, after applying the policy, you notice that even users with the 'ACCOUNTADMIN' role are seeing the masked data, which is not the intended behavior. The intention is that the 'ACCOUNTADMIN' and 'SECURITYADMIN' roles should always see the real salary data. What is the MOST likely cause of this issue, and what fix would you suggest?

A. The masking policy is not properly activated. Run the 'ALTER TABLE EMPLOYEES MODIFY COLUMN SALARY SET MASKING POLICY salary_mask' command again.

B. The 'ACCOUNTADMIN' and 'SECURITYADMIN' roles do not have the 'APPLY MASKING POLICY' privilege. Grant this privilege to the roles.

C. The 'ACCOUNTADMIN' role does not have the 'OWNERSHIP' privilege on the table. Grant the 'OWNERSHIP' privilege to 'ACCOUNTADMIN' on the 'EMPLOYEES' table.

D. The masking policy does not explicitly exclude the 'ACCOUNTADMIN' and 'SECURITYADMIN' roles. Modify the masking policy to include a condition that checks for these roles and returns the original value if they are active. e.g., 'CASE WHEN IN ('ACCOUNTADMIN', 'SECURITYADMIN') THEN val ELSE END'

E. The 'ACCOUNTADMIN' and roles need to have the 'SELECT' privilege on the 'SNOWFLAKE.ACCOUNT_USAGE.MASKING_POLICIES' view.

Correct answer: D

Snowflake SnowPro Advanced: Data Engineer (DEA-C02) - DEA-C02 Practice Test