Data Engineering

Help - For Each Workflows Performance Use Case

Pnascima
Visitor

Hey guys, I've been running into a performance problem in my current Workflow.

Here's my use case:

  • We have several notebooks; each one is responsible for calculating a specific metric (such as AOV, GMV, etc.)
  • I built a pipeline that creates a DataFrame with all the information I need, and at the end of the job I create a Task that is a JSON transformation of that DF (see the sketch after this list).
    • It looks like this:
    {"experimentKey": "ExperimentName", "startDate": "2024-10-22", "endDate": null, "status": "IN PROGRESS", "country": "BR", "variations": ["control_variation", "variant_a", "variant_b", "variant_c"], "lastUpdate": "2024-10-18", "Metrics": "ctr_partner", "isPrimary": true, "isGuardrail": false}
 
This repeats for each of the metrics, only changing the "Metrics" field.
 
  • So, using the "For each" task in Workflows, each iteration opens a notebook that runs this:
    dbutils.notebook.run(
        f"/Workspace/Users/Platform/metrics_multiple_t_test/{api['Metrics']}",
        0,  # timeout_seconds = 0, i.e. no timeout
        {
            "experiment_id": api["experimentKey"],
            "experiment_start": str(api["startDate"]),
            "isPrimary": api["isPrimary"],
            "isGuardrail": api["isGuardrail"],
            "metric": api["Metrics"],
            "environment": environment,
        },
    )
  • It calls the specific metric notebook that I need, passing the necessary information as parameters. The process used to work fine on a Serverless cluster, but now that I've switched to a dedicated cluster it takes an eternity. These are the cluster's specifications:
    1-5 Workers: 64-320 GB Memory, 8-40 Cores
    1 Driver: 64 GB Memory, 8 Cores
    Runtime: 14.3.x-scala2.12
    Unity Catalog, Photon
    Node type: Standard_E8as_v5
    11–33 DBU/h
     
    How can I improve this process?
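For reference, this is roughly what the two sides of the loop look like. It's a simplified sketch rather than my actual job code: metrics_df, the task-value key metric_payloads, and the parameter name input are placeholder names.

    import json

    # Driver task: collect the metrics DataFrame into one JSON string per metric
    # (default=str keeps date columns serializable)
    payloads = [json.dumps(row.asDict(), default=str) for row in metrics_df.collect()]

    # Publish the list as a task value; the "For each" task can reference it in its
    # Inputs field as {{tasks.<driver_task_name>.values.metric_payloads}}
    dbutils.jobs.taskValues.set(key="metric_payloads", value=payloads)

On the consuming side, the loop notebook just parses the current item back into a dict before the dbutils.notebook.run call above:

    import json

    # Each iteration receives one JSON item from the "For each" task as a notebook parameter
    api = json.loads(dbutils.widgets.get("input"))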
 
 