Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Hritik_Moon
by New Contributor
  • 27 Views
  • 6 replies
  • 4 kudos

Accessing Spark UI in free edition

Hello, is it possible to access the Spark UI in the Free Edition? I want to check tasks and stages. Ultimately I am working on how to check for data skewness.

Latest Reply
BS_THE_ANALYST
Esteemed Contributor II
  • 4 kudos

@Hritik_Moon if you're accessing a LAB environment for your learning, you may well be using classic compute in there. You could check the Spark UI that way. Alternatively, I think the Community Edition still has some life in it, perhaps you could sign u...

5 More Replies
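For readers following the skew question above: a minimal sketch, assuming a DataFrame and a candidate key column (the table and column names are illustrative, not from the thread), that surfaces skew with plain PySpark even when the Spark UI isn't available:

```python
from pyspark.sql import functions as F

# Hypothetical source table; replace with your own.
df = spark.table("main.default.orders")

# Rows per key: a few keys holding most of the rows is a classic sign of skew.
df.groupBy("customer_id").count().orderBy(F.desc("count")).show(20)

# Rows per Spark partition: a large imbalance here shows up as straggler tasks.
(df.withColumn("partition_id", F.spark_partition_id())
   .groupBy("partition_id").count()
   .orderBy(F.desc("count"))
   .show(20))
```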
fjrodriguez
by New Contributor II
  • 3 Views
  • 0 replies
  • 0 kudos

Self-dependency TumblingWindowTrigger in ADF

Hey! I would like to migrate one ADF batch ingestion which has a TumblingWindowTrigger on top of the pipeline that pretty much checks every 15 min whether a file is landing. Normally the files land on a daily basis, so it will process them accordingly once in a d...

bunny1174
by Visitor
  • 9 Views
  • 1 replies
  • 0 kudos

Spark Streaming loading only 1k to 5k rows into Delta table

Hi Team, I have 4-5 million files in S3, around 1.5 GB of data in total with 9 million records. When I try to use Auto Loader to read the data with readStream and write to a Delta table, the processing takes too much time; it is loading from 1k t...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @bunny1174, you have 4-5 million files in S3 and their total size is 1.5 GB - this clearly indicates a small-files problem. You need to compact those files into bigger ones. There's no way your pipeline will be performant if you have that many files and the...

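Compacting the source files, as the reply says, is the main fix. As a reference point, here is a minimal Auto Loader sketch showing the options that control how many small files go into each micro-batch; the S3 paths and table name are assumptions, not details from the thread:

```python
# Minimal Auto Loader sketch; path, checkpoint, and table names are illustrative.
stream = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "parquet")
    # Pull more small files per micro-batch than the defaults allow.
    .option("cloudFiles.maxFilesPerTrigger", "100000")
    .option("cloudFiles.maxBytesPerTrigger", "10g")
    .load("s3://my-bucket/landing/")
)

(stream.writeStream
    .option("checkpointLocation", "s3://my-bucket/_checkpoints/bronze_orders")
    .trigger(availableNow=True)          # drain the backlog in batches, then stop
    .toTable("main.default.bronze_orders"))
```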
SuMiT1
by New Contributor II
  • 64 Views
  • 5 replies
  • 2 kudos

Flattening JSON in Databricks

I have chatbot data. I read an ADLS JSON file in Databricks and stored the output in a dataframe. In that table, two columns contain JSON data but their data type is string: 1. content, 2. metadata. Now I have to flatten the data but I am not sure how to do tha...

  • 64 Views
  • 5 replies
  • 2 kudos
Latest Reply
SuMiT1
New Contributor II
  • 2 kudos

Hi @szymon_dybczak, I gave the wrong content JSON value. Here is the updated one; could you please tell me the code for this? It would be helpful for me. You gave the code already but I am getting confused, so please tell me for this: { "activities": [ { "va...

4 More Replies
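For context, a minimal sketch of the usual PySpark pattern for flattening a JSON string column (from_json plus explode); the schema and field names below are illustrative, since the actual JSON in the thread is truncated:

```python
from pyspark.sql import functions as F, types as T

# Hypothetical source; in the thread the dataframe comes from an ADLS JSON read.
df = spark.table("main.default.chatbot_raw")

# Illustrative schema for an "activities" array inside the string column "content".
activity_schema = T.StructType([
    T.StructField("activities", T.ArrayType(T.StructType([
        T.StructField("value", T.StringType()),
        T.StructField("type", T.StringType()),
    ])))
])

flattened = (
    df.withColumn("content_parsed", F.from_json("content", activity_schema))
      .withColumn("activity", F.explode("content_parsed.activities"))
      .select(F.col("activity.value").alias("value"),
              F.col("activity.type").alias("type"))
)
flattened.show(truncate=False)
```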
ext07_rvoort
by Visitor
  • 9 Views
  • 1 replies
  • 0 kudos

Databricks Asset Bundles: issue with python_file path in spark_python_task

Hi, I am trying to run a Python file which is stored in the src folder. However, I am getting the following error: Error: cannot update job: Invalid python file reference: src/get_git_credentials.py. Please visit the Databricks user guide for supported py...

Data Engineering
DAB
Databricks Asset Bundles
Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @ext07_rvoort, could you try specifying it as a relative path: python_file: ../src/get_git_credentials.py? So, in my setup I have the following directory structure: root folder: - databricks.yml file - src/ - resources/job.yml. And then to refer in job.yml to ...

liu
by New Contributor III
  • 62 Views
  • 7 replies
  • 4 kudos

configure AWS authentication for serverless Spark

I only have an AWS Access Key ID and Secret Access Key, and I want to use this information to access S3. However, the official documentation states that I need to set the AWS_SECRET_ACCESS_KEY and AWS_ACCESS_KEY_ID environment variables, but I cannot ...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 4 kudos

Hi @liu, the proper way is to go to your cluster and set them up in the Advanced section. That way they will be scoped at the cluster level. It's recommended to store the values themselves in a secret scope and reference them as environment variables: Use a secret in a Spa...

6 More Replies
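For the classic-compute route the reply describes, a minimal sketch; the secret scope, key names, and bucket are purely illustrative assumptions:

```python
# Classic (non-serverless) compute sketch; scope/key/bucket names are illustrative.
access_key = dbutils.secrets.get(scope="aws-creds", key="access-key-id")
secret_key = dbutils.secrets.get(scope="aws-creds", key="secret-access-key")

# Scope the credentials to this cluster's Hadoop S3A configuration.
sc._jsc.hadoopConfiguration().set("fs.s3a.access.key", access_key)
sc._jsc.hadoopConfiguration().set("fs.s3a.secret.key", secret_key)

df = spark.read.parquet("s3a://my-bucket/landing/")
df.show(5)
```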
touchyvivace
by Visitor
  • 16 Views
  • 1 replies
  • 1 kudos

Is there another way to authenticate to Azure Databricks using MSI in Java?

Hi, I am trying to connect to Azure Databricks using MSI in Java, but the document https://learn.microsoft.com/en-us/azure/databricks/dev-tools/auth/azure-mi says: The Databricks SDK for Java has not yet implemented Azure managed identities authentication...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @touchyvivace, unfortunately not, the documentation is up to date. In the Java SDK, MSI has not been implemented yet. And here's an open issue on GitHub: [FEATURE] Add support for Azure Managed Identity authentication (system and user-assigned) · Iss...

IONA
by New Contributor III
  • 30 Views
  • 1 replies
  • 1 kudos

Dev/Pie/Prd and the same workspace

Hi all! I'm appealing to all you folk who are cleverer than I for some advice on Databricks DevOps. I was asked by my team leader to expand our single environment to a DevOps-style dev/pie/prd system, potentially using DABs to promote code to higher e...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @IONA, I guess you can still use DABs to simulate different environments on a single workspace. In targets, define 3 different environments but with the same workspace for all of them (something similar to the picture below). Then your intuition is good - it's ...

ckanzabedian
by New Contributor
  • 24 Views
  • 1 replies
  • 1 kudos

ServiceNow LakeFlow Connector - Using TABLE API only for tables and NOT views

The current Databricks ServiceNow Lakeflow connector relies on the ServiceNow REST Table API to capture data. For some reason, it is unable to list a user-defined view as a data source to be configured, even though ServiceNow user-defined views are a...

Latest Reply
BS_THE_ANALYST
Esteemed Contributor II
  • 1 kudos

Hi @ckanzabedian, have you checked out the documentation for the ServiceNow connector yet? https://learn.microsoft.com/en-us/azure/databricks/ingestion/lakeflow-connect/servicenow-limits The link above is about the limits. I can't see a mention about ...

excavator-matt
by New Contributor III
  • 120 Views
  • 4 replies
  • 2 kudos

How do I use Databricks Lakeflow Declarative Pipelines on AWS DMS data?

Hi! I am trying to replicate an AWS RDS PostgreSQL database in Databricks. I have successfully managed to enable CDC using AWS DMS, which writes an initial load file and continuous CDC files in Parquet. I have been trying to follow the official guide Repl...

Data Engineering
AUTO CDC
AWS DMS
declarative pipelines
LakeFlow
Latest Reply
mmayorga
Databricks Employee
  • 2 kudos

Hi @excavator-matt, yes, you are correct. cloudFiles/Auto Loader handles idempotency at the file level. From the guide's perspective, the view is created from the source files in the specified location. This view captures all files and their corresp...

3 More Replies
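For orientation, a minimal sketch of the pattern the guide and the reply describe: an Auto Loader view over the DMS files feeding a CDC flow in a declarative pipeline. The path, primary key, and sequencing column are assumptions, not details from the thread:

```python
import dlt
from pyspark.sql import functions as F

@dlt.view
def dms_changes():
    # DMS full-load and CDC Parquet files landing under one prefix (illustrative path).
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "parquet")
        .load("s3://my-bucket/dms-output/public/orders/")
    )

dlt.create_streaming_table("orders")

dlt.apply_changes(
    target="orders",
    source="dms_changes",
    keys=["order_id"],                      # hypothetical primary key
    sequence_by=F.col("transact_seq"),      # assumed DMS sequencing column
    apply_as_deletes=F.expr("Op = 'D'"),    # DMS marks deletes with Op = 'D'
    except_column_list=["Op", "transact_seq"],
)
```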
smoortema
by New Contributor III
  • 49 Views
  • 1 replies
  • 1 kudos

How to make FOR cycle and dynamic SQL and variables work together

I am working on a testing notebook where the table to be tested can be given as a widget. I wanted to write it in SQL. The notebook does the following steps in a loop that should run 10 times: 1. Store the starting version of a Delta table in a var...

Latest Reply
mmayorga
Databricks Employee
  • 1 kudos

Hi @smoortema, thank you for reaching out! You are very close to getting the “start_version”; you just need to include “INTO start_version” after the “EXECUTE IMMEDIATE”. Here is the updated code: BEGIN DECLARE sum INT DEFAULT 0; DECLARE start_ve...

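If a PySpark variant of the same loop is easier to reason about, here is a minimal sketch in Python; the widget name, iteration count, and the test step in the middle are assumptions, not the poster's actual notebook:

```python
# Hypothetical widget supplying the table under test.
table_name = dbutils.widgets.get("table_name")

for i in range(10):
    # 1. Capture the table's current (starting) Delta version.
    start_version = spark.sql(
        f"DESCRIBE HISTORY {table_name} LIMIT 1"
    ).collect()[0]["version"]

    # 2. ... run the test step that modifies the table ...

    # 3. Compare against the starting snapshot via Delta time travel.
    rows_before = spark.sql(
        f"SELECT COUNT(*) AS c FROM {table_name} VERSION AS OF {start_version}"
    ).collect()[0]["c"]
    print(f"iteration {i}: start_version={start_version}, rows_before={rows_before}")
```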
LonguiVic1
by New Contributor III
  • 43 Views
  • 1 replies
  • 1 kudos

Resolved! How to Find DBU Consumption and Cost for a Serverless Job?

Hello community, I'm new to using serverless compute for my jobs and I need some help understanding how to monitor the costs. I have configured and run a job that executes a notebook using the "Serverless" compute option. The job completed successfully...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @LonguiVic1, you can use system tables to track serverless consumption. In the article below they even provide sample queries you can use. Also, notice that there's a list_prices system table that includes list prices over time for each available SKU....

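For reference, a minimal query sketch against the billing system tables the reply mentions, joining usage with list prices; the job ID is a placeholder and list prices ignore any discounts, so treat this as a starting point rather than an exact cost report:

```python
job_id = "123456789"  # hypothetical job ID taken from the Jobs UI/URL

query = f"""
SELECT u.usage_date,
       SUM(u.usage_quantity)                     AS dbus,
       SUM(u.usage_quantity * p.pricing.default) AS est_list_cost
FROM system.billing.usage AS u
JOIN system.billing.list_prices AS p
  ON u.sku_name = p.sku_name
 AND u.usage_start_time >= p.price_start_time
 AND (p.price_end_time IS NULL OR u.usage_start_time < p.price_end_time)
WHERE u.usage_metadata.job_id = '{job_id}'
GROUP BY u.usage_date
ORDER BY u.usage_date
"""
display(spark.sql(query))
```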
Raj_DB
by New Contributor III
  • 38 Views
  • 1 replies
  • 0 kudos

Streamlining Custom Job Notifications with a Centralized Email List

Hi everyone, I am working on setting up success/failure notifications for a large number of jobs in our Databricks environment. The manual process of configuring email notifications through the UI for each job individually is not scalable and is becoming ver...

Latest Reply
mmayorga
Databricks Employee
  • 0 kudos

Hi @Raj_DB, thank you for reaching out! You can easily achieve this by leveraging the Python SDK that is already installed on Databricks clusters, or by using the Jobs API. With the SDK, you can update each job and its corresponding “email_no...

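Along the lines of the reply, a minimal Python SDK sketch that points every job at a central recipient list; the mailing list and the "update all jobs" scope are assumptions, so add your own filtering before running anything like this:

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.jobs import JobEmailNotifications, JobSettings

w = WorkspaceClient()  # picks up default notebook/CLI authentication

recipients = ["data-alerts@example.com"]  # hypothetical central mailing list

for job in w.jobs.list():
    # Partial update: only the email_notifications setting is changed.
    w.jobs.update(
        job_id=job.job_id,
        new_settings=JobSettings(
            email_notifications=JobEmailNotifications(
                on_success=recipients,
                on_failure=recipients,
            )
        ),
    )
```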
daan_dw
by New Contributor III
  • 35 Views
  • 1 replies
  • 0 kudos

Databricks asset bundles in Python: referencing variables

Hey, I am using DABs and in my .yml files I can reference variables set in my databricks.yml like this: git_branch: ${var.branch}. I would like to do the same thing in my DABs written in Python, but I cannot find any documentation on how to do this....

Latest Reply
SP_6721
Honored Contributor
  • 0 kudos

Hi @daan_dw, to reference variables defined in your databricks.yml in Python DAB code, define your variables class and use bundle.resolve_variable: https://docs.databricks.com/aws/en/dev-tools/bundles/python/#access-bundle-variables

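A short sketch of what the linked doc describes (a variables class plus bundle.resolve_variable); the class and field names, the job definition, and even the exact imports are illustrative and may differ by databricks-bundles version:

```python
from databricks.bundles.core import Bundle, Variable, variables
from databricks.bundles.jobs import Job


@variables
class Variables:
    branch: Variable[str]  # corresponds to the `branch` variable in databricks.yml


def create_job(bundle: Bundle) -> Job:
    # Resolve the bundle variable to its concrete value for the current target.
    branch = bundle.resolve_variable(Variables.branch)
    return Job.from_dict({
        "name": f"my_job_{branch}",
        "tags": {"git_branch": branch},
    })
```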
seefoods
by Valued Contributor
  • 22 Views
  • 0 replies
  • 0 kudos

DQX - datacontract cli

Hello guys, has anyone combined DQX Databricks rule checks with the datacontract CLI? If yes, can you share your ideas? https://gpt.datacontract.com/sources/cli.datacontract.com/ Cordially,

