Data Engineering

Forum Posts

by Koko • New Contributor II

09-16-2022 8:53:05 AM

668 Views
1 replies
2 kudos

Execute SQL Server Agent jobs from a Databricks notebook

Is it possible to execute a SQL Server Agent job from a Databricks notebook?

Latest Reply

Aviral-Bhardwaj
Esteemed Contributor III

12-10-2022 6:39:00 AM

2 kudos

I don't think this type of feature is available there.

by akshay_1333 • New Contributor II

11-29-2022 8:39:34 AM

424 Views
1 replies
3 kudos

Notebook formatting

I am using a DBR 10.4 LTS instance. Can anyone help me format the code? I tried "Format Python", but an error pops up asking me to upgrade to DBR 11.2. Is there any alternative to this?

Latest Reply

Aviral-Bhardwaj
Esteemed Contributor III

12-10-2022 6:37:37 AM

3 kudos

Please share your code so that we can help you.

by Ossian • New Contributor

07-21-2021 12:08:18 AM

1211 Views
1 replies
0 kudos

Driver restarts and job dies after 10-20 hours (Structured Streaming)

I am running a Java/JAR Structured Streaming job on a single-node cluster (Databricks Runtime 8.3). The job contains a single query which reads records from multiple Azure Event Hubs using Spark Kafka functionality and outputs results to a mssql dat...

Latest Reply

Aviral-Bhardwaj
Esteemed Contributor III

12-10-2022 6:31:30 AM

0 kudos

It seems that when your cluster adds nodes, it looks for the init script and fails. You can use reserved instances instead of spot instances for this activity, although that will increase your overall cost. Alternatively, you can use depended librar...

by Pragat • New Contributor

07-18-2022 4:49:25 AM

707 Views
1 replies
0 kudos

Databricks job parameterization

I am configuring a Databricks job using multiple notebooks that depend on each other. All the notebooks are parameterized and use similar parameters. How can I configure the parameters at a global level so that all the notebooks can consume...

Latest Reply

Aviral-Bhardwaj
Esteemed Contributor III

12-10-2022 6:28:11 AM

0 kudos

Actually, it is very hard, but as an alternative you would have to change your code and use the widgets feature of Databricks. This may not be the right option, but you can still explore this doc for testing purposes: https://docs.databric...

by Netty • New Contributor III

12-09-2022 12:07:06 PM

2126 Views
1 replies
2 kudos

What's the crontab notation for every other week for Databricks Workflow scheduling?

Hello, I need to schedule some of my jobs within Databricks Workflow every other week (or every 4 weeks). I've scoured a few forums to find what this notation would be, but my searches have been fruitless. Is this scheduling possible in crontab? I...

Latest Reply

Hubert-Dudek
Esteemed Contributor III

12-10-2022 5:05:36 AM

2 kudos

For every seven days starting from Monday, you need to use 2/7. From my experience, this generator works best with Databricks: https://www.freeformatter.com/cron-expression-generator-quartz.html
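A seven-day step still fires every week, and as far as I know Quartz cron has no field for week parity. One possible workaround (an assumption on my part, not something stated in the reply) is to schedule the job weekly and let the notebook itself skip the off weeks. A minimal sketch; the function name `should_run` and the anchor date are hypothetical:

```python
from datetime import date


def should_run(today, anchor=date(2022, 12, 5), every_n_weeks=2):
    """Return True when `today` falls in an "on" week.

    `anchor` is any date in a week when the job should run (hypothetical
    value here: Monday 2022-12-05). Weeks are counted from that anchor.
    """
    weeks_elapsed = (today - anchor).days // 7
    return weeks_elapsed % every_n_weeks == 0


# In a notebook scheduled weekly, the first cell could be a guard such as:
#   if not should_run(date.today()):
#       dbutils.notebook.exit("Skipped: off week")
```

Setting `every_n_weeks=4` would cover the "every 4 weeks" case from the question.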

by Heman2 • Valued Contributor II

12-06-2022 3:51:15 AM

1204 Views
6 replies
19 kudos

Can anyone let me know if there is any way to access Delta tables from a different workspace in the workspace where we run the pipelines using Python...

Can anyone let me know if there is any way to access Delta tables from a different workspace in the workspace where we run the pipelines using Python?

Latest Reply

Harish2122
Contributor

12-10-2022 2:07:22 AM

19 kudos

@Hemanth A, go to the workspace you want data from. In the warehouse tab you will find the connection details; copy the host name and HTTP path, and generate a token for it. With these credentials you can access the data of that workspace from any other workspace.
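A rough sketch of how those three values might be used from Python, assuming the `databricks-sql-connector` package is installed. The environment variable names and the helper `read_remote_table` are illustrative choices, not part of the reply:

```python
import os


def read_remote_table(table_name, limit=10):
    """Fetch a few rows from a table in another workspace's SQL warehouse.

    Assumes the host name, HTTP path and token copied from the source
    workspace are exported as environment variables (names are assumptions).
    """
    # Imported lazily so this module loads even without the connector installed.
    from databricks import sql  # pip install databricks-sql-connector

    with sql.connect(
        server_hostname=os.environ["DATABRICKS_SERVER_HOSTNAME"],
        http_path=os.environ["DATABRICKS_HTTP_PATH"],
        access_token=os.environ["DATABRICKS_TOKEN"],
    ) as connection:
        with connection.cursor() as cursor:
            # table_name is interpolated directly, so pass only trusted names.
            cursor.execute(f"SELECT * FROM {table_name} LIMIT {int(limit)}")
            return cursor.fetchall()
```

Keeping the token in an environment variable or a secret scope rather than in the notebook source is usually the safer choice.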

5 More Replies

by dulu • New Contributor III

12-03-2022 1:01:27 AM

1617 Views
3 replies
14 kudos

Resolved! How to count the number of campaigns per day based on the start and end dates of the campaigns in Spark SQL on Databricks

I need to count the number of campaigns per day based on the start and end dates of the campaigns. Input table: [table not shown] Output needed (result): [table not shown] How do I need to write the SQL command in Databricks to get the above result? Thanks all.

Latest Reply

Hubert-Dudek
Esteemed Contributor III

12-03-2022 12:26:18 PM

14 kudos

Just create an array with sequence, explode it, and then group and count:

WITH cte AS (
  SELECT `campaign name`, explode(sequence(`Start date`, `End date`, interval 1 day)) as `Date`
  FROM `campaigns`
)
SELECT Count(`campaign name`) as `count uni...
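The same expand-then-count idea can be checked on a small sample in plain Python. This is only an illustration of the logic, not the reply's SQL; the function name and the sample rows are made up:

```python
from collections import Counter
from datetime import date, timedelta


def campaigns_per_day(campaigns):
    """Count active campaigns per day.

    `campaigns` is an iterable of (name, start_date, end_date) tuples with
    inclusive end dates, mirroring sequence(start, end, interval 1 day).
    """
    counts = Counter()
    for _name, start, end in campaigns:
        day = start
        while day <= end:  # "explode" the date range into single days
            counts[day] += 1
            day += timedelta(days=1)
    return dict(sorted(counts.items()))


# Hypothetical sample data
sample = [
    ("A", date(2022, 12, 1), date(2022, 12, 3)),
    ("B", date(2022, 12, 2), date(2022, 12, 4)),
]
result = campaigns_per_day(sample)
```

Here 1 December and 4 December each have one active campaign, while 2 and 3 December have two.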

2 More Replies

by 183530 • New Contributor III

12-09-2022 2:05:18 PM

217 Views
0 replies
1 kudos

Need a regex to match (CC)

SELECT '(CC) ABC' REGEXP '\\b\\(CC\\)\\b' AS TEST1,
       'A(CC) ABC' REGEXP '\\b\\(CC\\)\\b' AS TEST2,
       'A (CC)A ABC' REGEXP '\\b\\(CC\\)\\b' AS TEST3,
       'A (CC) A ABC' REGEXP '\\b\\(CC\\)\\b' AS TEST4,
       'A ABC (CC)' REGEXP '\\b\\(CC\\)\\b' AS TES...
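A note on why this pattern tends not to match: `\b` only matches between a word character and a non-word character, and parentheses are non-word characters, so `\b\(` needs a letter or digit directly before the opening parenthesis and `\)\b` needs one directly after the closing one. The sketch below uses Python's `re` module (Spark uses Java regex, which behaves similarly here as far as I know) and assumes the goal is to match `(CC)` only as a standalone, whitespace-delimited token. That goal is a guess, since the post is truncated:

```python
import re

# The five input strings from the query above
SAMPLES = ["(CC) ABC", "A(CC) ABC", "A (CC)A ABC", "A (CC) A ABC", "A ABC (CC)"]

# Pattern from the post
WORD_BOUNDARY = re.compile(r"\b\(CC\)\b")

# Alternative (assumption): no non-space character directly before or after
STANDALONE = re.compile(r"(?<!\S)\(CC\)(?!\S)")


def matches(pattern, samples=SAMPLES):
    """Return one boolean per sample: does the pattern match anywhere?"""
    return [bool(pattern.search(s)) for s in samples]


print(matches(WORD_BOUNDARY))  # [False, False, False, False, False]
print(matches(STANDALONE))     # [True, False, False, True, True]
```

In a Spark SQL string literal the backslashes would need doubling, as in the original query.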

by seberino • New Contributor III

12-09-2022 1:26:06 PM

645 Views
0 replies
1 kudos

How do I revoke SELECT permissions on a table in Data Explorer when it only lets me revoke new explicit grants I've added myself?

I'm able to make it to the Permission page of the schema and table I'm trying to do access control on within the Data Explorer page. At first you can only grant permissions but not revoke anything. Only after you have made new grants can you revoke w...

by andrew0117 • Contributor

12-08-2022 7:33:23 AM

524 Views
1 replies
2 kudos

How to sync the metastore info with the real data for an external Delta table

If I manually delete some Parquet files in the location where the real data is stored, the Spark catalog still has the old version. How can I sync them? Thanks!

Latest Reply

youssefmrini
Honored Contributor III

12-09-2022 7:56:34 AM

2 kudos

You just need to create a new table and specify the location of the data; in your case it's going to be ADLS, S3... Example: CREATE TABLE customer USING DELTA LOCATION 'mnt/data./'

by KellenO • New Contributor II

12-08-2022 12:30:22 PM

1179 Views
2 replies
8 kudos

Resolved! How can I use cluster autoscaling with intensive subprocess calls?

I have a custom application/executable that I upload to DBFS and transfer to my cluster's local storage for execution. I want to call multiple instances of this application in parallel, which I've only been able to successfully do with Python's subpr...

Latest Reply

Anonymous
Not applicable

12-08-2022 4:18:17 PM

8 kudos

Autoscaling works for Spark jobs only. It works by monitoring the job queue, which Python code won't go into. If it's just Python code, try single node. https://docs.databricks.com/clusters/configure.html#cluster-size-and-autoscaling
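On a single node, one way to run several instances of an executable side by side is a thread pool around `subprocess.run`. This is only a sketch: the command below is a placeholder that calls the Python interpreter, standing in for the custom application, and `run_instance` and `run_all` are made-up names:

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_instance(arg):
    """Run one instance of the (placeholder) executable and return its stdout."""
    # Placeholder command; replace with the path to the real executable.
    cmd = [sys.executable, "-c", f"print({arg!r} * 2)"]
    completed = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return completed.stdout.strip()


def run_all(args, max_workers=4):
    """Run instances in parallel; results come back in input order."""
    # Threads suffice because each one only waits on an external process.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_instance, args))


if __name__ == "__main__":
    print(run_all([1, 2, 3]))  # ['2', '4', '6']
```

All of these processes run on the driver, so `max_workers` would need to be sized to that machine's cores rather than to the cluster.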

1 More Replies

by Taha_Hussain • Valued Contributor II

08-10-2022 3:54:24 PM

4554 Views
5 replies
5 kudos

Connect a BI Tool: How do I access my lakehouse data from my BI tool?

You can find a rich ecosystem of tools that allow you to work with all your data in-place and deliver real-time business insights faster. This post will help you connect your existing tools like dbt, Fivetran, PowerBI, Tableau or SAP to ingest, transf...

Latest Reply

Axserv
New Contributor II

12-09-2022 5:34:49 AM

5 kudos

Hello Taha, here is a fairly recent video provided by Databricks on connecting Power BI: Demo Video: Connect to Power BI Desktop from Databricks - YouTube

4 More Replies

by ranged_coop • Valued Contributor II

12-09-2022 4:18:29 AM

758 Views
2 replies
3 kudos

Equivalent Machine Types between Databricks on Azure and GCP

Hi All, Hope everyone is doing well. We are currently validating Databricks on GCP and Azure. We have a Python notebook that does some ETL (copy, extract zip files and process files within the zip files). Our cluster config on Azure: DBX Runtime - 10.4 - Dr...

Latest Reply

ranged_coop
Valued Contributor II

12-09-2022 5:26:04 AM

3 kudos

Hi @Tunde Abib, I have gone through the links while updating, but did not see any major documented slowdowns mentioned in them.

1 More Replies

by Sujitha • Community Manager

12-09-2022 12:20:05 AM

872 Views
6 replies
5 kudos

KB Feedback Discussion In addition to the Databricks Community, we have a Support team that maintains a Knowledge Base (KB). The KB contains answers ...

KB Feedback Discussion: In addition to the Databricks Community, we have a Support team that maintains a Knowledge Base (KB). The KB contains answers to common questions about Databricks, as well as information on optimisation and troubleshooting. Thes...

Latest Reply

Ajay-Pandey
Esteemed Contributor III

12-09-2022 1:03:07 AM

5 kudos

Thanks for sharing @Sujitha Ramamoorthy

5 More Replies

by Netty • New Contributor III

12-08-2022 9:16:44 AM

2386 Views
5 replies
7 kudos

Resolved! What's the easiest way to upsert data into a table? (Azure ADLS Gen2)

I had been trying to upsert rows into a table in Azure Blob Storage (ADLS Gen 2) based on two partitions (sample code below).

insert overwrite table new_clicks_table partition(client_id, mm_date)
select click_id
  ,user_id
  ,click_timestamp_gmt
  ,ca...

Latest Reply

Ajay-Pandey
Esteemed Contributor III

12-08-2022 11:37:54 PM

7 kudos

Below code might help you.

Python:
(df.write
  .mode("overwrite")
  .option("partitionOverwriteMode", "dynamic")
  .saveAsTable("default.people10m")
)

SQL:
SET spark.sql.sources.partitionOverwriteMode=dynamic;
INSERT OVERWRITE TABLE default.people10m...
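For readers unsure what the dynamic mode changes: only the partitions that receive rows from the incoming data are replaced, and every other partition is left as it was. A toy model of that behaviour in plain Python, with made-up partition keys and rows (this is not Spark code):

```python
def dynamic_partition_overwrite(table, incoming):
    """Model dynamic partition overwrite on dict-based "tables".

    Both arguments map a partition key, e.g. (client_id, mm_date), to a list
    of rows. Partitions present in `incoming` are replaced wholesale;
    partitions absent from `incoming` are kept unchanged.
    """
    result = {key: list(rows) for key, rows in table.items()}
    for key, rows in incoming.items():
        result[key] = list(rows)  # the whole partition is swapped out
    return result


# Hypothetical data keyed by (client_id, mm_date)
existing = {
    (1, "2022-12-01"): ["click_a", "click_b"],
    (1, "2022-12-02"): ["click_c"],
}
new_rows = {
    (1, "2022-12-02"): ["click_d"],
    (2, "2022-12-02"): ["click_e"],
}
merged = dynamic_partition_overwrite(existing, new_rows)
```

Note that this replaces whole partitions, so `click_c` is gone afterwards. If rows inside a partition have to be matched and updated individually, Delta Lake's `MERGE INTO` may be the closer fit.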

4 More Replies
