Data Engineering

Forum Posts

by george2020 • New Contributor II

03-18-2022 10:22:57 AM

1687 Views
0 replies
2 kudos

Using the Databricks Repos API to bring Repo in top-level production folder to latest version

I am having an issue with a GitHub Actions workflow that uses the Databricks Repos API. We want the API call in the GitHub Action to bring the Repo in our Databricks Repos top-level folder to the latest version on a merge into the main branch. The Github Actio...
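For readers landing on this thread, here is a minimal sketch of the kind of call the question describes: a PATCH against the Repos API that moves a workspace Repo to the head of a branch. The host, token, and repo ID are placeholders, and the function name `build_repo_update_request` is made up for illustration. It is not taken from the poster's workflow.

```python
import json
import urllib.request


def build_repo_update_request(host, token, repo_id, branch="main"):
    """Build (but do not send) a PATCH request that updates a Repo to the head of `branch`."""
    url = f"{host.rstrip('/')}/api/2.0/repos/{repo_id}"
    body = json.dumps({"branch": branch}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    # Placeholder values; in a CI job these would come from repository secrets.
    req = build_repo_update_request("https://example.cloud.databricks.com", "dapi-REDACTED", 123)
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req)
```

Building the request separately from sending it makes it easy to check the URL and payload in a dry run before wiring it into the workflow.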

by RicksDB • Contributor III

02-02-2022 11:00:45 AM

7102 Views
3 replies
6 kudos

Resolved! Restricting file upload to DBFS

Hi, Is it possible to restrict file uploads to the DBFS root (since everyone has access)? The idea is to force users to use an ADLS2 mount with credential passthrough for security reasons. Also, right now users use azure blob explorer to interact with ADLS2...

Latest Reply

User16764241763
Databricks Employee

03-18-2022 8:49:27 AM

6 kudos

Hello @E H, you can disable the DBFS file browser in the workspace if users upload directly from there. This will prevent uploads to DBFS. https://docs.databricks.com/administration-guide/workspace/dbfs-browser.html Please let us know if this solution wo...

by wyzer • Contributor II

02-17-2022 5:59:29 AM

5486 Views
2 replies
3 kudos

Resolved! Insert data into an on-premise SQL Server

Hello, Is it possible to insert data from Databricks into an on-premises SQL Server? Thanks.

Latest Reply

wyzer
Contributor II

03-18-2022 8:19:18 AM

3 kudos

Hello, Yes, we found out how to do it by installing a JDBC connector. It works fine. Thanks.
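The reply does not show the code, so the following is only a sketch of one common way to do this with Spark's built-in JDBC writer. It assumes the Microsoft SQL Server JDBC driver is available on the cluster and that the cluster can reach the on-premises host over the network. The helper names and all connection values are hypothetical.

```python
def sqlserver_jdbc_options(host, database, table, user, password, port=1433):
    """Assemble the options Spark's JDBC data source expects for SQL Server."""
    return {
        "url": f"jdbc:sqlserver://{host}:{port};databaseName={database}",
        "dbtable": table,
        "user": user,
        "password": password,
        "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    }


def append_to_sqlserver(df, options):
    """Append the rows of a Spark DataFrame to the target table over JDBC."""
    df.write.format("jdbc").options(**options).mode("append").save()


if __name__ == "__main__":
    # Placeholder connection details; prefer a secret scope over literal credentials.
    opts = sqlserver_jdbc_options("sql01.corp.local", "sales", "dbo.orders", "svc_user", "REDACTED")
    print(opts["url"])
    # On a cluster: append_to_sqlserver(some_dataframe, opts)
```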

by Soma • Valued Contributor

03-18-2022 3:39:16 AM

5032 Views
3 replies
5 kudos

Resolved! Enable custom Ipython Extension

How can I enable a custom IPython extension when a Databricks notebook starts?

Latest Reply

Soma
Valued Contributor

03-18-2022 5:24:01 AM

5 kudos

I want to load custom extensions that I create, such as custom callback events on cell run: https://ipython.readthedocs.io/en/stable/config/callbacks.html
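As a rough illustration of what such an extension looks like in plain IPython, here is a minimal sketch that registers callbacks for the `pre_run_cell` and `post_run_cell` events described on the linked page. The class name `CellRunLogger` is made up, and whether a given Databricks runtime exposes IPython's event system at notebook start is not confirmed anywhere in this thread.

```python
class CellRunLogger:
    """Counts cells as they start and prints a line when each one finishes."""

    def __init__(self):
        self.cells_started = 0

    def pre_run_cell(self, info):
        # Called by IPython just before a cell executes.
        self.cells_started += 1

    def post_run_cell(self, result):
        # Called by IPython after the cell has executed.
        print(f"cell #{self.cells_started} finished")


def load_ipython_extension(ipython):
    """Entry point IPython looks for when the module is loaded with %load_ext."""
    logger = CellRunLogger()
    ipython.events.register("pre_run_cell", logger.pre_run_cell)
    ipython.events.register("post_run_cell", logger.post_run_cell)
    return logger
```

In a regular IPython session the same hooks can be attached by hand with `load_ipython_extension(get_ipython())`.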

by emanuele_maffeo • Databricks Partner

03-17-2022 7:55:24 AM

6358 Views
5 replies
8 kudos

Resolved! Trigger.AvailableNow on scala - compile issue

Hi everybody, Trigger.AvailableNow was released in the Databricks 10.1 runtime and we would like to use this new feature with Auto Loader. We write all our data pipelines in Scala and our projects import Spark as a provided dependency. If we try to sw...

Latest Reply

Anonymous
Not applicable

03-17-2022 11:53:21 AM

8 kudos

You can switch to Python. Depending on what you're doing, and as long as you aren't relying on Python UDFs, there shouldn't be any difference at all in terms of performance.

by alonisser • Contributor II

03-14-2022 2:42:21 PM

4027 Views
3 replies
4 kudos

Resolved! How to migrate an existing workspace for an external metastore

Currently we're on an Azure Databricks workspace that we set up during the POC, a long time ago. In the meantime we have built quite a production workload on top of Databricks. Now we want to split workspaces - one for analysts and one for data engineeri...

Latest Reply

Hubert-Dudek
Databricks MVP

03-14-2022 3:30:36 PM

4 kudos

From a Databricks notebook, just run mysqldump. You can take the server address and details from logs or configuration. I am also including a link to an example notebook: https://docs.microsoft.com/en-us/azure/databricks/kb/_static/notebooks/2016-election-tweets.h...

by USHAK • New Contributor II

03-14-2022 6:07:46 PM

1562 Views
1 replies
0 kudos

Hi, I am trying to schedule - Exam: Databricks Certified Associate Developer for Apache Spark 3.0 - Python. In the cart --> I couldn't proceed ...

Hi, I am trying to schedule the exam Databricks Certified Associate Developer for Apache Spark 3.0 - Python. In the cart, I couldn't proceed without entering a voucher. I do not have a voucher. Please help.

Latest Reply

USHAK
New Contributor II

03-17-2022 8:04:38 AM

0 kudos

Can someone please respond to my question above? Can I take the certification test without a voucher?

by Jeff1 • Contributor II

03-07-2022 11:53:52 AM

15956 Views
3 replies
4 kudos

Resolved! How to convert lat/long to geohash in databricks using geohashTools R library

I continue to receive a parsing error when attempting to convert lat/long data to a geohash in Databricks. I've tried two coding methods in R and get the same error.
library(geohashTools)
Method #1
my_tbl$geo_hash <- gh_encode(my_tbl$Latitude, my_tbl...

Latest Reply

Jeff1
Contributor II

03-17-2022 7:05:31 AM

4 kudos

The problem was that I was trying to run the gh_encode function on a Spark dataframe. I needed to collect the data into an R dataframe and then run the function.

by manasa • Databricks Partner

03-15-2022 8:33:42 AM

21413 Views
3 replies
7 kudos

Resolved! How to set retention period for a delta table lower than the default period? Is it even possible?

I am trying to set the retention period for a Delta table using the following commands.
deltaTable = DeltaTable.forPath(spark,delta_path)
spark.conf.set("spark.databricks.delta.retentionDurationCheck.enabled", "false")
deltaTable.logRetentionDuration = "interval 1...

Latest Reply

Hubert-Dudek
Databricks MVP

03-15-2022 8:56:16 AM

7 kudos

There are two ways:
1) Set this in the cluster configuration (Clusters -> edit -> Spark -> Spark config):
spark.databricks.delta.retentionDurationCheck.enabled false
2) Or set it just before DeltaTable.forPath (I think you need to change the order in your code):
spark.conf.se...
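A sketch of the second option, with the configuration set before the table handle is created. The reply is cut off, so the rest is an assumption on my part: I am guessing that the goal is to vacuum with a retention shorter than the default, and the function name `vacuum_with_short_retention` is hypothetical. Shortening retention can break time travel and concurrent readers, so treat this as an illustration only.

```python
def vacuum_with_short_retention(spark, delta_path, retain_hours):
    """Disable the retention safety check, then vacuum the Delta table at `delta_path`."""
    # Imported lazily so the function can be defined where delta-spark is not installed.
    from delta.tables import DeltaTable

    # Set the flag first, as the reply suggests, and only then load the table.
    spark.conf.set("spark.databricks.delta.retentionDurationCheck.enabled", "false")
    delta_table = DeltaTable.forPath(spark, delta_path)

    # Assumption: the intended operation is a vacuum with a short retention window.
    delta_table.vacuum(retain_hours)
    return delta_table
```

In a notebook this would be called as `vacuum_with_short_retention(spark, delta_path, 24)`, with `spark` being the session the notebook already provides.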

by AmanSehgal • Honored Contributor III

02-10-2022 12:30:25 AM

6688 Views
5 replies
12 kudos

Resolved! Query delta tables using databricks cluster in near real time.

I'm trying to query Delta tables using a JDBC connector in a Ruby app. I've noticed that it takes around 8 seconds just to connect to the Databricks cluster and then additional time to run the query. The app is connected to a web portal where users genera...

Latest Reply

User16763506477
Databricks Employee

03-15-2022 3:14:48 AM

12 kudos

Hi @Aman Sehgal, could you please check SQL endpoints? A SQL endpoint uses the Photon engine, which can reduce query processing time, and a Serverless SQL endpoint can shorten the launch time. More info: https://docs.databricks.com/sql/admin/sql-endpoin...

by zayeem • New Contributor

03-16-2022 10:51:15 AM

4013 Views
1 replies
3 kudos

Resolved! Databricks - Jobs Last run date

Is there a way to get the last run date of a job (or jobs)? I am trying to compile a report and want to see whether this information is available either in the Databricks jobs CLI output or via the API.

Latest Reply

AmanSehgal
Honored Contributor III

03-16-2022 3:03:04 PM

3 kudos

Sure. Using the Databricks Jobs API you can get this information. Use the following API endpoint to get a list of all the jobs and their executions to date in descending order. You can pass job_id as a parameter to get the runs of a specific job. https://<databri...
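The URL in the reply is cut off, so the sketch below makes an assumption: it uses the `runs/list` endpoint of the public Jobs API (`/api/2.1/jobs/runs/list`), which may or may not be the exact path the reply pointed to. The host, token, and helper names are placeholders. It relies on each run in the response carrying a `start_time` in epoch milliseconds.

```python
import json
import urllib.parse
import urllib.request
from datetime import datetime, timezone


def runs_list_url(host, job_id=None, limit=1):
    """Build the runs/list URL, optionally restricted to a single job."""
    params = {"limit": limit}
    if job_id is not None:
        params["job_id"] = job_id
    return f"{host.rstrip('/')}/api/2.1/jobs/runs/list?{urllib.parse.urlencode(params)}"


def last_run_start(response):
    """Return the start time (UTC) of the most recent run in a runs/list response, or None."""
    runs = response.get("runs", [])
    if not runs:
        return None
    latest = max(runs, key=lambda run: run["start_time"])
    # start_time is reported in milliseconds since the Unix epoch.
    return datetime.fromtimestamp(latest["start_time"] / 1000, tz=timezone.utc)


def fetch_last_run_start(host, token, job_id):
    """Call the API and return the last run's start time for one job."""
    req = urllib.request.Request(
        runs_list_url(host, job_id),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return last_run_start(json.load(resp))
```

For a report over many jobs, `fetch_last_run_start` could be called once per job ID returned by the jobs list endpoint.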

by Anonymous • Not applicable

03-16-2022 11:03:55 AM

1319 Views
0 replies
3 kudos

March Madness + Data. Here at Databricks we like to use (you guessed it) data in our daily lives. Today kicks off a series called Databrags 🎉 ...

March Madness + Data. Here at Databricks we like to use (you guessed it) data in our daily lives. Today kicks off a series called Databrags. Databrags are glimpses into how Bricksters and community folks like you use data to solve everyday problems, e...

by Abel_Martinez • Contributor

03-11-2022 3:06:29 AM

2975 Views
1 replies
1 kudos

Resolved! Create data bricks service account

Hi all, I need to create service account users who can only query some Delta tables. I guess I do that by creating the user and granting SELECT rights on the desired tables. But Databricks requests a mail account for these users. Is there a way to cr...

Latest Reply

Abel_Martinez
Contributor

03-16-2022 8:16:00 AM

1 kudos

Hi @Kaniz Fatma, I've checked the link, but the standard method requires a mailbox and creating the user through the SCIM API looks too complicated. I solved the issue: I created a mailbox for the service account and created the user using that mailbox....

by lecardozo • New Contributor II

02-17-2022 7:22:57 AM

7622 Views
5 replies
1 kudos

Resolved! Problems with HiveMetastoreClient and internal Databricks Metastore.

I've been trying to use the HiveMetastoreClient class in Scala to extract some metadata from the Databricks internal Metastore, without success. I'm currently using the 7.3 LTS runtime. The error seems to be related to some kind of inconsistency between...

Latest Reply

lecardozo
New Contributor II

03-04-2022 9:28:58 AM

1 kudos

Thanks for the reference, @Atanu Sarkar. It seems a little odd to me that I'd need to change the internal Databricks Metastore table to add a column expected by the default Scala client. I'm afraid this could cause issues with other users/jobs ...

by irfanaziz • Contributor II

02-08-2022 6:51:27 AM

9898 Views
4 replies
0 kudos

Resolved! If two Data Factory pipelines run at the same time or share a window of execution, do they share the Databricks Spark cluster (if both have the same linked service)? (Job clusters are those that are created on the go, defined in the linked service.)

Continuing the above case, does that mean that if I have several, say 5, ADF pipelines scheduled regularly at the same time, it's better to use an existing cluster, since all of the ADF pipelines would share the same cluster and hence the cost would be lower?

Latest Reply

Atanu
Databricks Employee

03-15-2022 10:03:59 PM

0 kudos

For ADF or job runs we always prefer a job cluster, but for streaming you may consider using an interactive cluster. Either way, you need to monitor the cluster load: if the load is high, jobs may slow down or even fail. also data siz...

Databricks Community
