I am working on the Data Engineering with Databricks v3 course. In notebook DE 4.1 - DLT UI Walkthrough, I encountered an error in cmd 11: DA.validate_pipeline_config(pipeline_language). The error message is: AssertionError: Expected the parameter "suite" to...
The DA validate function just checks that you named the pipeline correctly, set the number of workers to 0, and matched the other configurations. The name and directory aren't crucial to the learning process. The goal is to get familiar with the ...
Hi community, we need to remove characters larger than 4 bytes using PySpark in Databricks, since these are not supported by Amazon Redshift. Does anyone know how I can accomplish this? Thank you very much in advance. Regards
Assuming you have a string-type column in a PySpark dataframe, one possible approach could be: identify the total number of characters for each value in the column (say a), identify the number of bytes taken by each character (say b), then use the substring() function to select the first...
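For reference, a minimal sketch of a different route using regexp_replace rather than the substring approach above. Since UTF-8 never exceeds 4 bytes per character, this strips the 4-byte (supplementary-plane) code points that commonly trip up Redshift; the column name and sample data are hypothetical:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical sample data; the emoji takes 4 bytes in UTF-8.
df = spark.createDataFrame([("hello \U0001F600 world",)], ["txt"])

# Spark's regexp_replace uses Java regex, which accepts \x{...} escapes;
# this drops every code point outside the Basic Multilingual Plane.
cleaned = df.withColumn(
    "txt_clean",
    F.regexp_replace("txt", r"[\x{10000}-\x{10FFFF}]", "")
)
cleaned.show(truncate=False)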
I have a few notebooks in workspaces that I created before linking a repo to my Git. I have tried importing them from the repo (Databricks Repos). The only two options are a local file from my PC or a URL, and the URL for a notebook does not work. Do I need...
Hi @Stian Arntsen, when you click on the down arrow beside your notebook name (in your workspace), you will have an option called 'Clone'. You can use it to clone your notebook from your workspace into Repos. Hope it helps!
We are processing the JSON files from the storage location every day, and they get archived once the records are appended into the respective tables. source_location_path: "..../mon=05/day=01/fld1", "..../mon=05/day=01/fld2" ..... "..../mon=05/d...
@Hare Krishnan the issues highlighted can easily be handled by using .option("mergeSchema", "true") at the time of reading all the files. Sample code: spark.read.option("mergeSchema", "true").json(<file paths>, multiLine=True). The only scenario this w...
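A runnable shape of that snippet, with hypothetical day-partition paths standing in for the asker's archived folders:

# Hypothetical paths; spark.read.json accepts a list of locations.
paths = [
    "dbfs:/source/mon=05/day=01/",
    "dbfs:/source/mon=05/day=02/",
]
df = (spark.read
      .option("mergeSchema", "true")
      .json(paths, multiLine=True))
df.printSchema()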
I'm using Auto Loader in a SQL notebook and I would like to configure file notification mode, but I don't know how to retrieve the client secret of the service principal from Azure Key Vault. Is there any example notebook somewhere? The notebook is p...
Hi @Magnus Johannesson, you must use the Secrets utility (dbutils.secrets) in a notebook or job to read a secret. https://learn.microsoft.com/en-us/azure/databricks/dev-tools/databricks-utils#dbutils-secrets Hope it helps!
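A minimal sketch in a Python cell (dbutils.secrets is not callable from pure SQL); the secret scope, key, and all the Azure identifiers are placeholders for your own values:

# Read the service principal's client secret from a Key Vault-backed scope.
client_secret = dbutils.secrets.get(scope="<kv-backed-scope>", key="<sp-client-secret>")

# Auto Loader in file notification mode on Azure, using the documented
# cloudFiles options for service principal authentication.
df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "json")
      .option("cloudFiles.useNotifications", "true")
      .option("cloudFiles.clientId", "<application-id>")
      .option("cloudFiles.clientSecret", client_secret)
      .option("cloudFiles.tenantId", "<tenant-id>")
      .option("cloudFiles.subscriptionId", "<subscription-id>")
      .option("cloudFiles.resourceGroup", "<resource-group>")
      .load("abfss://<container>@<account>.dfs.core.windows.net/<path>"))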
We have a project requirement to keep only 14 days of history for Delta tables. For testing, I have set delta.logRetentionDuration = 2 days using the command below: spark.sql("alter table delta.`[delta_file_path]` set TBLPROPER...
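For comparison, a sketch of what the eventual 14-day setting might look like, with a placeholder path; delta.deletedFileRetentionDuration (which governs how far back time travel works after VACUUM) is often set alongside it:

spark.sql("""
  ALTER TABLE delta.`/mnt/path/to/table` SET TBLPROPERTIES (
    'delta.logRetentionDuration' = 'interval 14 days',
    'delta.deletedFileRetentionDuration' = 'interval 14 days'
  )
""")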
Hi @Priyanka Mane, we haven't heard from you since the last response from @Werner Stinckens and @Uma Maheswara Rao Desula, and I was checking back to see if their suggestions helped you. Otherwise, if you have found a solution, please share it with the c...
Hi @andrew li, we haven't heard from you since the last response from @Uma Maheswara Rao Desula, and I was checking back to see if his suggestions helped you. Otherwise, if you have found a solution, please share it with the community, as it can be helpf...
I'm getting this message with the following code:

from databricks import feature_store

fs = feature_store.FeatureStoreClient()
fs.create_table(
    name='feature_store.user_login',
    primary_keys=['user_id'],
    df=df_x,
    description='user l...
Yes, it's worth doing. You can report it here: https://community.databricks.com/s/topic/0TO3f000000CnKrGAK/bug-report and, if it's more urgent or blocking for you, you can also open a ticket with the help center: https://docs.databricks.com/resou...
I have a DMS task that processes the full-load and ongoing replication tasks from the source (MSSQL) to the target (AWS S3), then uses Delta Lake to handle the CDC logs. I have a notebook that inserts data into MSSQL continuously (with id as primary key), then d...
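One common pattern for the Delta side of such a pipeline is a MERGE keyed on the primary key; a minimal sketch assuming the DMS CDC files carry the usual Op column (I/U/D) and with placeholder S3 paths:

from delta.tables import DeltaTable

target = DeltaTable.forPath(spark, "s3://<bucket>/delta/target")
cdc = spark.read.parquet("s3://<bucket>/dms/cdc/")

# Apply deletes, updates, and inserts from the CDC batch in one pass.
(target.alias("t")
 .merge(cdc.alias("s"), "t.id = s.id")
 .whenMatchedDelete(condition="s.Op = 'D'")
 .whenMatchedUpdateAll(condition="s.Op = 'U'")
 .whenNotMatchedInsertAll(condition="s.Op != 'D'")
 .execute())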
Calling a Databricks notebook using the REST API, I can confirm that it is executing the notebook, but it is not accepting my parameters or returning a notebook output. Any ideas on what I am doing wrong here? My code and notebook function are below, tryin...
Resolved this by using dbutils within the notebook being called from the API.

# databricks notebook function
data = dbutils.widgets.get('data')  # pulls base_parameters from the API call

def add_test(i):
    result = i + ' COMPLETE'
    return result
...
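For the calling side, a sketch assuming the Jobs API 2.0 one-time submit endpoint; host, token, notebook path, and cluster id are placeholders. Parameters go in base_parameters (read in the notebook via dbutils.widgets.get, as above), and whatever the notebook passes to dbutils.notebook.exit() comes back in notebook_output.result:

import requests

host = "https://<workspace>.cloud.databricks.com"
headers = {"Authorization": "Bearer <token>"}

# Submit a one-time notebook run with parameters.
run = requests.post(
    f"{host}/api/2.0/jobs/runs/submit",
    headers=headers,
    json={
        "run_name": "api-test",
        "existing_cluster_id": "<cluster-id>",
        "notebook_task": {
            "notebook_path": "/Users/<me>/add_test",
            "base_parameters": {"data": "hello"},
        },
    },
).json()

# After the run reaches a terminal state (polling omitted for brevity),
# fetch the value the notebook returned via dbutils.notebook.exit().
out = requests.get(
    f"{host}/api/2.0/jobs/runs/get-output",
    headers=headers,
    params={"run_id": run["run_id"]},
).json()
print(out.get("notebook_output", {}).get("result"))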
I wanted to query a MySQL table using Databricks rather than reading the complete data using the dbtable option, which will help with incremental loads. remote_table = (spark.read .format("jdbc") .option("driver", driver) .option("url", URL) .option("quer...
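A runnable shape of that snippet, assuming the connection variables are defined elsewhere; Spark's JDBC source supports a query option (mutually exclusive with dbtable), which is handy for pushing an incremental filter down to MySQL:

# The table name and filter are hypothetical; only rows matching the
# query are fetched, instead of the whole table.
remote_table = (spark.read
    .format("jdbc")
    .option("driver", driver)
    .option("url", URL)
    .option("query", "SELECT * FROM users WHERE updated_at > '2023-01-01'")
    .option("user", user)
    .option("password", password)
    .load())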
Hi @Kaniz Fatma and @Nadia Elsayed, I took the Databricks Data Engineer Associate exam on Nov 27th. The result mail says I obtained below 70% in the assessment, but as per the section-wise results I scored more than 70%. Can you ...
I am learning how to optimize Spark applications with experiments from the Spark UI Simulator. There is experiment #1596 about data skew, and in command 2 there is a comment about how many partitions will be set as default: // Factor of 8 cores and greater ...
Hi @Bartosz Maciejewski, generally we arrive at the number of shuffle partitions using the following method:
Input data size - 100 GB
Ideal partition target size - 128 MB
Cores - 8
Ideal number of partitions = (100 * 1024) / 128 = 800
To utilize the...
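The same arithmetic as a quick sketch, rounding up to a multiple of the core count so no core idles on the final shuffle stage:

import math

input_mb = 100 * 1024   # 100 GB of input data
target_mb = 128         # ideal partition size
cores = 8

partitions = math.ceil(input_mb / target_mb)        # 800
partitions = math.ceil(partitions / cores) * cores  # round up to a multiple of 8
spark.conf.set("spark.sql.shuffle.partitions", partitions)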