Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

jonathan-dufaul
by Valued Contributor
  • 583 Views
  • 1 reply
  • 0 kudos

Where to report a bug in the SQL formatter?

I was wondering where I go to report a bug in the SQL formatter. I tried sending an email to the helpdesk, but they think I'm asking for support. I'm not. I just want to report a bug in the application because I think they should know about it. I don'...

Latest Reply
Takuya-Omi
Valued Contributor III
  • 0 kudos

Hi @jonathan-dufaul, for such reports, I think it would be appropriate to click your profile icon in the top-right corner of the workspace and use the "Send Feedback" option.

jonathan-dufaul
by Valued Contributor
  • 3173 Views
  • 2 replies
  • 1 kudos

How do I specify column types when writing to an MSSQL server using the JDBC driver (

I have a PySpark dataframe that I'm writing to an on-prem MSSQL server--it's a stopgap while we convert data warehousing jobs over to Databricks. The processes that use those tables in the on-prem server rely on the tables maintaining the identical s...

Latest Reply
dasanro
New Contributor II
  • 1 kudos

It's happening to me too! Did you find any solution, @jonathan-dufaul? Thanks!!
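One option that often covers this is Spark's JDBC `createTableColumnTypes` write option, which overrides the default DDL types Spark emits when it creates the target table. A minimal sketch, assuming Spark is allowed to (re)create the table; the host, database, table, column types, and the `df`/credential variables are placeholders, not details from this thread:

```python
# Minimal sketch: pin the column types Spark uses when creating the MSSQL table.
# Host, database, table name, and column types below are illustrative only.
(df.write
    .format("jdbc")
    .option("url", "jdbc:sqlserver://onprem-host:1433;databaseName=dw")
    .option("dbtable", "dbo.target_table")
    .option("user", jdbc_user)
    .option("password", jdbc_password)
    # Used in Spark's CREATE TABLE DDL instead of the default type mapping
    .option("createTableColumnTypes",
            "id BIGINT, name VARCHAR(100), amount DECIMAL(18,2)")
    .mode("overwrite")
    .save())
```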

1 More Replies
Yaadhudbe
by New Contributor II
  • 681 Views
  • 1 reply
  • 0 kudos

AWS Databricks - Out of Memory issue in Delta Live Tables

I have been using Delta Live Tables for more than a year and have implemented a good number of DLT pipelines ingesting data from an S3 bucket using SQS. One of my pipelines processes a large volume of data. The DLT pipeline reads the data using CloudFiles...

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hi @Yaadhudbe, we would need to review your DLT setup, cluster settings, and Spark processing to better understand the OOM errors and possible suggestions to mitigate the issue. I suggest filing a case with us to conduct a proper investigation. http...

Maatari
by New Contributor III
  • 2450 Views
  • 3 replies
  • 0 kudos

AvailableNow Trigger and failure

Hi, I wonder what the behavior of Spark Structured Streaming is supposed to be when using the AvailableNow trigger and there is a query failure during the run. More specifically, what happens to the initially set end offset? Does it change? Wh...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

The AvailableNow trigger processes all available data as a single batch and then stops. This is different from continuous or micro-batch processing where the system continuously checks for new data. When a query starts with the AvailableNow trigger, ...
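To make the restart behavior concrete, here is a minimal sketch of an AvailableNow query with a checkpoint (source, target, and checkpoint paths are placeholders); the offsets committed to the checkpoint are what a restarted run resumes from:

```python
# Minimal sketch; all paths are placeholders.
stream = (spark.readStream
    .format("delta")
    .load("/mnt/source/events"))

(stream.writeStream
    .format("delta")
    # Offsets and commits land here; after a failure, a restart
    # resumes from this committed state rather than starting over.
    .option("checkpointLocation", "/mnt/checkpoints/events_availablenow")
    .trigger(availableNow=True)  # drain all currently available data, then stop
    .start("/mnt/target/events"))
```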

2 More Replies
Balram-snaplogi
by New Contributor II
  • 1325 Views
  • 1 reply
  • 1 kudos

How can we customize the access token expiry duration?

Hi, I am using OAuth machine-to-machine (M2M) authentication. I created a service principal and wrote a Java application that allows me to connect to the Databricks warehouse. My question is regarding the code below: String url = "jdbc:databricks://<se...

  • 1325 Views
  • 1 replies
  • 1 kudos
Latest Reply
Walter_C
Databricks Employee
  • 1 kudos

I would say that your token should be manually refreshed as mentioned in the following statement in docs: Databricks tools and SDKs that implement the Databricks client unified authentication standard will automatically generate, refresh, and use Dat...
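For comparison, in Python the Databricks SQL Connector can delegate token handling to a credentials provider, which generates and refreshes the M2M OAuth token automatically. A minimal sketch assuming the databricks-sql-connector and databricks-sdk packages; the hostname, warehouse ID, and service-principal credentials are placeholders:

```python
from databricks import sql
from databricks.sdk.core import Config, oauth_service_principal

SERVER_HOSTNAME = "<server-hostname>"  # placeholder

def credential_provider():
    # Service principal client ID/secret are placeholders
    config = Config(host=f"https://{SERVER_HOSTNAME}",
                    client_id="<sp-client-id>",
                    client_secret="<sp-client-secret>")
    return oauth_service_principal(config)

# The connector obtains and refreshes the OAuth token via the provider,
# so no manual token-refresh logic is needed in application code.
with sql.connect(server_hostname=SERVER_HOSTNAME,
                 http_path="/sql/1.0/warehouses/<sql-warehouse-id>",
                 credentials_provider=credential_provider) as connection:
    with connection.cursor() as cursor:
        cursor.execute("SELECT 1")
        print(cursor.fetchall())
```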

RateVan
by New Contributor II
  • 4585 Views
  • 4 replies
  • 0 kudos

Spark last window doesn't flush in append mode

The problem is very simple: when you use a TUMBLING window with append mode, the window is closed only when the next message arrives (+ watermark logic). In the current implementation, if you stop incoming streaming data, the last window will NEVER...
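A minimal sketch of the setup being described, assuming a Kafka source (broker, topic, and checkpoint path are placeholders); in append mode, the aggregate for a window is emitted only after the watermark passes the window's end, which requires a later event to arrive:

```python
from pyspark.sql.functions import col, count, window

# Placeholder Kafka source
events = (spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "events")
    .load()
    .select(col("timestamp"), col("value").cast("string")))

counts = (events
    .withWatermark("timestamp", "10 minutes")        # watermark logic
    .groupBy(window(col("timestamp"), "5 minutes"))  # tumbling window
    .agg(count("*").alias("n")))

# Append mode: a window's row is emitted only once the watermark moves
# past its end, so with no further input the last window never closes.
(counts.writeStream
    .outputMode("append")
    .format("console")
    .option("checkpointLocation", "/tmp/checkpoints/window_demo")  # placeholder
    .start())
```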

Latest Reply
Dtank
New Contributor II
  • 0 kudos

Do you have any solution for this?

3 More Replies
varunjaincse
by New Contributor III
  • 1524 Views
  • 2 replies
  • 1 kudos

Resolved! Databricks JDBC Driver making "List Column SQL" Query Every Time

I am trying to use the Databricks JDBC Spark Driver to run SQL queries on the SQL Warehouse. Sample connection string: String TOKEN = "<token>"; String HTTP_PATH = "/sql/1.0/warehouses/<sql-warehouse-id>"; final String connStr = "jdbc:spark://discover.clou...

Latest Reply
VZLA
Databricks Employee
  • 1 kudos

Hello, thank you for your question. The initial metadata query (the "Listing Columns" query) is tied to the SparkGetColumnsOperation class, which is part of the Apache Hive ThriftServer and Spark's handling of JDBC metadata operations. Can you please...

1 More Replies
roshanjoebenny
by New Contributor III
  • 1774 Views
  • 7 replies
  • 1 kudos

Unity Catalog

When I try to connect my local Postgres instance to Databricks Unity Catalog, I run into issues. Could you please explain the steps for doing that?

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

Databricks does not have connectivity to your local network out of the box. You should set up a VNet and VNet peering (and also firewall rules). Connect your Azure Databricks workspace to your on-premises network - Azure Databricks | Microsoft Learn. Netw...

6 More Replies
bmhardy
by New Contributor III
  • 3044 Views
  • 4 replies
  • 0 kudos

Hierarchy roll-up aggregation

I have just learnt that recursive CTEs are not supported in Databricks SQL; however, we are looking to shift the complex aggregations into Databricks instead of relying on Azure SQL DB. We are using Azure SQL DB with CDC enabled in combination with A...

Latest Reply
bmhardy
New Contributor III
  • 0 kudos

We have gone down a different route where we are using SQL for our calculated layer and then a Python notebook for our aggregated layer. It is much easier to roll up data using a Python UDF than it is trying to work out how to do it in SQL.
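The thread doesn't include the notebook itself, so as an illustration only, here is one hedged sketch of rolling a parent-child hierarchy up in PySpark with iterative joins instead of a recursive CTE; the table name, columns, and depth limit are all hypothetical:

```python
from pyspark.sql import functions as F

# Hypothetical schema: one row per node with (node_id, parent_id, amount);
# parent_id is NULL at the roots. Table name and depth limit are made up.
nodes = spark.table("calc.hierarchy")

# Each node contributes its own amount to itself...
rollup = nodes.select(F.col("node_id").alias("target_id"), "amount")
# ...and the same amount to each of its ancestors, found level by level.
frontier = nodes.select(F.col("parent_id").alias("target_id"), "amount")

for _ in range(10):  # assumed maximum hierarchy depth
    frontier = frontier.where(F.col("target_id").isNotNull())
    if frontier.isEmpty():
        break
    rollup = rollup.unionByName(frontier)
    # Move every remaining contribution one level up the tree
    frontier = (frontier.alias("f")
        .join(nodes.alias("n"), F.col("f.target_id") == F.col("n.node_id"))
        .select(F.col("n.parent_id").alias("target_id"), F.col("f.amount")))

# Each node's total now includes itself plus all of its descendants
totals = rollup.groupBy("target_id").agg(F.sum("amount").alias("rolled_up_amount"))
```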

3 More Replies
srinivas_001
by New Contributor III
  • 3393 Views
  • 3 replies
  • 1 kudos

Autoloader configuration with data type casting

Hi,
1. I am reading a Parquet file from AWS S3 storage using spark.read.parquet(<s3 path>).
2. An autoloader job has been configured to load this data into an external Delta table.
3. But before loading into this autoloader I need to do some typecasting o...

Latest Reply
cgrant
Databricks Employee
  • 1 kudos

Data can be added to the rescued data column when types do not match and when implicit casting does not work. However, check to see if the typecasting you're trying to do is supported by Delta Lake's type widening feature, which gives more flexibilit...
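As a hedged sketch of combining the two approaches (schema hints to steer inference, explicit casts for the rest); the paths, column names, and target table below are placeholders:

```python
from pyspark.sql import functions as F

# Placeholder paths, columns, and table names throughout.
raw = (spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "parquet")
    .option("cloudFiles.schemaLocation", "/mnt/schemas/orders")
    # Pin the types you care about during schema inference
    .option("cloudFiles.schemaHints", "order_id BIGINT, amount DECIMAL(18,2)")
    .load("s3://my-bucket/orders/"))

# Explicit casts before the write, for anything the hints don't cover
typed = raw.withColumn("event_ts", F.col("event_ts").cast("timestamp"))

(typed.writeStream
    .option("checkpointLocation", "/mnt/checkpoints/orders")
    .trigger(availableNow=True)
    .toTable("bronze.orders"))  # the external Delta table
```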

2 More Replies
oakhill
by New Contributor III
  • 6807 Views
  • 9 replies
  • 1 kudos

Is Delta Live Tables not supported anymore? How do I use it in Python?

Hi! Any time I try to import "dlt" in a notebook session to develop pipelines, I get an error message saying DLT is not supported on Spark Connect clusters. These are very generic clusters; I've tried runtimes 14, 15, and the latest 16, using shared clu...
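For context, `import dlt` is meant to execute inside a DLT pipeline run, not on an interactive cluster (including Spark Connect/shared clusters), which matches the error described. A minimal pipeline-side definition looks roughly like this, with the source path as a placeholder:

```python
import dlt  # resolves when the notebook runs inside a DLT pipeline

@dlt.table(comment="Hypothetical bronze table for illustration")
def bronze_events():
    # Placeholder source; Auto Loader is the common choice in DLT
    return (spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/mnt/raw/events/"))
```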

Latest Reply
Louis_Frolio
Databricks Employee
  • 1 kudos

Oakhill, we do provide free onboarding training. You might be interested in the "Get Started with Data Engineering on Databricks" session. You can register here: https://www.databricks.com/training/catalog. When you are searching the catalog of traini...

8 More Replies
kurokaj
by New Contributor
  • 1646 Views
  • 1 reply
  • 0 kudos

DLT Autoloader stuck reading Avro files from Azure blob storage

I have a DLT pipeline joining data from streaming tables to metadata of Avro files located in Azure blob storage. The Avro files are loaded using autoloader. Up until 25.3 (about 20:00 UTC) the pipeline worked fine, but then suddenly got stuck in ini...

Labels: Data Engineering, autoloader, AVRO, dlt, LTS
Latest Reply
cgrant
Databricks Employee
  • 0 kudos

Based on your screenshot, a Spark job has started and 33/34 tasks have completed. This usually indicates some kind of skewed processing. Please refer to this documentation for help identifying and resolving skew.
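One standard mitigation worth checking alongside that documentation is adaptive query execution's skew-join handling; a minimal sketch (the thresholds shown are the stock Spark defaults, tune them for your data):

```python
# Let AQE detect and split skewed shuffle partitions during joins
spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")
# A partition counts as skewed when it exceeds both this multiple of the
# median partition size and the byte threshold below (Spark defaults shown)
spark.conf.set("spark.sql.adaptive.skewJoin.skewedPartitionFactor", "5")
spark.conf.set("spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes", "256MB")
```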

Nathant93
by New Contributor III
  • 43992 Views
  • 1 reply
  • 1 kudos

Autoloader exclude one directory

Hi, I have a bunch of CSV files in directories within an Azure blob container and I am using autoloader to ingest them into a raw (bronze) table; all CSVs apart from one have the same schema. Is there a way to get autoloader to ignore the directory wi...

Latest Reply
cgrant
Databricks Employee
  • 1 kudos

Auto Loader accepts globs as input, including negative globs. You can use this to exclude a directory as long as the path is known ahead of time.
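A hedged sketch of what that can look like (the container, directory names, and schema path are placeholders): brace alternation lists only the conforming directories, and negated character classes like `[^x]` can exclude by a known prefix:

```python
# Placeholder container and directory names: ingest orders/ and returns/,
# leaving the odd-schema directory out of the glob entirely.
df = (spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "csv")
    .option("cloudFiles.schemaLocation", "/mnt/schemas/bronze_raw")
    .option("header", "true")
    .load("abfss://raw@myaccount.dfs.core.windows.net/{orders,returns}/*.csv"))
```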

Menegat
by New Contributor
  • 1511 Views
  • 1 reply
  • 0 kudos

VACUUM seems to be deleting Autoloader's log files.

Hello everyone, I have a workflow setup that updates a few Delta tables incrementally with autoloader three times a day. Additionally, I run a separate workflow that performs VACUUM and OPTIMIZE on these tables once a week. The issue I'm facing is that...

Latest Reply
cgrant
Databricks Employee
  • 0 kudos

The error message suggests that autoloader's state is being improperly deleted, most likely by a separate process. If your checkpoint exists inside the root of a Delta table, then VACUUM can delete its files. Make sure that you do not store checkp...
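A minimal sketch of the safe layout (paths are placeholders): the checkpoint and schema location live outside the table root, so VACUUM on the table cannot touch them:

```python
TABLE_PATH = "/mnt/delta/sales"                    # table root that gets VACUUMed
CHECKPOINT = "/mnt/checkpoints/sales_autoloader"   # deliberately NOT under TABLE_PATH

(spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", CHECKPOINT)
    .load("/mnt/raw/sales/")                       # placeholder source
    .writeStream
    .option("checkpointLocation", CHECKPOINT)
    .trigger(availableNow=True)
    .start(TABLE_PATH))
```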

Andolina
by New Contributor III
  • 606 Views
  • 1 reply
  • 0 kudos

Connectivity failure to on-prem databases

Hi all, we have more than 100 jobs right now in Databricks which connect to on-prem databases like Oracle. Connections to Oracle are made through notebooks using the JDBC thin client and com.oracle.ojdbc:ojdbc10:19.3.0.0 or com.oracle.ojdbc:ojdbc8:19.3...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

Do you have any custom DNS in your setup? If so, are you aware of any changes being made to the entries pointing to the databases?

