Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

by drag7ter (New Contributor III)
  • 1080 Views
  • 2 replies
  • 0 kudos

Resolved! How to enable CDF when using saveAsTable from PySpark code?

I'm running this code in a Databricks notebook and I want the table created in the catalog from the dataframe to have CDF enabled. When I run the code the table doesn't exist yet. This code doesn't create a table with CDF enabled; it doesn't add: delta.enableChang...

Latest Reply
raphaelblg
Contributor III
  • 0 kudos

Hello @drag7ter, I don't see anything wrong with your approach; check my repro:

1 More Replies
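Since the repro in the reply above didn't survive the excerpt, here is a minimal, hedged sketch of two ways to get delta.enableChangeDataFeed onto a table created from PySpark; the table names are hypothetical and the session-default property should be verified against your runtime's CDF documentation:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.range(10).withColumnRenamed("id", "event_id")  # stand-in dataframe

    # Option 1: set the Delta table property at creation time via DataFrameWriterV2.
    (df.writeTo("main.default.events")                        # hypothetical table name
       .using("delta")
       .tableProperty("delta.enableChangeDataFeed", "true")
       .createOrReplace())

    # Option 2: make CDF the default for tables created in this session, then keep
    # using the classic saveAsTable path.
    spark.conf.set("spark.databricks.delta.properties.defaults.enableChangeDataFeed", "true")
    df.write.format("delta").mode("overwrite").saveAsTable("main.default.events_v2")

    # Confirm the property actually landed on the table.
    spark.sql("SHOW TBLPROPERTIES main.default.events").show(truncate=False)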
by nyehia (Contributor)
  • 3457 Views
  • 9 replies
  • 0 kudos

Cannot access a SQL file from a Notebook

Hey, I have a repo of notebooks and SQL files. The typical way is to update/create notebooks in the repo, then push it, and the CI/CD pipeline deploys the notebooks to the Shared workspace. The issue is that I can access the SQL files in the Repo but cannot ...

Latest Reply
ok_1
New Contributor II
  • 0 kudos

ok

8 More Replies
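For anyone hitting the same issue from search, a minimal sketch of one way to read a .sql file that lives in a Databricks Repo from a notebook and run it; the repo path and file name are hypothetical, and this assumes a runtime where workspace files are readable from the driver:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Files committed to a Repo are visible to the driver under /Workspace/Repos/...
    sql_path = "/Workspace/Repos/some_user/some_repo/queries/report.sql"  # hypothetical path

    with open(sql_path, "r") as f:
        query = f.read()

    result_df = spark.sql(query)
    result_df.show()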
by Deepikamani (New Contributor)
  • 2800 Views
  • 1 reply
  • 0 kudos

Exam voucher

Hi, I am planning to take the Databricks Certified Data Engineer Associate certification. Where can I get the exam voucher?

Latest Reply
TPSteve
New Contributor II
  • 0 kudos

The Help Center provides an additional forum for this topic. You can request a voucher by submitting a help request; however, vouchers are not provided in all cases. Other ways to obtain a voucher include participation in training events held throughout ...

by cszczotka (New Contributor III)
  • 648 Views
  • 3 replies
  • 0 kudos

Not able to create table shallow clone on DBR 15.0

Hi, I'm getting the below error when I'm trying to create a table shallow clone on DBR 15.0: [CANNOT_SHALLOW_CLONE_NON_UC_MANAGED_TABLE_AS_SOURCE_OR_TARGET] Shallow clone is only supported for the MANAGED table type. The table xxx_clone is not MANAGED tab...

Latest Reply
cszczotka
New Contributor III
  • 0 kudos

Hi, the source table is an external table in UC and the result table should also be external. I'm running this command: CREATE TABLE target_catalog.target_schema.table_clone SHALLOW CLONE source_catalog.source_schema.source_table, but this for some reason doesn't...

2 More Replies
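For context, a sketch of the SHALLOW CLONE statement from this thread issued through spark.sql; the three-level names are the placeholders from the post, and per the error message both source and target must be Unity Catalog MANAGED tables on this runtime (external tables are rejected):

    # CANNOT_SHALLOW_CLONE_NON_UC_MANAGED_TABLE_AS_SOURCE_OR_TARGET is raised when
    # either side is not a UC MANAGED table, matching the behaviour reported above.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS target_catalog.target_schema.table_clone
        SHALLOW CLONE source_catalog.source_schema.source_table
    """)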
by hanspetter (New Contributor III)
  • 39620 Views
  • 19 replies
  • 4 kudos

Resolved! Is it possible to get the Job Run ID of a notebook run by dbutils.notebook.run?

When running a notebook using dbutils.notebook.run from a master notebook, a URL to that running notebook is printed, e.g.: Notebook job #223150 Notebook job #223151 Is there any way to capture that Job Run ID (#223150 or #223151)? We have 50 or ...

Latest Reply
Rodrigo_Mohr
New Contributor II
  • 4 kudos

I know this is an old thread, but sharing what is working well for me in Python now, for retrieving the run_id as well and building the entire link to that job run (a fuller sketch follows below): job_id = dbutils.notebook.entry_point.getDbutils().notebook().getContext().jobId().get...

18 More Replies
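A sketch along the lines of the latest reply, pulling the identifiers out of the notebook context; the context object is an internal, undocumented API, so the exact tag keys below are assumptions to verify on your runtime:

    import json

    # Grab the notebook context as JSON (internal API, as used in the reply above).
    ctx = json.loads(
        dbutils.notebook.entry_point.getDbutils().notebook().getContext().toJson()
    )

    tags = ctx.get("tags", {})
    job_id = tags.get("jobId")   # assumption: key name may vary by runtime
    run_id = tags.get("runId")   # assumption: parent vs. task run ids use different keys
    print(f"jobId={job_id}, runId={run_id}")

    # Alternatively, a child notebook started with dbutils.notebook.run can hand its
    # own run information back to the caller via dbutils.notebook.exit(...).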
by Sandesh87 (New Contributor III)
  • 3039 Views
  • 3 replies
  • 2 kudos

Task not serializable: java.io.NotSerializableException: org.apache.spark.sql.streaming.DataStreamWriter

I have a getS3Objects function to get (JSON) objects located in AWS S3: object client_connect extends Serializable { val s3_get_path = "/dbfs/mnt/s3response" def getS3Objects(s3ObjectName: String, s3Client: AmazonS3): String = { val...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hey there @Sandesh Puligundla, hope all is well! Just wanted to check in on whether you were able to resolve your issue; would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear f...

2 More Replies
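The thread doesn't show a resolution, but the usual remedy for this kind of NotSerializableException (a general pattern, not the poster's code) is to construct the non-serializable client inside the function that runs on the executors, so it is never captured in a closure serialized from the driver. A hedged Python sketch; stream_df, the bucket, and the column name are hypothetical:

    import boto3  # assumption: boto3 is available on the cluster

    def fetch_partition(rows):
        # Build the S3 client here, on the executor, instead of serializing it
        # from the driver - that serialization is what raises the exception.
        s3 = boto3.client("s3")
        for row in rows:
            obj = s3.get_object(Bucket="my-bucket", Key=row["s3ObjectName"])  # hypothetical names
            _payload = obj["Body"].read()
            # ... parse / persist the payload as needed ...

    def process_batch(batch_df, batch_id):
        batch_df.foreachPartition(fetch_partition)

    query = (stream_df.writeStream                       # stream_df: streaming DataFrame of object keys
               .foreachBatch(process_batch)
               .option("checkpointLocation", "/tmp/checkpoints/s3_fetch")  # hypothetical path
               .start())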
by surband (New Contributor III)
  • 1516 Views
  • 7 replies
  • 1 kudos

Resolved! Failures Streaming data to Pulsar

I am encountering the following exception when attempting to stream data to a Pulsar topic. This is a first-time implementation - any ideas to resolve this are greatly appreciated. DBR: 14.3 LTS ML (includes Apache Spark 3.5.0, Scala 2.12), 1 Driver, 64 GB...

Latest Reply
shan_chandra
Esteemed Contributor
  • 1 kudos

Hi @surband - can you please share the full error stack trace? Also, please use a compatible DBR (Spark) version instead of the ML runtime. Please refer to the below document and validate that you have the necessary connector libraries added to the clust...

6 More Replies
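Not a resolution from this thread, but for orientation, a rough sketch of what a Structured Streaming write to Pulsar typically looks like; the format name and option keys follow the pulsar-spark connector as I understand it and should be treated as assumptions to check against the documentation the reply points to:

    # Assumed connector options (service.url / admin.url / topic); broker URLs,
    # topic and checkpoint path are hypothetical.
    query = (df.writeStream
               .format("pulsar")
               .option("service.url", "pulsar://my-broker:6650")
               .option("admin.url", "http://my-broker:8080")
               .option("topic", "persistent://public/default/my-topic")
               .option("checkpointLocation", "/tmp/checkpoints/pulsar")
               .start())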
by Jennifer (New Contributor III)
  • 260 Views
  • 1 reply
  • 0 kudos

Support for data skipping for type TimestampNTZ

More people are beginning to use TimestampNTZ as a clustering key. According to the thread here, Unsupported datatype 'TimestampNTZType' with liquid clustering, optimization is not supported yet. We already use this type as a clustering key in production and can't opti...

Latest Reply
Jennifer
New Contributor III
  • 0 kudos

Also, does it mean that even if I specify a column of type TimestampNTZ in the clustering key, it is not clustered by this column?

by anonymous_567 (New Contributor II)
  • 452 Views
  • 1 reply
  • 0 kudos

Autoloader ingestion: same top-level directory, different files corresponding to different tables

Hello, Currently I have files landing in a storage account. They are all located in subfolders of a common directory. Some subdirectories may contain files, others may not. Each file name is unique and corresponds to a unique table as well. No two fi...

Latest Reply
AmanSehgal
Honored Contributor III
  • 0 kudos

Read all the files using Auto Loader and add an additional column as follows: .withColumn("filePath", input_file_name()). Now that you have the file name, you can split the data frame as per your requirement and ingest the data into different tables (see the sketch after this thread).

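A sketch expanding the reply above into a runnable read; the paths, file format, and routing step are placeholders:

    from pyspark.sql.functions import input_file_name

    # Read every file under the common top-level directory with Auto Loader and
    # tag each row with the file it came from.
    raw_df = (spark.readStream
                .format("cloudFiles")
                .option("cloudFiles.format", "json")                          # assumption
                .option("cloudFiles.schemaLocation", "/mnt/landing/_schema")  # hypothetical
                .load("/mnt/landing/common/")                                 # hypothetical
                .withColumn("filePath", input_file_name()))

    # Downstream (for example inside foreachBatch) the filePath column can be used
    # to split the frame and write each slice to the table its file corresponds to.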
by InTimetec (New Contributor II)
  • 1095 Views
  • 5 replies
  • 1 kudos

Unable to connect to MongoDB from Databricks

Hello, I am trying to connect MongoDB with Databricks. I also used an SSL certificate. I created my own cluster and installed the Maven library org.mongodb.spark:mongo-spark-connector_2.12:3.0.1. This is my code: connection_string = f"mongodb://{secret['user']}:{...

Latest Reply
InTimetec
New Contributor II
  • 1 kudos

@Kaniz_Fatma I updated my code as below (a completed sketch follows after this thread): df = spark.read.format("com.mongodb.spark.sql.DefaultSource")\ .option("database", database)\ .option("collection", collection)\ .option("spark.mongodb.input.uri", connectionString)\ ...

4 More Replies
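A completed sketch of the read shown in the reply (mongo-spark-connector 3.0.x); connectionString, database and collection are the placeholders already used in the thread:

    # Assumes connectionString, database and collection are defined as in the post.
    df = (spark.read.format("com.mongodb.spark.sql.DefaultSource")
            .option("spark.mongodb.input.uri", connectionString)
            .option("database", database)
            .option("collection", collection)
            .load())
    df.printSchema()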
by arunak (New Contributor)
  • 743 Views
  • 1 reply
  • 0 kudos

Connecting to Serverless Redshift from a Databricks Notebook

Hello Experts, a new Databricks user here. I am trying to access a Redshift serverless table using a Databricks notebook. Here is what happens when I try the below code: df = spark.read.format("redshift")\.option("dbtable", "public.customer")\.opti...

Latest Reply
shan_chandra
Esteemed Contributor
  • 0 kudos

@arunak - we need to set forward_spark_s3_credentials to true during the read. This will help Spark detect the credentials used to authenticate to the S3 bucket and use those credentials to read from Redshift (see the sketch after this thread).

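A sketch of the read with the reply's option applied; the JDBC URL, temp bucket, and auth details are placeholders (the original post's full option list is truncated above):

    df = (spark.read.format("redshift")
            .option("url", "jdbc:redshift://<serverless-workgroup-endpoint>:5439/dev")  # placeholder
            .option("dbtable", "public.customer")
            .option("tempdir", "s3a://my-temp-bucket/redshift-staging/")                # placeholder
            .option("forward_spark_s3_credentials", "true")   # the setting from the reply
            .load())
    # Plus user/password (or IAM) options as in the original, truncated code.
    df.show(5)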
by mh_db (New Contributor III)
  • 1271 Views
  • 1 reply
  • 0 kudos

Write to CSV file in S3 bucket

I have a pandas dataframe in my PySpark notebook. I want to save this dataframe to my S3 bucket. I'm using the following command to save it: import boto3; import s3fs; df_summary.to_csv(f"s3://dataconversion/data/exclude", index=False) - but I keep getting thi...

Latest Reply
shan_chandra
Esteemed Contributor
  • 0 kudos

Hi @mh_db - you can import the botocore library, or if it is not found you can do a pip install botocore to resolve this. Alternatively, you can keep the data in a Spark dataframe without converting to a pandas dataframe and write it to CSV directly (see the sketch after this thread). You ...

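A sketch of the reply's second suggestion: convert the pandas frame to a Spark DataFrame (or keep it in Spark from the start) and let Spark write the CSV to S3; the output path is the one from the post:

    # df_summary is the pandas DataFrame from the original post.
    sdf = spark.createDataFrame(df_summary)

    (sdf.coalesce(1)                      # optional: emit a single part file
        .write.mode("overwrite")
        .option("header", "true")
        .csv("s3://dataconversion/data/exclude"))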
by juanc (New Contributor II)
  • 3383 Views
  • 9 replies
  • 2 kudos

Activate Spark extensions on SQL Endpoints

Would it be possible to activate a custom extension like Sedona (https://sedona.apache.org/download/databricks/) on SQL Endpoints? Example error: java.lang.ClassNotFoundException: org.apache.spark.sql.sedona_sql.UDT.GeometryUDT at org.apache.spark....

Latest Reply
naveenanto
New Contributor III
  • 2 kudos

@Kaniz_Fatma What is the right way to add a custom Spark extension to SQL warehouse clusters?

8 More Replies
by marcuskw (Contributor)
  • 4885 Views
  • 1 reply
  • 0 kudos

Resolved! Lakehouse Federation for SQL Server and Security Policy

We've been able to set up a Foreign Catalog using the following documentation: https://learn.microsoft.com/en-us/azure/databricks/query-federation/sql-server However, the tables that have RLS using a Security Policy appear empty. I imagine that this solu...

Latest Reply
marcuskw
Contributor
  • 0 kudos

Was a bit quick here; found out that the SUSER_NAME() of the query is of course the connection that was set up, i.e. the user/password defined for the foreign catalog connection. Once I added that same user to the RLS logic I got the correct result.

by 64883 (New Contributor)
  • 581 Views
  • 1 reply
  • 0 kudos

Support for Delta tables multicluster writes in Databricks cluster

Hello, we're using Databricks on AWS and we've recently started using Delta tables. We're using R. While the code below [1] works in a notebook, when running it from RStudio on a Databricks cluster we get the following error: java.lang.IllegalStateExce...

Latest Reply
NandiniN
Honored Contributor
  • 0 kudos

Sorry for being very late here - if you cannot set multi-cluster writes to false, we can try to split this table into separate tables for each stream (a sketch of that setting follows below).

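For reference, a sketch of the setting the reply alludes to; the conf name is an assumption based on the Delta multi-cluster writes feature on Databricks and is often set in the cluster's Spark config rather than per session, so verify it for your runtime:

    # Assumption: this is the flag meant by "multi write to false" in the reply.
    spark.conf.set("spark.databricks.delta.multiClusterWrites.enabled", "false")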
