Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

by drag7ter (New Contributor III)
  • 1080 Views
  • 2 replies
  • 0 kudos

Resolved! How to enable CDF when using saveAsTable from PySpark code?

I'm running this code in a Databricks notebook and I want the table created in the catalog from the dataframe to have CDF enabled. When I run the code the table doesn't exist yet. This code doesn't create a table with CDF enabled; it doesn't add: delta.enableChang...

Latest Reply
raphaelblg
Contributor III
  • 0 kudos

Hello @drag7ter, I don't see anything wrong with your approach; check my repro:

1 More Replies
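Since the repro in the reply above didn't survive the excerpt, here is a minimal, hedged sketch of two ways to get delta.enableChangeDataFeed onto a table created from PySpark; the table names are hypothetical and the session-default property should be verified against your runtime's CDF documentation:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.range(10).withColumnRenamed("id", "event_id")  # stand-in dataframe

    # Option 1: set the Delta table property at creation time via DataFrameWriterV2.
    (df.writeTo("main.default.events")                        # hypothetical table name
       .using("delta")
       .tableProperty("delta.enableChangeDataFeed", "true")
       .createOrReplace())

    # Option 2: make CDF the default for tables created in this session, then keep
    # using the classic saveAsTable path.
    spark.conf.set("spark.databricks.delta.properties.defaults.enableChangeDataFeed", "true")
    df.write.format("delta").mode("overwrite").saveAsTable("main.default.events_v2")

    # Confirm the property actually landed on the table.
    spark.sql("SHOW TBLPROPERTIES main.default.events").show(truncate=False)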
by nyehia (Contributor)
  • 3457 Views
  • 9 replies
  • 0 kudos

Cannot access a SQL file from a Notebook

Hey, I have a repo of notebooks and SQL files. The typical way is to update/create notebooks in the repo, then push it, and the CI/CD pipeline deploys the notebooks to the Shared workspace. The issue is that I can access the SQL files in the Repo but cannot ...

Latest Reply
ok_1
New Contributor II
  • 0 kudos

ok

8 More Replies
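For anyone hitting the same issue from search, a minimal sketch of one way to read a .sql file that lives in a Databricks Repo from a notebook and run it; the repo path and file name are hypothetical, and this assumes a runtime where workspace files are readable from the driver:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Files committed to a Repo are visible to the driver under /Workspace/Repos/...
    sql_path = "/Workspace/Repos/some_user/some_repo/queries/report.sql"  # hypothetical path

    with open(sql_path, "r") as f:
        query = f.read()

    result_df = spark.sql(query)
    result_df.show()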
by Deepikamani (New Contributor)
  • 2800 Views
  • 1 reply
  • 0 kudos

Exam voucher

Hi, I am planning to take the Databricks Certified Data Engineer Associate certification. Where can I get the exam voucher?

Latest Reply
TPSteve
New Contributor II
  • 0 kudos

The Help Center provides an additional forum for this topic. You can request a voucher by submitting a help request; however, vouchers are not provided in all cases. Other ways to obtain a voucher include participation in training events held throughout ...

by cszczotka (New Contributor III)
  • 648 Views
  • 3 replies
  • 0 kudos

Not able to create table shallow clone on DBR 15.0

Hi, I'm getting the below error when I'm trying to create a table shallow clone on DBR 15.0: [CANNOT_SHALLOW_CLONE_NON_UC_MANAGED_TABLE_AS_SOURCE_OR_TARGET] Shallow clone is only supported for the MANAGED table type. The table xxx_clone is not MANAGED tab...

Latest Reply
cszczotka
New Contributor III
  • 0 kudos

Hi, the source table is an external table in UC and the result table should also be external. I'm running this command: CREATE TABLE target_catalog.target_schema.table_clone SHALLOW CLONE source_catalog.source_schema.source_table, but this for some reason doesn't...

2 More Replies
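For context, a sketch of the SHALLOW CLONE statement from this thread issued through spark.sql; the three-level names are the placeholders from the post, and per the error message both source and target must be Unity Catalog MANAGED tables on this runtime (external tables are rejected):

    # CANNOT_SHALLOW_CLONE_NON_UC_MANAGED_TABLE_AS_SOURCE_OR_TARGET is raised when
    # either side is not a UC MANAGED table, matching the behaviour reported above.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS target_catalog.target_schema.table_clone
        SHALLOW CLONE source_catalog.source_schema.source_table
    """)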
by hanspetter (New Contributor III)
  • 39620 Views
  • 19 replies
  • 4 kudos

Resolved! Is it possible to get the Job Run ID of a notebook run by dbutils.notebook.run?

When running a notebook using dbutils.notebook.run from a master notebook, a URL to that running notebook is printed, e.g.: Notebook job #223150 Notebook job #223151 Is there any way to capture that Job Run ID (#223150 or #223151)? We have 50 or ...

Latest Reply
Rodrigo_Mohr
New Contributor II
  • 4 kudos

I know this is an old thread, but sharing what is working well for me in Python now, for retrieving the run_id as well and building the entire link to that job run (a fuller sketch follows below): job_id = dbutils.notebook.entry_point.getDbutils().notebook().getContext().jobId().get...

18 More Replies
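A sketch along the lines of the latest reply, pulling the identifiers out of the notebook context; the context object is an internal, undocumented API, so the exact tag keys below are assumptions to verify on your runtime:

    import json

    # Grab the notebook context as JSON (internal API, as used in the reply above).
    ctx = json.loads(
        dbutils.notebook.entry_point.getDbutils().notebook().getContext().toJson()
    )

    tags = ctx.get("tags", {})
    job_id = tags.get("jobId")   # assumption: key name may vary by runtime
    run_id = tags.get("runId")   # assumption: parent vs. task run ids use different keys
    print(f"jobId={job_id}, runId={run_id}")

    # Alternatively, a child notebook started with dbutils.notebook.run can hand its
    # own run information back to the caller via dbutils.notebook.exit(...).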
by Sandesh87 (New Contributor III)
  • 3039 Views
  • 3 replies
  • 2 kudos

Task not serializable: java.io.NotSerializableException: org.apache.spark.sql.streaming.DataStreamWriter

I have a getS3Objects function to get (JSON) objects located in AWS S3: object client_connect extends Serializable { val s3_get_path = "/dbfs/mnt/s3response" def getS3Objects(s3ObjectName: String, s3Client: AmazonS3): String = { val...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hey there @Sandesh Puligundla, hope all is well! Just wanted to check in on whether you were able to resolve your issue; would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear f...

2 More Replies
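The thread doesn't show a resolution, but the usual remedy for this kind of NotSerializableException (a general pattern, not the poster's code) is to construct the non-serializable client inside the function that runs on the executors, so it is never captured in a closure serialized from the driver. A hedged Python sketch; stream_df, the bucket, and the column name are hypothetical:

    import boto3  # assumption: boto3 is available on the cluster

    def fetch_partition(rows):
        # Build the S3 client here, on the executor, instead of serializing it
        # from the driver - that serialization is what raises the exception.
        s3 = boto3.client("s3")
        for row in rows:
            obj = s3.get_object(Bucket="my-bucket", Key=row["s3ObjectName"])  # hypothetical names
            _payload = obj["Body"].read()
            # ... parse / persist the payload as needed ...

    def process_batch(batch_df, batch_id):
        batch_df.foreachPartition(fetch_partition)

    query = (stream_df.writeStream                       # stream_df: streaming DataFrame of object keys
               .foreachBatch(process_batch)
               .option("checkpointLocation", "/tmp/checkpoints/s3_fetch")  # hypothetical path
               .start())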
by surband (New Contributor III)
  • 1516 Views
  • 7 replies
  • 1 kudos

Resolved! Failures Streaming data to Pulsar

I am encountering the following exception when attempting to stream data to a Pulsar topic. This is a first-time implementation - any ideas to resolve this are greatly appreciated. DBR: 14.3 LTS ML (includes Apache Spark 3.5.0, Scala 2.12), 1 Driver, 64 GB...

Latest Reply
shan_chandra
Esteemed Contributor
  • 1 kudos

Hi @surband - can you please share the full error stack trace? Also, please use a compatible DBR (Spark) version instead of the ML runtime. Please refer to the below document and validate that you have the necessary connector libraries added to the clust...

6 More Replies
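Not a resolution from this thread, but for orientation, a rough sketch of what a Structured Streaming write to Pulsar typically looks like; the format name and option keys follow the pulsar-spark connector as I understand it and should be treated as assumptions to check against the documentation the reply points to:

    # Assumed connector options (service.url / admin.url / topic); broker URLs,
    # topic and checkpoint path are hypothetical.
    query = (df.writeStream
               .format("pulsar")
               .option("service.url", "pulsar://my-broker:6650")
               .option("admin.url", "http://my-broker:8080")
               .option("topic", "persistent://public/default/my-topic")
               .option("checkpointLocation", "/tmp/checkpoints/pulsar")
               .start())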
by Jennifer (New Contributor III)
  • 260 Views
  • 1 reply
  • 0 kudos

Support for data skipping for type TimestampNTZ

More people are beginning to use TimestampNTZ as a clustering key. According to the thread here, Unsupported datatype 'TimestampNTZType' with liquid clustering, optimization is not supported yet. We already use this type as a clustering key in production and can't opti...

Latest Reply
Jennifer
New Contributor III
  • 0 kudos

Also, does it mean that even if I specify a column of type TimestampNTZ in the clustering key, it is not clustered by this column?

by anonymous_567 (New Contributor II)
  • 452 Views
  • 1 reply
  • 0 kudos

Autoloader ingestion: same top-level directory, different files corresponding to different tables

Hello, Currently I have files landing in a storage account. They are all located in subfolders of a common directory. Some subdirectories may contain files, others may not. Each file name is unique and corresponds to a unique table as well. No two fi...

Latest Reply
AmanSehgal
Honored Contributor III
  • 0 kudos

Read all the files using Auto Loader and add an additional column as follows: .withColumn("filePath", input_file_name()). Now that you have the file name, you can split the data frame as per your requirement and ingest the data into different tables (see the sketch after this thread).

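A sketch expanding the reply above into a runnable read; the paths, file format, and routing step are placeholders:

    from pyspark.sql.functions import input_file_name

    # Read every file under the common top-level directory with Auto Loader and
    # tag each row with the file it came from.
    raw_df = (spark.readStream
                .format("cloudFiles")
                .option("cloudFiles.format", "json")                          # assumption
                .option("cloudFiles.schemaLocation", "/mnt/landing/_schema")  # hypothetical
                .load("/mnt/landing/common/")                                 # hypothetical
                .withColumn("filePath", input_file_name()))

    # Downstream (for example inside foreachBatch) the filePath column can be used
    # to split the frame and write each slice to the table its file corresponds to.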
by InTimetec (New Contributor II)
  • 1095 Views
  • 5 replies
  • 1 kudos

Unable to connect to MongoDB from Databricks

Hello, I am trying to connect MongoDB with Databricks. I also used an SSL certificate. I created my own cluster and installed the Maven library org.mongodb.spark:mongo-spark-connector_2.12:3.0.1. This is my code: connection_string = f"mongodb://{secret['user']}:{...

Latest Reply
InTimetec
New Contributor II
  • 1 kudos

@Kaniz_Fatma I updated my code as below (a completed sketch follows after this thread): df = spark.read.format("com.mongodb.spark.sql.DefaultSource")\ .option("database", database)\ .option("collection", collection)\ .option("spark.mongodb.input.uri", connectionString)\ ...

4 More Replies
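A completed sketch of the read shown in the reply (mongo-spark-connector 3.0.x); connectionString, database and collection are the placeholders already used in the thread:

    # Assumes connectionString, database and collection are defined as in the post.
    df = (spark.read.format("com.mongodb.spark.sql.DefaultSource")
            .option("spark.mongodb.input.uri", connectionString)
            .option("database", database)
            .option("collection", collection)
            .load())
    df.printSchema()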
by arunak (New Contributor)
  • 743 Views
  • 1 reply
  • 0 kudos

Connecting to Serverless Redshift from a Databricks Notebook

Hello Experts, a new Databricks user here. I am trying to access a Redshift serverless table using a Databricks notebook. Here is what happens when I try the below code: df = spark.read.format("redshift")\.option("dbtable", "public.customer")\.opti...

Latest Reply
shan_chandra
Esteemed Contributor
  • 0 kudos

@arunak - we need to set forward_spark_s3_credentials to true during the read. This will help Spark detect the credentials used to authenticate to the S3 bucket and use those credentials to read from Redshift (see the sketch after this thread).

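A sketch of the read with the reply's option applied; the JDBC URL, temp bucket, and auth details are placeholders (the original post's full option list is truncated above):

    df = (spark.read.format("redshift")
            .option("url", "jdbc:redshift://<serverless-workgroup-endpoint>:5439/dev")  # placeholder
            .option("dbtable", "public.customer")
            .option("tempdir", "s3a://my-temp-bucket/redshift-staging/")                # placeholder
            .option("forward_spark_s3_credentials", "true")   # the setting from the reply
            .load())
    # Plus user/password (or IAM) options as in the original, truncated code.
    df.show(5)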
by mh_db (New Contributor III)
  • 1271 Views
  • 1 reply
  • 0 kudos

Write to CSV file in S3 bucket

I have a pandas dataframe in my PySpark notebook. I want to save this dataframe to my S3 bucket. I'm using the following command to save it: import boto3; import s3fs; df_summary.to_csv(f"s3://dataconversion/data/exclude", index=False) - but I keep getting thi...

Latest Reply
shan_chandra
Esteemed Contributor
  • 0 kudos

Hi @mh_db - you can import the botocore library, or if it is not found you can do a pip install botocore to resolve this. Alternatively, you can keep the data in a Spark dataframe without converting to a pandas dataframe and write it to CSV directly (see the sketch after this thread). You ...

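A sketch of the reply's second suggestion: convert the pandas frame to a Spark DataFrame (or keep it in Spark from the start) and let Spark write the CSV to S3; the output path is the one from the post:

    # df_summary is the pandas DataFrame from the original post.
    sdf = spark.createDataFrame(df_summary)

    (sdf.coalesce(1)                      # optional: emit a single part file
        .write.mode("overwrite")
        .option("header", "true")
        .csv("s3://dataconversion/data/exclude"))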
by juanc (New Contributor II)
  • 3383 Views
  • 9 replies
  • 2 kudos

Activate Spark extensions on SQL Endpoints

Would it be possible to activate a custom extension like Sedona (https://sedona.apache.org/download/databricks/) on SQL Endpoints? Example error: java.lang.ClassNotFoundException: org.apache.spark.sql.sedona_sql.UDT.GeometryUDT at org.apache.spark....

Latest Reply
naveenanto
New Contributor III
  • 2 kudos

@Kaniz_Fatma What is the right way to add a custom Spark extension to SQL warehouse clusters?

8 More Replies
by marcuskw (Contributor)
  • 4885 Views
  • 1 reply
  • 0 kudos

Resolved! Lakehouse Federation for SQL Server and Security Policy

We've been able to set up a Foreign Catalog using the following documentation: https://learn.microsoft.com/en-us/azure/databricks/query-federation/sql-server However, the tables that have RLS using a Security Policy appear empty. I imagine that this solu...

Latest Reply
marcuskw
Contributor
  • 0 kudos

Was a bit quick here; found out that the SUSER_NAME() of the query is of course the connection that was set up, i.e. the user/password defined for the foreign catalog connection. Once I added that same user to the RLS logic I got the correct result.

by 64883 (New Contributor)
  • 581 Views
  • 1 reply
  • 0 kudos

Support for Delta tables multicluster writes in Databricks cluster

Hello, we're using Databricks on AWS and we've recently started using Delta tables. We're using R. While the code below [1] works in a notebook, when running it from RStudio on a Databricks cluster we get the following error: java.lang.IllegalStateExce...

Latest Reply
NandiniN
Honored Contributor
  • 0 kudos

Sorry for being very late here - if you cannot set multi-cluster writes to false, we can try to split this table into separate tables for each stream (a sketch of that setting follows below).

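For reference, a sketch of the setting the reply alludes to; the conf name is an assumption based on the Delta multi-cluster writes feature on Databricks and is often set in the cluster's Spark config rather than per session, so verify it for your runtime:

    # Assumption: this is the flag meant by "multi write to false" in the reply.
    spark.conf.set("spark.databricks.delta.multiClusterWrites.enabled", "false")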
