Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Dhruv-22
by New Contributor III
  • 10056 Views
  • 4 replies
  • 1 kudos

Resolved! Managed table overwrites existing location for delta but not for oth...

I am working on Azure Databricks, with the Databricks Runtime version being 14.3 LTS (includes Apache Spark 3.5.0, Scala 2.12). I am facing the following issue. Suppose I have a view named v1 and a database f1_processed created from the following comman...

Latest Reply
Red_blue_green
New Contributor III
  • 1 kudos

Hi, this is how the Delta format works. With overwrite you are not deleting the files in the folder or replacing them; Delta creates a new file with the overwritten schema and data. This way you are also able to return to former versions of the del...
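A minimal sketch of that behavior, assuming a hypothetical Delta table name: overwrite adds new data files and a new transaction-log entry rather than deleting the old files, so earlier versions stay readable.

```python
# Overwrite rewrites the table logically but keeps the old data files around,
# referenced by earlier versions in the Delta transaction log.
df.write.format("delta").mode("overwrite").saveAsTable("f1_processed.my_table")

# Time travel back to an earlier version (version 0 assumed to exist).
old = spark.sql("SELECT * FROM f1_processed.my_table VERSION AS OF 0")
```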

3 More Replies
sanjay
by Valued Contributor II
  • 12497 Views
  • 1 reply
  • 0 kudos

pyspark dropDuplicates performance issue

Hi, I am trying to delete duplicate records found by key, but it's very slow. It's a continuously running pipeline, so the data is not that huge, but it still takes time to execute this command: df = df.dropDuplicates(["fileName"]). Is there any better approach to d...
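One hedged option for a continuously running (streaming) pipeline: bound the deduplication state with a watermark so dropDuplicates does not keep every key in state forever. The event-time column ingestTime below is an assumption.

```python
# Sketch: a watermark bounds the streaming deduplication state, which keeps
# dropDuplicates cheap when late data beyond the window can be ignored.
deduped = (
    df.withWatermark("ingestTime", "1 hour")
      .dropDuplicates(["fileName", "ingestTime"])
)
```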

Accn
by New Contributor
  • 1209 Views
  • 1 reply
  • 0 kudos

Dashboard from Notebook - How to schedule

A notebook is created with insights and I have created a dashboard (not a SQL dashboard) from it. I need to schedule this. I have tried scheduling via a workflow - it only takes you to the notebook. Even the schedule from the dashboard takes me to the notebook and not the dashbo...

Latest Reply
Ayushi_Suthar
Databricks Employee
  • 0 kudos

Hi @Accn, thanks for bringing up your concerns, always happy to help. We understand your concern, but right now the only way to refresh a notebook dashboard is via scheduled jobs. To schedule a dashboard to refresh at a specified interval, click...
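For reference, a hedged sketch of the same scheduling done through the Databricks SDK rather than the UI; the notebook path, cluster id, and cron expression are all hypothetical.

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

# Create a cron-scheduled job that re-runs the notebook (and therefore
# refreshes its dashboard) at a fixed interval.
w = WorkspaceClient()
w.jobs.create(
    name="refresh-dashboard",
    tasks=[
        jobs.Task(
            task_key="refresh",
            existing_cluster_id="<cluster-id>",
            notebook_task=jobs.NotebookTask(notebook_path="/Users/me/my_notebook"),
        )
    ],
    schedule=jobs.CronSchedule(
        quartz_cron_expression="0 0 6 * * ?",  # daily at 06:00
        timezone_id="UTC",
    ),
)
```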

chrisf_sts
by New Contributor II
  • 8177 Views
  • 1 reply
  • 1 kudos

Resolved! After moving mounted s3 bucket under unity catalog control, python file paths no longer work

I have been using a mounted external S3 bucket with JSON files up until a few days ago, when my company changed to using all file mounts under control of the Unity Catalog. Suddenly I can no longer run a command like: with open("/mnt/my_files/my_json....

Latest Reply
Ayushi_Suthar
Databricks Employee
  • 1 kudos

Hi @chrisf_sts, thanks for bringing up your concerns, always happy to help. May I know which cluster access mode you are using to run the notebook commands? Can you please try to run the below command on Single User cluster access mode: "with open(...
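A minimal sketch of the idea, assuming a hypothetical mount path: Python's open() reads the driver's local filesystem, so DBFS mounts have to be addressed through the /dbfs FUSE prefix, and that access to mounts is restricted on Unity Catalog shared clusters.

```python
import json

# open() works against the driver's local filesystem, so address the DBFS
# mount via the /dbfs FUSE prefix (path is hypothetical).
with open("/dbfs/mnt/my_files/my_json_file.json") as f:
    data = json.load(f)
```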

brickster_2018
by Databricks Employee
  • 12533 Views
  • 3 replies
  • 6 kudos

Resolved! How to add custom logging in Databricks

I want to add custom logs that are redirected to the Spark driver logs. Can I use the existing logger classes to have my application logs or progress messages appear in the Spark driver logs?
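A minimal sketch of one common approach, assuming the standard Python logging module is acceptable; anything it writes to stderr lands in the driver log output.

```python
import logging
import sys

# Messages written to stderr show up in the cluster's driver log alongside
# Spark's own log lines. The logger name is hypothetical.
logging.basicConfig(stream=sys.stderr, level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(name)s: %(message)s")
logger = logging.getLogger("my_pipeline")
logger.info("progress: bronze load finished")
```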

Latest Reply
Kaizen
Valued Contributor
  • 6 kudos

1) Is it possible to save all the custom logging to its own file? Currently it is being logged with all the other cluster logs (see image). 2) Also, it seems like a lot of blank files are being created for this. Is this a bug? This include...
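For the first question, a hedged sketch of routing custom messages to a dedicated file; the path is hypothetical, and writing to the driver's local disk avoids DBFS append limitations.

```python
import logging

# Attach a FileHandler so this logger's output goes to its own file instead
# of being mixed into the shared cluster logs.
logger = logging.getLogger("my_app")
logger.setLevel(logging.INFO)
handler = logging.FileHandler("/tmp/my_app.log")
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s"))
logger.addHandler(handler)

logger.info("custom message, kept out of the cluster's shared logs")
```

If the file needs to outlive the cluster, something like dbutils.fs.cp("file:/tmp/my_app.log", "dbfs:/tmp/my_app.log") can copy it to persistent storage afterwards.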

2 More Replies
sha
by New Contributor
  • 1302 Views
  • 1 reply
  • 0 kudos

Importing data from S3 to Azure Databricks Cluster with Unity Catalog in Shared Mode

Environment details: Databricks on Azure, 13.3 LTS, Unity Catalog, Shared Cluster mode. Currently in the environment I'm in, we run imports from S3 with code like: spark.read.option('inferSchema', 'true').json(s3_path). When running on a cluster in Sha...

Latest Reply
BR_DatabricksAI
Contributor
  • 0 kudos

Hello Sha, we usually get such errors while working in shared cluster mode. Assuming this is your dev environment, just to avoid errors, please use different clusters. However, as an alternative solution, in case you would like to keep the shared cluster, the...
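Since the reply is truncated, here is a heavily hedged sketch of one workaround for small files (bucket, key, and credential setup are all hypothetical): fetch the object on the driver with boto3 and build the DataFrame directly, avoiding the S3A access path that a Unity Catalog shared cluster restricts.

```python
import json
import boto3

# Read the object on the driver, then let Spark infer a schema from the
# parsed records. Only reasonable for small files.
s3 = boto3.client("s3")  # assumes AWS credentials are available to the driver
body = s3.get_object(Bucket="my-bucket", Key="data/records.json")["Body"].read()
records = [json.loads(line) for line in body.decode("utf-8").splitlines()]
df = spark.createDataFrame(records)
```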

Dhruv-22
by New Contributor III
  • 4853 Views
  • 4 replies
  • 0 kudos

CREATE TABLE does not overwrite location whereas CREATE OR REPLACE TABLE does

I am working on Azure Databricks, with the Databricks Runtime version being 14.3 LTS (includes Apache Spark 3.5.0, Scala 2.12). I am facing the following issue. Suppose I have a view named v1 and a database f1_processed created from the following comman...

Latest Reply
Ayushi_Suthar
Databricks Employee
  • 0 kudos

Hi @Dhruv-22, based on the information you shared above, the "CREATE OR REPLACE" and "CREATE" commands in Databricks do have different behaviours, particularly when it comes to handling tables with specific target locations. The "CREATE OR REPLACE"...
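A hedged sketch of the contrast, reusing the thread's v1 view and f1_processed database (the storage location is hypothetical): per the thread title, CREATE TABLE does not overwrite an existing location, whereas CREATE OR REPLACE TABLE does.

```python
# CREATE TABLE leaves an existing target location as it is.
spark.sql("""
    CREATE TABLE f1_processed.results
    USING DELTA
    LOCATION 'abfss://container@account.dfs.core.windows.net/results'
    AS SELECT * FROM v1
""")

# CREATE OR REPLACE TABLE replaces the definition and data at that location.
spark.sql("""
    CREATE OR REPLACE TABLE f1_processed.results
    USING DELTA
    LOCATION 'abfss://container@account.dfs.core.windows.net/results'
    AS SELECT * FROM v1
""")
```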

3 More Replies
DApt
by New Contributor II
  • 8742 Views
  • 1 reply
  • 2 kudos

REDACTED_POSSIBLE_SECRET_ACCESS_KEY as part of column value result from aes_encrypt

Hi, I've encountered an error using base64/aes_encrypt: the saved string contains 'REDACTED_POSSIBLE_SECRET_ACCESS_KEY' at the end, destroying the original data and rendering it useless and undecryptable. Is there a way to avoid this replacement in...

Latest Reply
DataEnthusiast1
New Contributor II
  • 2 kudos

I had the same issue, and my usage was similar to OP's: base64(aes_encrypt(<clear_text>, unbase64(secret(<scope>, <key>)))). Databricks support suggested not calling secret() within the insert/update operation that writes to the table. After updating the py...
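A hedged sketch of that suggestion (scope, key, and column names are hypothetical): fetch the key once with dbutils.secrets outside the DML, then pass it to aes_encrypt as a literal column instead of calling secret() inline.

```python
from pyspark.sql import functions as F

# Retrieve the base64-encoded key outside the write, then encrypt with it.
key_b64 = dbutils.secrets.get(scope="my_scope", key="my_key")

encrypted = df.select(
    F.base64(
        F.aes_encrypt(F.col("clear_text"), F.unbase64(F.lit(key_b64)))
    ).alias("cipher_b64")
)
```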

Dhruv-22
by New Contributor III
  • 4153 Views
  • 3 replies
  • 1 kudos

Resolved! REPLACE TABLE AS SELECT is not working with parquet whereas it works fine for delta

I am working on Azure Databricks, with the Databricks Runtime version being 14.3 LTS (includes Apache Spark 3.5.0, Scala 2.12). I am facing the following issue. Suppose I have a view named v1 and a database f1_processed created from the following comman...

Latest Reply
Ayushi_Suthar
Databricks Employee
  • 1 kudos

Hi @Dhruv-22, we understand that you are facing the following error when using REPLACE TABLE AS SELECT on the Parquet table, but at this moment the REPLACE TABLE AS SELECT operation you're trying to perform is not supported for Parquet tables. Accord...
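A hedged sketch of a workaround, reusing the thread's v1 view (the table name is hypothetical): since REPLACE TABLE AS SELECT only works for Delta here, drop the Parquet table and recreate it with CTAS.

```python
# Drop-and-recreate stands in for REPLACE TABLE AS SELECT on Parquet.
spark.sql("DROP TABLE IF EXISTS f1_processed.results_parquet")
spark.sql("""
    CREATE TABLE f1_processed.results_parquet
    USING PARQUET
    AS SELECT * FROM v1
""")
```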

2 More Replies
Kroy
by Contributor
  • 1556 Views
  • 2 replies
  • 0 kudos

Near real-time solution for data from a core system that gets updated

We are trying to build a solution where customer data is stored in an RDBMS (SQL Server) and we are moving this data to the delta lake in a medallion architecture, and we want this to be near real time using a DLT pipeline. The problem is that the source tab...

Latest Reply
Kroy
Contributor
  • 0 kudos

I came across this matrix while reading about DLT. What does "read from complete and write to incremental" mean?

1 More Replies
John_Rotenstein
by New Contributor II
  • 7381 Views
  • 1 reply
  • 0 kudos

Resolved! "Run Job" without waiting for target job to finish?

We have configured a task in Job-A to run Job-B. However, the task in Job-A continues to 'run' until Job-B has completed. I can see this would be useful if we wanted to wait for Job-B and then perform another task, but we would actually like Job-A to e...

Latest Reply
feiyun0112
Honored Contributor
  • 0 kudos

You can create and run the job using the SDK: databricks-sdk-py/examples/jobs/run_now_jobs_api_full_integration.py at main · databricks/databricks-sdk-py (github.com)
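A minimal sketch of the fire-and-forget variant with the Databricks SDK (the job id is hypothetical): run_now() returns a wait handle, and simply not calling .result() on it means Job-A's task does not block on Job-B.

```python
from databricks.sdk import WorkspaceClient

# Trigger the target job and return immediately instead of waiting on it.
w = WorkspaceClient()
waiter = w.jobs.run_now(job_id=123)  # no .result(), so no blocking
print(f"triggered run {waiter.response.run_id}")
```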

pgagliardi
by New Contributor II
  • 1974 Views
  • 1 reply
  • 2 kudos

Latest pushed code is not taken into account by the notebook

Hello, I cloned a repo my_repo in the Databricks Repos space. Inside my_repo, I created a notebook new_experiment where I can import functions from my_repo, which is really handy. When I want to modify a function in my_repo, I open my local IDE, do the...

Latest Reply
Jnguyen
Databricks Employee
  • 2 kudos

Use %reload_ext autoreload instead; it will give you the behavior you expect. You just need to run it once, like: %load_ext autoreload %autoreload 2

jcoggs
by New Contributor II
  • 4611 Views
  • 2 replies
  • 1 kudos

Handling Exceptions from dbutils.fs in Python

I have a notebook that calls dbutils.fs.ls() for some derived file path in Azure. Occasionally, this path may not exist, and in general I can't always guarantee that the path exists. When the path doesn't exist it throws an "ExecutionError" which app...

Labels: Data Engineering, dbutils, Error, Exceptions
Latest Reply
Palash01
Valued Contributor
  • 1 kudos

Hey @jcoggs, the problem looks legit, though it has never occurred to me, as I try to keep my mounts manually fed to the pipeline using parameters or a variable. By doing this you will have more control over your pipelines; see if you could do the same in your...
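A minimal sketch of handling the missing-path case directly, assuming a hypothetical derived path; matching on the message text is brittle but is the usual workaround, since the raised type is a generic ExecutionError.

```python
# Treat a missing path as a normal condition instead of letting ls() raise.
def ls_if_exists(path):
    try:
        return dbutils.fs.ls(path)
    except Exception as e:
        if "java.io.FileNotFoundException" in str(e):
            return []  # path does not exist
        raise  # anything else is a real error; surface it

files = ls_if_exists("/mnt/my_files/derived/2024-01-01")  # hypothetical path
```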

1 More Replies
