Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Dhruv-22
by New Contributor III
  • 4117 Views
  • 4 replies
  • 0 kudos

CREATE TABLE does not overwrite location whereas CREATE OR REPLACE TABLE does

I am working on Azure Databricks, with Databricks Runtime version 14.3 LTS (includes Apache Spark 3.5.0, Scala 2.12). I am facing the following issue. Suppose I have a view named v1 and a database f1_processed created from the following comman...

Latest Reply
Ayushi_Suthar
Databricks Employee
  • 0 kudos

Hi @Dhruv-22, based on the information you shared above, the "CREATE OR REPLACE" and "CREATE" commands in Databricks do have different behaviours, particularly when it comes to handling tables with specific target locations. The "CREATE OR REPLACE"...

3 More Replies
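A minimal sketch of the difference under discussion, assuming a Delta table; the table name circuits and the ABFSS path are placeholders, while f1_processed and v1 follow the poster's setup. CREATE TABLE leaves an existing table's registered location untouched, whereas CREATE OR REPLACE TABLE re-registers the table, including its LOCATION:

spark.sql("""
    CREATE OR REPLACE TABLE f1_processed.circuits
    USING DELTA
    LOCATION 'abfss://processed@account.dfs.core.windows.net/circuits'
    AS SELECT * FROM v1
""")

# Check which location the table actually points at:
spark.sql("DESCRIBE DETAIL f1_processed.circuits").select("location").show(truncate=False)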
DApt
by New Contributor II
  • 8514 Views
  • 1 reply
  • 2 kudos

REDACTED_POSSIBLE_SECRET_ACCESS_KEY as part of column value resulting from aes_encrypt

Hi, I've encountered an error using base64/aes_encrypt: the saved string contains 'REDACTED_POSSIBLE_SECRET_ACCESS_KEY' at the end, destroying the original data and rendering it undecryptable. Is there a way to avoid this replacement in...

Latest Reply
DataEnthusiast1
New Contributor II
  • 2 kudos

I had the same issue, and my usage was similar to OP: base64(aes_encrypt(<clear_text>, unbase64(secret(<scope>, <key>)))). Databricks support suggested not calling secret within the insert/update operation that writes to the table. After updating the py...

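A sketch of the workaround described above, assuming hypothetical table and column names: resolve the secret once with dbutils outside the write, then pass it as a literal rather than calling secret() inside the insert/update:

from pyspark.sql import functions as F

# Resolved outside the DML that writes to the table; <scope>/<key> as in the thread.
key = dbutils.secrets.get(scope="<scope>", key="<key>")

df = spark.table("source_table")  # hypothetical source
encrypted = df.select(
    F.base64(
        F.aes_encrypt(F.col("clear_text"), F.unbase64(F.lit(key)))  # Spark 3.5+ function
    ).alias("cipher_b64")
)
encrypted.write.mode("append").saveAsTable("target_table")  # hypothetical target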
Dhruv-22
by New Contributor III
  • 3663 Views
  • 3 replies
  • 1 kudos

Resolved! REPLACE TABLE AS SELECT is not working with parquet whereas it works fine for delta

I am working on Azure Databricks, with Databricks Runtime version 14.3 LTS (includes Apache Spark 3.5.0, Scala 2.12). I am facing the following issue. Suppose I have a view named v1 and a database f1_processed created from the following comman...

Latest Reply
Ayushi_Suthar
Databricks Employee
  • 1 kudos

Hi @Dhruv-22, we understand that you are facing the following error when using REPLACE TABLE AS SELECT on the Parquet table, but at this moment the REPLACE TABLE AS SELECT operation you're trying to perform is not supported for Parquet tables. Accord...

2 More Replies
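Since REPLACE TABLE AS SELECT only works against Delta tables, a hedged workaround for the Parquet case is a plain drop-and-recreate (table name hypothetical, v1 as in the post; note this is not atomic, unlike the Delta operation):

spark.sql("DROP TABLE IF EXISTS f1_processed.results")
spark.sql("""
    CREATE TABLE f1_processed.results
    USING PARQUET
    AS SELECT * FROM v1
""")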
Kroy
by Contributor
  • 1483 Views
  • 2 replies
  • 0 kudos

Near Real time Solutioning on data from Core System which gets updated

We are trying to build a solution where customer data stored in an RDBMS (SQL Server) is moved to a Delta Lake in a medallion architecture, and we want this to be near real time using a DLT pipeline. The problem is that the source tab...

Latest Reply
Kroy
Contributor
  • 0 kudos

I came across this matrix while reading about DLT. What does "read from complete and write to incremental" mean?

1 More Replies
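On the "read from complete / write to incremental" question, a minimal DLT sketch contrasting the two read modes, with a hypothetical Auto Loader landing path: a streaming read processes only new input incrementally, while a batch read recomputes from the complete upstream table:

import dlt

@dlt.table
def bronze_customers():
    # Incremental: Auto Loader picks up only newly arrived files.
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/Volumes/main/landing/customers/")
    )

@dlt.table
def silver_customers():
    # Complete: a batch read of bronze, recomputed on each pipeline update.
    return dlt.read("bronze_customers").dropDuplicates(["customer_id"])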
billfoster
by New Contributor II
  • 16450 Views
  • 8 replies
  • 4 kudos

How can I learn Databricks

I am currently enrolled in a data engineering boot camp. We go over various technologies: Azure, PySpark, Airflow, Hadoop, NoSQL, SQL, Python. But not something like Databricks. I am in contact with lots of recent graduates who landed a job. Almo...

Latest Reply
Ali23
New Contributor II
  • 4 kudos

I'd be glad to help you on your journey to learning Databricks! Whether you're a beginner or aiming to advance your skills, here's a comprehensive guide: Foundations: Solid understanding of core concepts: Begin with foundational knowledge in big data,...

7 More Replies
John_Rotenstein
by New Contributor II
  • 7209 Views
  • 1 reply
  • 0 kudos

Resolved! "Run Job" without waiting for target job to finish?

We have configured a task in Job-A to run Job-B. However, the task in Job-A continues to 'run' until Job-B has completed. I can see this would be useful if we wanted to wait for Job-B and then perform another task, but we would actually like Job-A to e...

Latest Reply
feiyun0112
Honored Contributor
  • 0 kudos

You can create and run a job using the SDK: databricks-sdk-py/examples/jobs/run_now_jobs_api_full_integration.py at main · databricks/databricks-sdk-py (github.com)

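A short sketch against databricks-sdk-py, following the example linked above: jobs.run_now() only triggers the run and returns a waiter, so the caller is free to exit immediately; blocking happens only if .result() is called (the job ID is a placeholder):

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
waiter = w.jobs.run_now(job_id=123)  # returns as soon as the run is triggered
print(f"triggered run_id={waiter.run_id}")
# waiter.result()  # call this only if you DO want to wait for Job-B to finish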
DavidMooreZA
by New Contributor II
  • 1853 Views
  • 2 replies
  • 0 kudos

Structure Streaming - Table(s) to File(s) - Is it possible?

Hi, I'm trying to do something that's probably considered a no-no. The documentation makes me believe it should be possible, but I'm getting lots of weird errors when trying to make it work. If anyone has managed to get something similar to work, plea...

Latest Reply
DavidMooreZA
New Contributor II
  • 0 kudos

I created a new schema and volume specifically for this testing, and the names are all distinct from any other objects in the catalog. I did a quick double check just in case I somehow missed a duplicate, and there were none.

1 More Replies
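For reference, a table-to-files stream that is known to be supported looks roughly like this (table name and Volume paths are hypothetical; file sinks require a checkpoint location):

(spark.readStream
    .table("main.demo.events")          # source Delta table
    .writeStream
    .format("parquet")                  # file sink: parquet/json/csv
    .option("checkpointLocation", "/Volumes/main/demo/checkpoints/events")
    .option("path", "/Volumes/main/demo/exports/events")
    .trigger(availableNow=True)         # drain available data, then stop
    .start())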
pgagliardi
by New Contributor II
  • 1865 Views
  • 1 reply
  • 2 kudos

Latest pushed code is not taken into account by Notebook

Hello, I cloned a repo my_repo in the Databricks Repos space. Inside my_repo, I created a notebook new_experiment where I can import functions from my_repo, which is really handy. When I want to modify a function in my_repo, I open my local IDE, do the...

Latest Reply
Jnguyen
Databricks Employee
  • 2 kudos

Use %reload_ext autoreload instead; it will give you the behavior you expect. You just need to run it once: %load_ext autoreload, then %autoreload 2.

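For completeness, the notebook cell being described, using standard IPython magics (run once after attaching; %reload_ext autoreload re-arms the extension if it was already loaded):

%load_ext autoreload
%autoreload 2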
jcoggs
by New Contributor II
  • 3805 Views
  • 2 replies
  • 1 kudos

Handling Exceptions from dbutils.fs in Python

I have a notebook that calls dbutils.fs.ls() for some derived file path in Azure. Occasionally this path may not exist, and in general I can't always guarantee that the path exists. When the path doesn't exist it throws an "ExecutionError" which app...

Data Engineering
dbutils
Error
Exceptions
Latest Reply
Palash01
Valued Contributor
  • 1 kudos

Hey @jcoggs, the problem looks legit, though it never occurred to me, as I try to keep my mounts manually fed to the pipeline using parameters or a variable. By doing this you will have more control over your pipelines; see if you could do the same in your...

1 More Replies
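A common pattern for this, sketched with a placeholder path: catch the wrapped exception from dbutils.fs.ls and inspect the message for the underlying Java error, since the exact Python exception class can vary by runtime:

path = "abfss://container@account.dfs.core.windows.net/derived/path"
try:
    files = dbutils.fs.ls(path)
except Exception as e:
    if "java.io.FileNotFoundException" in str(e):
        files = []  # treat a missing path as empty
    else:
        raise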
tobyevans
by New Contributor II
  • 4576 Views
  • 0 replies
  • 1 kudos

Ingesting complex/unstructured data

Hi there, my company is reasonably new to using Databricks, and we're running our first PoCs. Some of the data we have is structured/reasonably structured, so it drops into a bucket, we point a notebook at it, and all is well and Delta. The problem is ari...

AndyM
by New Contributor II
  • 1230 Views
  • 1 reply
  • 0 kudos

Databricks import api with lakeview dashboard "error_code":"INVALID_PARAMETER_VALUE"

Hi Community! I was trying to use the import API to replicate a Lakeview dashboard in a workspace, but I keep bumping into an INVALID_PARAMETER_VALUE error. After spending some time getting the "content" property to a (probably) correct base64 string ...

Latest Reply
SergeRielau
Databricks Employee
  • 0 kudos

There may be more information in the log. Look for "Cannot parse the zip archive."

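A sketch of the import via databricks-sdk-py, under the assumption (worth verifying against the current docs) that a Lakeview dashboard is imported as base64 content at a .lvdash.json path with AUTO format; the file and workspace path are placeholders:

import base64
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.workspace import ImportFormat

w = WorkspaceClient()
with open("my_dashboard.lvdash.json", "rb") as f:
    content = base64.b64encode(f.read()).decode()

w.workspace.import_(
    path="/Workspace/Users/me@example.com/my_dashboard.lvdash.json",
    format=ImportFormat.AUTO,
    content=content,
    overwrite=True,
)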
zyang
by Contributor
  • 12799 Views
  • 12 replies
  • 13 kudos

Option "delta.columnMapping.mode","name" introduces unexpected result

Hi, I am trying to write and create a Delta table with "delta.columnMapping.mode" set to "name", partitioned by date. But I found that when I enable this option, the partition folder name is no longer the date; it is some random two letters. A...

Latest Reply
CkoockieMonster
New Contributor II
  • 13 kudos

Hello, I'm a bit late to the party, but I'll put this here for posterity: there's a way to rename your weird two-letter-named folders and still have your table working, but it violates the good-practice guidelines suggested by Databricks, and I don't thi...

11 More Replies
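A minimal repro sketch of the behavior in this thread (catalog/schema names hypothetical): once column mapping is set to "name", the physical partition directories get randomized names, and the logical-to-physical mapping lives in the Delta log rather than in the folder names:

spark.sql("""
    CREATE TABLE main.demo.mapped_table (id BIGINT, event_date DATE)
    USING DELTA
    PARTITIONED BY (event_date)
    TBLPROPERTIES ('delta.columnMapping.mode' = 'name')
""")
# Directories under the table location no longer look like event_date=2024-01-01;
# renaming them by hand breaks the mapping, as noted above.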
melbourne
by Contributor
  • 1232 Views
  • 2 replies
  • 1 kudos

Unable to write to Volume from DLT pipeline

Hi, I have a DLT pipeline running in Unity Catalog, and one of the tasks is to write content into a file within a volume. I was able to write to a file within the volume using just PySpark; however, when I do the same in DLT, I get an error: OSError: [Errno 30] Rea...

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

It seems like a permission error. Can you check if the managed identity has the correct permissions to write to the volume? DLT should support writing into volumes.

1 More Replies
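A quick way to separate the two failure modes mentioned above: try the same write outside DLT with plain Python (volume path hypothetical). If this also fails, the problem is the volume grant / managed identity rather than DLT itself:

# Run in a regular notebook on a UC-enabled cluster.
with open("/Volumes/main/demo/exports/smoke_test.txt", "w") as f:
    f.write("volume write ok")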
param_sen
by New Contributor II
  • 2535 Views
  • 1 reply
  • 0 kudos

What is the best practice for data model in silver layer in lakehouse

As per Databricks (https://www.databricks.com/glossary/medallion-architecture), the silver layer typically represents the "enterprise view", with improved quality over bronze (cleansed, deduplicated, augmented), and mostly has 3NF-like normalised data. The...

Latest Reply
Palash01
Valued Contributor
  • 0 kudos

Hey @param_sen, given your concerns about expensive joins and prioritizing analytics with flat raw data, here are some suggestions: analyze the most common queries and reports you anticipate. Do they heavily rely on joins across dimensions? If not, the...

