Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

vannipart
by New Contributor III
  • 422 Views
  • 2 replies
  • 0 kudos

Volumes unzip files

I have this shell snippet that I use to unzip files: %sh sudo apt-get update && sudo apt-get install -y p7zip-full. But when it comes to a new workspace, I get the error: sudo: a terminal is required to read the password; either use the -S option to read from standa...

Latest Reply
karthickrs
New Contributor II
  • 0 kudos

First, you can read the ZIP file in binary format [ spark.read.format("binaryFile") ], then use the zipfile Python package to extract all the files from the archive and store them in a Volume, as sketched below.
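A minimal sketch of that approach, assuming the ambient spark session of a notebook; the Volume paths are placeholders:

```python
import io
import zipfile

# Read the archive as raw bytes (binaryFile yields a `content` column).
zip_df = spark.read.format("binaryFile").load(
    "/Volumes/my_catalog/my_schema/landing/archive.zip"  # placeholder path
)
raw_bytes = zip_df.select("content").head()[0]

# Unzip in memory and extract every member into a Volume directory.
with zipfile.ZipFile(io.BytesIO(raw_bytes)) as zf:
    zf.extractall("/Volumes/my_catalog/my_schema/extracted/")  # placeholder path
```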

1 More Reply
mmceld1
by New Contributor II
  • 127 Views
  • 1 reply
  • 1 kudos

Resolved! Does Autoloader Detect New Records in a Snowflake Table or Only Work With Files?

The only thing I can find with autoloader is picking up new files, nothing about new records in an existing snowflake table.

Latest Reply
Alberto_Umana
Databricks Employee
  • 1 kudos

Hi @mmceld1, Autoloader is for cloud storage files. For tables, you can achieve similar functionality by using Delta Lake and its capabilities for handling slowly changing dimensions (SCD Type 2) and change data capture (CDC), as sketched below.
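For illustration, a minimal sketch of consuming row-level changes from a Delta table with Change Data Feed; the table name is a placeholder, and CDF must be enabled on the table first:

```python
# One-time: enable Change Data Feed on the table (placeholder name).
spark.sql("""
    ALTER TABLE my_catalog.my_schema.orders
    SET TBLPROPERTIES (delta.enableChangeDataFeed = true)
""")

# Stream inserts, updates, and deletes as they land in the table.
changes = (
    spark.readStream.format("delta")
    .option("readChangeFeed", "true")
    .table("my_catalog.my_schema.orders")
)
```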

lauraxyz
by Contributor
  • 141 Views
  • 1 reply
  • 1 kudos

Resolved! Programmatically edit notebook

I have a job that moves a notebook from a Volume to the workspace, then executes it with dbutils.notebook.run(). Instead of directly running the notebook, I want to append some logic (i.e. save results to a certain table) at the end of the notebook. Is there a su...

Latest Reply
Alberto_Umana
Databricks Employee
  • 1 kudos

Hi @lauraxyz, Currently, there is no built-in feature in Databricks that directly supports appending logic to a notebook before execution, so treating the notebook as a regular file and modifying its content is a practical solution.
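A rough sketch of that file-based approach using the Databricks Python SDK; the notebook path and the appended cell are hypothetical:

```python
import base64

from databricks.sdk import WorkspaceClient
from databricks.sdk.service.workspace import ExportFormat, ImportFormat, Language

w = WorkspaceClient()
nb_path = "/Users/someone@example.com/my_notebook"  # placeholder path

# Export the notebook source, append a final cell, and re-import it.
exported = w.workspace.export(nb_path, format=ExportFormat.SOURCE)
source = base64.b64decode(exported.content).decode("utf-8")
source += (
    "\n# COMMAND ----------\n"
    "results_df.write.mode('append').saveAsTable('main.default.results')\n"
)
w.workspace.import_(
    path=nb_path,
    format=ImportFormat.SOURCE,
    language=Language.PYTHON,
    content=base64.b64encode(source.encode("utf-8")).decode("utf-8"),
    overwrite=True,
)
```

After the import, dbutils.notebook.run() would pick up the modified source.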

ossoul
by New Contributor
  • 503 Views
  • 1 reply
  • 0 kudos

Not able to see Spark application in Spark History Server using cluster event logs

I'm encountering an issue with incomplete Spark event logs. When I run a local Spark History Server using the cluster logs, my application appears as "incomplete". Sometimes I also see a few queries listed as still running, even though the appl...

Latest Reply
VZLA
Databricks Employee
  • 0 kudos

Thanks for your question! I believe Databricks has its own SHS implementation, so it's not expected to work with the vanilla SHS. Regarding the queries marked as still running, we can also find this when there are event logs which were not properly c...

martindlarsson
by New Contributor III
  • 336 Views
  • 2 replies
  • 0 kudos

Jobs pending indefinitely with library install

I think I found a bug where jobs stay Pending indefinitely when they have a library requirement and the user of the job does not have Manage permission on the cluster. In my case I was trying to start a dbt job with dbt-databricks=1.8.5 as a library. Th...

Latest Reply
VZLA
Databricks Employee
  • 0 kudos

Thanks for your feedback! Just checking: is this still an issue for you? Could you share more details, for example so that I can try to reproduce it?

1 More Reply
ashraf1395
by Valued Contributor
  • 351 Views
  • 1 reply
  • 0 kudos

Schema issue while fetching data from Oracle

I don't have the complete context of the issue, but here is what I know; a friend of mine is facing this: "I am fetching data from Oracle in Databricks using Python. But every time I do it the schema changes, so if the column is of type decimal f...

Latest Reply
VZLA
Databricks Employee
  • 0 kudos

Thanks for your question! To address schema issues when fetching Oracle data in Databricks, use JDBC schema inference to define data types programmatically or batch-cast columns dynamically after loading. For performance, enable predicate pushdown and...
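As one illustration of pinning types at read time, Spark's JDBC source accepts a customSchema option; the connection details and column names below are placeholders:

```python
# Pin column types on the JDBC read so decimals don't drift between runs.
df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB1")  # placeholder
    .option("dbtable", "MYSCHEMA.ORDERS")                       # placeholder
    .option("user", "my_user")
    .option("password", "my_password")
    .option("customSchema", "ORDER_ID DECIMAL(38,0), AMOUNT DECIMAL(38,10)")
    .load()
)
```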

chris_b
by New Contributor
  • 333 Views
  • 1 reply
  • 0 kudos

Increase Stack Size for Python Subprocess

I need to increase the stack size (from the default of 16384) to run a subprocess that requires a larger stack size. I tried following this: https://community.databricks.com/t5/data-engineering/increase-stack-size-databricks/td-p/71492 And this: https:...

Latest Reply
VZLA
Databricks Employee
  • 0 kudos

Thanks for your question! Are you referring to a Java stack size (-Xss) or a Python subprocess (ulimit -s)?
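If it is the Python subprocess case, a minimal sketch of raising the soft stack limit in the child before it executes; the binary path is a placeholder:

```python
import resource
import subprocess

def raise_stack_limit():
    # Runs in the child process before exec: lift the soft limit to the hard limit.
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
    resource.setrlimit(resource.RLIMIT_STACK, (hard, hard))

subprocess.run(["./my_binary"], preexec_fn=raise_stack_limit, check=True)
```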

upatint07
by New Contributor II
  • 325 Views
  • 1 reply
  • 0 kudos

Facing Issue in "import dlt" using Databricks Runtime 14.3 LTS version

Facing issues while importing the dlt library on Databricks Runtime 14.3 LTS. Previously, on Runtime 13.1, `import dlt` was working fine, but after updating the runtime version to 14.3 LTS it gives an error.

Latest Reply
VZLA
Databricks Employee
  • 0 kudos

Thanks for your question! Unfortunately, this is actually a known limitation with Spark Connect clusters.

CURIOUS_DE
by New Contributor III
  • 350 Views
  • 1 reply
  • 1 kudos

A Surprising Finding in Delta Live Tables

While DLT has some powerful features, I found myself doing a double-take when I realized it doesn’t natively support hard deletes. Instead, it leans on a delete flag identifier to manage these in the source table. A bit surprising for a tool of its c...

Latest Reply
VZLA
Databricks Employee
  • 1 kudos

Thanks for your feedback! I believe Delta Live Tables (DLT) does not natively support hard deletes and instead uses a delete flag identifier to manage deletions, a design choice rooted in ensuring compliance with regulations like GDPR and CCPA. Thi...
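For reference, a sketch of how such a delete flag is consumed by DLT's apply_changes CDC API; the table, key, and column names are placeholders:

```python
import dlt
from pyspark.sql.functions import col, expr

dlt.create_streaming_table("customers")

dlt.apply_changes(
    target="customers",
    source="customers_cdc_feed",  # placeholder CDC source
    keys=["customer_id"],
    sequence_by=col("event_ts"),
    # Rows flagged as deletes in the feed are applied as deletes on the target.
    apply_as_deletes=expr("operation = 'DELETE'"),
)
```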

dener
by New Contributor
  • 365 Views
  • 1 reply
  • 0 kudos

Infinite load execution

I am experiencing performance issues when loading a table with 50 million rows into Delta Lake on AWS using Databricks. Despite successfully handling other, larger tables, this specific table/process takes hours and doesn't finish. Here's the command...

Latest Reply
VZLA
Databricks Employee
  • 0 kudos

Thank you for your question! To optimize your Delta Lake write process:
  • Disable overhead options: avoid overwriteSchema and mergeSchema unless necessary. Use: df.write.format("delta").mode("overwrite").save(sink)
  • Increase parallelism: use repartition...
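Putting those two suggestions together, a small sketch; the partition count and sink path are placeholders to tune for your data:

```python
# Plain overwrite without schema options, with an explicit repartition
# to spread the write across more tasks.
(
    df.repartition(200)  # placeholder partition count
    .write.format("delta")
    .mode("overwrite")
    .save("s3://my-bucket/delta/my_table")  # placeholder sink
)
```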

alexgavrysh
by New Contributor
  • 254 Views
  • 1 reply
  • 0 kudos

Alert when a scheduled job run does not start

Hello, I have a job that should run every six hours. I need to set up an alert for the case where it doesn't start (for example, someone paused it). How do I configure such an alert using Databricks native alerts? Theoretically, this may be done using s...

Latest Reply
VZLA
Databricks Employee
  • 0 kudos

Thank you for your question! Here’s a concise workflow to set up an alert for missed job runs in Databricks:
  • Write a query: use system tables to identify jobs that haven’t started on time.
  • Save the query: save this query in Databricks SQL as a named q...
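A hypothetical version of such a query over the lakeflow jobs system tables; the exact columns may differ in your workspace:

```python
# Flag jobs whose latest run started more than six hours ago (or never ran).
late_jobs = spark.sql("""
    SELECT j.name, MAX(r.period_start_time) AS last_start
    FROM system.lakeflow.jobs AS j
    LEFT JOIN system.lakeflow.job_run_timeline AS r
      ON j.job_id = r.job_id
    GROUP BY j.name
    HAVING MAX(r.period_start_time) IS NULL
        OR MAX(r.period_start_time) < current_timestamp() - INTERVAL 6 HOURS
""")
display(late_jobs)
```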

Thor
by New Contributor III
  • 314 Views
  • 1 reply
  • 0 kudos

Native code in Databricks clusters

Is it possible to install our own binaries (lib or exec) on Databricks clusters and use JNI to execute them? I guess that Photon is native code, as far as I could read, so it must use a similar technique.

Latest Reply
VZLA
Databricks Employee
  • 0 kudos

Thanks for your question! I believe it should be possible, although Photon itself is not extensible by users. Are you currently facing any issues while installing your own libraries and using JNI to execute them?
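As a rough illustration from Python (using ctypes rather than JNI), calling a custom native library installed on the cluster, e.g. through an init script; the path and symbol are placeholders:

```python
import ctypes

# Load a shared library shipped to the cluster and call an exported symbol.
lib = ctypes.CDLL("/usr/local/lib/libmy_native.so")  # placeholder path
lib.my_function.restype = ctypes.c_int
print(lib.my_function())
```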

ed_carv
by New Contributor
  • 357 Views
  • 1 reply
  • 0 kudos

Databricks S3 Commit Service

Is the Databricks S3 Commit Service enabled by default if Unity Catalog is not enabled and the compute resources run in our AWS account (classic compute plane)? If not, how can it be enabled? This service seems to resolve the limitations with multi-cluste...

Latest Reply
VZLA
Databricks Employee
  • 0 kudos

No, the Databricks S3 commit service is not guaranteed to be enabled by default in the AWS classic compute plane; the configuration may vary based on your specific workspace setup. How can it be enabled? To enable the Databricks S3 commit service, ...

MathewDRitch
by New Contributor II
  • 2049 Views
  • 4 replies
  • 1 kudos

Connecting from Databricks to Network Path

Hi all, I will appreciate it if someone can help me with some reference links on connecting from Databricks to an external network path. I have Databricks on AWS and previously used to connect to files on an external network path using the Mount method. Now Databri...

Latest Reply
faisaljaved1
New Contributor II
  • 1 kudos

Were you able to access the external network path? I have a similar requirement where I have to write data from Databricks to a network location where other external systems also access and update that file. I am new to Databricks and have spent consider...

3 More Replies
David_Billa
by New Contributor III
  • 349 Views
  • 8 replies
  • 3 kudos

Extract datetime value from the file name

I have the filename below, and I want to extract the datetime value and convert it to a datetime data type: This_is_new_file_2024_12_06T11_00_49_AM.csv. Here I want to extract only '2024_12_06T11_00_49' and convert it to a datetime value in a new field. I tried S...

Latest Reply
Walter_C
Databricks Employee
  • 3 kudos

Unfortunately I am not able to make it work with SQL functions
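For what it's worth, a PySpark sketch of one possible approach, assuming the filename pattern shown above:

```python
from pyspark.sql.functions import col, regexp_extract, to_timestamp

df = spark.createDataFrame(
    [("This_is_new_file_2024_12_06T11_00_49_AM.csv",)], ["filename"]
)

# Pull out the yyyy_MM_dd'T'HH_mm_ss fragment, then parse it as a timestamp.
df = (
    df.withColumn(
        "dt_str",
        regexp_extract(col("filename"), r"(\d{4}_\d{2}_\d{2}T\d{2}_\d{2}_\d{2})", 1),
    )
    .withColumn("dt", to_timestamp(col("dt_str"), "yyyy_MM_dd'T'HH_mm_ss"))
)
```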

7 More Replies
