Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

Data_Engineer3
by Contributor III
  • 2166 Views
  • 4 replies
  • 8 kudos

Resolved! Getting error popup in Databricks

After I migrated to a new Databricks workspace, I am getting an error popup message continuously, and the indentation settings I changed keep reverting to other values with every new login.

Latest Reply
Sivagurunathann
New Contributor II
  • 8 kudos

Hi, I am facing this issue too: session-expired pop-ups appear frequently, every 3 minutes, once I start working in Databricks.

3 More Replies
amoralca
by New Contributor
  • 452 Views
  • 3 replies
  • 0 kudos

Exploring the Use of Databricks as a Transactional Database

Hey everyone, I’m currently working on a project where my team is thinking about using Databricks as a transactional database for our backend application. We're familiar with Databricks for analytics and big data processing, but we're not sure if it’...

Latest Reply
Edthehead
Contributor
  • 0 kudos

My 2 cents: the Databricks Lakehouse is like a DWH, similar to an Azure Synapse dedicated pool, and meant for a certain purpose. With all that power comes a limitation in concurrency and the number of queries that can run in parallel. So, it's great if y...

2 More Replies
ggsmith
by New Contributor III
  • 197 Views
  • 1 reply
  • 0 kudos

Resolved! Question: Decrypt many files with UDF

I have around 20 pgp files in a folder in my volume that I need to decrypt. I have a decryption function that accepts a file name and writes the decrypted file to a new folder in the same volume. I had thought I could create a spark dataframe with th...

Latest Reply
Brahmareddy
Contributor III
  • 0 kudos

Spark's error happens because the worker nodes can't access your local files. Instead of using Spark to decrypt, try doing it outside of Spark using Python's multiprocessing or a simple batch script for parallel processing. Another option is to move ...
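
A minimal sketch of the multiprocessing route, assuming the decryption function from the question already exists (the paths and pool size here are illustrative):

import multiprocessing
from pathlib import Path

def decrypt_file(file_name: str) -> None:
    # Placeholder for the existing decryption function from the question:
    # reads the PGP file and writes the decrypted output to a new folder.
    ...

if __name__ == "__main__":
    # Illustrative volume path; replace with your actual folder.
    src = Path("/Volumes/main/default/my_volume/encrypted")
    files = [str(p) for p in src.glob("*.pgp")]
    # Roughly 20 files, so a small driver-side pool is enough; no Spark involved.
    with multiprocessing.Pool(processes=4) as pool:
        pool.map(decrypt_file, files)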

ramdasp1
by New Contributor
  • 319 Views
  • 4 replies
  • 2 kudos

Delta Table Properties

Hi, when I look at the properties of a Delta table I see these two properties set to a value of 1. I went through the manual for these properties, and this is what the manual says. delta.minReaderVersion: The minimum required protocol reader ver...

Latest Reply
Brahmareddy
Contributor III
  • 2 kudos

Hi Ramdas1, let me explain in simple terms with an example. A Delta table is like a special book for data. delta.minReaderVersion is the minimum version of the "reader" you need to open and read the book, while delta.minWriterVersion is the minimu...
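
For reference, a quick way to inspect those two properties on a table (the table name is illustrative):

# Inspect a Delta table's protocol versions; replace the table name with yours.
spark.sql("SHOW TBLPROPERTIES main.default.my_table").show(truncate=False)

# DESCRIBE DETAIL also exposes them as columns.
spark.sql("DESCRIBE DETAIL main.default.my_table") \
    .select("minReaderVersion", "minWriterVersion") \
    .show()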

3 More Replies
Megan05
by New Contributor III
  • 2646 Views
  • 5 replies
  • 4 kudos

Resolved! Out of Memory/Connection Lost When Writing to External SQL Server from Databricks Using JDBC Connection

I am working on writing a large amount of data from Databricks to an external SQL Server using a JDBC connection. I keep getting timeout errors/connection lost, but digging deeper it appears to be a memory problem. I am wondering what cluster configura...
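
A hedged sketch of the kind of JDBC write tuning that usually helps in this situation; all names and option values below are illustrative, not a verified fix:

# Write in more, smaller tasks and batch the inserts so no single
# executor buffers too much data at once.
(df.repartition(16)
   .write
   .format("jdbc")
   .option("url", "jdbc:sqlserver://myserver.example.com:1433;database=mydb")
   .option("dbtable", "dbo.target_table")
   .option("user", "my_user")
   .option("password", "my_password")
   .option("batchsize", 10000)  # rows per JDBC batch insert
   .mode("append")
   .save())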

Latest Reply
hotrabattecom
New Contributor II
  • 4 kudos

Thanks for the answer. I am also running into this problem. Hotrabatt

4 More Replies
yvishal519
by Contributor
  • 763 Views
  • 5 replies
  • 0 kudos

Resolved! Implementing Full Load Strategy with Delta Live Tables and Unity Catalog

Hello Databricks Community, I am seeking guidance on handling full load scenarios with Delta Live Tables (DLT) and Unity Catalog. Here’s the situation I’m dealing with: We have a data folder in Azure Data Lake Storage (ADLS) where we use Auto Loader to...

Latest Reply
yvishal519
Contributor
  • 0 kudos

To efficiently manage full data loads, we can leverage a regex pattern to dynamically identify the latest data folders within our bronze layer. These folders typically contain the most recent data updates for our tables. By using a Python script, we ...
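
A minimal sketch of that regex-based folder selection, assuming date-stamped folder names and using illustrative paths:

import re

# Illustrative ADLS path; folders assumed to be named like .../my_table/20240801/.
base = "abfss://container@account.dfs.core.windows.net/bronze/my_table/"
folders = [f.path for f in dbutils.fs.ls(base)]

date_re = re.compile(r"/(\d{8})/?$")
dated = [(m.group(1), p) for p in folders if (m := date_re.search(p))]

# The lexically largest yyyymmdd stamp is the most recent folder.
latest_folder = max(dated)[1]
print(latest_folder)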

4 More Replies
HoussemBL
by New Contributor
  • 147 Views
  • 0 replies
  • 0 kudos

Databricks Asset Bundle deploy failure

Hello, I have successfully deployed a Databricks Job that contains one task of type DLT using a Databricks Asset Bundle. The first deployment works well. For this particular Databricks job, I clicked on "disconnect from source" to do some customization....

novytskyi
by New Contributor
  • 116 Views
  • 0 replies
  • 0 kudos

Timeout for dbutils.jobs.taskValues.set(key, value)

I have a job that calls a notebook with the dbutils.jobs.taskValues.set(key, value) method and assigns around 20 parameters. When I run it, it works. But when I try to run 2 or more copies of the job with different parameters, it fails with an error on differen...
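
For context, this is the task-values API in question; a notebook task sets values and a downstream task reads them back (the task and key names here are illustrative):

# In the upstream notebook task: publish a value for downstream tasks.
dbutils.jobs.taskValues.set(key="row_count", value=20)

# In a downstream task: read it back. default is used if the key is absent,
# debugValue when the notebook runs outside a job.
n = dbutils.jobs.taskValues.get(
    taskKey="upstream_task",
    key="row_count",
    default=0,
    debugValue=0,
)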

mrcity
by New Contributor II
  • 1786 Views
  • 3 replies
  • 1 kudos

Exclude absent lookup keys from dataframes made by create_training_set()

I've got data stored in feature tables, plus in a data lake. The feature tables are expected to lag the data lake by at least a little bit. I want to filter data coming out of the feature store by querying the data lake for lookup keys out of my inde...

Latest Reply
Quinten
New Contributor II
  • 1 kudos

I'm facing the same issue as described by @mrcity. There is no easy way to alter the dataframe, which is created inside the score_batch() function. Filtering out rows in the (sklearn) pipeline itself is also not convenient since these transformers ar...
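
One workaround is to drop the absent lookup keys before the feature store is involved, e.g. ahead of create_training_set; a sketch with assumed table and column names, and not a fix for what score_batch() builds internally:

from databricks.feature_store import FeatureStoreClient, FeatureLookup

fs = FeatureStoreClient()

# Keep only lookup keys that already exist in the feature table (names illustrative).
keys_df = spark.table("lake.events").select("customer_id").distinct()
present = spark.table("feature_db.customer_features").select("customer_id")
keys_df = keys_df.join(present, "customer_id", "left_semi")

training_set = fs.create_training_set(
    df=keys_df,
    feature_lookups=[FeatureLookup(
        table_name="feature_db.customer_features",
        lookup_key="customer_id",
    )],
    label=None,
)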

2 More Replies
Prashanth24
by New Contributor III
  • 218 Views
  • 1 reply
  • 0 kudos

Databricks Liquid Clustering

Liquid Clustering is a combination of partitioning and Z-ordering. As we know, partitioning creates folders based on column values and stores similar values together. I believe Liquid Clustering will not create folders like partitioning, so how it w...

Latest Reply
youssefmrini
Honored Contributor III
  • 0 kudos

Why do you think Liquid Clustering uses Z-ordering? I recommend you read the design paper: https://docs.google.com/document/d/1FWR3odjOw4v4-hjFy_hVaNdxHVs4WuK1asfB6M6XEMw/edit#heading=h.skpz7c7ga1wl
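
For reference, liquid clustering is declared with CLUSTER BY on the table itself, so no partition folders are created; files are reorganized incrementally by OPTIMIZE instead (the table name is illustrative):

# Create a liquid-clustered table; data files are not laid out in folders.
spark.sql("""
    CREATE TABLE main.default.events (
        event_date DATE,
        user_id BIGINT
    )
    CLUSTER BY (event_date)
""")

# OPTIMIZE incrementally reclusters files by the clustering keys.
spark.sql("OPTIMIZE main.default.events")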

js54123875
by New Contributor III
  • 406 Views
  • 3 replies
  • 2 kudos

Azure Document Intelligence

Azure AI Document Intelligence | Microsoft Azure
Does anyone have experience ingesting outputs from Azure Document Intelligence and/or know of some guides on how best to ingest this data? Specifically we are looking to ingest tax form data that has be...

Latest Reply
Kaniz_Fatma
Community Manager
  • 2 kudos

Hi @js54123875, Thanks for reaching out! Please review the response and let us know if it answers your question. Your feedback is valuable to us and the community. If the response resolves your issue, kindly mark it as the accepted solution. This wil...

2 More Replies
supportvector
by New Contributor II
  • 553 Views
  • 1 reply
  • 1 kudos

Failed to start isolated execution environment

Hi, I'm using PySpark to convert images to base64 format. The code works perfectly fine when I run it from any location on the cluster. However, when the notebook is part of a GitHub repo hosted on Databricks, I get the following error: [ISOLATION_START...

Latest Reply
supportvector
New Contributor II
  • 1 kudos

Below is the base64 conversion code that I am using:

import base64

def image_to_base64(image_path):
    try:
        with open(image_path, "rb") as image_file:
            return base64.b64encode(image_file.read()).decode("utf-8")
    except Exception as e:
        return str(e)

Maatari
by New Contributor III
  • 234 Views
  • 2 replies
  • 0 kudos

Resolved! Pre-partitioning a Delta table to reduce shuffling of a wide operation

Assuming I need to perform a groupby, i.e. an aggregation, on a dataset stored in a Delta table: if the Delta table is partitioned by the field by which to group, can that have an impact on the shuffling that the groupby would normally cause? As a connecte...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Maatari, Thanks for reaching out! Please review the responses and let us know which best addresses your question. Your feedback is valuable to us and the community. If the response resolves your issue, kindly mark it as the accepted solution. T...

1 More Replies

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group