Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

ramdasp1
by New Contributor
  • 2271 Views
  • 4 replies
  • 2 kudos

Delta Table Properties

Hi. When I look at the properties of a Delta table, I see these two properties, both set to a value of 1. I went through the manual for these properties, and this is what it says. delta.minReaderVersion: The minimum required protocol reader ver...

Latest Reply
Brahmareddy
Esteemed Contributor II
  • 2 kudos

Hi Ramdas1, let me explain in simple terms with an example. A Delta table is like a special book for data. delta.minReaderVersion is the minimum version of the "reader" you need to open and read the book, while delta.minWriterVersion is the minimum...
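The "book" analogy above can be sketched as a simple version check. This is only an illustration, assuming a client is described by two plain integers; the names `TableProtocol`, `can_read`, and `can_write` are hypothetical and not part of any Delta library, and the sketch ignores newer protocol details such as table features.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableProtocol:
    """Protocol versions stored in the table properties (hypothetical holder)."""
    min_reader_version: int  # value of delta.minReaderVersion
    min_writer_version: int  # value of delta.minWriterVersion


def can_read(client_reader_version: int, protocol: TableProtocol) -> bool:
    # A client can read the table only if it implements at least the
    # reader protocol version the table requires.
    return client_reader_version >= protocol.min_reader_version


def can_write(client_writer_version: int, protocol: TableProtocol) -> bool:
    # Likewise for writes, compared against the writer protocol version.
    return client_writer_version >= protocol.min_writer_version


# Both properties set to 1, as in the question.
table = TableProtocol(min_reader_version=1, min_writer_version=1)
readable = can_read(1, table)
writable = can_write(1, table)
```

With both properties set to 1, any client that implements version 1 or later of each protocol passes both checks.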

3 More Replies
Megan05
by New Contributor III
  • 6446 Views
  • 5 replies
  • 4 kudos

Resolved! Out of Memory/Connection Lost When Writing to External SQL Server from Databricks Using JDBC Connection

I am working on writing a large amount of data from Databricks to an external SQL Server using a JDBC connection. I keep getting timeout/connection-lost errors, but digging deeper, it appears to be a memory problem. I am wondering what cluster configura...

Latest Reply
hotrabattecom
New Contributor II
  • 4 kudos

Thanks for the answer. I am also running into this problem. Hotrabatt

4 More Replies
mrcity
by Databricks Partner
  • 4328 Views
  • 3 replies
  • 1 kudos

Exclude absent lookup keys from dataframes made by create_training_set()

I've got data stored in feature tables, plus in a data lake. The feature tables are expected to lag the data lake by at least a little bit. I want to filter data coming out of the feature store by querying the data lake for lookup keys out of my inde...

Latest Reply
Quinten
Databricks Partner
  • 1 kudos

I'm facing the same issue as described by @mrcity. There is no easy way to alter the dataframe, which is created inside the score_batch() function. Filtering out rows in the (sklearn) pipeline itself is also not convenient since these transformers ar...

2 More Replies
Prashanth24
by New Contributor III
  • 1871 Views
  • 1 replies
  • 0 kudos

Databricks Liquid Clustering

Liquid Clustering is a combination of partitioning and Z-ordering. As we know, partitioning creates folders based on the column values and stores similar values together. I believe Liquid Clustering will not create folders like partitioning, so how it w...

Latest Reply
youssefmrini
Databricks Employee
  • 0 kudos

Why do you think Liquid Clustering uses Z-ordering? I recommend you read the design paper: https://docs.google.com/document/d/1FWR3odjOw4v4-hjFy_hVaNdxHVs4WuK1asfB6M6XEMw/edit#heading=h.skpz7c7ga1wl

js54123875
by New Contributor III
  • 7177 Views
  • 3 replies
  • 2 kudos

Azure Document Intelligence

Azure AI Document Intelligence | Microsoft Azure
Does anyone have experience ingesting outputs from Azure Document Intelligence and/or know of some guides on how best to ingest this data? Specifically we are looking to ingest tax form data that has be...

Latest Reply
Retired_mod
Esteemed Contributor III
  • 2 kudos

Hi @js54123875, Thanks for reaching out! Please review the response and let us know if it answers your question. Your feedback is valuable to us and the community. If the response resolves your issue, kindly mark it as the accepted solution. This wil...

2 More Replies
supportvector
by New Contributor II
  • 2236 Views
  • 1 replies
  • 1 kudos

Failed to start isolated execution environment

Hi, I'm using PySpark to convert images to base64 format. The code works perfectly fine when I run it from any location on the cluster. However, when the notebook is part of a GitHub repo hosted on Databricks, I get the following error: [ISOLATION_START...

Latest Reply
supportvector
New Contributor II
  • 1 kudos

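Below is the base64 conversion code that I am using:

    import base64

    def image_to_base64(image_path):
        # Read the image as raw bytes and return it as a base64-encoded string.
        try:
            with open(image_path, "rb") as image_file:
                return base64.b64encode(image_file.read()).decode("utf-8")
        except Exception as e:
            # On failure, the error message is returned in place of the encoded data.
            return str(e)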

Maatari
by New Contributor III
  • 2047 Views
  • 2 replies
  • 0 kudos

Resolved! Pre-Partitioning a Delta table to reduce shuffling of wide operations

Assume I need to perform a groupby, i.e. an aggregation, on a dataset stored in a Delta table. If the Delta table is partitioned by the field to group by, can that have an impact on the shuffling that the groupby would normally cause? As a connecte...

Latest Reply
Retired_mod
Esteemed Contributor III
  • 0 kudos

Hi @Maatari, Thanks for reaching out! Please review the responses and let us know which best addresses your question. Your feedback is valuable to us and the community.   If the response resolves your issue, kindly mark it as the accepted solution. T...

1 More Replies
Prashanth24
by New Contributor III
  • 4857 Views
  • 5 replies
  • 4 kudos

Databricks Worker node - Would like to know the amount of memory per core

Under Databricks Compute and Worker nodes, we find different node types, as below:
Standard_D4ds_v5 => 16 GB Memory, 4 Cores
Standard_D8ds_v5 => 32 GB Memory, 8 Cores
In Databricks, each node will have one executor. I have questions below:
(1) How much ...

Latest Reply
Retired_mod
Esteemed Contributor III
  • 4 kudos

Hi @Prashanth24, Thanks for reaching out! Please review the responses and let us know which best addresses your question. Your feedback is valuable to us and the community.   If the response resolves your issue, kindly mark it as the accepted solutio...

4 More Replies
koantek_user
by Databricks Partner
  • 2269 Views
  • 2 replies
  • 0 kudos

geometric functions in databricks

Hi All, we are working on a migration project from Snowflake to Databricks, and there are some scripts that utilize geometric functions like st_makepoint and st_geohash from Snowflake, which we need to convert to Databricks. Has someone encountered this ...

Latest Reply
Retired_mod
Esteemed Contributor III
  • 0 kudos

Hi @koantek_user, Thanks for reaching out! Please review the response and let us know if it answers your question. Your feedback is valuable to us and the community. If the response resolves your issue, kindly mark it as the accepted solution. This w...

1 More Replies
rameshybr
by Databricks Partner
  • 2570 Views
  • 2 replies
  • 1 kudos

Resolved! Databricks workflow - Executing the workflow concurrently with different input parameters

How can I trigger a workflow concurrently (multiple times) with different input parameters? Please share your thoughts or any related articles.
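One possible approach is to submit one run request per parameter set. The sketch below only builds the request bodies and sends nothing. It assumes the Jobs API 2.1 `run-now` endpoint with its `job_id` and `job_parameters` fields, and it assumes the job's maximum concurrent runs setting is high enough to allow the runs to overlap. The job ID, parameter names, and the helper `build_run_now_payloads` are hypothetical.

```python
import json


def build_run_now_payloads(job_id, parameter_sets):
    """Build one run-now request body per parameter set (nothing is sent)."""
    return [
        {"job_id": job_id, "job_parameters": dict(params)}
        for params in parameter_sets
    ]


# Hypothetical job ID and parameters, for illustration only.
payloads = build_run_now_payloads(
    job_id=123,
    parameter_sets=[
        {"region": "emea", "run_date": "2024-01-01"},
        {"region": "apac", "run_date": "2024-01-01"},
    ],
)

# Each body would be POSTed separately to the workspace's run-now endpoint.
bodies = [json.dumps(p) for p in payloads]
```

Each request starts its own run, so the runs can proceed side by side as long as the job allows that many concurrent runs.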

Latest Reply
Retired_mod
Esteemed Contributor III
  • 1 kudos

Hi @rameshybr, Thanks for reaching out! Please review the response and let us know if it answers your question. Your feedback is valuable to us and the community. If the response resolves your issue, kindly mark it as the accepted solution. This will...

1 More Replies
koantek_user
by Databricks Partner
  • 7594 Views
  • 2 replies
  • 0 kudos

lateral view explode in databricks - need help

We are working on a Snowflake to Databricks migration, and we encountered Snowflake's lateral flatten function, which we tried to convert to lateral view explode in Databricks, but its output is a subset of what lateral flatten returns. ----------------------- http...

Latest Reply
Retired_mod
Esteemed Contributor III
  • 0 kudos

Hi @koantek_user, Thanks for reaching out! Please review the response and let us know if it answers your question. Your feedback is valuable to us and the community. If the response resolves your issue, kindly mark it as the accepted solution. This w...

1 More Replies
brickster_2018
by Databricks Employee
  • 6678 Views
  • 2 replies
  • 0 kudos

Resolved! Is Spark Driver a synonym for Spark Master daemon

If I understand correctly, the Spark driver is a master process. Is it the same as the Spark Master? I get confused between the Spark Master and the Spark driver.

Latest Reply
brickster_2018
Databricks Employee
  • 0 kudos

This is a common misconception. The Spark Master and the Spark driver are two independent and isolated JVMs running on the same instance. The Spark Master's responsibilities are to ensure the Spark worker daemons are up and running and to monitor their health. Als...

1 More Replies
Rishabh-Pandey
by Databricks MVP
  • 14054 Views
  • 1 replies
  • 3 kudos

Key Advantages of Serverless Compute in Databricks

Serverless compute in Databricks offers several advantages, enhancing efficiency, scalability, and ease of use. Here are some key benefits:
1. Simplified Infrastructure Management
No Server Management: Users don't need to manage or configure servers or...

Latest Reply
Ashu24
Databricks Partner
  • 3 kudos

Thanks for the clear understanding 

Pritam
by New Contributor II
  • 7840 Views
  • 4 replies
  • 1 kudos

Not able to create a job via the Jobs API in Databricks

I am not able to create jobs via the Jobs API in Databricks. Error=INVALID_PARAMETER_VALUE: Job settings must be specified. I simply copied the JSON file and saved it. I loaded the same JSON file and tried to create the job via the API, but got the above erro...

Latest Reply
rAlex
New Contributor III
  • 1 kudos

@Pritam Arya I had the same problem today. To use the JSON that the GUI shows for an existing job in a request to the Jobs API, send only the JSON that is the value of the settings key.
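A minimal sketch of that fix, assuming the JSON copied from the GUI wraps the job definition in a top-level `settings` key next to metadata such as `job_id`. The sample document and the helper name `to_create_payload` are hypothetical, and no request is actually sent.

```python
import json


def to_create_payload(gui_json: str) -> dict:
    """Return only the value of the "settings" key from the GUI's job JSON."""
    job = json.loads(gui_json)
    if "settings" not in job:
        raise KeyError('expected a top-level "settings" key in the job JSON')
    return job["settings"]


# Hypothetical, abbreviated example of what the GUI might export.
gui_export = json.dumps({
    "job_id": 123,
    "creator_user_name": "someone@example.com",
    "settings": {
        "name": "example-job",
        "max_concurrent_runs": 1,
    },
})

# This dictionary, not the full export, would be the body of the create request.
create_payload = to_create_payload(gui_export)
```

Sending the whole export instead of `create_payload` leaves the job settings one level too deep, which matches the "Job settings must be specified" error described in the question.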

3 More Replies