Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Manish1231
by New Contributor
  • 3372 Views
  • 0 replies
  • 0 kudos

How to migrate feature tables from an Azure Databricks workspace to GCP

I’m in the process of migrating feature tables from Azure Databricks to GCP Databricks and am having trouble listing all feature tables from Azure Databricks. I’ve tried using the FeatureStoreClient API, but it doesn’t have a function to list all feat...

Data Engineering
data engineering
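FeatureStoreClient indeed exposes no bulk-listing call. One hedged workaround is to query the Feature Store REST API directly and page through the results; the endpoint path, paging field, and `max_results` parameter below are assumptions to verify against your workspace's REST API reference, not a confirmed contract:

```python
# Sketch: build a request to a (hypothetical) "search feature tables" REST
# endpoint on the Azure workspace, so the results can be recreated on GCP.
# Endpoint path and parameter names are assumptions -- verify them first.

def build_list_request(host, token, page_token=""):
    """Build URL, headers, and query params for one page of results."""
    url = f"{host.rstrip('/')}/api/2.0/feature-store/feature-tables/search"
    headers = {"Authorization": f"Bearer {token}"}
    params = {"max_results": 100}
    if page_token:
        params["page_token"] = page_token  # assumed paging field name
    return url, headers, params
```

A caller would loop while the response carries a next-page token, collect the table names, and then recreate each table on the GCP workspace with the feature-store client there (check the client docs for the exact create/get calls).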
ptambe
by New Contributor III
  • 5822 Views
  • 6 replies
  • 3 kudos

Resolved! Are concurrent writes from multiple Databricks clusters to the same Delta table on S3 supported?

Does Databricks support writing to the same Delta table from multiple clusters concurrently? I am specifically interested to know if there is any solution for https://github.com/delta-io/delta/issues/41 implemented in Databricks OR if you have a...

Latest Reply
dennyglee
Databricks Employee
  • 3 kudos

Please note, the issue noted above [Storage System] Support for AWS S3 (multiple clusters/drivers/JVMs) is for Delta Lake OSS. As noted in this issue as well as Issue 324, as of this writing, S3 lacks putIfAbsent transactional consistency. For Del...

5 More Replies
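As the reply notes, multi-cluster S3 writes need a coordination layer because S3 lacks putIfAbsent. Even with commits coordinated, conflicting transactions surface as exceptions, and a common application-side pattern is to retry the write with backoff. A minimal sketch, with an illustrative exception name standing in for Delta's concurrent-write exceptions (e.g. ConcurrentAppendException):

```python
import random
import time

class ConcurrentModificationError(Exception):
    """Stand-in for Delta's concurrent-write conflict exceptions;
    the name is illustrative only."""

def write_with_retry(write_fn, max_attempts=5, base_delay=0.1):
    """Retry a write on conflict with exponential backoff plus jitter.

    write_fn is any zero-argument callable that performs the table write
    and raises ConcurrentModificationError on a commit conflict.
    """
    for attempt in range(max_attempts):
        try:
            return write_fn()
        except ConcurrentModificationError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the conflict to the caller
            time.sleep(base_delay * (2 ** attempt) * (1 + random.random()))
```

Retrying is appropriate for append-style conflicts; overlapping updates to the same rows usually need the job logic itself to be re-examined rather than blindly retried.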
talenik
by New Contributor III
  • 2042 Views
  • 2 replies
  • 1 kudos

Resolved! Ingesting logs from Databricks (GCP) to Azure Log Analytics

Hi everyone, I wanted to ask if there is any way through which we can ingest logs from GCP Databricks to Azure Log Analytics in a store-and-sync fashion. Meaning we will save logs into some cloud bucket, let's say, and from there we should be able to send l...

Data Engineering
azure log analytics
Databricks
GCP databricks
google cloud
Latest Reply
talenik
New Contributor III
  • 1 kudos

Hi @Retired_mod, thanks for the help. We decided to develop our own library for logging to Azure Log Analytics, using a buffer. We are currently on timer-based logs, but in future versions we want to move to memory-based. Thanks, Nikhil

1 More Replies
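The buffered approach described in the reply can be sketched as follows. The batch sender is an injected stand-in (a real implementation would POST to the Azure Monitor / Log Analytics ingestion endpoint), and the size and age thresholds are arbitrary:

```python
import time

class BufferedLogShipper:
    """Accumulate log records locally and flush them in batches to a sink.

    send_batch is any callable taking a list of records; here it stands in
    for a real Log Analytics client, which is not included in this sketch.
    """

    def __init__(self, send_batch, max_buffer=100, max_age_s=30.0):
        self.send_batch = send_batch
        self.max_buffer = max_buffer      # flush when this many records queue up
        self.max_age_s = max_age_s        # ...or when the buffer gets this old
        self._buf = []
        self._last_flush = time.monotonic()

    def log(self, record):
        self._buf.append(record)
        too_full = len(self._buf) >= self.max_buffer
        too_old = time.monotonic() - self._last_flush >= self.max_age_s
        if too_full or too_old:
            self.flush()

    def flush(self):
        if self._buf:
            self.send_batch(self._buf)
            self._buf = []
        self._last_flush = time.monotonic()
```

A production version would also flush on shutdown and handle send failures (re-queue or spill to the cloud bucket mentioned in the question).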
Gary_Irick
by New Contributor III
  • 12265 Views
  • 9 replies
  • 10 kudos

Delta table partition directories when column mapping is enabled

I recently created a table on a cluster in Azure running Databricks Runtime 11.1. The table is partitioned by a "date" column. I enabled column mapping, like this: ALTER TABLE {schema}.{table_name} SET TBLPROPERTIES('delta.columnMapping.mode' = 'nam...

Latest Reply
talenik
New Contributor III
  • 10 kudos

Hi @Retired_mod, I have a few queries on directory names with column mapping. I have this Delta table on ADLS and I am trying to read it, but I am getting the below error. How can we read Delta tables with column mapping enabled with PySpark? Can you pleas...

8 More Replies
kodexolabs
by New Contributor
  • 1757 Views
  • 0 replies
  • 0 kudos

Federated Learning for Decentralized, Secure Model Training

Federated learning allows you to train machine learning models on decentralized data while ensuring data privacy and security by storing data on local devices and only sharing model updates. This approach assures that raw data never leaves its source...

venkateshp
by New Contributor II
  • 1550 Views
  • 3 replies
  • 3 kudos

How to reliably get the Databricks Runtime version in init scripts on AWS/Azure Databricks

We currently use the script below, but it is not working in some environments. The environment variable used in the script is not listed in the Databricks Environment Variables documentation.

```bash
#!/bin/bash
echo "Databricks Runtime Version: $DATABRICKS_RUNTI...
```

Data Engineering
init scripts
Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 3 kudos

If the environment variable doesn't work for you, then maybe try the REST API or the Databricks CLI?

2 More Replies
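A defensive lookup along the lines suggested above might first check the documented DATABRICKS_RUNTIME_VERSION variable and only then fall back to another source. The fallback file path below is purely hypothetical (a real fallback would more likely be a REST API call to the clusters endpoint):

```python
import os

def get_runtime_version(environ=None):
    """Best-effort Databricks Runtime version lookup for init scripts.

    Checks the documented DATABRICKS_RUNTIME_VERSION variable first.
    The conf-file path below is a hypothetical placeholder, not a
    documented location -- replace it with whatever your deployment
    actually exposes, or with a REST API lookup.
    """
    env = os.environ if environ is None else environ
    version = env.get("DATABRICKS_RUNTIME_VERSION")
    if version:
        return version
    conf_path = "/databricks/common/conf/deploy.conf"  # hypothetical path
    if os.path.exists(conf_path):
        with open(conf_path) as f:
            for line in f:
                if "runtime" in line.lower():
                    return line.split("=")[-1].strip().strip('"')
    return None  # caller decides how to handle an unknown version
```

Accepting `environ` as a parameter keeps the function testable outside a cluster.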
guangyi
by Contributor III
  • 1876 Views
  • 1 replies
  • 0 kudos

Resolved! How exactly do I create a cluster policy via the Databricks CLI?

I tried these approaches and they all fail: save the JSON config into a JSON file locally and run databricks cluster-policies create --json cluster-policy.json. Error message: Error: invalid character 'c' looking for beginning of value. Save the json ...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @guangyi, try adding @ before the name of the JSON file: databricks cluster-policies create --json @policy.json. Also make sure that you're escaping quotation marks like they do in the documentation: Create a new policy | Cluster Policies API | REST API ...

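The accepted fix can be wrapped in a small helper that validates the policy JSON locally before shelling out, turning the CLI's cryptic "invalid character" error into an immediate parse error. The @-prefix convention is the one described in the reply; CLI versions may differ:

```python
import json

def build_policy_command(policy_json_text, policy_path):
    """Validate policy JSON locally, then build the CLI invocation.

    The '@' prefix makes the CLI read the file at policy_path; passing
    the bare filename makes the CLI try to parse the filename itself as
    JSON, which is what produces "invalid character 'c'"-style errors.
    """
    json.loads(policy_json_text)  # fail fast on malformed JSON
    return ["databricks", "cluster-policies", "create",
            "--json", f"@{policy_path}"]
```

Here the JSON text is passed separately for easy testing; in practice you would read it from `policy_path` first, then hand the returned list to `subprocess.run`.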
mddheeraj
by New Contributor
  • 781 Views
  • 0 replies
  • 0 kudos

Streaming Kafka data without duplication

Hello, we are creating an application to read data from a Kafka topic sent by a source. After we get the data, we do some transformations and send the result to another Kafka topic. In this process the source may send the same data twice. Our questions are: 1. How can we contr...

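For the duplicate-delivery question: Structured Streaming offers dropDuplicates with a watermark for this, but the underlying idea, dropping records whose unique id was already seen within a bounded window, can be sketched in plain Python. In production the seen-id store usually needs to be durable (a state store or a Delta table), not in-memory:

```python
from collections import OrderedDict

class Deduplicator:
    """Consumer-side idempotency sketch: drop records whose unique id
    (e.g. a business key or a source-assigned event id) was already seen.

    A bounded LRU window keeps memory in check; ids older than the
    window can recur, which mirrors how watermarked dedup behaves.
    """

    def __init__(self, window_size=10_000):
        self.window_size = window_size
        self._seen = OrderedDict()

    def is_duplicate(self, event_id):
        if event_id in self._seen:
            self._seen.move_to_end(event_id)  # refresh recency
            return True
        self._seen[event_id] = True
        if len(self._seen) > self.window_size:
            self._seen.popitem(last=False)    # evict the oldest id
        return False
```

This only works if the source attaches a stable id to each record; without one, dedup has to hash the payload or be pushed back to the producer side.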
suqadi
by New Contributor
  • 814 Views
  • 1 replies
  • 0 kudos

System table predictive_optimization_operations_history stays empty

Hi, for our lakehouse with Unity Catalog enabled, we enabled the predictive optimization feature for several catalogs to clean up storage with VACUUM. When we describe the catalogs, we can see that predictive optimization is enabled. The system table for ...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

Hello, as per the docs, data can take 24 hours to appear. Can you confirm the below requirements are met? Your region must support predictive optimization (see Databricks clouds and regions).

anh-le
by New Contributor
  • 977 Views
  • 1 replies
  • 2 kudos

Image disappears after notebook export to HTML

Hi everyone, I have an image saved in DBFS which I want to include in my notebook. I'm using the standard markdown syntax ![my image](/files/my_image.png), which works and the image shows. However, when I export the notebook to HTML, the image disappear...

Latest Reply
Walter_C
Databricks Employee
  • 2 kudos

The issue you're experiencing might be due to the fact that when you export your notebook to HTML, the image from DBFS isn't accessible in the same way as it is within the Databricks environment. The DBFS path isn't accessible from outside Databricks...

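One workaround consistent with the reply is to inline the image as a base64 data URI, so the exported HTML no longer depends on a DBFS path the browser cannot reach. A sketch (the MIME type and alt text are placeholders):

```python
import base64

def image_to_data_uri(image_bytes, mime="image/png"):
    """Encode raw image bytes as a self-contained data URI that
    survives HTML export, unlike a DBFS file reference."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"

def markdown_image(image_bytes, alt="my image"):
    """Build a markdown image tag embedding the bytes inline."""
    return f"![{alt}]({image_to_data_uri(image_bytes)})"
```

In a notebook you would read the file from its DBFS path (e.g. via `open("/dbfs/...")`), pass the bytes in, and render the result in a markdown or displayHTML cell. Note that large images inflate the notebook size accordingly.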
prasadvaze
by Valued Contributor II
  • 1187 Views
  • 1 replies
  • 2 kudos

Resolved! Grant permission on catalog but revoke from schema for the same user

I have a catalog (in Unity Catalog) containing multiple schemas. I need an AD group to have SELECT permission on all the schemas, so at catalog level I granted SELECT to the AD group. Then, I need to revoke permission on one particular schema in this cat...

Latest Reply
Walter_C
Databricks Employee
  • 2 kudos

Unfortunately this is not possible due to the hierarchical permission model in UC; you will need to grant permissions on the specific schemas directly rather than granting a broad permission at the catalog level.

Abhot
by New Contributor II
  • 7849 Views
  • 4 replies
  • 0 kudos

Temp table vs temp view vs temp table function: which is better for large Databricks data processing?

Hello, 1) Which one is better for large data processing: a temp table, a temporary view, or a temp table function? 2) How does lazy evaluation help processing, and which of the above supports lazy evaluation?

Latest Reply
Abhot
New Contributor II
  • 0 kudos

Does anyone have any suggestions regarding the question above?

3 More Replies
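On the lazy-evaluation part of the question: a temporary view only names a query plan (nothing runs until an action reads it), while writing a table materializes data up front. The distinction is analogous to Python generators versus lists, sketched here as an analogy only, not as Spark behavior:

```python
def lazy_pipeline(rows):
    """Like a temp view: composing transformations builds a plan;
    nothing is computed until a result is actually requested."""
    filtered = (r for r in rows if r % 2 == 0)   # no work done yet
    doubled = (r * 2 for r in filtered)          # still no work
    return doubled                               # evaluated on iteration

def eager_pipeline(rows):
    """Like materializing a temp table: each step computes and stores
    a full intermediate result immediately."""
    filtered = [r for r in rows if r % 2 == 0]   # materialized now
    return [r * 2 for r in filtered]             # materialized again
```

Both yield the same results; the lazy form avoids intermediate storage, while materializing pays that cost once in exchange for reuse, which is the usual trade-off between a view and a written table.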
greyamber
by New Contributor II
  • 1489 Views
  • 1 replies
  • 0 kudos

Python UDF vs Scala UDF in pyspark code

Is there a performance difference between a Python UDF and a Scala UDF in PySpark code?

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @greyamber, yes, there is a difference: Scala would be faster. You can read about the reason and a benchmark in the following blog: Spark UDF — Deep Insights in Performance | by QuantumBlack, AI by McKinsey | Medium

hpant
by New Contributor III
  • 2219 Views
  • 3 replies
  • 0 kudos
Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @hpant, I think they are really similar to overall best practices when it comes to Python logging, like having a centralized logging configuration, using correct log levels, etc. Look for example at the article below: 10 Best Practices for Logging in Pytho...

2 More Replies
Phani1
by Valued Contributor II
  • 1380 Views
  • 1 replies
  • 0 kudos

Huge Delta table performance consideration

Hi Team, we want to create a Delta table which has a historical load of 10 TB of data, and we expect an incremental refresh of about 15 GB each day. What factors should we take into account for managing such a large volume of data, especially cost and pe...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

@Phani1, all that you've mentioned is correct. Additionally, if you have a scenario which requires DELETE, UPDATE or MERGE, you can turn on deletion vectors: Deletion vectors are a storage optimization feature that can be enabled on Delta Lake tables. By d...

