Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

aschiff
by Contributor II
  • 735135 Views
  • 33 replies
  • 5 kudos

GC Driver Error

I am using a cluster in databricks to connect to a Tableau workbook through the JDBC connector. My Tableau workbook has been unable to load due to resources not being available through the data connection. I went to look at the driver log for my clus...

Latest Reply
galang123
New Contributor II
  • 5 kudos

yesasd

32 More Replies
KosmaS
by New Contributor III
  • 10270 Views
  • 3 replies
  • 7 kudos

Resolved! Efficient caching/persisting

For cache/persist to take effect, an action needs to be triggered. I'm just wondering: will it make any difference if, after persisting some df, I use, for instance, take(5) instead of count()? Will it be a bit more effective, because of sending results from 5 partiti...

Latest Reply
Rishabh-Pandey
Databricks MVP
  • 7 kudos

Yes, take(5) will be more efficient in some ways. When you cache or persist a DataFrame in Spark, you are instructing Spark to store the DataFrame's intermediate data in memory (or on disk, depending on the storage level). This can significantly speed...
2 More Replies
Manish1231
by New Contributor
  • 3823 Views
  • 0 replies
  • 0 kudos

How to migrate features from Azure Databricks workspace to GCP

I’m in the process of migrating feature tables from Azure Databricks to GCP Databricks and am having trouble listing all feature tables from Azure Databricks. I’ve tried using the FeatureStoreClient API, but it doesn’t have a function to list all feat...

Data Engineering
data engineering
ptambe
by Databricks Partner
  • 7653 Views
  • 6 replies
  • 3 kudos

Resolved! Is Concurrent Writes from multiple databricks clusters to same delta table on S3 Supported?

Does Databricks support writing to the same Delta table from multiple clusters concurrently? I am specifically interested to know if there is any solution for https://github.com/delta-io/delta/issues/41 implemented in Databricks, or if you have a...

Latest Reply
dennyglee
Databricks Employee
  • 3 kudos

Please note, the issue noted above [Storage System] Support for AWS S3 (multiple clusters/drivers/JVMs) is for Delta Lake OSS. As noted in this issue as well as Issue 324, as of this writing, S3 lacks putIfAbsent transactional consistency. For Del...

5 More Replies
talenik
by New Contributor III
  • 3405 Views
  • 2 replies
  • 1 kudos

Resolved! Ingesting logs from Databricks (GCP) to Azure log Analytics

Hi everyone, I wanted to ask if there is any way we can ingest logs from GCP Databricks into Azure Log Analytics in a store-sync fashion. Meaning we will save logs into some cloud bucket, let's say, and then from there we should be able to send l...

Data Engineering
azure log analytics
Databricks
GCP databricks
google cloud
Latest Reply
talenik
New Contributor III
  • 1 kudos

Hi @Retired_mod, thanks for the help. We decided to develop our own library for logging to Azure Log Analytics. We used a buffer for this. We are currently on timer-based logs, but in future versions we want to move to memory-based ones. Thanks, Nikhil
1 More Replies
kodexolabs
by New Contributor
  • 4449 Views
  • 0 replies
  • 0 kudos

Federated Learning for Decentralized, Secure Model Training

Federated learning allows you to train machine learning models on decentralized data while ensuring data privacy and security by storing data on local devices and only sharing model updates. This approach assures that raw data never leaves its source...

venkateshp
by New Contributor II
  • 3055 Views
  • 3 replies
  • 3 kudos

How to reliably get the Databricks runtime version as part of init scripts in AWS/Azure Databricks

We currently use the script below, but it is not working in some environments. The environment variable used in the script is not listed in this link: Databricks Environment Variables ```bash #!/bin/bash echo "Databricks Runtime Version: $DATABRICKS_RUNTI...

Data Engineering
init scripts
Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 3 kudos

If the environment variable doesn't work for you, then maybe try the REST API or the Databricks CLI?
2 More Replies
guangyi
by Contributor III
  • 3776 Views
  • 1 replies
  • 0 kudos

Resolved! How exactly to create cluster policy via Databricks CLI ?

I tried the following approaches, but none of them work: Save the JSON config into a JSON file locally and run databricks cluster-policies create --json cluster-policy.json. Error message: Error: invalid character 'c' looking for beginning of value. Save the json ...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @guangyi, try adding @ before the name of the JSON file (databricks cluster-policies create --json @policy.json). Also make sure that you're escaping quotation marks like they do in the documentation below: Create a new policy | Cluster Policies API | REST API ...
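
The reply's point about escaping quotation marks can be sketched as follows. This is a minimal illustration, assuming the policy's `definition` field is expected as a JSON-encoded string nested inside the request body, as the Cluster Policies API reference describes. The policy name `example-policy` and the auto-termination rule are hypothetical placeholders, not taken from the thread.

```python
import json


def build_policy_payload(name: str, definition: dict) -> str:
    """Return a JSON request body for creating a cluster policy.

    Assumption: the API wants `definition` as a string that itself
    contains JSON, so the rules are serialized twice.
    """
    payload = {
        "name": name,
        # Inner dumps: turn the rules into a JSON string.
        "definition": json.dumps(definition),
    }
    # Outer dumps: escapes the quotation marks of the inner string.
    return json.dumps(payload, indent=2)


# Hypothetical policy rule: fix auto-termination at 30 minutes.
example_definition = {
    "autotermination_minutes": {"type": "fixed", "value": 30}
}

payload_text = build_policy_payload("example-policy", example_definition)
print(payload_text)
```

Saving the printed text as policy.json and passing it with --json @policy.json, as suggested in the reply, would avoid escaping the inner quotes by hand.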

mddheeraj
by New Contributor
  • 1218 Views
  • 0 replies
  • 0 kudos

Streaming Kafka data without duplication

Hello, we are creating an application to read data from a Kafka topic sent by a source. After we get the data, we do some transformations and send it to another Kafka topic. In this process, the source may send the same data twice. Our questions are: 1. How can we contr...

suqadi
by New Contributor
  • 1408 Views
  • 1 replies
  • 0 kudos

systems table predictive_optimization_operations_history stays empty

Hi, for our lakehouse with Unity Catalog enabled, we enabled the predictive optimization feature for several catalogs to clean up storage with VACUUM. When we describe the catalogs, we can see that predictive optimization is enabled. The system table for ...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

Hello, as per the docs, data could take 24 hours to be retrieved. Can you confirm that the requirements below are met? Your region must support predictive optimization (see Databricks clouds and regions).
anh-le
by Databricks Partner
  • 1602 Views
  • 1 replies
  • 2 kudos

Image disappears after notebook export to HTML

Hi everyone, I have an image saved in DBFS which I want to include in my notebook. I'm using the standard markdown syntax ![my image](/files/my_image.png), which works and the image shows. However, when I export the notebook to HTML, the image disappear...

Latest Reply
Walter_C
Databricks Employee
  • 2 kudos

The issue you're experiencing might be due to the fact that when you export your notebook to HTML, the image from DBFS isn't accessible in the same way as it is within the Databricks environment. The DBFS path isn't accessible from outside Databricks...
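
One possible workaround, which the visible part of the reply does not confirm, is to embed the image bytes directly in the notebook output as a base64 data URI, so the exported HTML no longer depends on a DBFS path. The sketch below only builds the `<img>` tag. The helper name `image_to_img_tag` and the sample bytes are hypothetical; in a notebook the bytes would come from reading the actual image file.

```python
import base64
import html


def image_to_img_tag(image_bytes: bytes,
                     mime_type: str = "image/png",
                     alt: str = "") -> str:
    """Build an <img> tag whose source is an inline base64 data URI."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    safe_alt = html.escape(alt, quote=True)
    return f'<img src="data:{mime_type};base64,{encoded}" alt="{safe_alt}">'


# Stand-in bytes (the PNG file signature only), used in place of a real image.
sample_bytes = b"\x89PNG\r\n\x1a\n"

tag = image_to_img_tag(sample_bytes, alt="my image")
print(tag)
```

In a Databricks notebook the resulting string could be rendered with displayHTML(tag); whether the image then survives an HTML export in a given workspace is worth verifying.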

prasadvaze
by Valued Contributor II
  • 2181 Views
  • 1 replies
  • 2 kudos

Resolved! Grant permission on catalog but revoke from schema for the same user

I have a catalog (in Unity Catalog) containing multiple schemas. I need an AD group to have SELECT permission on all the schemas, so at the catalog level I granted SELECT to the AD group. Then, I need to revoke permission on one particular schema in this cat...

Latest Reply
Walter_C
Databricks Employee
  • 2 kudos

Unfortunately, this is not possible due to the hierarchical permission mechanism in UC. You will need to grant permissions on the specific schemas directly rather than granting a broad permission at the catalog level.
Abhot
by New Contributor II
  • 10957 Views
  • 4 replies
  • 0 kudos

Temp table vs temp view vs temp table function: which one is better for large Databricks data processing?

Hello, 1) Which one is better during large data processing: temp table, temporary view, or temp table function? 2) How is lazy evaluation better for processing, and which one of the above helps with lazy evaluation?

Latest Reply
Abhot
New Contributor II
  • 0 kudos

Does anyone have any suggestions regarding the question above?

3 More Replies
greyamber
by New Contributor II
  • 3119 Views
  • 1 replies
  • 0 kudos

Python UDF vs Scala UDF in pyspark code

Is there a performance difference between a Python UDF and a Scala UDF in PySpark code?

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @greyamber, yes, there is a difference. Scala would be faster. You can read about the reason and a benchmark in the following blog: Spark UDF — Deep Insights in Performance | by QuantumBlack, AI by McKinsey | QuantumBlack, AI by McKinsey | Medium