Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

564824
by New Contributor II
  • 1513 Views
  • 1 replies
  • 1 kudos

Will enabling Unity Catalog affect existing user access and jobs in production?

Hi, at my company we are using Databricks with AWS IAM Identity Center as single sign-on. I was looking into Unity Catalog, which seems to offer centralized access, but I wanted to know if there will be any downside like loss of existing user profile ...

Latest Reply
Atanu
Databricks Employee

You can look into this doc, https://docs.databricks.com/en/data-governance/unity-catalog/migrate.html, which has some details about your question.

SaraCorralLou
by New Contributor III
  • 10124 Views
  • 7 replies
  • 2 kudos

Bad performance of UDF functions

Hello, I am contacting you because I am having a problem with the performance of my notebooks on Databricks. My notebook is written in Python (PySpark); in it I read a Delta table that I copy to a dataframe and do several transformations and create sever...

Latest Reply
-werners-
Esteemed Contributor III

Looping over records is a performance killer, to be avoided at all costs. See: beware the for-loop (databricks.com)
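The same principle can be illustrated in plain Python (a minimal sketch, not Spark code, with made-up data): processing records one by one in interpreted code is far slower than handing the whole collection to a built-in operation, which is roughly the difference between a per-row Python UDF and native Spark SQL functions.

```python
import random

data = [random.random() for _ in range(100_000)]

# Row-by-row processing: analogous to looping over records or applying a
# Python UDF per row, which keeps the work out of the optimized engine.
def row_by_row_sum(xs):
    total = 0.0
    for x in xs:
        total += x
    return total

# Built-in processing: analogous to using native Spark SQL functions,
# which the engine can optimize and execute as a whole.
builtin_total = sum(data)

assert abs(row_by_row_sum(data) - builtin_total) < 1e-6
```

In Spark the gap is much larger than in this sketch, because a Python UDF also pays serialization costs between the JVM and the Python workers.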

6 More Replies
Chris_Shehu
by Valued Contributor III
  • 4356 Views
  • 2 replies
  • 1 kudos

Resolved! Custom Libraries (Unity Catalog Enabled Clusters)

I'm trying to use a custom library that I created from a .whl file in the workspace/shared location. The library attaches to the cluster without any issues and I can see it when I list the modules using pip. When I try to call the module I get an error t...

Latest Reply
Szpila
New Contributor III

Hello guys, I am working on a project where we need to use the spark-excel library (Maven) in order to ingest data from Excel files. As those 3rd-party libraries are not allowed on shared clusters, do you have any workaround other than using pandas, for exa...

1 More Replies
User15986662700
by Databricks Employee
  • 6042 Views
  • 4 replies
  • 1 kudos
Latest Reply
User15986662700
Databricks Employee

Yes, it is possible to connect Databricks to a Kerberized HBase cluster. The attached article explains the steps. It consists of setting up a Kerberos client using a keytab on the cluster nodes, installing the HBase-Spark integration library, and set...

3 More Replies
naga_databricks
by Contributor
  • 5131 Views
  • 1 replies
  • 0 kudos

Reading BigQuery data using a query

To read BigQuery data using spark.read, I'm using a query. This query executes and creates a table on the materializationDataset. df = spark.read.format("bigquery") \ .option("query", query) \ .option("materializationProject", materializationProject) \...
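The truncated snippet appears to follow the Spark BigQuery connector's query-based read pattern. A sketch of how the options are typically assembled, assuming the connector is installed on the cluster; the option names come from the post, and `viewsEnabled` is an extra setting the connector commonly requires for query reads:

```python
def bigquery_query_read_options(query, materialization_project, materialization_dataset):
    # Options for the Spark BigQuery connector when reading the result of a
    # query rather than a whole table. The materialization* settings tell
    # the connector where to stage the query result as a temporary table.
    return {
        "query": query,
        "materializationProject": materialization_project,
        "materializationDataset": materialization_dataset,
        "viewsEnabled": "true",  # query-based reads need view support enabled
    }

# On a cluster with the connector installed (sketch, not run here):
# df = (spark.read.format("bigquery")
#       .options(**bigquery_query_read_options(query, project, dataset))
#       .load())
```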

EDDatabricks
by Contributor
  • 1746 Views
  • 2 replies
  • 2 kudos

Appropriate storage account type for reference data (Azure)

Hello, we are using a reference dataset for our Production applications. We would like to create a Delta table for this dataset to be used from our applications. Currently, manual updates occur on this dataset through a script on a weekly basis. ...

Data Engineering
Delta Live Table
Storage account
Latest Reply
-werners-
Esteemed Contributor III

+1 for ADLS. Hierarchical storage and hot/cold/premium tiers, things not possible in blob storage.

1 More Replies
irispan
by New Contributor II
  • 5535 Views
  • 4 replies
  • 1 kudos

Recommended Hive metastore pattern for Trino integration

Hi, I have several questions regarding Trino integration: Is it recommended to use an external Hive metastore or leverage the Databricks-maintained Hive metastore when it comes to enabling external query engines such as Trino? When I tried to use ex...

Latest Reply
JunlinZeng
Databricks Employee

> Is it recommended to use an external Hive metastore or leverage the Databricks-maintained Hive metastore when it comes to enabling external query engines such as Trino?

The Databricks-maintained Hive metastore is not suggested to be used externally. ...

3 More Replies
Agus1
by New Contributor III
  • 6986 Views
  • 3 replies
  • 3 kudos

Update destination table when using Spark Structured Streaming and Delta tables

I’m trying to implement a streaming pipeline that will run hourly using Spark Structured Streaming, Scala, and Delta tables. The pipeline will process different items with their details. The sources are Delta tables that already exist, written hourly u...

Latest Reply
Tharun-Kumar
Databricks Employee

@Agus1 Could you try using CDC in Delta? You could use readChangeFeed to read only the changes that got applied to the source table. This is also explained here: https://learn.microsoft.com/en-us/azure/databricks/delta/delta-change-data-feed
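A minimal sketch of the suggested Change Data Feed read, assuming CDF is already enabled on the source table; the table name and starting version below are placeholders:

```python
def read_change_feed(spark, table_name, starting_version):
    # Stream only row-level changes (inserts, updates, deletes) from a
    # Delta table with Change Data Feed enabled, instead of reprocessing
    # the whole table every hour.
    return (
        spark.readStream.format("delta")
        .option("readChangeFeed", "true")
        .option("startingVersion", str(starting_version))
        .table(table_name)
    )
```

The resulting stream carries extra metadata columns (such as the change type) that the downstream merge logic can use to apply updates to the destination table.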

2 More Replies
Eric_Kieft
by New Contributor III
  • 3615 Views
  • 2 replies
  • 1 kudos

Unity Catalog Table/View Column Data Type Changes

When changing a Delta table column data type in Unity Catalog, we noticed a view that references that table did not automatically update to reflect the new data type. Is there a way to update the delta table column data type so that it also update...

Latest Reply
Lakshay
Databricks Employee

Can you try refreshing the view by running the command: REFRESH TABLE <viewname>
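If the refresh does not pick up the new type, recreating the view usually does, since a view's schema is captured when the view is defined. A sketch with hypothetical names (`my_view`, `my_table`):

```python
def recreate_view_sql(view_name, select_sql):
    # CREATE OR REPLACE VIEW re-resolves the underlying table's schema,
    # so the view picks up the changed column data type.
    return f"CREATE OR REPLACE VIEW {view_name} AS {select_sql}"

# On Databricks (sketch, not run here):
# spark.sql(recreate_view_sql("my_view", "SELECT * FROM my_table"))
```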

1 More Replies
Vibhor
by Contributor
  • 6143 Views
  • 5 replies
  • 4 kudos

Resolved! Cluster Performance

Facing an issue with cluster performance; in the event log we can see the cluster is not responsive, likely due to GC. The number of pipelines (Databricks notebooks) running and the cluster configuration are the same as before, but we started seeing this issue sin...

Latest Reply
jose_gonzalez
Databricks Employee

Hi @Vibhor Sethi, do you see any other error messages? Did your data volume increase? What kind of job are you running?

4 More Replies
ajain80
by New Contributor III
  • 25892 Views
  • 5 replies
  • 10 kudos

Resolved! SFTP Connect

How can I connect to an SFTP server from Databricks, so I can write files into tables directly?

Latest Reply
Hubert-Dudek
Esteemed Contributor III

The classic solution is to copy data from FTP to ADLS storage using Azure Data Factory, and after the copy is done in the ADF pipeline, trigger the Databricks notebook.

4 More Replies
Souvikng
by New Contributor
  • 2566 Views
  • 1 replies
  • 0 kudos

Resolved! Databricks Lakehouse Fundamentals certification page is not working

While going to take the Databricks Lakehouse Fundamentals certification quiz, it is showing that you don't have permission to access this. Could you please guide me on that, or provide a link through which I can get permission to access this and complet...

Latest Reply
APadmanabhan
Databricks Employee

Hi @Souvikng, here is the link for the Accreditation.

parimalpatil28
by New Contributor III
  • 5082 Views
  • 1 replies
  • 0 kudos

How to get Spark history server logs using the Python REST API

Hello, I am trying to load logs from a Spark job in a remote location using the Python REST API. I want to collect those logs for particular job runs using the runId field; the log should contain errors, exceptions, and print details. I have tried "/api/2.1/jobs/r...
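For reference, a sketch of how the runs get-output endpoint is typically called with the standard library. The host and token are placeholders, and the endpoint path follows the truncated one in the post; note this returns a run's output, not full Spark history server logs, which generally require cluster log delivery instead:

```python
import json
import urllib.request

def run_output_url(host, run_id):
    # Build the REST URL for a single run's output; the endpoint path
    # follows the "/api/2.1/jobs/r..." snippet in the post above.
    return f"https://{host}/api/2.1/jobs/runs/get-output?run_id={run_id}"

def fetch_run_output(host, token, run_id):
    # Sketch only: requires a real workspace host and a PAT token.
    req = urllib.request.Request(
        run_output_url(host, run_id),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```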

Latest Reply
parimalpatil28
New Contributor III

Hello, how can I access the endpoint (HTTP URL) of the Spark history server for a particular cluster? Thanks, Parimal

