Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

srDataEngineer
by New Contributor II
  • 5218 Views
  • 4 replies
  • 0 kudos

Resolved! udf not admin user

java.lang.SecurityException: User does not have permission SELECT on anonymous function.

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @data engineer, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers...

3 More Replies
Retko
by Contributor
  • 4189 Views
  • 0 replies
  • 0 kudos

Custom logging using Log4J to a file

Hello, I would like to ask for help setting up log4j. I want to use log4j (log4j2) to generate custom log messages in my notebook when it runs. The message would be generated like this: logger.info("some info message") but using log4j not python lo...

Frank
by New Contributor III
  • 1053 Views
  • 1 replies
  • 1 kudos

Design Question

We have an application that takes in raw metrics data as key-value pairs. We then split them into four different tables like the one below: `key1, min, max, average`. Those four tables are later used for a dashboard. What are the design recommendations for this? S...

Latest Reply
stefnhuy
New Contributor III
  • 1 kudos

Hey, I can totally relate to the challenges Frank is facing with this application's data processing. It's frustrating to deal with delays, especially when dealing with real-time metrics. I've had a similar experience where optimizing d...

Matt_L
by New Contributor III
  • 6558 Views
  • 3 replies
  • 3 kudos

Resolved! Slow performance loading checkpoint file?

Using OSS Delta, hopefully this is the right forum for this question: Hey all, I could use some help as I feel like I’m doing something wrong here. I’m streaming from Kafka -> Delta on EMR/S3FS, and am seeing increasingly slow batches. When looking...

Latest Reply
Matt_L
New Contributor III
  • 3 kudos

Found the answer through the Slack user group, courtesy of an Adam Binford. I had set `delta.logRetentionDuration='24 HOURS'` but did not set `delta.deletedFileRetentionDuration`, and so the checkpoint file still had all the accumulated tombstones sin...
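
For anyone hitting the same thing, here is a minimal sketch of setting both retention properties together so they cannot drift apart. It only builds the SQL statement; the table name `events` and the `interval 24 hours` values are assumptions for illustration, not taken from the original thread, and a short deleted-file retention has its own trade-offs with VACUUM and time travel, so pick values that fit your table.

```python
def retention_ddl(table: str, log_retention: str, deleted_file_retention: str) -> str:
    """Build an ALTER TABLE statement that sets both Delta retention properties."""
    props = {
        "delta.logRetentionDuration": log_retention,
        "delta.deletedFileRetentionDuration": deleted_file_retention,
    }
    # Render each property as 'key' = 'value', in insertion order.
    assignments = ", ".join(f"'{key}' = '{value}'" for key, value in props.items())
    return f"ALTER TABLE {table} SET TBLPROPERTIES ({assignments})"


# Hypothetical table name and interval values, for illustration only.
ddl = retention_ddl("events", "interval 24 hours", "interval 24 hours")
print(ddl)
# With a live Spark session that has Delta configured, this would be run as: spark.sql(ddl)
```

Keeping the two values in one helper makes it harder to shorten the log retention while leaving the deleted-file retention at its default.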

2 More Replies
r0nald
by New Contributor II
  • 7697 Views
  • 3 replies
  • 1 kudos

UDF not working inside transform() & lambda (SQL)

Below is a toy example of what I'm trying to achieve, but I don't understand why it fails. Can anyone explain why, and suggest a fix or a not overly bloated workaround? %sql create or replace function status_map(status int) returns string return map(10, "STATU...

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

The transform function in SQL is not the same as its Scala/PySpark counterpart. It is in fact a map(). Here is some interesting info. I agree that functions are essential for code modularity. Hence I prefer not to use SQL but Scala/PySpark instead.

2 More Replies
UmaMahesh1
by Honored Contributor III
  • 7409 Views
  • 7 replies
  • 17 kudos

Spark Structured Streaming : Data write is too slow into adls.

I'm a bit new to Spark Structured Streaming, so do ask any relevant questions if I missed anything. I have a notebook which consumes the events from a Kafka topic and writes those records into ADLS. The topic is JSON serialized so I'm just writing...

Latest Reply
Miletto
New Contributor II
  • 17 kudos

 

6 More Replies
564824
by New Contributor II
  • 1141 Views
  • 1 replies
  • 1 kudos

Will enabling Unity Catalog affect existing user access and jobs in production?

Hi, at my company we are using Databricks with AWS IAM Identity Center as single sign-on. I was looking into Unity Catalog, which seems to offer centralized access, but I wanted to know if there will be any downside like loss of existing user profile ...

Latest Reply
Atanu
Databricks Employee
  • 1 kudos

You can look into this doc, https://docs.databricks.com/en/data-governance/unity-catalog/migrate.html, which has some details relevant to your question.

SaraCorralLou
by New Contributor III
  • 6232 Views
  • 7 replies
  • 2 kudos

Bad performance UDFs functions

Hello, I am contacting you because I am having a problem with the performance of my notebooks on Databricks. My notebook is written in Python (PySpark). In it I read a Delta table that I copy to a DataFrame, do several transformations and create sever...

Latest Reply
-werners-
Esteemed Contributor III
  • 2 kudos

Looping over records is a performance killer. To be avoided at all costs. Beware the for-loop (databricks.com)

6 More Replies
Chris_Shehu
by Valued Contributor III
  • 2941 Views
  • 2 replies
  • 1 kudos

Resolved! Custom Libraries (Unity Catalog Enabled Clusters)

I'm trying to use a custom library that I created from a .whl file in the workspace/shared location. The library attaches to the cluster without any issues and I can see it when I list the modules using pip. When I try to call the module I get an error t...

Latest Reply
Szpila
New Contributor III
  • 1 kudos

Hello guys, I am working on a project where we need to use the spark-excel library (Maven) in order to ingest data from Excel files. As such third-party libraries are not allowed on shared clusters, do you have any workaround other than using pandas, for exa...

1 More Replies
User15986662700
by New Contributor III
  • 4532 Views
  • 4 replies
  • 1 kudos
Latest Reply
User15986662700
New Contributor III
  • 1 kudos

Yes, it is possible to connect Databricks to a kerberized HBase cluster. The attached article explains the steps. It consists of setting up a Kerberos client using a keytab on the cluster nodes, installing the hbase-spark integration library, and set...

3 More Replies
naga_databricks
by Contributor
  • 3588 Views
  • 1 replies
  • 0 kudos

Reading bigquery data using a query

To read BigQuery data using spark.read, I'm using a query. This query executes and creates a table in the materializationDataset. df = spark.read.format("bigquery") \ .option("query", query) \ .option("materializationProject", materializationProject) \...

EDDatabricks
by Contributor
  • 1226 Views
  • 2 replies
  • 2 kudos

Appropriate storage account type for reference data (Azure)

Hello, we are using a reference dataset for our Production applications. We would like to create a Delta table for this dataset to be used by our applications. Currently, manual updates will occur on this dataset through a script on a weekly basis. ...

Data Engineering
Delta Live Table
Storage account
Latest Reply
-werners-
Esteemed Contributor III
  • 2 kudos

+1 for ADLS.  Hierarchical storage, hot/cold/premium storage, things not possible in blob storage

1 More Replies
irispan
by New Contributor II
  • 3805 Views
  • 4 replies
  • 1 kudos

Recommended Hive metastore pattern for Trino integration

Hi, I have several questions regarding Trino integration: Is it recommended to use an external Hive metastore or to leverage the Databricks-maintained Hive metastore when it comes to enabling external query engines such as Trino? When I tried to use ex...

Latest Reply
JunlinZeng
Databricks Employee
  • 1 kudos

> Is it recommended to use an external Hive metastore or leverage on the databricks-maintained Hive metastore when it comes to enabling external query engines such as Trino? The Databricks-maintained Hive metastore is not recommended for external use. ...

3 More Replies
Agus1
by New Contributor III
  • 4599 Views
  • 3 replies
  • 3 kudos

Update destination table when using Spark Structured Streaming and Delta tables

I’m trying to implement a streaming pipeline that will run hourly using Spark Structured Streaming, Scala and Delta tables. The pipeline will process different items with their details. The sources are Delta tables that already exist, written hourly u...

Latest Reply
Tharun-Kumar
Databricks Employee
  • 3 kudos

@Agus1 Could you try using CDC in Delta? You could use readChangeFeed to read only the changes that were applied to the source table. This is also explained here: https://learn.microsoft.com/en-us/azure/databricks/delta/delta-change-data-feed
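
A rough sketch of what that suggestion could look like in PySpark follows (the original pipeline is in Scala, where the reader options are the same). The function name `read_change_feed`, the table name and the starting version are hypothetical, and it assumes the source table already has `delta.enableChangeDataFeed` set to true.

```python
def read_change_feed(spark, table_name: str, starting_version: int = 0):
    """Return a streaming DataFrame over a Delta table's change data feed.

    `spark` is an existing SparkSession; nothing is read until a sink is started.
    """
    return (
        spark.readStream.format("delta")
        # Ask Delta for row-level changes instead of the current table snapshot.
        .option("readChangeFeed", "true")
        # Hypothetical starting point; choose the version your pipeline last processed.
        .option("startingVersion", starting_version)
        .table(table_name)
    )


# Usage with a live session (table name is an assumption):
# changes = read_change_feed(spark, "source_items", starting_version=10)
```

The rows that come back carry extra metadata columns such as `_change_type` and `_commit_version`, which the downstream merge into the destination table can filter on.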

2 More Replies
Eric_Kieft
by New Contributor III
  • 2632 Views
  • 2 replies
  • 1 kudos

Unity Catalog Table/View Column Data Type Changes

When changing a Delta table column data type in Unity Catalog, we noticed that a view referencing that table did not automatically update to reflect the new data type. Is there a way to update the Delta table column data type so that it also update...

Latest Reply
Lakshay
Databricks Employee
  • 1 kudos

Can you try refreshing the view by running the command: REFRESH TABLE <viewname>

1 More Replies
