Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

by prasadvaze (Valued Contributor II)
  • 850 Views
  • 1 reply
  • 2 kudos

Resolved! Grant permission on catalog but revoke from schema for the same user

I have a catalog (in Unity Catalog) containing multiple schemas. I need an AD group to have SELECT permission on all the schemas, so at the catalog level I granted SELECT to the AD group. Then, I need to revoke permission on one particular schema in this cat...

Latest Reply
Walter_C (Databricks Employee)
  • 2 kudos

Unfortunately this is not possible due to the hierarchical permission model in UC. You will need to grant permissions on the specific schemas directly, rather than granting a broad permission at the catalog level.

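A minimal sketch of the schema-by-schema grants Walter_C describes, runnable in a Databricks notebook where `spark` is available. The catalog, group, and excluded-schema names are hypothetical placeholders, not from the thread:

```python
# Sketch only: grant SELECT per schema instead of at the catalog level,
# skipping the one schema the AD group must not read.
catalog = "main_catalog"   # hypothetical catalog name
group = "ad_readers"       # hypothetical AD group
excluded = "restricted"    # hypothetical schema to exclude

# The group still needs USE CATALOG to reach any schema inside the catalog.
spark.sql(f"GRANT USE CATALOG ON CATALOG `{catalog}` TO `{group}`")

schemas = [row[0] for row in spark.sql(f"SHOW SCHEMAS IN {catalog}").collect()]
for schema in schemas:
    if schema == excluded:
        continue
    spark.sql(f"GRANT USE SCHEMA, SELECT ON SCHEMA `{catalog}`.`{schema}` TO `{group}`")
```

New schemas added later will not be covered automatically, so this loop would need to be re-run (or scheduled) as the catalog grows.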
by Abhot (New Contributor II)
  • 6482 Views
  • 4 replies
  • 0 kudos

Temp table vs. temp view vs. temp table function: which is better for large Databricks data processing?

Hello, 1) Which is better for large data processing: a temp table, a temporary view, or a temp table function? 2) How does lazy evaluation improve processing, and which of the above supports lazy evaluation?

Latest Reply
Abhot (New Contributor II)
  • 0 kudos

Does anyone have any suggestions regarding the question above?

3 More Replies
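For context on the lazy-evaluation part of the question (the replies are not shown here): a temporary view in Spark stores only a query plan, so nothing is computed until an action runs. A minimal sketch, with a hypothetical table and filter:

```python
# Creating a temp view registers a logical plan; no data is read here.
df = spark.table("my_catalog.my_schema.big_table")  # hypothetical table
df.filter("amount > 100").createOrReplaceTempView("big_sales")

# Still lazy: this only stacks another plan on top of the view.
top = spark.sql("SELECT region, SUM(amount) FROM big_sales GROUP BY region")

# Evaluation happens only when an action is triggered:
top.show()
```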
by lozik (New Contributor II)
  • 720 Views
  • 1 reply
  • 0 kudos

Python callback functions fail to trigger

How can I get sys.excepthook and the atexit module to trigger a callback function on exit of a Python notebook? These fail to work when an unhandled exception is encountered (excepthook) or the program exits (atexit).

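As background (the thread's reply is not visible here): Databricks notebooks run in a long-lived REPL process, so `atexit` handlers fire only when that process itself exits, not when a notebook finishes or a cell raises. A common workaround sketch is an explicit try/except/finally around the notebook's entry point; `run_main_logic` and `on_exit` below are hypothetical:

```python
import traceback

def run_main_logic():
    # Hypothetical stand-in for the notebook's real work.
    pass

def on_exit():
    # The cleanup you would otherwise register with atexit.
    print("notebook finished, running cleanup")

try:
    run_main_logic()
except Exception:
    # What sys.excepthook would have done: record the failure, then
    # re-raise so the notebook run is still marked as failed.
    traceback.print_exc()
    raise
finally:
    on_exit()  # runs on success or failure, unlike atexit in a live REPL
```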
by greyamber (New Contributor II)
  • 908 Views
  • 1 reply
  • 0 kudos

Python UDF vs Scala UDF in pyspark code

Is there a performance difference between a Python UDF and a Scala UDF in PySpark code?

Latest Reply
szymon_dybczak (Esteemed Contributor III)
  • 0 kudos

Hi @greyamber, yes, there is a difference; Scala would be faster. You can read about the reasons and a benchmark in the following blog: Spark UDF — Deep Insights in Performance | by QuantumBlack, AI by McKinsey | Medium

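To illustrate the gap the linked benchmark discusses: a plain Python UDF serializes every row between the JVM and a Python worker, while built-in expressions (like Scala UDFs) stay inside the JVM. A hedged sketch on toy data, not from the thread:

```python
import pyspark.sql.functions as F
from pyspark.sql.types import LongType

df = spark.range(1_000_000)  # toy data

# Python UDF: every row is pickled out to a Python worker and back.
@F.udf(returnType=LongType())
def plus_one_udf(x):
    return x + 1

df.select(plus_one_udf("id")).count()  # slower path: crosses the JVM/Python boundary
df.select(F.col("id") + 1).count()     # faster path: pure JVM, like a Scala UDF
```

Whenever a built-in function or SQL expression can express the logic, it usually beats both UDF flavors.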
by hpant (New Contributor III)
  • 1292 Views
  • 3 replies
  • 0 kudos
Latest Reply
szymon_dybczak (Esteemed Contributor III)
  • 0 kudos

Hi @hpant, I think they are really similar to the overall best practices for Python logging, like having a centralized logging configuration, using the correct log levels, etc. Look for example at the article below: 10 Best Practices for Logging in Pytho...

2 More Replies
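A minimal sketch of the centralized-configuration idea from the reply; the logger name, format, and messages are arbitrary illustrations, not from the thread:

```python
import logging

# Configure once, near the top of the notebook or job, then reuse everywhere.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s - %(message)s",
)
log = logging.getLogger("my_pipeline")  # hypothetical logger name

log.info("starting ingest")        # routine progress
log.warning("late-arriving data")  # recoverable anomaly
log.error("write failed")          # needs attention
```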
by Phani1 (Valued Contributor II)
  • 976 Views
  • 1 reply
  • 0 kudos

Huge Delta table performance consideration

Hi Team, we want to create a Delta table with a historical load of 10 TB of data, and we expect an incremental refresh of about 15 GB each day. What factors should we take into account for managing such a large volume of data, especially cost and pe...

Latest Reply
szymon_dybczak (Esteemed Contributor III)
  • 0 kudos

@Phani1, all that you've mentioned is correct. Additionally, if you have a scenario that requires DELETE, UPDATE, or MERGE, you can turn on deletion vectors. Deletion vectors are a storage optimization feature that can be enabled on Delta Lake tables. By d...

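A hedged sketch of enabling deletion vectors as the reply suggests; the table name is a placeholder, and the feature requires a runtime that supports deletion vectors:

```python
# Turn on deletion vectors so DELETE/UPDATE/MERGE mark rows as removed
# instead of rewriting whole parquet files.
spark.sql("""
    ALTER TABLE main.sales.transactions
    SET TBLPROPERTIES ('delta.enableDeletionVectors' = 'true')
""")
```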
by SalmanDB2024 (New Contributor II)
  • 779 Views
  • 1 reply
  • 0 kudos

Email Notification not received even when configured in Alerts

Hi Experts, in a job's Alerts, the configured email ID is whitelisted and auto-prompted in the dropdown by Databricks, yet even when it is configured to receive email notifications, no email is sent, whereas the same notification of job...

Latest Reply
szymon_dybczak (Esteemed Contributor III)
  • 0 kudos

Hi @SalmanDB2024, are you on Azure? If so, look at the solution below; maybe it will be helpful: Solved: Re: Why I am not receiving any mail sent to the Az... - Databricks Community - 15156

by ruoyuqian (New Contributor II)
  • 1535 Views
  • 2 replies
  • 0 kudos

Upload to Volume

How can I programmatically upload parquet files from Azure Data Lake to a catalog's Volumes?
source_path = "abfss://datalake-raw-dev@xxx.dfs.core.windows.net/xxxxx/saxxles/xx/source/ETL/transformed_data/parquet/"
# Define the path to your Unity Catalog V...

Latest Reply
Witold (Honored Contributor)
  • 0 kudos

Besides, when accessing volumes, you don't need to provide the dbfs protocol: `/Volumes/xxx/xxx/transformed_parquet`

1 More Replies
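A minimal sketch of the programmatic copy, runnable in a Databricks notebook where `dbutils` is available. The source path is taken from the question; the catalog/schema/volume names are hypothetical, and the cluster is assumed to already have credentials for the abfss source (e.g. via an external location):

```python
# Copy parquet files from ADLS into a Unity Catalog volume.
source_path = "abfss://datalake-raw-dev@xxx.dfs.core.windows.net/xxxxx/saxxles/xx/source/ETL/transformed_data/parquet/"
volume_path = "/Volumes/my_catalog/my_schema/my_volume/transformed_parquet/"  # hypothetical volume

dbutils.fs.cp(source_path, volume_path, recurse=True)
```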
by lixing (New Contributor II)
  • 327 Views
  • 1 reply
  • 0 kudos

Help with receiving events with Python

According to the Python code in "Create a Python script to receive events", I need to define BLOB_STORAGE_CONNECTION_STRING and BLOB_CONTAINER_NAME. Could you tell me how to get them? Thanks.

Latest Reply
Witold (Honored Contributor)
  • 0 kudos

Hey @lixing, how is this actually related to Databricks? Besides that, these are just metadata of your storage container. One way to get them is to go to the Azure portal and navigate to the appropriate container.

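For reference, a hedged sketch of where the two values come from and how they might be held in a script. Reading the connection string from an environment variable and the container name shown are assumptions, not from the thread:

```python
# Both values are properties of YOUR Azure storage account, not Databricks:
#   - connection string: Azure portal -> Storage account -> Access keys
#   - container name: the blob container you created for checkpoints
import os

# Hypothetical: keep the secret in an environment variable, not in code.
BLOB_STORAGE_CONNECTION_STRING = os.environ["AZURE_STORAGE_CONNECTION_STRING"]
BLOB_CONTAINER_NAME = "eventhub-checkpoints"  # hypothetical container name
```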
by Takao (New Contributor II)
  • 931 Views
  • 2 replies
  • 2 kudos

Resolved! How to run OPTIMIZE on a very big dataset (11 TB and more)?

Sorry for my very poor English and low Databricks skill. At work, my boss asked me to perform liquid clustering on four columns for a Delta Lake table with an 11 TB capacity and over 80 columns, and I was estimating the resources and costs required to ...

Latest Reply
jacovangelder (Honored Contributor)
  • 2 kudos

A couple of things: OPTIMIZE is a very compute-intensive operation, so make sure you pick a VM that is compute optimized. I had to look into the AWS instances, but it seems the r6g.large you're using is just a 2-CPU, 16 GB machine. This is by far not sufficien...

1 More Replies
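For reference, the operation being sized here looks roughly like the sketch below; the table and column names are placeholders. With liquid clustering, OPTIMIZE is incremental, so repeated runs chip away at unclustered data rather than rewriting all 11 TB in one pass:

```python
# Hypothetical 11 TB table: define the four clustering keys, then optimize.
spark.sql("""
    ALTER TABLE main.warehouse.big_events
    CLUSTER BY (col_a, col_b, col_c, col_d)
""")

# Only data not yet clustered is rewritten, so this can be scheduled
# as repeated smaller passes on an appropriately sized cluster.
spark.sql("OPTIMIZE main.warehouse.big_events")
```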
by bojian_tw (New Contributor)
  • 642 Views
  • 0 replies
  • 0 kudos

Delta Live Table pipeline hanging at INITIALIZING forever

I have a DLT pipeline hanging at INITIALIZING forever; it never stops. But I found the AnalysisException already happened at the beginning: pyspark.errors.exceptions.captured.AnalysisException: [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column, variable, or functi...

Screenshot 2024-07-27 at 07.50.31.png
Data Engineering
Delta Live Table
dlt
by gweakliem (New Contributor)
  • 860 Views
  • 0 replies
  • 0 kudos

"No module named google.cloud.spark" errors querying BigQuery

Personal Cluster 15.3 ML. Running the following notebook:
import pyspark.sql.functions as F
from datetime import datetime, timedelta
spark.sparkContext.addPyFile("gs://spark-lib/bigquery/spark-bigquery-support-0.26.0.zip")
target_hour = datetime(202...

by YS1 (Contributor)
  • 997 Views
  • 2 replies
  • 0 kudos

Delta Live Tables and Pivoting

Hello, I'm trying to create a DLT pipeline where I read data as a streaming dataset from a Kafka source, save it in a table, and then filter, transform, and pivot the data. However, I've encountered an issue: DLT doesn't support pivoting, and using fo...

Data Engineering
dlt
streaming
Latest Reply
szymon_dybczak (Esteemed Contributor III)
  • 0 kudos

Hi @YS1, as a workaround you can rewrite the pivot in SQL with CASE statements. Below, the pivot:
data = [("ProductA", "North", 100), ("ProductA", "South", 150), ("ProductA", "East", 200), ("ProductA", "West", 250), ("ProductB", "North", 30...

1 More Replies
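A runnable sketch of the CASE-style workaround from the reply, reconstructed from the sample rows it shows (the ProductB rows beyond the visible "North, 30" are assumed). Conditional aggregation expresses the same result as pivot() in a form DLT accepts:

```python
import pyspark.sql.functions as F

data = [
    ("ProductA", "North", 100),
    ("ProductA", "South", 150),
    ("ProductA", "East", 200),
    ("ProductA", "West", 250),
    ("ProductB", "North", 30),  # remaining ProductB rows truncated in the reply
]
df = spark.createDataFrame(data, ["product", "region", "sales"])

# Equivalent of df.groupBy("product").pivot("region").sum("sales"),
# written as explicit CASE-style expressions so it runs inside DLT.
pivoted = df.groupBy("product").agg(
    F.sum(F.when(F.col("region") == "North", F.col("sales"))).alias("North"),
    F.sum(F.when(F.col("region") == "South", F.col("sales"))).alias("South"),
    F.sum(F.when(F.col("region") == "East", F.col("sales"))).alias("East"),
    F.sum(F.when(F.col("region") == "West", F.col("sales"))).alias("West"),
)
```

The trade-off is that the output columns must be listed explicitly, which is exactly why pivot() (which infers them at runtime) is incompatible with DLT's static planning.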
by BenDataBricks (New Contributor II)
  • 2920 Views
  • 6 replies
  • 4 kudos

OAuth U2M Manual token generation failing

I am writing a frontend webpage that will log into Databricks and allow the user to select datasets. I am new to front-end development, so there may be some things I am missing here, but I know that the Databricks SQL connector for JavaScript only wor...

Latest Reply
MaheshMandlik (New Contributor III)
  • 4 kudos

@szymon_dybczak Thank you for your help. Your solution has worked very well for me.

5 More Replies
by j_al (New Contributor II)
  • 6669 Views
  • 10 replies
  • 5 kudos

Jobs API 2.1 OpenAPI specification seems broken.

The swagger file seems to be invalid: https://docs.databricks.com/_extras/api-refs/jobs-2.1-aws.yaml

Latest Reply
JeffShutt_ (New Contributor II)
  • 5 kudos

@Debayan Mukherjee, are you suggesting reverting the OpenAPI version specified in https://docs.databricks.com/_extras/api-refs/jobs-2.1-aws.yaml from 3.1.0 to 3.0.3?

9 More Replies

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group