Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Takao
by New Contributor II
  • 876 Views
  • 2 replies
  • 2 kudos

Resolved! How to run OPTIMIZE on a very large dataset (11TB+)?

Sorry for my poor English and limited Databricks skills. At work, my boss asked me to perform liquid clustering on four columns for a Delta Lake table with an 11TB capacity and over 80 columns, and I was estimating the resources and costs required to ...

Latest Reply
jacovangelder
Honored Contributor
  • 2 kudos

A couple of things: OPTIMIZE is a very compute-intensive operation, so make sure you pick a VM that is compute optimized. I had to look into the AWS instances, but it seems the r6g.large you're using is just a 2-CPU, 16 GB machine. This is by far not sufficien...

1 More Replies
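For readers landing on this thread, a minimal sketch of the approach being discussed, assuming a hypothetical table name and four hypothetical clustering columns (run in a Databricks notebook where `spark` is available):

```python
# Hypothetical table and columns; substitute your own.
# Enable liquid clustering on the four columns from the question.
spark.sql("""
    ALTER TABLE main.sales.events
    CLUSTER BY (customer_id, region, event_date, product_id)
""")

# OPTIMIZE then (re)clusters the data. On an ~11TB table this is very
# compute intensive, so use a compute-optimized cluster rather than a
# small instance like r6g.large.
spark.sql("OPTIMIZE main.sales.events")
```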
bojian_tw
by New Contributor
  • 608 Views
  • 0 replies
  • 0 kudos

Delta Live Table pipeline hanging at INITIALIZING forever

I have a DLT pipeline hanging at INITIALIZING forever; it never stops. But I found that an AnalysisException had already happened at the beginning: pyspark.errors.exceptions.captured.AnalysisException: [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column, variable, or functi...

Data Engineering
Delta Live Table
dlt
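For context, a minimal sketch of the kind of DLT definition that raises this error during initialization, with hypothetical table and column names: if the query references a column the source doesn't have, the AnalysisException surfaces while the pipeline is still resolving its graph, which can look like an update stuck at INITIALIZING.

```python
import dlt
from pyspark.sql import functions as F

@dlt.table
def cleaned_events():
    # If the source has no column named `event_ts`, this fails with
    # [UNRESOLVED_COLUMN.WITH_SUGGESTION] while the update initializes.
    return (
        spark.readStream.table("raw_events")  # hypothetical source table
        .select(F.col("event_ts").cast("timestamp").alias("ts"))
    )
```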
gweakliem
by New Contributor
  • 827 Views
  • 0 replies
  • 0 kudos

"No module named google.cloud.spark" errors querying BigQuery

Personal Cluster 15.3 ML, running the following notebook: import pyspark.sql.functions as F; from datetime import datetime, timedelta; spark.sparkContext.addPyFile("gs://spark-lib/bigquery/spark-bigquery-support-0.26.0.zip"); target_hour = datetime(202...

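As a hedged alternative to shipping the connector zip with addPyFile: recent Databricks runtimes bundle a BigQuery data source, so the read can look like the sketch below. Project, dataset, and table names are placeholders, and the workspace is assumed to be configured with GCP credentials.

```python
# Minimal sketch using the bundled `bigquery` data source instead of
# spark.sparkContext.addPyFile(...). All names are placeholders.
df = (
    spark.read.format("bigquery")
    .option("table", "my-project.my_dataset.my_table")
    .load()
)
display(df)
```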
YS1
by Contributor
  • 923 Views
  • 2 replies
  • 0 kudos

Delta Live Tables and Pivoting

Hello, I'm trying to create a DLT pipeline where I read data as a streaming dataset from a Kafka source, save it in a table, and then filter, transform, and pivot the data. However, I've encountered an issue: DLT doesn't support pivoting, and using fo...

Data Engineering
dlt
streaming
Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @YS1, as a workaround you can rewrite the pivot in SQL with CASE statements. Below pivot: data = [ ("ProductA", "North", 100), ("ProductA", "South", 150), ("ProductA", "East", 200), ("ProductA", "West", 250), ("ProductB", "North", 30...

1 More Replies
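To make the truncated reply concrete, a sketch of the workaround with hypothetical sample data: the pivot becomes explicit conditional aggregations, which works in DLT because the output schema is fixed up front.

```python
from pyspark.sql import functions as F

# Hypothetical sample data mirroring the truncated reply.
data = [
    ("ProductA", "North", 100), ("ProductA", "South", 150),
    ("ProductA", "East", 200), ("ProductA", "West", 250),
    ("ProductB", "North", 300),
]
df = spark.createDataFrame(data, ["product", "region", "sales"])

# The pivot rewritten as explicit conditional aggregations; DLT accepts
# this because the output columns are known in advance.
pivoted = df.groupBy("product").agg(
    *[
        F.sum(F.when(F.col("region") == r, F.col("sales"))).alias(r)
        for r in ["North", "South", "East", "West"]
    ]
)
pivoted.show()
```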
BenDataBricks
by New Contributor II
  • 2797 Views
  • 6 replies
  • 4 kudos

OAuth U2M Manual token generation failing

I am writing a frontend webpage that will log into Databricks and allow the user to select datasets. I am new to frontend development, so there may be some things I am missing here, but I know that the Databricks SQL connector for JavaScript only wor...

Latest Reply
MaheshMandlik
New Contributor III
  • 4 kudos

@szymon_dybczak Thank you for your help. Your solution has worked very well for me.

5 More Replies
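For anyone hitting the same wall, a hedged sketch of the documented U2M authorization-code flow with PKCE; the workspace URL, client ID, and redirect URI are placeholders.

```python
# Hedged sketch of the U2M authorization-code flow with PKCE.
import base64, hashlib, os, urllib.parse

workspace = "https://my-workspace.cloud.databricks.com"  # placeholder
client_id = "my-oauth-app-client-id"                     # placeholder
redirect_uri = "http://localhost:8020"                   # placeholder

# PKCE verifier/challenge pair.
verifier = base64.urlsafe_b64encode(os.urandom(32)).rstrip(b"=").decode()
challenge = base64.urlsafe_b64encode(
    hashlib.sha256(verifier.encode()).digest()
).rstrip(b"=").decode()

# Step 1: send the user's browser here; the redirect comes back with ?code=...
authorize_url = f"{workspace}/oidc/v1/authorize?" + urllib.parse.urlencode({
    "client_id": client_id,
    "response_type": "code",
    "redirect_uri": redirect_uri,
    "scope": "all-apis offline_access",
    "code_challenge": challenge,
    "code_challenge_method": "S256",
    "state": "some-random-state",
})
print(authorize_url)

# Step 2: POST the returned code to f"{workspace}/oidc/v1/token" with
# grant_type=authorization_code, code, redirect_uri, client_id, and
# code_verifier=verifier to obtain the access token.
```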
j_al
by New Contributor II
  • 6589 Views
  • 10 replies
  • 5 kudos

Jobs API 2.1 OpenAPI specification seems broken.

The Jobs API 2.1 OpenAPI specification seems broken. The swagger file seems to be invalid: https://docs.databricks.com/_extras/api-refs/jobs-2.1-aws.yaml

Latest Reply
JeffShutt_
New Contributor II
  • 5 kudos

@Debayan Mukherjee, are you suggesting reverting the OpenAPI version specified in https://docs.databricks.com/_extras/api-refs/jobs-2.1-aws.yaml from 3.1.0 to 3.0.3?

9 More Replies
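One way to reproduce the report locally, assuming the openapi-spec-validator and PyYAML packages are installed:

```python
# Fetch the published spec and run it through a validator.
# Assumes: pip install openapi-spec-validator pyyaml
import urllib.request

import yaml
from openapi_spec_validator import validate_spec

url = "https://docs.databricks.com/_extras/api-refs/jobs-2.1-aws.yaml"
with urllib.request.urlopen(url) as resp:
    spec = yaml.safe_load(resp.read())

# Raises a validation error if the file is not a valid spec; note the file
# declares OpenAPI 3.1.0, which older validator releases don't support.
validate_spec(spec)
```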
RishabhGarg
by New Contributor II
  • 721 Views
  • 2 replies
  • 2 kudos

Keywords and Functions supported in SQL but not in Databricks SQL.

Actually, I have around 2000 SQL queries. I have to convert them to Databricks-supported SQL so that I can run them in the Databricks environment. So I want to know the list of all keywords, functions, or anything else that is different in Databricks SQL. Pl...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 2 kudos

Hi @RishabhGarg, you're saying SQL, but which dialect? Every provider has its own extensions to the ANSI SQL standard. For example, if you're using SQL Server, there is a TOP keyword to limit the rows.

1 More Replies
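A small illustration of the kind of rewrite involved, using the TOP example from the reply (`sales` is a placeholder table):

```python
# T-SQL (SQL Server):  SELECT TOP 10 * FROM sales
# Databricks SQL:      use LIMIT instead.
spark.sql("SELECT * FROM sales LIMIT 10").show()
```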
RobsonNLPT
by Contributor II
  • 1458 Views
  • 6 replies
  • 3 kudos

Resolved! Databricks Variant Data Type

Hi, I've tried to enable a table to test the new variant data type (public preview). I used the ALTER command: ALTER TABLE tablexxxx SET TBLPROPERTIES('delta.feature.variantType-preview' = 'supported') and I got the error [DELTA_UNSUPPORTED_FEATURES_IN_CONFI...

Latest Reply
RobsonNLPT
Contributor II
  • 3 kudos

Yes, but I tried using two cluster types: SQL Serverless and the new serverless compute. The error occurs on both.

5 More Replies
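For reference, a hedged sketch of the variant type once the compute actually supports the preview feature (the serverless options in this thread apparently did not at the time); the table name is a placeholder.

```python
# Create a table with a VARIANT column, insert JSON, read a nested field.
spark.sql("""
    CREATE TABLE IF NOT EXISTS main.default.variant_demo (
        id BIGINT,
        payload VARIANT
    )
""")
spark.sql("""
    INSERT INTO main.default.variant_demo
    SELECT 1, PARSE_JSON('{"a": 1, "b": {"c": "x"}}')
""")
# `payload:b.c` extracts a nested field from the variant value.
spark.sql("SELECT payload:b.c AS c FROM main.default.variant_demo").show()
```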
Patricckk
by New Contributor II
  • 1520 Views
  • 3 replies
  • 1 kudos

Attribute-Based Access Control

Hi, over here they are explaining attribute-based access controls, which I want to implement in my project, but I can't find the documentation or the option to create rules myself. Is this feature already available? https://www.databricks.com/dataaisummit...

Latest Reply
mhiltner
Databricks Employee
  • 1 kudos

Expected in Q3 in preview mode.

2 More Replies
KamilK
by New Contributor II
  • 392 Views
  • 0 replies
  • 1 kudos

Include SPARK-46990 in databricks 15.4 LTS

Hi, could you include the fix for SPARK-46990 ([SPARK-46990] Regression: Unable to load empty avro files emitted by event-hubs - ASF JIRA (apache.org)) in Databricks 15.4? (15.4 is in the beta stage, so it might be the right time to include the fix.)

Kayla
by Valued Contributor II
  • 753 Views
  • 1 reply
  • 0 kudos

Resolved! Compute Policy Does Not Install Libraries

Has anyone else run into the issue where applying libraries through a compute policy just completely does not work? I'm trying to install some pretty basic Python libraries from PyPI (pytest and paramiko, for example), and it is failing on 13.3 and 14.3...

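Loudly hedged, since the thread doesn't show the resolution: the cluster policy API exposes a `libraries` field on create/edit, so one sanity check is to confirm the policy actually carries the libraries. A sketch against the REST API, with every identifier a placeholder:

```python
# Sketch only: update a policy so its libraries travel with it.
# Host, token, and policy_id are placeholders; the `libraries` field is
# per the cluster policy API's policy-libraries support.
import requests

host = "https://my-workspace.cloud.databricks.com"
token = "dapi-placeholder"

payload = {
    "policy_id": "ABC123",
    "name": "team-policy",
    "definition": "{}",
    "libraries": [
        {"pypi": {"package": "pytest"}},
        {"pypi": {"package": "paramiko"}},
    ],
}
resp = requests.post(
    f"{host}/api/2.0/policies/clusters/edit",
    headers={"Authorization": f"Bearer {token}"},
    json=payload,
)
resp.raise_for_status()
```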
rk1994
by New Contributor
  • 1137 Views
  • 2 replies
  • 0 kudos

Incrementally ingesting from a static db into a Delta Table

Hello everyone, I'm very new to Delta Live Tables (and Delta Tables too), so please forgive me if this question has been asked here before. Some context: I have over 100M records stored in a Postgres table. I can connect to this table using the convent...

Latest Reply
rdmeyersf
New Contributor II
  • 0 kudos

If I'm reading this right, you created a materialized view to prep your data in Postgres. You may not need to do that, and it will also limit your integration options. It puts more work on Postgres, usually creates more data to move, and will not as m...

1 More Replies
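A sketch of the incremental pattern being discussed, assuming the Postgres table has a monotonically increasing `updated_at` column; all names, hosts, and secret references are placeholders.

```python
from pyspark.sql import functions as F

# High-water mark from what has already been ingested (the target table
# and `updated_at` column are placeholders).
last_ts = (
    spark.table("main.bronze.events").agg(F.max("updated_at")).collect()[0][0]
)

src_query = "(SELECT * FROM events" + (
    f" WHERE updated_at > '{last_ts}'" if last_ts else ""
) + ") AS t"

incremental = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://db-host:5432/mydb")
    .option("dbtable", src_query)
    .option("user", "ingest_user")
    .option("password", dbutils.secrets.get("my-scope", "pg-password"))
    .load()
)
incremental.write.mode("append").saveAsTable("main.bronze.events")
```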
Gareema
by New Contributor III
  • 1379 Views
  • 3 replies
  • 1 kudos

Not able to unzip the zip file with mount and unity catalog

Hello team, I have a zip file in ADLS Gen2. The folder I am using is mounted, and when I run the command dbutils.fs.ls(path) it lists all the files (including the required zip). However, when I try to read the zip using the 'zipfile' module, it displays 'Fil...

Latest Reply
Witold
Honored Contributor
  • 1 kudos

@Gareema, since you're using UC, can you use Volumes instead? It basically replaces the old mount approach.

2 More Replies
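Following Witold's suggestion, a minimal sketch with placeholder volume paths: the zipfile module needs POSIX file semantics, which Unity Catalog volumes provide under /Volumes/...

```python
import zipfile

# Placeholder volume paths; adjust catalog/schema/volume to your setup.
zip_path = "/Volumes/main/default/landing/archive.zip"
extract_to = "/Volumes/main/default/landing/unzipped"

with zipfile.ZipFile(zip_path, "r") as zf:
    zf.extractall(extract_to)
```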
Venkat369
by New Contributor II
  • 994 Views
  • 4 replies
  • 2 kudos

How to send variables from Control-M to Databricks jobs

I want to send variables from Control-M, which is used to call a Databricks job. The Databricks job is designed to call a notebook, and the notebook should use the attributes sent by Control-M. Can someone help me with this scenario o...

Latest Reply
Witold
Honored Contributor
  • 2 kudos

@Venkat369 What is wrong with the link I provided? It actually shows you how to do it. If not, please be more precise.

3 More Replies
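For completeness, a hedged sketch of the usual pattern: Control-M triggers the job via the Jobs API run-now endpoint with notebook_params, and the notebook reads them as widgets (the job ID and parameter names are placeholders).

```python
# Control-M side (REST call it would issue; values are placeholders):
#   POST {host}/api/2.1/jobs/run-now
#   {"job_id": 123, "notebook_params": {"run_date": "2024-07-26"}}

# Notebook side: read the parameter as a widget.
dbutils.widgets.text("run_date", "")        # default when run interactively
run_date = dbutils.widgets.get("run_date")
print(f"Control-M passed run_date={run_date}")
```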
guangyi
by Contributor III
  • 1040 Views
  • 1 reply
  • 0 kudos

Resolved! Is there a way to let the DLT pipeline retry by itself?

I know I can make a workflow job retry automatically by adding the following properties in the YAML file: max_retries or min_retry_interval_millis. However, I cannot find similar attributes in any DLT pipeline document. When I ask Copilot, it gives this ...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @guangyi, in DLT you have the following two properties that you can set: pipelines.maxFlowRetryAttempts (type: int), the maximum number of attempts to retry a flow before failing a pipeline update when a retryable failure occurs. The default value is two. By...

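To make the reply concrete, a sketch of where such a property lives: the `configuration` map of the pipeline settings, shown here as a Python dict (the pipeline name is a placeholder).

```python
# Pipeline settings fragment; the retry property goes in `configuration`.
pipeline_settings = {
    "name": "my-dlt-pipeline",  # placeholder
    "configuration": {
        # Retry each flow up to 3 times before failing the update
        # (the default is two, per the reply above).
        "pipelines.maxFlowRetryAttempts": "3",
    },
}
```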
