Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Data_Engineer3
by Contributor III
  • 2065 Views
  • 1 reply
  • 0 kudos

Identify the associated notebook for an application running from the Spark UI

In the Spark UI, I can see the running application with its application ID. From the Spark UI, is it possible to see which notebook is running as that application? I am also interested in learning more about how jobs and stages work ...

Data Engineering
Databricks
Latest Reply
Hubert-Dudek
Databricks MVP
  • 0 kudos

https://spark.apache.org/docs/3.1.1/api/python/reference/api/pyspark.SparkContext.setJobDescription.html — spark.sparkContext.setJobDescription("my name") will make your life easier. Just put it in the notebook. You should also put it after each action (show, count, ...
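That advice can be wrapped in a small helper so every action inside a block is labeled in the Spark UI and resets afterwards. This is a sketch, not a Databricks API; it assumes the notebook's SparkContext is passed in:

```python
from contextlib import contextmanager

@contextmanager
def job_description(sc, description):
    """Set the Spark job description for actions run inside the block, then reset it."""
    sc.setJobDescription(description)
    try:
        yield
    finally:
        sc.setJobDescription(None)

# Usage in a Databricks notebook (where `spark` already exists; names are examples):
# with job_description(spark.sparkContext, "orders: dedupe step"):
#     df.count()
```

Each job triggered inside the block then shows "orders: dedupe step" in the Spark UI's job list, which makes it easy to map jobs back to the notebook cell that ran them.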

Govind3331
by Databricks Partner
  • 2523 Views
  • 1 reply
  • 0 kudos

How to capture/identify incremental rows when a table has no primary key columns

Q1. My source is SQL Server tables. I want to identify only the latest records (incremental rows) and load those into the Bronze layer. Instead of a full load to ADLS, we want to capture only the incremental rows and load them into ADLS for further processing. NOTE: Prob...

Latest Reply
Slaw
New Contributor II
  • 0 kudos

Hi, what kind of SQL source is it? MS SQL, MySQL, PostgreSQL?
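Whatever the source engine, a common pattern when no primary key exists is to fingerprint whole rows and keep only unseen fingerprints between loads. A minimal pure-Python sketch of the idea (in Spark you would typically compute `sha2(concat_ws('|', *columns), 256)` on each row and anti-join against the previous load):

```python
import hashlib

def row_fingerprint(row):
    """Hash all column values so identical rows map to the same key."""
    joined = "|".join(repr(v) for v in row)  # repr keeps NULL distinct from ""
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()

def incremental_rows(previous_rows, current_rows):
    """Return rows from the current extract whose fingerprint was not seen before."""
    seen = {row_fingerprint(r) for r in previous_rows}
    return [r for r in current_rows if row_fingerprint(r) not in seen]
```

Note this detects inserts and changed rows but not deletes, and two fully identical source rows collapse to one fingerprint, which is usually acceptable for append-only Bronze ingestion.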

Etyr
by Contributor II
  • 3954 Views
  • 2 replies
  • 0 kudos

Cannot change databricks-connect port

I have a Databricks cluster on the 10.4 runtime. When I run databricks-connect configure I enter all the required information, and with the default port 15001, databricks-connect test works. But changing the port to 443 does not work; I tried to do a ...

Data Engineering
databricks-connect
port
pyspark
spark
Latest Reply
Etyr
Contributor II
  • 0 kudos

@daniel_sahal Thank you for the reply; indeed, port 443 is used by a lot of applications and could be problematic. But I also tried port `15002` and it didn't work. No port other than the default one works.

1 More Replies
Olaoye_Somide
by New Contributor III
  • 3120 Views
  • 1 reply
  • 1 kudos

Avoiding Duplicate Ingestion with Autoloader and Migrated S3 Data

Hi Team, We recently migrated event files from our previous S3 bucket to a new one. While using Auto Loader for batch ingestion, we've encountered an issue where the migrated data is being processed as new events. This leads to duplicate records in...

Data Engineering
autoloader
RocksDB
S3
Latest Reply
daniel_sahal
Databricks MVP
  • 1 kudos

@Olaoye_Somide Changing the source means that Auto Loader discovers the files as new (technically, they are in a new location, so they are indeed new). To overcome the issue you can use the modifiedAfter option.
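A sketch of that suggestion, assuming the stream reads JSON from the new bucket; the path and cut-over timestamp below are made up, and the helper just collects the options so the intent is visible:

```python
def autoloader_options(source_format, cutover_ts):
    """Auto Loader options that skip files last modified before the migration cut-over."""
    return {
        "cloudFiles.format": source_format,
        "modifiedAfter": cutover_ts,
    }

# Usage in a notebook (assumes `spark` exists; path and timestamp are examples):
# df = (spark.readStream.format("cloudFiles")
#         .options(**autoloader_options("json", "2024-03-01 00:00:00.000000 UTC+0"))
#         .load("s3://new-bucket/events/"))
```

Files copied into the new bucket before the cut-over timestamp are then ignored by file discovery, so only genuinely new events are ingested.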

joss
by New Contributor II
  • 1450 Views
  • 1 reply
  • 1 kudos

NPE in CreateJacksonParser on Databricks 14.3 LTS with Spark Structured Streaming

Hello, I have a Spark Structured Streaming job: the source is a Kafka topic in JSON. It works fine with Databricks 14.2, but when I change to 14.3 LTS, I get an NPE in CreateJacksonParser: Caused by: NullPointerException: at org.apache.spark.sql.catalys...

Latest Reply
joss
New Contributor II
  • 1 kudos

Hi, thank you for your quick reply. I found the problem: `val newSchema = spark.read.json(df.select("data").as[String]).schema`. If "data" has a null value, this works in 14.2, but with 14.3 LTS the function returns an NPE. I don't know if it is a bug.

lawrence009
by Contributor
  • 4806 Views
  • 5 replies
  • 1 kudos

Contact Support re Billing Error

How do I contact billing support? I am billed through AWS Marketplace and noticed that last month the SQL Pro discount was not reflected in my statement.

Latest Reply
santiagortiiz
Databricks Partner
  • 1 kudos

Hi, could anybody provide a contact email? I have sent emails to many of the contacts described on the support page here and in AWS, but got no response through any channel. My problem is that Databricks charged me for the resources used during a free trial, what i...

4 More Replies
LukeD
by New Contributor II
  • 3283 Views
  • 3 replies
  • 1 kudos

Billing support contact

Hi, What is the best way to contact Databricks support? I see differences between the AWS billing and the Databricks report and I'm looking for an explanation. I've sent 3 messages last week through this form https://www.databricks.com/company/contact but...

Latest Reply
santiagortiiz
Databricks Partner
  • 1 kudos

Hi, I'm facing the same issue with signing in to my workspace, and I have a billing error: Databricks charged me for a free trial. I have sent a lot of emails, posted a topic in the community, and contacted people at AWS, and they said that it must be ...

2 More Replies
MCosta
by New Contributor III
  • 16969 Views
  • 10 replies
  • 19 kudos

Resolved! Debugging!

Hi ML folks, We are using Databricks to train deep learning models. The code, however, has a complex structure of classes. This would work fine in a perfect, bug-free world like Alice in Wonderland, but debugging in Databricks is awkward. We ended up do...

Latest Reply
petern
New Contributor II
  • 19 kudos

Has this been solved yet, i.e. is there a mature way to debug code on Databricks? I'm running into the same kind of issue. The Variable Explorer and pdb can be used, but it's not really the same...

9 More Replies
DatBoi
by Contributor
  • 7046 Views
  • 2 replies
  • 2 kudos

Resolved! How big should a delta table be to benefit from liquid clustering?

My question is pretty straightforward: how big should a Delta table be to benefit from liquid clustering? I know the answer will most likely depend on the details of how you are querying the data, but what is the recommendation? I know Databricks re...

Latest Reply
daniel_sahal
Databricks MVP
  • 2 kudos

@DatBoi Once you watch this video you'll understand more about Liquid Clustering: https://www.youtube.com/watch?v=5t6wX28JC_M&ab_channel=DeltaLake. Long story short: I know Databricks recommends not partitioning tables of less than 1 TB and aiming for 1 GB ...
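For reference, liquid clustering is declared per table with `CLUSTER BY` and has no minimum-size requirement, which is part of why it suits tables well under the 1 TB partitioning guideline. The sketch below just renders the DDL string (table and column names are made up); you would execute it with `spark.sql(...)`:

```python
def clustered_table_ddl(table, columns, cluster_cols):
    """Render CREATE TABLE DDL with liquid clustering (CLUSTER BY) enabled."""
    col_defs = ", ".join(f"{name} {dtype}" for name, dtype in columns)
    return (
        f"CREATE TABLE IF NOT EXISTS {table} ({col_defs}) "
        f"CLUSTER BY ({', '.join(cluster_cols)})"
    )

# spark.sql(clustered_table_ddl(
#     "main.sales.orders",
#     [("order_id", "BIGINT"), ("order_date", "DATE"), ("customer_id", "BIGINT")],
#     ["order_date", "customer_id"],
# ))
```

Unlike Hive-style partitioning, the clustering columns can later be changed with ALTER TABLE without rewriting the table layout up front.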

1 More Replies
demost11
by New Contributor II
  • 1081 Views
  • 0 replies
  • 0 kudos

Tracking DBMS CDC

We're using Databricks to incrementally extract data from SQL Server tables into S3. The data contains a timestamp column. We need a place to store the maximum retrieved timestamp per table so it can be retrieved during the next run. Does Databricks cont...
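The usual answer to this pattern is a small control table (on Databricks, typically a Delta table) holding one high-watermark row per source table. Here is a pure-Python sketch of the read-filter-advance cycle, with a plain list and dicts standing in for the real tables (field names are hypothetical):

```python
def next_batch(rows, watermark):
    """Keep rows newer than the watermark; return them plus the advanced watermark."""
    fresh = [r for r in rows if watermark is None or r["ts"] > watermark]
    new_mark = max((r["ts"] for r in fresh), default=watermark)
    return fresh, new_mark
```

In the real pipeline you would read the stored watermark at the start of the run, push it down into the SQL Server query (`WHERE ts > :watermark`), and afterwards MERGE the new value into the control table keyed by source table name.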

Jagan_etl
by New Contributor II
  • 2459 Views
  • 3 replies
  • 0 kudos

Avro file format generation

Hi All, We are using a cluster with the 9.1 runtime version. I'm getting an "incompatible schema exception" error while writing the data into an Avro file. There are more fields in the Avro schema than in the dataframe output. I tried the same in Community Edition ...

Latest Reply
Jagan_etl
New Contributor II
  • 0 kudos

Hi All, Any suggestions on this?

2 More Replies
BhaveshPatel
by New Contributor
  • 2166 Views
  • 1 reply
  • 1 kudos

Auto loader

Suppose I have 1000s of historical .csv files, stored since Jan 2022 in a folder of my Azure Blob Storage container. I want to use Auto Loader to read only files beginning on 1 Oct 2023, ignoring all files before this date, to build a pipel...

Latest Reply
daniel_sahal
Databricks MVP
  • 1 kudos

@BhaveshPatel Three things that you can do:
- Move the files to a separate folder,
- Use a filter on metadata fields to filter out the unnecessary files,
- Use a pathGlobFilter to match only the files you need.
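The pathGlobFilter idea can be checked locally before touching the pipeline: the option matches file names with glob syntax, which Python's `fnmatch` mirrors closely. This sketch (file names are made up) shows a glob that keeps only Oct-Dec 2023 files; for an open-ended "on or after this date" cut-off, the modifiedAfter option is usually simpler:

```python
from fnmatch import fnmatch

def matching_files(paths, glob):
    """Mimic pathGlobFilter: keep paths whose file name matches the glob pattern."""
    return [p for p in paths if fnmatch(p.rsplit("/", 1)[-1], glob)]

# The equivalent Auto Loader setting would be:
# .option("pathGlobFilter", "2023-1[0-2]-*.csv")
```

This only works if the dates are encoded in the file names; otherwise, filtering on file-metadata fields (e.g. modification time) is the way to go.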

Bharathi7
by Databricks Partner
  • 2731 Views
  • 3 replies
  • 0 kudos

Python UDF fails with UNAVAILABLE: Channel shutdownNow invoked

I'm using a Python UDF to apply OCR to each row of a dataframe that contains the URL to a PDF document. This is how I define my UDF:

def extract_text(url: str):
    ocr = MyOcr(url)
    extracted_text = ocr.get_text()
    return json.dumps(extracte...

Latest Reply
daniel_sahal
Databricks MVP
  • 0 kudos

@Bharathi7 It's really hard to determine what's going on without knowing what MyOcr actually does. Maybe there's some kind of timeout on the service side? Too many parallel connections?

2 More Replies
Poovarasan
by Databricks Partner
  • 2962 Views
  • 1 reply
  • 0 kudos

com.databricks.sql.transaction.tahoe.ColumnMappingException: Found duplicated column id `2` in colum

Hi, Currently I am using the below-mentioned query to create a materialized view. It was working fine until yesterday in the DLT pipeline, but from today the code below throws an error (com.databricks.sql.transaction.tahoe.ColumnMappingE...

Data Engineering
ColumnMapping
dlt
elgeo
by Valued Contributor II
  • 28765 Views
  • 3 replies
  • 2 kudos

Data type length enforcement

Hello. Is there a way to enforce the length of a column in SQL? For example that a column has to be exactly 18 characters? Thank you!
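Delta tables support CHECK constraints, which can express an exact-length rule even where the column's declared type cannot. The sketch below renders the ALTER TABLE statement (table and column names are made up); you would run it with `spark.sql(...)`, after which writes violating the rule are rejected:

```python
def exact_length_check_ddl(table, column, n):
    """Render an ALTER TABLE statement adding a CHECK constraint on exact length."""
    constraint = f"{column}_len_{n}"
    return (
        f"ALTER TABLE {table} ADD CONSTRAINT {constraint} "
        f"CHECK (length({column}) = {n})"
    )

# spark.sql(exact_length_check_ddl("main.ref.accounts", "account_id", 18))
```

Adding the constraint fails if existing rows already violate it, so clean the column first or add the constraint at table creation time.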

Latest Reply
databricks31
Databricks Partner
  • 2 kudos

We are facing similar issues while writing into an ADLS location in Delta format; after that, we created Unity Catalog tables on top of the Delta location. Should it be possible to change the data type lengths in the format below, and is this supported in Spark SQL? Azure SQL Spark            ...

2 More Replies