Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Data_Engineer3
by Contributor III
  • 2065 Views
  • 1 reply
  • 0 kudos

Identify the associated notebook for an application running from the Spark UI

In the Spark UI, I can see the running application with its application ID. From the Spark UI, is it possible to see which notebook is running as that application? I am also interested in learning more about how jobs and stages work ...

Data Engineering
Databricks
Latest Reply
Hubert-Dudek
Databricks MVP
  • 0 kudos

https://spark.apache.org/docs/3.1.1/api/python/reference/api/pyspark.SparkContext.setJobDescription.html — spark.sparkContext.setJobDescription("my name") will make your life easier. Just put it in the notebook. You should also put it after each action (show, count, ...
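That advice can be wrapped in a small helper so every action inside a block is labeled in the Spark UI and resets afterwards. This is a sketch, not a Databricks API; it assumes the notebook's SparkContext is passed in:

```python
from contextlib import contextmanager

@contextmanager
def job_description(sc, description):
    """Set the Spark job description for actions run inside the block, then reset it."""
    sc.setJobDescription(description)
    try:
        yield
    finally:
        sc.setJobDescription(None)

# Usage in a Databricks notebook (where `spark` already exists; names are examples):
# with job_description(spark.sparkContext, "orders: dedupe step"):
#     df.count()
```

Each job triggered inside the block then shows "orders: dedupe step" in the Spark UI's job list, which makes it easy to map jobs back to the notebook cell that ran them.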

Govind3331
by Databricks Partner
  • 2523 Views
  • 1 reply
  • 0 kudos

How to capture/identify incremental rows when a table has no primary key columns

Q1. My source is SQL Server tables. I want to identify only the latest records (incremental rows) and load those into the Bronze layer. Instead of a full load to ADLS, we want to capture only the incremental rows and load them into ADLS for further processing. NOTE: Prob...

Latest Reply
Slaw
New Contributor II
  • 0 kudos

Hi, what kind of SQL source is it? MS SQL, MySQL, PostgreSQL?
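Whatever the source engine, a common pattern when no primary key exists is to fingerprint whole rows and keep only unseen fingerprints between loads. A minimal pure-Python sketch of the idea (in Spark you would typically compute `sha2(concat_ws('|', *columns), 256)` on each row and anti-join against the previous load):

```python
import hashlib

def row_fingerprint(row):
    """Hash all column values so identical rows map to the same key."""
    joined = "|".join(repr(v) for v in row)  # repr keeps NULL distinct from ""
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()

def incremental_rows(previous_rows, current_rows):
    """Return rows from the current extract whose fingerprint was not seen before."""
    seen = {row_fingerprint(r) for r in previous_rows}
    return [r for r in current_rows if row_fingerprint(r) not in seen]
```

Note this detects inserts and changed rows but not deletes, and two fully identical source rows collapse to one fingerprint, which is usually acceptable for append-only Bronze ingestion.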

Etyr
by Contributor II
  • 3954 Views
  • 2 replies
  • 0 kudos

Cannot change databricks-connect port

I have a Databricks cluster on the 10.4 runtime. When I run databricks-connect configure I enter all the required information, and with the default port 15001, databricks-connect test works. But changing the port to 443 does not work; I tried to do a ...

Data Engineering
databricks-connect
port
pyspark
spark
Latest Reply
Etyr
Contributor II
  • 0 kudos

@daniel_sahal Thank you for the reply; indeed, port 443 is used by a lot of applications and could be problematic. But I also tried port `15002` and it didn't work. No port other than the default one works.

1 More Replies
Olaoye_Somide
by New Contributor III
  • 3120 Views
  • 1 reply
  • 1 kudos

Avoiding Duplicate Ingestion with Autoloader and Migrated S3 Data

Hi Team, We recently migrated event files from our previous S3 bucket to a new one. While using Auto Loader for batch ingestion, we've encountered an issue where the migrated data is being processed as new events. This leads to duplicate records in...

Data Engineering
autoloader
RocksDB
S3
Latest Reply
daniel_sahal
Databricks MVP
  • 1 kudos

@Olaoye_Somide Changing the source means that Auto Loader discovers the files as new (technically, they are in a new location, so they are indeed new). To overcome the issue you can use the modifiedAfter option.
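A sketch of that suggestion, assuming the stream reads JSON from the new bucket; the path and cut-over timestamp below are made up, and the helper just collects the options so the intent is visible:

```python
def autoloader_options(source_format, cutover_ts):
    """Auto Loader options that skip files last modified before the migration cut-over."""
    return {
        "cloudFiles.format": source_format,
        "modifiedAfter": cutover_ts,
    }

# Usage in a notebook (assumes `spark` exists; path and timestamp are examples):
# df = (spark.readStream.format("cloudFiles")
#         .options(**autoloader_options("json", "2024-03-01 00:00:00.000000 UTC+0"))
#         .load("s3://new-bucket/events/"))
```

Files copied into the new bucket before the cut-over timestamp are then ignored by file discovery, so only genuinely new events are ingested.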

joss
by New Contributor II
  • 1450 Views
  • 1 reply
  • 1 kudos

NPE in CreateJacksonParser on Databricks 14.3 LTS with Spark Structured Streaming

Hello, I have a Spark Structured Streaming job: the source is a Kafka topic in JSON. It works fine with Databricks 14.2, but when I change to 14.3 LTS, I get an NPE in CreateJacksonParser: Caused by: NullPointerException: at org.apache.spark.sql.catalys...

Latest Reply
joss
New Contributor II
  • 1 kudos

Hi, thank you for your quick reply. I found the problem: `val newSchema = spark.read.json(df.select("data").as[String]).schema`. If "data" has a null value, this works in 14.2, but with 14.3 LTS the function returns an NPE. I don't know if it is a bug.

lawrence009
by Contributor
  • 4806 Views
  • 5 replies
  • 1 kudos

Contact Support re Billing Error

How do I contact billing support? I am billed through AWS Marketplace and noticed that last month the SQL Pro discount was not reflected in my statement.

Latest Reply
santiagortiiz
Databricks Partner
  • 1 kudos

Hi, could anybody provide a contact email? I have sent emails to many of the contacts described on the support page here and in AWS, but got no response through any channel. My problem is that Databricks charged me for the resources used during a free trial, what i...

4 More Replies
LukeD
by New Contributor II
  • 3283 Views
  • 3 replies
  • 1 kudos

Billing support contact

Hi, What is the best way to contact Databricks support? I see differences between the AWS billing and the Databricks report and I'm looking for an explanation. I've sent 3 messages last week through this form https://www.databricks.com/company/contact but...

Latest Reply
santiagortiiz
Databricks Partner
  • 1 kudos

Hi, I'm facing the same issue with signing in to my workspace, and I have a billing error: Databricks charged me for a free trial. I have sent a lot of emails, posted a topic in the community, and contacted people at AWS, and they said that it must be ...

2 More Replies
MCosta
by New Contributor III
  • 16969 Views
  • 10 replies
  • 19 kudos

Resolved! Debugging!

Hi ML folks, We are using Databricks to train deep learning models. The code, however, has a complex structure of classes. This would work fine in a perfect, bug-free world like Alice in Wonderland, but debugging in Databricks is awkward. We ended up do...

Latest Reply
petern
New Contributor II
  • 19 kudos

Has this been solved yet, i.e. is there a mature way to debug code on Databricks? I'm running into the same kind of issue. The Variable Explorer and pdb can be used, but it's not really the same...

9 More Replies
DatBoi
by Contributor
  • 7046 Views
  • 2 replies
  • 2 kudos

Resolved! How big should a delta table be to benefit from liquid clustering?

My question is pretty straightforward: how big should a Delta table be to benefit from liquid clustering? I know the answer will most likely depend on the details of how you are querying the data, but what is the recommendation? I know Databricks re...

Latest Reply
daniel_sahal
Databricks MVP
  • 2 kudos

@DatBoi Once you watch this video you'll understand more about Liquid Clustering: https://www.youtube.com/watch?v=5t6wX28JC_M&ab_channel=DeltaLake. Long story short: I know Databricks recommends not partitioning tables of less than 1 TB and aiming for 1 GB ...
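For reference, liquid clustering is declared per table with `CLUSTER BY` and has no minimum-size requirement, which is part of why it suits tables well under the 1 TB partitioning guideline. The sketch below just renders the DDL string (table and column names are made up); you would execute it with `spark.sql(...)`:

```python
def clustered_table_ddl(table, columns, cluster_cols):
    """Render CREATE TABLE DDL with liquid clustering (CLUSTER BY) enabled."""
    col_defs = ", ".join(f"{name} {dtype}" for name, dtype in columns)
    return (
        f"CREATE TABLE IF NOT EXISTS {table} ({col_defs}) "
        f"CLUSTER BY ({', '.join(cluster_cols)})"
    )

# spark.sql(clustered_table_ddl(
#     "main.sales.orders",
#     [("order_id", "BIGINT"), ("order_date", "DATE"), ("customer_id", "BIGINT")],
#     ["order_date", "customer_id"],
# ))
```

Unlike Hive-style partitioning, the clustering columns can later be changed with ALTER TABLE without rewriting the table layout up front.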

1 More Replies
demost11
by New Contributor II
  • 1081 Views
  • 0 replies
  • 0 kudos

Tracking DBMS CDC

We're using Databricks to incrementally extract data from SQL Server tables into S3. The data contains a timestamp column. We need a place to store the maximum retrieved timestamp per table so it can be retrieved during the next run. Does Databricks cont...
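The usual answer to this pattern is a small control table (on Databricks, typically a Delta table) holding one high-watermark row per source table. Here is a pure-Python sketch of the read-filter-advance cycle, with a plain list and dicts standing in for the real tables (field names are hypothetical):

```python
def next_batch(rows, watermark):
    """Keep rows newer than the watermark; return them plus the advanced watermark."""
    fresh = [r for r in rows if watermark is None or r["ts"] > watermark]
    new_mark = max((r["ts"] for r in fresh), default=watermark)
    return fresh, new_mark
```

In the real pipeline you would read the stored watermark at the start of the run, push it down into the SQL Server query (`WHERE ts > :watermark`), and afterwards MERGE the new value into the control table keyed by source table name.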

Jagan_etl
by New Contributor II
  • 2459 Views
  • 3 replies
  • 0 kudos

Avro file format generation

Hi All, We are using a cluster with the 9.1 runtime version. I'm getting an "incompatible schema exception" error while writing the data into an Avro file. There are more fields in the Avro schema than in the dataframe output. I tried the same in Community Edition ...

Latest Reply
Jagan_etl
New Contributor II
  • 0 kudos

Hi All, Any suggestions on this?

2 More Replies
BhaveshPatel
by New Contributor
  • 2166 Views
  • 1 reply
  • 1 kudos

Auto loader

Suppose I have 1000s of historical .csv files, stored since Jan 2022 in a folder of my Azure Blob Storage container. I want to use Auto Loader to read only files beginning on 1 Oct 2023, ignoring all files before this date, to build a pipel...

Latest Reply
daniel_sahal
Databricks MVP
  • 1 kudos

@BhaveshPatel Three things that you can do:
- Move the files to a separate folder,
- Use a filter on metadata fields to filter out the unnecessary files,
- Use a pathGlobFilter to match only the files you need.
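The pathGlobFilter idea can be checked locally before touching the pipeline: the option matches file names with glob syntax, which Python's `fnmatch` mirrors closely. This sketch (file names are made up) shows a glob that keeps only Oct-Dec 2023 files; for an open-ended "on or after this date" cut-off, the modifiedAfter option is usually simpler:

```python
from fnmatch import fnmatch

def matching_files(paths, glob):
    """Mimic pathGlobFilter: keep paths whose file name matches the glob pattern."""
    return [p for p in paths if fnmatch(p.rsplit("/", 1)[-1], glob)]

# The equivalent Auto Loader setting would be:
# .option("pathGlobFilter", "2023-1[0-2]-*.csv")
```

This only works if the dates are encoded in the file names; otherwise, filtering on file-metadata fields (e.g. modification time) is the way to go.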

Bharathi7
by Databricks Partner
  • 2731 Views
  • 3 replies
  • 0 kudos

Python UDF fails with UNAVAILABLE: Channel shutdownNow invoked

I'm using a Python UDF to apply OCR to each row of a dataframe that contains the URL to a PDF document. This is how I define my UDF:

def extract_text(url: str):
    ocr = MyOcr(url)
    extracted_text = ocr.get_text()
    return json.dumps(extracte...

Latest Reply
daniel_sahal
Databricks MVP
  • 0 kudos

@Bharathi7 It's really hard to determine what's going on without knowing what MyOcr actually does. Maybe there's some kind of timeout on the service side? Too many parallel connections?

2 More Replies
Poovarasan
by Databricks Partner
  • 2962 Views
  • 1 reply
  • 0 kudos

com.databricks.sql.transaction.tahoe.ColumnMappingException: Found duplicated column id `2` in colum

Hi, Currently I am using the below-mentioned query to create a materialized view. It was working fine until yesterday in the DLT pipeline, but from today the code below throws an error (com.databricks.sql.transaction.tahoe.ColumnMappingE...

Data Engineering
ColumnMapping
dlt
elgeo
by Valued Contributor II
  • 28765 Views
  • 3 replies
  • 2 kudos

Data type length enforcement

Hello. Is there a way to enforce the length of a column in SQL? For example that a column has to be exactly 18 characters? Thank you!
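Delta tables support CHECK constraints, which can express an exact-length rule even where the column's declared type cannot. The sketch below renders the ALTER TABLE statement (table and column names are made up); you would run it with `spark.sql(...)`, after which writes violating the rule are rejected:

```python
def exact_length_check_ddl(table, column, n):
    """Render an ALTER TABLE statement adding a CHECK constraint on exact length."""
    constraint = f"{column}_len_{n}"
    return (
        f"ALTER TABLE {table} ADD CONSTRAINT {constraint} "
        f"CHECK (length({column}) = {n})"
    )

# spark.sql(exact_length_check_ddl("main.ref.accounts", "account_id", 18))
```

Adding the constraint fails if existing rows already violate it, so clean the column first or add the constraint at table creation time.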

Latest Reply
databricks31
Databricks Partner
  • 2 kudos

We are facing similar issues while writing into an ADLS location in Delta format; after that, we created Unity Catalog tables on top of the Delta location. Should it be possible to change the data type lengths in the format below, and is this supported in Spark SQL? Azure SQL Spark            ...

2 More Replies