Data Engineering

Forum Posts

jwilliam
by Contributor
  • 1102 Views
  • 3 replies
  • 1 kudos

Resolved! [BUG] Databricks installs WHL as JAR in Python Wheel Task?

I'm using a Python Wheel Task in a Databricks job with WHEEL dependencies. However, the cluster installed the dependencies as JAR instead of WHEEL. Is this expected behavior or a bug?

Latest Reply
AndréSalvati
New Contributor III
  • 1 kudos

There you can see a complete template project with a Python wheel task and Databricks Asset Bundles. Please follow the instructions for deployment: https://github.com/andre-salvati/databricks-template

2 More Replies
GGG_P
by New Contributor III
  • 2263 Views
  • 3 replies
  • 0 kudos

Databricks Tasks Python wheel: how to access JobID & RunID?

I'm using Python (as a Python wheel application) on Databricks. I deploy & run my jobs using dbx. I defined some Databricks Workflows using Python wheel tasks. Everything is working fine, but I'm having an issue extracting "databricks_job_id" & "databricks_ru...

Latest Reply
AndréSalvati
New Contributor III
  • 0 kudos

There you can see a complete template project with Databricks Asset Bundles and a Python wheel task. Please follow the instructions for deployment: https://github.com/andre-salvati/databricks-template. In particular, take a look at the workflow definitio...

2 More Replies
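One common approach (a sketch, not necessarily what the template above does) is to pass dynamic value references such as `{{job.id}}` and `{{job.run_id}}` in the wheel task's `parameters` list, then parse them inside the entry point. The flag names below are illustrative:

```python
import argparse
import sys

def parse_job_context(argv):
    """Parse job/run identifiers passed as Python-wheel-task parameters.

    In the job definition the task parameters would look like:
      ["--job-id", "{{job.id}}", "--run-id", "{{job.run_id}}"]
    Databricks substitutes the placeholders at run time.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("--job-id", dest="job_id", default=None)
    parser.add_argument("--run-id", dest="run_id", default=None)
    # parse_known_args tolerates any extra parameters the task also receives
    args, _ = parser.parse_known_args(argv)
    return args.job_id, args.run_id

def main():
    job_id, run_id = parse_job_context(sys.argv[1:])
    print(f"job_id={job_id} run_id={run_id}")

if __name__ == "__main__":
    main()
```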
ac0
by New Contributor III
  • 495 Views
  • 2 replies
  • 0 kudos

"Fatal error: The Python kernel is unresponsive." DBR 14.3

Running almost any notebook with a merge statement in Databricks with DBR 14.3, I get the following error and the notebook exits: "Fatal error: The Python kernel is unresponsive." I would provide more code, but like I said, it is pretty much anything w...

Latest Reply
Ayushi_Suthar
Honored Contributor
  • 0 kudos

Hi @ac0, hope you are doing well! The "Fatal error: The Python kernel is unresponsive" error generally means the Jupyter kernel is still running but is not responsive within a specific time frame. Please try to increase the timeout by specifying co...

1 More Replies
Oliver_Angelil
by Valued Contributor II
  • 1558 Views
  • 2 replies
  • 2 kudos

Resolved! Cell by cell execution of notebooks with VS code

I have the Databricks VS Code extension set up to develop and run jobs remotely (with Databricks Connect). I enjoy working on notebooks within the native Databricks workspace, especially for exploratory work, because I can execute blocks of code step b...

Latest Reply
awadhesh14
New Contributor II
  • 2 kudos

Hi folks, is there a version upgrade for the resolution to this?

1 More Replies
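For context on cell-by-cell execution: the VS Code extension runs `.py` files exported in Databricks "notebook source" format, where cells are delimited by `# COMMAND ----------` markers. A minimal sketch of how those markers delimit cells (the splitting logic here is illustrative, not the extension's actual implementation):

```python
HEADER = "# Databricks notebook source"
MARKER = "# COMMAND ----------"

def split_cells(source: str) -> list[str]:
    """Split a notebook-format .py file into its individual cells."""
    body = source.removeprefix(HEADER)
    # Each MARKER starts a new cell; drop empty fragments
    return [cell.strip() for cell in body.split(MARKER) if cell.strip()]
```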
Abishek_317176
by New Contributor
  • 323 Views
  • 2 replies
  • 1 kudos

Can I get 2 different types of source file (CSV and JSON) in a single build?

Need to understand this: I have 2 sources which currently write to a single S3 bucket in CSV and JSON format daily. Is it possible to ingest both file types in a single build in Databricks (file size <300 MB)? Or alternately should I cre...

Latest Reply
Kaniz
Community Manager
  • 1 kudos

Hey there! Thanks a bunch for being part of our awesome community!  We love having you around and appreciate all your questions. Take a moment to check out the responses – you'll find some great info. Your input is valuable, so pick the best solution...

1 More Replies
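One workable pattern for this (a sketch under assumptions, since the replies are truncated) is to run two ingestion streams against the same bucket, each filtered to one format, e.g. Auto Loader with `cloudFiles.format` set to `csv` or `json` plus a glob such as `*.csv`. The routing itself is just an extension split; key names below are hypothetical:

```python
from pathlib import PurePosixPath

def split_by_format(keys):
    """Group S3 object keys by extension so each format gets its own reader."""
    groups = {"csv": [], "json": [], "other": []}
    for key in keys:
        ext = PurePosixPath(key).suffix.lower().lstrip(".")
        groups[ext if ext in groups else "other"].append(key)
    return groups
```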
joss
by New Contributor II
  • 334 Views
  • 3 replies
  • 1 kudos

NPE on CreateJacksonParser and Databricks 14.3LTS with Spark StructuredStreaming

Hello, I have a Spark Structured Streaming job: the source is a Kafka topic in JSON. It works fine with Databricks 14.2, but when I change to 14.3 LTS, I get an NPE in CreateJacksonParser: Caused by: NullPointerException: at org.apache.spark.sql.catalys...

Latest Reply
joss
New Contributor II
  • 1 kudos

Hi, thank you for your quick reply. I found the problem: val newSchema = spark.read.json(df.select("data").as[String]).schema. If "data" has one null value, it works in 14.2, but with 14.3 LTS this function returns an NPE. I don't know if it is a bug.

2 More Replies
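The failure mode described above (a null `data` value breaking `spark.read.json` schema inference on 14.3 LTS) suggests filtering out nulls before inference, e.g. `df.select("data").na.drop()` in Spark. The same defensive idea in plain Python, as a sketch with illustrative payloads:

```python
import json

def union_of_keys(payloads):
    """Infer the union of top-level JSON keys, skipping null payloads
    that would otherwise raise instead of contributing a schema."""
    keys = set()
    for raw in payloads:
        if raw is None:
            continue  # a null payload has no schema to contribute
        keys.update(json.loads(raw).keys())
    return sorted(keys)
```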
Olaoye_Somide
by New Contributor
  • 464 Views
  • 2 replies
  • 1 kudos

Avoiding Duplicate Ingestion with Autoloader and Migrated S3 Data

Hi Team,We recently migrated event files from our previous S3 bucket to a new one. While utilizing Autoloader for batch ingestion, we've encountered an issue where the migrated data is being processed as new events. This leads to duplicate records in...

Data Engineering
autoloader
RocksDB
S3
Latest Reply
Kaniz
Community Manager
  • 1 kudos

1 More Replies
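When the same files reappear under a new bucket, Auto Loader's file-based checkpoint cannot know they were already processed, so deduplicating downstream on a stable business key (e.g. a Delta `MERGE ... WHEN NOT MATCHED THEN INSERT`) keeps the table clean. A minimal sketch of that keep-first-occurrence rule; the `event_id` field is hypothetical:

```python
def keep_new_events(events, seen_ids):
    """Insert-only dedupe: keep events whose id has not been ingested before,
    mimicking MERGE ... WHEN NOT MATCHED THEN INSERT."""
    fresh = []
    for event in events:
        if event["event_id"] not in seen_ids:
            seen_ids.add(event["event_id"])
            fresh.append(event)
    return fresh
```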
Govind3331
by New Contributor
  • 492 Views
  • 2 replies
  • 0 kudos

How to capture/identify incremental rows when there are no primary key columns in tables

Q1. My source is SQL Server tables. I want to identify only the latest records (incremental rows) and load those into the Bronze layer. Instead of a full load to ADLS, we want to capture only incremental rows and load them into ADLS for further processing. NOTE: Prob...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @Govind3331, Handling incremental data without a primary key or date column can be challenging, but there are some strategies you can consider.    Row Number Approach: You can use the ROW_NUMBER() function to assign a unique number to each row bas...

1 More Replies
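Besides the ROW_NUMBER() approach the reply mentions, another common option when there is no primary key is a full-row hash: a fingerprint not seen in the previous extract marks a new or changed row. A sketch (column handling is illustrative):

```python
import hashlib

def row_fingerprint(row: dict) -> str:
    """Deterministic hash over all columns, sorted so column order is irrelevant."""
    canonical = "|".join(f"{k}={row[k]!r}" for k in sorted(row))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def incremental_rows(rows, previous_fingerprints):
    """Rows whose fingerprint was not present in the previous extract."""
    return [r for r in rows if row_fingerprint(r) not in previous_fingerprints]
```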
demost11
by New Contributor II
  • 311 Views
  • 1 replies
  • 0 kudos

Resolved! Tracking DBMS CDC

We're using Databricks to incrementally extract data from SQL Server tables into S3. The data contains a timestamp column. We need a place to store the maximum retrieved timestamp per table so it can be retrieved during the next run. Does Databricks cont...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @demost11, Databricks provides several options for managing state or storing metadata. Databricks Secrets: Databricks allows you to create and manage secrets, which are key-value pairs storing secret material (such as API keys, passwords, etc.). E...

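Secrets are really meant for credentials; a small state table or file keyed by table name is the more usual home for a per-table high-water mark. Below is a file-backed sketch (on Databricks, a small Delta table serves the same purpose); the path and timestamp format are illustrative:

```python
import json
from pathlib import Path

class WatermarkStore:
    """Persist the max retrieved timestamp per source table between runs."""

    def __init__(self, path):
        self.path = Path(path)

    def _load(self) -> dict:
        # Missing file means no watermarks recorded yet
        return json.loads(self.path.read_text()) if self.path.exists() else {}

    def get(self, table: str, default: str = "1970-01-01T00:00:00") -> str:
        return self._load().get(table, default)

    def set(self, table: str, timestamp: str) -> None:
        state = self._load()
        state[table] = timestamp
        self.path.write_text(json.dumps(state))
```

A run would read `get("dbo.orders")`, extract only rows with a later timestamp, then call `set(...)` with the new maximum before finishing.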
Data_Engineer3
by Contributor II
  • 476 Views
  • 2 replies
  • 1 kudos

Identify the associated notebook for the application running from the Spark UI

In the Spark UI, I can see the application running with its application ID. From the Spark UI, am I able to see which notebook is running with that application? Is this possible? I am interested in learning more about the jobs and stages, how it works ...

Data Engineering
Databricks
Latest Reply
Kaniz
Community Manager
  • 1 kudos

1 More Replies
srinivas_001
by New Contributor III
  • 582 Views
  • 3 replies
  • 2 kudos

Autoloader configuration with data type casting

Hi. 1: I am reading a Parquet file from AWS S3 storage using spark.read.parquet(<s3 path>). 2: An Autoloader job has been configured to load this data into an external Delta table. 3: But before loading into this Autoloader I need to do some typecasting o...

Latest Reply
Kaniz
Community Manager
  • 2 kudos

Hi @srinivas_001 , When working with Parquet files in Spark, you can read the file, perform typecasting on the columns, and then write the transformed data directly to your external Delta table.    Let’s break down the steps:   Read the Parquet file:...

2 More Replies
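In PySpark the typecasting step the reply describes is typically a chain of `df.withColumn("amount", col("amount").cast("double"))` calls before writing to the Delta table. The same cast-map idea in plain Python, as a sketch with hypothetical column names and types:

```python
CASTS = {
    "amount": float,   # e.g. string -> double
    "quantity": int,   # e.g. string -> int
}

def cast_record(record: dict) -> dict:
    """Apply per-column casts; columns without a rule pass through unchanged."""
    return {col: CASTS[col](val) if col in CASTS else val
            for col, val in record.items()}
```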
ahab
by New Contributor
  • 443 Views
  • 1 replies
  • 0 kudos

Error deploying DLT DAB: Validation failed for cluster_type, the value must be dlt (is "job")

Hello. I'm getting a cluster validation error while trying to deploy a DLT pipeline via DAB. See attached screenshots for the config and error. Hoping someone has run into this before and can guide me. Thanks.

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @ahab, Deploying a Delta Live Tables (DLT) pipeline via Databricks can sometimes be tricky, especially when dealing with cluster validation errors. Let's troubleshoot this together. Cluster Validation Error: When encountering a cluster validation e...

Yulei
by New Contributor III
  • 2960 Views
  • 4 replies
  • 1 kudos

Resolved! Could not reach driver of cluster

Hi, recently I am seeing the issue "Could not reach driver of cluster <some_id>" with my Structured Streaming job when migrating to Unity Catalog, and found this when checking the traceback: Traceback (most recent call last): File "/databricks/python_shell/...

Latest Reply
Yulei
New Contributor III
  • 1 kudos

@Latonya86Dodson, thank you for the reply. I have done a test, and it seems that doubling the memory of the driver cluster and changing to an instance with bigger memory works for this issue. However, I do question why this happens after I swap to p...

3 More Replies
DylanStout
by New Contributor III
  • 2163 Views
  • 9 replies
  • 2 kudos

Resolved! Problem with tables not showing

When I use the current "result table" option it does not show the table results. This occurs when running SQL commands and the display() function for DataFrames. It is not linked to a Databricks runtime, since it occurs on all runtimes. I am not allow...

Latest Reply
DylanStout
New Contributor III
  • 2 kudos

Resizing the table causes the table to show its records in the cell 

8 More Replies
raghu2
by New Contributor III
  • 444 Views
  • 2 replies
  • 0 kudos

Resolved! Error deploying a DAB

I followed the steps listed in this article. After creating and validating the bundle with the default template, during deployment using this command: databricks bundle deploy -t dev --profile zz, I get this message: Building mySecPrj... Error: build failed mySecPr...

Latest Reply
raghu2
New Contributor III
  • 0 kudos

Thanks @daniel_sahal. That worked!!

1 More Replies