Data Engineering

Forum Posts

htu
by Visitor
  • 17 Views
  • 1 reply
  • 0 kudos

Installing Databricks Connect breaks pyspark local cluster mode

Hi, it seems that when databricks-connect is installed, pyspark is modified at the same time so that it no longer works with a local master node. This has been especially useful in testing, when running unit tests for spark-related code without any remot...

Latest Reply
Kaniz
Community Manager

Hi @htu, When you install Databricks Connect, it modifies the behaviour of PySpark in a way that prevents it from working with the local master node. This can be frustrating, especially when you’re trying to run unit tests for Spark-related code w...

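A common workaround in threads like this is to keep databricks-connect and plain pyspark in separate virtual environments, since the two packages overwrite each other. A minimal local-mode test fixture, assuming plain pyspark and pytest are installed in the test environment:

```python
# Sketch: local-mode SparkSession for unit tests. Assumes a virtualenv with
# plain `pyspark` installed -- databricks-connect replaces parts of pyspark,
# so keep it out of this environment.
import pytest
from pyspark.sql import SparkSession

@pytest.fixture(scope="session")
def spark():
    session = (SparkSession.builder
               .master("local[2]")
               .appName("unit-tests")
               .getOrCreate())
    yield session
    session.stop()
```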
Kayl669
by New Contributor III
  • 154 Views
  • 5 replies
  • 0 kudos

SQL code against tables with '>' in headers suddenly failing?

Just want to post this issue we're experiencing here in case other people are facing something similar. Below is the wording of the support ticket request I've raised: SQL code that has been working is suddenly failing due to syntax errors today. Ther...

Latest Reply
Kayl669
New Contributor III

The point that we've got to with this is that MS Support / Databricks have acknowledged that they did something and are working on a fix. "The issue occurred due to the regression in the recent DBR maintenance release...Our engineering team is workin...

4 More Replies
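While waiting for the regression fix, backtick-quoting the affected identifiers is a reasonable defensive measure. A sketch, using a hypothetical table and a column literally named `col>1`:

```python
# Hypothetical names: backticks let the SQL parser accept identifiers
# containing special characters such as '>'.
df = spark.sql("SELECT `col>1` AS col_gt_1 FROM my_schema.my_table")
```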
erigaud
by Honored Contributor
  • 73 Views
  • 2 replies
  • 1 kudos

Pass Dataframe to child job in "Run Job" task

Hello, I have a Job A that runs a Job B, and Job A defines a globalTempView that I would like to somehow access in the child job. Is that in any way possible? Can the same cluster be used for both jobs? If it is not possible, does someone know of a...

Latest Reply
erigaud
Honored Contributor

Hello @Kaniz, thank you for the very detailed answer. If I understand correctly, there is no way to do this using temp views and a Job Cluster? In that case I need to use the same All-Purpose cluster for all my tasks in order to remain in the same spar...

1 More Replies
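A global temp view is scoped to a single Spark application, so it cannot cross two job clusters. A sketch of the usual alternative (not the thread's confirmed answer): persist the DataFrame to a table in the parent job and read it back in the child, with a hypothetical table name:

```python
# Job A (parent): hand off the intermediate result as a Delta table.
df.write.mode("overwrite").saveAsTable("main.tmp.job_a_handoff")  # hypothetical name

# Job B (child): read it back -- works across clusters, unlike
# createGlobalTempView, which dies with the parent's Spark application.
df_child = spark.read.table("main.tmp.job_a_handoff")
```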
Mathias_Peters
by New Contributor II
  • 88 Views
  • 1 reply
  • 0 kudos

On the fly transformations on DLT tables

Hi, I am loading data from a Kinesis data stream using DLT. CREATE STREAMING TABLE Consumers_kinesis_2 ( ..., unbase64(data) String, ... ) AS SELECT * FROM STREAM read_kinesis (...) Is it possible to directly cast, unbase64, and/or transform the resu...

Latest Reply
Kaniz
Community Manager

Hi @Mathias_Peters, When working with Amazon Kinesis Data Analytics, you can indeed transform data before writing it into a streaming table. Let’s explore some options: Unbase64 Transformation: To decode Base64-encoded data, you can use the unba...

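A sketch of applying the decode inside the query itself, written with the DLT Python API; the stream name and region are hypothetical:

```python
import dlt
from pyspark.sql.functions import col, unbase64

@dlt.table(name="consumers_kinesis_decoded")
def consumers_kinesis_decoded():
    # Decode the Base64 payload as part of the table definition rather than
    # declaring the transformation in the column list.
    return (spark.readStream.format("kinesis")
            .option("streamName", "my-stream")   # hypothetical
            .option("region", "eu-west-1")       # hypothetical
            .load()
            .withColumn("payload", unbase64(col("data")).cast("string")))
```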
stevenayers-bge
by New Contributor II
  • 69 Views
  • 1 reply
  • 0 kudos

Autoloader: Read old version of file. Read modification time is X, latest modification time is X

I'm receiving this error from Autoloader. It seems to be stuck on this one file. I don't care when it was read or last modified, I just want to ingest it. Any ideas? java.io.IOException: Read old version of file s3a://<file-path>.json. Read modificat...

Latest Reply
Kaniz
Community Manager

Hi @stevenayers-bge, The error message indicates that the file you’re trying to read is an old version, and there’s a discrepancy between the read modification time and the latest modification time. Let’s explore some potential solutions based on ...

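If the object was genuinely overwritten in S3, Auto Loader can be told to pick up the newer version. A sketch, with a hypothetical path:

```python
# Sketch: cloudFiles.allowOverwrites lets Auto Loader reprocess files whose
# modification time changed after discovery, which is one common cause of
# the "Read old version of file" error.
df = (spark.readStream.format("cloudFiles")
      .option("cloudFiles.format", "json")
      .option("cloudFiles.allowOverwrites", "true")
      .load("s3a://my-bucket/landing/"))
```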
jainshasha
by New Contributor
  • 54 Views
  • 1 reply
  • 0 kudos

Job Cluster in Databricks workflow

Hi, I have configured 20 different workflows in Databricks, all of them with job clusters with different names. All 20 workflows are scheduled to run at the same time. But even with a different job cluster configured in each, they run sequentially w...

Latest Reply
Kaniz
Community Manager

Hi @jainshasha, Running multiple workflows in parallel with their own job clusters in Databricks can be achieved by following the right configuration. Let’s explore some options: Shared Job Clusters: To optimize resource usage with jobs that orch...

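For jobs that should run concurrently, each job definition needs its own job_clusters block (or the jobs must share an all-purpose cluster). A sketch of the relevant fragment of a Jobs API 2.1 create call; the job name, notebook path, and node type are hypothetical:

```python
# Each of the 20 jobs gets its own job cluster; runs should then start in
# parallel, subject to the cloud account's instance quotas.
import os
import requests

host = os.environ["DATABRICKS_HOST"]
token = os.environ["DATABRICKS_TOKEN"]

payload = {
    "name": "workflow_01",
    "job_clusters": [{
        "job_cluster_key": "wf_01_cluster",
        "new_cluster": {
            "spark_version": "14.3.x-scala2.12",
            "node_type_id": "Standard_DS3_v2",  # hypothetical node type
            "num_workers": 2,
        },
    }],
    "tasks": [{
        "task_key": "main",
        "job_cluster_key": "wf_01_cluster",
        "notebook_task": {"notebook_path": "/Workflows/wf_01"},
    }],
}
resp = requests.post(f"{host}/api/2.1/jobs/create",
                     headers={"Authorization": f"Bearer {token}"},
                     json=payload)
```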
LeoGaller
by New Contributor
  • 89 Views
  • 1 reply
  • 0 kudos

What are the options for "spark_conf.spark.databricks.cluster.profile"?

Hey guys, I'm trying to find out what options we can pass to spark_conf.spark.databricks.cluster.profile. From looking around I know that some of the available values are singleNode and serverless, but are there others? Where is the documentation for it?...

Latest Reply
Kaniz
Community Manager

Hi @LeoGaller , The spark_conf.spark.databricks.cluster.profile configuration in Databricks allows you to specify the profile for a cluster.   Let’s explore the available options and where you can find the documentation. Available Profiles: Sing...

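The two values that show up in practice are singleNode and serverless. A sketch of where the profile sits in a cluster spec, using the single-node case, which also needs a matching spark.master and ResourceClass tag:

```python
# Sketch of a single-node cluster spec (dict form, as sent to the Clusters
# or Jobs API). "serverless" is the other commonly seen profile value.
single_node_cluster = {
    "num_workers": 0,
    "spark_conf": {
        "spark.databricks.cluster.profile": "singleNode",
        "spark.master": "local[*]",
    },
    "custom_tags": {"ResourceClass": "SingleNode"},
}
```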
stevenayers-bge
by New Contributor II
  • 79 Views
  • 2 replies
  • 1 kudos

Bug: Shallow Clone `create or replace` causing [TABLE_OR_VIEW_NOT_FOUND]

I am having an issue where, when I do a shallow clone using: create or replace table `catalog_a_test`.`schema_a`.`table_a` shallow clone `catalog_a`.`schema_a`.`table_a`, I get: [TABLE_OR_VIEW_NOT_FOUND] The table or view catalog_a_test.schema_a.table_a...

Latest Reply
Omar_hamdan
Community Manager

Hi Steven, this is really a strange issue. First, let's exclude some possible causes. We need to check the following: the permissions on table A and catalog B. Take a look at the following link to check what permission is needed: https://docs.d...

1 More Replies
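A quick way to rule the permission theory in or out is to inspect grants on both the source table and the target catalog before re-running the clone:

```python
# Uses the catalog/table names from the post above.
spark.sql("SHOW GRANTS ON TABLE catalog_a.schema_a.table_a").show(truncate=False)
spark.sql("SHOW GRANTS ON CATALOG catalog_a_test").show(truncate=False)
```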
Red1
by New Contributor III
  • 918 Views
  • 8 replies
  • 2 kudos

Autoingest not working with Unity Catalog in DLT pipeline

Hey Everyone,I've built a very simple pipeline with a single DLT using auto ingest, and it works, provided I don't specify the output location. When I build the same pipeline but set UC as the output location, it fails when setting up S3 notification...

Latest Reply
Red1
New Contributor III

Hey @Babu_Krishnan, I was! I had to reach out to my Databricks support engineer directly and the resolution was to add "cloudfiles.awsAccessKey" and "cloudfiles.awsSecretKey" to the params as in the screenshot below (apologies, I don't know why the sc...

7 More Replies
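The fix described in the reply, as a sketch; the secret scope and key names are hypothetical:

```python
# Explicit AWS credentials for Auto Loader's file-notification setup, pulled
# from a secret scope rather than hard-coded.
df = (spark.readStream.format("cloudFiles")
      .option("cloudFiles.format", "json")
      .option("cloudFiles.useNotifications", "true")
      .option("cloudFiles.awsAccessKey", dbutils.secrets.get("aws", "access-key"))
      .option("cloudFiles.awsSecretKey", dbutils.secrets.get("aws", "secret-key"))
      .load("s3://my-bucket/raw/"))
```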
Mado
by Valued Contributor II
  • 7643 Views
  • 4 replies
  • 2 kudos

Resolved! Using "Select Expr" and "Stack" to Unpivot PySpark DataFrame doesn't produce expected results

I am trying to unpivot a PySpark DataFrame, but I don't get the correct results. Sample dataset: # Prepare Data data = [("Spain", 101, 201, 301), \ ("Taiwan", 102, 202, 302), \ ("Italy", 103, 203, 303), \ ("China", 104, 204, 304...

Latest Reply
lukeoz
Visitor

You can also use backticks around the column names that would otherwise be recognised as numbers. from pyspark.sql import functions as F unpivotExpr = "stack(3, '2018', `2018`, '2019', `2019`, '2020', `2020`) as (Year, CPI)" unPivotDF = df.select("C...

3 More Replies
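A self-contained version of the backtick fix from the accepted answer:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").getOrCreate()

data = [("Spain", 101, 201, 301), ("Taiwan", 102, 202, 302),
        ("Italy", 103, 203, 303), ("China", 104, 204, 304)]
df = spark.createDataFrame(data, ["Country", "2018", "2019", "2020"])

# Backticks make `2018` resolve as a column reference; unquoted, the parser
# reads 2018 as an integer literal and the unpivoted values come out wrong.
unpivot_expr = ("stack(3, '2018', `2018`, '2019', `2019`, '2020', `2020`) "
                "as (Year, CPI)")
df.selectExpr("Country", unpivot_expr).show()
```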
amitkmaurya
by Visitor
  • 58 Views
  • 1 reply
  • 0 kudos

How to increase executor memory in Databricks jobs

Maybe it's because I am new to Databricks that I have this confusion. Suppose I have worker memory of 64 GB in a Databricks job with max 12 nodes... and my job is failing due to Executor Lost with exit code 137 (OOM, from what I found on the internet). So, to fix this I need to increase execut...

Latest Reply
raphaelblg
New Contributor III

Hello @amitkmaurya , Increasing compute resources may not always be the best strategy. To gain more insights into each executor's memory usage, check the cluster metrics tab and Spark UI for your cluster. If one executor has a much higher memory usag...

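Exit code 137 means the OS killed the container for exceeding its memory. Before adding nodes, it is often enough to raise per-executor memory via the cluster's Spark conf; the values below are illustrative only:

```python
# Sketch: cluster-level Spark conf overrides (set via the cluster UI or API).
# Leave headroom below the node's physical RAM for the OS and overhead,
# otherwise the 137 (OOM kill) can persist.
spark_conf = {
    "spark.executor.memory": "40g",
    "spark.executor.memoryOverhead": "8g",
}
```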
Devsql
by Visitor
  • 59 Views
  • 1 reply
  • 0 kudos

How to find that given Parquet file got imported into Bronze Layer ?

Hi Team, recently we created a new Databricks project/solution (based on the Medallion architecture) with Bronze-Silver-Gold layer tables. We have created a Delta Live Tables pipeline for the Bronze-layer implementation. Source files are Parqu...

Data Engineering
Azure Databricks
Bronze Job
Delta Live Table
Delta Live Table Pipeline
Latest Reply
raphaelblg
New Contributor III

Hello @Devsql , It appears that you are creating DLT bronze tables using a standard spark.read operation. This may explain why the DLT table doesn't include "new files" during a REFRESH operation. For incremental ingestion of bronze layer data into y...

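The incremental pattern the reply points toward is Auto Loader inside the DLT pipeline, which checkpoints which source files were already ingested. A sketch with a hypothetical landing path:

```python
import dlt

@dlt.table(name="bronze_events")
def bronze_events():
    # cloudFiles (Auto Loader) tracks ingested files in its checkpoint, so
    # each pipeline update picks up only newly arrived Parquet files.
    return (spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "parquet")
            .load("abfss://landing@myaccount.dfs.core.windows.net/events/"))
```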
6502
by New Contributor III
  • 57 Views
  • 1 reply
  • 0 kudos

Delete on streaming table and restarting with startingVersion

I deleted some records from a streaming table by mistake, and of course the streaming job stopped working. So I restored the table to the version before the delete was done, and attempted to restart the job using startingVersion set to the new vers...

Latest Reply
raphaelblg
New Contributor III

Hello @6502, It appears you've used the `startingVersion` parameter in your streaming query, which causes the stream to begin processing data from the version prior to the DELETE operation version. However, the DELETE operation will still be processe...

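After a RESTORE, the stream can be pointed just past the offending commit, or told to skip change commits entirely. A sketch; the table name and version number are hypothetical:

```python
# startingVersion must be the first version you want processed;
# skipChangeCommits ignores commits that only delete or update existing rows.
df = (spark.readStream.format("delta")
      .option("startingVersion", "1042")   # hypothetical version
      .option("skipChangeCommits", "true")
      .table("my_schema.events"))
```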
Erik_L
by Contributor II
  • 54 Views
  • 0 replies
  • 0 kudos

BUG: Unity Catalog kills UDF

We have UDFs in a few locations and today we noticed their performance collapsed. This seems to be caused by Unity Catalog. Test environment 1: Databricks Runtime: 14.3 / 15.1; Compute: 1 master, 4 nodes; Policy: Unrestricted; Access Mode: Shared. Tes...

nilton
by Visitor
  • 80 Views
  • 2 replies
  • 0 kudos

Query table based on table_name from information_schema

Hi, I have one table that changes its name every 60 days. The name simply increments the version number, for example: first 60 days: table_name_v1; after 60 days: table_name_v2, and so on. What I want is to query the table whose name is returned by the que...

Latest Reply
radothede
Visitor

The simplest way would probably be using spark.sql: %py tbl_name = 'table_v1' df = spark.sql(f'select * from {tbl_name}') display(df) From there, you can simply create a temporary view: %py df.createOrReplaceTempView('table_act') and query it using SQL st...

1 More Replies
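Building on the reply, the current table name can itself be looked up from information_schema before querying. A sketch; the catalog and schema names are hypothetical, and sorting by length first keeps v10 ahead of v2:

```python
# Find the newest table_name_vN in the schema, then query it.
latest = spark.sql("""
    SELECT table_name
    FROM my_catalog.information_schema.tables
    WHERE table_schema = 'my_schema'
      AND table_name LIKE 'table_name_v%'
    ORDER BY length(table_name) DESC, table_name DESC
    LIMIT 1
""").first()["table_name"]

df = spark.sql(f"SELECT * FROM my_catalog.my_schema.{latest}")
```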