Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

MarkusFra
by New Contributor III
  • 7334 Views
  • 2 replies
  • 2 kudos

Re-establish SparkSession using Databricks connect after cluster restart

Hello, when developing locally using Databricks Connect, how do I re-establish the SparkSession after the cluster restarted? getOrCreate() seems to get the old, invalid SparkSession even after the cluster restart instead of creating a new one, or am I missing...

Data Engineering
databricks-connect
Latest Reply
Michael_Chein
New Contributor II
  • 2 kudos

If anyone encounters this problem, the solution that worked for me was to restart the Jupyter kernel. 

1 More Replies
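For anyone hitting this today: beyond restarting the kernel, a lighter-weight option may be to drop the stale session explicitly. A minimal sketch, assuming databricks-connect 13+ (the DatabricksSession builder) and that stop() releases the cached client; the helper name is illustrative:

from databricks.connect import DatabricksSession

def get_fresh_session():
    # getOrCreate() may hand back a session cached before the cluster restart
    spark = DatabricksSession.builder.getOrCreate()
    try:
        spark.sql("SELECT 1").collect()  # probe whether the session is still valid
    except Exception:
        spark.stop()  # discard the stale session so the builder creates a new one
        spark = DatabricksSession.builder.getOrCreate()
    return spark

spark = get_fresh_session()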
prabhu26
by New Contributor
  • 1462 Views
  • 1 reply
  • 0 kudos

Unable to enforce schema on data read from jsonl file in Azure Databricks using pyspark

I'm trying to build an ETL pipeline in which I'm reading jsonl files from Azure Blob Storage, then trying to transform and load them into Delta tables in Databricks. I have created the below schema for loading my data:  schema = StructType([ S...

Latest Reply
DataEngineer
New Contributor II
  • 0 kudos

Try this: add option("multiline", "true").

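A minimal sketch of enforcing a schema on a JSON read; the path and columns are placeholders. One caveat worth hedging: multiLine=true is meant for pretty-printed JSON that spans lines, while true JSON Lines files (one object per line) use the default multiLine=false:

from pyspark.sql.types import StructType, StructField, StringType, IntegerType

schema = StructType([
    StructField("id", IntegerType(), True),
    StructField("name", StringType(), True),
])

df = (spark.read
      .schema(schema)              # enforce the schema instead of inferring it
      .option("mode", "FAILFAST")  # error out on records that do not match
      .json("abfss://container@account.dfs.core.windows.net/raw/*.jsonl"))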
pshuk
by New Contributor III
  • 1514 Views
  • 2 replies
  • 1 kudos

upload file/table to delta table using CLI

Hi, I am using the CLI to transfer local files to a Databricks Volume. At the end of my upload, I want to create a meta table (storing file name, location, and some other information) and have it as a table on the Databricks Volume. I am not sure how to create ...

Latest Reply
Ayushi_Suthar
Databricks Employee
  • 1 kudos

Hi @pshuk, greetings! We understand that you are looking for a CLI command to create a table, but at this moment Databricks doesn't support a CLI command to create tables. You can, however, use the SQL Execution API - https://docs.databricks.com/api/workspace/...

1 More Replies
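A hedged sketch of the suggested route: upload with the CLI, then create the meta table through the SQL Statement Execution API (POST /api/2.0/sql/statements). Host, token, warehouse ID, and table names are placeholders:

import requests

host = "https://<workspace-host>"
headers = {"Authorization": "Bearer <token>"}

resp = requests.post(
    f"{host}/api/2.0/sql/statements",
    headers=headers,
    json={
        "warehouse_id": "<warehouse-id>",
        "statement": """
            CREATE TABLE IF NOT EXISTS main.meta.upload_log (
                file_name STRING, file_path STRING, uploaded_at TIMESTAMP)
        """,
    },
)
resp.raise_for_status()
print(resp.json()["status"])  # execution state of the statement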
JOFinancial
by New Contributor
  • 1478 Views
  • 1 reply
  • 0 kudos

No Data for External Table from Blob Storage

Hi all, I am trying to create an external table from an Azure Blob Storage container. I receive no errors, but there is no data in the table. The Blob Storage contains 4 CSV files with the same columns and about 10k rows of data. Am I missing someth...

Latest Reply
Hkesharwani
Contributor II
  • 0 kudos

Hi, the code looks completely fine. Please check whether the files use a delimiter other than ','. If your CSV files use a different delimiter, you can specify it in the table definition using the OPTIONS clause. Just to confirm, I created a sample table a...

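A minimal sketch of the delimiter fix the reply describes; catalog, columns, and the storage path are placeholders:

spark.sql("""
    CREATE TABLE IF NOT EXISTS my_catalog.my_schema.my_external_table (
        col1 STRING, col2 INT)
    USING CSV
    OPTIONS (header 'true', delimiter ';')  -- declare the file's real delimiter
    LOCATION 'abfss://container@account.dfs.core.windows.net/csv-folder/'
""")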
dbal
by New Contributor III
  • 1865 Views
  • 2 replies
  • 0 kudos

withColumnRenamed does not work with databricks-connect 14.3.0

I am not able to run our unit test suite due to a possible bug in the databricks-connect library. The problem is with the DataFrame transformation withColumnRenamed. When I run it on a Databricks cluster (Databricks Runtime 14.3 LTS), the column is ren...

Latest Reply
shan_chandra
Databricks Employee
  • 0 kudos

@dbal - can you please try withColumnsRenamed() instead? Reference: https://docs.databricks.com/en/release-notes/dbconnect/index.html#databricks-connect-1430-python

1 More Replies
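A quick illustration of the suggested workaround: withColumnsRenamed() (Spark 3.4+) takes a dict of old-to-new names in one call:

df = spark.createDataFrame([(1, "a")], ["old_id", "old_name"])
df2 = df.withColumnsRenamed({"old_id": "id", "old_name": "name"})
print(df2.columns)  # ['id', 'name']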
Dhruv-22
by New Contributor III
  • 731 Views
  • 0 replies
  • 0 kudos

NamedStruct fails in the 'IN' query

I've posted the same question on Stack Overflow (link) as well, and I will post any solution I get there. I was trying to understand using many columns in the IN query and came across this statement: SELECT (1, 2) IN (SELECT c1, c2 FROM VALUES(1, 2), (3, 4...

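Since this one has no reply: the row-constructor form of the statement does run, as sketched below. A hedged guess at the failure: struct field names take part in the IN subquery's type check, so a named_struct whose field names differ from the subquery's column names can fail to resolve:

spark.sql("""
    SELECT (1, 2) IN (SELECT c1, c2 FROM VALUES (1, 2), (3, 4) AS t(c1, c2))
""").show()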
Dhruv-22
by New Contributor III
  • 1416 Views
  • 1 reply
  • 0 kudos

Understanding least common type in databricks

I was reading the data type rules and found out about the least common type. I have a doubt: what is the least common type of STRING and INT? The referred link gives the following example, saying the least common type is BIGINT. -- The least common type between...

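For reference, a hedged way to observe the resolved type directly, assuming Databricks SQL semantics (open-source Spark coerces differently): typeof() over a coalesce of the two types should report the least common type the docs describe:

spark.sql("SELECT typeof(coalesce(5, '6')) AS least_common_type").show()
# expected on Databricks SQL: bigint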
210227
by New Contributor III
  • 2223 Views
  • 1 reply
  • 0 kudos

Resolved! External table from external location

Hi, I'm creating an external table from an existing external location and am a bit puzzled as to what permissions I need for it, or what the correct way of defining the S3 path with wildcards is. This: create external table if not exists test_catalogue_dev.b...

Latest Reply
210227
New Contributor III
  • 0 kudos

Just for reference: the wildcard is not needed in this case; the error message was simply misleading. Here 's3://test-data/full/2023/01/' instead of 's3://test-data/full/2023/01/*/' was the correct path.

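A minimal sketch of the accepted fix: point LOCATION at the prefix itself, with no trailing wildcard. Table name, columns, and format are placeholders modeled on the post:

spark.sql("""
    CREATE TABLE IF NOT EXISTS test_catalogue_dev.bronze.my_table (
        col1 STRING, col2 STRING)
    USING CSV
    OPTIONS (header 'true')
    LOCATION 's3://test-data/full/2023/01/'  -- prefix only, no '*/'
""")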
StephanKnox
by New Contributor III
  • 1738 Views
  • 0 replies
  • 0 kudos

Parametrized SQL - Pass column names as a parameter?

Hi all, is there a way to pass a column name (not a value) in a parametrized Spark SQL query? I am trying to do it like so; however, it does not work, as I think the column name gets expanded like 'value', i.e. surrounded by single quotes: def count_nulls(df:D...

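Since this has no reply: one hedged option is the IDENTIFIER() clause (Databricks Runtime 13.3+), which resolves a string parameter as a column name instead of a quoted literal. The temp view and column here are placeholders:

df = spark.createDataFrame([(1,), (None,)], ["value"])
df.createOrReplaceTempView("t")

nulls = spark.sql(
    "SELECT count_if(IDENTIFIER(:c) IS NULL) AS null_count FROM t",
    args={"c": "value"},  # expands as a column reference, not as the literal 'value'
)
nulls.show()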
vvt1976
by New Contributor
  • 3821 Views
  • 1 reply
  • 0 kudos

Create table using a location

Hi, Databricks newbie here. I have copied Delta files from my Synapse workspace into DBFS. To add them as a table, I executed: create table audit_payload using delta location '/dbfs/FileStore/data/general/audit_payload'. The command executed properly. Ho...

Data Engineering
data engineering
Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

Can you read the Delta Lake files using spark.read.format("delta").load("path/to/delta/table")? If not, it is not a valid Delta Lake table, which is my guess, as creating a table from Delta Lake is nothing more than a semantic wrapper around the actual...

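One detail worth checking here, as a hedged guess: Spark APIs address DBFS as dbfs:/... (or a bare /FileStore/... path), while /dbfs/... is the FUSE mount for local, non-Spark file access. If the read below works, the table definition likely just needs the Spark-style path:

df = spark.read.format("delta").load("dbfs:/FileStore/data/general/audit_payload")
df.printSchema()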
PankajMendi
by New Contributor
  • 1011 Views
  • 1 reply
  • 0 kudos

Error accessing Azure sql from Azure databricks using jdbc authentication=ActiveDirectoryInteractive

Getting the below error while accessing Azure SQL via JDBC from an Azure Databricks notebook: com.microsoft.sqlserver.jdbc.SQLServerException: Failed to authenticate the user p***** in Active Directory (Authentication=ActiveDirectoryInteractive). Unable to...

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

It seems you are trying to do MFA authentication over JDBC. The driver used might not support that. It could also be an OS issue (if you are not using Windows, for example) or a browser issue (the browser will have to open a window/tab). Can you try to authent...

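Interactive MFA needs a browser, which a remote cluster cannot open, so a hedged alternative is non-interactive service-principal auth in the Microsoft SQL Server JDBC driver (supported in recent driver versions and requiring msal4j on the classpath; an assumption about your setup). Server, database, and credentials are placeholders:

df = (spark.read.format("jdbc")
      .option("url", "jdbc:sqlserver://<server>.database.windows.net:1433;database=<db>")
      .option("dbtable", "dbo.my_table")
      .option("authentication", "ActiveDirectoryServicePrincipal")
      .option("user", "<application-client-id>")
      .option("password", "<client-secret>")
      .load())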
anuintuceo
by New Contributor
  • 1261 Views
  • 1 reply
  • 0 kudos

unzip a password protected file using synapse notebook

I have a zipped file containing 3 CSV files. It is password protected; when I tried extracting it manually, it would only extract with 7-Zip. I moved the zipped file to ADLS automatically and want to extract it there with the password. How to unzip the file and ...

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

It is most probably possible. If you use Python, the zipfile library can do it, something like this: with zipfile.ZipFile(zip_file_path, 'r') as zip_ref: zip_ref.extractall(path=extract_to, pwd=bytes(password, 'utf-8')). In Scala there is f...

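A fuller sketch of the reply's approach, with one caveat: Python's zipfile handles only the legacy ZipCrypto scheme, and archives that "extract only with 7-Zip" are often AES-encrypted, for which the pyzipper package (an extra install, assumed here) mirrors the same API. Paths and the password are placeholders:

import pyzipper  # pip install pyzipper; handles AES-encrypted zip archives

zip_path = "/dbfs/mnt/landing/data.zip"
extract_to = "/dbfs/mnt/landing/extracted"

with pyzipper.AESZipFile(zip_path) as zf:
    zf.extractall(path=extract_to, pwd=b"my-password")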
Sushmg
by New Contributor
  • 1428 Views
  • 1 replies
  • 0 kudos

Rest Api call

There is a requirement to create a pipeline that calls a REST API, and we have to store the data in a data warehouse. What is the best way to do this operation?

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

There are several ways to do this. You could use Python (or Scala or ...) to do the call, then transform the result and write it to the DWH. Or you could do the call, write the raw data, and process it later on. Or you could use an ETL/ELT tool that can do the r...

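A minimal sketch of the first option listed: call the API, then write the result as a Delta table. The URL and table name are placeholders, and the payload is assumed to be a JSON list of flat records:

import requests

records = requests.get("https://api.example.com/v1/items", timeout=30).json()

df = spark.createDataFrame(records)  # assumes a list of flat dicts
df.write.mode("append").saveAsTable("main.staging.api_items")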
amitkmaurya
by Contributor
  • 3151 Views
  • 2 replies
  • 4 kudos

Resolved! How to increase executor memory in Databricks jobs

Maybe it's because I am new to Databricks that I have this confusion. Suppose I have worker memory of 64 GB in a Databricks job with max 12 nodes... and my job is failing due to Executor Lost with exit code 137 (OOM, as found on the internet). So, to fix this I need to increase execut...

Latest Reply
amitkmaurya
Contributor
  • 4 kudos

Hi @raphaelblg, I have solved this issue. Yes, in my case data skewness was causing the executor OOM, so adding a repartition just before writing resolved the skew. I didn't change any worker or driver memory. Thanks for your h...

1 More Replies
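A minimal sketch of the fix described, with placeholder column names: repartition on the hot key just before the write so rows spread evenly across executors:

(df.repartition(96, "customer_id")  # spread the skewed key across 96 tasks
   .write.mode("overwrite")
   .partitionBy("event_date")
   .saveAsTable("main.gold.events"))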
amitkmaurya
by Contributor
  • 4389 Views
  • 1 reply
  • 1 kudos

Resolved! Databricks job keep getting failed due to executor lost.

Getting the following error while saving a dataframe partitioned by two columns: Job aborted due to stage failure: Task 5774 in stage 33.0 failed 4 times, most recent failure: Lost task 5774.3 in stage 33.0 (TID 7736) (13.2.96.110 executor 7): ExecutorLos...

Data Engineering
databricks jobs
spark
Latest Reply
amitkmaurya
Contributor
  • 1 kudos

Hi, I have solved the problem with the same workers and driver. In my case data skewness was the problem. Adding a repartition to the dataframe just before writing distributed the data evenly across the nodes, and the stage failure was resolved. Thanks @Reti...

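To confirm skew before reaching for repartition, a hedged diagnostic with placeholder column names: count rows per partition-key value and look for outliers:

from pyspark.sql import functions as F

(df.groupBy("event_date", "region")  # the columns used in partitionBy
   .count()
   .orderBy(F.desc("count"))
   .show(20, truncate=False))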
