Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

HoussemBL
by New Contributor III
  • 6559 Views
  • 11 replies
  • 3 kudos

DLT Pipeline & Automatic Liquid Clustering Syntax

Hi everyone, I noticed Databricks recently released the automatic liquid clustering feature, which looks very promising. I'm currently implementing a DLT pipeline and would like to leverage this new functionality. However, I'm having trouble figuring o...

Latest Reply
jsturgeon
New Contributor II
  • 3 kudos

Is there a resolution to this? I am having the same problem. I can create tables with cluster by auto, but the MVs are failing saying I need to enable PO. This was working yesterday and is working in other environments.

jeremy98
by Honored Contributor
  • 1074 Views
  • 6 replies
  • 1 kudos

Is there a way to discover in the next task if the previous for loop task has some...

Hi community, As the title suggests, I'm looking for a smart way to determine which runs in a for-loop task succeeded and which didn’t, so I can use that information in the next task. Summary: I have a for-loop task that runs multiple items (e.g., run1,...

Latest Reply
SebastianRowan
Contributor
  • 1 kudos

Easiest way is to log each loop’s status with `dbutils.jobs.taskValues.set` then just grab those in the next task and only work with the ones that passed.

SebastianRowan
by Contributor
  • 2502 Views
  • 8 replies
  • 6 kudos

Resolved! Batch jobs suddenly slow down?

What should you do when batch jobs sometimes take much longer even though the data size hasn’t changed? What causes this, and do you use any tools to diagnose it?

Latest Reply
SebastianRowan
Contributor
  • 6 kudos

Thanks for the AMAZING response!

Hasiok1337
by New Contributor II
  • 1794 Views
  • 3 replies
  • 2 kudos

Transport Data from SharePoint Excel file to Databricks Table

Hello, is there a way in a Databricks notebook to pull data from an Excel file stored on SharePoint and upload it into my table in Databricks? I have a situation where I maintain a few tables on SharePoint and a few tables with the same data in Databri...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 2 kudos

Hi @Hasiok1337, Your approach is valid in my opinion, but you can also check out the SharePoint connector. It is currently in beta, but it should work. It gives you an out-of-the-box way to extract files from SharePoint into Databricks.

ckarrasexo
by New Contributor III
  • 28846 Views
  • 9 replies
  • 5 kudos

pyspark.sql.connect.dataframe.DataFrame vs pyspark.sql.DataFrame

I noticed that on some Databricks 14.3 clusters, I get DataFrames with type pyspark.sql.connect.dataframe.DataFrame, while on other clusters also with Databricks 14.3, the exact same code gets DataFrames of type pyspark.sql.DataFrame. pyspark.sql.conne...

Latest Reply
Gleydson404
New Contributor II
  • 5 kudos

I have found a workaround for this issue. Basically, I create a dummy_df and then check whether the DataFrame I want to check has the same type as the dummy_df. def get_dummy_df() -> DataFrame: """ Generates a dummy DataFrame with a range of int...
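
For readers who only need to tell the two DataFrame flavors apart, here is a minimal sketch of an alternative to the dummy-DataFrame comparison described in the reply above. It is not the poster's code: the helper name `is_spark_connect_dataframe` is hypothetical, and it rests on the assumption (taken from the class path quoted in the question) that Spark Connect DataFrame classes are defined under the `pyspark.sql.connect` package. It only inspects the class's module path, so it neither imports PySpark nor creates a DataFrame.

```python
def is_spark_connect_dataframe(df: object) -> bool:
    """Return True if df's class (or one of its base classes) is defined
    under the pyspark.sql.connect package.

    Assumption: Spark Connect DataFrames live under `pyspark.sql.connect`,
    as in the class path pyspark.sql.connect.dataframe.DataFrame quoted
    in the question. The helper name is hypothetical.
    """
    prefix = "pyspark.sql.connect"
    # Walk the method resolution order so subclasses are recognized too.
    for cls in type(df).__mro__:
        module = cls.__module__
        if module == prefix or module.startswith(prefix + "."):
            return True
    return False
```

In a notebook this could be called as `is_spark_connect_dataframe(spark.range(1))`. Because the check depends on module naming rather than on a documented API, it is worth re-verifying after a runtime upgrade.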

sharukh_lodhi
by New Contributor III
  • 6139 Views
  • 5 replies
  • 3 kudos

Azure IMDS is not accessible when selecting shared compute policy

Hi, Databricks community, I recently encountered an issue while using the 'azure.identity' Python library on a cluster set to the personal compute policy in Databricks. In this case, Databricks successfully returns the Azure Databricks managed user id...

Data Engineering
azure IMDS
DefaultAzureCredential
Latest Reply
Malthe
Valued Contributor II
  • 3 kudos

How does this work with serverless (for example with DLT pipelines), which runs in standard access mode: Serverless compute is based on Databricks standard access mode compute architecture (formerly called shared access mode). To my understanding, from ...

der
by Contributor III
  • 2458 Views
  • 2 replies
  • 0 kudos

Resolved! Databricks JDBC Driver 2.7.3 with OAuth2 M2M on Databricks

We have an application implemented in Java and installed as a JAR on the cluster. The application reads data from Unity Catalog over the Databricks JDBC Driver. We used PAT tokens for the Service Principal in the past and everything worked fine. Now we chan...

Latest Reply
der
Contributor III
  • 0 kudos

According to the support team, I had to set the JDBC parameter OAuthEnabledIPAddressRanges. The IP range should be the resolved Private Link IP (usually starting with 10.x) of the hostname in the Databricks workspace URL.

chexa_Wee
by New Contributor III
  • 4546 Views
  • 8 replies
  • 5 kudos

error creating catalog in Unity Catalog – EXTERNAL_LOCATION_DOES_NOT_EXIST and admin console storage

Hi all, I’m trying to create a new catalog in Azure Databricks Unity Catalog but I’m running into issues. When I tried to add a default path in the Admin Console → Metastore settings, I got this error: “Metastore storage root URL does not exist. Plea...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 5 kudos

Hi @chexa_Wee, Starting from November 9, 2023, Databricks by default won't configure metastore-level storage for managed tables and volumes. Databricks recommends that you create a separate managed storage location for each catalog in your metastore....

FRB1984
by New Contributor II
  • 620 Views
  • 1 reply
  • 1 kudos

Different behavior on personal cluster vs job cluster

Hi guys! I am facing a weird bug here! I own a notebook that runs perfectly on a personal cluster. As an example, I printed the data output during the extraction. Code: cursor.execute(sql) results = cursor.fetchall() cols = [desc[0] fo...

Latest Reply
Vidhi_Khaitan
Databricks Employee
  • 1 kudos

Hi team, In interactive notebooks on personal clusters, you’re attached directly to the Spark driver inside the cluster. The Spark session is the legacy PySpark session. In job clusters, especially when running with newer runtimes (e.g. DBR 14.x+ or SQL wa...

CzarR
by New Contributor III
  • 946 Views
  • 3 replies
  • 1 kudos

Resolved! Dynamic cluster via ADF vs standalone Databricks cluster processing issue

I have a Databricks notebook that writes data from a Parquet file with 4 million records into a new Delta table. It is a simple script. It works fine when I run it from the Databricks notebook using the cluster with config in the screenshot below. But I run ...

Latest Reply
ilir_nuredini
Honored Contributor
  • 1 kudos

Hello @CzarR, At first glance it looks like an off-heap memory issue, and that's why you would see a "GC overhead limit exceeded" error. Can you try enabling and adjusting the off-heap memory size in the linked service where you define the cluster spark c...

Coffee77
by Honored Contributor II
  • 1012 Views
  • 3 replies
  • 3 kudos

Resolved! Introduction to Databricks 🇪🇸

Here is the first episode of a series of simple videos on Introduction to Databricks for beginners, in Spanish: https://youtu.be/kvglz79Ob-M?si=KnyCH74_HQ8jiO7S It covers preliminary and basic concepts to master before moving forward with Databricks.

Latest Reply
WiliamRosa
Databricks Partner
  • 3 kudos

No problem, I did the same thing the first time as well 

wi11iamr
by New Contributor II
  • 4598 Views
  • 6 replies
  • 0 kudos

PowerBI Connection: Possible to use ADOMDClient (or alternative)?

I wish to extract from PowerBI Datasets the metadata of all Measures, Relationships and Entities. In VSCode I have a Python script that connects to the PowerBI API using the Pyadomd module connecting via the XMLA endpoint. After much trial and error I...

Latest Reply
Rajesh007
New Contributor II
  • 0 kudos

Have you had any luck? I have the same requirement: I want to read some datasets from a PowerBI data model into my Databricks workspace and store them in the data lake.

SanthanaSelvi06
by New Contributor III
  • 2526 Views
  • 3 replies
  • 1 kudos

Resolved! Databricks App - Streamlit file Upload issue

I used this code snippet from the cookbook to create a custom Databricks Streamlit app that uploads files to a volume, but I am getting the following error even before the upload to the volume starts. Using file_uploader in Streamlit while uploading the fil...

Latest Reply
SanthanaSelvi06
New Contributor III
  • 1 kudos

I am able to upload the file after whitelisting the app URL.

a_t_h_i
by New Contributor II
  • 5112 Views
  • 4 replies
  • 2 kudos

Move managed DLT table from one schema to another schema in Databricks

I have a DLT table in schema A which is being loaded by a DLT pipeline. I want to move the table from schema A to schema B, and repoint my existing DLT pipeline to the table in schema B. I also need to avoid a full reload in the DLT pipeline on the table in schema B....

Data Engineering
delta-live-table
deltalivetable
deltatable
dlt
Latest Reply
ManojkMohan
Honored Contributor II
  • 2 kudos

Have you tried the steps below?
1. Pause or stop the DLT pipeline. Prevent new writes while moving the table.
2. Move the table in the metastore. DLT uses Delta tables under the hood, so you can move the table in the metastore without copying data: ALTER TABLE schemaA.ta...

Nandini
by New Contributor II
  • 18666 Views
  • 12 replies
  • 7 kudos

Pyspark: You cannot use dbutils within a spark job

I am trying to parallelise the execution of file copy in Databricks. Making use of multiple executors is one way. So, this is the piece of code that I wrote in PySpark. def parallel_copy_execution(src_path: str, target_path: str): files_in_path = db...

Latest Reply
Etyr
Contributor II
  • 7 kudos

If you have a Spark session, you can use Spark's hidden FileSystem:
# Get FileSystem from SparkSession
fs = spark._jvm.org.apache.hadoop.fs.FileSystem.get(spark._jsc.hadoopConfiguration())
# Get Path class to convert string path to FS path
path = spark._...
