Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

ckarrasexo
by New Contributor III
  • 26408 Views
  • 9 replies
  • 5 kudos

pyspark.sql.connect.dataframe.DataFrame vs pyspark.sql.DataFrame

I noticed that on some Databricks 14.3 clusters, I get DataFrames with type pyspark.sql.connect.dataframe.DataFrame, while on other clusters also with Databricks 14.3, the exact same code gets DataFrames of type pyspark.sql.DataFramepyspark.sql.conne...

Latest Reply
Gleydson404
New Contributor II
  • 5 kudos

I have found a workaround for this issue. Basically, I create a dummy_df and then check whether the DataFrame in question has the same type as the dummy_df: def get_dummy_df() -> DataFrame: """Generates a dummy DataFrame with a range of int...
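The truncated snippet above compares against a dummy DataFrame. A lighter variant (a sketch, not from the thread) avoids building any DataFrame by checking the fully qualified class name, which distinguishes the Spark Connect class from the classic one without importing pyspark at all:

```python
def is_connect_dataframe(df) -> bool:
    """Return True if df looks like a Spark Connect DataFrame, judged by its
    fully qualified class name, so no pyspark import is required."""
    cls = type(df)
    return f"{cls.__module__}.{cls.__qualname__}".startswith("pyspark.sql.connect.")
```

This sidesteps isinstance checks entirely, which matters when only one of the two DataFrame classes is importable on a given cluster.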

8 More Replies
sharukh_lodhi
by New Contributor III
  • 5503 Views
  • 5 replies
  • 3 kudos

Azure IMDS is not accessible when selecting shared compute policy

Hi, Databricks community, I recently encountered an issue while using the 'azure.identity' Python library on a cluster set to the personal compute policy in Databricks. In this case, Databricks successfully returns the Azure Databricks managed user id...

Data Engineering
azure IMDS
DefaultAzureCredential
Latest Reply
Malthe
Valued Contributor
  • 3 kudos

How does this work with serverless (for example with DLT pipelines), which runs in standard access mode? "Serverless compute is based on Databricks standard access mode compute architecture (formerly called shared access mode)." To my understanding, from ...

4 More Replies
der
by Contributor III
  • 2224 Views
  • 2 replies
  • 0 kudos

Resolved! Databricks JDBC Driver 2.7.3 with OAuth2 M2M on Databricks

We have an application implemented in Java and installed as a JAR on the cluster. The application reads data from Unity Catalog over the Databricks JDBC Driver. We used PAT tokens for the service principal in the past and everything worked fine. Now we chan...

Latest Reply
der
Contributor III
  • 0 kudos

According to the support team, I had to set the JDBC parameter OAuthEnabledIPAddressRanges. The IP range should be the resolved Private Link IP (usually starting with 10.x) of the hostname in the Databricks workspace URL.
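As a sketch of how that parameter might sit alongside the OAuth M2M settings in the JDBC URL (parameter names follow the Databricks JDBC driver documentation for OAuth, but host, path, and IP values here are hypothetical and should be verified against your driver version):

```python
def build_jdbc_url(host: str, http_path: str, client_id: str,
                   client_secret: str, ip_range: str) -> str:
    """Assemble a Databricks JDBC URL for OAuth M2M (client credentials)."""
    params = {
        "transportMode": "http",
        "ssl": "1",
        "AuthMech": "11",        # OAuth 2.0
        "Auth_Flow": "1",        # client-credentials (M2M) flow
        "OAuth2ClientId": client_id,
        "OAuth2Secret": client_secret,
        "OAuthEnabledIPAddressRanges": ip_range,
        "httpPath": http_path,
    }
    query = ";".join(f"{k}={v}" for k, v in params.items())
    return f"jdbc:databricks://{host}:443/default;{query}"
```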

1 More Replies
chexa_Wee
by New Contributor III
  • 3392 Views
  • 8 replies
  • 5 kudos

Error creating catalog in Unity Catalog – EXTERNAL_LOCATION_DOES_NOT_EXIST and Admin Console storage

Hi all, I’m trying to create a new catalog in Azure Databricks Unity Catalog but I’m running into issues. When I tried to add a default path in the Admin Console → Metastore settings, I got this error: “Metastore storage root URL does not exist. Plea...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 5 kudos

Hi @chexa_Wee, starting from November 9, 2023, Databricks by default won't configure metastore-level storage for managed tables and volumes. Databricks recommends that you create a separate managed storage location for each catalog in your metastore....
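The per-catalog setup described above looks roughly like this (catalog name and storage path are hypothetical, and the path must already be covered by an external location the metastore can reach):

```sql
-- Hypothetical names/paths; requires an existing external location covering this path.
CREATE CATALOG finance_catalog
MANAGED LOCATION 'abfss://managed@mystorageacct.dfs.core.windows.net/finance_catalog';
```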

7 More Replies
FRB1984
by New Contributor II
  • 495 Views
  • 1 reply
  • 1 kudos

Different behavior on personal cluster vs job cluster

Hi guys! I am facing a weird bug here! I own a notebook that runs perfectly on a personal cluster. Just as an example, I've made some prints of the data output during the extraction. Code: cursor.execute(sql) results = cursor.fetchall() cols = [desc[0] fo...

Latest Reply
Vidhi_Khaitan
Databricks Employee
  • 1 kudos

Hi team,In interactive notebooks on personal clusters, you’re attached directly to the Spark driver inside the cluster. Spark session is the legacy PySpark session.In job clusters, especially when running with newer runtimes (e.g. DBR 14.x+ or SQL wa...
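The truncated snippet in the question builds column names from the DB-API cursor description. A small, cluster-agnostic sketch of that step (names hypothetical), whose output can then be handed to spark.createDataFrame on either a personal or a job cluster:

```python
def rows_from_cursor(description, results):
    """Convert DB-API cursor output (description + fetchall rows)
    into a list of dicts keyed by column name."""
    cols = [desc[0] for desc in description]
    return [dict(zip(cols, row)) for row in results]
```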

CzarR
by New Contributor III
  • 757 Views
  • 3 replies
  • 1 kudos

Resolved! Dynamic cluster via ADF vs standalone Databricks cluster processing issue

I have a databricks notebook that writes data from a parquet file with 4 million records into a new delta table. Simple script. It works fine when I run it from the Databricks notebook using the cluster with config in the screenshot below. But I run ...

Latest Reply
ilir_nuredini
Honored Contributor
  • 1 kudos

Hello @CzarR, at first glance it looks like an off-heap memory issue, and that's why you would see a "GC overhead limit exceeded" error. Can you try enabling and adjusting the off-heap memory size in the linked service where you define the cluster Spark c...
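The off-heap settings in question are standard Spark configuration keys. A sketch of what might go into the ADF linked service's Spark configuration (the 4g size is an assumption to tune against your node type, not a recommendation):

```python
# Standard Spark off-heap configuration keys; the size is a starting point to tune.
OFFHEAP_SPARK_CONF = {
    "spark.memory.offHeap.enabled": "true",
    "spark.memory.offHeap.size": "4g",
}
```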

2 More Replies
Coffee77
by Honored Contributor II
  • 839 Views
  • 3 replies
  • 3 kudos

Resolved! Introduction to Databricks 🇪🇸

Here is the first episode of a series of simple videos on Introduction to Databricks for beginners, in Spanish: https://youtu.be/kvglz79Ob-M?si=KnyCH74_HQ8jiO7S It covers basic prerequisite concepts to master before moving forward with Databricks.

Latest Reply
WiliamRosa
Honored Contributor III
  • 3 kudos

No problem, I did the same thing the first time as well 

2 More Replies
wi11iamr
by New Contributor II
  • 4007 Views
  • 6 replies
  • 0 kudos

PowerBI Connection: Possible to use ADOMDClient (or alternative)?

I wish to extract from Power BI datasets the metadata of all measures, relationships, and entities. In VSCode I have a Python script that connects to the Power BI API using the Pyadomd module, connecting via the XMLA endpoint. After much trial and error I...

Latest Reply
Rajesh007
New Contributor II
  • 0 kudos

Have you had any luck? I have the same requirement: I want to read some datasets from the Power BI data model into my Databricks workspace and store them in the data lake.
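If the XMLA/Pyadomd route stays painful from Databricks, one alternative worth checking is the Power BI REST API's executeQueries endpoint, which runs DAX over plain HTTPS. A sketch of the request body that endpoint expects (the DAX query shown is hypothetical, and authentication via an AAD token is still required separately):

```python
def executequeries_payload(dax_query: str) -> dict:
    """Build the JSON body for POST /v1.0/myorg/datasets/{datasetId}/executeQueries."""
    return {
        "queries": [{"query": dax_query}],
        "serializerSettings": {"includeNulls": True},
    }
```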

5 More Replies
SanthanaSelvi06
by New Contributor III
  • 2063 Views
  • 3 replies
  • 1 kudos

Resolved! Databricks App - Streamlit file Upload issue

I used this code snippet from the cookbook and created a custom Databricks Streamlit app to upload files to a volume, but I am getting the following error even before the upload to the volume starts. Using file_uploader in Streamlit while uploading the fil...

Latest Reply
SanthanaSelvi06
New Contributor III
  • 1 kudos

I am able to upload the file after whitelisting the app URL.

2 More Replies
a_t_h_i
by New Contributor II
  • 4822 Views
  • 4 replies
  • 2 kudos

Move managed DLT table from one schema to another schema in Databricks

I have a DLT table in schema A which is being loaded by a DLT pipeline. I want to move the table from schema A to schema B, and repoint my existing DLT pipeline to the table in schema B. I also need to avoid a full reload in the DLT pipeline on the table in schema B....

Data Engineering
delta-live-table
deltalivetable
deltatable
dlt
Latest Reply
ManojkMohan
Honored Contributor II
  • 2 kudos

Have you tried the below?
1. Pause or stop the DLT pipeline. Prevent new writes while moving the table.
2. Move the table in the metastore. DLT uses Delta tables under the hood, so you can move the table in the metastore without copying data: ALTER TABLE schemaA.ta...
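The truncated ALTER TABLE in step 2 presumably renames across schemas; a sketch with hypothetical names (behavior for DLT-managed tables should be verified before relying on it):

```sql
-- Hypothetical names; RENAME moves the metastore entry without rewriting data files.
ALTER TABLE schema_a.my_dlt_table RENAME TO schema_b.my_dlt_table;
```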

3 More Replies
Nandini
by New Contributor II
  • 17919 Views
  • 12 replies
  • 7 kudos

Pyspark: You cannot use dbutils within a spark job

I am trying to parallelise the execution of file copy in Databricks. Making use of multiple executors is one way. So, this is the piece of code that I wrote in PySpark: def parallel_copy_execution(src_path: str, target_path: str): files_in_path = db...

Latest Reply
Etyr
Contributor II
  • 7 kudos

If you have a Spark session, you can use Spark's underlying Hadoop FileSystem:
# Get FileSystem from SparkSession
fs = spark._jvm.org.apache.hadoop.fs.FileSystem.get(spark._jsc.hadoopConfiguration())
# Get Path class to convert string path to FS path
path = spark._...
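If driver-side parallelism is enough (rather than fanning work out to executors), a plain thread pool avoids both dbutils-inside-tasks and JVM internals. This local-filesystem sketch (paths hypothetical) shows the pattern; the same code works against /Volumes or /dbfs mount paths on the driver:

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def parallel_copy(src_dir: str, dst_dir: str, max_workers: int = 8) -> int:
    """Copy every regular file in src_dir to dst_dir using a thread pool.
    Returns the number of files copied."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    files = [p for p in src.iterdir() if p.is_file()]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() is lazy, so wrap in list() to force all copies to complete.
        list(pool.map(lambda p: shutil.copy2(p, dst / p.name), files))
    return len(files)
```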

11 More Replies
b-baran
by New Contributor III
  • 1161 Views
  • 3 replies
  • 1 kudos

Resolved! How to define a column tag in a table schema definition?

Setting a tag for a specific column can be done using the SQL command: https://docs.databricks.com/aws/en/sql/language-manual/sql-ref-syntax-ddl-set-tag Is there another possible way to define a column tag? For example, it is possible to add a column co...
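For reference, the linked syntax applies tags per column after creation (table, column, and tag names here are hypothetical); as far as I know there is no tag clause in the CREATE TABLE column list itself:

```sql
-- Hypothetical table/column; tags are key-value pairs set via ALTER TABLE.
ALTER TABLE main.sales.orders
ALTER COLUMN customer_id SET TAGS ('pii' = 'true');
```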

Latest Reply
WiliamRosa
Honored Contributor III
  • 1 kudos

You’re welcome @b-baran ! If you feel my answer addressed your question, could you please mark it as the solution to the post? Thank you very much!

2 More Replies
GeertR
by New Contributor
  • 403 Views
  • 1 reply
  • 1 kudos

Is CREATE STREAMING LIVE VIEW deprecated?

Hi, I'm trying to learn Lakeflow Pipelines (DLT) and found some examples online where the CREATE STREAMING LIVE VIEW statement is used. When I try to search for it in the Databricks documentation, there is nothing really I can find on them. https://docs.dat...

Latest Reply
ilir_nuredini
Honored Contributor
  • 1 kudos

Hello @GeertR, the LIVE virtual schema is a legacy feature of Lakeflow Declarative Pipelines (DLT) and is deprecated. You can still use it with pipelines that were created in legacy publishing mode, but the pipeline configuration UI no longer lets yo...
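In the current (non-legacy) syntax the LIVE qualifier is simply dropped. A minimal sketch of a streaming table definition (table name and source path are hypothetical; check the current Lakeflow docs for the view-specific variant):

```sql
-- Current Lakeflow Declarative Pipelines syntax; no LIVE schema qualifier.
CREATE OR REFRESH STREAMING TABLE bronze_events
AS SELECT * FROM STREAM read_files('/Volumes/main/raw/events/');
```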

zoeyazimi
by New Contributor
  • 2412 Views
  • 2 replies
  • 0 kudos

importing files from streamlit app on databricks to dbfs

I am building a Streamlit-based app on Databricks that allows users to:
  • Upload Excel scenario files
  • Store them in DBFS (e.g., /FileStore/SCO/scenarios/)
  • Trigger a simulation/optimization model using the uploaded/stored file as input to the model
  • Store th...

Latest Reply
cgrant
Databricks Employee
  • 0 kudos

Here is an example for uploading files to a Volume; a download example is there, too.

1 More Replies
avidex180899
by New Contributor III
  • 16668 Views
  • 4 replies
  • 4 kudos

Resolved! UUID/GUID Datatype in Databricks SQL

Hi all, I am trying to create a table with a GUID column. I have tried using GUID and UUID, but neither of them works. Can someone help me with the syntax for adding a GUID column? Thanks!

Latest Reply
rswarnkar5
New Contributor III
  • 4 kudos

> What ANSI SQL data structure to use for UUID or GUID? I had a similar question. The answer was `STRING`.
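Since there is no native UUID type, one common pattern is to generate the value client-side and store it in a STRING column; a minimal sketch:

```python
import uuid

def new_guid() -> str:
    """Return a random UUID in its canonical 36-character form,
    suitable for storing in a STRING column in Databricks SQL."""
    return str(uuid.uuid4())
```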

3 More Replies