Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

famous_jt33
by New Contributor
  • 1305 Views
  • 2 replies
  • 2 kudos

SQL UDFs for DLT pipelines

I am trying to implement a UDF for a DLT pipeline. I have seen the documentation stating that it is possible but I am getting an error after adding an SQL UDF to a cell in the notebook attached to the pipeline. The aim is to have the UDF in a separat...

Latest Reply
6502
New Contributor III
  • 2 kudos

You can't. SQL support on a DLT pipeline cluster is limited compared to a normal notebook. You can still define a UDF in Python using, of course, a Python notebook. In this case, you can use the spark.sql() function to execute your original SQL cod...

1 More Replies
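A minimal sketch of the workaround the reply describes, with all names illustrative (none come from the original thread): keep the transformation logic in plain Python, register it as a UDF from a Python notebook, and run the original SQL through spark.sql().

```python
# Sketch of the workaround above: define the UDF in Python, then register it
# so the original SQL can call it via spark.sql(). Names are illustrative.

def normalize_code(value):
    """Pure transformation logic, testable without Spark."""
    return value.strip().upper() if value is not None else None

def register_udf(spark):
    # Registration needs a live SparkSession, so the pyspark import is
    # deferred to keep the pure function importable on its own.
    from pyspark.sql.types import StringType
    spark.udf.register("normalize_code", normalize_code, StringType())
    # The original SQL can then be executed from the Python notebook, e.g.:
    # spark.sql("SELECT normalize_code(code) FROM my_table")
```

The pure function can be unit-tested outside the pipeline; only the registration step requires the DLT runtime.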
sher
by Valued Contributor II
  • 4194 Views
  • 4 replies
  • 3 kudos

How do we use Delta Sharing between Databricks and Snowflake?

Hi all, Is there any way to implement Delta Sharing from Databricks to Snowflake as a direct connect?

Latest Reply
NateAnth
Valued Contributor
  • 3 kudos

I don't think that Snowflake has implemented the ability to read from a table via Delta Sharing as of December 2023. Please reach out to your Snowflake representatives and urge them to consider this feature from their side.  Alternatively, you can qu...

3 More Replies
PrasSabb_97245
by New Contributor II
  • 3898 Views
  • 2 replies
  • 0 kudos

AWS S3 External Location Size in Unity Catalog

Hi, I am trying to get the raw size (total size) of a Delta table. I could get the Delta table size from the DeltaTable API, but that gives only the latest version's size. I need to find the actual size the tables take on S3. Is there any way to find the S3 size ...

Latest Reply
PrasSabb_97245
New Contributor II
  • 0 kudos

Hi Kaniz, Thank you for your suggestions. As per my understanding, "snapshot.sizeInBytes" gives only the current snapshot size, but I am looking for the total size (all versions) of the table on S3.

1 More Replies
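As the poster notes, snapshot.sizeInBytes covers only the current version. One hedged approach is to recursively sum every file under the table's storage path, which also counts data files retained for older versions. The sketch below uses an injectable lister standing in for dbutils.fs.ls; all paths and names are illustrative.

```python
# Hedged sketch: total all-version size = sum of every file under the
# table's S3 prefix. `list_dir` stands in for dbutils.fs.ls; the dict
# shape below is only for illustration.

def total_path_size(path, list_dir):
    """Recursively sum file sizes under `path` using a dbutils-style lister."""
    total = 0
    for entry in list_dir(path):
        if entry["isDir"]:
            total += total_path_size(entry["path"], list_dir)
        else:
            total += entry["size"]
    return total

# On Databricks, the lister could be adapted from dbutils.fs.ls, e.g.:
# list_dir = lambda p: [{"path": f.path, "size": f.size, "isDir": f.isDir()}
#                       for f in dbutils.fs.ls(p)]
```

Note this counts tombstoned files still on S3 that VACUUM has not yet removed, which is exactly the gap between snapshot size and raw bucket usage.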
erigaud
by Honored Contributor
  • 2456 Views
  • 4 replies
  • 0 kudos

The operation CHANGE DATA FEED is not allowed on Streaming Tables.

Hello everyone, I have a workflow that starts by reading the CDF data for a change data feed. The syntax is exactly the following: (spark.readStream .format("delta") .option("readChangeFeed", "true") .option("startingVersion", 10) .table("my.str...

Latest Reply
afk
New Contributor III
  • 0 kudos

Hi, this seems to be related to the issue I've been getting around the same time here: Change data feed from target tables of APPLY CHANG... - Databricks - 54436. Would be great to get an explanation for the sudden change in behaviour.

3 More Replies
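For reference, the truncated snippet in the post follows the standard Delta CDF streaming-read pattern, sketched here with illustrative names. Note that change data feed is not readable from streaming tables produced by DLT (such as APPLY CHANGES targets), which matches the error in the title.

```python
# Sketch of a Delta change-data-feed streaming read like the one in the
# post; table name and version are placeholders.

def cdf_read_options(starting_version):
    """Options for a Delta CDF streaming read."""
    return {"readChangeFeed": "true",
            "startingVersion": str(starting_version)}

def build_cdf_stream(spark, table_name, starting_version=10):
    # Requires a SparkSession with Delta support; not runnable standalone.
    reader = spark.readStream.format("delta")
    for key, value in cdf_read_options(starting_version).items():
        reader = reader.option(key, value)
    return reader.table(table_name)
```

The options dict is the only part that can be checked without a cluster; the rest mirrors the syntax quoted in the question.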
Jules
by New Contributor
  • 580 Views
  • 0 replies
  • 0 kudos

Access from DBT job to Azure DevOps repository using Service Principal

Hi, We are using Databricks bundles to deploy our DBT project. Everything is set up to deploy and run as a Service Principal. The DBT job is connected to an Azure DevOps repository. The problem is that we cannot find a way to properly authenticate the ...

Data Engineering
azure devops
bundles
dbt
NLearn
by New Contributor II
  • 888 Views
  • 2 replies
  • 0 kudos

Save default language of notebook into variable dynamically

For one of the requirements of the project, I want to save the default language of a notebook into a variable, based on a notebook path specified dynamically. For example: if the first notebook given by the user in a widget has Python as its default language, then the variable value...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @NLearn, To change the default language of a notebook in Databricks, you can select File -> Change default cell language. This will affect all the cells in the notebook that use the same language as the default one. You can also use magic commands...

1 More Replies
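Beyond the File menu, the notebook's default language can be read programmatically: the Workspace API's get-status endpoint returns a "language" field for notebook objects. A hedged sketch of parsing that response, with the HTTP call itself omitted and the sample values purely illustrative:

```python
# Hedged sketch: read a notebook's default language from the Workspace
# API get-status response. Host/token handling is omitted; the response
# shape follows the public API docs.
import json

def notebook_language(status):
    """Extract the default language from a workspace get-status response."""
    if isinstance(status, str):
        status = json.loads(status)
    if status.get("object_type") != "NOTEBOOK":
        raise ValueError("not a notebook: %s" % status.get("object_type"))
    return status["language"]  # e.g. "PYTHON", "SQL", "SCALA", "R"

# On Databricks, the response would come from:
# GET {host}/api/2.0/workspace/get-status?path=/Users/me/my_notebook
```

The returned value can then be stored in the variable the poster describes, one call per notebook path supplied in the widget.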
harvey-c
by New Contributor III
  • 1002 Views
  • 1 reply
  • 0 kudos

Wrong FS: abfss://....., expected: dbfs:/ Error in DLT pipeline

Dear Databricks community members: Symptom: Received the error for a delta load, after a successful initial load with a Unity Catalog Volume as a data source. org.apache.spark.sql.streaming.StreamingQueryException: [STREAM_FAILED] Query [id = xxx, runId...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @harvey-c, The file system path of the data source has changed from dbfs:/ to abfss:// after a previous successful load. This might confuse the Spark streaming query and cause it to fail with a wrong file system exception.   One possible solution ...

GijsM
by New Contributor
  • 1078 Views
  • 1 reply
  • 0 kudos

Thousands of ETL pipelines with long execution times and small dataset sizes

Hi, I work for a small company; we mostly focus on small retail and e-commerce customers. We provide data analysis and automated data connections between their platforms. Most of our datasets are things like order data, Google Ads click data, ...

Latest Reply
brockb
Valued Contributor
  • 0 kudos

Hi, Thanks for the information. There is a lot to unpack, and some assumptions need to be made without fully understanding the details, so here are a few thoughts: If the cluster start times are longer because of the libraries you're installing, can ...

Fz1
by New Contributor III
  • 7356 Views
  • 5 replies
  • 3 kudos

Resolved! SQL Warehouse Serverless - Not able to access the external tables in the hive_metastore

I have DLT tables created under the hive_metastore with external data stored in ADLS Gen2. The ADLS blob storage is mounted into /mnt/<storage-account>. The tables are successfully created and accessible from my notebooks, as well as the ADLS storage. I have c...

Latest Reply
TjommeV-Vlaio
New Contributor II
  • 3 kudos

Can this be done using Terraform as well?

4 More Replies
Phani1
by Valued Contributor II
  • 1330 Views
  • 1 reply
  • 0 kudos

Query Delta table from .net

Hi Team, How can we expose data stored in a Delta table through an API, like exposing SQL data through a .NET API?

Data Engineering
delta
dotnet
Latest Reply
BjarkeM
New Contributor II
  • 0 kudos

You can use the SQL Statement Execution API. At energinet.dk we have created this open-source .NET client, which we use internally in the company.

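The SQL Statement Execution API mentioned in the reply is a plain HTTP endpoint, so any client, .NET included, can call it. A hedged sketch of the request shape (warehouse ID, table name, and timeout below are placeholders, not values from the thread):

```python
# Hedged sketch of the request body for the SQL Statement Execution API
# (POST /api/2.0/sql/statements), sent with a Bearer token by any HTTP
# client. The warehouse_id and SQL text are placeholders.
import json

def statement_payload(warehouse_id, statement, wait_timeout="30s"):
    """Request body for POST /api/2.0/sql/statements."""
    return {"warehouse_id": warehouse_id,
            "statement": statement,
            "wait_timeout": wait_timeout}

payload = statement_payload(
    "abc123",
    "SELECT * FROM my_catalog.my_schema.my_table LIMIT 10")
body = json.dumps(payload)  # POST this to {host}/api/2.0/sql/statements
```

The response contains the result rows (or a statement ID to poll), which a .NET service can then reshape for its own API consumers.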
-werners-
by Esteemed Contributor III
  • 4016 Views
  • 3 replies
  • 3 kudos

Resolved! best way to store config files in a Unity workspace (Scala/typesafe)

We use Typesafe (Scala) to read configuration values from HOCON files. When not using Unity, we read the configuration files from /dbfs/..., which works fine. However, with Unity, usage of dbfs is frowned upon. So I started looking into alternatives. And unfor...

Latest Reply
-werners-
Esteemed Contributor III
  • 3 kudos

In the end we will continue to use dbfs. Maybe in the future, when volumes are supported by Scala IO, we can re-evaluate, but for now dbfs seems the way to go.

2 More Replies
mudholkar
by New Contributor III
  • 2208 Views
  • 1 reply
  • 6 kudos

I am getting an SSLError: HTTPSConnectionPool while calling HTTPS REST APIs from Azure Databricks, even after setting verify=False in the call.

response = requests.request("POST", url, verify=False, headers=headers, data=payload)   SSLError: HTTPSConnectionPool(host='dcs.adobedc.net', port=443): Max retries exceeded with url: /collection/d99e6dfcffb0b5aeaec2cf76cd3bc2b9e9c414b0c74a528d13dd39...

Latest Reply
JFG
New Contributor II
  • 6 kudos

Any luck with this?

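In general, disabling verification with verify=False is both insecure and, as the post shows, not always enough when the failure sits deeper in the TLS handshake or in cluster egress. A hedged alternative is to verify against a CA bundle that contains the issuing certificate; the stdlib sketch below shows the idea, and the bundle path is a placeholder.

```python
# Hedged sketch: verify the server instead of disabling verification.
# ssl.create_default_context() uses the platform trust store; pass a CA
# bundle path (placeholder here) if the issuer is not in that store.
import ssl

def make_context(ca_bundle=None):
    """TLS context that verifies the peer certificate."""
    return ssl.create_default_context(cafile=ca_bundle)

ctx = make_context()  # platform trust store; hostname checking stays on
assert ctx.verify_mode == ssl.CERT_REQUIRED

# With requests, the equivalent is to point `verify` at the bundle:
# requests.post(url, headers=headers, data=payload,
#               verify="/path/to/ca_bundle.pem")  # path is illustrative
```

If the error persists with a correct bundle, the cause is often a proxy or firewall intercepting TLS on the cluster's egress path rather than the certificate itself.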
mkrish28
by New Contributor II
  • 1365 Views
  • 2 replies
  • 0 kudos

Resolved! Regarding Exam got suspended

Hello Team, I had a disappointing experience while attempting my first Databricks certification. Abruptly, the proctor asked me to show my desk, which I did. Eventually, they suspended my exam, citing excessive eye movement and other practices...

Latest Reply
Cert-Team
Esteemed Contributor
  • 0 kudos

@mkrish28 I'm sorry to hear you had this experience. Thank you for logging a ticket with the support team. They have informed me they have rescheduled your exam. Good luck!

1 More Replies
Oliver_Angelil
by Valued Contributor II
  • 7569 Views
  • 8 replies
  • 1 kudos

How to use the git CLI in databricks?

After making some changes in my feature branch, I have committed and pushed (to Azure DevOps) some work (note I have not yet raised a PR or merged to any other branch). Many of the files I committed are data files, so I would like to reverse the co...

Latest Reply
Kayla
Valued Contributor
  • 1 kudos

I'm also curious about this question - does anyone have an answer? Being able to use the full repertoire of git commands inside Databricks would be quite useful.

7 More Replies
samur
by New Contributor II
  • 1469 Views
  • 2 replies
  • 1 kudos

DBR 14.1 - foreachBatch in Spark Connect Shared Clusters are not supported in Unity Catalog.

I am getting this error on DBR 14.1: AnalysisException: [UC_COMMAND_NOT_SUPPORTED.WITHOUT_RECOMMENDATION] The command(s): foreachBatch in Spark Connect Shared Clusters are not supported in Unity Catalog. This is the code: wstream = df.writeStream.foreac...

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @samur, The error message you're encountering in DBR 14.1 indicates that foreachBatch within Spark Connect Shared Clusters is not supported in Unity Catalog. This limitation is specific to the version you're using. If you need to w...

1 More Replies
