Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

shagun
by New Contributor III
  • 7827 Views
  • 3 replies
  • 0 kudos

Resolved! Delta live tables target schema

The first time I run my Delta Live Tables pipeline after setup, I get this error on starting it: org.apache.spark.sql.catalyst.parser.ParseException: Possibly unquoted identifier my-schema-name detected. Please con...

Latest Reply
BenTendo
New Contributor II
  • 0 kudos

This still errors on internal Databricks Spark/Python code like deltaTable.history(). @shagun wrote: The first time I run my Delta Live Tables pipeline after setup, I get this error on starting it: org.apache.spark.sql...

2 More Replies
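A hedged aside on the thread above: the ParseException complains about the hyphen in my-schema-name, since SQL identifiers containing hyphens must be backtick-quoted before they reach the parser. A minimal sketch (pure Python, illustrative names) of a helper that quotes such identifiers before interpolating them into SQL:

```python
def quote_identifier(name: str) -> str:
    """Backtick-quote a SQL identifier so hyphens and other special
    characters parse correctly; embedded backticks are doubled."""
    return "`" + name.replace("`", "``") + "`"

# e.g. build a fully qualified table reference for spark.sql(...):
table_ref = ".".join(quote_identifier(p) for p in ("my-schema-name", "events"))
```

With this, a statement like `SELECT * FROM `my-schema-name`.events` parses, where the unquoted form fails.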
ashdam
by New Contributor III
  • 5741 Views
  • 1 reply
  • 2 kudos

Is it possible for Databricks asset bundles to use a cluster depending on the target (environment)?

Here is my bundle definition:
# This is a Databricks asset bundle definition for my_project.
# See https://docs.databricks.com/dev-tools/bundles/index.html for documentation.
experimental:
  python_wheel_wrapper: true
bundle:
  name: my_project
inc...

Data Engineering
Databricks Asset Bundles
Nastasia
by New Contributor II
  • 7576 Views
  • 2 replies
  • 1 kudos

Why is Spark creating multiple jobs for one action?

I noticed that when launching this bunch of code with only one action, I have three jobs that are launched.
from pyspark.sql import DataFrame
from pyspark.sql.types import StructType, StructField, StringType
from pyspark.sql.functions import avg
data:...

Latest Reply
RKNutalapati
Valued Contributor
  • 1 kudos

The above code will create two jobs. JOB-1: dataframe: DataFrame = spark.createDataFrame(data=data, schema=schema). The createDataFrame function is responsible for inferring the schema from the provided data or using the specified schema. Depending on the...

1 More Replies
bathulaj
by New Contributor
  • 934 Views
  • 0 replies
  • 0 kudos

Nondeterministic UDFs making multiple invocations

Hi, even after defining my UDFs as nondeterministic like here: testUDF = udf(testMthod, StringType()).asNondeterministic(), they are still making multiple invocations. Is there anything that I am missing here? TIA, -John B

famous_jt33
by New Contributor
  • 2404 Views
  • 2 replies
  • 2 kudos

SQL UDFs for DLT pipelines

I am trying to implement a UDF for a DLT pipeline. I have seen the documentation stating that it is possible but I am getting an error after adding an SQL UDF to a cell in the notebook attached to the pipeline. The aim is to have the UDF in a separat...

Latest Reply
6502
New Contributor III
  • 2 kudos

You can't. SQL support on a DLT pipeline cluster is limited compared to a normal notebook. You can still define a UDF in Python using, of course, a Python notebook. In this case, you can use the spark.sql() function to execute your original SQL cod...

1 More Replies
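To make the reply above concrete, a hedged sketch: define the logic as a plain Python function in a Python notebook attached to the pipeline, register it as a UDF, and then call it from SQL via spark.sql(). The registration lines assume an active SparkSession and are shown as comments; mask_email is an invented example function, not from the thread:

```python
def mask_email(address: str) -> str:
    """Illustrative UDF body: hide the local part of an e-mail address."""
    local, sep, domain = address.partition("@")
    return ("***" + sep + domain) if sep else "***"

# In a Python notebook attached to the DLT pipeline (illustrative):
# from pyspark.sql.types import StringType
# spark.udf.register("mask_email", mask_email, StringType())
# spark.sql("SELECT mask_email(email) AS email FROM my_table")
```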
sher
by Valued Contributor II
  • 12503 Views
  • 3 replies
  • 2 kudos

How do we use Delta Sharing between Databricks and Snowflake?

Hi all, is there any way to implement Delta Sharing from Databricks to Snowflake with a direct connection?

Latest Reply
NateAnth
Databricks Employee
  • 2 kudos

I don't think that Snowflake has implemented the ability to read from a table via Delta Sharing as of December 2023. Please reach out to your Snowflake representatives and urge them to consider this feature from their side.  Alternatively, you can qu...

2 More Replies
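A hedged aside: until Snowflake can consume Delta Sharing directly, an open Delta Sharing recipient such as the delta-sharing Python client can read the shared table and land the data somewhere Snowflake can ingest it. The profile path and share coordinates below are invented, and the client call is shown as a comment since it needs a real share:

```python
def table_url(profile: str, share: str, schema: str, table: str) -> str:
    """Build the '<profile>#<share>.<schema>.<table>' locator that the
    delta-sharing client expects."""
    return f"{profile}#{share}.{schema}.{table}"

url = table_url("/dbfs/config.share", "my_share", "my_schema", "orders")
# import delta_sharing
# df = delta_sharing.load_as_pandas(url)  # then stage the data for Snowflake
```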
PrasSabb_97245
by New Contributor II
  • 5995 Views
  • 1 reply
  • 0 kudos

AWS S3 External Location Size in Unity Catalog

Hi, I am trying to get the raw size (total size) of a Delta table. I could get the table size from the DeltaTable API, but that gives only the latest version's size. I need to find the actual size the table takes on S3. Is there any way to find the S3 size ...

Latest Reply
PrasSabb_97245
New Contributor II
  • 0 kudos

Hi Kaniz, thank you for your suggestions. As per my understanding, "snapshot.sizeInBytes" gives only the current snapshot size, but I am looking for the total size (all versions) of the table on S3.

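One hedged approach (not from the thread): the total on-disk size across all table versions is simply the sum of object sizes under the table's S3 prefix, which can be obtained by listing the prefix with boto3. The bucket and prefix are placeholders and the boto3 calls are shown as comments; the summing logic is plain Python:

```python
def total_size_bytes(objects) -> int:
    """Sum the 'Size' field over an iterable of S3 object records,
    as returned in the 'Contents' of list_objects_v2 pages."""
    return sum(obj.get("Size", 0) for obj in objects)

# Illustrative boto3 listing (assumes credentials and a real bucket/prefix):
# import boto3
# s3 = boto3.client("s3")
# paginator = s3.get_paginator("list_objects_v2")
# pages = paginator.paginate(Bucket="my-bucket", Prefix="path/to/table/")
# size = sum(total_size_bytes(page.get("Contents", [])) for page in pages)
```

Note this counts every data file still present, including files only reachable from old versions (until VACUUM removes them), which matches the "all versions" total being asked for.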
erigaud
by Honored Contributor
  • 4797 Views
  • 3 replies
  • 0 kudos

The operation CHANGE DATA FEED is not allowed on Streaming Tables.

Hello everyone, I have a workflow that starts by reading the CDF data for a change data feed. The syntax is exactly the following: (spark.readStream.format("delta").option("readChangeFeed", "true").option("startingVersion", 10).table("my.str...

Latest Reply
afk
Databricks Partner
  • 0 kudos

Hi, this seems to be related to the issue I've been getting around the same time here: Change data feed from target tables of APPLY CHANG... - Databricks - 54436. Would be great to get an explanation for the sudden change in behaviour.

2 More Replies
Jules
by New Contributor
  • 1500 Views
  • 0 replies
  • 0 kudos

Access from DBT job to Azure DevOps repository using Service Principal

Hi, we are using Databricks bundles to deploy our DBT project. Everything is set up to deploy and run as a Service Principal. The DBT job is connected to an Azure DevOps repository. The problem is that we cannot find a way to properly authenticate the ...

Data Engineering
azure devops
bundles
dbt
harvey-c
by New Contributor III
  • 1827 Views
  • 0 replies
  • 0 kudos

Wrong FS: abfss://....., expected: dbfs:/ Error in DLT pipeline

Dear Databricks community members: Symptom: I received the error for a delta load, after a successful initial load with a Unity Catalog Volume as a data source. org.apache.spark.sql.streaming.StreamingQueryException: [STREAM_FAILED] Query [id = xxx, runId...

GijsM
by New Contributor
  • 4327 Views
  • 1 reply
  • 0 kudos

Thousands of ETL pipelines with long execution times and small dataset sizes

Hi, I work for a small company; we're mostly focusing on small retail and e-commerce customers. We provide data analysis and automated data connections between their platforms. Most of our datasets are things like order data, Google Ads click data, ...

Latest Reply
brockb
Databricks Employee
  • 0 kudos

Hi, thanks for the information. There is a lot to unpack, and some assumptions need to be made without fully understanding the details, so here are a few thoughts: if the cluster start times are longer because of the libraries you're installing, can ...

Phani1
by Databricks MVP
  • 3479 Views
  • 1 reply
  • 0 kudos

Query Delta table from .net

Hi team, how can we expose data stored in a Delta table through an API, like exposing SQL data through a .NET API?

Data Engineering
delta
dotnet
Latest Reply
BjarkeM
New Contributor III
  • 0 kudos

You can use the SQL Statement Execution API. At energinet.dk we have created this open-source .NET client, which we use internally in the company.

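A hedged sketch of what the reply describes: the SQL Statement Execution API is a plain HTTPS endpoint (POST /api/2.0/sql/statements on the workspace host), so any HTTP client, .NET included, can call it. The warehouse ID and query below are placeholders, and the actual request is shown as a comment:

```python
import json

def statement_payload(warehouse_id: str, statement: str) -> str:
    """Build the JSON body for POST /api/2.0/sql/statements."""
    return json.dumps({
        "warehouse_id": warehouse_id,
        "statement": statement,
        "wait_timeout": "30s",  # wait synchronously up to 30s for the result
    })

body = statement_payload("abc123", "SELECT * FROM my_catalog.my_schema.my_table LIMIT 10")
# Illustrative call (requests shown; a .NET HttpClient works the same way):
# requests.post(f"https://{host}/api/2.0/sql/statements",
#               headers={"Authorization": f"Bearer {token}"}, data=body)
```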
-werners-
by Esteemed Contributor III
  • 8470 Views
  • 2 replies
  • 3 kudos

Resolved! best way to store config files in a Unity workspace (Scala/typesafe)

We use Typesafe (Scala) to read configuration values from HOCON files. When not using Unity, we read the configuration files from /dbfs/... which works fine. However, with Unity, usage of DBFS is frowned upon, so I started looking into alternatives. And unfor...

Latest Reply
-werners-
Esteemed Contributor III
  • 3 kudos

In the end we will continue to use DBFS. Maybe in the future, when Volumes are supported by Scala IO, we can re-evaluate, but for now DBFS seems the way to go.

1 More Replies
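A hedged footnote to the thread: from Python, Unity Catalog Volumes are exposed as ordinary file paths (e.g. /Volumes/&lt;catalog&gt;/&lt;schema&gt;/&lt;volume&gt;/...), so config files can be read with standard file IO; the gap the thread hit is on the Scala/Typesafe side. The Volume path below is hypothetical, and the sketch simulates it with a temporary file so it runs anywhere:

```python
import os
import tempfile

conf_text = 'app { name = "demo" }\n'  # minimal HOCON-style content

# Stand-in for a path like /Volumes/main/config/files/app.conf (hypothetical):
with tempfile.NamedTemporaryFile("w", suffix=".conf", delete=False) as f:
    f.write(conf_text)
    path = f.name

with open(path) as f:   # plain file IO is all a Volume path needs
    loaded = f.read()
os.unlink(path)
```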
mudholkar
by New Contributor III
  • 3979 Views
  • 1 reply
  • 6 kudos

I am getting an SSLError: HTTPSConnectionPool while making a call to HTTPS REST APIs from Azure Databricks. I have tried to set a verify=False parameter in the call too.

response = requests.request("POST", url, verify=False, headers=headers, data=payload)   SSLError: HTTPSConnectionPool(host='dcs.adobedc.net', port=443): Max retries exceeded with url: /collection/d99e6dfcffb0b5aeaec2cf76cd3bc2b9e9c414b0c74a528d13dd39...

Latest Reply
JFG
New Contributor II
  • 6 kudos

Any luck with this?

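A hedged note since the thread was never resolved: verify=False only disables certificate validation, so a "Max retries exceeded" SSLError often means the TLS handshake itself is failing, commonly because a corporate proxy re-signs traffic with a CA the cluster does not trust. Pointing Python clients at that CA bundle is the usual fix. The bundle path below is illustrative, and the requests call is a comment:

```python
import os

def use_ca_bundle(path: str) -> dict:
    """Return environment overrides that make requests (and most Python
    HTTP clients) trust a custom CA bundle."""
    return {"REQUESTS_CA_BUNDLE": path, "SSL_CERT_FILE": path}

env = use_ca_bundle("/dbfs/certs/corp-ca.pem")  # hypothetical bundle path
os.environ.update(env)
# requests.post(url, headers=headers, data=payload)  # now verified against it
```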