Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Jyo777
by Contributor
  • 8457 Views
  • 7 replies
  • 4 kudos

Need help with Azure Databricks questions on CTE and SQL syntax within notebooks

Hi amazing community folks, feel free to share your experience or knowledge regarding the questions below: 1.) Can we pass a CTE SQL statement into Spark JDBC? I tried to do it and couldn't, but I can pass normal SQL (SELECT * FROM) and it works. I heard th...
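A minimal PySpark sketch of one common workaround: the JDBC source's query option wraps the statement in a subquery, which is why a raw WITH ... clause often fails, so the CTE can be rewritten as an equivalent subquery first. The connection details, table, and column names below are placeholders.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Original CTE:
#   WITH recent AS (SELECT * FROM sales WHERE sale_date >= '2024-01-01')
#   SELECT customer_id, SUM(amount) AS total FROM recent GROUP BY customer_id
# Rewritten as a plain subquery so the JDBC source can wrap it safely:
pushdown_query = """
    SELECT customer_id, SUM(amount) AS total
    FROM (SELECT * FROM sales WHERE sale_date >= '2024-01-01') recent
    GROUP BY customer_id
"""

df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:sqlserver://<host>:1433;databaseName=<db>")  # placeholder URL
    .option("query", pushdown_query)  # pushed down to the source database
    .option("user", "<user>")
    .option("password", "<password>")
    .load()
)
df.show()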

Latest Reply
Rjdudley
Honored Contributor
  • 4 kudos

Not a comparison, but there is a DB-SQL cheatsheet at https://www.databricks.com/sites/default/files/2023-09/databricks-sql-cheatsheet.pdf/

6 More Replies
hukel
by Contributor
  • 874 Views
  • 1 reply
  • 0 kudos

Python function using Splunk SDK works in Python notebook but not in SQL notebook

Background: I've created a small function in a notebook that uses Splunk's splunk-sdk package. The original intention was to call Splunk to execute a search/query, but for the sake of simplicity while testing this issue, the function only prints pr...

Latest Reply
Mounika_Tarigop
Databricks Employee
  • 0 kudos

When you run a Python function in a Python cell, it executes in the local Python environment of the notebook. However, when you call a Python function from a SQL cell, it runs as a UDF within the Spark execution environment.  You need to define the f...
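A minimal sketch of the pattern described above, with placeholder names: register the Python function as a Spark UDF so a SQL cell can call it by name. Note that a UDF runs on the executors, not the driver, so any package it imports (such as splunk-sdk) must be installed cluster-wide, not just notebook-scoped.

from pyspark.sql.types import StringType

def greet(name: str) -> str:
    # Stand-in for the Splunk SDK call; just returns a string for testing.
    return f"hello {name}"

# Register the function so SQL cells can reference it by name.
spark.udf.register("greet_udf", greet, StringType())

# Then, in a SQL cell:
#   SELECT greet_udf('world');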

Miguel_Salas
by New Contributor II
  • 1042 Views
  • 1 reply
  • 1 kudos

Last file in S3 folder using autoloader

Nowadays we already use Auto Loader with a checkpoint location, but I still wanted to know if it is possible to read only the last updated file within a folder. I know it somewhat loses the purpose of the checkpoint location. Another question: is it possibl...

Latest Reply
cgrant
Databricks Employee
  • 1 kudos

Auto loader's scope is limited to incrementally loading files from storage, and there is no such functionality to just load the latest file from a group of files, you'd likely want to have this kind of "last updated" logic in a different layer or in ...
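One way to keep that "last updated" logic in a different layer, as suggested above, is to ingest everything with Auto Loader and pick the newest file per micro-batch using the file metadata column. A rough sketch, assuming a JSON source and placeholder paths and table names:

from pyspark.sql import functions as F

stream = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "s3://bucket/_schemas/")  # placeholder
    .load("s3://bucket/landing/")                                  # placeholder
    .select("*", "_metadata.file_path", "_metadata.file_modification_time")
)

def keep_latest(batch_df, batch_id):
    # Within each micro-batch, keep only rows from the most recently modified file.
    latest = batch_df.agg(F.max("file_modification_time")).collect()[0][0]
    batch_df.filter(F.col("file_modification_time") == latest) \
        .write.mode("append").saveAsTable("bronze.latest_only")   # placeholder table

query = (
    stream.writeStream
    .option("checkpointLocation", "s3://bucket/_checkpoints/latest_only/")  # placeholder
    .foreachBatch(keep_latest)
    .start()
)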

smit_tw
by New Contributor III
  • 1020 Views
  • 2 replies
  • 1 kudos

APPLY AS DELETE without operation

We are performing a full load of API data into the Bronze table (append only), and then using the APPLY CHANGES INTO query to move data from Bronze to Silver using Stream to get only new records. How can we also utilize the APPLY AS DELETE functional...
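For reference, the Python form of APPLY CHANGES INTO exposes delete handling through apply_as_deletes; a rough sketch with placeholder table, key, and sequence columns (it still assumes some column in Bronze that marks a record as deleted):

import dlt
from pyspark.sql.functions import expr

dlt.create_streaming_table("silver_customers")    # placeholder target

dlt.apply_changes(
    target="silver_customers",
    source="bronze_customers",                     # placeholder append-only Bronze table
    keys=["customer_id"],                          # placeholder business key
    sequence_by="ingest_ts",                       # placeholder ordering column
    apply_as_deletes=expr("op_type = 'DELETE'"),   # rows matching this are applied as deletes
)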

Latest Reply
smit_tw
New Contributor III
  • 1 kudos

@szymon_dybczak is it possible to do this directly in the Silver layer? We do not have the option to go to a Gold layer. I tried to create a TEMP VIEW in the Silver DLT pipeline but it gives a circular dependency error, as I am comparing data from Silver itself and a...

1 More Replies
Gilg
by Contributor II
  • 3563 Views
  • 4 replies
  • 1 kudos

Multiple Autoloader reading the same directory path

Hi, originally I only had 1 pipeline looking at a directory. Now, as a test, I cloned the existing pipeline and edited the settings to point at a different catalog. Now both pipelines are basically reading the same directory path and running in continuous mode. Que...

Latest Reply
cgrant
Databricks Employee
  • 1 kudos

To answer the original question, autoloader does not use locks when reading files. You are however limited by the underlying storage system, ADLS in this example. Going by what has been mentioned (long batch times, but spark jobs finish really fast) ...

3 More Replies
kmorton
by New Contributor
  • 2437 Views
  • 1 reply
  • 0 kudos

Autoloader start and end date for ingestion

I have been searching for a way to set up backfilling using autoloader with an option to set a "start_date" or "end_date". I am working on ingesting a massive file system but I don't want to ingest everything from the beginning. I have a start date t...

Data Engineering
autoloader
backfill
ETL
ingestion
Latest Reply
cgrant
Databricks Employee
  • 0 kudos

If the files have already been loaded by autoloader (like same name and path), this can be tricky. I recommend starting a separate autoloader stream and specifying filters on it to match your start and end dates. If you'd instead like to rely on the ...
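A rough sketch of the separate, filtered stream described above, assuming the modifiedAfter / modifiedBefore Auto Loader options and placeholder paths, formats, and table names:

backfill = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "parquet")                                       # placeholder format
    .option("cloudFiles.schemaLocation", "/Volumes/cat/sch/_schemas/backfill/")   # placeholder
    .option("modifiedAfter", "2024-01-01 00:00:00.000000 UTC+0")   # start date (check docs for exact timestamp format)
    .option("modifiedBefore", "2024-06-30 00:00:00.000000 UTC+0")  # end date
    .load("abfss://container@account.dfs.core.windows.net/raw/")                  # placeholder path
)

(backfill.writeStream
    .option("checkpointLocation", "/Volumes/cat/sch/_checkpoints/backfill/")      # separate checkpoint
    .trigger(availableNow=True)    # run as a one-off backfill
    .toTable("bronze.backfill"))   # placeholder table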

gupta_tanmay
by New Contributor II
  • 1040 Views
  • 1 reply
  • 0 kudos

How to Connect Pyspark to Unity Catalog on Kubernetes with Data Stored in MinIO?

https://stackoverflow.com/questions/79177219/how-to-connect-spark-to-unity-catalog-on-kubernetes-with-data-stored-in-minio? I have posted the question on Stack Overflow. I am trying to register a catalog using PySpark.

Data Engineering
unity_catalog
Latest Reply
VZLA
Databricks Employee
  • 0 kudos

Thanks for your question! Although it shouldn't be necessary, could you please try the following: Set the spark.databricks.sql.initial.catalog.name configuration to my_catalog in your Spark session to ensure the correct catalog is initialized. Use ...
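A rough sketch of what that Spark session configuration could look like, combining the setting mentioned above with the usual S3A options a MinIO endpoint needs; the endpoint, credentials, and catalog name are placeholders, and the exact Unity Catalog connector settings depend on your deployment:

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("uc-minio-test")
    # Setting suggested in this thread so the session starts in the right catalog.
    .config("spark.databricks.sql.initial.catalog.name", "my_catalog")
    # Standard Hadoop S3A settings for a MinIO endpoint (placeholders).
    .config("spark.hadoop.fs.s3a.endpoint", "http://minio.default.svc:9000")
    .config("spark.hadoop.fs.s3a.access.key", "<access-key>")
    .config("spark.hadoop.fs.s3a.secret.key", "<secret-key>")
    .config("spark.hadoop.fs.s3a.path.style.access", "true")
    .getOrCreate()
)

spark.sql("SHOW CATALOGS").show()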

sunilp
by New Contributor
  • 1128 Views
  • 1 reply
  • 0 kudos

Generate and Deploy Wheel file to the Databricks Cluster from VS Code

I have a scenario where I need to generate a versioned wheel file and upload it automatically to a Databricks cluster using VS Code, without creating workflows or jobs like those in Databricks Bundles. My use case is to later use the wheel file as an ...

Latest Reply
VZLA
Databricks Employee
  • 0 kudos

Thanks for your question! To automate generating and deploying a versioned Wheel file to a Databricks cluster directly from VS Code, you can try and follow these steps: Use VS Code Tasks[1]: Configure a tasks.json file in your project’s .vscode dire...
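As a rough illustration of what a script launched by such a VS Code task could do, here is a sketch that assumes the databricks-sdk Python package (WorkspaceClient with files.upload to a Unity Catalog volume and libraries.install) and the 'build' package; cluster ID, volume, and paths are placeholders:

import subprocess, glob
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.compute import Library

# 1. Build the versioned wheel locally (assumes a standard pyproject/setup and the 'build' package).
subprocess.run(["python", "-m", "build", "--wheel"], check=True)
wheel = sorted(glob.glob("dist/*.whl"))[-1]

# 2. Upload the wheel to a UC volume and install it on the target cluster.
w = WorkspaceClient()   # reads host/token from your Databricks config or environment
volume_path = f"/Volumes/main/default/wheels/{wheel.split('/')[-1]}"   # placeholder volume
with open(wheel, "rb") as f:
    w.files.upload(volume_path, f, overwrite=True)

w.libraries.install(
    cluster_id="0123-456789-abcdefgh",             # placeholder cluster id
    libraries=[Library(whl=volume_path)],
)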

Erik
by Valued Contributor III
  • 7048 Views
  • 1 reply
  • 0 kudos

Use unity catalog access connector for autoloader file notification events

We have a Databricks access connector, and we have granted it access to file events. But how do we now use that access connector in cloudFiles/Auto Loader with file notifications? If I provide the ID in the "cloudFiles.clientId" option, I am asked to...

Latest Reply
VZLA
Databricks Employee
  • 0 kudos

Thanks for your question!  If the access connector is still prompting for a secret or certificate when used in cloudFiles.clientId, this typically indicates that the authentication method is not being properly recognized. Here's what to check: Access...
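For context, a sketch of the classic Azure file-notification setup these options belong to; as noted above, the clientId path normally expects a matching secret for a service principal, so all values below are placeholders:

df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.useNotifications", "true")
    .option("cloudFiles.clientId", "<application-id>")       # service principal / connector id
    .option("cloudFiles.clientSecret", "<client-secret>")    # expected alongside clientId
    .option("cloudFiles.tenantId", "<tenant-id>")
    .option("cloudFiles.subscriptionId", "<subscription-id>")
    .option("cloudFiles.resourceGroup", "<resource-group>")
    .option("cloudFiles.schemaLocation", "abfss://container@account.dfs.core.windows.net/_schemas/")
    .load("abfss://container@account.dfs.core.windows.net/landing/")
)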

vbajaj1
by New Contributor II
  • 806 Views
  • 1 reply
  • 1 kudos

Resolved! Integrating Databricks Table with Web Page

Hi guys, we need to integrate a Databricks table with a web page, where I want to read the Databricks table, show it in a grid on the web page, and also give the ability to update this table from the web page. The Databricks table is in Unity Catalog. Has anyone trie...

Latest Reply
VZLA
Databricks Employee
  • 1 kudos

Thank you for your question! To integrate a Databricks table with a web page, you can follow these basic steps: Read the Table: Use a Databricks SQL endpoint or JDBC/ODBC driver to query the table from your web application backend. For example, if y...
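A minimal sketch of the read side from a Python backend, assuming the databricks-sql-connector package and a SQL warehouse; hostname, HTTP path, token, and table name are placeholders:

from databricks import sql

with sql.connect(
    server_hostname="adb-1234567890123456.7.azuredatabricks.net",   # placeholder
    http_path="/sql/1.0/warehouses/abcdef1234567890",               # placeholder warehouse
    access_token="<personal-access-token>",
) as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT * FROM main.default.orders LIMIT 100")  # placeholder table
        rows = cur.fetchall()

# Serve `rows` to the grid in your web framework; updates can go back
# through parameterized INSERT/UPDATE statements on the same connection.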

MattHeidebrecht
by New Contributor II
  • 3156 Views
  • 3 replies
  • 1 kudos

Resolved! Translations from T-SQL: TOP 1 OUTER APPLY or LEFT JOIN

Hi All, I am wondering how you would go about translating either of the below to Spark SQL in Databricks. They are more or less equivalent statements in T-SQL. Please note that I am attempting to pair each unique Policy (IPI_ID) record with its highes...
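The usual Spark translation of an embedded TOP 1 per group is a row_number window followed by a left join; a rough PySpark sketch with placeholder table and column names, not the exact accepted answer from this thread:

from pyspark.sql import functions as F, Window

policies = spark.table("policies")           # placeholder, keyed by IPI_ID
transactions = spark.table("transactions")   # placeholder detail table

w = Window.partitionBy("IPI_ID").orderBy(F.col("effective_date").desc())

top1 = (
    transactions
    .withColumn("rn", F.row_number().over(w))
    .filter("rn = 1")    # keeps only the highest-ranked row per policy
    .drop("rn")
)

# LEFT JOIN keeps policies with no matching detail row, like OUTER APPLY.
result = policies.join(top1, on="IPI_ID", how="left")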

Latest Reply
MattHeidebrecht
New Contributor II
  • 1 kudos

Thanks filipniziol!  I'll start running with that when I run into cases where I need an embedded TOP 1.

2 More Replies
smit_tw
by New Contributor III
  • 1344 Views
  • 3 replies
  • 2 kudos

Resolved! Creating a Databricks Asset Bundle with Sequential Pipelines and Workflow using YAML

Is it possible to create a repository with a Databricks asset bundle that includes the following pipelines?
  • Test1 (Delta Live Table pipeline)
  • Test2 (Delta Live Table pipeline)
  • Test3 (Delta Live Table pipeline)
  • Workflow Job
Workflow to execute the above pi...

Latest Reply
filipniziol
Esteemed Contributor
  • 2 kudos

Hi @smit_tw, great! If this resolves your question, please consider marking it as the solution. It helps others in the community find answers more easily.

2 More Replies
satyasamal
by New Contributor II
  • 2100 Views
  • 1 reply
  • 1 kudos

Resolved! org.apache.spark.SparkException: [TASK_WRITE_FAILED] Task failed while writing rows

Hello All, my DataFrame has 1 million records and it contains XML as a column value. I am trying to parse the XML using the XPath function. It works fine for a small record count, but it fails while trying to run 1 million records. Error Message: ...

Latest Reply
VZLA
Databricks Employee
  • 1 kudos

Thank you for your question. The error is likely caused by memory issues or inefficient processing of the large dataset. Parsing XML with XPath is resource-intensive, and handling 1 million records requires optimization. You can try df = df.repartiti...
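A rough sketch of the repartition-plus-XPath approach the reply starts to describe, assuming an xml_payload string column and placeholder table names and paths; Spark's built-in xpath_string function does the extraction:

from pyspark.sql import functions as F

df = spark.table("bronze.xml_events")   # placeholder source with ~1M rows

parsed = (
    df.repartition(200)                 # spread the XPath work across more, smaller tasks
      .withColumn("order_id", F.expr("xpath_string(xml_payload, '/order/id/text()')"))
      .withColumn("amount",   F.expr("xpath_string(xml_payload, '/order/amount/text()')"))
)

parsed.write.mode("overwrite").saveAsTable("silver.orders_parsed")   # placeholder target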

fzlrfk
by New Contributor II
  • 1206 Views
  • 2 replies
  • 0 kudos

Databricks internal error

I am new to Databricks. I am trying to debug an existing notebook which fails intermittently (a few times a month). Once it is re-run, it runs fine. Any help will be appreciated. I have attached sample code. Environment: Databricks Runtime Version: 14....

Latest Reply
fzlrfk
New Contributor II
  • 0 kudos

Hi, thanks for your response. As I said, the assistant gives a recommendation on changing the code. Would changing the code help? When will a fix be released? Error: NoSuchElementException: None.get at scala.None$.get(Option.scala:527). Assistant fix: The e...

1 More Replies
ashraf1395
by Honored Contributor
  • 670 Views
  • 2 replies
  • 0 kudos

Issue while migrating from Hive metastore to UC with UCX

Tables: 4 tables in the schema databricks_log are not being migrated. The error shows they are in the DBFS root location and cannot be managed, while they are actually in a DBFS mount location. For example, this model_notebook_logs: its location is dbfs:/mnt but in...

Latest Reply
MuthuLakshmi
Databricks Employee
  • 0 kudos

@ashraf1395 To migrate managed tables stored at DBFS root to UC, you can do it through Deep Clone or Create Table As Select (CTAS). This also means that the HMS table data needs to be moved to the cloud storage location governed by UC. Please ensure ...
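A rough sketch of the two options mentioned, run from a notebook; the UC catalog name is a placeholder and the schema/table names follow the example in the question:

# Option 1: Deep Clone the HMS managed table into a UC managed table.
spark.sql("""
    CREATE OR REPLACE TABLE my_uc_catalog.databricks_log.model_notebook_logs
    DEEP CLONE hive_metastore.databricks_log.model_notebook_logs
""")

# Option 2: CTAS, which rewrites the data into UC-governed cloud storage.
spark.sql("""
    CREATE OR REPLACE TABLE my_uc_catalog.databricks_log.model_notebook_logs
    AS SELECT * FROM hive_metastore.databricks_log.model_notebook_logs
""")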

1 More Replies
