Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Eren_DE
by New Contributor
  • 1439 Views
  • 2 replies
  • 0 kudos

Legacy Git integration has been removed from Notebook

How do I integrate a notebook saved in a workspace folder with a Git repo using a feature branch?

Latest Reply
FierceSkirtsist
New Contributor II
  • 0 kudos

Integrating a Databricks notebook with a Git repository using a feature branch sounds like a clean workflow for version control. The step-by-step process makes it straightforward to collaborate and track changes effectively. It's great that Databrick...

1 More Replies
Leigh_Turner
by New Contributor
  • 1251 Views
  • 1 reply
  • 0 kudos

DataFrame checkpoint when the checkpoint location is on abfss

I'm trying to switch checkpoint locations from DBFS to abfss and I have noticed the following behaviour. The spark.sparkContext.setCheckpointDir will fail unless I call... dbutils.fs.mkdirs(checkpoint_dir) in the same cell. On top of this, the df = df....

Latest Reply
Mounika_Tarigop
Databricks Employee
  • 0 kudos

In DBFS, the checkpoint directory is automatically created when you set it using spark.sparkContext.setCheckpointDir(checkpoint_dir). This means that you do not need to explicitly create the directory beforehand using dbutils.fs.mkdirs(checkpoint_dir...
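
A minimal sketch of the abfss pattern discussed above, assuming this runs in a Databricks notebook where spark and dbutils are available; the storage account, container, and path are placeholders.

# Placeholder abfss path; replace with your own container and storage account.
checkpoint_dir = "abfss://mycontainer@mystorageaccount.dfs.core.windows.net/checkpoints/demo"

# Unlike DBFS, the abfss directory may need to exist before setCheckpointDir will accept it,
# so create it explicitly first.
dbutils.fs.mkdirs(checkpoint_dir)
spark.sparkContext.setCheckpointDir(checkpoint_dir)

df = spark.range(1000)
df = df.checkpoint(eager=True)  # truncates lineage and materializes to the abfss location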

SagarJi
by New Contributor II
  • 989 Views
  • 2 replies
  • 0 kudos

Data skipping statistics column datatype constraint

Is there any column datatype constraint for the first 32 columns used for the stats that help data skipping?

Latest Reply
Mounika_Tarigop
Databricks Employee
  • 0 kudos

There are no specific column datatype constraints for the first 32 columns used for the statistics that help with data skipping in Databricks. However, data skipping is not supported for partition columns.
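
A minimal sketch of how the number of stats columns can be tuned via the delta.dataSkippingNumIndexedCols table property (which defaults to 32); the table and column names are placeholders.

# Collect data-skipping statistics only on the first N columns of the table.
spark.sql("""
    CREATE TABLE IF NOT EXISTS main.default.events (
        event_id BIGINT,
        event_ts TIMESTAMP,
        payload  STRING
    )
    USING DELTA
    TBLPROPERTIES ('delta.dataSkippingNumIndexedCols' = '3')
""")
# Wide string/blob-like columns gain little from stats; placing them after the indexed-column
# boundary (or lowering the property) avoids collecting stats on them.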

1 More Replies
pinaki1
by New Contributor III
  • 983 Views
  • 2 replies
  • 0 kudos

Databricks Delta Sharing

Does caching work in Delta Sharing in Databricks?

Latest Reply
Mounika_Tarigop
Databricks Employee
  • 0 kudos

Delta Sharing in Databricks does support caching, specifically through the use of the Delta cache. The Delta cache will be enabled by default on the recipient cluster. The Delta cache helps improve the performance of data reads by storing a copy of the data on th...
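
A minimal sketch of checking the disk (Delta) cache on the recipient cluster; the shared catalog and table names are placeholders.

# The disk cache is controlled by this cluster setting; on many instance types it is on by default.
spark.conf.set("spark.databricks.io.cache.enabled", "true")

# Reading a table mounted via Delta Sharing then populates the local SSD cache on the workers.
df = spark.table("shared_catalog.shared_schema.shared_table")  # placeholder shared table
df.count()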

1 More Replies
6502
by New Contributor III
  • 1198 Views
  • 1 reply
  • 0 kudos

UDF already defined error when using it in a DLT pipeline

I'm using Unity Catalog and defined some UDFs in my catalog.database, as reported by SHOW FUNCTIONS IN main.default: main.default.getgender, main.default.tointlist, main.default.tostrlist. I can use them from a SQL warehouse (pro): SELECT main.default.get_...

Latest Reply
Mounika_Tarigop
Databricks Employee
  • 0 kudos

Python UDF support is currently in public preview, and it is supported in the current channel as well. Could you please try to re-run the code? https://docs.databricks.com/en/delta-live-tables/unity-catalog.html
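
A minimal sketch of calling one of the catalog functions from a DLT table definition, assuming the pipeline is attached to Unity Catalog on a supported channel; the source table and the name column are placeholders.

import dlt
from pyspark.sql import functions as F

@dlt.table(name="customers_enriched")
def customers_enriched():
    # main.default.getgender is the UC function from the post; source_customers and name are assumed.
    return (
        spark.table("main.default.source_customers")
             .withColumn("gender", F.expr("main.default.getgender(name)"))
    )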

elgeo
by Valued Contributor II
  • 2359 Views
  • 1 reply
  • 0 kudos

Interactive Notebook with widgets

Dear experts, we need to create a notebook so that users can insert/update values in a specific table. We have created one using widgets. However, the code performed per selected action is visible to the users. Is there a way to have in the dropd...

Latest Reply
Mounika_Tarigop
Databricks Employee
  • 0 kudos

You can separate the user interface (UI) from the code logic by using two notebooks. 1) The first notebook will contain the code that performs the actions based on the widget inputs. 2) The second notebook will contain the widgets that users interact...
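
A minimal sketch of that two-notebook pattern; the worker notebook path and widget names are placeholders.

# --- UI notebook: only widgets, no business logic visible to users ---
dbutils.widgets.dropdown("action", "insert", ["insert", "update"])
dbutils.widgets.text("value", "")

result = dbutils.notebook.run(
    "/Workspace/Shared/table_actions_worker",  # placeholder path to the hidden logic notebook
    300,                                       # timeout in seconds
    {
        "action": dbutils.widgets.get("action"),
        "value": dbutils.widgets.get("value"),
    },
)
print(result)

# --- worker notebook (separate file, with restricted view permissions) ---
# action = dbutils.widgets.get("action")
# value  = dbutils.widgets.get("value")
# ...perform the INSERT/UPDATE against the target table here...
# dbutils.notebook.exit("done")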

nggianno
by New Contributor III
  • 5717 Views
  • 3 replies
  • 10 kudos

How can I activate Enzyme for Delta Live Tables (or DLT serverless)?

Hi! I am using Delta Live Tables, especially materialized views, and I want to run a DLT pipeline without rerunning the whole view, which costs time, but rather only update and add the values that have changed. I saw that Enzyme does this job...

Latest Reply
Mounika_Tarigop
Databricks Employee
  • 10 kudos

To achieve "PARTITION OVERWRITE" instead of "COMPLETE RECOMPUTE" when running a Delta Live Table (DLT) materialized view, you need to configure your pipeline to use incremental refresh. This can be done by setting up your DLT pipeline to use serverle...

2 More Replies
zero234
by New Contributor III
  • 1982 Views
  • 1 reply
  • 0 kudos

Data is not loaded when creating two different streaming tables from one Delta Live Tables pipeline

I am trying to create 2 streaming tables in one DLT pipeline; both read JSON data from different locations and both have different schemas. The pipeline executes but no data is inserted into either table, whereas when I try to run each table indiv...

Data Engineering
dlt
spark
STREAMINGTABLE
Latest Reply
Mounika_Tarigop
Databricks Employee
  • 0 kudos

Delta Live Tables (DLT) can indeed process multiple streaming tables within a single pipeline. Here are a few things to check: 1) Verify that each streaming table has a unique checkpoint location. Checkpointing is crucial for maintaining the state of s...
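
A minimal sketch of two independent streaming tables in one pipeline; the source paths and schema locations are placeholders.

import dlt

@dlt.table(name="orders_bronze")
def orders_bronze():
    # Auto Loader stream for the first JSON location.
    return (
        spark.readStream.format("cloudFiles")
             .option("cloudFiles.format", "json")
             .option("cloudFiles.schemaLocation", "abfss://landing@acct.dfs.core.windows.net/_schemas/orders")
             .load("abfss://landing@acct.dfs.core.windows.net/orders/")
    )

@dlt.table(name="customers_bronze")
def customers_bronze():
    # Second, independent Auto Loader stream with its own schema location.
    return (
        spark.readStream.format("cloudFiles")
             .option("cloudFiles.format", "json")
             .option("cloudFiles.schemaLocation", "abfss://landing@acct.dfs.core.windows.net/_schemas/customers")
             .load("abfss://landing@acct.dfs.core.windows.net/customers/")
    )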

Jyo777
by Contributor
  • 6242 Views
  • 7 replies
  • 4 kudos

Need help with Azure Databricks questions on CTE and SQL syntax within notebooks

Hi amazing community folks, feel free to share your experience or knowledge regarding the questions below: 1) Can we pass a CTE SQL statement into Spark JDBC? I tried to do it and couldn't, but I can pass normal SQL (SELECT * FROM) and it works. I heard th...

Latest Reply
Rjdudley
Honored Contributor
  • 4 kudos

Not a comparison, but there is a DB-SQL cheatsheet at https://www.databricks.com/sites/default/files/2023-09/databricks-sql-cheatsheet.pdf/
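
On question 1 (CTEs over JDBC): the JDBC source wraps the pushed-down statement in a subquery, which many databases reject when it contains a WITH clause. A minimal sketch of the usual workaround, rewriting the CTE as an inline subquery and passing it through the query option; the URL, table, and secret names are placeholders.

# Placeholder connection details; credentials come from a secret scope.
jdbc_url = "jdbc:sqlserver://myserver.database.windows.net:1433;database=mydb"

# The same logic as a CTE, rewritten as an inline subquery so the wrapper
# the JDBC source adds around it stays valid SQL on the remote database.
pushdown_sql = """
    SELECT o.order_id, o.amount
    FROM (SELECT order_id, amount FROM dbo.orders WHERE amount > 100) AS o
"""

df = (
    spark.read.format("jdbc")
        .option("url", jdbc_url)
        .option("query", pushdown_sql)
        .option("user", dbutils.secrets.get("my_scope", "db_user"))
        .option("password", dbutils.secrets.get("my_scope", "db_password"))
        .load()
)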

6 More Replies
hukel
by Contributor
  • 424 Views
  • 1 reply
  • 0 kudos

Python function using Splunk SDK works in Python notebook but not in SQL notebook

Background: I've created a small function in a notebook that uses Splunk's splunk-sdk package. The original intention was to call Splunk to execute a search/query, but for the sake of simplicity while testing this issue, the function only prints pr...

Latest Reply
Mounika_Tarigop
Databricks Employee
  • 0 kudos

When you run a Python function in a Python cell, it executes in the local Python environment of the notebook. However, when you call a Python function from a SQL cell, it runs as a UDF within the Spark execution environment.  You need to define the f...
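
A minimal sketch of registering the function so a SQL cell can see it; splunk_search is a placeholder wrapper, and the splunk-sdk package must also be installed where the UDF executes (for example as a cluster library rather than a notebook-scoped install).

from pyspark.sql.types import StringType

def splunk_search(query: str) -> str:
    # Placeholder body: the splunk-sdk call would go here.
    # The return value must be serializable so Spark can hand it back to SQL.
    return f"would run: {query}"

# Register the Python function as a Spark SQL UDF so SQL cells can call it by name.
spark.udf.register("splunk_search", splunk_search, StringType())

# In a SQL cell:
#   SELECT splunk_search('search index=main | head 1');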

Miguel_Salas
by New Contributor II
  • 640 Views
  • 1 reply
  • 1 kudos

Last file in S3 folder using autoloader

Nowadays we already use Auto Loader with a checkpoint location, but I still wanted to know if it is possible to read only the last updated file within a folder. I know it somewhat defeats the purpose of the checkpoint location. Another question: is it possibl...

Latest Reply
cgrant
Databricks Employee
  • 1 kudos

Auto Loader's scope is limited to incrementally loading files from storage, and there is no such functionality to just load the latest file from a group of files; you'd likely want to have this kind of "last updated" logic in a different layer or in ...
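
If a one-off read of the newest file is really all that's needed, a minimal sketch outside of Auto Loader; the folder path is a placeholder and FileInfo.modificationTime assumes a reasonably recent Databricks Runtime.

src = "s3://my-bucket/landing/orders/"  # placeholder folder

# List the folder and keep only the most recently modified file, then read just that one.
latest = max(dbutils.fs.ls(src), key=lambda f: f.modificationTime)
df = spark.read.json(latest.path)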

smit_tw
by New Contributor III
  • 434 Views
  • 2 replies
  • 1 kudos

APPLY AS DELETE without operation

We are performing a full load of API data into the Bronze table (append only), and then using the APPLY CHANGES INTO query to move data from Bronze to Silver using Stream to get only new records. How can we also utilize the APPLY AS DELETE functional...

Latest Reply
smit_tw
New Contributor III
  • 1 kudos

@szymon_dybczak is it possible to do this directly in the Silver layer? We do not have the option to go for a Gold layer. I tried to create a TEMP VIEW in the Silver DLT pipeline but it gives an error for circular dependency as I am comparing data from Silver itself and a...

1 More Replies
Gilg
by Contributor II
  • 2668 Views
  • 4 replies
  • 1 kudos

Multiple Auto Loader streams reading the same directory path

Hi, originally I only have 1 pipeline looking at a directory. Now as a test, I cloned the existing pipeline and edited the settings to a different catalog. Now both pipelines are basically reading the same directory path and running in continuous mode. Que...

Latest Reply
cgrant
Databricks Employee
  • 1 kudos

To answer the original question, autoloader does not use locks when reading files. You are however limited by the underlying storage system, ADLS in this example. Going by what has been mentioned (long batch times, but spark jobs finish really fast) ...

3 More Replies
kmorton
by New Contributor
  • 1880 Views
  • 1 reply
  • 0 kudos

Autoloader start and end date for ingestion

I have been searching for a way to set up backfilling using autoloader with an option to set a "start_date" or "end_date". I am working on ingesting a massive file system but I don't want to ingest everything from the beginning. I have a start date t...

Data Engineering
autoloader
backfill
ETL
ingestion
Latest Reply
cgrant
Databricks Employee
  • 0 kudos

If the files have already been loaded by autoloader (like same name and path), this can be tricky. I recommend starting a separate autoloader stream and specifying filters on it to match your start and end dates. If you'd instead like to rely on the ...
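
A minimal sketch of that separate backfill stream, assuming the modifiedAfter / modifiedBefore file-source options are available in your runtime; the paths, timestamps, and table names are placeholders.

df = (
    spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .option("cloudFiles.schemaLocation", "/Volumes/main/default/_schemas/backfill")
        .option("modifiedAfter", "2024-01-01T00:00:00")    # start of the backfill window
        .option("modifiedBefore", "2024-02-01T00:00:00")   # end of the backfill window
        .load("/Volumes/main/default/landing/")
)

(df.writeStream
   .option("checkpointLocation", "/Volumes/main/default/_checkpoints/backfill")
   .trigger(availableNow=True)   # process the window once, then stop
   .toTable("main.default.backfill_bronze"))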

gupta_tanmay
by New Contributor II
  • 545 Views
  • 1 reply
  • 0 kudos

How to Connect Pyspark to Unity Catalog on Kubernetes with Data Stored in MinIO?

https://stackoverflow.com/questions/79177219/how-to-connect-spark-to-unity-catalog-on-kubernetes-with-data-stored-in-minio? I have posted the question on Stack Overflow. I am trying to register a catalog using PySpark.

Data Engineering
unity_catalog
Latest Reply
VZLA
Databricks Employee
  • 0 kudos

Thanks for your question! Although it shouldn't be necessary, could you please try the following: Set the spark.databricks.sql.initial.catalog.name configuration to my_catalog in your Spark session to ensure the correct catalog is initialized. Use ...
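
A minimal sketch of the session setup this suggestion assumes, using the open-source Unity Catalog Spark connector and a MinIO endpoint; the catalog name, service URIs, and credentials are all placeholders.

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("uc-minio")
    # Register the Unity Catalog connector under the catalog name used in queries.
    .config("spark.sql.catalog.my_catalog", "io.unitycatalog.spark.UCSingleCatalog")
    .config("spark.sql.catalog.my_catalog.uri", "http://unity-catalog.default.svc:8080")
    .config("spark.sql.catalog.my_catalog.token", "<token>")
    # Make my_catalog the initial catalog, as suggested above.
    .config("spark.databricks.sql.initial.catalog.name", "my_catalog")
    # MinIO (S3-compatible) access for the underlying data files.
    .config("spark.hadoop.fs.s3a.endpoint", "http://minio.default.svc:9000")
    .config("spark.hadoop.fs.s3a.path.style.access", "true")
    .getOrCreate()
)

spark.sql("SHOW SCHEMAS IN my_catalog").show()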

