Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

brickster_2018
by Databricks Employee
  • 21217 Views
  • 3 replies
  • 1 kudos
Latest Reply
Hugh_Ku
New Contributor II
  • 1 kudos

I've also run into the same issue: a customised Docker image does not expose DATABRICKS_RUNTIME_VERSION as an environment variable. I believe there are still many issues in how customised Docker images are used in Databricks clusters. Can anyone from Databricks help answer this?
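Until this is fixed, reading the variable defensively avoids hard failures on custom images. A minimal sketch (the fallback value is an assumption, pick whatever sentinel suits your code):

```python
import os

def runtime_version(default: str = "unknown") -> str:
    # Standard Databricks runtimes set DATABRICKS_RUNTIME_VERSION, but
    # custom Docker images may not, so fall back to a default instead
    # of raising a KeyError.
    return os.environ.get("DATABRICKS_RUNTIME_VERSION", default)
```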

2 More Replies
varshini_reddy
by New Contributor III
  • 2958 Views
  • 6 replies
  • 0 kudos

Databricks UC enabled but Lineage not found for one table

Databricks UC is enabled but lineage is not found for one table, whereas I can see the lineage for the other two. Any idea why? I'm performing a few transformations on bronze data, taking good_data_transformed as a DataFrame, creating a temp view fo...

Latest Reply
filipniziol
Esteemed Contributor
  • 0 kudos

It is because of the temp view. To debug further you would need to list all the source tables, transformations, target tables, actual lineage and expected lineage, but as a rule of thumb the lineage is lost when using a temp view. Lineage is captu...
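As an illustration of that rule of thumb, materializing the intermediate result as a real table instead of a temp view gives Unity Catalog a node to attach lineage to. A sketch (the helper name and writer options are placeholders):

```python
def materialize_for_lineage(df, target_table: str) -> str:
    # Temp views are session-scoped, so UC records no lineage through them.
    # Writing the intermediate DataFrame to a managed table keeps the
    # bronze -> staging -> silver chain visible in the lineage graph.
    df.write.mode("overwrite").saveAsTable(target_table)
    return target_table
```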

5 More Replies
karolinalbinsso
by New Contributor II
  • 4455 Views
  • 2 replies
  • 3 kudos

Resolved! How to access the job-Scheduling Date from within the notebook?

I have created a job that contains a notebook that reads a file from Azure Storage. The file name contains the date when the file was transferred to the storage. A new file arrives every Monday, and the read job is scheduled to run every Monday. I...

Latest Reply
Hubert-Dudek
Databricks MVP
  • 3 kudos

Hi, I guess the files are in the same directory structure, so you can use the cloudFiles Auto Loader. It will incrementally read only new files: https://docs.microsoft.com/en-us/azure/databricks/spark/latest/structured-streaming/auto-loader So it will ...
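A sketch of that approach: Auto Loader picks up only unseen files, and the file date can be parsed from the path exposed in the _metadata column (the export_YYYY-MM-DD.csv naming pattern is an assumption based on the question):

```python
import re
from datetime import date

def date_from_filename(path: str):
    # Hypothetical naming scheme: files like ".../export_2024-09-02.csv".
    m = re.search(r"(\d{4})-(\d{2})-(\d{2})", path)
    return date(*map(int, m.groups())) if m else None

def read_new_files(spark, source_dir: str):
    # Auto Loader incrementally reads only files not seen before;
    # _metadata.file_path carries the source file path for each row.
    return (spark.readStream
            .format("cloudFiles")
            .option("cloudFiles.format", "csv")
            .load(source_dir)
            .selectExpr("*", "_metadata.file_path AS source_file"))
```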

1 More Replies
csmcpherson
by New Contributor III
  • 1574 Views
  • 1 reply
  • 1 kudos

Resolved! Workflow file watch - capture filename trigger

With respect to the file watch trigger in workflows, how can we capture which file and/or path raised the trigger? I'd like to use this information to set parameters based on the file name and file path. Thank you!  https://...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @csmcpherson, this is currently not supported, but the Databricks team is working on that idea according to the thread below: Solved: File information is not passed to trigger job on f... - Databricks Community - 39266. As a workaround, if you use Auto Loader...
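Sketching that workaround: an Auto Loader stream can recover the triggering file's name and path through the _metadata column, even though the file-arrival trigger itself does not pass them to the job (the landing path and format are illustrative):

```python
def stream_with_file_info(spark, landing_path: str, fmt: str = "json"):
    # Each row keeps a reference to the file that produced it, which can
    # then drive per-file parameters downstream.
    return (spark.readStream
            .format("cloudFiles")
            .option("cloudFiles.format", fmt)
            .load(landing_path)
            .select("*", "_metadata.file_path", "_metadata.file_name"))
```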

Govardhana
by New Contributor
  • 5874 Views
  • 1 reply
  • 1 kudos

Interview question for ADF

Hello, I am attending interviews for Data Engineer roles and have 3 years of experience. I am looking for real-time interview questions; if anyone has some, could you please share? Thank you, Govardhana

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @Govardhana, there are plenty of those questions on the internet. Below is one that's actually quite good: Top 40+ Azure Data Factory Interview Questions 2024 (k21academy.com)

Djelany
by New Contributor II
  • 5184 Views
  • 3 replies
  • 1 kudos

Resolved! DLT Event Logs

Hi, does anyone know what details:planning_information:technique_information[0]:cost under the planning_information event type means in my DLT workflow system event logs? For context, I'm trying to track the cost per run of my DLT workflow and I do not ha...

Latest Reply
adriennn
Valued Contributor
  • 1 kudos

You can enable the system.billing schema and see the costs of the runs in the usage table.
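A sketch of such a query against the system billing table, grouping DBU usage per pipeline update (run). The pipeline id is a placeholder, and the DBU totals still need to be joined to list prices for an actual cost figure:

```python
def dlt_cost_query(pipeline_id: str) -> str:
    # system.billing.usage records DBUs with a usage_metadata struct that
    # carries the DLT pipeline and update ids.
    return f"""
        SELECT usage_metadata.dlt_update_id AS update_id,
               SUM(usage_quantity)          AS dbus
        FROM system.billing.usage
        WHERE usage_metadata.dlt_pipeline_id = '{pipeline_id}'
        GROUP BY usage_metadata.dlt_update_id
    """

# On Databricks: spark.sql(dlt_cost_query("<your-pipeline-id>")).display()
```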

2 More Replies
jay971
by New Contributor II
  • 2951 Views
  • 3 replies
  • 0 kudos

Error: Cannot use legacy parameters because the job has job parameters configured.

I created a job which has two job parameters. How can I use the Databricks CLI to pass different values to those parameters?

Latest Reply
jay971
New Contributor II
  • 0 kudos

The job ran but did not pick up the values from the CLI.
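For comparison, a minimal sketch of triggering the run through the Databricks SDK, where job-level parameters must go in job_parameters rather than the legacy notebook_params (the job id and parameter names are placeholders; mixing legacy parameters into such a job raises the error in the title):

```python
def run_with_job_params(job_id: int, params: dict):
    # Requires the databricks-sdk package and configured authentication.
    from databricks.sdk import WorkspaceClient
    w = WorkspaceClient()
    # job_parameters targets job-level parameters defined on the job;
    # passing legacy notebook_params here instead triggers the
    # "Cannot use legacy parameters" error.
    return w.jobs.run_now(job_id=job_id, job_parameters=params)
```

With the CLI, the equivalent should be `databricks jobs run-now` with a JSON payload containing a `job_parameters` object rather than `notebook_params`.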

2 More Replies
Saf4Databricks
by New Contributor III
  • 3532 Views
  • 2 replies
  • 0 kudos

Reading single file from Databricks DBFS

I have a Test.csv file in the FileStore of DBFS in Databricks Community Edition. When I try to read the file using with open, I get the following error: FileNotFoundError: [Errno 2] No such file or directory: '/dbfs/FileStore/tables/Test.csv' import os wi...

Latest Reply
Saf4Databricks
New Contributor III
  • 0 kudos

@EricRM It should work. Please see the accepted response from this same forum here. So, we still need to find the cause of the error. The detailed error message follows; maybe this will help readers understand the issue better and help resolve...
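While the root cause is investigated, reading through the DBFS API instead of the /dbfs FUSE path usually works on Community Edition, where the local mount is not available. A sketch (the path is taken from the question, the byte limit is arbitrary):

```python
def read_head(dbutils, path: str = "dbfs:/FileStore/tables/Test.csv",
              max_bytes: int = 65536):
    # dbutils.fs goes through the DBFS API, so it does not depend on the
    # /dbfs FUSE mount that plain open('/dbfs/...') needs.
    return dbutils.fs.head(path, max_bytes)
```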

1 More Replies
databicky
by Contributor II
  • 23671 Views
  • 13 replies
  • 4 kudos
Latest Reply
FerArribas
Contributor
  • 4 kudos

Hi @Hubert Dudek, the pandas API doesn't support the abfss protocol. You have three options: 1) If you need to use pandas, write the Excel file to the local file system (DBFS) and then move it to ABFSS (for example with dbutils). 2) Write as CSV directly in abfss...
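The first option can be sketched like this (file names are illustrative; pandas' to_excel needs the openpyxl package installed on the cluster):

```python
def excel_to_abfss(pdf, dbutils, abfss_path: str) -> str:
    # pandas cannot write to abfss:// directly, so write to the driver's
    # local disk first, then copy the file to ADLS with dbutils.
    local_path = "/tmp/report.xlsx"
    pdf.to_excel(local_path, index=False)
    dbutils.fs.cp(f"file:{local_path}", abfss_path)
    return abfss_path
```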

12 More Replies
sakuraDev
by New Contributor II
  • 1371 Views
  • 1 reply
  • 2 kudos

Resolved! how does autoloader handle source outage

Hey guys, I've been looking for some docs on how Auto Loader manages a source outage. I am currently running the following code: dfBronze = (spark.readStream .format("cloudFiles") .option("cloudFiles.format", "json") .schema(json_schema_b...

(attached screenshot: sakuraDev_0-1725478024362.png)
Latest Reply
filipniziol
Esteemed Contributor
  • 2 kudos

Hi @sakuraDev, 1. Using the availableNow trigger processes all available data immediately and then stops the query. As you noticed, your data was processed once, and now you need to trigger the process again to pick up new files. 2. Changing the tr...
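The first option looks roughly like this (checkpoint location and target table are placeholders):

```python
def write_available_now(df, checkpoint: str, target_table: str):
    # availableNow processes everything present when the query starts and
    # then stops; files arriving later need another run (or a continuous
    # trigger such as processingTime) to be picked up.
    return (df.writeStream
            .option("checkpointLocation", checkpoint)
            .trigger(availableNow=True)
            .toTable(target_table))
```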

Soma
by Valued Contributor
  • 5588 Views
  • 6 replies
  • 3 kudos

Resolved! Dynamically supplying partitions to autoloader

We have a streaming use case and we see a lot of time spent listing files from Azure. Is it possible to supply partitions to Auto Loader dynamically, on the fly?

Latest Reply
Anonymous
Not applicable
  • 3 kudos

@somanath Sankaran - Thank you for posting your solution. Would you be happy to mark your answer as best so that other members may find it more quickly?

5 More Replies
188386
by New Contributor II
  • 2050 Views
  • 2 replies
  • 0 kudos

Databricks Learning - Get Started with Databricks for Data Engineering -> Next button not active

Hi, Databricks Learning - Get Started with Databricks for Data Engineering (ID: E-03ZW80) got stuck at the lesson where the file "get-started-with-data-engineering-on-databricks-2.1.zip" is downloaded. The "Next" button is not active - see attached picture. Li...

Latest Reply
Ajay-Pandey
Databricks MVP
  • 0 kudos

Hi @188386, as I can see, you have skipped multiple lessons (Introduction to Data on Databricks). First complete them in sequence; then the Next button will be enabled for you.

1 More Replies
IN
by New Contributor II
  • 1099 Views
  • 1 reply
  • 1 kudos

Connect to remote SQL Server (add databricks cluster IP to the whitelist)

Hi, I need to connect from a workspace notebook to a remote SQL Server instance. The server is protected by a firewall, so I need to add an IP address to the whitelist. Ideally, it would be possible to set up/allocate a static IP a...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @IN, in Azure you can deploy the workspace in VNet injection mode and attach a NAT Gateway to your VNet. The NAT gateway requires a public IP, and this IP will be the static egress IP for all clusters in this workspace. I've never worked with GCP, but I think you ...

Henrik_
by New Contributor III
  • 3275 Views
  • 1 reply
  • 1 kudos

Resolved! Optimizing recursive joins on group and UNION-operations.

The code snippet below takes each group (based on id) and performs recursive joins to build parent-child relations (id1 and id2) within a group. The code produces the correct output, an array in column 'path'. However, in my real-world use case, this c...

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

The recursive join is definitely a performance killer; it will blow up the query plan, so I would advise against using it. Alternatives? Well, a fixed number of joins, for example, if that is an option of course. Using a graph algorithm is also an opti...
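As a pure-Python illustration of the graph alternative (run per id group, e.g. via applyInPandas; the function name is hypothetical and acyclic input is assumed), a simple walk from the roots builds each root-to-leaf path without any recursive joins:

```python
from collections import defaultdict

def build_paths(edges):
    # edges: iterable of (parent, child) pairs within one group.
    # Build parent -> children adjacency, find roots (nodes that are
    # never a child), then walk depth-first, emitting each full path.
    children = defaultdict(list)
    has_parent = set()
    nodes = set()
    for a, b in edges:
        children[a].append(b)
        has_parent.add(b)
        nodes.update((a, b))
    roots = sorted(n for n in nodes if n not in has_parent)

    out = []
    def walk(node, path):
        if not children[node]:          # leaf: the path is complete
            out.append(path)
        for c in children[node]:
            walk(c, path + [c])
    for r in roots:
        walk(r, [r])
    return out
```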

n_joy
by Contributor
  • 7003 Views
  • 6 replies
  • 2 kudos

Resolved! Change data feed for tables with allowColumnDefaults property "enabled"

I have a Delta table already created, with both the enableChangeDataFeed option and the allowColumnDefaults property enabled. However, when writing to the CDC table with streaming queries it fails with the following error: [CREATE TABLE command because it ...

Latest Reply
n_joy
Contributor
  • 2 kudos

@filipniziol Yes, that is what I do. Thanks for the feedback!

5 More Replies