Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

DavidS1
by New Contributor
  • 818 Views
  • 1 reply
  • 0 kudos

Cost comparison of DLT to custom pipeline

Hello, our company currently has a number of custom pipelines written in Python for ETL, and I want to evaluate DLT to see whether it will make things more efficient.  A problem is that there is a restriction on using DLT "because it is too e...

Latest Reply
Zume
New Contributor II
  • 0 kudos

DLT is expensive in my opinion. I tried to run a simple notebook that just reads a Parquet file into a dataframe and writes it out to cloud storage, and I got an error that I hit my CPU instance limit for my Azure subscription. I just gave up after t...
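For reference, the workload described here boils down to a plain Spark read/write. A minimal sketch, assuming a Databricks notebook where spark is already defined; the Azure storage paths are placeholders:

```python
# Rough sketch of the workload described above, outside DLT: read a Parquet
# file into a DataFrame and write it back out to cloud storage.
# Paths are placeholders; `spark` is the session a Databricks notebook provides.
df = spark.read.parquet("abfss://raw@examplestorage.dfs.core.windows.net/input/")

(
    df.write
      .mode("overwrite")
      .parquet("abfss://curated@examplestorage.dfs.core.windows.net/output/")
)
```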

joedata
by New Contributor
  • 2724 Views
  • 0 replies
  • 0 kudos

pywin32

A Python module called pywin32 enables users to read an Excel file, make changes to specific cells, execute a Refresh All (which refreshes all the data connections), and save the changes back to the Excel file. This cannot be used on Databricks because ...
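For context, the Windows-only workflow the post describes typically looks something like the sketch below: it drives a local Excel installation through COM, which is exactly why it cannot run on Databricks' Linux-based clusters. The workbook path is a placeholder.

```python
# Windows-only sketch of the workflow described above: drive a local Excel
# installation through COM to refresh all data connections and save.
# Requires Excel on Windows, so it will not run on a Databricks cluster.
import win32com.client

excel = win32com.client.Dispatch("Excel.Application")
excel.Visible = False

wb = excel.Workbooks.Open(r"C:\reports\sales.xlsx")  # placeholder path
wb.RefreshAll()                            # refresh all data connections
excel.CalculateUntilAsyncQueriesDone()     # wait for background query refreshes
wb.Save()
wb.Close(False)                            # already saved; close without prompting
excel.Quit()
```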

ksenija
by Contributor
  • 4005 Views
  • 8 replies
  • 1 kudos

DLT pipeline error key not found: user

When I try to create a DLT pipeline from a foreign catalog (BigQuery), I get this error: java.util.NoSuchElementException: key not found: user. I used the same script to copy Salesforce data and that worked completely fine.

Latest Reply
ksenija
Contributor
  • 1 kudos

Hi @lucasrocha, any luck with this error? I guess it's something with the connection to BigQuery, but I didn't find anything.
Best regards,
Ksenija

7 More Replies
Chinu
by New Contributor III
  • 3343 Views
  • 3 replies
  • 2 kudos

How do I access DLT advanced configuration from a Python notebook?

Hi Team, I'm trying to get a DLT Advanced Configuration value from the Python DLT notebook. For example, I set "something": "some path" in Advanced configuration in DLT, and I want to get the value from my DLT notebook. I tried "dbutils.widgets.get("some...

Latest Reply
Mo
Databricks Employee
  • 2 kudos

Here you can find the documentation on how to use parameters in DLT (SQL and Python): https://docs.databricks.com/en/delta-live-tables/settings.html#parameterize-dataset-declarations-in-python-or-sql
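As a rough illustration of the linked documentation, a key set under the pipeline's Advanced configuration is read with spark.conf.get rather than dbutils.widgets.get. A minimal sketch using the "something" key from the question; the table name and source format are made up:

```python
import dlt

# Values set under "Advanced configuration" in the DLT pipeline settings,
# e.g. "something": "some path", surface as Spark conf entries inside the
# pipeline, so they are read with spark.conf.get (not dbutils.widgets.get).
source_path = spark.conf.get("something")

@dlt.table
def configured_table():  # hypothetical table name
    # hypothetical source format; adjust to whatever lives at the configured path
    return spark.read.format("parquet").load(source_path)
```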

2 More Replies
vanagnostopoulo
by New Contributor III
  • 2014 Views
  • 5 replies
  • 0 kudos

validate bundle does not work on Windows 10 Pro x64

I use the Databricks CLI (databricks_cli_0.221.1_windows_amd64-signed), and when I run "databricks bundle validate" in my project I get "Error: no shell found". In git-bash it works, but I have other problems there.

Latest Reply
jacovangelder
Honored Contributor
  • 0 kudos

That's strange. Just checking, are you running it from the right folder (the location of the databricks.yml file)?

4 More Replies
WynanddB
by New Contributor III
  • 6063 Views
  • 4 replies
  • 3 kudos

Invalid characters in column name

I get the following error: com.databricks.sql.transaction.tahoe.DeltaAnalysisException: [DELTA_INVALID_CHARACTERS_IN_COLUMN_NAMES] Found invalid character(s) among ' ,;{}()\n\t=' in the column names of your schema. It's a new instance of Databricks a...

Latest Reply
jacovangelder
Honored Contributor
  • 3 kudos

My guess is you have a newline character (\n) in one of the CSV header columns; they are not easy to spot. Have you checked for that? You can also try .option("header", "true") so Spark doesn't treat your header as content. Might also want t...
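A quick way to try both suggestions; a rough sketch only, with a hypothetical input path, that reads with the header option and strips the characters Delta rejects from the column names:

```python
import re

# Hypothetical input path; read with an explicit header so the first row
# is not treated as data.
df = spark.read.option("header", "true").csv("/Volumes/main/raw/input.csv")

# Strip the characters Delta rejects (space , ; { } ( ) newline tab =)
# from the column names before writing to a Delta table.
def clean(name: str) -> str:
    return re.sub(r"[ ,;{}()\n\t=]", "_", name).strip("_")

df = df.toDF(*[clean(c) for c in df.columns])
```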

3 More Replies
Nastia
by New Contributor III
  • 2374 Views
  • 7 replies
  • 2 kudos

Resolved! I keep getting a Dataset from the spark.table command (instead of a DataFrame)

I am trying to create a simple DLT pipeline:

@dlt.table
def today_latest_execution():
  return spark.sql("SELECT * FROM LIVE.last_execution")

@on_event_hook
def write_events_to_x(event):
  if (today_latest_execution().count() == 0):
    try:
      ... ...

Latest Reply
-werners-
Esteemed Contributor III
  • 2 kudos

What if you do: return spark.sql("SELECT * FROM LIVE.last_execution").toDF()
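Applied to the pipeline from the question, the suggested change would look roughly like this (a sketch only; LIVE.last_execution is taken from the original post):

```python
import dlt

@dlt.table
def today_latest_execution():
    # The suggested tweak: append .toDF() to force a plain DataFrame
    return spark.sql("SELECT * FROM LIVE.last_execution").toDF()
```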

6 More Replies
RobinK
by Contributor
  • 11963 Views
  • 12 replies
  • 14 kudos

Resolved! Databricks Jobs do not run on job compute but on shared compute

Hello, since last night none of our ETL jobs in Databricks are running anymore, although we have not made any code changes. The identical jobs (deployed with Databricks asset bundles) run on an all-purpose cluster, but fail on a job cluster. We have no...

Latest Reply
jcap
New Contributor II
  • 14 kudos

I do not believe this is solved, similar to a comment over here: https://community.databricks.com/t5/data-engineering/databrickssession-broken-for-15-1/td-p/70585 We are also seeing this error in 14.3 LTS from a simple example: from pyspark.sql.function...

11 More Replies
Przemk00
by New Contributor II
  • 713 Views
  • 1 reply
  • 0 kudos

Facilitate if/else condition in conjunction with parameters

The current state: I have a working workflow with 3 tasks and several parameters. The change: I want to modify the workflow to add a 4th task (if/else) so that, based on one of the parameters (call it xyz), the workflow will not proceed after the 1st task. The...

Latest Reply
Przemk00
New Contributor II
  • 0 kudos

The logic should be simple: if the xyz parameter equals 1000, then run the other 2 tasks; otherwise, do not run the rest.

WWoman
by Contributor
  • 903 Views
  • 2 replies
  • 1 kudos

Is there a way to create a local CSV file by creating a local external table?

Hello, I have a user that would like to create a CSV file on their local file system by creating an external table (USING CSV) and specifying a local file for the path parameter using SQL. They will be running this command from a local client (DbVisua...

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

Not sure if this would work, but you could run Unity Catalog locally (possible since last week), define the CSV file as a table in that local UC, and then query it.
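If that route is taken, the table definition itself could be plain Spark SQL along these lines; a loose sketch only, with placeholder catalog/schema/table names and a local path:

```python
# Loose sketch: with a locally running Unity Catalog, a CSV file on the local
# file system could be registered as an external table and then queried.
# Catalog/schema/table names and the path below are placeholders.
spark.sql("""
    CREATE TABLE IF NOT EXISTS local_catalog.exports.customer_export
    USING CSV
    OPTIONS (header 'true')
    LOCATION 'file:/tmp/exports/customer_export'
""")

spark.sql("SELECT * FROM local_catalog.exports.customer_export").show()
```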

1 More Reply
aap_scott
by New Contributor
  • 640 Views
  • 1 reply
  • 0 kudos

Cannot navigate to workspace directory in multi-node cluster

When I open a terminal on a multi-node cluster, I cannot navigate to the workspace directory. However, on a single-node cluster, it works fine. Thanks in advance.

[Screenshots attached: aap_scott_0-1718313043369.png, aap_scott_1-1718313219551.png]
Latest Reply
NateAnth
Databricks Employee
  • 0 kudos

If this cluster is backed by an AWS Graviton instance, there is currently a limitation with the web terminal not being able to interact with the Workspace Filesystem. Please give it a try in a notebook cell with the %sh magic command or switch to ...

vamsivarun007
by New Contributor II
  • 41721 Views
  • 5 replies
  • 2 kudos

Driver is up but is not responsive, likely due to GC.

Hi all, "Driver is up but is not responsive, likely due to GC." This is the message in the cluster event logs. Can anyone help me with this? What does GC mean? Garbage collection? Can we control it externally?

Latest Reply
jacovangelder
Honored Contributor
  • 2 kudos

9/10 times GC pressure is due to out-of-memory exceptions. @Jaron spark.catalog.clearCache() is not a configurable option, but rather a command to submit.
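To make the distinction concrete, the call is submitted like any other command, for example in a notebook cell, rather than set as a Spark configuration value:

```python
# Submitted as a command (e.g. in a notebook cell or job task), not set as
# a Spark/cluster configuration value.
spark.catalog.clearCache()  # drops all cached tables/DataFrames in the session
```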

4 More Replies
mysteryuser000
by New Contributor
  • 610 Views
  • 0 replies
  • 0 kudos

DLT pipeline will not create live tables

I have created a DLT pipeline based on four SQL notebooks, each containing between 1 and 3 queries. Each query begins with "create or refresh live table ..." yet each one outputs a materialized view. I have tried deleting the materialized views and ru...

