Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

cmathieu
by New Contributor III
  • 785 Views
  • 4 replies
  • 0 kudos

DAB - All projects files deployed

I have an issue with DAB where all the project files, starting from root ., get deployed to the /files folder in the bundle. I would prefer being able to deploy certain util notebooks, but not all the files of the project. I'm able to not deploy any ...

Latest Reply
ashraf1395
Honored Contributor
  • 0 kudos

@cmathieu, it will support deployment of the whole directory, not selected files only.
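For readers hitting the same thing: Databricks Asset Bundles support top-level sync include/exclude globs in databricks.yml, which may give the selective behavior asked about. A minimal sketch; the paths and bundle name are illustrative, not from the thread:

# databricks.yml (sketch; paths are illustrative)
bundle:
  name: my_bundle

sync:
  # Keep most project files out of the deployed /files folder...
  exclude:
    - "src/**"
    - "tests/**"
  # ...but still ship the util notebooks.
  include:
    - "utils/*.py"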

3 More Replies
DylanStout
by Contributor
  • 386 Views
  • 2 replies
  • 0 kudos

Resolved! Error while reading file from Cloud Storage

The code we are executing:
df = spark.read.format("parquet").load("/mnt/g/drb/HN/")
df.write.mode('overwrite').saveAsTable("bronze.HN")
The error it throws:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 44 in stage 642.0 faile...

Latest Reply
DylanStout
Contributor
  • 0 kudos

spark.conf.set("spark.sql.parquet.enableVectorizedReader", "false")
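In context, a sketch of how the workaround slots into the original job (paths and table name are from the post above):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Fall back to the non-vectorized Parquet reader; slower, but it
# tolerates some files that crash the vectorized path.
spark.conf.set("spark.sql.parquet.enableVectorizedReader", "false")

df = spark.read.format("parquet").load("/mnt/g/drb/HN/")
df.write.mode("overwrite").saveAsTable("bronze.HN")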

1 More Replies
DylanStout
by Contributor
  • 657 Views
  • 0 replies
  • 0 kudos

Pyspark ML tools

Cluster policies are not letting us use PySpark ML tools.
Issue details: We have clusters available in our Databricks environment, and our plan was to use functions and classes from "pyspark.ml" to process data and train our model in parallel across cores/n...
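For context, a minimal pyspark.ml job of the kind the post describes; the dataset and column names are invented for illustration (spark is the ambient SparkSession on Databricks):

from pyspark.ml.classification import LogisticRegression
from pyspark.ml.feature import VectorAssembler

# Tiny invented dataset: two features and a binary label.
train = spark.createDataFrame(
    [(0.0, 1.1, 0), (1.5, 0.3, 1), (0.2, 0.9, 0), (2.1, 0.1, 1)],
    ["f1", "f2", "label"],
)

# Assemble features, then fit a model distributed across the cluster.
features = VectorAssembler(inputCols=["f1", "f2"], outputCol="features")
model = LogisticRegression(maxIter=10).fit(features.transform(train))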

MarcoRezende
by New Contributor II
  • 683 Views
  • 0 replies
  • 0 kudos

Slow performance in REFRESH MATERIALIZED VIEW over CTAS

Hello guys, I have some materialized views created in my Databricks workspace, and after one change to one of them, it became 3x slower (from 9 minutes to 30 minutes). After some debugging, I found that the bottleneck process in the execution plan is one call...

Rajt1
by New Contributor
  • 196 Views
  • 1 reply
  • 0 kudos

Job, Task, Stage Creation

I am running the code below:
df = spark.read.json('xyz.json')
df.count
I want to understand how Spark actually works here. How many jobs and stages will be created? I would like a detailed but easy-to-follow explanation of how it works.

Latest Reply
Advika
Databricks Employee
  • 0 kudos

Hello @Rajt1! When you execute df = spark.read.json('xyz.json'), Spark does not read the file immediately. Data is only read when an action like count() is triggered. Job: df.count() triggers one job because it's an action. Stage: Reading JSON and cou...
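A small illustration (the path is from the question; the schema is invented). One caveat: without an explicit schema, spark.read.json runs a small job of its own to infer one, so supplying a schema keeps the lazy/eager boundary clean:

from pyspark.sql.types import StructType, StructField, StringType

# Invented schema; supplying one avoids the inference scan.
schema = StructType([StructField("id", StringType(), True)])

df = spark.read.schema(schema).json("xyz.json")  # lazy: no job yet
df.count()  # action: one job, typically two stages (partial counts, then final aggregation)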

khangnguyen164
by New Contributor II
  • 352 Views
  • 3 replies
  • 0 kudos

Error "insert concurrent to Delta Lake" when 2 streaming merge data to same table at the same time

Hello everyone, we currently have 2 streaming (Bronze) jobs created as 2 tasks in the same job, running on the same compute, and both merge data into the same table (the Silver table). If I create it like this, sometimes I get an error related to "insert...

Latest Reply
khangnguyen164
New Contributor II
  • 0 kudos

Can anyone else help me with this case?
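Not from the thread, but a common mitigation sketch: retry the MERGE when Delta reports a concurrent-write conflict (or partition the two streams so their merges touch disjoint rows). The helper below is hypothetical:

import time

def merge_with_retry(do_merge, max_attempts=5):
    # do_merge: a zero-arg callable that runs the Delta MERGE.
    for attempt in range(1, max_attempts + 1):
        try:
            return do_merge()
        except Exception as e:
            # Delta surfaces conflicts as e.g. ConcurrentAppendException.
            if "Concurrent" not in str(e) or attempt == max_attempts:
                raise
            time.sleep(2 ** attempt)  # back off, then retry the merge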

2 More Replies
YOUKE
by New Contributor III
  • 495 Views
  • 2 replies
  • 0 kudos

Resolved! Connecting to SQL on Databricks Using SQLAlchemy or pyodbc

On Databricks, when I try to connect to SQL using SQLAlchemy or pyodbc to run delete queries on a specific table, I get this error: (pyodbc.Error) ('01000', "[01000] [unixODBC][Driver Manager]Can't open lib 'ODBC Driver 17 for SQL Server' : file not ...

Latest Reply
YOUKE
New Contributor III
  • 0 kudos

I was able to solve the problem! The problem was that the driver was missing, so pyodbc or SQLAlchemy couldn't find it. I used the native Java API instead, and it is working. This is the example code:
jdbcUsername = "username"
jdbcPassword = "password"
driv...
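A sketch of the same approach, since the reply's code is truncated above. The URL, credentials, and table are placeholders, and this goes through the JVM's java.sql.DriverManager via an internal Py4J handle, which avoids the missing native ODBC library but relies on a private attribute:

jdbc_url = "jdbc:sqlserver://<server>:1433;databaseName=<db>"  # placeholder

# Internal handle to the JVM; works because a SQL Server JDBC
# driver ships with the Databricks runtime.
jvm = spark.sparkContext._jvm
conn = jvm.java.sql.DriverManager.getConnection(jdbc_url, "username", "password")
stmt = conn.createStatement()
stmt.executeUpdate("DELETE FROM dbo.my_table WHERE id = 1")  # placeholder query
conn.close()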

1 More Replies
dc-rnc
by Contributor
  • 835 Views
  • 1 reply
  • 0 kudos

Writing to Delta Table and retrieving back the IDs doesn't work

Hi. I have a workflow in which I write a few rows into a Delta Table with auto-generated IDs. Then, I need to retrieve them back just after they're written into the table to collect those generated IDs, so I read the table and use two columns (one is ...

[Attachment: dcrnc_4-1743006179065.png]
Latest Reply
jeremy98
Honored Contributor
  • 0 kudos

I'm also interested in this problem. Could someone help?
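One pattern sometimes used here (a sketch, not confirmed by the thread): stamp each written batch with a unique id, then read the auto-generated IDs back by that stamp. The table, columns, and rows_df DataFrame are placeholders:

import uuid
from pyspark.sql import functions as F

batch_id = str(uuid.uuid4())

# Stamp the batch so its rows (and their generated IDs) can be
# found again deterministically, even with concurrent writers.
(rows_df.withColumn("batch_id", F.lit(batch_id))
        .write.mode("append").saveAsTable("cat.sch.target"))

ids = (spark.table("cat.sch.target")
            .where(F.col("batch_id") == batch_id)
            .select("generated_id", "business_key"))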

IGRACH
by New Contributor II
  • 320 Views
  • 1 reply
  • 1 kudos

Unable to delete a table

When I try to delete a table, I'm getting this error: [ErrorClass=INVALID_STATE] TABLE catalog.schema.table_name cannot be deleted because it is being shared via Delta Sharing. I have checked on the internet about it, but could not find any info about ...

Latest Reply
ashraf1395
Honored Contributor
  • 1 kudos

Hi @IGRACH, you are facing this issue because, I guess, the table you want to delete is being shared via Delta Sharing. You can go to the shared object by following this doc: https://docs.databricks.com/aws/en/delta-sharing/create-share#update-shares and then, ...
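Concretely, that likely comes down to something like this (the share name is a placeholder; the table name is from the error message):

# Remove the table from the share that references it, then drop it.
spark.sql("ALTER SHARE my_share REMOVE TABLE catalog.schema.table_name")
spark.sql("DROP TABLE catalog.schema.table_name")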

HoussemBL
by New Contributor III
  • 655 Views
  • 3 replies
  • 0 kudos

External tables in DLT pipelines

Hello community, I have implemented a DLT pipeline. In the "Destination" setting of the pipeline I have specified a Unity Catalog with a target schema of type external referring to an S3 destination. My DLT pipeline works well. Yet, I noticed that all str...

Latest Reply
Sushil_saini
New Contributor II
  • 0 kudos

This won't work. The best approach is to create a DLT sink that writes to the external Delta table. The pipeline should be just one step: read the table and append via a flow using the data sink. It works fine.
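A sketch of that shape, following the DLT sink API as documented at the time of writing; the path and table names are placeholders:

import dlt

# Sink pointing at the external Delta location.
dlt.create_sink(
    name="silver_sink",
    format="delta",
    options={"path": "s3://my-bucket/silver/orders"},
)

@dlt.append_flow(name="orders_to_sink", target="silver_sink")
def orders_to_sink():
    # Single step, as the reply suggests: read the source and append.
    return spark.readStream.table("bronze.orders")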

2 More Replies
a_user12
by New Contributor III
  • 393 Views
  • 1 reply
  • 0 kudos

Resolved! databricks bundle Deploy: exit code 0 even if an error occurs

We have a CI/CD pipeline where we run:
databricks bundle deploy [...]
The code works fine; however, if we misconfigure it, we see in the output an error message such as:
Deploying resources... Updating deployment state... Warning: Detected unresolved va...

Labels: Data Engineering, asset bundle
Latest Reply
a_user12
New Contributor III
  • 0 kudos

You can close it: it was a CI/CD issue.

matanper
by New Contributor III
  • 4672 Views
  • 6 replies
  • 1 kudos

Custom docker image fails to initialize

I'm trying to use a custom Docker image for my job. This is my Dockerfile:
FROM databricksruntime/standard:12.2-LTS
COPY . .
RUN /databricks/python3/bin/pip install -U pip
RUN /databricks/python3/bin/pip install -r requirements.txt
USER root
My job ...

Latest Reply
mrstevegross
Contributor III
  • 1 kudos

Did y'all ever figure this out? I'm running into a similar issue.

5 More Replies
badari_narayan
by New Contributor II
  • 288 Views
  • 1 reply
  • 0 kudos

Having an issue assigning databricks_current_metastore with terraform provider

I am trying to assign my databricks_current_metastore on Terraform and I get the following error back as an output:
Error: cannot read current metastore: cannot get client current metastore: invalid Databricks Workspace configuration
with data.databric...

Latest Reply
Panda
Valued Contributor
  • 0 kudos

@badari_narayan Based on the above Terraform code, you are trying to use the databricks.accounts provider to read the current workspace metastore, which is incorrect: the databricks_current_metastore data source is a workspace-level resource, and must b...
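For illustration, the fix is roughly to point the data source at a workspace-scoped provider; the alias and host value below are placeholders:

# Workspace-level provider; databricks_current_metastore must run
# against a workspace, not the account endpoint.
provider "databricks" {
  alias = "workspace"
  host  = "https://adb-1111111111111111.11.azuredatabricks.net"
}

data "databricks_current_metastore" "this" {
  provider = databricks.workspace
}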

johschmidt42
by New Contributor II
  • 676 Views
  • 2 replies
  • 0 kudos

Autoloader cloudFiles.maxFilesPerTrigger ignored with .trigger(availableNow=True)?

Hi, I'm using the Auto Loader feature to read streaming data from Delta Lake files and process them in a batch. The trigger is set to availableNow to include all new data from the checkpoint offset, but I limit the number of delta files for the batch ...

Latest Reply
p_romm
New Contributor III
  • 0 kudos

In the docs it is "cloudFiles.maxFilesPerTrigger": https://docs.databricks.com/aws/en/ingestion/cloud-object-storage/auto-loader/options
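A sketch of a rate-limited availableNow backfill with that option name; the format, paths, and table are placeholders:

# The option name matters: "cloudFiles.maxFilesPerTrigger",
# not the plain "maxFilesPerTrigger".
df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "json")
      .option("cloudFiles.maxFilesPerTrigger", "100")  # files per micro-batch
      .load("s3://my-bucket/landing/"))

(df.writeStream
   .option("checkpointLocation", "s3://my-bucket/_checkpoints/landing/")
   .trigger(availableNow=True)
   .toTable("bronze.events"))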

1 More Replies
pt16
by New Contributor II
  • 526 Views
  • 2 replies
  • 0 kudos

Enable automatic identity management in Azure Databricks

We have Databricks account admin access but are not able to see the option in the Databricks admin console to enable automatic identity management. We wanted to enable it from the Previews page and followed the steps below:
1. As an account admin, log in to the accou...

Latest Reply
pt16
New Contributor II
  • 0 kudos

After raising a Databricks ticket, today I am able to see the Automatic Identity Management public preview option.

1 More Replies
