Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

by yubin-apollo (New Contributor II)
  • 3316 Views
  • 4 replies
  • 0 kudos

COPY INTO skipRows FORMAT_OPTIONS does not work

Based on the COPY INTO documentation, it seems I can use `skipRows` to skip the first `n` rows. I am trying to load a CSV file where I need to skip the first few rows. I have tried various combinations, e.g. setting the header parameter on or ...

Latest Reply
karthik-kobai
New Contributor II
  • 0 kudos

@yubin-apollo: My bad, I had `skipRows` in COPY_OPTIONS instead of FORMAT_OPTIONS. It works; please ignore my previous comment. Thanks.

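The resolution above comes down to where `skipRows` lives: it is a CSV format option, so it belongs in FORMAT_OPTIONS rather than COPY_OPTIONS. A minimal sketch that builds such a statement follows. The `build_copy_into` helper, the table name, and the source path are hypothetical placeholders, not a Databricks API. As I understand the CSV format-option docs, when `header` is true the header is read from the first row after the skipped ones, so `skipRows` should count only the lines above the header.

```python
def build_copy_into(table: str, source_path: str, skip_rows: int,
                    header: bool = True) -> str:
    """Return a COPY INTO statement that loads CSV files, skipping leading rows.

    skipRows is a CSV *format* option, so it is placed in FORMAT_OPTIONS.
    (Putting it in COPY_OPTIONS is what did not work in the thread.)
    """
    if skip_rows < 0:
        raise ValueError("skip_rows must be >= 0")
    # Option values are written as quoted string literals, as in the COPY INTO examples.
    format_options = {
        "header": "true" if header else "false",
        "skipRows": str(skip_rows),
    }
    options_sql = ", ".join(f"'{k}' = '{v}'" for k, v in format_options.items())
    return (
        f"COPY INTO {table}\n"
        f"FROM '{source_path}'\n"
        "FILEFORMAT = CSV\n"
        f"FORMAT_OPTIONS ({options_sql})"
    )


# Hypothetical table and path; in a notebook you would run spark.sql(statement).
statement = build_copy_into("main.default.raw_orders", "/tmp/landing/orders/", skip_rows=3)
print(statement)
```
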
by rchauhan (New Contributor II)
  • 18509 Views
  • 3 replies
  • 4 kudos

org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 188.0 failed 4

When I read data from SQL Server through a JDBC connection, I get the error below while merging the data into a Databricks table. Can you please help me understand what the issue is related to? org.apache.spark.SparkException: Job aborted due to stage...

Latest Reply
MDV
New Contributor III
  • 4 kudos

@rchauhan, did you find a solution to the problem, or do you know which settings caused it?

by SankaraiahNaray (New Contributor II)
  • 2464 Views
  • 4 replies
  • 0 kudos

OPTIMIZE with liquid clustering makes filter slower than without OPTIMIZE

I created a Delta table with 15 million records, and I'm running a simple filter query on one column value that returns only one record, because all the values in that column are unique. The Delta table is not partitioned. Before en...

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

It seems that for this specific query, liquid clustering gives worse performance. It does not improve performance for every query. The following are examples of scenarios that benefit from clustering: Tables often filtered by high cardinality columns...

by mvmiller (New Contributor III)
  • 3985 Views
  • 2 replies
  • 3 kudos

Module not found, despite it being installed on job cluster?

We observed the following error in a notebook that was running from a Databricks workflow: ModuleNotFoundError: No module named '<python package>'. The error message speaks for itself: it obviously couldn't find the Python package. What is peculiar ...

Latest Reply
mvmiller
New Contributor III
  • 3 kudos

Thanks, @Walter_C. Supposing that your second possible explanation, cluster initialization timing, could be a factor, are there any best practices or recommendations for preventing this from becoming a recurring issue down the road?

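Whatever the root cause turns out to be, a job notebook that depends on cluster libraries can check for them up front and fail with a clearer message. The sketch below is a hypothetical `require_modules` helper built on the standard library's `importlib.util.find_spec`. It does not fix a library-installation race; but if the check fails only on some runs, that is consistent with the cluster-initialization-timing explanation discussed in this thread. Pass top-level module names.

```python
import importlib.util
import sys


def require_modules(*names: str) -> None:
    """Fail fast if any top-level module in `names` cannot be found.

    Meant for the first cell of a job notebook, so that a missing cluster
    library is reported up front with the interpreter path, rather than
    surfacing as a ModuleNotFoundError deep inside the run.
    """
    missing = [name for name in names if importlib.util.find_spec(name) is None]
    if missing:
        raise ModuleNotFoundError(
            f"Missing modules: {', '.join(missing)} (interpreter: {sys.executable})"
        )


# Example: replace "json" with the package your notebook actually needs.
require_modules("json")
```
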
by Etyr (Contributor)
  • 2876 Views
  • 2 replies
  • 1 kudos

[FinOps] Tagging queries in databricks

Hello, I see that it is possible to tag catalogs/databases/tables, but I did not find a way to tag a query for our FinOps use case. In Azure you can check billing by tag. A concrete example: In Azure Machine Learning, I have a schedule that ...

Latest Reply
Etyr
Contributor
  • 1 kudos

@yoav Hello, sorry, I am not interested in a paid solution.

by monojmckvie (New Contributor II)
  • 2899 Views
  • 2 replies
  • 1 kudos

Delta Live Table - Cannot redefine dataset

Hi, I am new to Delta Live Tables. I am trying to create a Delta Live Table from the Databricks tutorial. I have created a notebook and attached an interactive cluster (DBR 14.3 LTS). I am running the code below. When I ran it for the first time, it ran succes...

Data Engineering
Delta Live Table
dlt
Latest Reply
Walter_C
Databricks Employee
  • 1 kudos

The error message "AnalysisException: Cannot redefine dataset 'sales_orders_raw'" is indicating that you're trying to create a table that already exists. In Databricks, once a Delta Live Table (DLT) is defined, it cannot be redefined or overwritten. ...

by Avi759787 (New Contributor)
  • 2382 Views
  • 0 replies
  • 0 kudos

Driver is up but is not responsive, likely due to GC.

I am using an interactive cluster to run a frequent (every 15 min) batch job. After a certain time (for example, 6 hours), the cluster continuously shows "Driver is up but is not responsive, likely due to GC." in the event log, and all jobs start failing. If the...

by pgruetter (Contributor)
  • 19596 Views
  • 7 replies
  • 1 kudos

Resolved! How to use a Service Principal to connect Power BI to a Databricks SQL Warehouse

Hi all, I'm struggling to connect the Power BI service to a Databricks SQL Warehouse using a service principal. I'm mostly following this guide. I created a new app registration in AAD and created a client secret for it. Now I'm particularly struggling wi...

Latest Reply
pgruetter
Contributor
  • 1 kudos

In the end, once the Service Principal is properly authorized on the Databricks side, I had to create a Personal Access Token for the Service Principal using the Databricks API. On the Power BI service side I then had to use username = 'token' and as...

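The accepted answer creates a personal access token for the service principal "using the Databricks API" without naming the endpoint. One option, assuming you are a workspace admin, is the Token Management API's on-behalf-of endpoint (`POST /api/2.0/token-management/on-behalf-of/tokens`); verify the path and field names against the current REST API reference before relying on them. The sketch only builds the request. `build_obo_token_request` is a hypothetical helper, and the workspace URL, tokens, and application ID are placeholders.

```python
import json
import urllib.request


def build_obo_token_request(workspace_url: str, admin_token: str,
                            application_id: str, lifetime_seconds: int,
                            comment: str = "Power BI service") -> urllib.request.Request:
    """Build (but do not send) a request that asks the workspace to mint a
    token on behalf of a service principal.

    Assumes the Token Management endpoint
    POST /api/2.0/token-management/on-behalf-of/tokens, called by a workspace admin.
    """
    url = workspace_url.rstrip("/") + "/api/2.0/token-management/on-behalf-of/tokens"
    body = {
        "application_id": application_id,  # the service principal's application (client) ID
        "lifetime_seconds": lifetime_seconds,
        "comment": comment,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {admin_token}",
            "Content-Type": "application/json",
        },
    )


# Placeholders only; sending the request needs network access and admin rights:
# req = build_obo_token_request("https://<workspace-host>", "<admin-token>",
#                               "<sp-application-id>", 90 * 24 * 3600)
# with urllib.request.urlopen(req) as resp:
#     pat = json.load(resp)["token_value"]
```

Per the reply, the resulting token goes in the Power BI password field with username `token`.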
by WearBeard (New Contributor)
  • 1888 Views
  • 1 replies
  • 0 kudos

Consume updated data from the Materialized view and send it as append to a streaming table

Hello everyone! I'm using DLT and I'm pretty new to it. I'm trying to take the updates from a materialized view and send them to a streaming table as an append. For example, if I have an MV of 400 records, I want an append to be made to the streaming...

Latest Reply
Priyanka_Biswas
Databricks Employee
  • 0 kudos

Hi @WearBeard, by default, streaming tables require append-only sources. The error you encountered is due to an update or delete operation on the 'streaming_table_test'. To fix this issue, perform a Full Refresh on the 'streaming_table_test' table. You ca...

by Sas (New Contributor II)
  • 1611 Views
  • 0 replies
  • 0 kudos

Not able to create a mount point in Databricks

Hi, I am trying to create a mount point in Azure Databricks, but mount point creation is failing with the error message below: DBUtils.FS Handler.mount() got an unexpected keyword argument 'extra_config'. I am using the following code: def setup_mount(storage_account_...

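The error text points at the keyword name: `dbutils.fs.mount` takes `extra_configs` (plural), so passing `extra_config` fails before any mounting happens. Below is a sketch of a mount helper that uses the correct keyword. `mount_adls_container` and its parameters are hypothetical (the poster's own `setup_mount` is truncated above), and `dbutils` is passed in as an argument so the function can be exercised outside a notebook. The idempotency check relies on `dbutils.fs.mounts()`, whose entries expose a `mountPoint` attribute.

```python
def mount_adls_container(dbutils, storage_account: str, container: str,
                         mount_point: str, configs: dict) -> bool:
    """Mount an ADLS Gen2 container at `mount_point` unless it is already mounted.

    Returns True if a new mount was created, False if the mount point existed.
    """
    source = f"abfss://{container}@{storage_account}.dfs.core.windows.net/"
    # dbutils.fs.mounts() lists existing mounts; skip if this mount point is taken.
    if any(m.mountPoint == mount_point for m in dbutils.fs.mounts()):
        return False
    # The keyword is extra_configs (plural); extra_config raises
    # "got an unexpected keyword argument".
    dbutils.fs.mount(source=source, mount_point=mount_point, extra_configs=configs)
    return True


# Usage sketch in a notebook (values in angle brackets are placeholders):
# configs = {
#     "fs.azure.account.auth.type": "OAuth",
#     "fs.azure.account.oauth.provider.type":
#         "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
#     "fs.azure.account.oauth2.client.id": "<application-id>",
#     "fs.azure.account.oauth2.client.secret":
#         dbutils.secrets.get(scope="<scope>", key="<key>"),
#     "fs.azure.account.oauth2.client.endpoint":
#         "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
# }
# mount_adls_container(dbutils, "<storage-account>", "<container>", "/mnt/<name>", configs)
```
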
by badari_narayan (New Contributor II)
  • 1264 Views
  • 1 replies
  • 0 kudos

Exam got suspended without any reason

Hi Team, my Databricks Certified Associate Developer for Apache Spark 3.0 - Python exam got suspended on 7th March 2024. I was continuously in front of the camera when an alert suddenly appeared, and the support person asked me to show the full table ...

Latest Reply
vinay076
New Contributor III
  • 0 kudos

Hi @badari_narayan, did your exam get rescheduled? I am also facing the same issue: my exam got suspended.

by felix_counter (New Contributor III)
  • 2244 Views
  • 2 replies
  • 0 kudos

Failed to install a package dependency located on a private PyPI server during .whl installation

Hello, I recently switched from DBR 12.2 LTS to DBR 13.3 LTS and observed the following behavior: My goal is to install a Python library from a .whl file. I am using the UI for this task (Cluster settings -> Libraries -> Install new -> 'Python Whl' as ...

Latest Reply
robbe
New Contributor III
  • 0 kudos

Hey Felix, I have run into a similar issue recently (my wheel needs a Git HTTPS redirect that's specified in the init script, but I can install it fine from inside a notebook). I wonder whether you found a solution (perhaps moving to a more recent DBR v...

by mvmiller (New Contributor III)
  • 2321 Views
  • 1 replies
  • 0 kudos

How to ignore the writeStream UnknownFieldException error

I have a Parquet file that I am trying to write to a Delta table:
df.writeStream
  .format("delta")
  .option("checkpointLocation", f"{targetPath}/delta/{tableName}/__checkpoints")
  .trigger(once=True)
  .foreachBatch(processTable)
  .outputMode("append")...

Latest Reply
shan_chandra
Databricks Employee
  • 0 kudos

@mvmiller, per the documentation below, the stream will fail with an UnknownFieldException when it detects new columns, because the default schema evolution mode is addNewColumns. So Databricks recommends configuring Auto Loader streams with workflows to restart automatically after s...

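The reply recommends letting a workflow restart the stream after a schema change. A hypothetical in-notebook equivalent is a small retry loop: `run_with_schema_restarts` calls your stream function again when the failure looks like a schema change and re-raises anything else. Matching on the strings `UnknownFieldException` / `UNKNOWN_FIELD_EXCEPTION` in the exception text is an assumption about how the error surfaces in PySpark, so the predicate is a parameter you can adjust. If the goal is to not fail at all, Auto Loader's `cloudFiles.schemaEvolutionMode` option also accepts `rescue` and `none`, neither of which stops the stream on new columns.

```python
import time
from typing import Callable


def looks_like_schema_change(exc: BaseException) -> bool:
    """Assumed heuristic: the error text names the unknown-field condition."""
    text = str(exc)
    return "UnknownFieldException" in text or "UNKNOWN_FIELD_EXCEPTION" in text


def run_with_schema_restarts(
    run_stream: Callable[[], None],
    max_restarts: int = 3,
    wait_seconds: float = 0.0,
    is_schema_change: Callable[[BaseException], bool] = looks_like_schema_change,
) -> int:
    """Run `run_stream`, restarting it after schema-change failures.

    Returns how many restarts were needed. Other errors, or more than
    `max_restarts` schema-change failures, are re-raised unchanged.
    """
    restarts = 0
    while True:
        try:
            run_stream()
            return restarts
        except Exception as exc:
            if not is_schema_change(exc) or restarts >= max_restarts:
                raise
            restarts += 1
            time.sleep(wait_seconds)  # optional pause before restarting


# Usage sketch in a notebook (paths and table are placeholders). Build the
# DataFrame inside run_stream so each restart re-reads the evolved schema.
# def run_stream():
#     df = (spark.readStream.format("cloudFiles")
#           .option("cloudFiles.format", "parquet")
#           .option("cloudFiles.schemaLocation", "<schema-path>")
#           .load("<source-path>"))
#     (df.writeStream.option("checkpointLocation", "<checkpoint-path>")
#        .trigger(once=True).toTable("<target-table>")
#        .awaitTermination())
# run_with_schema_restarts(run_stream)
```
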
by RTabur (New Contributor II)
  • 1307 Views
  • 2 replies
  • 0 kudos

[Bug] Orphan storage location

Hello, I'm not able to re-create an external location after removing its owner from the Databricks account. I'm getting the following error: Input path url 'abfss://foo@bar.dfs.core.windows.net/' overlaps with an existing external location within 'CreateEx...

  • 1307 Views
  • 2 replies
  • 0 kudos
Latest Reply
PL_db
Databricks Employee
  • 0 kudos

Your metastore admin can list all external locations and then drop the external location.

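In practice, the admin runs `SHOW EXTERNAL LOCATIONS`, compares the URLs against the one being created, and runs `DROP EXTERNAL LOCATION <name>` on the one that overlaps. The comparison step can be sketched in Python. `find_overlapping_locations` is hypothetical and treats two URLs as overlapping when they share scheme and authority and one path is a segment-wise prefix of the other. This is a simplified model for spotting the culprit, not Unity Catalog's exact overlap rule.

```python
from typing import Dict, List, Tuple
from urllib.parse import urlparse


def _split(url: str) -> Tuple[str, str, Tuple[str, ...]]:
    """Split a storage URL into (scheme, authority, path segments)."""
    parsed = urlparse(url)
    segments = tuple(seg for seg in parsed.path.split("/") if seg)
    return parsed.scheme.lower(), parsed.netloc.lower(), segments


def find_overlapping_locations(new_url: str, existing: Dict[str, str]) -> List[str]:
    """Return names of existing locations whose URL equals, contains, or is
    contained in `new_url`, comparing paths segment by segment."""
    scheme, authority, new_path = _split(new_url)
    hits = []
    for name, url in existing.items():
        s, a, path = _split(url)
        if (s, a) != (scheme, authority):
            continue  # different container or storage account
        common = min(len(path), len(new_path))
        if path[:common] == new_path[:common]:
            hits.append(name)
    return sorted(hits)


# Usage sketch in a notebook (assumes SHOW EXTERNAL LOCATIONS returns name and url columns):
# existing = {r["name"]: r["url"] for r in spark.sql("SHOW EXTERNAL LOCATIONS").collect()}
# find_overlapping_locations("abfss://foo@bar.dfs.core.windows.net/", existing)
```
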