Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

ZacayDaushin
by New Contributor
  • 2122 Views
  • 1 reply
  • 0 kudos

Spline agent use in Databricks

I use the Spline agent to get lineage of Databricks notebooks, and for that I attached the following code to the notebook, but I get the error attached: %scala import scala.util.parsing.json.JSON import za.co.absa.spline.harvester.SparkLinea...

Latest Reply
-werners-
Esteemed Contributor III

Could be me, but I do not see an error message?

  • 0 kudos
ktsoi
by New Contributor III
  • 5211 Views
  • 4 replies
  • 0 kudos

Resolved! INVALID_STATE: Storage configuration limit exceeded, only 11 storage configurations are allowed

Our team is trying to set up a new workspace (our 8th workspace), but we failed to create the storage configurations required for the new workspace, with the error INVALID_STATE: Storage configuration limit exceeded, only 11 storage configurations are all...

Latest Reply
_Architect_
Databricks Partner

I solved the issue by going into Cloud Resources in the Databricks console, navigating to "Credential Configuration" and "Storage Configuration", and deleting all the configurations that are no longer needed (belonging to deleted workspaces). I ...
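For anyone who prefers to script this cleanup instead of clicking through the console, here is a minimal sketch using the Databricks account API. The endpoint paths and field names are assumptions based on the AWS account API documentation; the account ID, credentials, and the list of configurations to keep are placeholders.

```python
# Hedged sketch: list storage configurations via the Databricks account API
# and delete the ones no longer needed (e.g. from deleted workspaces).
# Endpoints and field names are assumed from the AWS account API docs.
import requests

ACCOUNT_ID = "<databricks-account-id>"                    # placeholder
BASE = f"https://accounts.cloud.databricks.com/api/2.0/accounts/{ACCOUNT_ID}"
AUTH = ("<account-admin-user>", "<password-or-token>")    # placeholder credentials

keep = {"storage-config-workspace-8"}                     # names still in use (placeholder)

configs = requests.get(f"{BASE}/storage-configurations", auth=AUTH).json()
for cfg in configs:
    if cfg["storage_configuration_name"] not in keep:
        requests.delete(
            f"{BASE}/storage-configurations/{cfg['storage_configuration_id']}",
            auth=AUTH,
        )
```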

  • 0 kudos
3 More Replies
Arinjay
by New Contributor
  • 2330 Views
  • 1 reply
  • 0 kudos

Cannot add a comment to a table via a CREATE TABLE statement

I am not able to add a comment using this CREATE TABLE statement with AS (query):

(screenshot of the CREATE TABLE statement attached)
Latest Reply
feiyun0112
Honored Contributor

  CREATE TABLE [ IF NOT EXISTS ] table_identifier [ ( col_name1 col_type1 [ COMMENT col_comment1 ], ... ) ] USING data_source [ OPTIONS ( key1=val1, key2=val2, ... ) ] [ PARTITIONED BY ( col_name1, col_name2, ... ) ] [ CLUSTERED B...
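As a concrete illustration of the syntax above, a minimal sketch of a CTAS statement that sets a table-level comment (run from a Databricks notebook where `spark` is already defined; catalog, schema, table, and column names are made up):

```python
# Hedged sketch: CREATE TABLE ... AS SELECT with a table-level COMMENT,
# plus a column comment added afterwards with ALTER TABLE.
spark.sql("""
    CREATE TABLE IF NOT EXISTS my_catalog.my_schema.orders_summary
    COMMENT 'Daily aggregation of orders, rebuilt by the nightly job'
    AS
    SELECT order_date, count(*) AS order_count
    FROM my_catalog.my_schema.orders
    GROUP BY order_date
""")

spark.sql("""
    ALTER TABLE my_catalog.my_schema.orders_summary
    ALTER COLUMN order_count COMMENT 'Number of orders per day'
""")
```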

  • 0 kudos
Haylyon
by New Contributor II
  • 12934 Views
  • 3 replies
  • 3 kudos

Missing 'DBAcademy DLT' as a Cluster Policy when creating Delta Live Tables pipeline

I am currently in the middle of the Data Engineering Associate course on the Databricks Partner Academy. I am on module 4 - "Build Data Pipelines with Delta Live Tables", and trying to complete the lab "DE 4.1 - DLT UI Walkthrough". I have successful...

Latest Reply
SeRo
New Contributor II

The policy will be available after running /Users/<YOUR USER NAME>/Data Engineering with Databricks - v3.1.4/Includes/Workspace-Setup.

  • 3 kudos
2 More Replies
brian999
by Contributor
  • 4948 Views
  • 3 replies
  • 0 kudos

Writing to Snowflake from Databricks - sqlalchemy replacement?

I am trying to migrate some complex Python load processes into Databricks. Our load processes currently use pandas and we're hoping to refactor into Spark soon. For now, I need to figure out how to alter our functions that get sqlalchemy connection e...

Latest Reply
shan_chandra
Databricks Employee

@brian999 - the spark-snowflake connector is built into the DBR. Please refer to the article below for examples: https://docs.databricks.com/en/connect/external-systems/snowflake.html#read-and-write-data-from-snowflake Please let us know if this hel...
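To complement the link above, a minimal sketch of writing a Spark DataFrame to Snowflake with the built-in connector; the connection options, secret scope, and table names are placeholders, and `df` is assumed to be an existing Spark DataFrame.

```python
# Hedged sketch: write a Spark DataFrame to Snowflake with the built-in
# spark-snowflake connector. Connection values are placeholders; secrets are
# read from a hypothetical secret scope named "snowflake".
sf_options = {
    "sfUrl": "<account>.snowflakecomputing.com",
    "sfUser": dbutils.secrets.get("snowflake", "user"),
    "sfPassword": dbutils.secrets.get("snowflake", "password"),
    "sfDatabase": "ANALYTICS",
    "sfSchema": "PUBLIC",
    "sfWarehouse": "LOAD_WH",
}

(df.write
   .format("snowflake")
   .options(**sf_options)
   .option("dbtable", "TARGET_TABLE")
   .mode("append")
   .save())
```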

  • 0 kudos
2 More Replies
kmodelew
by New Contributor III
  • 3554 Views
  • 1 reply
  • 0 kudos

TaskSensor - check if a task succeeded

Hi, I would like to check whether a task within a job succeeded (even if the job is marked as failed because of one of its tasks). I need to create dependencies for tasks within other jobs. The case is that I have one job for loading all tables for one country. Re...
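One way to check an individual task's outcome, regardless of the overall job result, is to query the Jobs API for the run and inspect the per-task result states. A hedged sketch (host, token, run ID, and task key are placeholders; field names are as I understand the Jobs API 2.1 response):

```python
# Hedged sketch: check the result state of a single task inside a job run
# via the Jobs API 2.1. All identifiers and credentials are placeholders.
import requests

host = "https://<workspace-host>"
token = "<pat-token>"
run_id = 123456                      # job run containing the task
task_key = "load_country_tables"     # hypothetical task name

resp = requests.get(
    f"{host}/api/2.1/jobs/runs/get",
    headers={"Authorization": f"Bearer {token}"},
    params={"run_id": run_id},
)
resp.raise_for_status()

# Map each task key to its result state (e.g. SUCCESS, FAILED).
task_states = {
    t["task_key"]: t.get("state", {}).get("result_state")
    for t in resp.json().get("tasks", [])
}
print(task_states.get(task_key) == "SUCCESS")
```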

JoseMacedo
by New Contributor II
  • 3831 Views
  • 3 replies
  • 0 kudos

How to cache on 500 billion rows

Hello! I'm using a serverless SQL cluster on Databricks and I have a dataset in a Delta table that has 500 billion rows. I'm trying to filter it down to around 7 billion rows and then cache that dataset to use it in other queries and make them run faster. When I ...

Latest Reply
-werners-
Esteemed Contributor III

I missed the 'serverless SQL' part. CACHE is for Spark; I don't think it works for serverless SQL. Here is how caching works on DBSQL.
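For serverless SQL, a common workaround (a sketch, not necessarily what the linked article recommends) is to materialize the filtered subset as its own Delta table and point the downstream queries at it; table and column names below are made up.

```python
# Hedged sketch: instead of CACHE, persist the ~7B-row subset as a Delta table
# so later queries scan only that table. Names are illustrative only.
spark.sql("""
    CREATE OR REPLACE TABLE analytics.events_recent
    AS SELECT *
    FROM analytics.events
    WHERE event_date >= '2024-01-01'
""")

# Downstream queries then read the smaller table:
spark.sql("SELECT count(*) FROM analytics.events_recent").show()
```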

  • 0 kudos
2 More Replies
yubin-apollo
by New Contributor II
  • 5660 Views
  • 4 replies
  • 0 kudos

COPY INTO skipRows FORMAT_OPTIONS does not work

Based on the COPY INTO documentation, it seems I can use `skipRows` to skip the first `n` rows. I am trying to load a CSV file where I need to skip the first few rows in the file. I have tried various combinations, e.g. setting the header parameter on or ...

Latest Reply
karthik-kobai
New Contributor II

@yubin-apollo: My bad - I had the skipRows in the COPY_OPTIONS and not in the FORMAT_OPTIONS. It works, please ignore my previous comment. Thanks
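For anyone landing here with the same issue, a minimal sketch of where `skipRows` goes (in FORMAT_OPTIONS, not COPY_OPTIONS); the path, target table, and option values are illustrative.

```python
# Hedged sketch: COPY INTO with skipRows placed in FORMAT_OPTIONS.
# Source path, target table, and option values are placeholders.
spark.sql("""
    COPY INTO my_schema.raw_events
    FROM 's3://my-bucket/landing/events/'
    FILEFORMAT = CSV
    FORMAT_OPTIONS ('skipRows' = '2', 'header' = 'true')
    COPY_OPTIONS ('mergeSchema' = 'true')
""")
```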

  • 0 kudos
3 More Replies
rchauhan
by New Contributor II
  • 24862 Views
  • 3 replies
  • 4 kudos

org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 188.0 failed 4

When I try to read data from SQL Server through a JDBC connection, I get the error below while merging the data into a Databricks table. Can you please help with what the issue is related to? org.apache.spark.SparkException: Job aborted due to stage...

Latest Reply
MDV
Databricks Partner

@rchauhan, did you find a solution to the problem, or do you know what settings caused it?

  • 4 kudos
2 More Replies
SankaraiahNaray
by New Contributor II
  • 4173 Views
  • 4 replies
  • 0 kudos

OPTIMIZE with liquid clustering makes filter slower than without OPTIMIZE

I created 15 million records as a Delta table and I'm running a simple filter query on that table based on one column value, which returns only one record because all the values in that column are unique. The Delta table is not partitioned. Before en...

Latest Reply
-werners-
Esteemed Contributor III

It seems that for this specific query liquid clustering has worse performance. It does not have better performance for all queries. The following are examples of scenarios that benefit from clustering: tables often filtered by high-cardinality columns...
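For readers comparing the two approaches, a minimal sketch of defining a table with liquid clustering and re-clustering it with OPTIMIZE; the table and clustering column are placeholders.

```python
# Hedged sketch: liquid clustering via CLUSTER BY, then OPTIMIZE (no ZORDER)
# to cluster newly written data. Table and column names are placeholders.
spark.sql("""
    CREATE TABLE IF NOT EXISTS sales.transactions (
        txn_id      BIGINT,
        customer_id BIGINT,
        txn_date    DATE,
        amount      DECIMAL(18, 2)
    )
    CLUSTER BY (txn_id)
""")

spark.sql("OPTIMIZE sales.transactions")
```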

  • 0 kudos
3 More Replies
mvmiller
by New Contributor III
  • 7048 Views
  • 2 replies
  • 3 kudos

Module not found, despite it being installed on job cluster?

We observed the following error in a notebook which was running from a Databricks workflow: ModuleNotFoundError: No module named '<python package>'. The error message speaks for itself - it obviously couldn't find the Python package. What is peculiar ...

Latest Reply
mvmiller
New Contributor III

Thanks, @Walter_C. Supposing that your second possible explanation, Cluster Initialization Timing, could be a factor, are there any best practices or recommendations for preventing this from being a recurring issue down the road?

  • 3 kudos
1 More Replies
Etyr
by Contributor II
  • 4885 Views
  • 2 replies
  • 1 kudos

[FinOps] Tagging queries in Databricks

Hello, I see that it is possible to tag catalogs/databases/tables, but I did not find a way to tag a query for our FinOps use case. In Azure you can check billing depending on tags. A concrete example: in Azure Machine Learning, I have a schedule that ...
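Not a per-query tag, but for context: one way tag-based cost attribution is often done on Databricks is by joining usage against tags in the billing system tables. A hedged sketch, assuming the `system.billing.usage` table exposes a `custom_tags` map column as documented; the tag key is a placeholder.

```python
# Hedged sketch: aggregate the last 30 days of DBU usage by a custom tag.
# Assumes system.billing.usage has usage_date, usage_quantity, sku_name and a
# custom_tags map column; 'cost_center' is a hypothetical tag key.
spark.sql("""
    SELECT
        custom_tags['cost_center'] AS cost_center,
        sku_name,
        SUM(usage_quantity)        AS dbus
    FROM system.billing.usage
    WHERE usage_date >= date_sub(current_date(), 30)
    GROUP BY custom_tags['cost_center'], sku_name
""").show()
```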

Latest Reply
Etyr
Contributor II

@yoav Hello, sorry, I am not interested in a paid solution.

  • 1 kudos
1 More Replies
Avi759787
by New Contributor
  • 3332 Views
  • 0 replies
  • 0 kudos

Driver is up but is not responsive, likely due to GC.

I am using an interactive cluster to run a frequent (every 15 min) batch job. After a certain time (for example, 6 hours), the cluster continuously starts showing "Driver is up but is not responsive, likely due to GC." in the event log and all jobs start failing. If the...

WearBeard
by New Contributor
  • 4694 Views
  • 1 reply
  • 0 kudos

Consume updated data from the Materialized view and send it as append to a streaming table

Hello everyone! I'm using DLT and I'm pretty new to it. I'm trying to take the updates from a materialized view and send them to a streaming table as an append. For example, if I have an MV of 400 records, I want an append to be made to the streaming...

Latest Reply
Priyanka_Biswas
Databricks Employee

Hi @WearBeard, by default streaming tables require append-only sources. The error you encountered is due to an update or delete operation on 'streaming_table_test'. To fix this issue, perform a full refresh on the 'streaming_table_test' table. You ca...

  • 0 kudos