Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Frustrated_DE
by New Contributor III
  • 2284 Views
  • 4 replies
  • 0 kudos

Delta live tables multiple .csv diff schemas

Hi all, I have a fairly straightforward task whereby I am looking to ingest six .csv files, all with different names, schemas, and blob locations, into individual tables in one bronze schema. I have the files in my landing zone under different fol...

Latest Reply
Frustrated_DE
New Contributor III
  • 0 kudos

The code follows a similar pattern to the below to load the different tables:

import dlt
import re
import pyspark.sql.functions as F

landing_zone = '/Volumes/bronze_dev/landing_zone/'
source = 'addresses'

@dlt.table(comment="addresses snapshot", name="addresses")
de...

3 More Replies
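The truncated reply above repeats one @dlt.table definition per file. A common way to handle six differently named, differently shaped CSVs is a small table factory: loop over a name-to-path config and register one table per entry, binding the loop variables once per call. The sketch below uses a plain decorator registry as a stand-in for @dlt.table (and a placeholder instead of a real spark.read) so the binding pattern can be shown outside a pipeline; the source names and paths are hypothetical.

```python
# Stand-in for dlt.table: registers each generated function by name.
# In a real DLT pipeline you would use @dlt.table(name=...) here instead.
TABLES = {}

def register_table(name):
    def decorator(fn):
        TABLES[name] = fn
        return fn
    return decorator

# Hypothetical config: one entry per CSV source (name -> landing folder).
SOURCES = {
    "addresses": "/Volumes/bronze_dev/landing_zone/addresses",
    "customers": "/Volumes/bronze_dev/landing_zone/customers",
}

def make_table(name, path):
    # The factory captures name/path once per call, avoiding the
    # late-binding bug of decorating a function directly inside the loop.
    @register_table(name)
    def load():
        # Placeholder for: spark.read.format("csv").option(...).load(path)
        return f"csv:{path}"
    return load

for src_name, src_path in SOURCES.items():
    make_table(src_name, src_path)
```

Because each table can have its own schema, the per-source config dict can also carry a schema or reader options per entry.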
Braxx
by Contributor II
  • 12151 Views
  • 4 replies
  • 3 kudos

Resolved! cluster creation - access mode option

I am a bit lazy and trying to manually recreate a cluster I have in one workspace into another one. The cluster was created some time ago. Looking at the configuration, the access mode field is "custom": When trying to create a new cluster, I do not...

Latest Reply
khushboo20
New Contributor II
  • 3 kudos

Hi All - I am new to Databricks and trying to create my first workflow. For some reason, the cluster created is of type "custom"; I have not mentioned it anywhere in my asset bundle. Due to this I cannot get the Unity Catalog feature. Could ...

3 More Replies
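A cluster typically surfaces as access mode "custom" when its spec does not set a data-security (access) mode explicitly. In a Databricks Asset Bundle, the relevant field on a job's new cluster is data_security_mode; a minimal fragment as a sketch, with hypothetical resource and node type names:

```yaml
resources:
  jobs:
    my_job:
      job_clusters:
        - job_cluster_key: main
          new_cluster:
            spark_version: 15.4.x-scala2.12
            node_type_id: Standard_D4ads_v5
            num_workers: 2
            # Without this field the cluster shows as "custom" and
            # Unity Catalog features may be unavailable.
            data_security_mode: SINGLE_USER   # or USER_ISOLATION for shared
```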
tonyd
by New Contributor II
  • 1066 Views
  • 1 reply
  • 0 kudos

Getting error "Serverless Generic Compute Cluster Not Supported For External Creators."

Getting the above-mentioned error while creating serverless compute. This is the request:

curl --location 'https://adb.azuredatabricks.net/api/2.0/clusters/create' \
--header 'Content-Type: application/json' \
--header 'Authorization: ••••••' \
--data '{...

Latest Reply
saikumar246
Databricks Employee
  • 0 kudos

Hi @tonyd, thank you for reaching out to the Databricks Community. You are trying to create a Serverless Generic Compute cluster, which is not supported; you cannot create a serverless compute cluster this way. As per the below link, if you observe, there is no...

PushkarDeole
by New Contributor III
  • 2316 Views
  • 2 replies
  • 0 kudos

Unable to set shuffle partitions on DLT pipeline

Hello, we are using a 5-worker-node DLT job compute for a continuous-mode streaming pipeline. The worker configuration is Standard_D4ads_v5, i.e. 4 cores, so the total across 5 workers is 20 cores. We have wide transformations at some places in the pipe...

Latest Reply
gchandra
Databricks Employee
  • 0 kudos

Try setting spark.sql.shuffle.partitions to auto.

1 More Replies
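For a DLT pipeline, Spark confs such as the one suggested above are set in the pipeline settings rather than on the compute by hand; a minimal fragment of the pipeline's JSON settings as a sketch:

```json
{
  "configuration": {
    "spark.sql.shuffle.partitions": "auto"
  }
}
```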
ckwan48
by New Contributor III
  • 25838 Views
  • 6 replies
  • 3 kudos

Resolved! How to prevent my cluster to shut down after inactivity

Currently, I am running a cluster that is set to terminate after 60 minutes of inactivity. However, in one of my notebooks, one of the cells is still running. How can I prevent this from happening if I want my notebook to run overnight without monito...

Latest Reply
AmanSehgal
Honored Contributor III
  • 3 kudos

If a cell is already running (I assume it's a streaming operation), then I think it doesn't mean that the cluster is inactive. The cluster should be running if a cell is running on it. On the other hand, if you want to keep running your clusters for ...

5 More Replies
Check
by New Contributor
  • 3466 Views
  • 1 reply
  • 0 kudos

How to call azure databricks api from azure api management

Hi, has anyone successfully configured Azure APIM to access the Databricks REST API? If yes, I would appreciate a setup guide, as I am stuck at this point. Thanks.

Latest Reply
kkgupta
New Contributor II
  • 0 kudos

@Check Did you manage to complete the mosaic gateway URL setup in Azure APIM? @Retired_mod Do we have any other generic link, like a Databricks document, that we can refer to? Thanks.

CE
by New Contributor II
  • 1878 Views
  • 2 replies
  • 0 kudos

How to set up Git integration with multiple GitLab repos

I have 3 GitLab repos in my Databricks workspace. I have also generated personal access tokens for these 3 repos. However, it seems that Databricks can only use one repo token at a time for Git integration. For example: I am currently using the token for repo...

Latest Reply
nicole_lu_PM
Databricks Employee
  • 0 kudos

Unfortunately this is expected behavior. We only support 1 Git credential at a time per user in the workspace. We are adding a sample notebook in this section to help you swap Git credentials more easily: https://docs.databricks.com/en/repos/repos-s...

1 More Replies
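Since only one Git credential per user is supported, switching between the three GitLab repos means updating that single credential in place. A sketch using the Git Credentials REST API (PATCH /api/2.0/git-credentials/{credential_id}); the host, token, and credential ID below are hypothetical placeholders:

```python
import json
import urllib.request

HOST = "https://adb-1234567890123456.7.azuredatabricks.net"  # hypothetical workspace URL
TOKEN = "dapi-example"  # hypothetical Databricks PAT

def swap_git_credential(credential_id, git_username, gitlab_token):
    """Build a PATCH request that repoints the user's single Git
    credential at a different GitLab personal access token."""
    body = json.dumps({
        "git_provider": "gitLab",
        "git_username": git_username,
        "personal_access_token": gitlab_token,
    }).encode()
    return urllib.request.Request(
        f"{HOST}/api/2.0/git-credentials/{credential_id}",
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {TOKEN}",
            "Content-Type": "application/json",
        },
    )

# To send: urllib.request.urlopen(swap_git_credential(123, "me", "glpat-..."))
```

The existing credential's ID can be listed first via GET /api/2.0/git-credentials.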
mppradeesh
by New Contributor
  • 1013 Views
  • 1 reply
  • 1 kudos

Connecting to redshift from databricks notebooks without password using IAM

Hello all, have you ever tried to connect to Redshift from Databricks notebooks without a password, using IAM? Pradeesh M P

Latest Reply
datastones
Contributor
  • 1 kudos

You should create an IAM assumed role, adding as the principal the AWS account that hosts your Databricks environment.

  • 1 kudos
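To make the reply above concrete: with the Databricks Redshift connector, the role-based path passes the assumed role's ARN via the aws_iam_role option instead of a username/password. All values below are hypothetical placeholders; the actual read is left as a comment since it needs a live cluster.

```python
# Hypothetical endpoint, bucket, and role ARN. Option names follow the
# Databricks Redshift connector; aws_iam_role replaces user/password auth.
REDSHIFT_OPTIONS = {
    "url": "jdbc:redshift://example-cluster.abc123.us-east-1.redshift.amazonaws.com:5439/dev",
    "dbtable": "public.sales",
    "tempdir": "s3a://example-bucket/redshift-temp/",
    "aws_iam_role": "arn:aws:iam::123456789012:role/example-redshift-role",
}

# In a notebook:
# df = spark.read.format("redshift").options(**REDSHIFT_OPTIONS).load()
```

The role must be attached to the Redshift cluster and trusted by the account running Databricks, per the reply above.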
lisaiyer
by New Contributor II
  • 1533 Views
  • 3 replies
  • 0 kudos

Resolved! fs ls lists files that I cannot navigate to and view

Hi Community - I have an issue and I did not find any effective solution, so I am hoping someone can help here. When I use %fs ls "dbfs:/Workspace/Shared/" I see 2 folders, but when I navigate to the folder I only see 1. Can someone help me with this issue....

Latest Reply
gchandra
Databricks Employee
  • 0 kudos

Possibly a UI bug. Submit a case with the engineering team: https://help.databricks.com/s/ (top right, SUBMIT CASE).

2 More Replies
Srini41
by New Contributor
  • 1173 Views
  • 1 reply
  • 0 kudos

org.rocksdb.RocksDBException: No space left on device

A structured stream is failing intermittently with the following message: org.rocksdb.RocksDBException: No space left on device. Do we have a setting on Databricks to assign limited disk space to the checkpoint tracking? Appreciate any help with resolving ...

Latest Reply
gchandra
Databricks Employee
  • 0 kudos

Can you share details like the DBR version, and are you using any forEachBatch?

my_community2
by New Contributor III
  • 22821 Views
  • 9 replies
  • 6 kudos

Resolved! dropping a managed table does not remove the underlying files

The documentation states that "drop table": Deletes the table and removes the directory associated with the table from the file system if the table is not an EXTERNAL table. An exception is thrown if the table does not exist. In case of an external table...

Latest Reply
MajdSAAD_7953
New Contributor II
  • 6 kudos

Hi, is there a way to force-delete files after dropping the table, rather than waiting 30 days to see the size in S3 decrease? The tables I dropped relate to dev and staging; I don't want to keep their files for 30 days.

8 More Replies
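On the force-delete question in the reply above: one commonly used, destructive shortcut is to VACUUM the managed table with a zero-hour retention before dropping it, after disabling the retention safety check. This permanently removes time-travel history, so it is only reasonable for dev/staging tables. The table name below is hypothetical; a sketch of the statements, each run with spark.sql in a notebook:

```python
# Destructive: bypasses the default 7-day retention safety net.
TABLE = "dev_catalog.staging.events"  # hypothetical table name

statements = [
    # Allow VACUUM with a retention shorter than the configured minimum.
    "SET spark.databricks.delta.retentionDurationCheck.enabled = false",
    # Delete all data files not referenced by the current table version.
    f"VACUUM {TABLE} RETAIN 0 HOURS",
    # Then drop the (now minimal) managed table.
    f"DROP TABLE {TABLE}",
]

# In a notebook: for stmt in statements: spark.sql(stmt)
```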
PearceR
by New Contributor III
  • 17236 Views
  • 4 replies
  • 1 kudos

Resolved! custom upsert for delta live tables apply_changes()

Hello community :). I am currently implementing some pipelines using DLT. They are working great for my medallion architecture: landed JSON in bronze -> silver (using apply_changes), then materialized gold views on top. However, I am attempting to crea...

Latest Reply
Harsh141220
New Contributor II
  • 1 kudos

Is it possible to have custom upserts for streaming tables in Delta Live Tables? I'm getting the error: pyspark.errors.exceptions.captured.AnalysisException: `blusmart_poc.information_schema.sessions` is not a Delta table.

3 More Replies
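On the custom-upsert question in the reply above: apply_changes() does not expose a custom merge, so a common escape hatch is a plain Structured Streaming job whose foreachBatch callback runs a Delta MERGE. The target and key names below are hypothetical, and the delta import is deferred into the function so the sketch can be defined without a cluster; note the target must actually be a Delta table, which is what the AnalysisException above is complaining about.

```python
def upsert_batch(batch_df, batch_id):
    """foreachBatch callback: MERGE each micro-batch into a Delta target."""
    from delta.tables import DeltaTable  # available on Databricks runtimes

    target = DeltaTable.forName(batch_df.sparkSession, "main.silver.sessions")
    (target.alias("t")
        .merge(batch_df.alias("s"), "t.session_id = s.session_id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute())

# Wiring (in a notebook):
# (source_df.writeStream
#     .foreachBatch(upsert_batch)
#     .option("checkpointLocation", "/Volumes/main/silver/_ckpt/sessions")
#     .start())
```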
Ambika
by New Contributor
  • 6297 Views
  • 2 replies
  • 1 kudos

Error while Resetting my Community Edition Password

I recently tried to create my account with Databricks Community Edition. I have signed up for it and received a verification email. After that I had to reset my password, but while doing so I always get the following error. Can someone help me ...

Latest Reply
swredb
New Contributor II
  • 1 kudos

Receiving the same error when creating a new account - "An error has occurred. Please try again later."

1 More Replies
pinaki1
by New Contributor III
  • 2355 Views
  • 1 reply
  • 0 kudos

PySparkRuntimeError: [CONTEXT_ONLY_VALID_ON_DRIVER] It appears that you are attempting to reference

Getting the above error for this line: result_df.rdd.foreachPartition(self.process_partition)

Latest Reply
Pradeep54
Databricks Employee
  • 0 kudos

The error message "CONTEXT_ONLY_VALID_ON_DRIVER" indicates that you are attempting to reference SparkContext from a broadcast variable, action, or transformation. SparkContext can only be used on the driver, not in code that runs on workers. This is ...

  • 0 kudos
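The truncated reply above identifies the cause; the usual fix is to pass a module-level function to foreachPartition instead of the bound method self.process_partition, since pickling a bound method drags self (and any SparkSession/SparkContext it holds) to the executors. A runnable sketch with a pure function that captures only plain serializable values; the batch size and row handling are hypothetical:

```python
BATCH_SIZE = 100  # plain value: safe for workers to capture

def process_partition(rows):
    """Executor-side work: must not reference spark, sc, or self."""
    sizes = []
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) >= BATCH_SIZE:
            # e.g. write the full batch to an external system here
            sizes.append(len(batch))
            batch = []
    if batch:
        sizes.append(len(batch))  # flush the final partial batch
    return sizes

# In the pipeline, replace:
#   result_df.rdd.foreachPartition(self.process_partition)
# with:
#   result_df.rdd.foreachPartition(process_partition)
```

If the logic genuinely needs instance state, pass the needed fields in as plain arguments (e.g. via functools.partial) rather than the whole object.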