Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Adrianj
by New Contributor III
  • 5168 Views
  • 10 replies
  • 6 kudos

Databricks Bundles - How to select which job resources to deploy per target?

Hello, my team and I are experimenting with bundles. We follow the pattern of having one main databricks.yml file, with each job definition specified in a separate YAML file for modularization. We wonder if it is possible to select from the main Databricks....

Latest Reply
thibault
Contributor II
  • 6 kudos

Hi @Adrianj , which solution did you go for? I have 4 deployment targets so I would like to avoid having to create 4 folders with many duplicates.

9 More Replies
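
For readers with the same question: Databricks Asset Bundles let each target declare its own resources, so jobs defined under a target deploy only there. A minimal sketch (bundle, job, and host names are hypothetical):

```yaml
# databricks.yml -- a sketch of per-target job selection, not a full config
bundle:
  name: my_bundle

include:
  - resources/*.yml              # jobs shared by every target

targets:
  dev:
    workspace:
      host: https://adb-dev.azuredatabricks.net   # placeholder host
    resources:
      jobs:
        dev_smoke_test:          # declared inline: deployed only with -t dev
          name: dev-smoke-test
  prod:
    workspace:
      host: https://adb-prod.azuredatabricks.net  # placeholder host
```

Deploying with `databricks bundle deploy -t dev` merges the target-level resources into the bundle for that target only.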
Spencer_Kent
by New Contributor III
  • 7458 Views
  • 11 replies
  • 3 kudos

Shared cluster configuration that permits `dbutils.fs` commands

My workspace has a couple of different types of clusters, and I'm having issues using the `dbutils` filesystem utilities when connected to a shared cluster. I'm hoping you can help me fix the configuration of the shared cluster so that I can actually us...

Labels: insufficient_permissions_on_shared_cluster, shared_cluster_config, individual_use_cluster
Latest Reply
jacovangelder
Contributor III
  • 3 kudos

Can you not use a No Isolation Shared cluster with Table access controls enabled at the workspace level?

10 More Replies
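
For anyone else hitting insufficient-permissions errors: on Unity Catalog shared access mode, `dbutils.fs` generally works against UC volume paths even where the DBFS root is restricted. A rough sketch, with a hypothetical volume path:

```python
# List and write through a Unity Catalog volume instead of the DBFS root;
# the catalog/schema/volume below are made-up names.
files = dbutils.fs.ls("/Volumes/main/default/landing/")
for f in files:
    print(f.path, f.size)

# Writes work the same way, subject to your UC privileges:
dbutils.fs.put("/Volumes/main/default/landing/hello.txt", "hello", True)
```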
bgerhardi
by New Contributor III
  • 6444 Views
  • 12 replies
  • 13 kudos

Surrogate Keys with Delta Live

We are considering moving to Delta Live Tables from a traditional SQL-based data warehouse. What worries me is this FAQ on identity columns (Delta Live Tables frequently asked questions | Databricks on AWS), which seems to suggest that we basically can't cre...

Latest Reply
Anonymous
Not applicable
  • 13 kudos

Hi @Brett Gerhardi, hope all is well! Just wanted to check in if you were able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you. Tha...

11 More Replies
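
Since the thread closed without a fix, a hedged sketch of one common workaround: derive a deterministic hash-based surrogate key instead of relying on identity columns (table and column names are hypothetical):

```python
import dlt
from pyspark.sql import functions as F

@dlt.table(name="dim_customer")
def dim_customer():
    # xxhash64 over the business key yields a stable surrogate key without
    # identity-column support; teams worried about hash collisions often
    # use sha2 plus a key-map table instead.
    src = dlt.read("customer_raw")  # assumes an upstream DLT table
    return src.withColumn("customer_sk",
                          F.xxhash64("customer_id", "source_system"))
```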
vinaykumar
by New Contributor III
  • 2385 Views
  • 4 replies
  • 1 kudos

Can we define a custom session variable for login-user authentication in Databricks for row/column-level security?

Can we create a custom session variable for login-user authentication in Databricks, like HANA session variables? We have scenarios like today's Spotfire setup, where we use a single generic user to connect to HANA (we don't have single sign-on enabled) in th...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @vinay kumar, hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so w...

3 More Replies
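
For later readers: Databricks has no HANA-style session variables, but the usual substitute is a dynamic view keyed on the caller's identity. A minimal sketch, assuming a hypothetical sales table and region_acl mapping table:

```python
# current_user() and is_account_group_member() are evaluated per query,
# which gives per-user row filtering and column masking.
spark.sql("""
    CREATE OR REPLACE VIEW sales_secure AS
    SELECT
        s.region,
        s.order_id,
        CASE WHEN is_account_group_member('finance')
             THEN s.amount ELSE NULL END AS amount   -- column masking
    FROM sales s
    WHERE EXISTS (                                   -- row filtering
        SELECT 1 FROM region_acl a
        WHERE a.user_name = current_user() AND a.region = s.region
    )
""")
```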
Wolverine
by New Contributor III
  • 237 Views
  • 3 replies
  • 1 kudos

Databricks Magic Command

I am trying a few commands: what is the equivalent magic command of dbutils.fs.rm("dbfs:/sampledir", True)? Actually, I am looking at how to use magic commands the same way as dbutils. For instance, dbutils.fs.head('dbfs:/FileStore/<<name>>.csv', 10) gives 10...

Latest Reply
Witold
New Contributor III
  • 1 kudos

You could use shell commands, like %sh rm -r sampledir. You need to check for the correct path first; I currently don't know exactly where DBFS folders are mounted.

2 More Replies
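
To make the mapping concrete: DBFS is exposed to shell magics through the /dbfs FUSE mount (where the workspace allows it), so each `dbutils.fs` call has a `%sh` counterpart. A sketch using the paths from the question:

```python
# dbutils form:
dbutils.fs.rm("dbfs:/sampledir", True)

# Shell-magic equivalent, in its own cell (note /dbfs, not dbfs:/):
# %sh rm -r /dbfs/sampledir

# Likewise, dbutils.fs.head('dbfs:/FileStore/<<name>>.csv', 10) maps roughly to:
# %sh head -c 10 /dbfs/FileStore/<<name>>.csv
```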
ajithgaade
by New Contributor III
  • 416 Views
  • 8 replies
  • 2 kudos

Autoloader includeExistingFiles with retry didn't update the schema

Hi, written in PySpark. A Databricks Auto Loader job with retry didn't merge/update the schema: spark.readStream.format("cloudFiles").option("cloudFiles.format", "parquet").option("cloudFiles.schemaLocation", checkpoint_path).option("cloudFiles.includeExis...

Latest Reply
mtajmouati
New Contributor II
  • 2 kudos

Hello, try this:
from pyspark.sql import SparkSession
# Initialize Spark session
spark = SparkSession.builder \
    .appName("Auto Loader Schema Evolution") \
    .getOrCreate()
# Source and checkpoint paths
source_path = "s3://path"
checkpoint_pa...

7 More Replies
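
A self-contained sketch of the pattern above with the schema-evolution options spelled out (bucket paths and the target table are placeholders). With addNewColumns, the stream stops on purpose when a new column appears and picks up the merged schema on the retry:

```python
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "parquet")
    .option("cloudFiles.schemaLocation", "s3://bucket/_schemas/events")
    .option("cloudFiles.includeExistingFiles", "true")
    .option("cloudFiles.schemaEvolutionMode", "addNewColumns")
    .load("s3://bucket/raw/events")
)

(
    df.writeStream
    .option("checkpointLocation", "s3://bucket/_checkpoints/events")
    .option("mergeSchema", "true")   # let the Delta sink evolve as well
    .trigger(availableNow=True)
    .toTable("bronze.events")
)
```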
ksenija
by Contributor
  • 371 Views
  • 5 replies
  • 0 kudos

DLT pipeline - DebeziumJDBCMicroBatchProvider not found

Hi! I created a DLT pipeline and I'm getting this error: [STREAM_FAILED] Query [id = ***, runId = ***] terminated with exception: object com.databricks.cdc.spark.DebeziumJDBCMicroBatchProvider not found. I'm using Serverless. How to verify that the require...

Labels: Data Engineering, DebeziumJDBCMicroBatchProvider, dlt
Latest Reply
ksenija
Contributor
  • 0 kudos

@Dnirmania, @jlachniet I didn't manage to resolve this issue, but I created a regular notebook and I'm using a MERGE statement. If you can't merge all data at once, you can use a loop with hourly intervals.

4 More Replies
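
For completeness, a bare-bones version of that workaround; table names, the match key, and the time window are hypothetical:

```python
# A plain-notebook MERGE in place of the DLT CDC source. If one pass is too
# large, bound the source by a time window and loop over windows.
spark.sql("""
    MERGE INTO target AS t
    USING (
        SELECT * FROM updates
        WHERE ingest_ts >= '2024-07-01T00:00:00'
          AND ingest_ts <  '2024-07-01T01:00:00'
    ) AS s
    ON t.id = s.id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")
```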
EDDatabricks
by Contributor
  • 1105 Views
  • 3 replies
  • 0 kudos

Concurrency issue with append-only writes

Dear all, we have a PySpark streaming job (DBR: 14.3) that continuously writes new data to a Delta table (TableA). On this table, there is a PySpark batch job (DBR: 14.3) that operates every 15 minutes and in some cases may delete some records from ...

Labels: Data Engineering, Concurrency, DBR 14.3, delta, MERGE
Latest Reply
Dilisha
New Contributor II
  • 0 kudos

Hi @EDDatabricks - were you able to find the fix for this? I am also facing a similar issue. Added more details here - Getting concurrent Append exception after upgradin... - Databricks Community - 76521

2 More Replies
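
A sketch of the usual mitigation, assuming TableA is partitioned by a date column (column names are hypothetical): make the batch job's predicate explicit about the partitions it touches, so Delta's conflict checker can prove the streaming appends are disjoint:

```python
# Without the partition predicate the DELETE logically reads the whole
# table, and any concurrent append can raise ConcurrentAppendException;
# with it, appends to other partitions no longer conflict.
spark.sql("""
    DELETE FROM tableA
    WHERE event_date = '2024-06-01'   -- explicit partition predicate
      AND is_invalid = true
""")
```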
Yulei
by New Contributor III
  • 7786 Views
  • 5 replies
  • 1 kudos

Resolved! Could not reach driver of cluster

Hi, recently I am seeing the issue 'Could not reach driver of cluster <some_id>' with my Structured Streaming job when migrating to Unity Catalog, and found this when checking the traceback: Traceback (most recent call last): File "/databricks/python_shell/...

Latest Reply
Kub4S
New Contributor II
  • 1 kudos

To expand on the same error "Could not reach driver of cluster XX" but a different cause: the reason in my case (an ADF-triggered Databricks job that runs into this error) was a problem with a numpy library version, where the solution is to downgrade the libr...

4 More Replies
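
In case it helps the next reader, the shape of that fix in a notebook; the pinned version is purely illustrative, so pick whatever matches your DBR:

```python
# Run in its own cell; restartPython() reloads the environment so the
# downgraded numpy is actually picked up by the driver.
%pip install numpy==1.26.4
dbutils.library.restartPython()
```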
FhSpZ
by New Contributor II
  • 137 Views
  • 2 replies
  • 0 kudos

Error AgnosticEncoder.isStruct() in IntelliJ using Scala locally

I've been trying to connect to Azure Databricks from IntelliJ using Scala locally, but I got the error below: Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.encoders.AgnosticEncoder.isStruct()Z at o...

Latest Reply
FhSpZ
New Contributor II
  • 0 kudos

Hi @Kaniz_Fatma, I ensured that I was using the correct Spark version that matched my Databricks Runtime version, which was the same. But I tried using Spark version 3.5.1 locally in the .sbt dependencies, and then it worked, which is kind of strange. Anywa...

1 More Reply
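
For anyone hitting the same NoSuchMethodError: the gist of the fix is aligning the local Spark client with the remote runtime. A sketch of the sbt side, using the version the poster reported rather than a universal rule:

```scala
// build.sbt -- pin local Spark artifacts to a version compatible with the
// cluster; a mismatch surfaces as binary errors such as
// java.lang.NoSuchMethodError: ...AgnosticEncoder.isStruct()Z
libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.5.1"
```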
avrm91
by New Contributor III
  • 462 Views
  • 4 replies
  • 1 kudos

How to load xlsx Files to Delta Live Tables (DLT)?

I want to load a .xlsx file into DLT but am struggling, as the format is not available with Auto Loader. With the Assistant I tried to load the .xlsx into a DataFrame first and then send it to DLT:
import dlt
from pyspark.sql import SparkSession
# Load xlsx file in...

Latest Reply
avrm91
New Contributor III
  • 1 kudos

Added a feature request to the Azure Community Portal: XLSX - DLT Autoloader · Community (azure.com)

3 More Replies
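
Until native support exists, a hedged sketch of the pandas route from the question (assumes openpyxl is installed on the cluster; the volume path and table name are hypothetical):

```python
import dlt
import pandas as pd

@dlt.table(name="bronze_xlsx")
def bronze_xlsx():
    # Auto Loader has no xlsx format, so read the workbook with pandas and
    # convert; fine for small files, not for large or streaming loads.
    pdf = pd.read_excel("/Volumes/main/default/raw/report.xlsx",
                        engine="openpyxl")
    return spark.createDataFrame(pdf)
```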
avrm91
by New Contributor III
  • 146 Views
  • 2 replies
  • 1 kudos

XBRL File Format

I was searching for some XBRL documentation for Databricks, as it is a standard business reporting format, especially for DLT and Auto Loader. Is there anything in the development pipeline?

Latest Reply
avrm91
New Contributor III
  • 1 kudos

I added the XBRL request to the Azure community: XBRL · Community (azure.com)

1 More Reply
aothman
by New Contributor
  • 101 Views
  • 1 reply
  • 0 kudos

Databricks on AWS Outposts rack

We need to know if Databricks can be implemented on private cloud infrastructure or on an AWS Outposts rack.

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @aothman, Databricks currently supports public cloud providers such as Azure, GCP, and AWS, but it does not offer direct support for private cloud implementations beyond these. While Databricks itself does not directly support Outposts, you ca...

hfyhn
by New Contributor
  • 118 Views
  • 1 reply
  • 0 kudos

DLT, combine LIVE table with data masking and row filter

I need to apply data masking and row filters to my table. At the same time, I would like to use DLT live tables. However, as far as I can see, data masking and row filters are not compatible with DLT live tables. What are my options? Move the tables from out of the mat...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @hfyhn, You’re correct that DLT Live tables don’t inherently support writing data from multiple live tables into a single target Delta table. This means you cannot directly configure two or more DLT pipelines to update the same table simultaneousl...

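
One concrete variant of moving the protection downstream: leave the DLT table untouched and expose a governed view on top of it. A sketch with hypothetical names and groups:

```python
# Masking and row filtering applied in a view over the DLT output table;
# consumers query the view rather than the pipeline table.
spark.sql("""
    CREATE OR REPLACE VIEW reporting.customers_protected AS
    SELECT
        id,
        CASE WHEN is_account_group_member('pii_readers')
             THEN email ELSE sha2(email, 256) END AS email,   -- masking
        region
    FROM live_data.customers
    WHERE is_account_group_member('all_regions')
       OR region = 'EU'                                       -- row filter
""")
```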
FrankTa
by New Contributor II
  • 443 Views
  • 2 replies
  • 2 kudos

Resolved! Unstable workflow runs lately

Hi! We have been using Databricks on Azure in production for about 3 months. A big part of what we use Databricks for is processing data using a workflow with various Python notebooks. We run the workflow on a 'Pools' cluster and on an 'All-purpose compute...

Latest Reply
FrankTa
New Contributor II
  • 2 kudos

Hi holly, thanks for your reply; good to hear that the 403 errors are on the radar and due to be fixed. I will reach out to support in case of further issues.

1 More Reply