Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Anonymous
by Not applicable
  • 40621 Views
  • 7 replies
  • 0 kudos

Resolved! Tuning shuffle partitions

Is the best practice for tuning shuffle partitions to have the config "autoOptimizeShuffle.enabled" on? I see it is not switched on by default. Why is that?

Latest Reply
mtajmouati
Contributor
  • 0 kudos

AQE applies to all queries that are:
  • Non-streaming
  • Contain at least one exchange (usually when there’s a join, aggregate, or window), one sub-query, or both.
Not all AQE-applied queries are necessarily re-optimized. The re-optimization might or might no...

6 More Replies
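To make the thread above concrete, here is a minimal sketch of switching on the two settings in question on an existing SparkSession. The config keys are real Spark/Databricks settings; the helper function itself is hypothetical and not official guidance.

```python
# Sketch only: the two shuffle-tuning configs discussed in the thread.
SHUFFLE_TUNING_CONFS = {
    # Adaptive Query Execution: re-optimizes plans at runtime, including
    # coalescing shuffle partitions (on by default in recent DBRs).
    "spark.sql.adaptive.enabled": "true",
    # Auto-optimized shuffle: lets Databricks pick shuffle partition counts;
    # it is opt-in, which is why it is not switched on by default.
    "spark.databricks.adaptive.autoOptimizeShuffle.enabled": "true",
}

def apply_shuffle_tuning(spark, confs=SHUFFLE_TUNING_CONFS):
    """Set each config on the session and return the applied mapping."""
    for key, value in confs.items():
        spark.conf.set(key, value)
    return confs
```

In a notebook you would call `apply_shuffle_tuning(spark)` once at the top; whether auto-optimized shuffle helps depends on the workload, which is presumably why it is left off by default.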
ayush25091995
by New Contributor III
  • 1414 Views
  • 1 replies
  • 0 kudos

How to pass page_token when calling the API to get query history in a SQL warehouse

Hi, each query ID is getting duplicated on the next page when calling the query history API for a SQL warehouse, even though the page token is different for different pages. How should we pass the page token? Since in the Databricks doc, it is mentioned w...

Latest Reply
ayush25091995
New Contributor III
  • 0 kudos

Any help on this, please?

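Since the thread went unanswered, here is a hedged sketch of the usual cursor loop for GET /api/2.0/sql/history/queries. The response field names (`res`, `next_page_token`) follow the public REST docs; `fetch_page` stands in for the real HTTP call and is an assumption of this sketch.

```python
# Sketch of following next_page_token for the SQL warehouse query history
# API. `fetch_page(token)` is a stand-in for the actual authenticated GET.
def iter_query_history(fetch_page):
    """Yield each query record, requesting pages until the token runs out."""
    page_token = None
    while True:
        page = fetch_page(page_token)
        for record in page.get("res", []):
            yield record
        page_token = page.get("next_page_token")
        if not page_token:
            break
```

With `requests`, `fetch_page` would pass `page_token` as a query parameter on follow-up calls; the docs treat a token request as a continuation, so re-sending the original filter parameters together with the token is one plausible cause of the duplicated pages described above.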
oakhill
by New Contributor III
  • 2294 Views
  • 4 replies
  • 0 kudos

Optimal process for loading data where the full dataset is provided every day?

We receive several datasets where the full dump is delivered daily or weekly. What is the best way to ingest this into Databricks using DLT or basic PySpark while adhering to the medallion architecture? 1. If we use Auto Loader into Bronze, we'd end up with increme...

Latest Reply
dbrx_user
Databricks Partner
  • 0 kudos

Agree with @Witold to apply CDC as early as possible. Depending on where the initial files get deposited, I'd recommend having an initial raw layer to your medallion which is just your cloud storage account - so each day or week the files get deposit...

3 More Replies
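One way to see the "apply CDC as early as possible" advice concretely: derive insert/update/delete events by diffing consecutive full dumps on a primary key. The sketch below is plain Python for illustration only (the key name and row layout are assumptions); at scale the same comparison is what a Delta MERGE or a DLT apply-changes flow performs.

```python
# Illustrative snapshot diff: turn two full dumps into change events.
def diff_snapshots(previous, current, key="id"):
    """Return (inserts, updates, deletes) between two full snapshots."""
    prev = {row[key]: row for row in previous}
    curr = {row[key]: row for row in current}
    inserts = [r for k, r in curr.items() if k not in prev]
    updates = [r for k, r in curr.items() if k in prev and r != prev[k]]
    deletes = [r for k, r in prev.items() if k not in curr]
    return inserts, updates, deletes
```

Keeping the raw dumps in a cloud-storage landing layer, as the reply suggests, means this diff can always be replayed from history if the logic changes.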
Spencer_Kent
by New Contributor III
  • 24269 Views
  • 10 replies
  • 7 kudos

Shared cluster configuration that permits `dbutils.fs` commands

My workspace has a couple different types of clusters, and I'm having issues using the `dbutils` filesystem utilities when connected to a shared cluster. I'm hoping you can help me fix the configuration of the shared cluster so that I can actually us...

Latest Reply
jacovangelder
Databricks MVP
  • 7 kudos

Can you not use a No Isolation Shared cluster with Table access controls enabled on workspace level? 

9 More Replies
vinaykumar
by Databricks Partner
  • 5427 Views
  • 4 replies
  • 1 kudos

Can we define a custom session variable for login-user authentication in Databricks for row/column-level security?

Can we create a custom session variable for login-user authentication in Databricks, like HANA session variables? We have scenarios, like today's Spotfire, where we use a single generic user to connect to HANA (we don't have single sign-on enabled) in th...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @vinay kumar, hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so w...

3 More Replies
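Although Databricks has no direct analogue of HANA session variables, the usual pattern for row-level security with a shared generic connection user does not work; with per-user identities it is typically a view filtering on the built-in `current_user()` (or `is_account_group_member()`) function. The table and column names below are made up for illustration.

```python
# Illustrative only: sales/user_region_map are hypothetical names.
# current_user() is a built-in Databricks SQL function.
ROW_FILTERED_VIEW = """
CREATE OR REPLACE VIEW sales_secured AS
SELECT s.*
FROM sales s
JOIN user_region_map m
  ON m.region = s.region
WHERE m.user_email = current_user()
"""

def create_secured_view(spark):
    """Run the view DDL so per-user filtering happens at query time."""
    return spark.sql(ROW_FILTERED_VIEW)
```

Because the filter evaluates at query time per logged-in user, this only helps once single sign-on (or at least per-user logins) replaces the single generic user described in the post.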
Wolverine
by New Contributor III
  • 2393 Views
  • 3 replies
  • 1 kudos

Databricks Magic Command

I am trying a few commands. What is the equivalent magic command of dbutils.fs.rm("dbfs:/sampledir", True)? Actually I am looking at how to use magic commands in the same way as dbutils. For instance, dbutils.fs.head('dbfs:/FileStore/<<name>>.csv', 10) gives 10...

Latest Reply
Witold
Databricks Partner
  • 1 kudos

You could use shell commands, like %sh rm -r sampledir. You need to check for the correct path first; I currently don't know where DBFS folders are exactly mounted.

2 More Replies
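For completeness: the magic command that mirrors `dbutils.fs` most directly is `%fs`, which is notebook shorthand for the same filesystem utilities; `%sh` also works but runs on the driver's local filesystem, so DBFS paths need the `/dbfs/` FUSE prefix. A sketch of the correspondence (the lookup table itself is just illustrative, paths are the poster's examples):

```python
# Hypothetical lookup table pairing dbutils.fs calls with their %fs magic
# equivalents; in a notebook, %fs <cmd> maps onto dbutils.fs.<cmd>.
MAGIC_EQUIVALENTS = {
    'dbutils.fs.rm("dbfs:/sampledir", True)': "%fs rm -r dbfs:/sampledir",
    "dbutils.fs.head('dbfs:/FileStore/name.csv')": "%fs head dbfs:/FileStore/name.csv",
    "dbutils.fs.ls('dbfs:/')": "%fs ls dbfs:/",
}
```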
ajithgaade
by New Contributor III
  • 4240 Views
  • 8 replies
  • 2 kudos

Autoloader includeExistingFiles with retry didn't update the schema

Hi, written in PySpark. A Databricks Auto Loader job with retry didn't merge/update the schema:
spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "parquet")
    .option("cloudFiles.schemaLocation", checkpoint_path)
    .option("cloudFiles.includeExis...

Latest Reply
mtajmouati
Contributor
  • 2 kudos

Hello, try this:

from pyspark.sql import SparkSession

# Initialize Spark session
spark = SparkSession.builder \
    .appName("Auto Loader Schema Evolution") \
    .getOrCreate()

# Source and checkpoint paths
source_path = "s3://path"
checkpoint_pa...

7 More Replies
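A fuller sketch of the Auto Loader options relevant here (bucket paths are placeholders; the builder helper is hypothetical). With `cloudFiles.schemaEvolutionMode` left at its default `addNewColumns`, the stream stops when new columns appear, records them in the schema location, and only the restarted stream reads with the widened schema — which is why pairing it with task retries is the usual pattern, and why a retry that never restarts the stream would appear not to "merge" the schema.

```python
# Option keys are real Auto Loader settings; paths are placeholders.
AUTOLOADER_OPTIONS = {
    "cloudFiles.format": "parquet",
    "cloudFiles.schemaLocation": "s3://bucket/checkpoints/schema",
    "cloudFiles.includeExistingFiles": "true",
    # Default mode: fail on new columns, persist them to the schema
    # location, and rely on the (retried) restart to continue.
    "cloudFiles.schemaEvolutionMode": "addNewColumns",
}

def build_autoloader_reader(spark, options=AUTOLOADER_OPTIONS):
    """Attach each option to a cloudFiles readStream and return the reader."""
    reader = spark.readStream.format("cloudFiles")
    for key, value in options.items():
        reader = reader.option(key, value)
    return reader
```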
ksenija
by Contributor
  • 2385 Views
  • 4 replies
  • 0 kudos

DLT pipeline - DebeziumJDBCMicroBatchProvider not found

Hi! I created a DLT pipeline and I'm getting this error: [STREAM_FAILED] Query [id = ***, runId = ***] terminated with exception: object com.databricks.cdc.spark.DebeziumJDBCMicroBatchProvider not found. I'm using Serverless. How to verify that the require...

Data Engineering
DebeziumJDBCMicroBatchProvider
dlt
Latest Reply
ksenija
Contributor
  • 0 kudos

@Dnirmania, @jlachniet I didn’t manage to resolve this issue, but I created a regular notebook and I’m using a MERGE statement. If you can’t merge all the data at once, you can use a loop with hourly intervals.

3 More Replies
EDDatabricks
by Databricks Partner
  • 3872 Views
  • 2 replies
  • 0 kudos

Concurrency issue with append-only writes

Dear all, we have a PySpark streaming job (DBR 14.3) that continuously writes new data to a Delta table (TableA). On this table, there is a PySpark batch job (DBR 14.3) that operates every 15 minutes and in some cases it may delete some records from ...

Data Engineering
Concurrency
DBR 14.3
delta
MERGE
Latest Reply
Dilisha
New Contributor II
  • 0 kudos

Hi @EDDatabricks - were you able to find the fix for this? I am also facing a similar issue. Added more details here: Getting concurrent Append exception after upgradin... - Databricks Community - 76521

1 More Replies
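A common mitigation for ConcurrentAppendException between a continuous append and a periodic MERGE/DELETE is to partition the table (e.g. by date) and confine the batch operation to explicit, older partitions, so Delta's conflict detection can prove the two writers touch disjoint files while the stream appends only to the current partition. The table and column names below are assumptions, not the poster's schema.

```python
def partition_scoped_delete(table, event_date):
    """Build a DELETE that names its partition explicitly, so Delta's
    conflict detection sees it as disjoint from appends to other dates."""
    return (
        f"DELETE FROM {table} "
        f"WHERE event_date = '{event_date}' AND is_expired = true"
    )
```

The statement would be run via `spark.sql(partition_scoped_delete("TableA", "2024-06-01"))`; without the partition predicate, the delete's table scan can overlap files the stream appended mid-transaction, which is what raises the conflict.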
FhSpZ
by New Contributor II
  • 1299 Views
  • 1 replies
  • 0 kudos

Error AgnosticEncoder.isStruct() in IntelliJ using Scala locally

I've been trying to connect to Azure Databricks from IntelliJ using Scala locally, but I've got the error below: Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.encoders.AgnosticEncoder.isStruct()Z at o...

Latest Reply
FhSpZ
New Contributor II
  • 0 kudos

Hi @Retired_mod, I ensured that I was using the Spark version that matched my Databricks Runtime version. But when I tried Spark 3.5.1 locally in the .sbt dependencies, it worked, which is kind of strange. Anywa...

avrm91
by Databricks Partner
  • 4515 Views
  • 4 replies
  • 1 kudos

How to load xlsx Files to Delta Live Tables (DLT)?

I want to load a .xlsx file to DLT but am struggling, as it is not available with Auto Loader. With the Assistant I tried to load the .xlsx into a data frame first and then send it to DLT.

import dlt
from pyspark.sql import SparkSession

# Load xlsx file in...

Latest Reply
avrm91
Databricks Partner
  • 1 kudos

Added a feature request to the Azure Community Portal: XLSX - DLT Autoloader · Community (azure.com)

3 More Replies
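Since Auto Loader has no xlsx format, the pandas-first approach the poster attempted can be sketched as below. This is a hedged sketch only: `dlt` resolves solely inside a DLT pipeline, pandas needs the `openpyxl` engine installed on the cluster, and the path and table name are placeholders.

```python
def xlsx_to_spark(spark, path):
    """Read one workbook with pandas, then convert to a Spark DataFrame."""
    import pandas as pd  # requires openpyxl on the cluster for .xlsx
    pdf = pd.read_excel(path, engine="openpyxl")
    return spark.createDataFrame(pdf)

def register_xlsx_table(spark, path="/Volumes/main/default/files/report.xlsx"):
    """Wrap the loader as a DLT table; only works inside a DLT pipeline."""
    import dlt

    @dlt.table(name="bronze_xlsx")
    def bronze_xlsx():
        return xlsx_to_spark(spark, path)

    return bronze_xlsx
```

Note this reads the whole workbook on the driver on every pipeline update, so unlike Auto Loader it does no incremental file tracking.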
avrm91
by Databricks Partner
  • 2023 Views
  • 2 replies
  • 2 kudos

Resolved! XBRL File Format

I was searching for some XBRL documentation for Databricks, as it is a standard business reporting format, especially for DLT and Auto Loader. Is there anything in the development pipeline?

Latest Reply
avrm91
Databricks Partner
  • 2 kudos

I added the XBRL request to the Azure community: XBRL · Community (azure.com)

1 More Replies
FrankTa
by Databricks Partner
  • 3620 Views
  • 2 replies
  • 2 kudos

Resolved! Unstable workflow runs lately

Hi! We have been using Databricks on Azure in production for about 3 months. A big part of what we use Databricks for is processing data using a workflow with various Python notebooks. We run the workflow on a 'Pools' cluster and on an 'All-purpose compute...

Latest Reply
FrankTa
Databricks Partner
  • 2 kudos

Hi holly, thanks for your reply. Good to hear that the 403 errors are on the radar and due to be fixed. I will reach out to support in case of further issues.

1 More Replies