Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

by TheoDeSo (New Contributor III)
  • 15828 Views
  • 8 replies
  • 5 kudos

Resolved! Error on Azure-Databricks write output to blob storage account

Hello, after implementing Secret Scope to store secrets in an Azure Key Vault, I ran into a problem. When writing output to the blob I get the following error: shaded.databricks.org.apache.hadoop.fs.azure.AzureException: Unable to access con...

Latest Reply
nguyenthuymo
New Contributor III
  • 5 kudos

Hi all, is it correct that Azure Databricks only supports writing data to Azure Data Lake Gen2 and does not support Azure Storage Blob (StorageV2, general purpose)? In my case, I can read the data from Azure Storage Blob (StorageV2, general purp...

7 More Replies
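
A note for readers hitting the same AzureException: a common fix is to pull the storage account key out of the Key Vault-backed secret scope and register it with the Spark config before writing. Below is a minimal sketch assuming the legacy Blob Storage (wasbs) driver; the scope, secret key, account, and container names are hypothetical placeholders.

```python
# Hypothetical names: replace with your own scope, key, account, and container.
storage_account = "mystorageaccount"
container = "mycontainer"

# Fetch the account key from the Key Vault-backed secret scope.
account_key = dbutils.secrets.get(scope="my-scope", key="storage-account-key")

# Register the key with the wasbs driver for this session.
spark.conf.set(
    f"fs.azure.account.key.{storage_account}.blob.core.windows.net",
    account_key,
)

# Write as usual; the driver now resolves the key at access time.
df.write.mode("overwrite").parquet(
    f"wasbs://{container}@{storage_account}.blob.core.windows.net/output/"
)
```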
by Ajay-Pandey (Esteemed Contributor III)
  • 2488 Views
  • 1 reply
  • 5 kudos

Notebook cell output results limit increased: 10,000 rows or 2 MB

Hi all, Databricks now shows the first 10,000 rows of cell output instead of 1,000 rows. That will reduce re-execution time while working on smaller data sets that have rows between 100...

Latest Reply
F_Goudarzi
New Contributor III
  • 5 kudos

Hi Ajay, is there any way to increase this limit? Thanks, Fatima

by ac0 (Contributor)
  • 8588 Views
  • 3 replies
  • 3 kudos

"Fatal error: The Python kernel is unresponsive." DBR 14.3

Running almost any notebook with a merge statement in Databricks on DBR 14.3, I get the following error and the notebook exits: "Fatal error: The Python kernel is unresponsive." I would provide more code, but like I said, it is pretty much anything w...

Latest Reply
markthepaz
New Contributor II
  • 3 kudos

Same thing here; I can't find any documentation on "spark.databricks.driver.python.pythonHealthCheckTimeoutSec". @ac0 or @Ayushi_Suthar, any more details on what you found?

2 More Replies
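
For anyone experimenting with the undocumented flag mentioned above: driver-level Spark settings like this are applied in the cluster's Spark config rather than in notebook code. A minimal sketch follows, with the caveat that the key is undocumented, so the value shown (300 seconds) and its exact behavior are assumptions.

```python
# Set in Cluster > Advanced options > Spark config (one pair per line):
#   spark.databricks.driver.python.pythonHealthCheckTimeoutSec 300

# From a notebook you can at least confirm what the cluster picked up:
print(spark.conf.get(
    "spark.databricks.driver.python.pythonHealthCheckTimeoutSec",
    "not set",
))
```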
by ChristianRRL (Valued Contributor II)
  • 545 Views
  • 2 replies
  • 2 kudos

Resolved! PKEY Upserting Pattern With Older Runtimes

Hi there, I'm aware that newer Databricks runtimes support some great features, including primary and foreign key constraints. I'm wondering: if we have clusters running older runtime versions, are there upserting patterns that ha...

Latest Reply
Walter_C
Databricks Employee
  • 2 kudos

For clusters running older Databricks runtime versions, such as 13.3, you can still implement upserting patterns effectively, even though they may not support the latest features like primary and foreign key constraints available in newer runtimes. O...

1 More Reply
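
To make the upsert pattern concrete, here is a minimal Delta Lake MERGE sketch that runs on older runtimes such as 13.3. The table names and the id key column are hypothetical; the MERGE condition stands in for the primary-key constraint that older runtimes lack.

```python
from delta.tables import DeltaTable

# Hypothetical target table and incoming batch of changes.
target = DeltaTable.forName(spark, "main.default.customers")
updates_df = spark.table("main.default.customer_updates")

(
    target.alias("t")
    .merge(updates_df.alias("s"), "t.id = s.id")  # key match acts as the PKEY
    .whenMatchedUpdateAll()     # update rows whose key already exists
    .whenNotMatchedInsertAll()  # insert rows with new keys
    .execute()
)
```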
by Sega2 (New Contributor III)
  • 1553 Views
  • 6 replies
  • 1 kudos

Resolved! Debugging a workflow

Hi all, the newly available notebook debugging features are really great. But how do you debug a workflow? Any suggestions on recommended practices for debugging workflows? Best regards, Thomas

Latest Reply
MuthuLakshmi
Databricks Employee
  • 1 kudos

@Sega2 Please refer to this doc and confirm whether it helps: https://docs.databricks.com/en/jobs/repair-job-failures.html#re-run-failed-and-skipped-tasks

5 More Replies
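
The linked doc covers the UI flow; the same repair-run operation is also exposed in the Databricks Python SDK, which is handy when debugging workflows iteratively. A minimal sketch; the run ID and task key are hypothetical placeholders.

```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# Re-run only the failed task of a hypothetical run, and wait for the result.
w.jobs.repair_run(run_id=123456789, rerun_tasks=["my_failed_task"]).result()
```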
by surajitDE (New Contributor III)
  • 1204 Views
  • 3 replies
  • 0 kudos

How to Schedule stop and start for a Continuous DLT pipeline

I have a use case where a DLT pipeline runs in continuous mode, and the requirement is to run it on a schedule, i.e., every day it starts at 8 am and stops at 8 pm. I see a schedule option to start the job but don't see any option to sto...

Latest Reply
Mike_Szklarczyk
Contributor
  • 0 kudos

You can terminate any job with the REST API. I recommend using the Python SDK's jobs.cancel_all_runs() method: https://databricks-sdk-py.readthedocs.io/en/latest/workspace/jobs/jobs.html#databricks.sdk.service.jobs.JobsExt.cancel_all_runs

2 More Replies
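
Paired with a small scheduled job that fires at 8 pm, the SDK call is short. A minimal sketch; the job ID is a hypothetical placeholder for whatever job drives the pipeline.

```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# Cancel all active runs of the pipeline-driving job; the continuous DLT
# pipeline then stays down until the 8 am schedule starts it again.
w.jobs.cancel_all_runs(job_id=123456789)
```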
by Zeruno (New Contributor II)
  • 841 Views
  • 1 reply
  • 1 kudos

How to use DLT Expectations for uniqueness checks on a dataset?

I am using dlt through Python to build a DLT pipeline. One of the things I would like to do is check that each incoming row does not already exist in the target table; I want to be sure that each row is unique. I am confused because it seems like this is not p...

Latest Reply
Mauro
New Contributor II
  • 1 kudos

I have the same question about how to implement the uniqueness rule.

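
Since per-row expectations cannot see other rows, one workaround pattern is an auxiliary table that aggregates the key and asserts a count of one. A minimal sketch, assuming hypothetical table and key names (target_table, id):

```python
import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Audit: one row per key; fails the update on duplicates")
@dlt.expect_or_fail("unique_key", "row_count = 1")
def target_table_key_audit():
    return (
        dlt.read("target_table")   # hypothetical table to audit
        .groupBy("id")             # hypothetical intended key
        .agg(F.count("*").alias("row_count"))
    )
```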
by NehaR (New Contributor III)
  • 1058 Views
  • 3 replies
  • 1 kudos

Way to enforce partition column in where clause

Hi all, I want to know whether it is possible to enforce that all queries include a partition filter when a Delta table is partitioned in Databricks. I tried the option below and set the required property, but it doesn't work and I can still query...

Labels: Data Engineering, databricks delta table, Delta table, partition
Latest Reply
Alberto_Umana
Databricks Employee
  • 1 kudos

At the moment there is no ETA, but we will keep you posted!

2 More Replies
by rvo19941 (New Contributor II)
  • 2435 Views
  • 1 reply
  • 0 kudos

Auto Loader File Notification Mode not working with ADLS Gen2 and files written as a stream

Dear all, I am working on a real-time use case and am therefore using Auto Loader with file notifications to ingest JSON files from a Gen2 Azure storage account in real time. Full refreshes of my table work fine, but I noticed Auto Loader was not picking up...

Labels: Data Engineering, ADLS, Auto Loader, Event Subscription, File Notification, Queue Storage
Latest Reply
Panda
Valued Contributor
  • 0 kudos

@rvo19941, can you share your Auto Loader config?

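
For reference while debugging, a minimal Auto Loader file-notification stream for JSON on ADLS Gen2 looks roughly like this; the paths and target table are hypothetical placeholders.

```python
# File notification mode: Databricks provisions the Event Grid subscription
# and storage queue, given sufficient permissions on the storage account.
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.useNotifications", "true")
    .option("cloudFiles.schemaLocation",
            "abfss://container@account.dfs.core.windows.net/_schemas/")
    .load("abfss://container@account.dfs.core.windows.net/landing/")
)

(
    df.writeStream
    .option("checkpointLocation",
            "abfss://container@account.dfs.core.windows.net/_checkpoints/")
    .toTable("main.default.bronze_events")
)
```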
by Tiwarisk (New Contributor III)
  • 1230 Views
  • 6 replies
  • 0 kudos

Dynamic IP address in databricks

Every time I run a script in Databricks that fetches data from a SQL Server (in a different Azure resource group), I get this error: com.microsoft.sqlserver.jdbc.SQLServerException: Cannot open server 'proddatabase' requested by the login. Clie...

Latest Reply
ameet9257
Contributor
  • 0 kudos

@Tiwarisk, if your Databricks workspace is inside the secure VNet, then whitelist the private VNet address range.

5 More Replies
by genevive_mdonça (Databricks Employee)
  • 1746 Views
  • 4 replies
  • 4 kudos

Spark Optimization

Optimizing Shuffle Partition Size in Spark for Large Joins: I am working on a Spark join between two tables of 300 GB and 5 GB, respectively. After analyzing the Spark UI, I noticed the following: the average shuffle write partition size for th...

Latest Reply
Lakshay
Databricks Employee
  • 4 kudos

Have you tried setting spark.sql.files.maxPartitionBytes=209715200 (200 MB)?

3 More Replies
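
For anyone tuning the same join: 209715200 bytes is 200 MB, and that setting governs how input files are split into read partitions; shuffle partition sizing is usually left to adaptive query execution. A sketch of both knobs:

```python
# Input side: cap file-based read partitions at ~200 MB (the value above).
spark.conf.set("spark.sql.files.maxPartitionBytes", 209715200)

# Shuffle side: let AQE coalesce small shuffle partitions automatically
# instead of hand-tuning spark.sql.shuffle.partitions.
spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "true")
```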
by guangyi (Contributor III)
  • 1496 Views
  • 4 replies
  • 0 kudos

Resolved! What is the correct way to measure the performance of a Databrick notebook?

Here is my code for converting one column of a DataFrame to a time data type: col_value = df.select(df.columns[0]).first()[0] start_time = time.time() col_value = datetime.strftime(col_value, "%Y-%m-%d %H:%M:%S") \ if isinstance(co...

Latest Reply
Lakshay
Databricks Employee
  • 0 kudos

How many columns do you have?

3 More Replies
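
One caveat worth adding to this thread: Spark evaluates lazily, so timing a transformation with time.time() mostly measures plan construction, not execution. Forcing an action inside the timed region gives a more honest number. A minimal sketch; df and the cast are stand-ins for the original code.

```python
import time

start = time.perf_counter()
transformed = df.withColumn("c0", df[df.columns[0]].cast("timestamp"))
# The "noop" sink is an action that runs the full plan without writing output.
transformed.write.format("noop").mode("overwrite").save()
print(f"Elapsed: {time.perf_counter() - start:.3f}s")
```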
by Vetrivel (Contributor)
  • 2831 Views
  • 7 replies
  • 2 kudos

Connection Challenges with Azure Databricks and SQL Server On VM in Serverless compute

We have established an Azure Databricks workspace within our central subscription, which hosts all common platform resources. Additionally, we have a SQL Server running on a virtual machine in a separate sandbox subscription, containing data that nee...

Latest Reply
Vetrivel
Contributor
  • 2 kudos

@Mo I tried it and got the error below: Private access to resource type 'Microsoft.Compute/virtualMachines' is not supported with group id 'sqlserver'. It seems private endpoints are supported only when the destination is Blob, ADLS, or Azure SQL.

6 More Replies
by Erik_L (Contributor II)
  • 5834 Views
  • 4 replies
  • 4 kudos

Resolved! Support for Parquet brotli compression or a work around

Spark 3.3.1 supports the Brotli compression codec, but when I use it to read Parquet files from S3, I get: INVALID_ARGUMENT: Unsupported codec for Parquet page: BROTLI. Example code: df = (spark.read.format("parquet") .option("compression", "brotli")...

Latest Reply
Erik_L
Contributor II
  • 4 kudos

Given the new information I appended, I looked into Delta caching and found I can disable it: .option("spark.databricks.io.cache.enabled", False). This works as a workaround while I read these files in to save them locally in DBFS, but does it have perfo...

3 More Replies
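
For anyone applying the same workaround, the cache switch can also be set session-wide before reading the Brotli-compressed files; a minimal sketch with a hypothetical S3 path:

```python
# Disable the Databricks disk (IO) cache so Parquet pages are decoded by
# Spark itself, which understands the Brotli codec.
spark.conf.set("spark.databricks.io.cache.enabled", "false")

df = spark.read.parquet("s3://my-bucket/brotli-data/")  # hypothetical path
```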
