Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

ehpogue
by New Contributor II
  • 11715 Views
  • 9 replies
  • 3 kudos

How do I re-enable tab complete / autocomplete?

Yesterday all of my notebooks seemingly changed to have Python formatting (which seems to be in this week's release), but the unintended consequence is that Shift + Tab (which used to show docstrings in Python) now just un-indents code, and Tab inser...

Latest Reply
Data_33
New Contributor II

I am also facing the same issue in Databricks now.

8 More Replies
rahulgulati89
by New Contributor II
  • 1770 Views
  • 2 replies
  • 0 kudos

Unable to connect to secured schema registry from Azure Databricks

Hi, I am unable to connect to a secured schema registry (running on HTTPS) as it is breaking with the below-mentioned error: [SCHEMA_REGISTRY_CONFIGURATION_ERROR] Schema from schema registry could not be initialized. Error while fetching schema for subject 'env...

Latest Reply
Kaniz_Fatma
Community Manager

Hi @rahulgulati89, The error message you're encountering indicates that the Java process cannot validate the SSL certificate presented by the Schema Registry. This is a common issue when the Schema Registry uses a self-signed certificate or a certif...
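For anyone hitting the same SSL failure: a common remedy is to import the registry's certificate into a Java truststore, put it somewhere the cluster can read (e.g. DBFS), and pass it through the schema registry options. A minimal sketch in Scala, assuming Databricks' from_avro against a Confluent-compatible registry; the address, subject, secret scope, paths, and option keys are placeholders to verify against the docs for your DBR version:

    import scala.collection.JavaConverters._
    import org.apache.spark.sql.avro.functions.from_avro
    import spark.implicits._

    // Truststore created beforehand, e.g.:
    //   keytool -importcert -alias registry -file registry.crt -keystore registry.jks
    val schemaRegistryAddr = "https://my-registry.example.com:8081"  // placeholder
    val schemaRegistryOptions = Map(
      "confluent.schema.registry.ssl.truststore.location" -> "/dbfs/certs/registry.jks",
      "confluent.schema.registry.ssl.truststore.password" -> dbutils.secrets.get("my-scope", "truststore-pw")
    )

    // df is assumed to be the Kafka source DataFrame with a binary `value` column.
    val parsed = df.select(
      from_avro($"value", "env-subject-value", schemaRegistryAddr, schemaRegistryOptions.asJava).as("event")
    )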

1 More Replies
DataEng1
by New Contributor
  • 2049 Views
  • 1 reply
  • 0 kudos

"data type that cannot participate in a columnstore index" error

Hi All, I am trying to insert a DataFrame into a Synapse table. I need to insert string-type columns in the DataFrame into NVARCHAR fields in the Synapse table. I am getting the error 'data type that cannot participate in a columnstore index'. Can someone guide on the i...

Latest Reply
Kaniz_Fatma
Community Manager

Hi @DataEng1, you are trying to insert a DataFrame with string-type columns into a Synapse table with NVARCHAR fields and encountering a 'data type that cannot participate in a columnstore index' error. The issue is likely occurring because you are t...
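A possible workaround, sketched below: Synapse rejects NVARCHAR(MAX) columns in a clustered columnstore index, and the Synapse connector maps strings to NVARCHAR(MAX) by default. Writing with maxStrLength (so strings map to NVARCHAR(4000) or shorter) or creating the table as a HEAP via tableOptions usually avoids the error. df, the URL secret, table name, and temp dir are placeholders:

    df.write
      .format("com.databricks.spark.sqldw")
      .option("url", dbutils.secrets.get("my-scope", "synapse-jdbc-url"))
      .option("tempDir", "abfss://tmp@mystorageaccount.dfs.core.windows.net/synapse")
      .option("forwardSparkAzureStorageCredentials", "true")
      .option("dbTable", "dbo.my_table")
      .option("maxStrLength", "4000")   // strings become NVARCHAR(4000) instead of NVARCHAR(MAX)
      .option("tableOptions", "HEAP")   // skip the default CLUSTERED COLUMNSTORE INDEX
      .mode("append")
      .save()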

shivalanka
by New Contributor
  • 850 Views
  • 1 reply
  • 0 kudos

Unable to create workspace after deleting existing one without deleting clusters and other resources

Unable to create workspace after deleting existing one without deleting clusters and other resources. 

Latest Reply
Kaniz_Fatma
Community Manager

Hi @shivalanka, it seems you are encountering issues due to not deleting clusters and other resources associated with the workspace before deleting the workspace. Databricks recommends terminating all clusters and instance pools associated with a wo...

DeltaTrain
by New Contributor II
  • 1016 Views
  • 1 reply
  • 0 kudos

Access Control in hive_metastore Based on Cluster Type

Hello Databricks Community, I asked the same question on the Get Started Discussion page, but it feels like this is the right place for it. I'm reaching out with a query regarding access control in the hive_metastore. I've encountered behavior...

Latest Reply
User16752239289
Valued Contributor

That is expected. Single user mode is the legacy Standard mode with UC ACLs enabled. https://docs.databricks.com/en/archive/compute/cluster-ui-preview.html#how-does-backward-compatibility-work-with-these-changes For your case, you need the hive table acl ...
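Concretely, that means defining object privileges in the Hive metastore so a cluster with table access control enabled can enforce them. A minimal sketch; the schema, table, and group names are placeholders:

    spark.sql("GRANT USAGE ON SCHEMA my_schema TO `analysts`")
    spark.sql("GRANT SELECT ON TABLE my_schema.my_table TO `analysts`")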

Hubert-Dudek
by Esteemed Contributor III
  • 11025 Views
  • 1 reply
  • 3 kudos

Workflow timeout

Always set a timeout for your jobs! It not only safeguards against unforeseen hang-ups but also optimizes resource utilization. Equally essential is to consider having a threshold warning. This can alert you before a potential failure, allowing proac...
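For reference, both safeguards map to fields in the job configuration. A sketch of the relevant Jobs API 2.1 fragment (the threshold values are arbitrary examples, not recommendations):

    // Hard timeout for the whole run, plus a warning when it exceeds one hour.
    val jobSettingsFragment =
      """{
        |  "timeout_seconds": 7200,
        |  "health": {
        |    "rules": [
        |      { "metric": "RUN_DURATION_SECONDS", "op": "GREATER_THAN", "value": 3600 }
        |    ]
        |  }
        |}""".stripMargin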

Latest Reply
jose_gonzalez
Moderator

Thank you for sharing this, @Hubert-Dudek.

YSDPrasad
by New Contributor III
  • 4334 Views
  • 4 replies
  • 3 kudos

Resolved! NoClassDefFoundError: scala/Product$class

import com.microsoft.azure.sqldb.spark.config.Config
import com.microsoft.azure.sqldb.spark.connect._
import com.microsoft.azure.sqldb.spark.query._

val query = "Truncate table tablename"
val config = Config(Map(
  "url" -> dbutils.secrets.get(scope = ...
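The NoClassDefFoundError: scala/Product$class here is the classic symptom of a library built for Scala 2.11 (such as the legacy azure-sqldb-spark connector) running on a Scala 2.12 runtime. One connector-free alternative for simple statements, as a sketch with the JDBC URL (credentials included) kept in a placeholder secret:

    import java.sql.DriverManager

    val jdbcUrl = dbutils.secrets.get(scope = "my-scope", key = "sql-jdbc-url")
    val conn = DriverManager.getConnection(jdbcUrl)
    try {
      conn.createStatement().executeUpdate("TRUNCATE TABLE tablename")
    } finally {
      conn.close()
    }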

Latest Reply
Kaniz_Fatma
Community Manager

Hi @Someswara Durga Prasad Yaralgadda, We haven't heard from you since the last response from @Suteja Kanuri, and I was checking back to see if her suggestions helped you. Otherwise, if you have any solution, please share it w...

3 More Replies
adriennn
by Contributor
  • 1453 Views
  • 2 replies
  • 1 kudos

Resolved! Delay when updating Bronze and Silver tables in the same notebook (DBR 13.1)

I created a notebook that uses Autoloader to load data from storage and append it to a bronze table in the first cell; this works fine and Autoloader picks up new data when it arrives (the notebook is run using a Job). In the same notebook, a few cell...

Latest Reply
adriennn
Contributor

Thanks @Kaniz_Fatma, in a case where it's not possible or not practical to implement a pipeline with DLT, what would that "retry mechanism" be based on? I.e., is there an API other than the table history that can be leveraged to retry until "it wo...
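One pattern that avoids the delay without DLT (a sketch, assuming the bronze load can run as a batch-like stream; paths, checkpoint location, and table names are placeholders) is to block on the Auto Loader query before the silver cells run, so the notebook never reads bronze mid-write:

    import org.apache.spark.sql.streaming.Trigger

    val bronzeLoad = spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "json")
      .load("abfss://landing@account.dfs.core.windows.net/events")
      .writeStream
      .option("checkpointLocation", "/chk/bronze_events")
      .trigger(Trigger.AvailableNow)
      .toTable("bronze.events")

    bronzeLoad.awaitTermination()  // block until all available files are committed
    // ...the silver update cells can now safely read bronze.events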

1 More Replies
Nino
by Contributor
  • 1145 Views
  • 2 replies
  • 1 kudos

cluster nodes unavailable scenarios

Concerning job cluster configuration, I'm trying to figure out what happens if AWS node type availability is smaller than the minimum number of workers specified in the configuration JSON (either availability < num_workers or, for autoscaling, availabil...

Latest Reply
Nino
Contributor

Thanks @Kaniz_Fatma, useful info! My specific scenario is running a notebook task with Job Clusters, and I've noticed that I get the best overall notebook run time by going without Autoscaling, setting the cluster configuration with a fixed `num_wor...
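For anyone reproducing this setup, the relevant knobs sit in the job cluster spec; a sketch (the node type, counts, and spot-fallback choice are assumptions to adapt):

    // Fixed-size cluster; spot capacity falls back to on-demand when AWS runs short.
    val clusterSpecFragment =
      """{
        |  "num_workers": 8,
        |  "node_type_id": "i3.xlarge",
        |  "aws_attributes": {
        |    "first_on_demand": 1,
        |    "availability": "SPOT_WITH_FALLBACK"
        |  }
        |}""".stripMargin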

1 More Replies
DE-cat
by New Contributor III
  • 1296 Views
  • 1 reply
  • 1 kudos

Resolved! DatabricksStreamingQueryListener Stopping the stream

I am running the following structured streaming Scala code in a DBR 13.3 LTS job:

val query = spark.readStream.format("delta")
  .option("ignoreDeletes", "true")
  .option("maxFilesPerTrigger", maxEqlPerBatch)
  .load(tblPath)
  .writeStream
  .qu...

Latest Reply
Kaniz_Fatma
Community Manager

Hi @DE-cat,
  • The given code is structured streaming Scala code that reads data from a Delta table, processes it, and writes the output to a streaming sink.
  • The job gets cancelled around 30 minutes after starting with error messages like DAGSche...

Fiona
by New Contributor II
  • 3154 Views
  • 3 replies
  • 1 kudos

Resolved! Reading a protobuf file in a Databricks notebook

I have proto files (offline data storage) that I'd like to read from a Databricks notebook. I found this documentation (https://docs.databricks.com/structured-streaming/protocol-buffers.html), but it only covers how to read the protobuf data once the...

Latest Reply
StephanK
New Contributor II

If you have proto files in offline data storage, you should be able to read them with:

input_df = spark.read.format("binaryFile").load(data_path)
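To then deserialize the binary `content` column that binaryFile produces, Spark 3.4+ (and recent DBR) ships from_protobuf; a sketch assuming a descriptor file generated with `protoc --descriptor_set_out` and a hypothetical message name and path:

    import org.apache.spark.sql.protobuf.functions.from_protobuf
    import spark.implicits._

    val raw = spark.read.format("binaryFile").load("/mnt/offline-store/protos/")
    val parsed = raw.select(
      from_protobuf($"content", "MyEvent", "/dbfs/schemas/my_event.desc").as("event")
    )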

2 More Replies
DE-cat
by New Contributor III
  • 1279 Views
  • 2 replies
  • 0 kudos

err:setfacl: Option -m: Invalid argument LibraryDownloadManager error

When starting a DB job using a 13.3 LTS (includes Apache Spark 3.4.1, Scala 2.12) cluster, I am seeing a lot of these errors in the log4j output. Any ideas? Thx

23/09/11 13:24:14 ERROR CommandLineHelper$: Command [REDACTED] failed with exit code 2 out: err...

Labels: Data Engineering, LibraryDownloadManager
Latest Reply
Kaniz_Fatma
Community Manager

Hi @DE-cat, To configure an AWS instance connection in Databricks, you need to follow these steps:
1. Create an access policy and a user with access keys in the AWS Console:
   - Go to the IAM service.
   - Click the Policies tab in the sidebar.
   - Click...

1 More Replies
DBUser2
by New Contributor II
  • 1007 Views
  • 2 replies
  • 0 kudos

Databricks sql using odbc issue

Hi, I'm connecting to a Databricks instance on Azure from a Windows application using the Simba ODBC driver, and when running SQL statements on Delta tables, like INSERT, UPDATE, and DELETE commands using Execute, the result doesn't indicate the no. of rows a...

Latest Reply
Kaniz_Fatma
Community Manager

Hi @DBUser2, When using the Simba ODBC driver to connect to Databricks on Azure and running SQL statements like INSERT, UPDATE, or DELETE, it's common to encounter a result of -1 for the number of rows affected. This behaviour is not specific to th...
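As a workaround when the driver reports -1, Delta itself records per-operation row counts in the table history, which you can read back after the DML. A sketch (the table name is a placeholder, and the exact metric keys inside operationMetrics, such as numUpdatedRows, vary by operation):

    // Inspect the affected-row metrics Delta recorded for the most recent operation.
    val lastOp = spark.sql("DESCRIBE HISTORY my_schema.my_table LIMIT 1")
    lastOp.select("operation", "operationMetrics").show(truncate = false)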

1 More Replies
yzhang
by New Contributor III
  • 1774 Views
  • 3 replies
  • 0 kudos

How to trigger a "Git provider" job with commit?

I have "Git provider" job created and running fine on the remote git. The problem is that I have to manually trigger it. Is there a way to run the job automatically whenever a new commit to the branch? (In "Schedules & Triggers section", I can find a...

Latest Reply
Kaniz_Fatma
Community Manager

Hi @yzhang, To automatically trigger a job whenever there is a new commit to the branch in a remote Git repository, you can follow these steps:
1. Go to your job's "Schedules and Triggers" section.
2. Click on the "Add Trigger" button.
3. In the trigge...
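If the built-in trigger options don't cover commit events, a common alternative is to have the Git provider's CI (for example, a push-triggered workflow) call the Jobs run-now API. A sketch using java.net.http; the workspace URL, token variable, and job_id are placeholders:

    import java.net.URI
    import java.net.http.{HttpClient, HttpRequest, HttpResponse}

    val host  = "https://adb-1234567890123456.7.azuredatabricks.net"  // placeholder
    val token = sys.env("DATABRICKS_TOKEN")                           // PAT from CI secrets
    val body  = """{"job_id": 123}"""

    val request = HttpRequest.newBuilder(URI.create(s"$host/api/2.1/jobs/run-now"))
      .header("Authorization", s"Bearer $token")
      .header("Content-Type", "application/json")
      .POST(HttpRequest.BodyPublishers.ofString(body))
      .build()

    val response = HttpClient.newHttpClient()
      .send(request, HttpResponse.BodyHandlers.ofString())
    println(response.body())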

2 More Replies
Ludo
by New Contributor III
  • 4384 Views
  • 7 replies
  • 2 kudos

Resolved! Jobs with multi-tasking are failing to retry; how to fix this issue?

Hello, This is a question about our platform with `Databricks Runtime 11.3 LTS`. I'm running a Job with multiple tasks in parallel using a shared cluster. Each task runs a dedicated Scala class within a JAR library attached as a dependency. One of the tasks fails (c...

Latest Reply
YoshiCoppens61
New Contributor II

Hi, This actually should not be marked as solved. We are having the same problem: whenever a Shared Job Cluster crashes for some reason (generally OOM), all tasks keep failing indefinitely, with the error message described above. This is ac...

6 More Replies