Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

by CaptainJack, New Contributor III
  • 3498 Views
  • 3 replies
  • 2 kudos

Resolved! Error Handling and Custom Messages in Workflows

I would like to be able to get a custom error message, ideally visible from the Workflows > Jobs UI. 1. For example, the workflow failed because a file was missing and could not be found; in this case I am getting "Status" Failed and "Error Code" RunExecutionErro...

Latest Reply
Edthehead
Contributor III
  • 2 kudos

What you can do is pass the custom error message you want from the notebook back to the workflow:
output = f"There was an error with {error_code} : {error_msg}"
dbutils.notebook.exit(output)
Then when you are fetching the status of your pipeline, you c...
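A minimal sketch of that pattern in a Python notebook task (the input path and error classification below are hypothetical):

try:
    df = spark.read.json("s3://my-bucket/expected/input/")  # hypothetical path that may be missing
except Exception as e:
    error_code = "FILE_MISSING"  # hypothetical custom error code
    error_msg = str(e)
    # The exit string becomes the task's notebook output, visible when the run result is fetched
    dbutils.notebook.exit(f"There was an error with {error_code} : {error_msg}")

The string passed to dbutils.notebook.exit() can then be read back when polling the run, for example via the Jobs API runs/get-output endpoint.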

2 More Replies
by Manthansingh, New Contributor
  • 1703 Views
  • 2 replies
  • 0 kudos

Writing part files into a single text file

I want to write all my part files into a single text file. Is there anything I can do?

Latest Reply
Edthehead
Contributor III
  • 0 kudos

When writing a PySpark dataframe to a file, it will always write to part files by default. This is because of partitions, even if there is only 1 partition. To write into a single file you can convert the PySpark dataframe to a pandas dataframe and ...
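A minimal sketch of that approach (the output path is hypothetical), reasonable only when the result fits in driver memory:

# Collect the distributed result onto the driver as a pandas dataframe
pdf = df.toPandas()
# Write a single plain-text file; /dbfs/ exposes DBFS as a local filesystem path on Databricks
pdf.to_csv("/dbfs/tmp/single_output.txt", index=False, header=False)

An alternative that stays in Spark is df.coalesce(1).write, which still produces one part-* file inside the target directory rather than a single named file.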

1 More Replies
by herry, New Contributor III
  • 4598 Views
  • 4 replies
  • 4 kudos

Resolved! Get the list of loaded files from Autoloader

Hello, we can use Autoloader to track whether files have been loaded from an S3 bucket or not. My question about Autoloader: is there a way to read the Autoloader database to get the list of files that have been loaded? I can easily do this in AWS Glue j...
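One possible way to do this (a sketch, not necessarily the accepted answer in this thread) is to query the Auto Loader stream's checkpoint with cloud_files_state() on a recent Databricks Runtime; the checkpoint path below is hypothetical:

# Lists the files the Auto Loader stream has discovered, straight from its checkpoint
loaded_files = spark.sql(
    "SELECT * FROM cloud_files_state('s3://my-bucket/checkpoints/my_autoloader_stream')"
)
display(loaded_files)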

Latest Reply
Anonymous
Not applicable
  • 4 kudos

@Herry Ramli - Would you be happy to mark Hubert's answer as best so that other members can find the solution more easily? Thanks!

3 More Replies
by kumar_ravi, New Contributor III
  • 545 Views
  • 0 replies
  • 0 kudos

DLT pipeline with Unity Catalog and external tables

We were using a DLT pipeline for our raw and enhanced layers (on the Hive metastore) but recently upgraded to Unity Catalog. We have external tables (storing data in a different S3 bucket, with table metadata in Unity Catalog). At the moment DLT doesn't suppo...

by Yulei, New Contributor III
  • 26160 Views
  • 6 replies
  • 1 kudos

Resolved! Could not reach driver of cluster

Hi, recently I am seeing the issue "Could not reach driver of cluster <some_id>" with my Structured Streaming job when migrating to Unity Catalog, and found this when checking the traceback: Traceback (most recent call last): File "/databricks/python_shell/...

Latest Reply
Kub4S
New Contributor II
  • 1 kudos

To expand on the same error "Could not reach driver of cluster XX" but with a different cause: the reason in my case (an ADF-triggered Databricks job which runs into this error) was a problem with the numpy library version, where the solution is to downgrade the libr...

5 More Replies
by marvin1, New Contributor III
  • 10420 Views
  • 6 replies
  • 0 kudos

"Unable to upload to DBFS Query" Error running SQL warehouse query?

I have SQL warehouse endpoints that work fine when querying from applications such as Tableau, but just running the included sample query against a running endpoint from the Query Editor in the workspace returns "Unable to upload to DBFS Query...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Marvin Ginns, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers ...

5 More Replies
by hantha, New Contributor
  • 1313 Views
  • 1 reply
  • 1 kudos

For dummies: How to avoid 'bill shock' & control AWS charges while learning to use Databricks?

Hi, I'm an out-of-work data analyst wanting to re-skill as a 'citizen data engineer'. By following how-to guides I was able to set up my own Databricks account, etc., along with a personal VPC in AWS. After 2 weeks of problem-free training I checked my...

Labels: Data Engineering, AWS, billing, nat gateway, Training, VPC
Latest Reply
holly
Databricks Employee
  • 1 kudos

Hi Hantha, Databricks needs VPCs to work, but there are the default ones and customer-managed ones: https://docs.databricks.com/en/security/network/classic/customer-managed-vpc.html Customer-managed ones are optional, but many tutorials include them as...

by Antoine_B, Contributor
  • 1480 Views
  • 2 replies
  • 0 kudos

Resolved! Row Filter on Unity Catalog Tables based on Unity Catalog group membership

Hello, I would like to prevent users belonging to a given Unity Catalog group ('restricted_users_group') from accessing some rows of a Unity Catalog table. For now, I was able to define a Row Filter function to prevent a list of users from accessing some rows, t...

Latest Reply
Antoine_B
Contributor
  • 0 kudos

Ok, so this problem needs no tricks. It was all in the documentation; I did not know about the function IS_ACCOUNT_GROUP_MEMBER(). So this Row Filter function did the job:
CREATE FUNCTION rd.my_schema.my_row_filter(filter_column INTEGER) RETURNS BOOLEAN RET...
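A minimal sketch of that approach from a notebook (the table name, column, and filtering condition below are hypothetical):

# Row filter: members of 'restricted_users_group' only see rows where filter_column = 0
spark.sql("""
    CREATE OR REPLACE FUNCTION rd.my_schema.my_row_filter(filter_column INT)
    RETURNS BOOLEAN
    RETURN IF(IS_ACCOUNT_GROUP_MEMBER('restricted_users_group'), filter_column = 0, TRUE)
""")
# Bind the filter to a table and the column it should evaluate
spark.sql("""
    ALTER TABLE rd.my_schema.my_table
    SET ROW FILTER rd.my_schema.my_row_filter ON (filter_column)
""")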

1 More Replies
by achntrl, New Contributor
  • 2674 Views
  • 0 replies
  • 0 kudos

CI/CD - Databricks Asset Bundles - Deploy/destroy only bundles with changes after Merge Request

Hello everyone, we're in the process of migrating to Databricks and are encountering challenges implementing CI/CD using Databricks Asset Bundles. Our monorepo houses multiple independent bundles within a "dabs" directory, with only one team member wo...

by Antoine_B, Contributor
  • 1056 Views
  • 2 replies
  • 3 kudos

Resolved! Applying Row Filters to a table removes the ability to DEEP CLONE or SHALLOW CLONE this table

Hello, in this documentation I see some limitations that come with using Row Filters, like "Deep and shallow clones are not supported". We plan to use these Row Filters to hide sensitive data from some users. But not having CLONE available for tables with Row...

Latest Reply
NandiniN
Databricks Employee
  • 3 kudos

Hi Antoine_B, the document explicitly calls out these limitations, but there is no explicit mention of a roadmap for support. I would request you to reach out to the Account Executive and SAs so that they can suggest an alternative approach for now. ...

1 More Replies
by Neli, New Contributor III
  • 2094 Views
  • 1 reply
  • 1 kudos

Resolved! Preferred way to read S3 - dbutils or Boto3 or better solution ?

We have a use case where a table has 15K rows and one of the columns holds an S3 location. We need to read each row from the table, fetch the S3 location from that column, and read its content from S3. Reading the content from S3, the workflow is taking a lot of time, ...

Latest Reply
Kannathasan
New Contributor III
  • 1 kudos

Create an IAM role in AWS for S3 access and use those credentials to connect from Databricks with the code below:
AWS_SECRET_ACCESS_KEY={{secrets/scope/aws_secret_access_key}}
AWS_ACCESS_KEY_ID={{secrets/scope/aws_access_key_id}}
aws_bucket_name = "my-s3-bucket"
df =...
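A minimal sketch of the same idea, fetching the keys at runtime with dbutils.secrets.get rather than the {{secrets/...}} references above (scope name, secret keys, and bucket path are hypothetical):

# Pull the AWS keys from a Databricks secret scope
access_key = dbutils.secrets.get(scope="scope", key="aws_access_key_id")
secret_key = dbutils.secrets.get(scope="scope", key="aws_secret_access_key")

# Make the keys available to the S3A filesystem for this session
sc = spark.sparkContext
sc._jsc.hadoopConfiguration().set("fs.s3a.access.key", access_key)
sc._jsc.hadoopConfiguration().set("fs.s3a.secret.key", secret_key)

# Read one object; for the 15K-row case this read would be driven per row from the table
df = spark.read.text("s3a://my-s3-bucket/path/to/object.txt")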

by Nisha_Aggarwal, New Contributor II
  • 1066 Views
  • 2 replies
  • 2 kudos

How to remove specific data from bronze layer

Hello Team, on my end we bring data from Kafka and ingest it through Autoloader into the bronze layer, and further to the silver and silver-plus layers. Lately, due to business changes, we need to delete specific data from the bronze and silver layers due to da...

Latest Reply
Nisha_Aggarwal
New Contributor II
  • 2 kudos

Hello, thank you for your reply to my query! My bronze layer has data in JSON format and currently I need to remove 400 records from it. I also have a job set up in streaming mode. Could you please suggest how I can go further with it?

1 More Replies
by Ram-Dev7, New Contributor
  • 1294 Views
  • 2 replies
  • 0 kudos

Query on using secret scope for dbt-core integration with databricks workflow

Hello all, I am currently configuring dbt-core with Azure Databricks Workflows and using Azure Databricks M2M (machine-to-machine) authentication for this setup. I have the cluster ID and cluster secret ID stored in a Databricks secret scope. I am seeking...

Latest Reply
Rishabh_Tiwari
Databricks Employee
  • 0 kudos

Hi @Ram-Dev7, thank you for reaching out to our community! We're here to help you. To ensure we provide you with the best support, could you please take a moment to review the response and choose the one that best answers your question? Your feedbac...

1 More Replies
by mbaas, New Contributor III
  • 787 Views
  • 1 reply
  • 2 kudos

DLT Serverless costs

I recently started checking out serverless Delta Live Tables. In my understanding, serverless continuous jobs (with Autoloader) would only do something when new files arrive. However, for 4 serverless pipelines running continuously, I spend in two ...

Latest Reply
NandiniN
Databricks Employee
  • 2 kudos

Hi @mbaas, that does not sound right. Were you able to compare the jobs and stages and spot the difference? Are there more tasks added, or from a compute perspective do you not find a difference at all, only in cost? Also, it may be required to ...

by rumfox, New Contributor II
  • 2522 Views
  • 2 replies
  • 2 kudos

Maximum Number of Parameters in Databricks SQL Queries

Hello Databricks Community, I'm working with Databricks SQL and encountered an issue when passing a large number of parameters in a query. Specifically, I attempted to pass 493 parameters, but I received the following error message: BAD_REQUEST : Too m...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 2 kudos

Hi @rumfox, I would assume there is such a limit; the error is pretty clear. But the weird thing is, I cannot find any mention of it in the documentation.

1 More Replies
