Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

vishal48
by New Contributor II
  • 1347 Views
  • 0 replies
  • 1 kudos

Integrating row and column level security in parent child tables with masking only selected rows

Currently I am working on a project where we need to mask PII in a few columns for VIP customers only. Let me explain briefly with an example: Table A: [personid, status, address, UID, VIPFLAG] --> Mask "UID" and "address" only where VIPFLAG is 1. Table ...

guangyi
by Contributor III
  • 3434 Views
  • 3 replies
  • 1 kudos

Resolved! Complex type variable in Databricks.yml not working

For example here I extract the schedule parameter as a complex type variable: variables: schedule: description: schedule time type: complex default: quartz_cron_expression: '0 22 17 * * ?' timezone_id: Asia/Shanghai pa...

Latest Reply
pavlosskev
New Contributor III
  • 1 kudos

If the validation is fine on your colleague's laptop and not on yours, my first assumption would be that it's a version issue. Do you have the same Databricks CLI version as your colleagues? You can check by running: databricks --version. Also, according to...

2 More Replies
Kotekaman
by New Contributor
  • 926 Views
  • 1 reply
  • 1 kudos

Merge Update in Notebook Faster Than Scala script

Hi Folks, I tested running a merge update using SQL queries in a notebook, and it is faster than using a Scala script. Both tests were done using the same cluster size in Databricks. How can I make the Scala script as fast as the SQL notebook?

Latest Reply
Witold
Databricks Partner
  • 1 kudos

Have you already compared both query plans?

CaptainJack
by New Contributor III
  • 6856 Views
  • 3 replies
  • 2 kudos

Resolved! Error Handling and Custom Messages in Workflows

I would like to be able to get a custom error message, ideally visible from the Workflows > Jobs UI. 1. For example, a workflow failed because a file was missing and could not be found; in this case I am getting "Status" Failed and "Error Code" RunExecutionErro...

Latest Reply
Edthehead
Contributor III
  • 2 kudos

What you can do is pass the custom error message you want from the notebook back to the workflow:
output = f"There was an error with {error_code} : {error_msg}"
dbutils.notebook.exit(output)
Then when you are fetching the status of your pipeline, you c...
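The idea in this reply can be sketched as follows. This is a minimal sketch, not the replier's full code: `build_exit_message`, `run_step`, the `exit_fn` parameter and the `FILE_MISSING` code are hypothetical names introduced here for illustration. Inside a Databricks notebook, `exit_fn` would be `dbutils.notebook.exit`; it is injected as a parameter so the sketch also runs outside Databricks.

```python
def build_exit_message(error_code: str, error_msg: str) -> str:
    """Format the custom message, mirroring the f-string in the reply."""
    return f"There was an error with {error_code} : {error_msg}"


def run_step(step, exit_fn):
    """Run `step`; if a file is missing, hand a custom message to `exit_fn`.

    Assumption: in a Databricks notebook `exit_fn` is `dbutils.notebook.exit`.
    `FILE_MISSING` is a made-up error code used only for this example.
    """
    try:
        step()
    except FileNotFoundError as exc:
        exit_fn(build_exit_message("FILE_MISSING", str(exc)))
```

In a notebook this might be called as `run_step(load_input, dbutils.notebook.exit)`, where `load_input` stands for whatever function reads the file.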

2 More Replies
Manthansingh
by New Contributor
  • 3184 Views
  • 2 replies
  • 0 kudos

Writing part files in single text file

I want to write all my part files into a single text file. Is there anything I can do?

Latest Reply
Edthehead
Contributor III
  • 0 kudos

When writing a PySpark DataFrame to a file, it will always write to a part file by default. This is because of partitions, even if there is only one partition. To write into a single file you can convert the PySpark DataFrame to a pandas DataFrame and ...
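For part files that have already been written, a different route from the pandas conversion mentioned in this reply is to concatenate them afterwards. The sketch below uses only the Python standard library and assumes the output directory is reachable as a local filesystem path; `merge_part_files` and its parameters are hypothetical names, not part of any Databricks API.

```python
import shutil
from pathlib import Path


def merge_part_files(part_dir: str, target: str) -> int:
    """Concatenate Spark-style part files in `part_dir` into `target`.

    Only files whose names start with "part-" are merged, in sorted name
    order (part-00000, part-00001, ...), so marker files such as _SUCCESS
    are skipped. Returns the number of part files merged.
    """
    parts = sorted(Path(part_dir).glob("part-*"))
    with open(target, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)  # stream bytes, no full read
    return len(parts)
```

Because the files are streamed in binary mode, this does not load the data into driver memory, which is the main trade-off against collecting everything into a single pandas DataFrame.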

1 More Replies
herry
by New Contributor III
  • 6453 Views
  • 4 replies
  • 4 kudos

Resolved! Get the list of loaded files from Autoloader

Hello, We can use Autoloader to track whether files have been loaded from an S3 bucket. My question about Autoloader: is there a way to read the Autoloader database to get the list of files that have been loaded? I can easily do this in AWS Glue j...

Latest Reply
Anonymous
Not applicable
  • 4 kudos

@Herry Ramli - Would you be happy to mark Hubert's answer as best so that other members can find the solution more easily? Thanks!

3 More Replies
kumar_ravi
by New Contributor III
  • 890 Views
  • 0 replies
  • 0 kudos

DLT pipeline with Unity Catalog and external tables

We were using a DLT pipeline with our raw and enhanced layers (on Hive metastore) but recently upgraded to Unity Catalog. We have external tables (storing data in a different S3 bucket, with the table metadata in Unity Catalog). At the moment DLT doesn't suppo...

marvin1
by New Contributor III
  • 14076 Views
  • 6 replies
  • 0 kudos

"Unable to upload to DBFS Query" Error running SQL warehouse query?

I have SQL warehouse endpoints that work fine when querying from applications such as Tableau, but just running the included sample query against a running endpoint from the Query Editor in the workspace returns "Unable to upload to DBFS Query...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Marvin Ginns, Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers ...

5 More Replies
hantha
by New Contributor
  • 2632 Views
  • 1 reply
  • 1 kudos

For dummies: How to avoid 'bill shock' & control AWS charges while learning to use Databricks?

Hi, I'm an out-of-work data analyst wanting to re-skill as a 'citizen data engineer'. By following how-to guides I was able to set up my own Databricks account etc., along with a personal VPC in AWS. After 2 weeks of problem-free training I checked my...

Data Engineering
AWS
billing
nat gateway
Training
VPC
Latest Reply
holly
Databricks Employee
  • 1 kudos

Hi Hantha, Databricks needs VPCs to work, but there are default ones and customer-managed ones: https://docs.databricks.com/en/security/network/classic/customer-managed-vpc.html Customer-managed ones are optional, but many tutorials include them as...

Antoine_B
by Contributor
  • 3494 Views
  • 2 replies
  • 0 kudos

Resolved! Row Filter on Unity Catalog Tables based on Unity Catalog group membership

Hello, I would like to prevent users belonging to a given Unity Catalog group ('restricted_users_group') from accessing some rows of a Unity Catalog table. For now, I was able to define a Row Filter function to prevent a list of users from accessing some rows, t...

Latest Reply
Antoine_B
Contributor
  • 0 kudos

OK, so this problem needs no tricks. It was all in the documentation. I did not know about the function IS_ACCOUNT_GROUP_MEMBER(). So this Row Filter function did the job:
CREATE FUNCTION rd.my_schema.my_row_filter(filter_column INTEGER)
RETURNS BOOLEAN
RET...
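The function body in this reply is cut off, so the following is only a sketch of what a group-based row filter of this shape could look like, not the author's actual definition. The helper `row_filter_sql` and its `hidden_value` parameter are hypothetical; the function name, column name and group name are taken from the post. The helper just builds the SQL text, so it can be inspected without a Spark session.

```python
def row_filter_sql(function_name: str, group: str, hidden_value: int) -> str:
    """Build a CREATE FUNCTION statement for a group-based row filter.

    Members of `group` do not see rows where the filter column equals
    `hidden_value`; other users see every row. Inputs are interpolated
    directly, so they must come from trusted code, not user input.
    NULL values in the filter column are not handled specially here.
    """
    return (
        f"CREATE FUNCTION {function_name}(filter_column INTEGER)\n"
        "RETURNS BOOLEAN\n"
        f"RETURN NOT (IS_ACCOUNT_GROUP_MEMBER('{group}') "
        f"AND filter_column = {hidden_value})"
    )
```

In a notebook, the returned string could be passed to `spark.sql(...)`, and the function then attached to a table with `ALTER TABLE ... SET ROW FILTER ... ON (...)`; the table and column to use depend on your own schema.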

1 More Replies
Antoine_B
by Contributor
  • 2116 Views
  • 2 replies
  • 3 kudos

Resolved! Applying Row Filters to a table removes the ability to DEEP CLONE or SHALLOW CLONE this table

Hello, In this documentation I see some limitations that come with using Row Filters, like "Deep and shallow clones are not supported". We plan to use these Row Filters to hide sensitive data from some users. But not having CLONE available for tables with Row...

Latest Reply
NandiniN
Databricks Employee
  • 3 kudos

Hi Antoine_B, The document explicitly calls out these limitations, but there is no explicit mention of a roadmap for support. I would ask you to reach out to your Account Executive and SAs so that they can suggest an alternative approach for now. ...

1 More Replies
Neli
by New Contributor III
  • 3208 Views
  • 1 reply
  • 1 kudos

Resolved! Preferred way to read S3 - dbutils or Boto3 or better solution ?

We have a use case where a table has 15K rows, and one of the columns holds an S3 location. We need to read each row from the table, fetch the S3 location from that column, and read its content from S3. To read the content from S3, the workflow is taking a lot of time, ...

Latest Reply
Kannathasan
Databricks Partner
  • 1 kudos

Create an IAM role in AWS S3 and use those credentials to connect to Databricks by using the code below:
AWS_SECRET_ACCESS_KEY={{secrets/scope/aws_secret_access_key}}
AWS_ACCESS_KEY_ID={{secrets/scope/aws_access_key_id}}
aws_bucket_name = "my-s3-bucket"
df =...

Nisha_Aggarwal
by New Contributor II
  • 1951 Views
  • 2 replies
  • 2 kudos

How to remove specific data from bronze layer

Hello Team, At my end, we bring data from Kafka and, through Autoloader, ingest it into the bronze layer and further into the silver and silver plus layers. Lately, due to business changes, we need to delete specific data from the bronze and silver layers due to da...

Latest Reply
Nisha_Aggarwal
New Contributor II
  • 2 kudos

Hello, Thank you for your reply to my query! My bronze layer has data in JSON format, and currently I need to remove 400 records from it. I also have a job set up in streaming mode. Could you please suggest how I can go further with it?

1 More Replies
Ram-Dev7
by New Contributor
  • 2205 Views
  • 2 replies
  • 0 kudos

Query on using secret scope for dbt-core integration with databricks workflow

Hello all, I am currently configuring dbt-core with an Azure Databricks Workflow and using Azure Databricks M2M (Machine-to-Machine) authentication for this setup. I have the cluster ID and cluster secret ID stored in a Databricks secret scope. I am seeking...

Latest Reply
RishabhTiwari07
Community Manager
  • 0 kudos

Hi @Ram-Dev7, Thank you for reaching out to our community! We're here to help you. To ensure we provide you with the best support, could you please take a moment to review the response and choose the one that best answers your question? Your feedbac...

1 More Replies
mbaas
by New Contributor III
  • 2238 Views
  • 1 reply
  • 2 kudos

DLT Serverless costs

I recently started checking out serverless Delta Live Tables. In my understanding, serverless continuous jobs (with Autoloader) would only do something when new files arrive. However, for 4 serverless pipelines running continuously, I spend in two ...

Latest Reply
NandiniN
Databricks Employee
  • 2 kudos

Hi @mbaas, That does not sound right. Were you able to compare the jobs and stages and spot the difference? Are there more tasks added, or, from a compute perspective, do you find no difference at all except in the cost? Also, it may be required to ...
