Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Forum Posts

by leelee3000 (New Contributor III)
  • 1044 Views
  • 1 reply
  • 0 kudos

Parameterizing DLT Jobs

I have observed the use of advanced configuration and creating a map as a way to parameterize notebooks, but these appear to be cluster-wide settings. Is there a recommended best practice for directly passing parameters to notebooks running on a DLT ...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @leelee3000, In Databricks workflows, you can pass parameters to tasks that reference notebooks. For example, you can use the dbutils.jobs.taskValues.set function to register a parameter in the first task and then reference it in subsequent tasks....

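The taskValues pattern from the reply can be sketched with a tiny stand-in class, so the set/get contract is visible outside Databricks (only the two commented dbutils calls are real API; everything else here is illustrative):

```python
# Stand-in for dbutils.jobs.taskValues, to show how a value set in one job
# task is read back in a later task of the same run.
class TaskValues:
    def __init__(self):
        self._store = {}

    def set(self, key, value):
        # In Databricks: dbutils.jobs.taskValues.set(key=..., value=...)
        self._store[key] = value

    def get(self, taskKey, key, default=None, debugValue=None):
        # In Databricks: dbutils.jobs.taskValues.get(taskKey=..., key=...,
        #                                            default=..., debugValue=...)
        return self._store.get(key, default)

tv = TaskValues()
tv.set("processing_date", "2024-01-15")  # in the first task
value = tv.get(taskKey="ingest", key="processing_date", default="1970-01-01")
```

The taskKey names the upstream task that set the value; default/debugValue cover interactive runs outside a job.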
by Geoff (New Contributor II)
  • 846 Views
  • 1 reply
  • 1 kudos

Bizarre Delta Tables pipeline error: ModuleNotFound

I received the following error when trying to import a function defined in a .py file into a .ipynb file. I would add code blocks, but the message keeps getting rejected for invalid HTML.

# test_lib.py (same directory, in a subfolder)
def square(x):
    ret...

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @Geoff, The error message ModuleNotFoundError: No module named 'test_lib' indicates that Python cannot find the module test_lib. This could be due to several reasons. File Location: the Python file test_lib.py needs to be in the same directory a...

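Beyond checking the file location, a common fix is to put the subfolder itself on sys.path before importing. A minimal runnable sketch that mirrors the post's layout (directory names are hypothetical):

```python
import os
import sys
import tempfile

# Simulate the layout from the post: a subfolder containing test_lib.py.
root = tempfile.mkdtemp()
subdir = os.path.join(root, "utils")
os.makedirs(subdir)
with open(os.path.join(subdir, "test_lib.py"), "w") as f:
    f.write("def square(x):\n    return x * x\n")

# The fix: add the subfolder itself to sys.path, then import normally.
sys.path.append(subdir)
from test_lib import square

print(square(4))  # 16
```

In a notebook the same idea applies with the workspace path of the subfolder instead of a temp directory.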
by cg3 (New Contributor)
  • 287 Views
  • 0 replies
  • 0 kudos

Define VIEW in Databricks Asset Bundles?

Is it possible to define a Unity Catalog VIEW in a Databricks Asset Bundle, or specify in the bundle that a specific notebook gets run once per deployment?

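Bundles don't define views as first-class resources, but one common pattern for the second half of the question is a bundle-defined job whose notebook runs the CREATE VIEW statements, triggered once per deployment with `databricks bundle run`. A sketch with hypothetical names and paths:

```yaml
# databricks.yml fragment (hypothetical): a job to run after each deploy with
#   databricks bundle run create_views_job
resources:
  jobs:
    create_views_job:
      name: create-views
      tasks:
        - task_key: create_views
          notebook_task:
            notebook_path: ../src/create_views.py
```

The notebook itself would just contain the CREATE OR REPLACE VIEW statements for the Unity Catalog views.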
by erigaud (Honored Contributor)
  • 3976 Views
  • 1 reply
  • 1 kudos

Resolved! Dynamically specify pivot column in SQL

Hello everyone! I am looking for a way to dynamically specify pivot columns in a SQL query, so it can be used in a view. However, we don't want to hard-code the values that need to become columns, and would rather extract them from another table. I've se...

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @erigaud, In Databricks SQL, you can't use a dynamic list of columns directly in the PIVOT clause. However, there is a workaround using DataFrames in PySpark. This approach allows you to pivot on the mapping column dynamically. The distinct ...

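Another hedged sketch of the same idea in plain Python: collect the distinct values first, then build the PIVOT's IN-list from them as a string (table and column names here are hypothetical):

```python
# Build a PIVOT query whose column list comes from data instead of being
# hard-coded in the SQL text.
def build_pivot_sql(source, pivot_col, value_col, categories):
    # Each distinct value becomes a quoted literal aliased to a column name.
    in_list = ", ".join(f"'{c}' AS `{c}`" for c in categories)
    return (
        f"SELECT * FROM {source} "
        f"PIVOT (SUM({value_col}) FOR {pivot_col} IN ({in_list}))"
    )

# In Spark the category list would come from something like:
#   [r[0] for r in spark.sql("SELECT DISTINCT mapping FROM dim").collect()]
sql = build_pivot_sql("sales", "region", "amount", ["EU", "US"])
```

Since the column list is baked into the generated text, this works for a one-off query or a scripted CREATE VIEW, but a stored view still ends up with a fixed column set at creation time.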
by stef2 (New Contributor III)
  • 7022 Views
  • 13 replies
  • 5 kudos

Resolved! 2023-03-22 10:29:23 | Error 403 | https://customer-academy.databricks.com/

I would like to know why I am getting this error when trying to earn badges for Lakehouse Fundamentals. I can't access the quiz page. Can you please help with this?

Latest Reply
dkn_data
New Contributor II
  • 5 kudos

Log in with your Gmail account at customer-academy.databricks.com, search for the Lakehouse short course, and enroll for free.

12 More Replies
by Kishan1003 (New Contributor)
  • 1821 Views
  • 2 replies
  • 0 kudos

Merge Operation is very slow for S/4 Table ACDOCA

Hello, we have a scenario in Databricks where every day we get 60-70 million records, and it takes a lot of time to merge them into the 28 billion records already sitting there. The time taken to rewrite the affected files is too ...

Latest Reply
177991
New Contributor II
  • 0 kudos

Hi @Kishan1003, did you find something helpful? I'm dealing with a similar situation: the ACDOCA table on my side is around 300M records (fairly smaller), and the incoming daily data is usually around 1M. I have tried partitioning by period, like the fiscyearper column, zo...

1 More Replies
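One commonly suggested direction for this kind of giant-target merge, sketched here rather than taken from the thread, is to put the partition or clustering columns into the merge condition so Delta can prune files instead of scanning the whole table (all table and column names are hypothetical):

```sql
-- Sketch: constrain the MERGE to the partitions the day's batch can touch,
-- so far fewer files are scanned and rewritten.
MERGE INTO acdoca AS t
USING staged_batch AS s
  ON  t.fiscyearper = s.fiscyearper   -- partition/cluster column first
  AND t.rldnr = s.rldnr
  AND t.docnr = s.docnr
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *
```

This only helps if the incoming batch actually touches a bounded set of periods; pruning buys nothing when updates are spread across the whole history.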
by costi9992 (New Contributor III)
  • 2633 Views
  • 6 replies
  • 0 kudos

Resolved! Add policy init_scripts.*.volumes.destination for dlt not working

Hi, I tried to create a policy to use for DLT pipelines that are run on shared clusters, but when I run the DLT pipeline with this policy I get an error. The init script is added to Allowed JARs/Init Scripts. DLT events error: Cluster-scoped init script /Volumes/main/...

Latest Reply
ayush007
New Contributor II
  • 0 kudos

@costi9992 I am facing the same issue with a UC-enabled cluster on Databricks Runtime 13.3. I have uploaded the init shell script to a Volume, with that particular init script allowed by the metastore admin. But I get the same error as you stated. When I looked in the clus...

5 More Replies
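For reference, a cluster policy that pins a Volumes init script usually uses the indexed attribute path rather than the wildcard form in the title (whether `*` is accepted may depend on the workspace; the path and value below are hypothetical):

```json
{
  "init_scripts.0.volumes.destination": {
    "type": "fixed",
    "value": "/Volumes/main/default/scripts/init.sh"
  }
}
```

Even with the policy in place, the script itself still has to be allowlisted by the metastore admin for shared-access-mode clusters, which matches the symptom described in the replies.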
by shivam-singh (New Contributor)
  • 641 Views
  • 1 reply
  • 0 kudos

Databricks-Autoloader-S3-KMS

Hi, I am working on a requirement where I am using Auto Loader in a DLT pipeline to ingest new files as they come. This flow is working fine. However, I am facing an issue when the source bucket is an S3 location, since the bucket has SSE-...

Latest Reply
kulkpd
Contributor
  • 0 kudos

Can you please paste the exact errors, and check the following if it is related to KMS:
1. The IAM role policy and the KMS key policy should both allow the required permissions.
2. Did you use extraConfig while mounting the source S3 bucket? If you have used an IAM role...

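If the problem is KMS, the instance-profile role (mirrored in the key policy) typically needs a statement along these lines for reading an SSE-KMS bucket; the key ARN below is a placeholder:

```json
{
  "Effect": "Allow",
  "Action": ["kms:Decrypt", "kms:DescribeKey", "kms:GenerateDataKey*"],
  "Resource": "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID"
}
```

Decrypt covers reads; GenerateDataKey matters if the pipeline also writes back to the encrypted bucket.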
by esalohs (New Contributor III)
  • 4094 Views
  • 7 replies
  • 5 kudos

Databricks Autoloader - list only new files in an s3 bucket/directory

I have an s3 bucket with a couple of subdirectories/partitions like s3a://Bucket/dir1/ and s3a://Bucket/dir2/. There are currently millions of files sitting in the bucket across the various subdirectories/partitions. I'm getting new data in near real t...

Latest Reply
kulkpd
Contributor
  • 5 kudos

The following options were used with spark.readStream:

.option('cloudFiles.format', 'json')
.option('cloudFiles.inferColumnTypes', 'true')
.option('cloudFiles.schemaEvolutionMode', 'rescue')
.option('cloudFiles.useNotifications', True)
.option('skipChange...

6 More Replies
by Muhammed (New Contributor III)
  • 13596 Views
  • 15 replies
  • 0 kudos

Filtering files for query

Hi Team, while writing my data to a data lake table I am getting 'filtering files for query', and the write gets stuck there. How can I resolve this issue?

Latest Reply
kulkpd
Contributor
  • 0 kudos

My bad, I saw that somewhere in the screenshot but am not able to find it now. Which source are you using to load the data: a Delta table, AWS S3, or Azure Storage?

14 More Replies
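Not an answer from this thread, but one commonly cited mitigation when the "filtering files" phase dominates a write is compacting and co-locating the target table so fewer files have to be considered (table and column names are hypothetical):

```sql
-- Compact small files and co-locate rows on the column the write filters on.
OPTIMIZE my_table ZORDER BY (event_date);
```

Whether this helps depends on the write pattern; it is most useful when the slow phase is a MERGE or replaceWhere that filters on the Z-ordered column.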
by greyamber (New Contributor II)
  • 13545 Views
  • 6 replies
  • 0 kudos

Select job cluster vs all purpose cluster

I have a workflow that needs to run at a 1-minute interval; it is a REST API call. Should I go for an all-purpose cluster or a job cluster to meet the SLA? We need the results as soon as they are available.

Latest Reply
kulkpd
Contributor
  • 0 kudos

@greyamber An interactive (all-purpose) cluster costs about twice as much as a job cluster. Can you explain the use case: why does the job API need to be invoked, and what is the API doing?

5 More Replies
by Sudheerreddy25 (New Contributor II)
  • 2932 Views
  • 6 replies
  • 1 kudos

Resolved! Regarding Exam got Suspended at middle without any reason.

Hi Team, my Databricks Certified Data Engineer Associate (Version 3) exam got suspended on 25th August and is in an in-progress state. I was continuously in front of the camera, and suddenly the alert appeared, and the support person asked me to show the...

Latest Reply
byedla
New Contributor II
  • 1 kudos

Hello team, the same thing happened to me with Kryterion. This is very unfair: no reasoning, nothing; they just suspend you and give no response after that. I wonder why Google tied up with such a poor online proctoring system. I wrote a brief email t...

5 More Replies
by Jfoxyyc (Valued Contributor)
  • 2760 Views
  • 4 replies
  • 0 kudos

Is there a way to catch the cancel button or the interrupt button in a Databricks notebook?

I'm running the oracledb package, and it uses sessions. When you cancel a running query, it doesn't close the session even if you have a try/catch block, because a cancel or interrupt issues a kill command on the process. Is there a method to catch the canc...

Latest Reply
jonathan-dufaul
Valued Contributor
  • 0 kudos

I'm having the same issue and this has been frustrating as heck.

3 More Replies
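A hedged sketch of the closest available workaround: when the cancel reaches Python as an interrupt, a try/finally block still runs, so session cleanup can live there (the Session class below is a stand-in for an oracledb session, not its real API):

```python
# Stand-in session object; a real oracledb session would be closed the same way.
class Session:
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

def run_query(session, work):
    try:
        work()              # may raise KeyboardInterrupt on notebook cancel
    finally:
        session.close()     # runs even when the cell is interrupted

s = Session()

def interrupted():
    raise KeyboardInterrupt  # simulate the cancel arriving mid-query

try:
    run_query(s, interrupted)
except KeyboardInterrupt:
    pass

print(s.closed)  # True
```

The caveat matches the post: if the cancel escalates to a hard kill of the Python process, nothing in-process can catch it; finally only helps when the interrupt is delivered as an exception.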
by geetha_venkates (New Contributor II)
  • 7721 Views
  • 7 replies
  • 2 kudos

Resolved! How do we add a certificate file in Databricks for sparksubmit type of job?

How do we add a certificate file in Databricks for sparksubmit type of job? 

Latest Reply
nicozambelli
New Contributor II
  • 2 kudos

I have the same problem... When I worked with the hive_metastore in the past, I was able to use the file system and also use API certs. Now I'm using Unity Catalog and I can't upload a certificate; can somebody help me?

6 More Replies
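One possible shape, sketched with hypothetical paths: ship the certificate alongside the application via spark-submit's --files flag in the Jobs API task definition, so executors can read it from their working directory:

```json
{
  "tasks": [
    {
      "task_key": "run_app",
      "spark_submit_task": {
        "parameters": [
          "--files", "dbfs:/certs/my_cert.pem",
          "--class", "com.example.Main",
          "dbfs:/jars/app.jar"
        ]
      }
    }
  ]
}
```

Code running on the executors would then open the file by its basename (my_cert.pem). Note the later replies' point: spark-submit tasks have limited Unity Catalog support, so Volumes paths may not work here.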
by Karin (New Contributor II)
  • 2304 Views
  • 1 reply
  • 2 kudos

Resolved! Liquid clustering with boolean columns

Hi community, is it possible to use boolean columns as cluster keys for liquid clustering on Delta tables? I've been trying to set a boolean column as a cluster key, since it's one of my most common query filters when reading from the table. I'm getting the er...

Labels: Data Engineering, Liquid clustering
Latest Reply
jeroenvs
New Contributor III
  • 2 kudos

I can confirm that boolean columns are not allowed for liquid clustering. This seems to be undocumented, and the error message is not helpful: "couldn't find clustering column in stats schema".

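A workaround sketch, offered as an untested assumption with hypothetical names: mirror the flag into a small numeric generated column and cluster on that instead of the boolean itself:

```sql
-- Booleans are rejected as clustering keys; cluster on a numeric mirror.
CREATE TABLE events (
  id BIGINT,
  is_active BOOLEAN,
  is_active_int TINYINT
    GENERATED ALWAYS AS (CASE WHEN is_active THEN 1 ELSE 0 END)
)
CLUSTER BY (is_active_int);
```

Queries would need to filter on is_active_int (or on both columns) for the clustering to help with file skipping.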