Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

by TimB (New Contributor III)
  • 6700 Views
  • 8 replies
  • 3 kudos

Passing multiple paths to .load in autoloader

I am trying to use Auto Loader to load data from two different blobs within the same storage account so that Spark discovers the data asynchronously. However, when I try this, it doesn't work and I get the error outlined below. Can anyone point out w...

Latest Reply
TimB
New Contributor III
  • 3 kudos

If we were to upgrade to ADLS Gen2 but retain the same structure, would there be scope to improve the method above (besides moving to notification mode)?

by pshuk (New Contributor III)
  • 790 Views
  • 2 replies
  • 0 kudos

run md5 using CLI

Hi, I want to run an MD5 checksum on a file uploaded to Databricks. I can generate the MD5 for the local file, but how do I generate one for the uploaded file on Databricks using the CLI (command line interface)? Any help would be appreciated. I tried running databr...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @pshuk, unfortunately, the databricks fs md5 command is not supported directly. You can run a Python script to compute the MD5 hash of the uploaded file. If your uploaded file is stored in Azure Blob Storage, you can use the azcopy tool to calcula...

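One way to follow the Python-script suggestion is to compute the hash on the cluster itself (for example in a notebook cell) and compare it with the digest of the local file from a tool such as md5sum. A minimal sketch, assuming the uploaded file is reachable as a local path on the driver (for example through the /dbfs/ FUSE mount); the function name and any path you pass are hypothetical:

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file.

    The file is read in fixed-size chunks so that large uploads never have
    to fit in memory at once.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a b"" sentinel keeps calling read() until end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Usage would look like md5_of_file("/dbfs/tmp/uploaded_file.csv") with your own path; the 1 MiB chunk size is an arbitrary choice that only affects memory use, not the result.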
by Amit_Dass_Chmp (New Contributor III)
  • 480 Views
  • 1 replies
  • 0 kudos

On Unity Catalog, what is the best way to add members to groups?

Hi All, on Unity Catalog, what is the best way to add members to groups: the API or the CLI? The API seems like the best option, but I thought I'd check with you all.

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Amit_Dass_Chmp, in general, both the API and the CLI can be used to manage groups and their members in Unity Catalog. The choice between the two often depends on your specific use case and your comfort level with each tool. APIs are often preferred for their...

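For the API route, Databricks exposes group membership through SCIM 2.0 endpoints, and adding members is a PATCH request carrying an "add" operation. A minimal standard-library sketch, assuming you already know the group's SCIM id and the members' SCIM ids. The base URL is a placeholder: check the API reference for whether you need the workspace-level .../api/2.0/preview/scim/v2 root or the account-level .../api/2.0/accounts/<account-id>/scim/v2 root (Unity Catalog typically works with account-level groups). The helper names are hypothetical:

```python
import json
import urllib.request


def build_add_members_patch(member_ids):
    """Build a SCIM 2.0 PatchOp body that adds principals (by SCIM id) to a group."""
    return {
        "schemas": ["urn:ietf:params:scim:api:messages:2.0:PatchOp"],
        "Operations": [
            {
                "op": "add",
                "value": {"members": [{"value": str(m)} for m in member_ids]},
            }
        ],
    }


def build_add_members_request(scim_base_url, token, group_id, member_ids):
    """Prepare (but do not send) the PATCH request for .../Groups/<group_id>.

    scim_base_url is an assumed placeholder for the SCIM root of your
    workspace or account; token is a personal access or OAuth token.
    """
    body = json.dumps(build_add_members_patch(member_ids)).encode("utf-8")
    return urllib.request.Request(
        url=f"{scim_base_url.rstrip('/')}/Groups/{group_id}",
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

Sending it is then a matter of urllib.request.urlopen(req). Keeping the payload builder separate from the HTTP call makes it easy to reuse the same body from a script, a CI job, or any other client.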
by danial (New Contributor II)
  • 4261 Views
  • 3 replies
  • 1 kudos

Connect Databricks hosted on Azure with RDS on AWS

We have Databricks set up and running on Azure. Now we want to connect it to RDS (AWS) to transfer data from RDS to Azure Data Lake using Databricks. I could find documentation on how to do this within the same cloud (either AWS or Azure) but n...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Danial Malik, hope everything is going great. Just wanted to check in: were you able to resolve your issue? If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so ...

by Michael_Appiah (New Contributor III)
  • 3936 Views
  • 6 replies
  • 3 kudos

Resolved! Parameterized spark.sql() not working

Spark 3.4 introduced parameterized SQL queries, and Databricks also discussed this new functionality in a recent blog post (https://www.databricks.com/blog/parameterized-queries-pyspark). Problem: I cannot run any of the examples provided in the PySpark...

Latest Reply
Michael_Appiah
New Contributor III
  • 3 kudos

@Cas Unfortunately I do not have any information on this. However, I have seen that DBR 14.3 and 15.0 introduced some changes to spark.sql(). I have not checked whether those changes resolve the issue outlined here. Your best bet is probably to go ah...

by bradleyjamrozik (New Contributor III)
  • 376 Views
  • 1 replies
  • 0 kudos

Autoloader Failure Creating EventSubscription

Posting this here too in case anyone else has run into this issue... I'm trying to set up Auto Loader file notifications but keep getting an "Internal Server Error" message. Related Microsoft Q&A thread: Failure on Write EventSubscription - Internal error

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @bradleyjamrozik, ensure that your service principal for Event Grid and your storage account have the necessary permissions. Specifically, grant the Contributor role to your service principal for Event Grid and your storage account.

by Phuonganh (New Contributor II)
  • 662 Views
  • 2 replies
  • 2 kudos

Databricks SDK for Python: Errors with parameters for Statement Execution

Hi team, I'm using the Databricks SDK for Python to run SQL queries. I created a variable as below: param = [{'name': 'a', 'value': 'x'}, {'name': 'b', 'value': 'y'}] and passed it to the statement as below: _ = w.statement_execution.execute_statement( warehous...

Latest Reply
DonkeyKong
New Contributor II
  • 2 kudos

@Kaniz_Fatma This does not help resolve the issue. I am experiencing the same issue when following the above pointers. Here is the statement:response = w.statement_execution.execute_statement( statement='ALTER TABLE users ALTER COLUMN :col_name S...

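Two things may be worth checking here. First, as far as I know, the SDK types the parameters argument of execute_statement as a list of StatementParameterListItem objects (from databricks.sdk.service.sql) rather than plain dicts, so passing dicts may fail depending on the SDK version. Second, parameter markers such as :col_name stand in for values, not identifiers, which could explain why ALTER TABLE users ALTER COLUMN :col_name is rejected. The sketch below builds the equivalent raw REST body for the Statement Execution API (POST /api/2.0/sql/statements) in plain Python, just to show the shape each parameter is expected to have; the helper name, warehouse id and parameter names are hypothetical:

```python
import json


def build_statement_body(warehouse_id, statement, params, types=None):
    """Build a JSON body for the SQL Statement Execution REST API.

    params: mapping of marker name -> value; the SQL text refers to each
            one as :name, and every value is sent as a string.
    types:  optional mapping of marker name -> SQL type name (e.g. "INT").
    """
    types = types or {}
    parameters = []
    for name, value in params.items():
        item = {"name": name, "value": str(value)}
        if name in types:
            item["type"] = types[name]
        parameters.append(item)
    return {
        "warehouse_id": warehouse_id,
        "statement": statement,
        "parameters": parameters,
    }
```

For example, build_statement_body("wh-123", "SELECT :a AS a, :b AS b", {"a": "x", "b": 42}, types={"b": "INT"}) yields one name/value pair per marker, with the value serialized as a string and the SQL type given separately.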
by cszczotka (New Contributor III)
  • 787 Views
  • 4 replies
  • 0 kudos

Shallow clone and issue with MODIFY permission to source table

Hi, I'm running a shallow clone of external Delta tables. The shallow clone fails for source tables on which I don't have MODIFY permission, and I'm getting the exception below. I don't understand why MODIFY permission on the source table is required. Is there a...

Latest Reply
Amit_Dass_Chmp
New Contributor III
  • 0 kudos

Also check this documentation on access mode: Shallow clone for Unity Catalog tables | Databricks on AWS. When working with Unity Catalog shallow clones in Single User access mode, you must have permissions on the resources for the cloned table source as w...

by Maatari (New Contributor II)
  • 861 Views
  • 2 replies
  • 1 kudos

Fixed interval micro-batches and AvailableNow Trigger

What is the fundamental difference between fixed interval micro-batches and the AvailableNow trigger, given that both can consume data in micro-batches based on the desired micro-batch size? Is the fundamental difference the fact that AvailableNow shut d...

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @Maatari, fixed interval micro-batches are like clockwork, processing data at regular intervals without stopping. The AvailableNow trigger instead processes all data available when the query starts (possibly across several micro-batches) and then gracefully shuts down. Regarding your confu...

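To make the difference concrete, here is a toy model in plain Python (no Spark involved; the function names are hypothetical). In PySpark the two modes correspond to .trigger(processingTime="...") and .trigger(availableNow=True), with the batch size capped by source options such as maxFilesPerTrigger or maxOffsetsPerTrigger. The model assumes AvailableNow only sees the backlog present at start, while the fixed-interval query picks up new arrivals on every tick:

```python
def simulate_available_now(backlog, max_per_batch):
    """Toy model of availableNow: split the backlog that exists when the
    query starts into rate-limited micro-batches, then stop."""
    batches = []
    remaining = backlog
    while remaining > 0:
        size = min(max_per_batch, remaining)
        batches.append(size)
        remaining -= size
    return batches  # the query terminates here


def simulate_fixed_interval(arrivals_per_tick, max_per_batch):
    """Toy model of a processingTime trigger: on every tick, process up to
    max_per_batch of the pending records; the query keeps running."""
    batches = []
    pending = 0
    for arrived in arrivals_per_tick:
        pending += arrived
        size = min(max_per_batch, pending)
        batches.append(size)  # 0 = nothing to do this tick (Spark would skip it)
        pending -= size
    return batches  # a real query never ends here; we just stop observing it
```

With a backlog of 2500 records and a cap of 1000, the AvailableNow model runs three batches (1000, 1000, 500) and exits; anything arriving later waits for the next run. The fixed-interval model keeps going as long as ticks keep coming, so the practical difference is mostly about who stops the query and when.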
by surband (New Contributor III)
  • 3070 Views
  • 9 replies
  • 0 kudos

Pulsar Streaming (Read) - Benchmarking Information

We are doing a first-time implementation of a streaming read from partitioned Pulsar topics into a Delta table managed by UC. We are unable to scale the job beyond ~40k msgs/sec; beyond that rate, the job fails. I'd imagine Databric...

Latest Reply
surband
New Contributor III
  • 0 kudos

Attached Grafana screenshots

by JacobKesinger (New Contributor)
  • 1824 Views
  • 3 replies
  • 0 kudos

Iterating over a pyspark.pandas.groupby.DataFrameGroupBy

I have a pyspark.pandas.frame.DataFrame object (obtained by calling `pandas_api` on a pyspark.sql.dataframe.DataFrame object). I have a complicated transformation that I would like to apply to this data, and in particular I would like to apply it in ...

Latest Reply
MichTalebzadeh
Contributor III
  • 0 kudos

Hi, the error indicates that Unity Catalog does not support Spark higher-order functions, such as those used in pandas_udf. This limitation likely comes from architectural or compatibility constraints. To resolve the issue, consider alternative ap...

by nileshtiwaari (New Contributor)
  • 338 Views
  • 1 replies
  • 0 kudos

Unity Catalog External Tables

What happens if I manually delete an external table's files in the storage account without dropping the table itself?

Latest Reply
mhiltner
New Contributor III
  • 0 kudos

This change won't be registered in the metadata, so the table will still appear in Unity Catalog, but you'll get an error when trying to access it, because the table metadata will point to deleted files.

by kazinahian (New Contributor III)
  • 1091 Views
  • 2 replies
  • 1 kudos

Resolved! Low-code ETL in Databricks

Hello everyone, I work as a Business Intelligence practitioner, using tools like Alteryx and various low-code solutions to build ETL processes and data pipelines for my dashboards and reports. Currently, I'm delving into Azure Databrick...

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @kazinahian,  In the Azure ecosystem, you have a few options for building ETL (Extract, Transform, Load) data pipelines, including low-code solutions. Let’s explore some relevant tools: Azure Data Factory: Purpose: Azure Data Factory is a clou...

by DataRonit (New Contributor II)
  • 454 Views
  • 1 replies
  • 1 kudos

My Databricks Certified Data Engineer Associate exam got suspended

Hi Team, my Databricks Certified Data Engineer Associate exam, which was scheduled for today, was suspended by the proctor, who raised some false alarms; on my end, there was an internet disconnection for a couple of minutes. I was almost a...

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @DataRonit, Thank you for posting your concern on Community!   To expedite your request, please list your concerns on our ticketing portal. Our support staff would be able to act faster on the resolution (our standard resolution time is 24-48 hour...

by nggianno (New Contributor III)
  • 3153 Views
  • 5 replies
  • 2 kudos

How to enable Delta Live Tables serverless in Databricks?

I am trying to enable Serverless mode in Delta Live Tables, based on the official Databricks YouTube video "Delta Live Tables A to Z: Best practices for Modern Data Pipelines", but I cannot find it in my UI. Could you help me with...

Latest Reply
kols
New Contributor II
  • 2 kudos

Serverless DLT pipelines are currently in PrPr (Private Preview). Thus, you will not see this checkbox if you are not part of this PrPr. To learn about enabling Serverless DLT pipelines, contact your Databricks account team.
