Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

RobsonNLPT
by Contributor
  • 275 Views
  • 3 replies
  • 0 kudos

dbfs /mounts permissions with Clusters on Shared Mode / Serverless

Hi all, I've used mounts based on service principals, but users on shared clusters or the new serverless compute have problems with permissions to access resources on DBFS. Right now we have used clusters in single-user mode. What should be the best approach to...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @RobsonNLPT, Consider using Unity Catalog for managing permissions. Unity Catalog provides fine-grained access control and can be used to manage permissions for data stored in DBFS, ADLS, and other storage systems.
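For illustration, here is a minimal sketch of the Unity Catalog route; the external location name, group name, and volume path are hypothetical placeholders rather than values from this thread, and spark/display are the notebook-provided objects.

# One-time grant so users on shared/serverless compute can read through Unity
# Catalog instead of a mount (hypothetical external location and group names):
spark.sql("GRANT READ FILES ON EXTERNAL LOCATION `landing_zone` TO `data-engineers`")

# Reads then go through a governed path rather than /mnt (hypothetical volume path):
df = spark.read.format("parquet").load("/Volumes/main/raw/landing_zone/events/")
display(df)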

2 More Replies
bojian_tw
by New Contributor
  • 244 Views
  • 1 replies
  • 0 kudos

Delta Live Table pipeline hanging at INITIALIZING forever

I have a DLT pipeline hanging at INITIALIZING forever; it never stops. But I found the AnalysisException already happened at the beginning: pyspark.errors.exceptions.captured.AnalysisException: [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column, variable, or functi...

[Attachment: Screenshot 2024-07-27 at 07.50.31.png]
Data Engineering
Delta Live Table
dlt
Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @bojian_tw, Could you please check that all column names referenced in your pipeline are correct and exist in the source data? The error message indicates that the column data.Something.data cannot be resolved, which suggests a possible typo or a ...
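As a quick illustration (the table name and nested column path below are hypothetical, not taken from this pipeline), inspecting the schema first shows the exact spelling and nesting the select has to use:

# "raw_events" and the nested path are placeholders; "spark" is the notebook's SparkSession.
df = spark.read.table("raw_events")
df.printSchema()                          # shows the real nesting, spelling, and case
# The reference must match what printSchema reports, e.g. data.something.data:
df.select("data.something.data").show(5)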

NLearn
by New Contributor II
  • 239 Views
  • 1 replies
  • 0 kudos

Lakehouse monitoring

I created snapshot and time series lakehouse monitoring on 2 different tables. After execution, the metrics tables and the dashboard were created, but monitoring data is not populating in the dashboard or metrics tables, even after a refresh of the moni...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @NLearn, Could you please ensure that the data is being ingested into the tables correctly? 
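A minimal way to check whether the monitor has produced anything at all, assuming hypothetical metrics table names (the real names are shown on the monitor's configuration page):

# Placeholder table names; substitute the profile/drift metrics tables your monitor created.
profile = spark.table("main.monitoring.my_table_profile_metrics")
drift = spark.table("main.monitoring.my_table_drift_metrics")
print(profile.count(), drift.count())     # zero rows means no refresh has landed data yet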

Miguel_Salas
by New Contributor II
  • 188 Views
  • 1 replies
  • 1 kudos

Last file in S3 folder using autoloader

Nowadays we already use Auto Loader with a checkpoint location, but I still wanted to know if it is possible to read only the last updated file within a folder. I know it somewhat defeats the purpose of the checkpoint location. Another question: is it possibl...

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @Miguel_Salas, While the primary purpose of using Auto Loader with checkpointing is to process new files incrementally, you can still achieve reading only the last updated file within a folder. One approach is to use the cloudFiles.includeExisting...
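For reference, a small Auto Loader sketch using that option; the bucket paths, file format, and target table are placeholders. Note this skips the existing backlog and processes only files that arrive after the stream starts, rather than picking out a single "latest" file:

# Placeholder paths and table name; "spark" is the notebook's SparkSession.
df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "json")
      .option("cloudFiles.includeExistingFiles", "false")   # ignore files already in the folder
      .load("s3://my-bucket/landing/"))

(df.writeStream
   .option("checkpointLocation", "s3://my-bucket/checkpoints/landing/")
   .trigger(availableNow=True)
   .toTable("main.raw.landing_events"))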

KamilK
by New Contributor II
  • 182 Views
  • 1 replies
  • 1 kudos

Include SPARK-46990 in databricks 15.4 LTS

Hi, could you include the fix for SPARK-46990 ([SPARK-46990] Regression: Unable to load empty avro files emitted by event-hubs - ASF JIRA (apache.org)) in Databricks 15.4? (15.4 is in the beta stage, so it might be the right time to include the fix.)

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @KamilK, Databricks Runtime 15.4 LTS is currently in its beta stage, which means it’s a good time to suggest including fixes like SPARK-46990. This specific issue, related to the inability to load empty Avro files emitted by event hubs, is a known...

gweakliem
by New Contributor
  • 227 Views
  • 1 replies
  • 0 kudos

"No module named google.cloud.spark" errors querying BigQuery

Personal Cluster 15.3 ML, running the following notebook:
import pyspark.sql.functions as F
from datetime import datetime, timedelta
spark.sparkContext.addPyFile("gs://spark-lib/bigquery/spark-bigquery-support-0.26.0.zip")
target_hour = datetime(202...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @gweakliem, First, ensure that the google-cloud-spark package is installed in your Python environment. This package is necessary for integrating with Google Cloud services.  Next, ensure that your Spark environment is correctly configured to use t...
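For reference, a hedged sketch of querying BigQuery through the connector that ships with Databricks (the "bigquery" data source); the project, dataset, and table names are placeholders, and the cluster is assumed to already have Google credentials configured:

# Placeholder table reference; credentials/parent project are assumed to be set on the cluster.
df = (spark.read.format("bigquery")
      .option("table", "my-project.my_dataset.my_table")
      .load())
df.show(5)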

anh-le
by New Contributor
  • 434 Views
  • 2 replies
  • 2 kudos

Image disappears after notebook export to HTML

Hi everyone, I have an image saved on DBFS which I want to include in my notebook. I'm using the standard markdown syntax ![my image](/files/my_image.png), which works and the image shows. However, when I export the notebook to HTML, the image disappear...

Latest Reply
Kaniz_Fatma
Community Manager
  • 2 kudos

Hi @anh-le, thank you for reaching out to our community! We're here to help you. To ensure we provide you with the best support, could you please take a moment to review the responses and choose the one that best answers your question? Your feedba...
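One workaround sometimes used for self-contained HTML exports is to inline the image as base64 instead of referencing a /files path; this is only a sketch, not an official fix, and the file path below is a placeholder:

import base64

# Placeholder path to the image on DBFS.
with open("/dbfs/FileStore/my_image.png", "rb") as f:
    encoded = base64.b64encode(f.read()).decode()

# displayHTML is the Databricks notebook helper; the data URI keeps the image
# embedded in the exported HTML instead of pointing at a workspace URL.
displayHTML(f'<img src="data:image/png;base64,{encoded}" width="400"/>')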

1 More Replies
RishabhGarg
by New Contributor II
  • 344 Views
  • 3 replies
  • 2 kudos

Keywords and Functions supported in SQL but not in Databricks SQL.

Actually, I have around 2000 SQL queries. I have to convert them to Databricks-supported SQL so that I can run them in the Databricks environment. So I want to know the list of all keywords, functions, or anything else that is different in Databricks SQL. Pl...

Latest Reply
Kaniz_Fatma
Community Manager
  • 2 kudos

Hi @RishabhGarg, thank you for reaching out to our community! We're here to help you. To ensure we provide you with the best support, could you please take a moment to review the responses and choose the one that best answers your question? Your f...

2 More Replies
ptambe
by New Contributor III
  • 4099 Views
  • 7 replies
  • 3 kudos

Resolved! Is Concurrent Writes from multiple databricks clusters to same delta table on S3 Supported?

Does Databricks support writing to the same Delta table from multiple clusters concurrently? I am specifically interested to know if there is any solution for https://github.com/delta-io/delta/issues/41 implemented in Databricks, OR if you have a...

Latest Reply
dennyglee
New Contributor III
  • 3 kudos

Please note, the issue noted above [Storage System] Support for AWS S3 (multiple clusters/drivers/JVMs) is for Delta Lake OSS. As noted in this issue as well as Issue 324, as of this writing, S3 lacks putIfAbsent transactional consistency. For Del...
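On the Delta Lake OSS side, a hedged configuration sketch of the DynamoDB-backed LogStore that supplies the atomic put-if-absent S3 lacks; the DynamoDB table name and region are placeholders, and the delta-storage-s3-dynamodb artifact is assumed to be on the classpath:

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
         .config("spark.sql.catalog.spark_catalog",
                 "org.apache.spark.sql.delta.catalog.DeltaCatalog")
         # Route commit-file creation through DynamoDB so concurrent writers on
         # different clusters cannot silently overwrite each other's log entries.
         .config("spark.delta.logStore.s3a.impl", "io.delta.storage.S3DynamoDBLogStore")
         .config("spark.io.delta.storage.S3DynamoDBLogStore.ddb.tableName", "delta_log")
         .config("spark.io.delta.storage.S3DynamoDBLogStore.ddb.region", "us-east-1")
         .getOrCreate())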

6 More Replies
talenik
by New Contributor III
  • 743 Views
  • 4 replies
  • 1 kudos

Resolved! Ingesting logs from Databricks (GCP) to Azure log Analytics

Hi everyone, I wanted to ask if there is any way we can ingest logs from GCP Databricks to Azure Log Analytics in a store-sync fashion. Meaning, we would save logs into some cloud bucket, let's say, and then from there we should be able to send l...

Data Engineering
azure log analytics
Databricks
GCP databricks
google cloud
Latest Reply
talenik
New Contributor III
  • 1 kudos

Hi @Kaniz_Fatma, thanks for the help. We decided to develop our own library for logging to Azure Log Analytics. We used a buffer for this. We are currently on timer-based logs, but in future versions we want to move to memory-based. Thanks, Nikhil
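A rough sketch of the buffered, timer-based approach described here; the sender callable (for example, something that POSTs a batch to the Log Analytics ingestion endpoint) is left abstract, and every name is illustrative:

import logging
import threading

class BufferedTimerHandler(logging.Handler):
    """Accumulates log records and hands them to `sender` on a fixed interval."""

    def __init__(self, sender, interval_sec=30):
        super().__init__()
        self.sender = sender               # callable taking a list of formatted records
        self.interval_sec = interval_sec
        self._buffer = []
        self._buffer_lock = threading.Lock()
        self._schedule()

    def _schedule(self):
        timer = threading.Timer(self.interval_sec, self._flush_and_reschedule)
        timer.daemon = True
        timer.start()

    def emit(self, record):
        with self._buffer_lock:
            self._buffer.append(self.format(record))

    def _flush_and_reschedule(self):
        with self._buffer_lock:
            batch, self._buffer = self._buffer, []
        if batch:
            self.sender(batch)             # e.g. POST the batch to Log Analytics
        self._schedule()

# Usage (my_post_batch is a hypothetical function that ships a list of lines):
# logging.getLogger("app").addHandler(BufferedTimerHandler(my_post_batch, interval_sec=60))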

3 More Replies
Gary_Irick
by New Contributor III
  • 7086 Views
  • 10 replies
  • 12 kudos

Delta table partition directories when column mapping is enabled

I recently created a table on a cluster in Azure running Databricks Runtime 11.1. The table is partitioned by a "date" column. I enabled column mapping, like this:
ALTER TABLE {schema}.{table_name} SET TBLPROPERTIES('delta.columnMapping.mode' = 'nam...

Latest Reply
talenik
New Contributor III
  • 12 kudos

Hi @Kaniz_Fatma, I have a few queries on directory names with column mapping. I have this Delta table on ADLS and I am trying to read it, but I am getting the below error. How can we read Delta tables with column mapping enabled with PySpark? Can you pleas...
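For context, a hedged sketch of how that property is usually set together with the protocol versions column mapping requires, and then read back; the schema and table names are placeholders:

# Placeholder identifiers; "spark" is the notebook's SparkSession.
spark.sql("""
    ALTER TABLE my_schema.my_table SET TBLPROPERTIES (
        'delta.columnMapping.mode' = 'name',
        'delta.minReaderVersion' = '2',
        'delta.minWriterVersion' = '5'
    )
""")

# Reading then needs a runtime whose Delta reader understands column mapping
# (roughly DBR 10.2+ / Delta Lake 1.2+); older readers fail with protocol errors.
df = spark.read.table("my_schema.my_table")
df.show(5)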

9 More Replies
kodexolabs
by New Contributor
  • 201 Views
  • 0 replies
  • 0 kudos

Federated Learning for Decentralized, Secure Model Training

Federated learning allows you to train machine learning models on decentralized data while ensuring data privacy and security by storing data on local devices and only sharing model updates. This approach assures that raw data never leaves its source...

guangyi
by Contributor
  • 464 Views
  • 1 replies
  • 0 kudos

Resolved! How exactly to create cluster policy via Databricks CLI ?

I tried these ways and they are all not working: save the JSON config into a JSON file locally and run databricks cluster-policies create --json cluster-policy.json. Error message: Error: invalid character 'c' looking for beginning of value. Save the JSON ...

Latest Reply
Slash
Contributor
  • 0 kudos

Hi @guangyi, try adding @ before the name of the JSON file: databricks cluster-policies create --json @policy.json. Also make sure that you're escaping quotation marks like they do in the documentation below: Create a new policy | Cluster Policies API | REST API ...
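To illustrate the escaping point, a small sketch (the policy name and contents are made up) that builds the request body from Python, where the "definition" field has to be a JSON string, and then calls the CLI with the @ prefix:

import json
import subprocess

# The API expects "definition" to be a *stringified* JSON document, which is the
# usual source of the quoting/escaping trouble. Values here are illustrative only.
policy_definition = {
    "spark_version": {"type": "fixed", "value": "14.3.x-scala2.12"},
}
body = {"name": "my-policy", "definition": json.dumps(policy_definition)}

with open("policy.json", "w") as f:
    json.dump(body, f)

subprocess.run(
    ["databricks", "cluster-policies", "create", "--json", "@policy.json"],
    check=True,
)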

suqadi
by New Contributor
  • 150 Views
  • 1 replies
  • 0 kudos

systems table predictive_optimization_operations_history stays empty

Hi, for our lakehouse with Unity Catalog enabled, we enabled the predictive optimization feature for several catalogs to clean up storage with VACUUM. When we describe the catalogs, we can see that predictive optimization is enabled. The system table for ...

Latest Reply
Walter_C
Honored Contributor
  • 0 kudos

Hello, as per the docs, data can take up to 24 hours to appear. Can you confirm whether the below requirement is met? Your region must support predictive optimization (see Databricks clouds and regions).
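For a quick check, a sketch with a placeholder catalog name that confirms the setting and then queries the system table once the delay (up to roughly 24 hours) has passed:

# "my_catalog" is a placeholder; "spark" is the notebook's SparkSession.
spark.sql("ALTER CATALOG my_catalog ENABLE PREDICTIVE OPTIMIZATION")
spark.sql("DESCRIBE CATALOG EXTENDED my_catalog").show(truncate=False)

# Rows appear here only after predictive optimization has actually run an
# operation and the system table has been refreshed.
spark.table("system.storage.predictive_optimization_operations_history").show(truncate=False)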

prasadvaze
by Valued Contributor II
  • 276 Views
  • 1 replies
  • 2 kudos

Resolved! Grant permission on catalog but revoke from schema for the same user

I have a catalog (in Unity Catalog) containing multiple schemas. I need an AD group to have SELECT permission on all the schemas, so at the catalog level I granted SELECT to the AD group. Then, I need to revoke permission on one particular schema in this cat...

Latest Reply
Walter_C
Honored Contributor
  • 2 kudos

This unfortunately is not possible due to the hierarchical permission model in UC; you will need to grant permissions on the specific schemas directly rather than granting a broader permission at the catalog level.
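A sketch of the per-schema alternative, with placeholder catalog, schema, and group names:

# Grant catalog usage once, then SELECT only on the schemas the group may read;
# the restricted schema is simply left out of the list. All names are placeholders.
allowed_schemas = ["sales", "finance", "marketing"]

spark.sql("GRANT USE CATALOG ON CATALOG my_catalog TO `my-ad-group`")
for schema in allowed_schemas:
    spark.sql(f"GRANT USE SCHEMA, SELECT ON SCHEMA my_catalog.{schema} TO `my-ad-group`")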

