Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

Neli
by New Contributor III
  • 324 Views
  • 2 replies
  • 1 kudos

Resolved! Preferred way to read S3 - dbutils or Boto3 or better solution ?

We have a use case where a table has 15K rows and one of the columns holds an S3 location. We need to read each row, fetch the S3 location from that column, and read its content from S3. Reading the content from S3, the workflow is taking a lot of time, ...

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @Neli, Thanks for reaching out! Please review the responses and let us know which one best addresses your question. Your feedback is valuable to us and the community. If the response resolves your issue, kindly mark it as the accepted solution. Th...

1 More Replies
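
For anyone hitting the same bottleneck: looping over 15K rows with boto3 on the driver serializes every download. A minimal sketch of pushing the fetches to the executors instead, assuming a hypothetical table and an `s3_path` column (neither is from the thread):

```python
import boto3
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

_s3 = None  # one client per executor process, created lazily

def fetch_object(path: str) -> str:
    """Download one S3 object given an s3://bucket/key path."""
    global _s3
    if _s3 is None:
        _s3 = boto3.client("s3")
    bucket, key = path.replace("s3://", "").split("/", 1)
    return _s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")

fetch_udf = udf(fetch_object, StringType())

df = spark.table("my_catalog.my_schema.paths_table")  # hypothetical table
result = df.repartition(64).withColumn("content", fetch_udf("s3_path"))
```

If the objects are plain text files, `spark.read.text()` over the collected list of paths may be simpler still.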
ayush19
by New Contributor III
  • 421 Views
  • 2 replies
  • 0 kudos

Running jar on Databricks cluster from Airflow

Hello, I have a JAR file which is installed on a cluster. I need to run this JAR from Airflow using DatabricksSubmitRunOperator. I followed the standard instructions available in the Airflow docs: https://airflow.apache.org/docs/apache-airflow-providers-...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @ayush19, To run a JAR file that is already installed on a Databricks cluster using the DatabricksSubmitRunOperator in Airflow, you must provide the libraries parameter, even if the JAR is already installed. Unfortunately, there is no way to bypas...

1 More Replies
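
A hedged sketch of the operator call the reply describes, with the `libraries` parameter supplied even though the JAR is installed on the cluster (cluster ID, DBFS path, and main class are placeholders):

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.databricks.operators.databricks import DatabricksSubmitRunOperator

with DAG("run_jar_on_databricks", start_date=datetime(2024, 1, 1), schedule=None) as dag:
    run_jar = DatabricksSubmitRunOperator(
        task_id="run_jar",
        databricks_conn_id="databricks_default",
        existing_cluster_id="1234-567890-abcde123",              # placeholder
        spark_jar_task={"main_class_name": "com.example.Main"},  # placeholder
        libraries=[{"jar": "dbfs:/FileStore/jars/myjob.jar"}],   # placeholder
    )
```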
Fernando_Messas
by New Contributor II
  • 8287 Views
  • 7 replies
  • 4 kudos

Resolved! Error writing data to Google Bigquery

Hello, I'm facing some problems while writing data to Google BigQuery. I'm able to read data from the same table, but when I try to append data I get the following error: Error getting access token from metadata server at: http://169.254.169.254/compu...

Latest Reply
asif5494
New Contributor III
  • 4 kudos

Sometimes this error occurs when your private key or service account key is not included in the request header. If you are using Spark or Databricks, configure the JSON key in the Spark config so it is added to the request header.

6 More Replies
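
A sketch of the fix asif5494 describes, assuming the open-source spark-bigquery connector's `credentials` option (key path, project, bucket, and table names are placeholders):

```python
import base64

# Base64-encode the service-account key so it travels with each request.
with open("/dbfs/FileStore/keys/service-account.json", "rb") as f:  # placeholder path
    creds_b64 = base64.b64encode(f.read()).decode("utf-8")

(df.write.format("bigquery")
   .option("credentials", creds_b64)
   .option("parentProject", "my-gcp-project")       # placeholder
   .option("temporaryGcsBucket", "my-staging-bkt")  # needed for indirect writes
   .option("table", "my_dataset.target_table")      # placeholder
   .mode("append")
   .save())
```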
colette_chavali
by New Contributor III
  • 1154 Views
  • 2 replies
  • 6 kudos

Resolved! Nominations are OPEN for the Databricks Data Team Awards!

Databricks customers - nominate your data team and leaders for one (or more) of the six Data Team Award categories: Data Team Transformation Award, Data Team for Good Award, Data Team Disruptor Award, Data Team Democratization Award, Data Team Visionary Awar...

Data Team Awards
Latest Reply
Sai_Mani
New Contributor II
  • 6 kudos

Hello! Where can I find more details about award nomination requirements, eligibility criteria, application entry and deadline dates for nominations? Judging criteria?

1 More Replies
CaptainJack
by New Contributor III
  • 447 Views
  • 4 replies
  • 1 kudos

Get taskValue from job as task, and then pass it to next task.

I have a workflow like this. Task 1: a job as a task. Inside this job there is a task which is setting parameter x as a task value using dbutils.jobs.taskValues.set. Task 2: a task dependent on the previous job-as-a-task. I would like to access this parameter x. I tried t...

Latest Reply
NandiniN
Honored Contributor
  • 1 kudos

I see; I have asked someone else to guide you on this. cc: @Kaniz_Fatma

3 More Replies
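
For reference, the single-job version of the pattern works like this (task names are hypothetical); whether a value set inside a child job-as-a-task is visible to the parent job is exactly what the thread is about:

```python
# In the upstream task:
dbutils.jobs.taskValues.set(key="x", value="some_value")

# In the downstream task (taskKey is the upstream task's name):
x = dbutils.jobs.taskValues.get(taskKey="upstream_task", key="x", default="fallback")

# Or reference it in a downstream task parameter:
#   {{tasks.upstream_task.values.x}}
```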
MYB24
by New Contributor III
  • 6077 Views
  • 7 replies
  • 0 kudos

Resolved! Error: cannot create mws credentials: invalid Databricks Account configuration

Good evening, I am configuring databricks_mws_credentials through Terraform on AWS. I am getting the following error: Error: cannot create mws credentials: invalid Databricks Account configuration │ with module.databricks.databricks_mws_credentials.t...

Data Engineering
AWS
credentials
Databricks
Terraform
Latest Reply
Alexandre467
New Contributor II
  • 0 kudos

Hello, I'm facing a similar issue. I tried to update my Terraform with proper authentication and I get this error: ╷ │ Error: cannot create mws credentials: failed visitor: context canceled │ │ with databricks_mws_credentials.this, │ on main.tf ...

6 More Replies
riccostamendes
by New Contributor II
  • 21842 Views
  • 3 replies
  • 0 kudos

Just a doubt, can we develop a kedro project in databricks?

I am asking this because up to now I have just seen some examples of deploying a pre-existent kedro project in databricks in order to run some pipelines...

Latest Reply
noklam
New Contributor II
  • 0 kudos

Hi! Kedro Dev here. You can surely develop Kedro on Databricks; in fact we have a lot of Kedro projects running on Databricks. In the past there has been some friction, mainly because Kedro is project-based while Databricks focuses a lot on notebooks. T...

2 More Replies
dnz
by New Contributor
  • 322 Views
  • 1 reply
  • 0 kudos

Performance Issue with OPTIMIZE Command for Historical Data Migration Using Liquid Clustering

Hello Databricks Community, I'm experiencing performance issues with the OPTIMIZE command when migrating historical data into a table with liquid clustering. Specifically, I am processing one year's worth of data at a time. For example: The OPTIMIZE co...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @dnz, Could you please ensure that the size of your data files is appropriate? Databricks recommends configuring the maximum file size for optimization using the spark.databricks.delta.optimize.maxFileSize setting. Adjusting this can help manage t...

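
A minimal sketch of the setting the reply mentions; the 128 MB target and table name are assumptions, not values from the thread:

```python
# Cap the file size OPTIMIZE aims for, then run it on the clustered table.
spark.conf.set("spark.databricks.delta.optimize.maxFileSize", str(128 * 1024 * 1024))
spark.sql("OPTIMIZE my_catalog.my_schema.events")  # hypothetical table
```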
Antoine_B
by New Contributor III
  • 165 Views
  • 1 reply
  • 0 kudos

Table Row Filter with a criteria on CURRENT_USER() belonging to a Unity Catalog group

Hello, I defined a Row Filter to exclude some rows for a given user 'user@mail.com' in SQL. Instead of providing a list of users to exclude, I would like to define my criteria on Unity Catalog groups instead of users. Here is my current filter: -- apply f...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Antoine_B, To update your SQL Row Filter to apply based on group membership rather than a specific user, you can use Unity Catalog’s group functionality.

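
A hedged sketch of a group-based filter using Unity Catalog's `is_account_group_member()` (function, table, column, and group names are placeholders):

```python
# Rows stay visible unless the reader is in the restricted group
# and the row is not public.
spark.sql("""
  CREATE OR REPLACE FUNCTION my_catalog.my_schema.group_filter(sensitivity STRING)
  RETURN NOT is_account_group_member('restricted_readers') OR sensitivity = 'public'
""")

spark.sql("""
  ALTER TABLE my_catalog.my_schema.my_table
  SET ROW FILTER my_catalog.my_schema.group_filter ON (sensitivity)
""")
```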
georgef
by New Contributor III
  • 1134 Views
  • 3 replies
  • 0 kudos

Resolved! Cannot import relative python paths

Hello, Some variations of this question have been asked before but there doesn't seem to be an answer for the following simple use case: I have the following file structure on a Databricks Asset Bundles project: src --dir1 ----file1.py --dir2 ----file2...

Latest Reply
m997al
Contributor II
  • 0 kudos

Hi. This was a long-standing issue for me too. This solution may not be what is desired, but it works perfectly for my needs. In my Python code, I have this structure: if __name__ == '__main__': # directory structure where "mycode" is this code ...

2 More Replies
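
A sketch of the workaround pattern the reply alludes to: put the bundle's `src` root on `sys.path` before importing across sibling directories (the directory names mirror the question; the helper logic is an assumption):

```python
import os
import sys

if __name__ == "__main__":
    # file2.py lives in src/dir2; add src/ to sys.path so dir1 is importable.
    # __file__ is defined when the file runs as a script (e.g., a bundle task).
    src_root = os.path.abspath(os.path.join(os.path.dirname(__file__), ".."))
    if src_root not in sys.path:
        sys.path.insert(0, src_root)

    from dir1.file1 import some_function  # hypothetical import
    some_function()
```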
dnz
by New Contributor
  • 157 Views
  • 1 reply
  • 0 kudos

Significant Delay in Deploying View on Unity Catalog Compared to Hive Metastore

Hi everyone, I'm experiencing a significant delay when deploying a view in Unity Catalog compared to the Hive Metastore. Specifically, the deployment on Unity Catalog takes 20 to 30 minutes, whereas the same deployment on the Hive Metastore co...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @dnz, Unity Catalog might perform extra checks or operations, especially with external tables in S3, though there's no explicit documentation confirming a full data scan. Ensure that your Unity Catalog's metadata operations are optimized and that ...

subha2
by New Contributor II
  • 177 Views
  • 1 reply
  • 0 kudos

Critical/Important Spark Listener for performance tuning of spark code

Please suggest some key/important listeners that are most helpful for performance tuning of Spark code. Kindly also suggest how to use a listener in PySpark code to access the metrics, for reference.

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @subha2,  SparkListener: This is the base class for all listeners. It allows you to listen to various events in Spark, such as job start, job end, stage completion, and task completion. By overriding methods in this class, you can gather metrics a...

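
One caveat worth adding: the `SparkListener` class the reply describes lives on the JVM and cannot be subclassed directly from Python. PySpark 3.4+ does ship a pure-Python `StreamingQueryListener`, shown here as the closest native hook (the metric choices are illustrative):

```python
from pyspark.sql.streaming import StreamingQueryListener

class ProgressLogger(StreamingQueryListener):
    def onQueryStarted(self, event):
        print(f"query started: {event.id}")

    def onQueryProgress(self, event):
        p = event.progress
        # Throughput and batch size are the usual first stops for tuning.
        print(f"batch {p.batchId}: {p.numInputRows} rows, "
              f"{p.processedRowsPerSecond:.1f} rows/s")

    def onQueryTerminated(self, event):
        print(f"query terminated: {event.id}")

spark.streams.addListener(ProgressLogger())
```

For batch jobs, a JVM listener registered from Scala/Java (or the Spark UI's stage metrics) covers the job/stage/task events the reply lists.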
mdelvaux
by New Contributor
  • 208 Views
  • 1 reply
  • 0 kudos

BigQuery as foreign catalog - full object structs

Hi - We have mounted BigQuery, hosting Google Analytics data, as a foreign catalog. When querying the tables, objects are returned as strings, with all keys obfuscated by "f" or "v", likely to avoid replicating object keys across all records and hence ...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @mdelvaux, It seems like you're dealing with obfuscated keys in your BigQuery foreign catalog setup, which is a method to minimize data transfer by not replicating object keys across records. To map these keys back to their full object structures,...

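
Purely illustrative (not from the thread): BigQuery's REST row format nests values as {"f":[{"v":...}]}, where positions map to the table schema's field order. A sketch of unpacking one level in PySpark, assuming a string column `raw` and a known field list:

```python
from pyspark.sql import functions as F

fv_schema = "STRUCT<f: ARRAY<STRUCT<v: STRING>>>"
field_names = ["event_date", "event_name", "user_pseudo_id"]  # from the BQ schema

parsed = df.withColumn("rec", F.from_json("raw", fv_schema))
flat = parsed.select(
    *[F.col("rec.f")[i]["v"].alias(name) for i, name in enumerate(field_names)]
)
```

Nested records repeat the same f/v shape one level down, so deeper structs need a deeper schema string.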
NK_123
by New Contributor II
  • 281 Views
  • 1 reply
  • 0 kudos

DELTA_INVALID_SOURCE_VERSION issue on spark structure streaming

I am doing structured streaming and getting this error on Databricks; the source table already has 2 versions (0, 1). It is still not able to find Query {'_id': UUID('fe7a563e-f487-4d0e-beb0-efe794ab4708'), '_runId': UUID('bf0e94b5-b6ce-42bb-9bc7-15...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @NK_123, Updating to the latest Databricks runtime may also help.

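
Beyond upgrading the runtime, a common way to sidestep a bad source version in Delta streaming is to restart from an explicit version with a fresh checkpoint; a hedged sketch with placeholder names:

```python
stream = (spark.readStream
          .format("delta")
          .option("startingVersion", "1")  # known-good source version
          .table("my_catalog.my_schema.source_table"))

(stream.writeStream
 .option("checkpointLocation", "/Volumes/my_catalog/my_schema/chk/new_run")
 .toTable("my_catalog.my_schema.target_table"))
```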
dadrake3
by New Contributor II
  • 273 Views
  • 2 replies
  • 1 kudos

Delta Live Tables Unity Catalog Insufficient Permissions

I am receiving the following error when I try to run my DLT pipeline with Unity Catalog enabled: raise Py4JJavaError( py4j.protocol.Py4JJavaError: An error occurred while calling o950.load. : org.apache.spark.SparkSecurityException: [INSUFFICIENT_P...

Latest Reply
dadrake3
New Contributor II
  • 1 kudos

I have also tried granting all permissions on the schema to myself and to all users and neither helped

1 More Replies
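
For reference, the grants usually involved for a UC-enabled pipeline target, per the Unity Catalog privilege model (catalog, schema, and principal names are placeholders); note that DLT may evaluate these against the pipeline's run-as identity rather than your own user:

```python
spark.sql("GRANT USE CATALOG ON CATALOG my_catalog TO `data_engineers`")
spark.sql("GRANT USE SCHEMA ON SCHEMA my_catalog.my_schema TO `data_engineers`")
spark.sql("GRANT CREATE TABLE ON SCHEMA my_catalog.my_schema TO `data_engineers`")
spark.sql("GRANT SELECT ON SCHEMA my_catalog.my_schema TO `data_engineers`")
```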

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group