Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

angel_ba
by New Contributor II
  • 963 Views
  • 1 replies
  • 0 kudos

unity catalog system.access.audit lag

Hello, we have a Unity Catalog enabled workspace. To get the completion time of a pipeline that runs multiple times a day, I am checking the system.access.audit table. Comparing the completion time of the pipeline to other pipeline times, I am creat...

Latest Reply
daniel_sahal
Esteemed Contributor
  • 0 kudos

@angel_ba System tables are still in public preview, so there are some limitations, and one of them is a blocker for your use case. Currently there is no support for real-time monitoring. Data is updated throughout the day. If you don’t see a log for a recent eve...

  • 0 kudos
nikhilkumawat
by New Contributor III
  • 8464 Views
  • 6 replies
  • 4 kudos

Resolved! Get file information while using "Trigger jobs when new files arrive" https://docs.databricks.com/workflows/jobs/file-arrival-triggers.html

I am currently trying to use the "Trigger jobs when new files arrive" feature in one of my projects. I have an S3 bucket in which files arrive on random days. So I created a job and set the trigger to the "file arrival" type. And within the no...

Latest Reply
adriennn
Contributor
  • 4 kudos

Looks like a major oversight not to be able to get the information on what file(s) have triggered the job. Anyway, the above explanations given by Anon read like the replies of ChatGPT, especially the scenario where a dataframe is passed to a trigger...

  • 4 kudos
5 More Replies
zahra_Khedri
by New Contributor
  • 416 Views
  • 1 replies
  • 0 kudos

An error occurred when loading Jobs and Workflows App.

Hi, I was trying to open Workflows, but I got the error "An error occurred when loading Jobs and Workflows App." We need help understanding why it happened and how we can resolve it, please.

Latest Reply
GeoPer
New Contributor II
  • 0 kudos

Same... and the weirdest part is that all of the services look healthy at https://status.databricks.com/
Region: eu-central-1
Provider: AWS
Could anyone provide some info here?

  • 0 kudos
deng_dev
by New Contributor III
  • 569 Views
  • 1 replies
  • 0 kudos

Cached Views in MERGE INTO operation

Hi everyone! I want to use in-memory cached views in a MERGE INTO operation, but I am not entirely sure whether the exact view saved in memory is used in this operation. So, suppose I have a table named table_1 and a cached view named cached_view_1...

Latest Reply
shan_chandra
Esteemed Contributor
  • 0 kudos

@deng_dev - Are you using an external metastore by any chance? From the physical plan, we can see that `catalog`.`db`.`table_1` is not cached. If it is the Glue catalog, then caching can be enabled with the configs in the article below https://do...

  • 0 kudos
Anonymous
by Not applicable
  • 8440 Views
  • 15 replies
  • 8 kudos

Resolved! What are some best practices for CICD?

A number of people have questions on using Databricks in a productionalized environment. What are the best practices to enable CICD automation?

Latest Reply
BaivabMohanty
New Contributor II
  • 8 kudos

Any leads/posts on Databricks CI/CD integration with Bitbucket Pipelines? I am facing the error below while creating my CI/CD pipeline:
pipelines:
  branches:
    master:
      - step:
          name: Deploy Databricks Changes
          image: docker:19.03.12
          services:
            - docker
          script:
            # U...

  • 8 kudos
14 More Replies
RakeshRakesh_De
by New Contributor III
  • 4609 Views
  • 7 replies
  • 0 kudos

Spark CSV file read option to read blank/empty value from file as empty value only instead Null

Hi, I am trying to read a file that has some blank values in a column. We know Spark converts blank values to null during reading. How can I read a blank/empty value as an empty value? Tried DBR 13.2 and 14.3. I have tried all possible ways but it's not w...

Data Engineering
csv
EmptyValue
FileRead
Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

OK, after some tests: the trick is to surround text in your CSV with quotes. That way Spark can actually distinguish between a missing value and an empty value. Missing values are null and can only be converted to something else implicitel...

  • 0 kudos
6 More Replies
Brad
by Contributor
  • 672 Views
  • 2 replies
  • 0 kudos

Pushdown in Postgres

Hi team,
In Databricks I need to query a Postgres source like
select * from postgres_tbl where id in (select id from df)
The df comes from a Hive table. If I use the JDBC driver and do
query = '(select * from postgres_tbl) as t'
src_df = spark.read.format(...

Latest Reply
Brad
Contributor
  • 0 kudos

Thanks for the response. I cannot do that, as we load incrementally from the source very frequently. We cannot read the full data each time.
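One possible way to avoid reading the full table, sketched here under assumptions that the thread does not confirm: if the id list from df is small and numeric, it can be collected on the driver and embedded in the subquery string passed as the JDBC `dbtable` option, so that Postgres evaluates the filter itself. The helper name `build_pushdown_query` is hypothetical, and `jdbc_url` in the commented part is assumed to be defined elsewhere.

```python
from typing import Iterable


def build_pushdown_query(table: str, ids: Iterable[int], alias: str = "t") -> str:
    """Build a subquery string whose WHERE clause is evaluated by Postgres."""
    # Cast to int so that only numeric literals end up in the SQL text.
    id_list = sorted({int(i) for i in ids})
    if not id_list:
        # An empty IN () is invalid SQL, so return a query that matches nothing.
        return f"(select * from {table} where 1 = 0) as {alias}"
    in_clause = ", ".join(str(i) for i in id_list)
    return f"(select * from {table} where id in ({in_clause})) as {alias}"


query = build_pushdown_query("postgres_tbl", [3, 1, 2, 3])
print(query)

# In a notebook the string would then go to the JDBC reader, for example:
# ids = [row.id for row in df.select("id").distinct().collect()]
# src_df = (spark.read.format("jdbc")
#           .option("url", jdbc_url)  # assumed to be defined elsewhere
#           .option("dbtable", build_pushdown_query("postgres_tbl", ids))
#           .load())
```

This only makes sense while the id list stays small enough to fit in a SQL statement; for large lists, writing the ids to a staging table in Postgres and joining there is a common alternative.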

  • 0 kudos
1 More Replies
AnaMocanu
by New Contributor III
  • 1708 Views
  • 2 replies
  • 1 kudos

Resolved! Best way to parse Google Analytics data in Databricks notebook

I managed to extract the Google Analytics data via Lakehouse Federation and the BigQuery connection, but the events table values are in a weird JSON format: {"v":[{"v":{"f":[{"v":"ga_session_number"},{"v":{"f":[{"v":null},{"v":"2"},{"v":null},{"v":null...

Latest Reply
daniel_sahal
Esteemed Contributor
  • 1 kudos

@AnaMocanu I was using this function, with a few modifications on my end: https://gist.github.com/shreyasms17/96f74e45d862f8f1dce0532442cc95b2 Maybe this will be helpful for you.
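For illustration only, and not the function from the linked gist: the nesting in the question looks like a row format where "f" wraps the fields of a record and "v" wraps a value, so a small recursive helper can strip those wrappers. That interpretation is an assumption, the sample string below is a hand-closed, hypothetical version of the truncated snippet, and picking the first non-null value per key is a simplification.

```python
import json
from typing import Any


def unwrap(node: Any) -> Any:
    """Recursively strip the "f" (fields) and "v" (value) wrappers."""
    if isinstance(node, dict):
        if "f" in node:
            return [unwrap(cell) for cell in node["f"]]
        if "v" in node:
            return unwrap(node["v"])
        return node
    if isinstance(node, list):
        return [unwrap(item) for item in node]
    return node


def params_to_dict(raw: str) -> dict:
    """Turn a list of (key, [typed values]) pairs into {key: first non-null value}."""
    result = {}
    for key, values in unwrap(json.loads(raw)):
        result[key] = next((v for v in values if v is not None), None)
    return result


# Hypothetical sample: the snippet from the question, closed by hand.
sample = (
    '{"v":[{"v":{"f":[{"v":"ga_session_number"},'
    '{"v":{"f":[{"v":null},{"v":"2"},{"v":null},{"v":null}]}}]}}]}'
)
print(params_to_dict(sample))
```

In a notebook, a function like this could be wrapped in a UDF and applied to the JSON column, at the cost of losing the distinction between the string, integer, and floating-point slots.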

  • 1 kudos
1 More Replies
Clampazzo
by New Contributor II
  • 716 Views
  • 1 replies
  • 0 kudos

Can I see queries sent to All Purpose Compute from Power BI?

I am brand new to Databricks and am working on connecting a Power BI semantic model to our Databricks instance. I have successfully connected it to an All Purpose Compute but was wondering if there was a way I could see the queries that Power BI is ...

Data Engineering
Power BI
sql
Latest Reply
Gaut23
New Contributor II
  • 0 kudos

For All Purpose Compute, the best bet would be to use the system tables, specifically the system.access.audit table. https://docs.databricks.com/en/administration-guide/system-tables/index.html

  • 0 kudos
Olaoye_Somide
by New Contributor III
  • 1587 Views
  • 1 replies
  • 0 kudos

How to Implement Custom Logging in Databricks without Using _jvm Attribute with Spark Connect?

Hello Databricks Community, I am currently working in a Databricks environment and trying to set up custom logging using Log4j in a Python notebook. However, I've run into a problem due to the use of Spark Connect, which does not support the _jvm attr...

Data Engineering
Apache Spark
data engineering
Latest Reply
arpit
Valued Contributor
  • 0 kudos

import logging
logging.getLogger().setLevel(logging.WARN)
log = logging.getLogger("DATABRICKS-LOGGER")
log.warning("Hello")
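A slightly fuller sketch of the same idea, using only Python's standard `logging` module so that nothing depends on `_jvm`. The helper name `get_logger` and the format string are illustrative choices, not something taken from the thread.

```python
import io
import logging


def get_logger(name="DATABRICKS-LOGGER", level=logging.INFO, stream=None):
    """Return a named logger with one stream handler and a timestamped format."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    # Do not pass records to the root logger, to avoid duplicate lines.
    logger.propagate = False
    # Re-running a notebook cell must not stack up additional handlers.
    if not logger.handlers:
        handler = logging.StreamHandler(stream)  # stream=None means sys.stderr
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger


# Demo: write to an in-memory buffer so the output is easy to inspect.
buffer = io.StringIO()
log = get_logger(stream=buffer)
log.info("Hello")
log.debug("not shown at INFO level")
print(buffer.getvalue(), end="")
```

Keeping the logger named and non-propagating makes it easier to separate these messages from whatever the platform writes to the root logger.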

  • 0 kudos
anish2102
by New Contributor III
  • 1784 Views
  • 4 replies
  • 1 kudos

Resolved! Pyspark operations slowness in CLuster 14.3LTS as compared to 13.3 LTS

In my notebook, I am performing a few join operations that take more than 30s on a 14.3 LTS cluster, whereas the same operations take less than 4s on a 13.3 LTS cluster. Can someone help me optimize PySpark operations like joins and withColum...

Data Engineering
clustr-14.3
spark-3.5
Latest Reply
Lakshay
Esteemed Contributor
  • 1 kudos

Thank you for sharing the analysis

  • 1 kudos
3 More Replies
SG
by New Contributor II
  • 1238 Views
  • 3 replies
  • 1 kudos

Customize job run name when running jobs from adf

Hi guys, I am running my Databricks jobs on a job cluster from Azure Data Factory using a Databricks Python activity. When I monitor my jobs in Workflows -> Job runs, I see that the run name is a concatenation of the ADF pipeline name, Databricks python ac...

Latest Reply
AmanSehgal
Honored Contributor III
  • 1 kudos

I don't think that level of customisation is provided. However, I can suggest some workarounds: REST API: Create a job on the fly with the desired name within ADF and trigger it using the REST API in a Web activity. This way you can track job completion status ...
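As one reading of the REST API workaround, the request body for a one-time run with a custom name could be assembled as below. This is a sketch under assumptions: it targets the Jobs API `POST /api/2.1/jobs/runs/submit` endpoint with its `run_name` field as I understand it, and the script path, cluster id, and helper name `build_submit_payload` are hypothetical. Check the field names against the Jobs API reference for your workspace before relying on them.

```python
import json


def build_submit_payload(run_name, python_file, cluster_id, parameters=None):
    """Assemble a request body for a one-time run with a custom run name."""
    return {
        "run_name": run_name,
        "tasks": [
            {
                "task_key": "main",
                # Assumption: an existing cluster is used for brevity; a
                # "new_cluster" spec would be needed for a job cluster.
                "existing_cluster_id": cluster_id,
                "spark_python_task": {
                    "python_file": python_file,
                    "parameters": list(parameters or []),
                },
            }
        ],
    }


payload = build_submit_payload(
    run_name="daily_load_2024-04-25",      # hypothetical name
    python_file="dbfs:/scripts/load.py",   # hypothetical path
    cluster_id="0000-000000-example",      # hypothetical cluster id
)
print(json.dumps(payload, indent=2))
```

In ADF, the JSON printed here would be the body of the Web activity, with the run name built from pipeline expressions.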

  • 1 kudos
2 More Replies
Mohit_m
by Valued Contributor II
  • 3018 Views
  • 2 replies
  • 3 kudos

Resolved! Could not initialize class error

A user is running a job in Databricks triggered from ADF. In this job they need to use custom libraries that are packaged in jars. Most of the time the job runs fine, but sometimes it fails with:
java.lang.NoClassDefFoundError: Could not initialize
Any s...

Latest Reply
Mohit_m
Valued Contributor II
  • 3 kudos

Can you please check whether more than one jar contains this class? If multiple jars of the same type are available on the cluster, then there is no guarantee that the JVM picks the proper classes for processing, which results in the intermittent...
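The check described above can be sketched with Python's standard library, since a jar is a zip archive: list the `.class` entries of every jar in a directory and report the ones that appear in more than one jar. The helper name `find_duplicate_classes` and the demo jar names are hypothetical.

```python
import tempfile
import zipfile
from collections import defaultdict
from pathlib import Path


def find_duplicate_classes(jar_dir):
    """Map each .class entry to the jars containing it; keep entries found in 2+ jars."""
    owners = defaultdict(list)
    for jar in sorted(Path(jar_dir).glob("*.jar")):
        with zipfile.ZipFile(jar) as archive:
            for entry in archive.namelist():
                if entry.endswith(".class"):
                    owners[entry].append(jar.name)
    return {cls: jars for cls, jars in owners.items() if len(jars) > 1}


# Demo with two throwaway jars that both ship the same class file.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("lib-1.0.jar", "lib-1.1.jar"):
        with zipfile.ZipFile(Path(tmp) / name, "w") as jar:
            jar.writestr("com/example/Util.class", b"")
    duplicates = find_duplicate_classes(tmp)

print(duplicates)
```

Pointing the function at the directory where the cluster's libraries are installed would show which jars compete for the class named in the error.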

  • 3 kudos
1 More Replies
Jorge3
by New Contributor III
  • 1504 Views
  • 3 replies
  • 2 kudos

Resolved! [Databricks Assets Bundles] Workflow trigger on file arrival

Hi everyone! I'm setting up a workflow using Databricks Asset Bundles (DABs), and I want to configure my workflow to be triggered on file arrival. However, all the examples I've found in the documentation use schedule triggers. Does anyone know if it is...

Latest Reply
Ajay-Pandey
Esteemed Contributor III
  • 2 kudos

Hi @Jorge3 Yes, you can also use continuous mode. Please find the syntax below:
resources:
  jobs:
    dbx_job:
      name: continuous_job_name
      continuous:
        pause_status: UNPAUSED
      queue:
        enabled: true

  • 2 kudos
2 More Replies
ismaelhenzel
by New Contributor III
  • 1810 Views
  • 2 replies
  • 2 kudos

Resolved! Addressing Pipeline Error Handling in Databricks bundle run with CI/CD when SUCCESS WITH FAILURES

I'm using Databricks asset bundles and I have pipelines that contain "if all done rules". When running on CI/CD, if a task fails, the pipeline returns a message like "the job xxxx SUCCESS_WITH_FAILURES" and it passes, potentially deploying a broken p...

Data Engineering
bunlde
CICD
Databricks
Latest Reply
ismaelhenzel
New Contributor III
  • 2 kudos

Awesome answer, I will try the first approach. I think it is a less intrusive solution than changing the rules of my pipeline in development scenarios. This way, I can maintain a general pipeline for deployment across all environments. We plan to imp...
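The approach referred to in this reply is not visible in the excerpt, so the following is an independent sketch of one way a CI/CD step could treat SUCCESS_WITH_FAILURES as a failure: inspect the run's JSON and return a non-zero exit code when the overall result is not SUCCESS or any task failed. The field names (`state.result_state`, `tasks[].task_key`) follow the Jobs API run object as I understand it, the set of failing states is my own choice, and `sample_run` is hypothetical data.

```python
FAILING_STATES = {"FAILED", "TIMEDOUT", "CANCELED"}  # assumption: states treated as failures


def failed_task_keys(run):
    """Return the keys of tasks whose result state counts as a failure."""
    return [
        task["task_key"]
        for task in run.get("tasks", [])
        if task.get("state", {}).get("result_state") in FAILING_STATES
    ]


def exit_code(run):
    """Return 0 only for a clean SUCCESS with no failed tasks, otherwise 1."""
    overall = run.get("state", {}).get("result_state")
    if overall != "SUCCESS" or failed_task_keys(run):
        return 1
    return 0


# Hypothetical run object: one task failed, a cleanup task still ran.
sample_run = {
    "state": {"result_state": "SUCCESS_WITH_FAILURES"},
    "tasks": [
        {"task_key": "load", "state": {"result_state": "FAILED"}},
        {"task_key": "cleanup", "state": {"result_state": "SUCCESS"}},
    ],
}
print(failed_task_keys(sample_run), exit_code(sample_run))

# In a pipeline step the result would typically end the process:
# import sys; sys.exit(exit_code(run))
```

Because the check lives in the CI/CD step, the job definition itself can stay the same across environments.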

  • 2 kudos
1 More Replies
