Data Engineering

Forum Posts

VovaVili
by New Contributor
  • 273 Views
  • 2 replies
  • 0 kudos

Databricks Runtime 13.3 - can I use Databricks Connect without Unity Catalog?

Hello all, The official documentation for Databricks Connect states that, for Databricks Runtime versions 13.0 and above, my cluster needs to have Unity Catalog enabled for me to use Databricks Connect, and use a Databricks cluster through an IDE like...

Latest Reply
mohaimen_syed
New Contributor III
  • 0 kudos

Hi, I'm currently using Databricks Connect without Unity Catalog in VS Code. Although I have connected Unity Catalog separately on multiple occasions, I don't think it's required. Here is the doc: https://docs.databricks.com/en/dev-tools/databrick...

1 More Replies
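For readers hitting the same question: with databricks-connect 13+, the connection itself is just a session builder pointed at a cluster. A minimal sketch, where the host, token, and cluster ID are hypothetical placeholders to replace with your own values:

# Minimal Databricks Connect sketch; host/token/cluster_id are placeholders.
from databricks.connect import DatabricksSession

spark = DatabricksSession.builder.remote(
    host="https://<workspace-url>",
    token="<personal-access-token>",
    cluster_id="<cluster-id>",
).getOrCreate()

# DataFrame operations now execute on the remote cluster.
spark.range(5).show()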
Brad
by New Contributor III
  • 17 Views
  • 0 replies
  • 0 kudos

Pushdown in Postgres

Hi team, In Databricks I need to query a Postgres source like
select * from postgres_tbl where id in (select id from df)
where df comes from a Hive table. If I use the JDBC driver and do
query = '(select * from postgres_tbl) as t' src_df = spark.read.format(...

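The thread has no replies yet; one common workaround for this kind of semi-join is to collect the ids from the Hive-backed DataFrame and inline them into the pushed-down query, so the filter runs inside Postgres instead of after a full table pull. A sketch under the assumption that the id list is small; the connection details are placeholders:

# Push the IN-filter down to Postgres via the JDBC 'query' option.
ids = ", ".join(str(r.id) for r in df.select("id").distinct().collect())
pushdown_query = f"select * from postgres_tbl where id in ({ids})"

src_df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://<host>:5432/<db>")  # placeholder
    .option("query", pushdown_query)
    .option("user", "<user>")          # placeholder
    .option("password", "<password>")  # placeholder
    .load()
)

For large id sets, writing df to a staging table in Postgres and joining there avoids an oversized SQL string.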
AnaMocanu
by Visitor
  • 121 Views
  • 2 replies
  • 0 kudos

Best way to parse Google Analytics data in Databricks notebook

I managed to extract the Google Analytics data via Lakehouse Federation and the BigQuery connection, but the events table values are in a weird JSON format: {"v":[{"v":{"f":[{"v":"ga_session_number"},{"v":{"f":[{"v":null},{"v":"2"},{"v":null},{"v":null...

Latest Reply
daniel_sahal
Esteemed Contributor
  • 0 kudos

@AnaMocanu I was using this function, with a few modifications on my end: https://gist.github.com/shreyasms17/96f74e45d862f8f1dce0532442cc95b2 Maybe this will be helpful for you.

1 More Replies
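For later readers, the linked gist implements a generic flattener; the core idea is to give from_json a schema for BigQuery's "v"/"f" wrapper records and then unwrap them level by level. A rough sketch, assuming the JSON lives in a string column named event_params and keeping deeper levels as strings for re-parsing (the exact nesting varies by field, so treat the schema as a starting point):

# Unwrap one level of BigQuery's {"v": ...} / {"f": [...]} encoding.
from pyspark.sql import functions as F
from pyspark.sql.types import ArrayType, StringType, StructField, StructType

wrapper = StructType([
    StructField("v", ArrayType(StructType([
        StructField("v", StringType()),  # keep nested payloads as raw JSON strings
    ]))),
])

parsed = df.withColumn("parsed", F.from_json("event_params", wrapper))
params = parsed.select(F.explode("parsed.v").alias("param"))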
Clampazzo
by Visitor
  • 37 Views
  • 1 reply
  • 0 kudos

Can I see queries sent to All Purpose Compute from Power BI?

I am brand new to Databricks and am working on connecting a Power BI semantic model to our Databricks instance. I have successfully connected it to an All Purpose Compute but was wondering if there was a way I could see the queries that Power BI is ...

Data Engineering
Power BI
sql
Latest Reply
Gaut23
New Contributor II
  • 0 kudos

For All Purpose compute, the best bet would be to use the system tables, specifically the system.access.audit table: https://docs.databricks.com/en/administration-guide/system-tables/index.html

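To make that concrete, a hedged sketch of scanning the audit table for recent activity; the exact column and action names evolve over time, so verify them against the linked doc or a DESCRIBE of the table:

# Browse recent audit events; narrow the filter once you know which
# service_name/action_name values your Power BI queries produce.
events = spark.sql("""
    SELECT event_time, user_identity.email, service_name, action_name
    FROM system.access.audit
    WHERE event_date >= current_date() - INTERVAL 7 DAYS
    ORDER BY event_time DESC
""")
events.show(truncate=False)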
Olaoye_Somide
by New Contributor
  • 93 Views
  • 1 reply
  • 0 kudos

How to Implement Custom Logging in Databricks without Using _jvm Attribute with Spark Connect?

Hello Databricks Community, I am currently working in a Databricks environment and trying to set up custom logging using Log4j in a Python notebook. However, I've run into a problem due to the use of Spark Connect, which does not support the _jvm attr...

Data Engineering
Apache Spark
data engineering
Latest Reply
arpit
Contributor III
  • 0 kudos

import logging

# Set the root logger's level, then emit through a named logger.
logging.getLogger().setLevel(logging.WARN)
log = logging.getLogger("DATABRICKS-LOGGER")
log.warning("Hello")

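To round that out with a handler and format (still pure Python, so no _jvm/py4j involvement under Spark Connect), a small sketch; the logger name and format string are arbitrary choices:

import logging
import sys

log = logging.getLogger("DATABRICKS-LOGGER")
log.setLevel(logging.INFO)

if not log.handlers:  # avoid duplicate handlers on notebook re-runs
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s - %(message)s"))
    log.addHandler(handler)

log.info("Custom logging is configured")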
anish2102
by New Contributor II
  • 177 Views
  • 4 replies
  • 1 kudos

Resolved! PySpark operations slowness in cluster 14.3 LTS as compared to 13.3 LTS

In my notebook, I am performing a few join operations that take more than 30s on a 14.3 LTS cluster, where the same operations take less than 4s on a 13.3 LTS cluster. Can someone help me with how I can optimize PySpark operations like joins and withColum...

Data Engineering
clustr-14.3
spark-3.5
Latest Reply
Lakshay
Esteemed Contributor
  • 1 kudos

Thank you for sharing the analysis.

3 More Replies
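The accepted analysis isn't quoted above; as a generic first step for this kind of regression, compare the physical plans on both runtimes and, if the join strategy changed, try hinting the smaller side as a broadcast. A sketch with hypothetical table names:

from pyspark.sql import functions as F

big = spark.table("big_table")    # hypothetical
small = spark.table("small_dim")  # hypothetical

# Force a broadcast-hash join and inspect the strategy the optimizer picked.
joined = big.join(F.broadcast(small), "id")
joined.explain(mode="formatted")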
SG
by New Contributor II
  • 550 Views
  • 3 replies
  • 1 kudos

Customize job run name when running jobs from adf

Hi guys, I am running my Databricks jobs on a job cluster from Azure Data Factory using a Databricks Python activity. When I monitor my jobs in Workflows -> Job runs, I see that the run name is a concatenation of the ADF pipeline name, the Databricks Python ac...

Latest Reply
AmanSehgal
Honored Contributor III
  • 1 kudos

I don't think that level of customisation is provided. However, I can suggest some workarounds: REST API: Create a job on the fly with the desired name within ADF and trigger it using the REST API in a Web activity. This way you can track job completion status ...

2 More Replies
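As an illustration of the REST API workaround, a sketch of triggering a pre-created job (whose name you control) with the Jobs 2.1 run-now endpoint; in ADF the same POST goes into a Web activity, and the host, token, and job ID below are placeholders:

import requests

host = "https://<workspace-url>"   # placeholder
token = "<personal-access-token>"  # placeholder

resp = requests.post(
    f"{host}/api/2.1/jobs/run-now",
    headers={"Authorization": f"Bearer {token}"},
    json={"job_id": 123},          # placeholder job ID
)
resp.raise_for_status()
print(resp.json()["run_id"])  # poll this run ID for completion status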
Mohit_m
by Valued Contributor II
  • 1183 Views
  • 2 replies
  • 3 kudos

Resolved! Could not initialize class error

User is running a job triggered from ADF in Databricks. In this job they need to use custom libraries that are in jars. Most of the time the jobs run fine; however, sometimes a job fails with java.lang.NoClassDefFoundError: Could not initialize. Any s...

Latest Reply
Mohit_m
Valued Contributor II
  • 3 kudos

Can you please check if there is more than one jar containing this class? If multiple jars of the same type are available on the cluster, there is no guarantee of the JVM picking the proper classes for processing, which results in the intermittent...

1 More Replies
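A quick way to test the duplicate-jar theory from a notebook is to scan the driver's jar directory for the class that fails to initialize. The /databricks/jars path is the usual location on Databricks clusters, but treat both it and the class name as assumptions:

import glob
import zipfile

class_entry = "com/example/MyClass.class"  # hypothetical class to look for

for jar in glob.glob("/databricks/jars/*.jar"):
    try:
        with zipfile.ZipFile(jar) as zf:
            if class_entry in zf.namelist():
                print(jar)  # more than one hit => conflicting copies
    except zipfile.BadZipFile:
        pass  # skip unreadable files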
Jorge3
by New Contributor III
  • 143 Views
  • 3 replies
  • 2 kudos

Resolved! [Databricks Asset Bundles] Workflow trigger on file arrival

Hi everyone! I'm setting up a workflow using Databricks Asset Bundles (DABs), and I want to configure my workflow to be triggered on file arrival. However, all the examples I've found in the documentation use schedule triggers. Does anyone know if it is...

Latest Reply
Ajay-Pandey
Esteemed Contributor III
  • 2 kudos

Hi @Jorge3 Yes, you can also use continuous mode. Please find the syntax below:

resources:
  jobs:
    dbx_job:
      name: continuous_job_name
      continuous:
        pause_status: UNPAUSED
      queue:
        enabled: true

2 More Replies
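On the file-arrival part of the question: the Jobs API also exposes a file_arrival trigger, so a bundle job along these lines may work; the trigger block below is an assumption mapped from the Jobs API trigger settings, and the storage URL is a placeholder:

# Sketch: file-arrival trigger in a DAB job; the URL is a placeholder.
resources:
  jobs:
    dbx_job:
      name: file_arrival_job
      trigger:
        pause_status: UNPAUSED
        file_arrival:
          url: abfss://landing@<storageaccount>.dfs.core.windows.net/inbox/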
Rene
by New Contributor
  • 60 Views
  • 1 reply
  • 0 kudos

Can we build IOT data trading platform by using Databricks?

I have an idea of sharing & trading IoT data streamed from many data sources on an incentive platform. I would appreciate it if you guys discussed the idea with me. Thank you

Latest Reply
betty4920taylor
  • 0 kudos

Hello @Rene, Building an IoT data trading platform using Databricks is indeed a feasible and innovative idea. Databricks provides a unified analytics platform that can handle massive amounts of data processing and advanced analytics, which is essentia...

ismaelhenzel
by New Contributor II
  • 541 Views
  • 2 replies
  • 1 kudos

Resolved! Addressing Pipeline Error Handling in Databricks bundle run with CI/CD when SUCCESS WITH FAILURES

I'm using Databricks asset bundles and I have pipelines that contain "if all done" rules. When running on CI/CD, if a task fails, the pipeline returns a message like "the job xxxx SUCCESS_WITH_FAILURES" and it passes, potentially deploying a broken p...

Data Engineering
bundle
CICD
Databricks
Latest Reply
ismaelhenzel
New Contributor II
  • 1 kudos

Awesome answer, I will try the first approach. I think it is a less intrusive solution than changing the rules of my pipeline in development scenarios. This way, I can maintain a general pipeline for deployment across all environments. We plan to imp...

1 More Replies
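For anyone wiring up that first approach, the idea is to fetch the run after `databricks bundle run` finishes and fail the CI step when any task did not succeed, even though the run overall reports SUCCESS_WITH_FAILURES. A sketch; host, token, and run ID are placeholders:

import sys
import requests

host = "https://<workspace-url>"   # placeholder
token = "<personal-access-token>"  # placeholder
run_id = 456                       # placeholder: captured from the run output

resp = requests.get(
    f"{host}/api/2.1/jobs/runs/get",
    headers={"Authorization": f"Bearer {token}"},
    params={"run_id": run_id},
)
resp.raise_for_status()

failed = [t["task_key"] for t in resp.json().get("tasks", [])
          if t.get("state", {}).get("result_state") != "SUCCESS"]
if failed:
    sys.exit(f"Failing build; unsuccessful tasks: {failed}")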
smedegaard
by New Contributor III
  • 83 Views
  • 2 replies
  • 1 kudos

[delta live table] exception: getPrimaryKeys not implemented for debezium

I've defined a streaming Delta Live Table in a notebook using Python, running on the "preview" channel with delta-cache-accelerated (Standard_D4ads_v5) compute. It fails with org.apache.spark.sql.streaming.StreamingQueryException: [STREAM_FAILED] Query [id = xxx, ru...

Latest Reply
Kaniz
Community Manager
  • 1 kudos

Hi @smedegaard, You’re encountering a StreamingQueryException with the message: “getPrimaryKeys not implemented for debezium SQLSTATE: XXKST.” This error suggests that the getPrimaryKeys operation is not supported for the Debezium connector in your ...

1 More Replies
Phani1
by Valued Contributor
  • 60 Views
  • 1 reply
  • 0 kudos

Boomi integrating with Databricks

Hi Team, Is there any impact when integrating Databricks with Boomi as opposed to Azure Event Hub? Could you offer some insights on the integration of Boomi with Databricks? https://boomi.com/blog/introducing-boomi-event-streams/ Regards, Janga

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @Phani1, Let’s explore the integration of Databricks with Boomi and compare it to Azure Event Hub. Databricks Integration with Boomi: Databricks is a powerful data analytics platform that allows you to process large-scale data and build machin...

ETLdeveloper
by New Contributor II
  • 86 Views
  • 1 reply
  • 0 kudos

Resolved! I have to run notebooks concurrently using a process pool executor in Python

Hello All, My scenario requires me to create code that reads tables from the source catalog and writes them to the destination catalog using Spark. Doing it one by one is not a good option when there are 300 tables in the catalog. So I am trying the pr...

Latest Reply
Ajay-Pandey
Esteemed Contributor III
  • 0 kudos

Hi @ETLdeveloper You can use multithreading, which helps you run notebooks in parallel. Attaching code for your reference:

from concurrent.futures import ThreadPoolExecutor

class NotebookData:
    def __init__(self, path, timeout, parameters = Non...

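A self-contained variant of the same pattern, using ThreadPoolExecutor around dbutils.notebook.run (threads rather than processes, since dbutils lives on the driver); the notebook path, timeout, and parameter name are placeholders:

from concurrent.futures import ThreadPoolExecutor

tables = ["t1", "t2", "t3"]  # placeholder: really the ~300 source tables

def copy_table(table):
    # Each call runs the child notebook with its own parameters.
    return dbutils.notebook.run(
        "/Workspace/copy_table", 3600, {"table_name": table})

with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(copy_table, tables))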