Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

dg
by New Contributor II
  • 22303 Views
  • 7 replies
  • 3 kudos

Trying to use pdf2image on Databricks

Trying to use pdf2image on Databricks, but it's failing with "PDFInfoNotInstalledError: Unable to get page count. Is poppler installed and in PATH?" I've installed pdf2image & poppler-utils by running the following in a cell:
%pip install pdf2image
%pip ...

Latest Reply
Slalom_Tobias
New Contributor III
  • 3 kudos

Seems like this thread has died, but for posterity, Databricks provides the following code for installing poppler on a cluster. The code is sourced from the dbdemos accelerators, specifically the "LLM Chatbot With Retrieval Augmented Generation (RAG)...
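For reference, the usual shape of that fix (the dbdemos snippet itself is truncated above, so the cell layout here is an assumption): install the poppler system package so the pdfinfo binary that pdf2image shells out to is on PATH, then install the Python wrapper. On a multi-node cluster the apt-get step belongs in an init script so the workers get it too.

%sh
# Installs pdfinfo/pdftoppm, the binaries pdf2image invokes
sudo apt-get update -y && sudo apt-get install -y poppler-utils

%pip install pdf2image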

6 More Replies
Ravikumashi
by Contributor
  • 10428 Views
  • 8 replies
  • 0 kudos

Failed to initialise azure-event-hub with Azure AD (service principal)

We have been trying to authenticate azure-event-hub with Azure AD (service principal) instead of a shared access key (connection string) and read events from azure-event-hub, but it is failing to initialise azure-event-hubs and throwing a no-such-method ex...

Latest Reply
Ravikumashi
Contributor
  • 0 kudos

@swathi-dataops I have added ServicePrincipalCredentialsAuth and ServicePrincipalAuthBase as normal classes, instead of creating a separate jar for these 2 classes, and packaged them as part of my project jar. And used the below code for configuring...

7 More Replies
vinay076
by New Contributor III
  • 4978 Views
  • 4 replies
  • 0 kudos

My exam got suspended

Hello Team, I had a pathetic experience while attempting my 1st Databricks certification. I was continuously in front of the camera, and an alert appeared and then my exam resumed. Later a support person asked me to show the full room and I showe...

Latest Reply
Cert-Team
Databricks Employee
  • 0 kudos

Thanks, @vinay076. You will receive notice of the reschedule via an email from Webassessor.

3 More Replies
raghu2
by New Contributor III
  • 2302 Views
  • 0 replies
  • 0 kudos

DLT table from a text source

I am trying to create a Delta Live Table by reading a text source. I get an error message stating that both source and target should be in Delta format. Am I missing something?
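A minimal sketch of the usual pattern, assuming Auto Loader and a hypothetical landing path: the source files stay plain text, while the table DLT materializes is Delta, which is what the error is asking for.

import dlt

@dlt.table(comment="Raw text files ingested from a non-Delta source")
def raw_text():
    # cloudFiles (Auto Loader) reads the plain-text files; the DLT target
    # table produced from this function is written as Delta.
    return (
        spark.readStream
             .format("cloudFiles")
             .option("cloudFiles.format", "text")
             .load("/mnt/landing/text/")   # hypothetical source path
    )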

Edthehead
by Contributor III
  • 6197 Views
  • 5 replies
  • 0 kudos

Incremental join transformation using Delta live tables

I'm attempting to build an incremental data processing pipeline using Delta Live Tables. The aim is to stream data from a source multiple times a day and join the data within the specific increment only. I'm using Autoloader to load the data increment...

Attachment: pic.png
Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

Basically you want to do a stream-stream join. If you want to do that you need to take a few things into account (see link). DLT might do this for you, but I have never used it so I cannot confirm that. If your source tables are delta tables, you could ind...
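For reference, a minimal watermarked stream-stream join sketch in PySpark; table and column names are hypothetical, and the time-range bound is what lets Spark expire buffered state instead of holding both streams forever.

from pyspark.sql.functions import expr

orders = (spark.readStream.table("orders_bronze")     # hypothetical tables
          .withWatermark("order_ts", "2 hours")
          .alias("o"))
payments = (spark.readStream.table("payments_bronze")
            .withWatermark("payment_ts", "2 hours")
            .alias("p"))

# Inner join constrained in time so state for old rows can be dropped.
joined = orders.join(
    payments,
    expr("o.order_id = p.order_id AND "
         "p.payment_ts BETWEEN o.order_ts AND o.order_ts + interval 1 hour"),
    "inner",
)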

4 More Replies
SamarthJain
by New Contributor II
  • 8626 Views
  • 4 replies
  • 2 kudos

Hi All, I'm facing an issue with my Spark Streaming job. It gets stuck in the "Stream Initializing" phase for more than 3 hours. Need your...

Hi All, I'm facing an issue with my Spark Streaming job. It gets stuck in the "Stream Initializing" phase for more than 3 hours. Need your help here to understand what happens internally at the "Stream Initializing" phase of the Spark Streaming job tha...

Latest Reply
MohsenJ
Contributor
  • 2 kudos

I'm facing the same issue when I try to run this example: Create a monitor using the API | Databricks on AWS (Inference Lakehouse Monitor regression example notebook). Any idea?

3 More Replies
ZacayDaushin
by New Contributor
  • 1794 Views
  • 1 reply
  • 0 kudos

Spline agent use in Databricks

Spline Agent: I use the Spline agent to get the lineage of Databricks notebooks, and for that I attach the following code to the notebook, but I get the error attached:
%scala
import scala.util.parsing.json.JSON
import za.co.absa.spline.harvester.SparkLinea...

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

Could be me, but I do not see an error message?
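For anyone who lands here: besides the programmatic initializer shown in the question, the Spline agent also supports a codeless setup via the cluster's Spark config, roughly as below (the producer URL is a placeholder for your Spline server).

spark.sql.queryExecutionListeners za.co.absa.spline.harvester.listener.SplineQueryExecutionListener
spark.spline.lineageDispatcher.http.producer.url http://<spline-server>:8080/producer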

ktsoi
by New Contributor III
  • 4396 Views
  • 4 replies
  • 0 kudos

Resolved! INVALID_STATE: Storage configuration limit exceeded, only 11 storage configurations are allowed

Our team is trying to set up a new workspace (our 8th workspace), but we failed to create the storage configurations required for the new workspace, with an error of INVALID_STATE: Storage configuration limit exceeded, only 11 storage configurations are all...

Latest Reply
_Architect_
New Contributor II
  • 0 kudos

I solved the issue by simply going into Cloud Resources in the Databricks account console, navigating to "Credential Configuration" and "Storage Configuration", and deleting all the configurations which are not needed anymore (belonging to deleted workspaces). I ...
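The same cleanup can be scripted against the Databricks Account API; a rough sketch, with the account ID and credentials as placeholders and the exact auth mechanism depending on your account setup.

import requests

ACCOUNT_ID = "<account-id>"                     # placeholder
BASE = f"https://accounts.cloud.databricks.com/api/2.0/accounts/{ACCOUNT_ID}"
AUTH = ("<account-admin-email>", "<password>")  # placeholder credentials

# List storage configurations, then delete those tied to deleted workspaces.
for cfg in requests.get(f"{BASE}/storage-configurations", auth=AUTH).json():
    print(cfg["storage_configuration_id"], cfg["storage_configuration_name"])

# requests.delete(f"{BASE}/storage-configurations/<storage_configuration_id>", auth=AUTH)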

3 More Replies
Arinjay
by New Contributor
  • 1875 Views
  • 1 reply
  • 0 kudos

Cannot add comment on table via CREATE TABLE statement

I am not able to add a comment using this CREATE TABLE statement with AS (query).

Attachment: Arinjay_0-1711492175110.png
Latest Reply
feiyun0112
Honored Contributor
  • 0 kudos

CREATE TABLE [ IF NOT EXISTS ] table_identifier
  [ ( col_name1 col_type1 [ COMMENT col_comment1 ], ... ) ]
  USING data_source
  [ OPTIONS ( key1=val1, key2=val2, ... ) ]
  [ PARTITIONED BY ( col_name1, col_name2, ... ) ]
  [ CLUSTERED B...
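In a CTAS the table-level COMMENT clause goes before AS; a minimal sketch with hypothetical names, runnable from a notebook:

spark.sql("""
    CREATE TABLE main.default.orders_summary     -- hypothetical target
    COMMENT 'Daily order rollup'                 -- comment must precede AS
    AS SELECT order_date, count(*) AS n_orders
       FROM main.default.orders
       GROUP BY order_date
""")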

Haylyon
by New Contributor II
  • 12287 Views
  • 3 replies
  • 3 kudos

Missing 'DBAcademy DLT' as a Cluster Policy when creating Delta Live Tables pipeline

I am currently in the middle of the Data Engineering Associate course on the Databricks Partner Academy. I am on module 4 - "Build Data Pipelines with Delta Live Tables", and trying to complete the lab "DE 4.1 - DLT UI Walkthrough". I have successful...

Latest Reply
SeRo
New Contributor II
  • 3 kudos

The policy will be available after running /Users/<YOUR USER NAME>/Data Engineering with Databricks - v3.1.4/Includes/Workspace-Setup

2 More Replies
brian999
by Contributor
  • 3508 Views
  • 3 replies
  • 0 kudos

Writing to Snowflake from Databricks - sqlalchemy replacement?

I am trying to migrate some complex Python load processes into Databricks. Our load processes currently use pandas, and we're hoping to refactor into Spark soon. For now, I need to figure out how to alter our functions that get sqlalchemy connection e...

Latest Reply
shan_chandra
Databricks Employee
  • 0 kudos

@brian999 - the spark-snowflake connector is built into the DBR. Please refer to the below article for examples: https://docs.databricks.com/en/connect/external-systems/snowflake.html#read-and-write-data-from-snowflake Please let us know if this hel...
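A minimal read/write sketch with that built-in connector; the connection details are hypothetical and the credentials are assumed to live in a secret scope.

# Hypothetical Snowflake connection options
sf_options = {
    "sfUrl": "myaccount.snowflakecomputing.com",
    "sfUser": dbutils.secrets.get("snowflake", "user"),
    "sfPassword": dbutils.secrets.get("snowflake", "password"),
    "sfDatabase": "ANALYTICS",
    "sfSchema": "PUBLIC",
    "sfWarehouse": "LOAD_WH",
}

# Read a Snowflake table into a Spark DataFrame
df = (spark.read.format("snowflake")
           .options(**sf_options)
           .option("dbtable", "SRC_TABLE")
           .load())

# Write back, replacing the pandas/sqlalchemy path
(df.write.format("snowflake")
   .options(**sf_options)
   .option("dbtable", "TGT_TABLE")
   .mode("append")
   .save())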

2 More Replies
kmodelew
by New Contributor III
  • 3283 Views
  • 1 reply
  • 0 kudos

TaskSensor - check if a task succeeded

Hi, I would like to check if a task within a job succeeded (even if the job is marked as failed because one of its tasks failed). I need to create dependencies for tasks within other jobs. The case is that I have one job for loading all tables for one country. Re...
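If calling the REST API is an option, the Jobs API exposes per-task state even when the run as a whole is marked failed; a rough sketch, with host, token, and run ID as placeholders.

import requests

HOST = "https://<workspace-host>"    # placeholder
TOKEN = "<personal-access-token>"    # placeholder

run = requests.get(
    f"{HOST}/api/2.1/jobs/runs/get",
    headers={"Authorization": f"Bearer {TOKEN}"},
    params={"run_id": 123456789},    # run of the country-load job
).json()

# Each task reports its own result_state, independent of the overall run state.
succeeded = {t["task_key"]
             for t in run.get("tasks", [])
             if t.get("state", {}).get("result_state") == "SUCCESS"}
print("Succeeded tasks:", succeeded)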

JoseMacedo
by New Contributor II
  • 2680 Views
  • 3 replies
  • 0 kudos

How to cache on 500 billion rows

Hello! I'm using a serverless SQL cluster on Databricks and I have a dataset in a Delta table that has 500 billion rows. I'm trying to filter it down to around 7 billion rows and then cache that dataset to use in other queries and make them run faster. When I ...

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

I missed the 'serverless SQL' part. CACHE is for Spark; I don't think it works for serverless SQL. Here is how caching works on DBSQL.
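A common workaround on a SQL warehouse is to materialize the filtered subset once as its own Delta table and point the other queries at that; a sketch with hypothetical names (the statement can equally run as plain SQL on the warehouse).

# Materialize the ~7B-row subset once; downstream queries scan this smaller
# Delta table and benefit from the warehouse's own disk/result caching.
spark.sql("""
    CREATE OR REPLACE TABLE analytics.events_filtered AS
    SELECT * FROM analytics.events        -- hypothetical names
    WHERE event_date >= '2024-01-01'
""")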

2 More Replies
