Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

MartinIsti
by New Contributor III
  • 1687 Views
  • 3 replies
  • 1 kudos

Resolved! DLT - runtime parameterisation of execution

I have started to use DLT in a prototype framework and I now face the challenge below, for which any help would be appreciated. First, let me give some brief context: I have metadata sitting in a .json file that I read as the first task and put it into a lo...

Data Engineering
configuration
Delta Live Table
job
parameters
workflow
Latest Reply
data-engineer-d
Contributor
  • 1 kudos

@Kaniz_Fatma Can you please provide some reference for the REST API approach? I do not see it available in the docs. TIA

2 More Replies
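Regarding the REST API approach asked about in this thread: a minimal sketch, assuming the Delta Live Tables Pipelines API (GET/PUT /api/2.0/pipelines/{pipeline_id} and POST .../updates); the workspace URL, token, pipeline id, and the my_framework.source_path configuration key are placeholders. Values placed in the pipeline's configuration map can be read inside the DLT notebook with spark.conf.get.

```python
import requests

HOST = "https://<workspace-url>"      # placeholder
TOKEN = "<personal-access-token>"     # placeholder
PIPELINE_ID = "<pipeline-id>"         # placeholder
headers = {"Authorization": f"Bearer {TOKEN}"}

# Read the current pipeline spec, overwrite one configuration entry, and push it back.
spec = requests.get(f"{HOST}/api/2.0/pipelines/{PIPELINE_ID}", headers=headers).json()["spec"]
spec["configuration"] = {**spec.get("configuration", {}),
                         "my_framework.source_path": "/mnt/raw/2024-05-01"}  # hypothetical key
requests.put(f"{HOST}/api/2.0/pipelines/{PIPELINE_ID}", headers=headers, json=spec)

# Trigger a run of the pipeline with the updated configuration.
requests.post(f"{HOST}/api/2.0/pipelines/{PIPELINE_ID}/updates", headers=headers, json={})

# Inside the DLT notebook the value is then available as:
#   spark.conf.get("my_framework.source_path")
```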
pSdatabricks
by New Contributor II
  • 2744 Views
  • 3 replies
  • 0 kudos

Azure Databricks Monitoring & Alerting (Data Observability) Tools / Frameworks for Enterprise

I am trying to evaluate monitoring and alerting tools such as New Relic, Datadog, and Grafana with Databricks on Azure. None of them offered support when we reached out. I would like to hear from the Databricks team on the recommended tool / framework ...

Latest Reply
Sruthivika
New Contributor II
  • 0 kudos

I'd recommend this new tool we've been trying out. It's really helpful for monitoring and provides good insights on how Azure Databricks clusters, pools & jobs are doing – like if they're healthy or having issues. It brings everything together, makin...

2 More Replies
FlexException
by New Contributor II
  • 4601 Views
  • 5 replies
  • 1 kudos

Dynamic Number of Tasks in Databricks Workflow

Do Databricks workflows support creating a workflow with a dynamic number of tasks? For example, let's say we have a DAG where T1 fans out to T2(1), T2(2), ..., T2(n-1), T2(n), which all fan back in to T3...

Latest Reply
tanyeesern
New Contributor II
  • 1 kudos

@FlexException The Databricks API supports job creation and execution: Task Parameters and Values in Databricks Workflows | by Ryan Chynoweth | Medium. One possibility is, after running the earlier job, to process its output to create a dynamic number of tasks in s...

4 More Replies
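To complement the suggestion in this thread: a minimal sketch of creating a fan-out/fan-in job (T1 -> T2(1..n) -> T3) whose task count is decided at creation time, assuming the Jobs API 2.1 jobs/create endpoint; the workspace URL, token, cluster id, and notebook paths are placeholders.

```python
import requests

HOST = "https://<workspace-url>"       # placeholder
TOKEN = "<personal-access-token>"      # placeholder
CLUSTER_ID = "<existing-cluster-id>"   # placeholder
headers = {"Authorization": f"Bearer {TOKEN}"}

n = 5  # number of parallel T2 tasks, e.g. computed from an earlier job's output
t2_keys = [f"T2_{i}" for i in range(1, n + 1)]

tasks = [{"task_key": "T1",
          "existing_cluster_id": CLUSTER_ID,
          "notebook_task": {"notebook_path": "/Workspace/jobs/t1"}}]

# One T2 task per partition, all depending on T1.
tasks += [{"task_key": key,
           "existing_cluster_id": CLUSTER_ID,
           "depends_on": [{"task_key": "T1"}],
           "notebook_task": {"notebook_path": "/Workspace/jobs/t2_worker",
                             "base_parameters": {"partition": key}}}
          for key in t2_keys]

# T3 fans back in, depending on every T2 task.
tasks.append({"task_key": "T3",
              "existing_cluster_id": CLUSTER_ID,
              "depends_on": [{"task_key": key} for key in t2_keys],
              "notebook_task": {"notebook_path": "/Workspace/jobs/t3"}})

resp = requests.post(f"{HOST}/api/2.1/jobs/create", headers=headers,
                     json={"name": "dynamic-fan-out-fan-in", "tasks": tasks})
print(resp.json())  # expected to contain the new job_id
```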
superspan
by New Contributor II
  • 806 Views
  • 2 replies
  • 0 kudos

How to access Spark UI metrics in an automated way (API)

I am doing some automated testing and would ultimately like to access per job/stage/task metrics as shown in the UI (e.g. Spark UI -> SQL/DataFrame -> plan visualization) in an automated way (an API is ideal, but some ad-hoc metrics pipelines from loca...

Latest Reply
superspan
New Contributor II
  • 0 kudos

Thanks for the response. This enables the event logs, but the event logs seem to be empty. Would you know where I can get the Spark metrics as seen in the Spark UI?

1 More Replies
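As a follow-up to this thread: the metrics behind the Spark UI are also exposed by open-source Spark's monitoring REST API under /api/v1, including a /sql endpoint that roughly corresponds to the SQL / DataFrame tab. A minimal sketch; the base URL is a placeholder, since on Databricks the driver UI is only reachable through the workspace's Spark UI proxy rather than directly on port 4040.

```python
import requests

# Placeholder: plain Spark exposes this on the driver at http://<driver>:4040/api/v1;
# on Databricks the same paths sit behind the cluster's Spark UI proxy URL.
BASE = "http://<spark-driver-host>:4040/api/v1"

app_id = requests.get(f"{BASE}/applications").json()[0]["id"]

# Per-stage metrics (run time, input/output, shuffle read/write, ...)
for stage in requests.get(f"{BASE}/applications/{app_id}/stages").json():
    print(stage["stageId"], stage["status"], stage.get("executorRunTime"))

# Per-query entries, roughly what the SQL / DataFrame tab and plan visualisation show
for query in requests.get(f"{BASE}/applications/{app_id}/sql").json():
    print(query["id"], query["status"], query["description"])
```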
Geoff123
by New Contributor III
  • 3212 Views
  • 8 replies
  • 0 kudos

Trouble on Accessing Azure Storage from Databricks (Python)

I used the same access method shown in https://community.databricks.com/t5/data-engineering/to-read-data-from-azure-storage/td-p/32230 but kept getting the error below: org.apache.spark.SparkSecurityException: [INSUFFICIENT_PERMISSIONS] Insufficient pr...

Latest Reply
Wojciech_BUK
Valued Contributor III
  • 0 kudos

Hi, you can find the storage account firewall information by accessing the resource in the Azure portal. Please mind that if you are using Unity Catalog you should NOT mount the storage account; you should rather use the abstraction of Storage Credentials and External Lo...

7 More Replies
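To illustrate the Unity Catalog approach mentioned in the reply above: a minimal sketch, assuming a storage credential and external location already exist and READ FILES has been granted; the container, storage account, and path are placeholders.

```python
# The external location is defined once by an admin, for example:
#   CREATE EXTERNAL LOCATION landing
#     URL 'abfss://<container>@<storageaccount>.dfs.core.windows.net/landing'
#     WITH (STORAGE CREDENTIAL my_credential);
#   GRANT READ FILES ON EXTERNAL LOCATION landing TO `data_engineers`;

# After that, no mount and no account key is needed -- Unity Catalog governs access.
path = "abfss://<container>@<storageaccount>.dfs.core.windows.net/landing/sales"  # placeholder
df = spark.read.format("parquet").load(path)
display(df)
```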
Marinagomes
by New Contributor
  • 1137 Views
  • 1 replies
  • 0 kudos

raise Py4JJavaError while changing data type of a column

Hi, I'm using Azure Databricks 10.4 LTS (includes Apache Spark 3.2.1, Scala 2.12). I'm trying to convert 2 columns from string data type to timestamp data type. My date columns are in the format below: 2/18/2021 7:20:12 PM. So I wrote the following command: from py...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Marinagomes, instead of to_timestamp, consider using try_to_timestamp. It returns null for malformed expressions, which can help identify problematic rows.

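For a value like 2/18/2021 7:20:12 PM, the Py4JJavaError usually comes from a format pattern that does not match the data. A minimal sketch using a matching pattern (the column name is hypothetical; note that try_to_timestamp may not be available on DBR 10.4 / Spark 3.2.1):

```python
from pyspark.sql import functions as F

df = spark.createDataFrame([("2/18/2021 7:20:12 PM",)], ["event_time_str"])

# "M/d/yyyy h:mm:ss a" accepts single-digit month/day/hour plus the AM/PM marker.
df = df.withColumn("event_time", F.to_timestamp("event_time_str", "M/d/yyyy h:mm:ss a"))
df.show(truncate=False)
```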
databrick53
by New Contributor II
  • 2080 Views
  • 6 replies
  • 0 kudos

can't execute the code

When I was executing the code, I was getting this error: "Notebook detached. Exception when creating execution context: java.net.SocketTimeoutException: Connect Timeout". Can someone help me?

Latest Reply
toolhater
New Contributor II
  • 0 kudos

As of last night (3/27) it looks like it was working again.

5 More Replies
Cheryl
by New Contributor II
  • 1794 Views
  • 3 replies
  • 0 kudos

Query example for databricks Query History API

Hi, I am trying to get query history data from my SQL warehouse. Following previous examples is not working. databricks_workspace_url = "xxx"; token = "xxx"; start_time = 1707091200; end_time = 1707174000; api_endpoint = f"{databricks_workspace_url}/api/2.0/s...

Latest Reply
shan_chandra
Esteemed Contributor
  • 0 kudos

@Cheryl - you can use query_start_time=2023-01-01T00:00:00Z as a parameter to filter for the time frame. The available filter criteria are given here: https://docs.databricks.com/api/workspace/queryhistory/list#filter_by-query_start_time_range

2 More Replies
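Putting the filter suggested above together with the poster's epoch timestamps: a minimal sketch against /api/2.0/sql/history/queries; the workspace URL and token are placeholders, and the epoch-second values are converted to the milliseconds the API expects.

```python
import requests

HOST = "https://<workspace-url>"       # placeholder
TOKEN = "<personal-access-token>"      # placeholder

payload = {
    "filter_by": {
        "query_start_time_range": {
            "start_time_ms": 1707091200 * 1000,  # epoch seconds -> milliseconds
            "end_time_ms": 1707174000 * 1000,
        }
    },
    "max_results": 100,
}

resp = requests.get(f"{HOST}/api/2.0/sql/history/queries",
                    headers={"Authorization": f"Bearer {TOKEN}"},
                    json=payload)
resp.raise_for_status()
for query in resp.json().get("res", []):
    print(query["query_id"], query["status"], query.get("query_text", "")[:80])
```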
HelloDatabricks
by New Contributor II
  • 2947 Views
  • 5 replies
  • 8 kudos

Connect Timeout - Error when trying to run a cell

Hello everybody. Whenever I try to run a simple cell I now receive the following error message: "Notebook detached. Exception when creating execution context: java.net.SocketTimeoutException: Connect Timeout." After that error message the cluster ...

Latest Reply
MarijaS
New Contributor III
  • 8 kudos

today is ok

4 More Replies
RajNath
by New Contributor II
  • 1294 Views
  • 2 replies
  • 0 kudos

Traversing to previous rows and getting the data based on condition

Sample input data set:
ClusterId        | Event       | EventTime
1212-18-r9u1kzn1 | RUNNING     | 2024-02-02T11:38:30.168+00:00
1212-18-r9u1kzn1 | TERMINATING | 2024-02-02T13:43:33.933+00:00
1212-18-r9u1kzn1 | STARTING    | 2024-02-02T15:50:05.174+00:00
1212-18-r9u1kzn1 | RUNNING     | 2024-02-02T15:54:21.51...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @RajNath , Handling event times and aggregations in large datasets can be challenging, but Structured Streaming in Databricks provides powerful tools to address this. Let’s break down your requirements and explore how you can achieve them: Ru...

1 More Replies
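As a batch-style alternative to the streaming approach described above: window functions can look back at earlier rows per cluster. A minimal sketch, assuming the goal is to carry the most recent preceding RUNNING time onto each later event (column names taken from the sample data).

```python
from pyspark.sql import functions as F, Window

events = spark.createDataFrame(
    [
        ("1212-18-r9u1kzn1", "RUNNING",     "2024-02-02T11:38:30.168+00:00"),
        ("1212-18-r9u1kzn1", "TERMINATING", "2024-02-02T13:43:33.933+00:00"),
        ("1212-18-r9u1kzn1", "STARTING",    "2024-02-02T15:50:05.174+00:00"),
    ],
    ["ClusterId", "Event", "EventTime"],
).withColumn("EventTime", F.to_timestamp("EventTime"))

# Within each cluster, ordered by time, remember the last RUNNING timestamp seen so far.
w = (Window.partitionBy("ClusterId")
           .orderBy("EventTime")
           .rowsBetween(Window.unboundedPreceding, Window.currentRow))

events = events.withColumn(
    "last_running_time",
    F.last(F.when(F.col("Event") == "RUNNING", F.col("EventTime")), ignorenulls=True).over(w),
)
events.show(truncate=False)
```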
RajNath
by New Contributor II
  • 1615 Views
  • 2 replies
  • 0 kudos

Cost of using delta sharing with unity catalog

I am new to Databricks Delta Sharing. In the case of Delta Sharing, I don't see any cluster running. I tried looking for documentation, but the only hint I got is that it uses a Delta Sharing server. What is the cost of it, and how do I configure and optimize it for la...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @RajNath, Let’s dive into the world of Delta Sharing and explore how it works, its cost implications, and optimization strategies. What is Delta Sharing? Delta Sharing is a secure data-sharing platform developed by Databricks. It allows you to ...

1 More Replies
Anonymous
by Not applicable
  • 2461 Views
  • 3 replies
  • 3 kudos

Resolved! 6.4 Extended Support (includes Apache Spark 2.4.5, Scala 2.11 Connect Timeout

"Notebook detached Exception when creating execution context: java.net.SocketTimeout Exception: Connect Timeout" when trying to connect my cluster to a notebook. Then "Error trying to handle that request We failed to handle that request, please try a...

Latest Reply
Wolverine
New Contributor III
  • 3 kudos

Hello @Kaniz_Fatma, I am facing the same issue. I tried changing the DBR but it is still giving me the error and the cluster is not starting. Regards, MS

2 More Replies
dg
by New Contributor II
  • 10129 Views
  • 7 replies
  • 1 kudos

Trying to use pdf2image on databricks

Trying to use pdf2image on Databricks, but it's failing with "PDFInfoNotInstalledError: Unable to get page count. Is poppler installed and in PATH?" I've installed pdf2image & poppler-utils by running the following in a cell: %pip install pdf2image %pip ...

Latest Reply
Slalom_Tobias
New Contributor III
  • 1 kudos

Seems like this thread has died, but for posterity, Databricks provides the following code for installing poppler on a cluster. The code is sourced from the dbdemos accelerators, specifically the "LLM Chatbot With Retrieval Augmented Generation (RAG)...

6 More Replies
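For completeness next to the dbdemos pointer above: a minimal sketch of a cluster-scoped init script that installs poppler-utils on every node (the volume path is a placeholder); once the cluster restarts with the script attached, %pip install pdf2image in the notebook is sufficient.

```python
# Write an init script that installs poppler-utils, then attach it to the cluster
# under Compute > Advanced options > Init scripts (the path below is a placeholder).
dbutils.fs.put(
    "/Volumes/main/default/init_scripts/install_poppler.sh",
    """#!/bin/bash
set -e
apt-get update -y
apt-get install -y poppler-utils
""",
    True,  # overwrite
)
```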
Ravikumashi
by Contributor
  • 4743 Views
  • 8 replies
  • 0 kudos

failed to initialise azure-event-hub with azure AAD(service principal)

We have been trying to authenticate azure-event-hub with Azure AD (service principal) instead of a shared access key (connection string) and read events from azure-event-hub, but it is failing to initialise azure-event-hubs and throwing a no-such-method ex...

Latest Reply
Ravikumashi
Contributor
  • 0 kudos

@swathi-dataops I have added ServicePrincipalCredentialsAuth and ServicePrincipalAuthBase as normal classes instead of creating a separate jar for these 2 classes, and packaged them as part of my project jar. And used the below code for configuring...

7 More Replies
Constantine
by Contributor III
  • 4014 Views
  • 5 replies
  • 1 kudos

Resolved! How to use Databricks Query History API (REST API)

I have set up authentication using this page https://docs.databricks.com/sql/api/authentication.html and run curl -n -X GET https://<databricks-instance>.cloud.databricks.com/api/2.0/sql/history/queries to get the history of all SQL endpoint queries, but I...

Latest Reply
MorpheusGoGo
New Contributor II
  • 1 kudos

Are you sure this works? payload = { "filter_by": {}, "max_results": 1 } returns 1 result, while payload = { "filter_by": { "query_start_time_range": { "start_time_ms": 1640995200000, "end_time_ms": 1641081599000 } }, "max_results": 1...

4 More Replies