Data Engineering

Forum Posts

by emanuelsh • New Contributor

07-27-2023 1:41:43 AM

599 Views
0 replies
0 kudos

Schema Evolution from Kafka Source

Hi, I have a Spark streaming process that reads data from a Kafka topic into Azure DL. This is how I implement the MERGE capability into the Delta table. In addition, for the same topic, I have another streaming process that simply writes data to DL. In Kafka ...

by katedb • New Contributor

07-21-2023 1:57:53 AM

792 Views
1 replies
0 kudos

Clusters do not start - bootstrap timeout

Hello, whenever I try to start any of the already existing clusters, I get a Bootstrap timeout error. In the logs, there are the following messages: [Bootstrap Event] Can reach databricks-update-oregon.s3.us-west-2.amazonaws.com: [FAILED] [ 257.556698] audit: k...

bootstrap

compute

Latest Reply

User16752239289
Valued Contributor

07-26-2023 3:59:00 PM

0 kudos

The error message indicates that the EC2 instance cannot access databricks-update-oregon.s3.us-west-2.amazonaws.com. Do you have an S3 endpoint set up, or can traffic route to databricks-update-oregon.s3.us-west-2.amazonaws.com?
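One way to check the reachability question above from a machine in the same network is a plain TCP connection test. The following is a minimal sketch, not a Databricks tool: the helper name `can_reach`, the default port 443, and the 5-second timeout are assumptions chosen for illustration, and a successful TCP connection only shows that routing and DNS work, not that the bootstrap itself will succeed.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper for illustration; port and timeout are assumed defaults.
    """
    try:
        # create_connection resolves the host name and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

Calling `can_reach("databricks-update-oregon.s3.us-west-2.amazonaws.com")` from a host in the cluster's subnet would return False if there is no route or S3 endpoint for that address.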

by NWIEFInance • New Contributor

07-24-2023 11:24:15 AM

756 Views
1 replies
0 kudos

Connect to EXCEL

I am having a hard time connecting my existing Excel file to source data from Databricks and need help.

Latest Reply

User16539034020
Contributor II

07-26-2023 2:48:15 PM

0 kudos

Hi, thanks for contacting Databricks Support. We don't support direct Excel-Databricks connectivity. However, Databricks can be accessed through ODBC and JDBC interfaces, and we can leverage these with Excel's Power Query functionality for indirect...

by matanper • New Contributor III

07-25-2023 7:07:19 AM

2137 Views
5 replies
1 kudos

Custom docker image fails to initialize

I'm trying to use a custom docker image for my job. This is my docker file:

    FROM databricksruntime/standard:12.2-LTS
    COPY . .
    RUN /databricks/python3/bin/pip install -U pip
    RUN /databricks/python3/bin/pip install -r requirements.txt
    USER root

My job ...

Latest Reply

Debayan
Esteemed Contributor III

07-26-2023 8:30:05 AM

1 kudos

Hi, I think disabling iptables will be better in this case. Could you please try the command below and confirm? $ sudo iptables -S

by Łukasz • New Contributor III

07-18-2023 5:02:06 AM

2308 Views
6 replies
5 kudos

Resolved! Dense rank possible bug

I am deduplicating a data source over a specific business key using the dense_rank function. Currently the data source does not have any duplicates, so the function should return 1 in all cases. The issue is that dense rank does not return prop...
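To make the expectation in the question concrete, here is a minimal pure-Python sketch of dense-rank semantics over a partition key. It is not Spark code and does not reproduce the reported behaviour; the function name, the row layout, and the column names `id` and `ts` are hypothetical. It only illustrates why a source with one row per business key should yield rank 1 everywhere.

```python
from collections import defaultdict


def dense_rank(rows, partition_key, order_key):
    """Return copies of rows with a 'dense_rank' field added.

    Ranks are computed per partition; tied order values share a rank
    and ranks have no gaps. Illustrative only, not the Spark implementation.
    """
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[partition_key]].append(row)

    ranked = []
    for part in partitions.values():
        # Each distinct order value gets the next consecutive rank.
        distinct = sorted({r[order_key] for r in part})
        rank_of = {value: i + 1 for i, value in enumerate(distinct)}
        for r in part:
            ranked.append({**r, "dense_rank": rank_of[r[order_key]]})
    return ranked


# Hypothetical data: one row per business key, so every rank is 1.
unique_rows = [{"id": "a", "ts": 1}, {"id": "b", "ts": 5}, {"id": "c", "ts": 3}]
print([r["dense_rank"] for r in dense_rank(unique_rows, "id", "ts")])
```

A rank above 1 should therefore only appear when a business key occurs more than once with different ordering values.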

Latest Reply

saipujari_spark
Valued Contributor

07-19-2023 12:37:11 PM

5 kudos

Hey @Łukasz, thanks for reporting. As far as I can see, Spark 3.4.0 introduced an improvement that looks to be the cause of this issue. Improvement: https://issues.apache.org/jira/browse/SPARK-37099 Similar bug: https://issues.apache.org/jira/browse/SPARK-44448 This...

by 415963 • New Contributor II

07-25-2023 7:49:16 AM

1662 Views
3 replies
2 kudos

Not able to catch structured streaming exception

I would like to catch and handle an exception in a structured streaming job. The Databricks notebook still displays the exception, regardless of the added exception handling (see attached screenshot). I guess that the exception is displayed by the cell outp...

Latest Reply

Debayan
Esteemed Contributor III

07-26-2023 8:21:15 AM

2 kudos

Hi, I understand. Could you please also provide the last line of the error after scrolling down in the notebook cell?

by Retko • Contributor

07-25-2023 1:33:23 AM

4281 Views
4 replies
2 kudos

Running Command is often stuck on "Running Command..."

Hi, when running a command, it often gets stuck and the message below it says: "Running Command..." What can I do about it besides restarting the cluster? I also tried reattaching and clearing state, but that did not help. Thanks

Latest Reply

Debayan
Esteemed Contributor III

07-26-2023 8:14:15 AM

2 kudos

Hi, do you see this while running a command in the notebook? Please tag @Debayan with your next comment which will notify me. Thanks!

by DennisB • New Contributor III

07-24-2023 4:47:42 AM

2021 Views
4 replies
2 kudos

Resolved! Better Worker Node Core Utilisation

Hi everyone, hoping someone can help me with this problem. I have an embarrassingly parallel workload, which I'm parallelising over 4 worker nodes (of type Standard_F4, so 4 cores each). Each workload is single-threaded, so I believe that only one cor...

Latest Reply

DennisB
New Contributor III

07-26-2023 5:51:24 AM

2 kudos

So I managed to get the 1-core-per-executor setup working successfully. The bit that wasn't working was spark.executor.memory -- this was too high, but lowering it so that the sum of the executors' memory was ~90% of the worker node's memory allowed it to w...
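The sizing rule described in this reply can be sketched as simple arithmetic. The helper below is an illustration under stated assumptions, not the poster's actual configuration: the function name is hypothetical, the 0.9 fraction is the approximate figure quoted in the reply, and the worker memory passed in is whatever the cluster actually exposes to Spark, which should be read from the cluster UI rather than assumed.

```python
def executor_settings(worker_memory_mb: int, worker_cores: int,
                      memory_fraction: float = 0.9) -> dict:
    """Spark settings for one single-core executor per worker core.

    Hypothetical helper: splits memory_fraction of the worker's memory
    evenly across the executors that would run on one worker.
    """
    if worker_cores < 1 or worker_memory_mb < 1:
        raise ValueError("worker_cores and worker_memory_mb must be positive")
    if not 0 < memory_fraction <= 1:
        raise ValueError("memory_fraction must be in (0, 1]")

    executors_per_worker = worker_cores  # one core per executor
    per_executor_mb = int(worker_memory_mb * memory_fraction) // executors_per_worker
    return {
        "spark.executor.cores": "1",
        "spark.executor.memory": f"{per_executor_mb}m",
    }


# Assumed example figures: a 4-core worker with 8192 MB usable memory.
print(executor_settings(8192, 4))
```

With the assumed figures, each of the four executors gets roughly a quarter of 90% of the worker's memory, so the total stays under what the node can provide.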

by MadrasSenpai • New Contributor II

06-12-2023 12:14:21 PM

903 Views
3 replies
2 kudos

How to install cmdstanpy in dbx cluster

I have built an HMC model using cmdstan. On my local machine, I installed cmdstan with the following approach: import cmdstanpy; cmdstanpy.install_cmdstan() But in Databricks I need to reinstall it every time I train a new model, from the noteb...

Latest Reply

Anonymous
Not applicable

06-15-2023 11:10:31 PM

2 kudos

Hi @Rajamannar Aanjaram Krishnamoorthy Great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question. Thanks.

by sarguido • New Contributor II

02-21-2023 5:13:09 AM

1471 Views
4 replies
2 kudos

Delta Live Tables: bulk import of historical data?

Hello! I'm very new to working with Delta Live Tables and I'm having some issues. I'm trying to import a large amount of historical data into DLT. However, letting the DLT pipeline run forever doesn't work with the database we're trying to import from...

Latest Reply

Anonymous
Not applicable

04-21-2023 11:31:20 PM

2 kudos

Hi @Sarah Guido, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers y...

by NWIEFInance • New Contributor

07-25-2023 9:45:28 AM

449 Views
1 replies
2 kudos

Connect to EXCEL

I am having a hard time connecting to Excel. Any help connecting Databricks to Excel?

Latest Reply

Kaniz
Community Manager

07-26-2023 2:38:15 AM

2 kudos

Hi @NWIEFInance, This article describes using the Databricks ODBC driver to connect Databricks to Microsoft Excel. After establishing the connection, you can access the data in Databricks from Excel. You can also use Excel to analyze the data further...

by Priyag1 • Honored Contributor II

05-05-2023 11:55:35 PM

957 Views
2 replies
11 kudos

Query parameters in dashboards

Queries can optionally leverage parameters or static values. When a visualization based on a parameterized query is added to a dashboard, the visualization can either be configured to use a: Widget parameter Widget paramet...

Latest Reply

Natalie_NL
New Contributor II

07-26-2023 2:31:59 AM

11 kudos

Hi, I built a dashboard with dashboard parameters, and it works pretty easily! The advantage of dashboard parameters is that you do not have to set a default (it can be: all). This is convenient when you need to filter on values that change every time the q...

by The_raj • New Contributor

07-26-2023 12:42:33 AM

2449 Views
1 replies
2 kudos

Error while reading file <file path>. [DEFAULT_FILE_NOT_FOUND]

Hi, I have created a workflow with 5 notebooks in it. One of the notebooks is failing with the error below. I have tried refreshing the table, but I am still facing the same issue. When I try to run the notebook manually, it works fine. Can someone plea...

Latest Reply

Kaniz
Community Manager

07-26-2023 12:50:02 AM

2 kudos

Hi @The_raj , The error message you are encountering indicates a failure during the execution of a Spark job on Databricks. Specifically, it seems that Task 736 in Stage 92.0 failed multiple times, and the most recent loss was due to a "DEFAULT_FILE...

by mickniz • Contributor

10-12-2022 8:31:27 AM

14319 Views
7 replies
18 kudos

cannot import name 'sql' from 'databricks'

I am working on a Databricks version 10.4 premium cluster, and while importing sql from the databricks module I am getting the error below: cannot import name 'sql' from 'databricks' (/databricks/python/lib/python3.8/site-packages/databricks/__init__.py). Trying...

Latest Reply

wallystart
New Contributor II

07-25-2023 4:31:29 PM

18 kudos

I resolved the same error by installing the library from the cluster interface (UI).

by dvmentalmadess • Valued Contributor

06-28-2023 5:03:32 PM

1088 Views
3 replies
0 kudos

Ingestion Time Clustering on initial load

We are migrating our data into Databricks and I was looking at the recommendations for partitioning here: https://docs.databricks.com/tables/partitions.html. This recommends not specifying partitioning and allowing "Ingestion Time Partitioning" (ITP)...

Latest Reply

Anonymous
Not applicable

07-22-2023 9:08:50 PM

0 kudos

Hi @dvmentalmadess Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you. T...
