The chunk of code in question:

sys.path.append(
    spark.conf.get("util_path", "/Workspace/Repos/Production/loch-ness/utils/")
)
from broker_utils import extract_day_with_suffix, proper_case_address_udf, proper_case_last_name_first_udf, proper_case_ud...
Hi @gabe123, it seems like you're encountering a ModuleNotFoundError when trying to import the broker_utils module in your Python code.
Let’s troubleshoot this issue step by step:
Check Module Location:
First, ensure that the broker_utils.py fil...
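A quick programmatic way to check both the module location and the import path, building on the snippet above (a minimal sketch; the util_path default is the one from the question, and broker_utils.py is assumed to live directly in that directory):

import os
import sys

# Resolve the configured utils directory (default taken from the question)
util_path = spark.conf.get("util_path", "/Workspace/Repos/Production/loch-ness/utils/")

# 1. Confirm the file is actually where the import expects it
print(os.path.exists(os.path.join(util_path, "broker_utils.py")))  # should print True

# 2. Append the directory and confirm it landed on the import path
sys.path.append(util_path)
print(util_path in sys.path)  # should print True

from broker_utils import extract_day_with_suffix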
Hello, I am working on a Spark job where I'm reading several tables from PostgreSQL into DataFrames as follows:

df = (spark.read
    .format("postgresql")
    .option("query", query)
    .option("host", database_host)
    .option("port...
Hi @lieber_augustin, Optimizing the performance of your PostgreSQL queries involves several considerations.
Let’s address both the potential optimizations and the reason behind multiple Scan JDBCRelation operations.
Database Design:
Properly des...
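One concrete optimization worth trying: each Spark action re-reads the JDBC source, which is typically why the plan shows multiple Scan JDBCRelation operations. Persisting the DataFrame after the read lets downstream actions reuse a single scan. A minimal sketch using the read options from the question; the port value and column name are placeholders:

# Persist the JDBC read so repeated actions reuse one scan of the source
df = (spark.read
      .format("postgresql")
      .option("query", query)
      .option("host", database_host)
      .option("port", database_port)
      .load()
      .cache())

df.count()                    # first action materializes the cache (one JDBC scan)
df.select("some_col").show()  # later actions read from the cache, not PostgreSQL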
Context:
IDE: IntelliJ 2023.3.2
Library: databricks-connect 13.3
Python: 3.10
Description: I develop notebooks and Python scripts locally in the IDE and I connect to the Spark cluster via databricks-connect for a better developer experience. I download a...
Late to the discussion, but I too was looking for a way to do this programmatically, as opposed to through the UI. The solution I landed on was using the Python SDK (though you could assuredly do this using an API request instead if you're not in Python): w ...
Hi, I have configured 20 different workflows in Databricks, all of them with job clusters with different names. All 20 workflows are scheduled to run at the same time, but even with a different job cluster configured in each of them, they run sequentially w...
Why can I use boto3 to retrieve a secret from Secrets Manager with a personal cluster, but get an error with a shared cluster? NoCredentialsError: Unable to locate credentials
Hey @dbdude, I am facing the same error. Did you find a solution to access the AWS credentials on a Shared Cluster? @Kaniz The reason why fetching the AWS credentials on a Shared Cluster does not work is a limitation of the network and file system acc...
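A common workaround while that limitation applies is to store AWS keys in a Databricks secret scope and pass them to boto3 explicitly, instead of relying on the instance profile. A sketch; the scope name, key names, region, and secret ID are all hypothetical:

import boto3

# Hypothetical secret scope/keys holding long-lived AWS credentials
aws_access_key = dbutils.secrets.get(scope="aws", key="access_key_id")
aws_secret_key = dbutils.secrets.get(scope="aws", key="secret_access_key")

# Build the client with explicit credentials instead of the instance profile
client = boto3.client(
    "secretsmanager",
    region_name="us-east-1",
    aws_access_key_id=aws_access_key,
    aws_secret_access_key=aws_secret_key,
)
secret = client.get_secret_value(SecretId="my-app/secret")["SecretString"]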
Hello, I am trying to connect the Power BI semantic model output (basically the data that has already been pre-processed) to Databricks. Does anybody know how to do this? I would like it to be an automated process, so I would like to know any way to p...
Hi @madhumitha, Connecting Power BI semantic model output to Databricks can be done in a few steps.
Here are a couple of options:
Databricks Power Query Connector:
The new Databricks connector is natively integrated into Power BI. You can configu...
Hello, we are currently utilizing Auto Loader with file listing mode for a stream, which is experiencing significant latency due to the non-incremental naming of files in the directory, a condition that cannot be altered. In an effort to mitigate this...
Hi @Ulman, I think that by default this method will try to create the Event Grid subscription and Storage Queue on the same Storage Account as your data. Please note that premium Blob Storage does not have the Queue service. In my opinion, the easiest way would be to create ma...
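For reference, switching the stream to file notification mode looks roughly like this (a sketch for Azure with placeholder values; the commented cloudFiles.queueName option can point the stream at a queue you created manually on a non-premium account):

# Auto Loader in file notification mode instead of directory listing
df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "json")
      .option("cloudFiles.useNotifications", "true")
      .option("cloudFiles.resourceGroup", "<resource-group>")
      .option("cloudFiles.subscriptionId", "<subscription-id>")
      .option("cloudFiles.tenantId", "<tenant-id>")
      .option("cloudFiles.clientId", "<sp-client-id>")
      .option("cloudFiles.clientSecret", dbutils.secrets.get("<scope>", "<key>"))
      # .option("cloudFiles.queueName", "<existing-queue>")  # reuse a manually created queue
      .load("abfss://<container>@<account>.dfs.core.windows.net/<path>"))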
Hello, we're working with a serverless SQL cluster to query Delta tables and display some analytics in dashboards. We have some basic GROUP BY queries that generate around 36k rows, and they are executed without the "limit" keyword. So in the data ...
Hi All, I want to add a member to a group at the Databricks account level using the REST API (https://docs.databricks.com/api/azure/account/accountgroups/patch) as mentioned in this link. I was able to authenticate, but not able to add a member while using belo...
Hi @pragarwal,
The body you’ve shared is almost correct. However, there’s a small issue. Instead of directly providing the email address as the value, you need to provide an object with the "value" field set to the email address. Here’s the correcte...
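The shape being described is roughly the following (a sketch against the account-level SCIM Groups endpoint from the linked docs; the account ID, group ID, member value, and token variable are placeholders):

import requests

account_id = "<account-id>"
group_id = "<group-id>"
url = (f"https://accounts.azuredatabricks.net/api/2.0/accounts/"
       f"{account_id}/scim/v2/Groups/{group_id}")

body = {
    "schemas": ["urn:ietf:params:scim:api:messages:2.0:PatchOp"],
    "Operations": [{
        "op": "add",
        "path": "members",
        # each member is an object whose "value" field carries the identifier
        "value": [{"value": "<member-value>"}],
    }],
}

resp = requests.patch(url, json=body,
                      headers={"Authorization": f"Bearer {token}"})
print(resp.status_code, resp.text)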
Good morning, I'm trying to run:

databricks bundle run --debug -t dev integration_tests_job

My bundle looks like:

bundle:
  name: x

include:
  - ./resources/*.yml

targets:
  dev:
    mode: development
    default: true
    workspace:
      host: x
      r...
Hi @jorperort,
The error message you’re seeing, “no deployment state. Did you forget to run ‘databricks bundle deploy’?”, indicates that the deployment state is missing.
Here are some steps you can take to resolve this issue:
Verify Deploym...
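In short, the run subcommand needs a deployed state to resolve the job, so a typical sequence for the dev target from the question is:

databricks bundle validate -t dev
databricks bundle deploy -t dev
databricks bundle run -t dev integration_tests_job

deploy writes the deployment state for that target; once it succeeds, run can locate integration_tests_job.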
Hi there everyone, we are trying to get hands-on with Databricks Lakehouse for a prospective client's project. Our major aim for the project is to compare the Data Lakehouse on Databricks and the BigQuery data warehouse in terms of costs and time to set up and run que...
Hi @ashraf1395, Comparing Databricks Lakehouse and Google BigQuery is essential to make an informed decision for your project.
Let’s address your questions:
Cluster Configurations for Databricks:
Databricks provides flexibility in configuring com...
Hello everyone, I'm trying to use the ODBC DirectQuery option in Power BI, but I keep getting an error about another command. The SQL query works when using the SQL Editor. Do I need to change the setup of my ODBC connector?

DECLARE dateFrom DATE = DA...
Hi @mamiya, here are a few steps you can take to address the error:
Check Power Query Editor Steps:
The error might be related to a specific step in the Power Query Editor. Try opening the Power Query Editor and reviewing the steps. If there’s a...
I have set up my Databricks notebook to use a Service Principal to access ADLS using the configuration below.

service_credential = dbutils.secrets.get(scope="<scope>", key="<service-credential-key>")
spark.conf.set("fs.azure.account.auth.type.<storage-accou...
Below is the implementation of the same code in Scala:

spark.sparkContext.hadoopConfiguration.set("fs.azure.account.key.<accountName>.dfs.core.windows.net", <accountKey>)
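For completeness, a sketch of the full Python-side service principal (OAuth) configuration that the question's snippet begins; note the Scala reply above authenticates with the storage account key instead, which is a different auth mode. All <placeholders> need real values:

# ADLS access via service principal (OAuth client credentials)
service_credential = dbutils.secrets.get(scope="<scope>", key="<service-credential-key>")

spark.conf.set("fs.azure.account.auth.type.<storage-account>.dfs.core.windows.net", "OAuth")
spark.conf.set("fs.azure.account.oauth.provider.type.<storage-account>.dfs.core.windows.net",
               "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set("fs.azure.account.oauth2.client.id.<storage-account>.dfs.core.windows.net",
               "<application-id>")
spark.conf.set("fs.azure.account.oauth2.client.secret.<storage-account>.dfs.core.windows.net",
               service_credential)
spark.conf.set("fs.azure.account.oauth2.client.endpoint.<storage-account>.dfs.core.windows.net",
               "https://login.microsoftonline.com/<tenant-id>/oauth2/token")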
I need help with migrating from DBFS on Databricks to the workspace. I am new to Databricks and am struggling with what is in the links provided. My workspace.yml also has dbfs hard-coded. Included is a full deployment with Great Expectations. This was don...
Hi @Ameshj,
Sorry for the delay in the response.
For the all_df screenshot: how are you creating that df? Does it contain Tablename? How is it related to the init script migration?
Kindly add set -x after the first line, and enable cluster logs to DBFS...
Hi @prats33, you can use the Databricks Clusters API to terminate your cluster at a specific time: create a notebook that calls the API and schedule it as a Databricks workflow job on a job cluster at 11:59.
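A minimal sketch of such a notebook using the Python SDK (assumes the databricks-sdk package is installed; the cluster ID is a placeholder):

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()  # picks up notebook/job authentication automatically
w.clusters.delete(cluster_id="<cluster-id>")  # terminates the cluster (it can be restarted later)

Schedule this notebook as a workflow job with an 11:59 trigger, as suggested above.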