Data Engineering

Forum Posts

by ironv • Visitor

18m ago

2 Views
0 replies
0 kudos

using concurrent.futures for parallelization

Hi, I am trying to copy a table with billions of rows from an enterprise data source into my Databricks table. To do this, I need to use a homegrown library which handles auth etc., runs the query, and returns a dataframe. I am partitioning the table using...
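
A minimal sketch of the pattern this post describes, assuming the table can be split into independent partitions by a numeric key range. `fetch_partition` is a hypothetical stand-in for the homegrown library call (it returns plain rows here rather than a dataframe), and the partition bounds and worker count are illustrative only.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def make_partitions(lo, hi, step):
    """Split the half-open key range [lo, hi) into chunks of at most `step` keys."""
    return [(start, min(start + step, hi)) for start in range(lo, hi, step)]


def fetch_partition(bounds):
    """Hypothetical stand-in for the homegrown library call.

    A real version would run a query restricted to
    start <= key < end and return a dataframe.
    """
    start, end = bounds
    return list(range(start, end))


def copy_in_parallel(partitions, fetch, max_workers=4):
    """Fetch every partition on a thread pool; collect results and failures."""
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its partition so failures can be retried.
        futures = {pool.submit(fetch, p): p for p in partitions}
        for future in as_completed(futures):
            bounds = futures[future]
            try:
                results[bounds] = future.result()
            except Exception as exc:  # record the error and keep going
                failures[bounds] = exc
    return results, failures


partitions = make_partitions(0, 10, 3)
results, failures = copy_in_parallel(partitions, fetch_partition)
```

Threads are a reasonable fit when each call mostly waits on the remote source. In a real job, each fetched dataframe would probably be appended to the target table as it arrives rather than held in memory, and the partitions recorded in `failures` would be retried.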

by philHarasz • New Contributor

5 hours ago

35 Views
1 reply
0 kudos

Writing a small pyspark dataframe to a table is taking a very long time

My experience with Databricks pyspark up to this point has always been to execute a SQL query against existing Databricks tables, then write the resulting pyspark dataframe into a new table. For the first time, I am now getting data via an API which ...

Latest Reply

MariuszK
Contributor II

4 hours ago

0 kudos

Can you share the code? Remember that Spark uses lazy evaluation, so it can give the impression that the code runs fast and the save is slow: the transformations are only executed when you hit an action, such as the write.
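
To illustrate the point about lazy evaluation, here is a toy sketch in plain Python. It is an analogy, not Spark code: `LazyPipeline`, `map`, and `collect` are hypothetical names that mimic how a transformation is only recorded and the work runs when an action is called.

```python
class LazyPipeline:
    """Toy stand-in for a lazily evaluated dataframe (not Spark itself)."""

    def __init__(self, source, steps=()):
        self.source = source
        self.steps = tuple(steps)

    def map(self, fn):
        # Transformation: only records the step and returns immediately.
        return LazyPipeline(self.source, self.steps + (fn,))

    def collect(self):
        # Action: every recorded step runs now, for every item.
        out = []
        for item in self.source:
            for fn in self.steps:
                item = fn(item)
            out.append(item)
        return out


calls = []


def expensive_double(x):
    # Stands in for slow work such as an API call or a heavy parse.
    calls.append(x)
    return x * 2


pipeline = LazyPipeline([1, 2, 3]).map(expensive_double)
calls_before_action = len(calls)  # the transformation has not run yet
result = pipeline.collect()       # all the work happens here
```

By the same logic, timing only the write step in Spark also measures everything upstream of it, so a slow save may really be a slow read or transformation.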

by akshay716 • Visitor

8 hours ago

45 Views
1 reply
0 kudos

How to create Service Principal and access APIs like clusters list without adding to admin group

I have created a Databricks managed service principal and am trying to access APIs such as clusters list, jobs list, and pipelines, but without adding it to the admin group I get an empty list in response. There are other ways to get clusters by adding polic...

Latest Reply

Alberto_Umana
Databricks Employee

4 hours ago

0 kudos

Hi @akshay716, you can assign specific permissions directly to the service principal without granting it broader admin access.

by tebodelpino1234 • Visitor

5 hours ago

30 Views
0 replies
0 kudos

can view allow_expectations_col in unit catalog

I am developing a DLT pipeline that manages expectations, and it works correctly. But I need to see the columns __DROP_EXPECTATIONS_COL, __MEETS_DROP_EXPECTATIONS, and __ALLOW_EXPECTATIONS_COL in Unity Catalog. I can see them in the delta table that the dlt generat...

by Ramonrcn • New Contributor III

6 hours ago

30 Views
0 replies
0 kudos

Cant read/write tables with shared cluster

Hi! I have a pipeline that I can't execute successfully on a shared cluster. Basically, I read a query from multiple sources on my Databricks instance, including streaming tables (that's the reason I have to use a shared cluster). But when it comes to the par...

by Sergio_Linares • Visitor

8 hours ago

30 Views
1 reply
0 kudos

When Sign in databricks partner-academy i can not see the courses

Dear partner academy team, I am writing to report an issue I am experiencing when trying to access the partner academy courses. Despite using my credentials, I am unable to view any of the courses. Could you please look into this and assist me in res...

Latest Reply

Advika
Databricks Employee

8 hours ago

0 kudos

Hello @Sergio_Linares! Please file a ticket with the Databricks support team to get assistance with this issue.

by DataEnginerrOO1 • Visitor

13 hours ago

93 Views
4 replies
0 kudos

Access for delta lake with serverless

I have an issue when trying to use the command display(dbutils.fs.ls("abfss://test@test.dfs.core.windows.net")). When I execute the command on my personal cluster, it works, and I can see the files. Before that, I set the following configurations: spa...

Latest Reply

Rjdudley
Valued Contributor II

11 hours ago

0 kudos

Can your serverless compute access any storage in that storage account? Something else to check is if your NCC is configured correctly: Configure private connectivity from serverless compute - Azure Databricks | Microsoft Learn. However, if your se...

3 More Replies

by KSB • Visitor

8 hours ago

25 Views
0 replies
0 kudos

databricks

Hi Team, I have an Excel file in a SharePoint folder and have to insert the Excel data into a SQL table from a Databricks notebook. Can I have clear steps for this? I don't have access to Azure Active Directory. Can anyone give a solution without using Azure Active Dir...

by BabakBastan • New Contributor

9 hours ago

20 Views
0 replies
0 kudos

Missing Delta-live-Table in hive-metastore catalog

Hi experts, I defined my delta table in an external location as follows: %sql CREATE OR REFRESH STREAMING TABLE pumpdata (Body string, EnqueuedTimeUtc string, SystemProperties string, _rescued_data string, Properties string) USING DELTA LOCATION 'abfss://md...

Delta Live Tables

by mkEngineer • New Contributor III

9 hours ago

30 Views
2 replies
0 kudos

How to Version & Deploy Databricks Workflows with Azure DevOps (CI/CD)?

Hi everyone, I’m trying to set up versioning and CI/CD for my Databricks workflows using Azure DevOps and Git. While I’ve successfully versioned notebooks in a Git repo, I’m struggling with handling workflows (which define orchestration, dependencies,...

Latest Reply

mkEngineer
New Contributor III

9 hours ago

0 kudos

As of now, my current approach is to manually copy/paste YAMLs across workspaces and version them using Git/Azure DevOps by saving them as DBFS files. The CD process is then handled using Databricks DBFS File Deployment by Data Thirst Ltd. While this ...

1 More Replies

by BillBishop • New Contributor II

a week ago

69 Views
2 replies
0 kudos

DAB for_each_task python wheel fail

Using python_wheel_wrapper experimental true allows me to use python_wheel_task on an older cluster. However, if I embed the python_wheel_task in a for_each_task, it fails at runtime with: "Library installation failed for library due to user error. Er...

Latest Reply

Alberto_Umana
Databricks Employee

a week ago

0 kudos

Hi @BillBishop, I will check on this internally, as the outcome does not seem to be correct. If possible, upgrade your cluster to DBR 14.1 or later. This would resolve the issue without relying on the experimental feature.

1 More Replies

by Yuppp • Visitor

10 hours ago

22 Views
0 replies
0 kudos

Need help with setting up ForEach task in Databricks

Hi everyone, I have a workflow involving two notebooks: Notebook A and Notebook B. At the end of Notebook A, we generate a variable number of files, let's call it N. I want to run Notebook B for each of these N files. I know Databricks has a Foreach ta...

ForEach

Workflows

by rushi29 • New Contributor III

08-02-2024 5:56:22 AM

1668 Views
5 replies
0 kudos

sparkContext in Runtime 15.3

Hello All, Our Azure databricks cluster is running under "Legacy Shared Compute" policy with 15.3 runtime. One of the python notebooks is used to connect to an Azure SQL database to read/insert data. The following snippet of code is responsible for r...

Latest Reply

jayct
New Contributor II

11 hours ago

0 kudos

@rushi29 @Gangster I ended up implementing pyodbc with the mssql driver using init scripts. Spark context is no longer usable in shared compute, so that was the only approach we could take.

4 More Replies

by ewe • Visitor

11 hours ago

29 Views
0 replies
0 kudos

Databricks apps (streamlit) not able to install python libs

So, I have a Databricks Streamlit app that is not able to install any Python lib defined in the requirements.txt. The issue is not specific to one lib; I tried other ones, but no Python lib can be installed. Has anyone had a similar issue and can help? [2025-02-19 10...

by p_romm • Visitor

13 hours ago

34 Views
1 reply
0 kudos

Structured Streaming writeStream - Query is no longer active causes task to fail

Hi, I execute readStream/writeStream in a workflow task. The write stream uses the .trigger(availableNow=True) option. After writeStream, I wait for the query to finish with query.awaitTermination(). However, from time to time, the pipeline ends with "Query <id> is no ...

Latest Reply

Alberto_Umana
Databricks Employee

12 hours ago

0 kudos

Hello @p_romm - Are you using serverless compute?
