Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

cmathieu
by New Contributor III
  • 785 Views
  • 4 replies
  • 0 kudos

DAB - All projects files deployed

I have an issue with DAB where all the project files, starting from root ., get deployed to the /files folder in the bundle. I would prefer being able to deploy certain util notebooks, but not all the files of the project. I'm able to not deploy any ...

Latest Reply
ashraf1395
Honored Contributor
  • 0 kudos

@cmathieu, it will support deployment of the whole directory, not selected files only.
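For readers hitting the same thing: Databricks Asset Bundles support top-level sync include/exclude globs in databricks.yml, which may give the selective behavior asked about. A minimal sketch; the paths and bundle name are illustrative, not from the thread:

# databricks.yml (sketch; paths are illustrative)
bundle:
  name: my_bundle

sync:
  # Keep most project files out of the deployed /files folder...
  exclude:
    - "src/**"
    - "tests/**"
  # ...but still ship the util notebooks.
  include:
    - "utils/*.py"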

3 More Replies
DylanStout
by Contributor
  • 386 Views
  • 2 replies
  • 0 kudos

Resolved! Error while reading file from Cloud Storage

The code we are executing:
df = spark.read.format("parquet").load("/mnt/g/drb/HN/")
df.write.mode('overwrite').saveAsTable("bronze.HN")
The error it throws:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 44 in stage 642.0 faile...

Latest Reply
DylanStout
Contributor
  • 0 kudos

spark.conf.set("spark.sql.parquet.enableVectorizedReader", "false")
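In context, a sketch of how the workaround slots into the original job (paths and table name are from the post above):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Fall back to the non-vectorized Parquet reader; slower, but it
# tolerates some files that crash the vectorized path.
spark.conf.set("spark.sql.parquet.enableVectorizedReader", "false")

df = spark.read.format("parquet").load("/mnt/g/drb/HN/")
df.write.mode("overwrite").saveAsTable("bronze.HN")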

1 More Replies
DylanStout
by Contributor
  • 657 Views
  • 0 replies
  • 0 kudos

Pyspark ML tools

Cluster policies are not letting us use PySpark ML tools.
Issue details: We have clusters available in our Databricks environment, and our plan was to use functions and classes from "pyspark.ml" to process data and train our model in parallel across cores/n...
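For context, a minimal pyspark.ml job of the kind the post describes; the dataset and column names are invented for illustration (spark is the ambient SparkSession on Databricks):

from pyspark.ml.classification import LogisticRegression
from pyspark.ml.feature import VectorAssembler

# Tiny invented dataset: two features and a binary label.
train = spark.createDataFrame(
    [(0.0, 1.1, 0), (1.5, 0.3, 1), (0.2, 0.9, 0), (2.1, 0.1, 1)],
    ["f1", "f2", "label"],
)

# Assemble features, then fit a model distributed across the cluster.
features = VectorAssembler(inputCols=["f1", "f2"], outputCol="features")
model = LogisticRegression(maxIter=10).fit(features.transform(train))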

MarcoRezende
by New Contributor II
  • 683 Views
  • 0 replies
  • 0 kudos

Slow performance in REFRESH MATERIALIZED VIEW over CTAS

Hello guys, I have some materialized views created in my Databricks workspace, and after one change to one of them, it became 3x slower (from 9 minutes to 30 minutes). After some debugging, I found that the bottleneck process in the execution plan is one call...

Rajt1
by New Contributor
  • 196 Views
  • 1 reply
  • 0 kudos

Job, Task, Stage Creation

I am running the code below:
df = spark.read.json('xyz.json')
df.count
I want to understand how Spark actually works here. How many jobs and stages will be created? I would like a detailed but easy-to-follow explanation of how it works.

Latest Reply
Advika
Databricks Employee
  • 0 kudos

Hello @Rajt1! When you execute df = spark.read.json('xyz.json'), Spark does not read the file immediately. Data is only read when an action like count() is triggered. Job: df.count() triggers one job because it's an action. Stage: Reading JSON and cou...
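A small illustration (the path is from the question; the schema is invented). One caveat: without an explicit schema, spark.read.json runs a small job of its own to infer one, so supplying a schema keeps the lazy/eager boundary clean:

from pyspark.sql.types import StructType, StructField, StringType

# Invented schema; supplying one avoids the inference scan.
schema = StructType([StructField("id", StringType(), True)])

df = spark.read.schema(schema).json("xyz.json")  # lazy: no job yet
df.count()  # action: one job, typically two stages (partial counts, then final aggregation)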

khangnguyen164
by New Contributor II
  • 352 Views
  • 3 replies
  • 0 kudos

Error "insert concurrent to Delta Lake" when 2 streaming merge data to same table at the same time

Hello everyone, we currently have 2 streaming (Bronze) jobs created as 2 tasks in the same job, running on the same compute, and both merge data into the same table (the Silver table). If I create it like this, sometimes I get an error related to "insert...

Latest Reply
khangnguyen164
New Contributor II
  • 0 kudos

Can anyone else help me with this case?
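Not from the thread, but a common mitigation sketch: retry the MERGE when Delta reports a concurrent-write conflict (or partition the two streams so their merges touch disjoint rows). The helper below is hypothetical:

import time

def merge_with_retry(do_merge, max_attempts=5):
    # do_merge: a zero-arg callable that runs the Delta MERGE.
    for attempt in range(1, max_attempts + 1):
        try:
            return do_merge()
        except Exception as e:
            # Delta surfaces conflicts as e.g. ConcurrentAppendException.
            if "Concurrent" not in str(e) or attempt == max_attempts:
                raise
            time.sleep(2 ** attempt)  # back off, then retry the merge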

2 More Replies
YOUKE
by New Contributor III
  • 495 Views
  • 2 replies
  • 0 kudos

Resolved! Connecting to SQL on Databricks Using SQLAlchemy or pyodbc

On Databricks, when I try to connect to SQL using SQLAlchemy or pyodbc to run delete queries on a specific table, I get this error: (pyodbc.Error) ('01000', "[01000] [unixODBC][Driver Manager]Can't open lib 'ODBC Driver 17 for SQL Server' : file not ...

Latest Reply
YOUKE
New Contributor III
  • 0 kudos

I was able to solve the problem! The problem was that the driver was missing, so pyodbc or SQLAlchemy couldn't find it. I used the native Java API instead, and it is working. This is the example code:
jdbcUsername = "username"
jdbcPassword = "password"
driv...
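A sketch of the same approach, since the reply's code is truncated above. The URL, credentials, and table are placeholders, and this goes through the JVM's java.sql.DriverManager via an internal Py4J handle, which avoids the missing native ODBC library but relies on a private attribute:

jdbc_url = "jdbc:sqlserver://<server>:1433;databaseName=<db>"  # placeholder

# Internal handle to the JVM; works because a SQL Server JDBC
# driver ships with the Databricks runtime.
jvm = spark.sparkContext._jvm
conn = jvm.java.sql.DriverManager.getConnection(jdbc_url, "username", "password")
stmt = conn.createStatement()
stmt.executeUpdate("DELETE FROM dbo.my_table WHERE id = 1")  # placeholder query
conn.close()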

1 More Replies
dc-rnc
by Contributor
  • 835 Views
  • 1 reply
  • 0 kudos

Writing to Delta Table and retrieving back the IDs doesn't work

Hi. I have a workflow in which I write a few rows into a Delta Table with auto-generated IDs. Then, I need to retrieve them back just after they're written into the table to collect those generated IDs, so I read the table and use two columns (one is ...

[Attachment: dcrnc_4-1743006179065.png]
Latest Reply
jeremy98
Honored Contributor
  • 0 kudos

I'm also interested in this problem. Could someone help?
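One pattern sometimes used here (a sketch, not confirmed by the thread): stamp each written batch with a unique id, then read the auto-generated IDs back by that stamp. The table, columns, and rows_df DataFrame are placeholders:

import uuid
from pyspark.sql import functions as F

batch_id = str(uuid.uuid4())

# Stamp the batch so its rows (and their generated IDs) can be
# found again deterministically, even with concurrent writers.
(rows_df.withColumn("batch_id", F.lit(batch_id))
        .write.mode("append").saveAsTable("cat.sch.target"))

ids = (spark.table("cat.sch.target")
            .where(F.col("batch_id") == batch_id)
            .select("generated_id", "business_key"))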

IGRACH
by New Contributor II
  • 320 Views
  • 1 reply
  • 1 kudos

Unable to delete a table

When I try to delete a table, I'm getting this error: [ErrorClass=INVALID_STATE] TABLE catalog.schema.table_name cannot be deleted because it is being shared via Delta Sharing. I have checked on the internet about it, but could not find any info about ...

Latest Reply
ashraf1395
Honored Contributor
  • 1 kudos

Hi @IGRACH, you are facing this issue because, I guess, the table you want to delete is being shared via Delta Sharing. You can go to the shared object by following this doc: https://docs.databricks.com/aws/en/delta-sharing/create-share#update-shares and then, ...
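Concretely, that likely comes down to something like this (the share name is a placeholder; the table name is from the error message):

# Remove the table from the share that references it, then drop it.
spark.sql("ALTER SHARE my_share REMOVE TABLE catalog.schema.table_name")
spark.sql("DROP TABLE catalog.schema.table_name")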

HoussemBL
by New Contributor III
  • 655 Views
  • 3 replies
  • 0 kudos

External tables in DLT pipelines

Hello community, I have implemented a DLT pipeline. In the "Destination" setting of the pipeline I have specified a Unity Catalog with a target schema of type external referring to an S3 destination. My DLT pipeline works well. Yet, I noticed that all str...

Latest Reply
Sushil_saini
New Contributor II
  • 0 kudos

This won't work. The best approach is to create a DLT sink that writes to the external Delta table. The pipeline should be just one step: read the table and append via a flow using the data sink. It works fine.
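A sketch of that shape, following the DLT sink API as documented at the time of writing; the path and table names are placeholders:

import dlt

# Sink pointing at the external Delta location.
dlt.create_sink(
    name="silver_sink",
    format="delta",
    options={"path": "s3://my-bucket/silver/orders"},
)

@dlt.append_flow(name="orders_to_sink", target="silver_sink")
def orders_to_sink():
    # Single step, as the reply suggests: read the source and append.
    return spark.readStream.table("bronze.orders")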

2 More Replies
a_user12
by New Contributor III
  • 393 Views
  • 1 reply
  • 0 kudos

Resolved! databricks bundle Deploy: exit code 0 even if an error occurs

We have a CI/CD pipeline where we run:
databricks bundle deploy [...]
The code works fine; however, if we misconfigure it, we see in the output an error message such as:
Deploying resources... Updating deployment state... Warning: Detected unresolved va...

Labels: Data Engineering, asset bundle
Latest Reply
a_user12
New Contributor III
  • 0 kudos

You can close it: it was a CI/CD issue.

matanper
by New Contributor III
  • 4672 Views
  • 6 replies
  • 1 kudos

Custom docker image fails to initialize

I'm trying to use a custom Docker image for my job. This is my Dockerfile:
FROM databricksruntime/standard:12.2-LTS
COPY . .
RUN /databricks/python3/bin/pip install -U pip
RUN /databricks/python3/bin/pip install -r requirements.txt
USER root
My job ...

Latest Reply
mrstevegross
Contributor III
  • 1 kudos

Did y'all ever figure this out? I'm running into a similar issue.

5 More Replies
badari_narayan
by New Contributor II
  • 288 Views
  • 1 reply
  • 0 kudos

Having an issue assigning databricks_current_metastore with terraform provider

I am trying to assign my databricks_current_metastore on Terraform and I get the following error back as an output:
Error: cannot read current metastore: cannot get client current metastore: invalid Databricks Workspace configuration
with data.databric...

Latest Reply
Panda
Valued Contributor
  • 0 kudos

@badari_narayan Based on the above Terraform code, you are trying to use the databricks.accounts provider to read the current workspace metastore, which is incorrect: the databricks_current_metastore data source is a workspace-level resource, and must b...
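For illustration, the fix is roughly to point the data source at a workspace-scoped provider; the alias and host value below are placeholders:

# Workspace-level provider; databricks_current_metastore must run
# against a workspace, not the account endpoint.
provider "databricks" {
  alias = "workspace"
  host  = "https://adb-1111111111111111.11.azuredatabricks.net"
}

data "databricks_current_metastore" "this" {
  provider = databricks.workspace
}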

johschmidt42
by New Contributor II
  • 676 Views
  • 2 replies
  • 0 kudos

Autoloader cloudFiles.maxFilesPerTrigger ignored with .trigger(availableNow=True)?

Hi, I'm using the Auto Loader feature to read streaming data from Delta Lake files and process them in a batch. The trigger is set to availableNow to include all new data from the checkpoint offset, but I limit the number of delta files for the batch ...

Latest Reply
p_romm
New Contributor III
  • 0 kudos

In the docs it is "cloudFiles.maxFilesPerTrigger": https://docs.databricks.com/aws/en/ingestion/cloud-object-storage/auto-loader/options
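A sketch of a rate-limited availableNow backfill with that option name; the format, paths, and table are placeholders:

# The option name matters: "cloudFiles.maxFilesPerTrigger",
# not the plain "maxFilesPerTrigger".
df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "json")
      .option("cloudFiles.maxFilesPerTrigger", "100")  # files per micro-batch
      .load("s3://my-bucket/landing/"))

(df.writeStream
   .option("checkpointLocation", "s3://my-bucket/_checkpoints/landing/")
   .trigger(availableNow=True)
   .toTable("bronze.events"))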

1 More Replies
pt16
by New Contributor II
  • 526 Views
  • 2 replies
  • 0 kudos

Enable automatic identity management in Azure Databricks

We have Databricks account admin access but are not able to see the option in the Databricks admin console to enable automatic identity management. We wanted to enable it from the Previews page and followed the steps below:
1. As an account admin, log in to the accou...

Latest Reply
pt16
New Contributor II
  • 0 kudos

After raising a Databricks ticket, today I am able to see the Automatic Identity Management public preview option.

1 More Replies
