Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

ZacayDaushin
by New Contributor
  • 3067 Views
  • 3 replies
  • 0 kudos

How to access system.access.table_lineage

I try to run a SELECT from system.access.table_lineage, but I don't have access to see the table. What permission do I need?

Latest Reply
Nivethan_Venkat
Databricks MVP
  • 0 kudos

Hi @ZacayDaushin, to query a table in the system catalog, you need SELECT permission on the table in order to query it and see the results. Best Regards, Nivethan V

  • 0 kudos
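A hedged sketch of the grants the reply describes, assuming a metastore admin runs them and that system schemas are already enabled for the metastore (the `data_engineers` group name is a placeholder):

```sql
-- Assumption: executed by a metastore/account admin; group name is hypothetical.
GRANT USE CATALOG ON CATALOG system TO `data_engineers`;
GRANT USE SCHEMA ON SCHEMA system.access TO `data_engineers`;
GRANT SELECT ON TABLE system.access.table_lineage TO `data_engineers`;

-- Afterwards the original query should return rows:
SELECT * FROM system.access.table_lineage LIMIT 10;
```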
2 More Replies
smpa01
by Contributor
  • 1253 Views
  • 1 replies
  • 1 kudos

Resolved! tbl name as paramater marker

I am getting an error here. This works fine: declare sqlStr = 'select col1 from catalog.schema.tbl LIMIT (?)'; declare arg1 = 500; EXECUTE IMMEDIATE sqlStr USING arg1; This does not: declare sqlStr = 'select col1 from (?) LIMIT (?)';...

Latest Reply
lingareddy_Alva
Esteemed Contributor
  • 1 kudos

@smpa01 In SQL EXECUTE IMMEDIATE, you can only parameterize values, not identifiers like table names, column names, or database names. That is, placeholders (?) can only replace constant values, not object names (tables, schemas, columns, etc.). SELECT...

  • 1 kudos
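One commonly documented workaround is the IDENTIFIER clause, which turns a string parameter into a table reference; a hedged sketch along the lines of the original snippet (the three-level table name is a placeholder):

```sql
-- Assumption: a Databricks SQL / DBR version that supports the IDENTIFIER clause.
DECLARE sqlStr = 'SELECT col1 FROM IDENTIFIER(?) LIMIT ?';
DECLARE tbl = 'catalog.schema.tbl';
DECLARE arg1 = 500;

-- The first marker binds the table name via IDENTIFIER, the second the row limit.
EXECUTE IMMEDIATE sqlStr USING tbl, arg1;
```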
p_romm
by New Contributor III
  • 1409 Views
  • 4 replies
  • 0 kudos

Structured Streaming writeStream - Query is no longer active causes task to fail

Hi, I execute readStream/writeStream in a workflow task. The write stream uses the .trigger(availableNow=True) option. After writeStream I wait for the query to finish with query.awaitTermination(). However, from time to time the pipeline ends with "Query <id> is no ...

Latest Reply
cmathieu
New Contributor III
  • 0 kudos

@Alberto_Umana this bug was apparently fixed a few months ago, but we're still facing the same issue on our end. 

  • 0 kudos
3 More Replies
397973
by New Contributor III
  • 1410 Views
  • 1 replies
  • 1 kudos

Resolved! Several unavoidable for loops are slowing this PySpark code. Is it possible to improve it?

Hi. I have a PySpark notebook that takes 25 minutes to run, as opposed to one minute on on-prem Linux + Pandas. How can I speed it up? It's not a volume issue. The input is around 30k rows. Output is the same because there's no filtering or aggregation...

Latest Reply
lingareddy_Alva
Esteemed Contributor
  • 1 kudos

@397973 Spark is optimized for 100s of GB or millions of rows, NOT small in-memory lookups with heavy control flow (unless engineered carefully). That's why Pandas is much faster for your specific case now. Pre-load and Broadcast All Mappings: instead of...

  • 1 kudos
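The "pre-load and broadcast" idea can be shown as a minimal pure-Python sketch (all names are hypothetical; in PySpark the same shape becomes a broadcast join via F.broadcast(mapping_df), or sc.broadcast(mapping) for a plain dict):

```python
# Slow pattern: re-scan the mapping list once per row -> O(rows * mappings).
def enrich_slow(rows, mappings):
    out = []
    for row in rows:
        for code, label in mappings:      # inner loop repeated for every row
            if row["code"] == code:
                out.append({**row, "label": label})
                break
    return out

# Fast pattern: pre-load the mapping once into a dict -> O(rows + mappings).
def enrich_fast(rows, mappings):
    lookup = dict(mappings)               # built once, reused for every row
    return [{**row, "label": lookup.get(row["code"])} for row in rows]

rows = [{"code": "A"}, {"code": "B"}, {"code": "A"}]
mappings = [("A", "alpha"), ("B", "beta")]
assert enrich_slow(rows, mappings) == enrich_fast(rows, mappings)
```

The same replacement of per-row loops with a single pre-built lookup is what makes the broadcast-join version fast in Spark.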
Lo
by New Contributor II
  • 1799 Views
  • 1 replies
  • 0 kudos

SocketTimeoutException when creating execution context in Databricks Community Edition

Hello, I'm experiencing an issue in Databricks Community Edition. When I try to run a notebook, I get this error: "Exception when creating execution context: java.net.SocketTimeoutException: connect Timeout". What I have tried: restarting the cluster, ch...

Latest Reply
Advika
Community Manager
  • 0 kudos

Hello @Lo! There is a similar thread where another user encountered the same issue and shared a solution that worked for them. I suggest reviewing that thread to see if the solution is helpful in your case as well.

  • 0 kudos
vidya_kothavale
by Contributor
  • 1860 Views
  • 1 replies
  • 1 kudos

Issue reading Vertica table into Databricks - Numeric value out of range

I am trying to read a Vertica table into a Spark DataFrame using JDBC in Databricks. Here is my sample code: hostname = "" username = "" password = "" database_port = "" database_name = "" qry_col_level = f"""SELECT * FROM analytics_DS.ansh_units_cum_dash""...

Latest Reply
Renu_
Valued Contributor II
  • 1 kudos

Hi @vidya_kothavale, based on my research and understanding, Databricks and Spark's JDBC connectors currently don’t offer an automatic way to truncate or round high precision decimal values when loading data. To handle this, you would need to either:...

  • 1 kudos
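One way to apply the reply's suggestion is to round or cast the offending column inside the pushed-down query, so Vertica returns a precision that Spark's DecimalType (max precision 38) can hold; a hedged sketch where the column name and precision are hypothetical:

```sql
-- Passed via spark.read.format("jdbc").option("query", ...) instead of the bare table name,
-- so the cast happens in Vertica before the data reaches Spark.
SELECT CAST(high_precision_col AS DECIMAL(38, 10)) AS high_precision_col,
       other_col
FROM analytics_DS.ansh_units_cum_dash
```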
kweks970
by New Contributor
  • 2842 Views
  • 1 replies
  • 0 kudos

DEV and PROD

A "SELECT * FROM" query on my table in PROD returns all rows of data (historical data), but the same query on my table in DEV returns just one row (the current row of the historical data). What could be the problem?

Latest Reply
Louis_Frolio
Databricks Employee
  • 0 kudos

Please don't cross post.  Thanks, Louis.

  • 0 kudos
AlexMc
by New Contributor III
  • 1838 Views
  • 6 replies
  • 1 kudos

Resolved! GET /api/2.2/jobs/list Ordering

Hi there! I am calling the job list API (via the Python SDK): GET /api/2.2/jobs/list (docs.databricks.com/api/workspace/jobs/list). Does anyone know what ordering is applied / calculated for the list of jobs? Is it consistent or random? Is it by creation tim...

Latest Reply
AlexMc
New Contributor III
  • 1 kudos

Thanks both - this was very helpful!

  • 1 kudos
5 More Replies
Christian_C
by New Contributor II
  • 2264 Views
  • 7 replies
  • 0 kudos

Google Pub Sub and Delta live table

I am using Delta Live Tables and Pub/Sub to ingest messages from 30 different topics in parallel. I noticed that initialization time can be very long, around 15 minutes. Does someone know how to reduce initialization time in DLT? Thank you

Latest Reply
Louis_Frolio
Databricks Employee
  • 0 kudos

Classic clusters can take up to seven minutes to be acquired, configured, and deployed, with most of this time spent waiting for the cloud service to allocate virtual machines. In contrast, serverless clusters typically start in under eight seconds. ...

  • 0 kudos
6 More Replies
BF7
by Contributor
  • 1472 Views
  • 3 replies
  • 2 kudos

Resolved! How can we get AutoLoader to detect a file footer?

We are dealing with CSVs that have footers in them. When we have an empty file, the presence of this footer seems to impair AutoLoader's schema inference. I know there is a header = true parameter, but I don't see a foote...

Latest Reply
Louis_Frolio
Databricks Employee
  • 2 kudos

To be clear, when you say footer, are you referring to the last row of the file? e.g. Header = row 1, Footer = last row.

  • 2 kudos
2 More Replies
Yuki
by Contributor
  • 1125 Views
  • 2 replies
  • 1 kudos

Resolved! Can we implement Unity Catalog table lifecycle?

I want to delete tables that haven't been selected or otherwise accessed for several months. I can see the Delta table history, but I can only catch DDL or update/insert/delete operations and can't catch "select". I realized that the Unity Catalog insight, ht...

Latest Reply
Yuki
Contributor
  • 1 kudos

Hi @Renu_, I appreciate your clear response. I now have a better understanding and will work with our admin team to develop a strategy. Thank you.

  • 1 kudos
1 More Replies
Bart_DE
by New Contributor II
  • 1911 Views
  • 2 replies
  • 0 kudos

Resolved! Concurrency behavior with merge operations

Hi community, I have a case right now in a project where I have to develop a solution that will prevent duplicate data from being ingested twice into the Delta lake. Some of our data suppliers, on rare occasions, send us the same dataset in two diff...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

Your idea of using a log table to track processed ingestions and leveraging a MERGE operation in your pipeline is a sound approach for preventing duplicate data ingestion into Delta Lake. Delta Lake's ACID transactions and its support for concurrency...

  • 0 kudos
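The log-table idea from the reply can be sketched as a hedged example, where all table and column names are hypothetical; the MERGE makes logging a delivery idempotent, so a re-sent dataset is detected before it is loaded:

```sql
-- Record each dataset delivery exactly once; a second identical delivery matches
-- an existing row and inserts nothing.
MERGE INTO ingestion_log AS log
USING (SELECT 'supplier_42' AS supplier_id,
              'batch_2024_06_01' AS batch_id) AS incoming
ON log.supplier_id = incoming.supplier_id
   AND log.batch_id = incoming.batch_id
WHEN NOT MATCHED THEN
  INSERT (supplier_id, batch_id, ingested_at)
  VALUES (incoming.supplier_id, incoming.batch_id, current_timestamp());

-- The pipeline then proceeds with the load only if this delivery was newly logged.
```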
1 More Replies
Anonymous
by Not applicable
  • 3186 Views
  • 2 replies
  • 0 kudos

DBFS Permissions

Is there permission control at the folder/file level in DBFS? E.g., if a team member uploads a file to /Filestore/Tables/TestData/testfile, could we mask permissions on TestData and/or testfile?

Latest Reply
brickster_2018
Databricks Employee
  • 0 kudos

DBFS does not have ACLs at this point.

  • 0 kudos
1 More Replies
sahil3
by New Contributor
  • 651 Views
  • 1 replies
  • 0 kudos

NOT ABLE TO ATTACH CLUSTER

Notebook detached. Exception when creating execution context: java.util.concurrent.TimeoutException: timed out after 15 seconds

Latest Reply
RiyazAliM
Honored Contributor
  • 0 kudos

Hey @sahil3 Try detaching and re-attaching the notebook to the cluster. Please note that this will clear the state of the notebook. If the issue persists, try restarting the cluster. Best,

  • 0 kudos
rak_haq
by New Contributor III
  • 2003 Views
  • 3 replies
  • 1 kudos

Resolved! How to use read_kafka() SQL with secret()?

Hi, I want to read data from the Azure Event Hub using SQL. Can someone please give me an executable example that also uses the connection string from the Event Hub via the SQL function secret()? This is what I tried, but Databr...

Data Engineering
azure
event_hub
kafka
sql
streaming
Latest Reply
rak_haq
New Contributor III
  • 1 kudos

I found the solution and could successfully establish a connection to Event Hub. SELECT cast(value as STRING) as raw_json, current_timestamp() as processing_time FROM read_kafka( bootstrapServers => '<YOUR EVENT-HUB NAMESPACE>.servicebus.windows.n...

  • 1 kudos
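For reference, the Event Hubs Kafka endpoint is typically wired up along these lines; a hedged sketch where the namespace, topic, secret scope, and key names are all placeholders, and the connection string is assumed to be stored as a Databricks secret:

```sql
-- Assumption: the Event Hubs connection string is stored in a Databricks secret scope.
SELECT cast(value AS STRING) AS raw_json,
       current_timestamp() AS processing_time
FROM read_kafka(
  bootstrapServers => '<namespace>.servicebus.windows.net:9093',
  subscribe => '<event-hub-name>',
  startingOffsets => 'earliest',
  `kafka.security.protocol` => 'SASL_SSL',
  `kafka.sasl.mechanism` => 'PLAIN',
  `kafka.sasl.jaas.config` => concat(
    'kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule required ',
    'username="$ConnectionString" password="',
    secret('<scope>', '<connection-string-key>'), '";')
);
```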
2 More Replies