Data Engineering

Forum Posts

by 117074 • New Contributor III

03-08-2024 6:01:50 AM

759 Views
3 replies
0 kudos

Notebook Visualisations suddenly not working

Hi all, I have a Python script which runs SQL code against our Delta Live Tables and returns a pandas DataFrame. I do this multiple times and then use 'display(pandas_dataframe)'. Once this displays, I create a visualization from the UI, which is t...

Latest Reply

117074
New Contributor III

03-08-2024 11:45:17 AM

0 kudos

Thank you for the detailed response, Kaniz, I appreciate it! I do think it may have been a cache issue, since there was no Spark computation when running them when the error occurred. It did lead me down a train of thought: is it possible to extract t...

2 More Replies

by RaccoonRadio • New Contributor

08-31-2023 2:16:51 AM

2388 Views
2 replies
0 kudos

Resolved! Difference @dlt.table and @dlt.create_table decorator

Hi! I'm currently trying to stream data from an Azure Event Hub (Kafka) using DLT. The provided example (https://learn.microsoft.com/en-us/azure/databricks/delta-live-tables/event-hubs) works well. I saw in different examples the usage of two different...

Latest Reply

Kaniz
Community Manager

08-31-2023 3:39:02 AM

0 kudos

Hi @RaccoonRadio, in Databricks Delta Live Tables (DLT), both the @dlt.table and @dlt.create_table decorators are used, but they serve slightly different purposes. Here's the distinction: @dlt.table: This decorator is used to define a Delta ...

1 More Replies

by Coders • New Contributor II

03-07-2024 11:19:02 AM

962 Views
4 replies
0 kudos

Feedback on the data quality and consistency checks in Spark

I'm seeking validation from experts regarding the data quality and consistency checks we're implementing as part of a data migration using Spark and Databricks. Our migration involves transferring data from Azure Data Lake to a different data lake. As...

Latest Reply

Coders
New Contributor II

03-08-2024 8:58:41 AM

0 kudos

Hi, thank you for the response. When we say we are copying, it's a data migration from one data lake to another. We are not performing any kind of DDL or DML queries using Spark SQL on top of it. It's a straightforward merge from one data lake to another using ...

3 More Replies
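
The checks the poster actually runs are cut off in the preview above, so the following is only a minimal sketch of one common approach to validating a lake-to-lake copy: compare row counts and per-row fingerprints between source and target. The function names (row_fingerprint, compare_datasets) and the sample rows are hypothetical, and plain Python tuples stand in for the Spark rows a real migration would read.

```python
import hashlib
from collections import Counter


def row_fingerprint(row):
    """Hash one row (a tuple of values) into a stable hex digest.

    repr() is used so that None, "" and the string "None" stay distinct.
    """
    encoded = "\x1f".join(repr(value) for value in row)
    return hashlib.sha256(encoded.encode("utf-8")).hexdigest()


def compare_datasets(source_rows, target_rows):
    """Compare two collections of rows without relying on their order."""
    src = Counter(row_fingerprint(r) for r in source_rows)
    tgt = Counter(row_fingerprint(r) for r in target_rows)
    return {
        "source_count": sum(src.values()),
        "target_count": sum(tgt.values()),
        # Rows present in the source but absent (or under-counted) in the target.
        "missing_in_target": sum((src - tgt).values()),
        # Rows present in the target but not in the source.
        "unexpected_in_target": sum((tgt - src).values()),
    }


# Hypothetical sample data: the target lost one row during the copy.
source = [(1, "a"), (2, "b"), (3, None)]
target = [(2, "b"), (1, "a")]
report = compare_datasets(source, target)
print(report)
```

Using a Counter rather than a set means duplicated rows are also compared by multiplicity, which matters when a copy accidentally writes the same file twice.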

by AshR • New Contributor III

03-08-2024 12:34:58 AM

518 Views
2 replies
0 kudos

Lakehouse Fundamentals Certificate/Badge not received/appearing.

Hello @Nadia1, ref: request (00445798). I raised a ticket for the Lakehouse Fundamentals Certificate/Badge not appearing. I got a response that an email has been sent. Still, I have not received the badge email, and it's not reflected on the academy page (ht...

Latest Reply

Nadia1
Honored Contributor

03-08-2024 8:00:45 AM

0 kudos

Hello, please submit a support ticket and our team will help you ASAP. Thank you.

1 More Replies

by MichTalebzadeh • Contributor

03-07-2024 4:24:42 PM

554 Views
2 replies
1 kudos

Working with a text file that is both compressed by bz2 followed by zip in PySpark

I have downloaded Amazon reviews for sentiment analysis from here. The file is not particularly large (just over 500MB) but comes in the following format: test.ft.txt.bz2.zip. So it is a text file that is compressed by bz2 followed by zip. Now I like t...

bz2

pyspark

zip

Latest Reply

MichTalebzadeh
Contributor

03-08-2024 4:08:25 AM

1 kudos

Thanks for your reply @Kaniz. On the face of it, Spark can handle both .bz2 and .zip. In practice it does not work with both at the same time. You end up with illegible characters as text. I suspect it handles decompression of the outer layer (in this ca...

1 More Replies
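
For the doubly compressed file described in this thread, one way to avoid garbled text is to peel the layers explicitly before handing the data to Spark. The sketch below uses only the Python standard library (zipfile for the outer layer, bz2 for the inner one). The helper name iter_lines_from_bz2_in_zip is hypothetical, the in-memory archive and its two lines are made-up sample content, and whether this fits the poster's cluster setup is an assumption.

```python
import bz2
import io
import zipfile


def iter_lines_from_bz2_in_zip(zip_path_or_file, encoding="utf-8"):
    """Yield text lines from every .bz2 member of a zip archive."""
    with zipfile.ZipFile(zip_path_or_file) as archive:
        for name in archive.namelist():
            if not name.endswith(".bz2"):
                continue
            # Outer layer: open the zip member as a binary stream.
            with archive.open(name) as member:
                # Inner layer: decompress bz2 on the fly and decode to text.
                with bz2.open(member, mode="rt", encoding=encoding) as text:
                    for line in text:
                        yield line.rstrip("\n")


# Build a small in-memory archive with the same layering (made-up content).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("test.ft.txt.bz2", bz2.compress(b"first line\nsecond line\n"))
buffer.seek(0)

lines = list(iter_lines_from_bz2_in_zip(buffer))
print(lines)
```

Because the lines are streamed, the file never has to be fully decompressed in memory. The plain-text output could then be written to storage and read with Spark in the usual way.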

by spark_user1 • New Contributor

03-07-2024 10:45:39 PM

208 Views
1 reply
0 kudos

Whitelisting GraphFrame Jar files does not work for shared compute.

Hello, I'm encountering a Py4JSecurityException while using the GraphFrames jar library in a job task with shared compute. Despite following all documentation to whitelist my jar libraries in Volumes and ensuring compatibility with my Spark and Scala ...

Latest Reply

Kaniz
Community Manager

03-08-2024 3:14:57 AM

0 kudos

Hi @spark_user1, I understand that you’re facing a Py4JSecurityException while working with the GraphFrames jar library in a job task with shared compute. Let’s tackle this issue step by step: Whitelisting JAR Libraries: You mentioned that you’v...

by Miro_ta • New Contributor III

11-04-2023 10:58:19 AM

6389 Views
9 replies
4 kudos

Resolved! Can't query delta tables, token missing required scope

Hello, I've correctly set up a stream from Kinesis, but I can't read anything from my Delta table. I'm actually reproducing the demo from Frank Munz: https://github.com/fmunz/delta-live-tables-notebooks/tree/main/motion-demo and I'm running the following...

Latest Reply

holly
New Contributor III

03-08-2024 2:36:52 AM

4 kudos

Hello, I also had this issue. It was because I was trying to read a DLT table with a Machine Learning Runtime. At time of writing, Machine Learning Runtimes are not compatible with shared access mode, so I ended up setting up two clusters, one MLR as...

8 More Replies

by SamDataWalk • New Contributor III

03-03-2024 9:28:08 AM

679 Views
3 replies
1 kudos

Resolved! Databricks bug with show tblproperties - redacted - Azure databricks

I am struggling to report what is a fairly fundamental bug. Can anyone help? Ideally someone from Databricks themselves. Or others who can confirm they can replicate it. There is a bug where Databricks seems to be hiding “any” properties which have th...

Latest Reply

SamDataWalk
New Contributor III

03-07-2024 11:55:47 PM

1 kudos

I managed to get a response back from support at Databricks. Admittedly it is a bit nuclear, but there is a way of switching it off: spark.conf.set("spark.databricks.behaviorChange.SC102534CommandRedactProperties.enabled", False). So, I have managed to u...

2 More Replies

by Ela • New Contributor III

02-08-2023 9:44:48 PM

639 Views
1 reply
1 kudos

Checking for availability of dynamic data masking functionality in SQL.

I am looking for functionality similar to Snowflake's, which allows attaching masking to an existing column. The documents I found relate to masking with encryption, but my use case is on an existing table. Solutions using views along with Dynamic Vie...

Latest Reply

sivankumar86
New Contributor II

03-07-2024 3:26:31 PM

1 kudos

Unity Catalog provides a similar feature: https://docs.databricks.com/en/data-governance/unity-catalog/row-and-column-filters.html

by thethirtyfour • New Contributor III

02-19-2024 1:28:05 PM

815 Views
3 replies
1 kudos

Resolved! error installing the igraph and networkD3 library

Hi! I am trying to install the igraph and networkD3 CRAN packages for use within a notebook. However, I am receiving the below installation error when attempting to do so. Could someone please assist? Thank you! * installing *source* package ‘igraph’ .....

Latest Reply

haleight-dc
New Contributor III

03-07-2024 10:49:37 AM

1 kudos

Hi! I just figured this out myself. I'm not sure why this is suddenly occurring, since igraph has always loaded fine for me in Databricks but didn't this week. I found that the following solution worked. In your notebook before installing your R libra...

2 More Replies

by 159312 • New Contributor III

07-07-2022 11:22:36 AM

905 Views
3 replies
2 kudos

How to write log entries from a Delta Live Table pipeline.

From a notebook I can import the log4j logger from sc and write to a log like so: log4jLogger = sc._jvm.org.apache.log4j; LOGGER = log4jLogger.LogManager.getLogger(__name__); LOGGER.info("pyspark script logger initialized"). But this does not work in a Delt...

Latest Reply

Kaniz
Community Manager

07-11-2022 5:46:12 AM

2 kudos

Hi @Ben Bogart, The event log for each pipeline is stored in a Delta table in DBFS. You can view event log entries in the Delta Live Tables user interface, the Delta Live Tables API, or by directly querying the Delta table. This article focuses on q...

2 More Replies

by addy • New Contributor III

03-04-2024 1:33:27 PM

750 Views
3 replies
2 kudos

Reading a table from a catalog that is in a different/external workspace

I am trying to read a table that is hosted on a different workspace. We have been told to establish a connection to said workspace using a table and consume the table. The code I am using is: from databricks import sql; connection = sql.connect(server_hostnam...

catalog

Databricks

sql

Latest Reply

Kaniz
Community Manager

03-05-2024 9:52:45 PM

2 kudos

Hey there! Thanks a bunch for being part of our awesome community! We love having you around and appreciate all your questions. Take a moment to check out the responses – you'll find some great info. Your input is valuable, so pick the best solution...

2 More Replies

by Data_Engineer3 • Contributor II

02-26-2024 8:46:33 AM

861 Views
3 replies
0 kudos

live spark driver log analysis

In Databricks, if we want to see the live log of an execution, we can see it on the driver log page of the cluster. But there we can't search by keyword; instead we need to download each hourly log file, and live logs are ...

Latest Reply

Data_Engineer3
Contributor II

02-28-2024 2:35:01 AM

0 kudos

Hi @shan_chandra, that is like putting our driver log onto another cloud platform. But here I want to check the live log with tools on my local machine. Is this possible?

2 More Replies
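
For the keyword search the poster is missing, a small local filter over downloaded driver log files is one option. This is a minimal sketch assuming the logs are already available as text on the local machine; it does not show how to fetch them from the cluster. The helper name grep_lines and the sample log lines are hypothetical.

```python
import re
from typing import Iterable, Iterator, Tuple


def grep_lines(
    lines: Iterable[str], keyword: str, ignore_case: bool = True
) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, line) for every line containing the keyword."""
    flags = re.IGNORECASE if ignore_case else 0
    # re.escape treats the keyword as literal text, not as a regex.
    pattern = re.compile(re.escape(keyword), flags)
    for number, line in enumerate(lines, start=1):
        if pattern.search(line):
            yield number, line.rstrip("\n")


# Made-up log lines standing in for a downloaded driver log.
sample_log = [
    "24/02/26 08:46:33 INFO Driver: starting job\n",
    "24/02/26 08:46:40 WARN TaskSetManager: lost task 3.0\n",
    "24/02/26 08:46:41 ERROR Executor: exception in task 3.0\n",
]
matches = list(grep_lines(sample_log, "error"))
print(matches)
```

Since the function accepts any iterable of strings, an open file handle can be passed directly, for example `grep_lines(open("driver.log"), "error")` with a hypothetical file name.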

by akhileshp • New Contributor III

03-06-2024 11:18:25 PM

741 Views
6 replies
0 kudos

Query Serverless SQL Warehouse from Spark Submit Job

I am trying to load data from a table in a SQL warehouse using spark.sql("SELECT * FROM <table>") in a Spark Submit job, but the job is failing with [TABLE_OR_VIEW_NOT_FOUND] The table or view . The same statement works in a notebook but not in a jo...

Latest Reply

Wojciech_BUK
Contributor III

03-07-2024 6:19:29 AM

0 kudos

- When you query the table manually and when you run the job, do both actions happen in the same Databricks workspace?
- What is the job configuration?
- Who is the job owner or Run As account? Does this principal have access to the table?

5 More Replies

by User16826987838 • Contributor

06-23-2021 12:43:14 PM

1103 Views
2 replies
0 kudos

Convert PDFs into structured data

Is there anything on Databricks to help read PDF (payment invoices and receipts for example) and convert it to structured data?

Latest Reply

SoniaFoster
New Contributor II

03-07-2024 7:01:01 AM

0 kudos

Thanks! Converting PDF format is sometimes a difficult task as not all converters provide accuracy. I want to share with you one interesting tool I recently discovered that can make your work even more efficient. I recently came across an amazing onl...

1 More Replies
