Data Engineering

Forum Posts

by wyzer • Contributor II

11-18-2022 8:25:08 AM

5764 Views
2 replies
12 kudos

Resolved! Add the creation date of a parquet file into a DataFrame

Currently I load multiple parquet files with this code: df = spark.read.parquet("/mnt/dev/bronze/Voucher/*/*") (Inside the Voucher folder, there is one folder per date, each containing one parquet file.) How can I add a column into this DataFrame, that...
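
A minimal sketch of one possible direction, not the accepted answer from the thread: since the post describes one folder per date, the folder name in each file's path could serve as a stand-in for the file date. In PySpark the source path of each row is available through `pyspark.sql.functions.input_file_name()`; the helper below shows only the path-parsing step in plain Python. The `YYYY-MM-DD` folder naming, the helper name `folder_date`, and the example file name are all assumptions.

```python
from datetime import date, datetime
from pathlib import PurePosixPath


def folder_date(file_path: str, fmt: str = "%Y-%m-%d") -> date:
    """Return the date encoded in the name of the file's parent folder.

    Assumes (hypothetically) that date folders are named like 2022-11-18.
    """
    parent_name = PurePosixPath(file_path).parent.name
    return datetime.strptime(parent_name, fmt).date()


# Hypothetical path following the folder layout described in the post.
example = "/mnt/dev/bronze/Voucher/2022-11-18/part-0000.parquet"
print(folder_date(example))
```

If the folders use a different naming scheme, only the `fmt` argument would need to change.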

Latest Reply

wyzer
Contributor II

11-18-2022 12:46:00 PM

12 kudos

Thanks @Michail Karamanos

by jeffgreen813 • New Contributor

11-18-2022 10:21:03 AM

911 Views
0 replies
0 kudos

How are you managing your DLT pipelines to maintain graph readability?

I've been building out a few pipelines in DLT and noticed that the usefulness of the user interface has started breaking down at a glance. I've attached a screenshot of one of my pipelines. It's not very far along and it's already pretty rough. You c...

by AnubhavG • Contributor

11-18-2022 2:18:47 AM

4061 Views
5 replies
18 kudos

Resolved! External KMS integration with Databricks like AWS KMS, Azure Key Vault.

I would like to know how we can integrate Databricks with external KMS providers, as it currently does with AWS KMS and Azure Key Vault. Can we import keys from any other KMS?

Latest Reply

Vivian_Wilfred
Databricks Employee

11-18-2022 8:26:13 AM

18 kudos

@Anubhav Gupta Databricks is hosted on the cloud provider which means that all resources used by databricks in the backend are in the cloud. For instance, if you create a cluster, the VMs are launched in AWS as EC2 instances. So the integration of K...

by rgb • New Contributor

11-18-2022 8:49:39 AM

987 Views
0 replies
0 kudos

Migration_pipeline.py failing to get default credentials

cat ~/.databrickscfg looks like this (with the correct token/host values in place of xxxxxx):
[DEFAULT]
host = xxxxxx
token = xxxxxx
jobs-api-version = 2.0
The command I run to start the pipeline with default configured credentials is: sudo python3 migrati...
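
The config file quoted in this post is INI-style, so one quick sanity check is to see what a parser reads from it. The sketch below uses Python's standard `configparser` on placeholder content in the same shape as the post. It does not reproduce how the migration script itself loads credentials, and the helper name `read_profile` is hypothetical.

```python
import configparser

# Sample content in the same shape as the post; values are placeholders.
SAMPLE = """\
[DEFAULT]
host = xxxxxx
token = xxxxxx
jobs-api-version = 2.0
"""


def read_profile(text: str, profile: str = "DEFAULT") -> dict:
    """Parse INI-style config text and return one profile as a dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    # DEFAULT is always present in configparser; other profiles must exist.
    if profile != "DEFAULT" and not parser.has_section(profile):
        raise KeyError(f"profile {profile!r} not found")
    return dict(parser[profile])


profile = read_profile(SAMPLE)
print(sorted(profile))
```

Note that `~` expands to the home directory of the user running the process, which may differ under `sudo` depending on system configuration, so it can be worth confirming which file is actually being read.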

by 693872 • New Contributor II

11-11-2022 5:36:38 PM

3302 Views
5 replies
2 kudos

Here I am getting this error when I execute a left join on two data frames: PythonException: 'pyspark.serializers.SerializationError: Caused by Traceback (most recent call last): going to post full traceback:

I simply do a left join on two data frames, and I was able to print the content of both. Here is what the code looks like: df_silver = spark.sql("select ds.PropertyID,\ ds.* from dfsilver as ds LEFT JOIN dfaddmaster as dm \ ...

Latest Reply

Dooley
Valued Contributor II

11-18-2022 8:39:41 AM

2 kudos

Did that answer your question? Did it work?

by jurbschat • New Contributor III

11-18-2022 8:28:43 AM

1285 Views
0 replies
6 kudos

Is Azure Database for MySQL - Flexible Server supported as external metastore?

In the docs it's mentioned that "if you use Azure Database for MySQL as an external metastore, you must change the value of the lower_case_table_names property from 1 (the default) to 2 in the server-side database configuration." However "lower_case_tab...

by marcus1 • New Contributor III

11-18-2022 8:23:46 AM

499 Views
0 replies
0 kudos

Why does Databricks https://docs.databricks.com/dev-tools/api/latest/scim/scim-users.html#get-users take so long?

I've been observing, as we added more workspaces and users to those workspaces, that fetching users per workspace is now taking 11 minutes or more. Our automation to provision group access is now unacceptably long. I've noted that the UI doesn't suffer...

by J_M_W • Contributor

10-11-2022 3:26:13 AM

3510 Views
2 replies
5 kudos

Resolved! Databricks is automatically creating a _apply_changes_storage table in the database when using apply_changes for Delta Live Tables

Hi there, I am using apply_changes (aka Delta Live Tables Change Data Capture) and it works fine. However, it seems to automatically create a secondary table in the database metastore called _apply_changes_storage_{tableName}. So for every table I use ...

Latest Reply

J_M_W
Contributor

11-18-2022 6:56:48 AM

5 kudos

Hi - Thanks @Hubert Dudek I will look into disabling access for the users!

by berserkersap • Contributor

08-13-2022 12:32:58 PM

10710 Views
1 reply
0 kudos

How to deal with Decimal data type arithmetic operations?

I am dealing with values ranging from 10^9 to 10^-9, the sum of values can go up to 10^20, and I need accuracy. So I wanted to use the Decimal data type [using SQL in the Data Science & Engineering workspace]. However, I got to know the peculiar behavior of D...
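
To illustrate why the range in this post calls for a decimal type, here is a small sketch using Python's standard `decimal` module. It is only a stand-in for illustration: it is not Spark SQL's DECIMAL type and its rounding rules differ. A value near 10^20 with 9 fractional digits needs 30 significant digits, which fits within the 38-digit maximum precision of Spark SQL's DECIMAL but exceeds Python's default decimal precision of 28, so the context precision is raised here. The choice of 38 is an assumption made to mirror that limit.

```python
from decimal import Decimal, localcontext

# Binary floating point cannot hold 1e9 and 1e-9 in the same value.
float_sum = 1e9 + 1e-9
print(float_sum == 1e9)  # the small term is lost

with localcontext() as ctx:
    ctx.prec = 38  # assumption: mirror a 38-digit decimal precision
    small = Decimal("1e-9")
    exact_sum = Decimal("1e9") + small   # needs 19 significant digits
    big_sum = Decimal("1e20") + small    # needs 30 significant digits

print(exact_sum)
print(big_sum)
```

The same digit count (integer digits plus fractional digits) is what determines whether a given DECIMAL(precision, scale) declaration can hold intermediate results without loss.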

Latest Reply

berserkersap
Contributor

11-18-2022 6:18:59 AM

0 kudos

Hello everyone, I understand that there is no best answer for this question, so I could only do the same thing I found when I searched the net. The method I found works when: If you know the range of values you deal with (not just the input data but also ...

by 190809 • Contributor

11-18-2022 4:51:47 AM

1440 Views
2 replies
0 kudos

Invalid port error when trying to read from PlanetScale MySQL database

Using the code below I am attempting to connect to a PlanetScale MySQL database. I get the following error: java.sql.SQLException: error parsing url : Incorrect port value. However the port is the default 3306, and I have used the correct url based o...

Latest Reply

Pat
Honored Contributor III

11-18-2022 5:15:26 AM

0 kudos

Hi @Rachel Cunningham, maybe you can share your `driver` and `url` values (masked)?

by eques_99 • New Contributor II

11-17-2022 11:11:15 AM

1755 Views
2 replies
0 kudos

Remove a category (slice) from a Pie Chart

I added a grand total row to a "Count" in SQL, which I needed for some counter visualisations. I used the "ROLLUP" command to get the grand total. However, I have a pie chart which references the same count, and so the grand total row has been added...

Latest Reply

eques_99
New Contributor II

11-18-2022 1:32:14 AM

0 kudos

Hi, as per the picture above, the slice disappears but the name ("null" in this case) remains in the legend.

by Jayanth746 • New Contributor III

11-17-2022 9:36:53 AM

5324 Views
2 replies
2 kudos

Databricks <-> Kafka - SSL handshake failed

I am receiving an SSL handshake error even though the trust-store I have created is based on the server certificate and the fingerprint in the certificate matches the trust-store fingerprint. kafkashaded.org.apache.kafka.common.errors.SslAuthenticationExcept...

Latest Reply

Debayan
Databricks Employee

11-17-2022 11:18:44 PM

2 kudos

Hi @Jayanth Goulla, worth a try: https://stackoverflow.com/questions/54903381/kafka-failed-authentication-due-to-ssl-handshake-failed Did you follow: https://docs.microsoft.com/en-us/azure/databricks/spark/latest/structured-streaming/kafka?

by elgeo • Valued Contributor II

11-16-2022 2:27:20 AM

1912 Views
1 reply
2 kudos

Disable auto-complete (tab button)

Hello. How can we disable the autocomplete that appears with the Tab button? Thank you

Latest Reply

elgeo
Valued Contributor II

11-18-2022 12:30:22 AM

2 kudos

Thank you @Kaniz Fatma

by vs_29 • New Contributor II

11-16-2022 12:31:32 AM

3010 Views
1 reply
3 kudos

Custom Log4j logs are not being written to the DBFS storage.

I used a custom Log4j appender to write the custom logs through the init script, and I can see the custom log file in the driver logs, but Databricks is not writing those custom logs to DBFS. I have configured Logging Destination in the Advanced sec...

Latest Reply

Debayan
Databricks Employee

11-17-2022 11:39:13 PM

3 kudos

Hi @VIjeet Sharma, do you receive any error? This can be an issue when using the DBFS mount point /dbfs in an init script: the DBFS mount point is installed asynchronously, so at the very beginning of init script execution, that mount point might not be ava...

by sharonbjehome • New Contributor

11-16-2022 4:17:29 AM

1688 Views
1 reply
1 kudos

Structured Streaming from MongoDB Atlas not parsing JSON correctly

Hi all, I have a table in MongoDB Atlas that I am trying to read continuously to memory and will eventually write out to a file. However, when I look at the in-memory table it doesn't have the correct schema. Code here: from pyspark.sql.types impo...

Latest Reply

Debayan
Databricks Employee

11-17-2022 11:36:04 PM

1 kudos

Hi @sharonbjehome, this has to be checked thoroughly via a support ticket. Did you follow: https://docs.databricks.com/external-data/mongodb.html Also, could you please check with MongoDB support? Was this working before?
