Data Engineering

Forum Posts

by jerry-xu-sa • New Contributor II

03-06-2023 11:45:02 PM

3112 Views
2 replies
1 kudos

Order of a dataframe is not preserved after calling cache() and limit()

Here are the simple steps to reproduce it. Note that columns "foo" and "bar" are just redundant columns to make sure the DataFrame doesn't fit into a single partition.
// generate a random df
val rand = new scala.util.Random
val df = (1 to 3000).map(i => (r...

Data Engineering

Latest Reply

Anonymous

03-31-2023 5:58:05 PM

1 kudos

Hi @Jerry Xu Thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs. Please help us select the best solution by clicking on "Select As Best" if it does. Your feedback wil...

by wschoi • New Contributor III

03-07-2023 5:18:58 PM

3645 Views
4 replies
1 kudos

Resolved! How can I cluster-install a c-Python library (pyRFC)?

If possible, how can one go about installing a Python library with SDK dependencies like pyRFC? (https://github.com/SAP/PyRFC) The SDK dependencies depend on the type of OS, and since we're running Databricks out of AWS, I assume one would have to mat...

Data Engineering

Latest Reply

Anonymous

03-31-2023 5:57:48 PM

1 kudos

Hi @Wonseok Choi Thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs. Please help us select the best solution by clicking on "Select As Best" if it does. Your feedback...

by ramz • New Contributor II

03-07-2023 12:40:33 AM

3707 Views
4 replies
1 kudos

High driver memory usage on loading parquet file

Hi, I am using PySpark and I am reading a bunch of Parquet files and doing a count on each of them. Driver memory shoots up about 6G to 8G. My setup: I have a cluster of 1 driver node and 2 worker nodes (all of them 16 cores, 128 GB RAM). This is th...

Data Engineering

Latest Reply

Anonymous

03-31-2023 5:57:17 PM

1 kudos

Hi @ramz siva Thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs. Please help us select the best solution by clicking on "Select As Best" if it does. Your feedback wi...

by pepe • New Contributor II

03-09-2023 1:09:32 PM

9110 Views
2 replies
1 kudos

Why can't I install Python libraries when I update the cluster runtime from 10.1 to 12.1?

This same question was asked here 9 months ago without any answer: https://community.databricks.com/s/question/0D58Y000096VjKrSAK/managedlibraryinstallfailed-when-changing-databricks-runtime-version-from-91-to-110 I was using runtime 9.1, and then upgr...

Data Engineering

Latest Reply

Anonymous

03-31-2023 5:54:18 PM

1 kudos

Hi @JOSE RODRIGUEZ Hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us s...

by Ondrej_Lostak • New Contributor

03-10-2023 1:23:09 AM

1633 Views
2 replies
0 kudos

Visualization only from a sample of data

When I display a DataFrame and add a visualization, I see a preview built from only a sample of the data, and when I confirm it, it is computed from all of the data. So far, everything is fine. However, when I change the DataFrame, the visualization is incon...

Data Engineering

Latest Reply

Anonymous

03-31-2023 5:53:13 PM

0 kudos

Hi @Ondrej Lostak Hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so...

by thushar • Contributor

03-08-2023 11:57:42 PM

4130 Views
4 replies
0 kudos

Delta file partitions

We have a function that creates files with partitions, and the partitions are created based on metadata (getPartitionColumns) that we keep. In one table, two columns are listed as partition columns, say 'Team' and 'Speciality'. Wh...

Data Engineering

Latest Reply

Anonymous

03-31-2023 5:52:51 PM

0 kudos

Hi @Thushar R Hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so we ...

by sedat • New Contributor II

03-06-2023 4:07:41 PM

5405 Views
2 replies
0 kudos

Rust support (?) in Databricks

Hi, for Kafka streams and integration, I have seen some presentations and documents saying Rust is a good alternative to Spark. Is there native support for Rust in Databricks, or what is the best method to connect to Kafka resources within Databricks? Thanks fo...

Data Engineering

Latest Reply

Anonymous

03-31-2023 5:51:37 PM

0 kudos

Hi @Sedat EKSI Hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so we...

by Anjum • New Contributor II

03-06-2023 9:30:17 PM

5501 Views
6 replies
1 kudos

PGP encryption and decryption using gnupg

Hi, we are using the python-gnupg==0.4.8 package for encryption and decryption. This worked as expected on Databricks Runtime 9.1 LTS, but when we upgraded our runtime to 12.1, it stopped working with the error "gnupghome should be a d...

Data Engineering

Latest Reply

Anonymous

03-31-2023 5:50:37 PM

1 kudos

Hi @Anjum Aara Hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so we...

by Prasann_gupta • New Contributor

03-09-2023 10:40:52 PM

10647 Views
3 replies
0 kudos

SQL CONTAINS Function is not working on Databricks

I am trying to use the SQL CONTAINS function in my SQL query, but it throws the following error: AnalysisException: Undefined function: 'CONTAINS'. This function is neither a registered temporary function nor a permanent function registered in the databa...

Data Engineering

Latest Reply

Anonymous

03-31-2023 5:47:41 PM

0 kudos

Hi @Prasann Gupta Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you. Than...

by Abhradwip • New Contributor II

03-09-2023 2:29:34 AM

3895 Views
3 replies
0 kudos

How to create a Delta Live Table from JSON files using a custom schema? I am getting the error below for the attached code: org.apache.spark.sql.AnalysisException: Table has a user-specified schema that is incompatible with the schema

Code:
# Import DataType
from pyspark.sql.types import StructType, StructField, TimestampType, IntegerType, StringType, FloatType, BooleanType, LongType
# Define Custom Schema
call_schema = StructType( [ StructField("RecordType", StringType(),...

Data Engineering

Latest Reply

Anonymous

03-31-2023 5:22:23 PM

0 kudos

Hi @Abhradwip Mukherjee Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from yo...

by Siebert_Looije • Contributor

03-08-2023 4:15:35 AM

2136 Views
2 replies
0 kudos

How to fix 'An error occurred while rendering this editor' in the GitHub UI from Databricks?

How do I fix the error 'An error occurred while rendering this editor.' in the GitHub UI from Databricks? Kind regards, Siebert Looije

Data Engineering

Latest Reply

Anonymous

03-31-2023 5:18:44 PM

0 kudos

Hi @Siebert Looije Thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs. Please help us select the best solution by clicking on "Select As Best" if it does. Your feedba...

by najmead • Contributor

03-10-2023 3:30:27 AM

6417 Views
2 replies
1 kudos

Spark Settings in SQL Warehouse

I'm running a query, trying to parse a string into a map, and I get the following error: org.apache.spark.SparkRuntimeException: Duplicate map key was found, please check the input data. If you want to remove the duplicated keys, you can set "spark.s...

Data Engineering

Latest Reply

Anonymous

03-31-2023 5:15:38 PM

1 kudos

Hi @Nicholas Mead Thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs. Please help us select the best solution by clicking on "Select As Best" if it does. Your feedbac...

by Rob_79 • New Contributor II

03-09-2023 8:55:06 PM

2463 Views
2 replies
0 kudos

Is it possible for Databricks to automatically discover PII data in a dataset while processing it?

Data Engineering

Latest Reply

Anonymous

03-31-2023 5:14:55 PM

0 kudos

Hi @Rabie Ash Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you. Thanks!

by ssy • New Contributor II

03-06-2023 3:33:06 PM

3513 Views
2 replies
0 kudos

How to configure the pip config file to include libraries from a proxy location

I need to configure the pip config file to include login credentials so that libraries can be downloaded from our corporate Artifactory. I'm trying to learn how to open a config file within Databricks and add my credentials and package information. I will then have ...
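A minimal sketch of the kind of file the poster describes, assuming a standard pip configuration file (INI format with a [global] section and an index-url key) pointing at an Artifactory PyPI endpoint. The URL, user name, and token used with it are hypothetical placeholders, and where the file must live on a Databricks cluster (for example, written by a cluster init script) is an assumption to verify against the Databricks documentation. The credentials are URL-encoded because characters such as @ or / in a token would otherwise break the URL.

```python
import configparser
import os
from pathlib import Path
from urllib.parse import quote, urlsplit, urlunsplit


def build_index_url(base_url: str, username: str, token: str) -> str:
    """Embed URL-encoded credentials into an index URL as user:token@host."""
    parts = urlsplit(base_url)
    auth = f"{quote(username, safe='')}:{quote(token, safe='')}"
    return urlunsplit(parts._replace(netloc=f"{auth}@{parts.netloc}"))


def write_pip_conf(path: str, index_url: str) -> Path:
    """Write a minimal pip config file containing a [global] index-url entry."""
    # interpolation=None makes configparser write the '%xx' escapes produced
    # by URL-encoding literally instead of rejecting them as bad syntax.
    config = configparser.ConfigParser(interpolation=None)
    config["global"] = {"index-url": index_url}
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    with target.open("w") as fh:
        config.write(fh)
    os.chmod(target, 0o600)  # the file holds a secret: owner read/write only
    return target
```

For example, write_pip_conf("/tmp/pip/pip.conf", build_index_url("https://artifactory.example.com/api/pypi/simple", "jane.doe", token)) writes a one-entry file; in practice the token would come from a secret store rather than being typed into a notebook.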

Data Engineering

Latest Reply

Anonymous

03-31-2023 5:13:18 PM

0 kudos

Hi @Samy Syed Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you. Thanks!

by Jfoxyyc • Valued Contributor

03-06-2023 8:56:48 AM

2195 Views
2 replies
0 kudos

DLT - deduplication pattern?

Say we have an incremental append running with Auto Loader, where the filename is added to the DataFrame and that's all. If we want to de-duplicate this data over a rolling window, we can do something like merge into logs using dedupedLogs on ...
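To make the rolling-window idea concrete, here is a plain-Python model of insert-only de-duplication. It is not DLT or Delta Lake code. It assumes each row carries hypothetical unique_id and event_date fields and that the window is 7 days. A new row is appended only if its unique_id has not been seen among target rows inside the window or earlier in the same batch. Rows older than the window are skipped because they can no longer be checked against the trimmed comparison set; that choice is an assumption, not something stated in the post.

```python
from datetime import date, timedelta
from typing import Dict, List

Row = Dict[str, object]


def insert_only_dedup(existing: List[Row], new_batch: List[Row],
                      today: date, window_days: int = 7) -> List[Row]:
    """Return existing rows plus new rows not already seen within the window."""
    cutoff = today - timedelta(days=window_days)
    # Compare only against target rows inside the rolling window, which
    # bounds the lookup instead of scanning the whole history.
    seen = {r["unique_id"] for r in existing if r["event_date"] > cutoff}
    result = list(existing)  # append-only: existing rows are never modified
    for row in new_batch:
        if row["event_date"] <= cutoff:
            continue  # too old to check against the window (assumption)
        if row["unique_id"] in seen:
            continue  # duplicate of a recent target row or an earlier batch row
        seen.add(row["unique_id"])
        result.append(row)
    return result
```

Bounding the comparison to the window keeps the lookup small as the table grows. The trade-off is that a duplicate whose earlier copy has aged out of the window is inserted again.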

Data Engineering

Latest Reply

Anonymous

03-31-2023 5:10:31 PM

0 kudos

Hi @Jordan Fox Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you. Thanks!
