cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Forum Posts

jerry-xu-sa
by New Contributor II
  • 3112 Views
  • 2 replies
  • 1 kudos

Order of a dataframe is not perserved after calling cache() and limit()

Here are the simple steps to reproduce it. Note that col "foo" and "bar" are just redundant cols to make sure the dataframe doesn't fit into a single partition. // generate a random df val rand = new scala.util.Random val df = (1 to 3000).map(i => (r...

  • 3112 Views
  • 2 replies
  • 1 kudos
Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Jerry Xu​ Thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs.Please help us select the best solution by clicking on "Select As Best" if it does.Your feedback wil...

  • 1 kudos
1 More Replies
wschoi
by New Contributor III
  • 3645 Views
  • 4 replies
  • 1 kudos

Resolved! How can I cluster-install a c-Python library (pyRFC)?

If possible, how can one go about installing a Python library with SDK dependencies like pyRFC? (https://github.com/SAP/PyRFC)The SDK dependencies depend on the type of OS, and since we're running Databricks out of AWS, I assume one would have to mat...

  • 3645 Views
  • 4 replies
  • 1 kudos
Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Wonseok Choi​ Thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs.Please help us select the best solution by clicking on "Select As Best" if it does.Your feedback...

  • 1 kudos
3 More Replies
ramz
by New Contributor II
  • 3707 Views
  • 4 replies
  • 1 kudos

High driver memory usage on loading parquet file

Hi, I am using pyspark and i am reading a bunch of parquet files and doing the count on each of them. Driver memory shoots up about 6G to 8G. My setup:I have a cluster of 1 driver node and 2 worker node (all of them 16 core 128 GB RAM). This is th...

  • 3707 Views
  • 4 replies
  • 1 kudos
Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @ramz siva​ Thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs.Please help us select the best solution by clicking on "Select As Best" if it does.Your feedback wi...

  • 1 kudos
3 More Replies
pepe
by New Contributor II
  • 9110 Views
  • 2 replies
  • 1 kudos

Why can't I install python libraries when i update cluster runtime from 10.1 to 12.1?

This same question was asked here 9 months ago without any answer:https://community.databricks.com/s/question/0D58Y000096VjKrSAK/managedlibraryinstallfailed-when-changing-databricks-runtime-version-from-91-to-110I was using runtime 9.1, and then upgr...

  • 9110 Views
  • 2 replies
  • 1 kudos
Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @JOSE RODRIGUEZ​ Hope everything is going great.Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us s...

  • 1 kudos
1 More Replies
Ondrej_Lostak
by New Contributor
  • 1633 Views
  • 2 replies
  • 0 kudos

Visulization only from sample of data

When I display dataframe and add visualization, I can see a preview from only a sample of data, and when I confirm it, it is counted from all of the data. Until now, everything is fine. However, when I change the dataframe, the visualization is incon...

  • 1633 Views
  • 2 replies
  • 0 kudos
Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Ondrej Lostak​ Hope everything is going great.Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so...

  • 0 kudos
1 More Replies
thushar
by Contributor
  • 4130 Views
  • 4 replies
  • 0 kudos

Delta file partitions

Have one function to create files with partitions, in that the partitions are created based on metadata (getPartitionColumns) that we are keeping. In a table we have two columns that are mentioned as partition columns, say 'Team' and 'Speciality'. Wh...

  • 4130 Views
  • 4 replies
  • 0 kudos
Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Thushar R​ Hope everything is going great.Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so we ...

  • 0 kudos
3 More Replies
sedat
by New Contributor II
  • 5405 Views
  • 2 replies
  • 0 kudos

Rust support (?) in databricks

Hi, for kafka streams and integration, I have seen some presentations and documents Rust is a good alternative to Spark. Is there a native support for RUST in databricks or what is best method to connect to kafka resources within Databricks.thanks fo...

  • 5405 Views
  • 2 replies
  • 0 kudos
Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Sedat EKSI​ Hope everything is going great.Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so we...

  • 0 kudos
1 More Replies
Anjum
by New Contributor II
  • 5501 Views
  • 6 replies
  • 1 kudos

PGP encryption and decryption using gnupg

Hi,We are using python-gnupg==0.4.8 package for encryption and decryption and this was working as expected when we are using Databricks runtime : 9.1 LTS but when we upgarded our runtime to 12.1, it stopped working with error "gnupghome should be a d...

  • 5501 Views
  • 6 replies
  • 1 kudos
Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Anjum Aara​ Hope everything is going great.Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so we...

  • 1 kudos
5 More Replies
Prasann_gupta
by New Contributor
  • 10647 Views
  • 3 replies
  • 0 kudos

SQL CONTAINS Function is not working on Databricks

I am trying to use sql CONTAINS function in my sql query but it is throwing the below error :AnalysisException: Undefined function: 'CONTAINS'. This function is neither a registered temporary function nor a permanent function registered in the databa...

  • 10647 Views
  • 3 replies
  • 0 kudos
Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Prasann Gupta​ Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you.Than...

  • 0 kudos
2 More Replies
Abhradwip
by New Contributor II
  • 3895 Views
  • 3 replies
  • 0 kudos

How to create Delta Live table from Json files using Custom schema? I am getting the below error for the attached code # Error org.apache.spark.sql.AnalysisException: Table has a user-specified schema that is incompatible with the schema

#### Code# CodeImport DataTypefrom pyspark.sql.types import StructType, StructField, TimestampType, IntegerType, StringType, FloatType, BooleanType, LongType# Define Custom Schemacall_schema = StructType(  [    StructField("RecordType", StringType(),...

  • 3895 Views
  • 3 replies
  • 0 kudos
Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Abhradwip Mukherjee​ Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from yo...

  • 0 kudos
2 More Replies
Siebert_Looije
by Contributor
  • 2136 Views
  • 2 replies
  • 0 kudos

How to fix 'An error occurred while rendering this editor' in github databricks?

How to fix the error 'An error occurred while rendering this editor.' in the github UI from databricks?Kind regards,Siebert Looije

image
  • 2136 Views
  • 2 replies
  • 0 kudos
Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Siebert Looije​ Thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs.Please help us select the best solution by clicking on "Select As Best" if it does.Your feedba...

  • 0 kudos
1 More Replies
najmead
by Contributor
  • 6417 Views
  • 2 replies
  • 1 kudos

Spark Settings in SQL Warehouse

I'm running a query, trying to parse a string into a map, and I get the following error;org.apache.spark.SparkRuntimeException: Duplicate map key was found, please check the input data. If you want to remove the duplicated keys, you can set "spark.s...

  • 6417 Views
  • 2 replies
  • 1 kudos
Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Nicholas Mead​ Thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs.Please help us select the best solution by clicking on "Select As Best" if it does.Your feedbac...

  • 1 kudos
1 More Replies
Rob_79
by New Contributor II
  • 2463 Views
  • 2 replies
  • 0 kudos
  • 2463 Views
  • 2 replies
  • 0 kudos
Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Rabie Ash​ Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you.Thanks!

  • 0 kudos
1 More Replies
ssy
by New Contributor II
  • 3513 Views
  • 2 replies
  • 0 kudos

How to configure pip file to include libraries from a proxy location

I need to configure pip file to include login credentials to allow for libraries to download from corporate artifactory. I'm trying to learn how to open a config file within databricks and add my credentials and package information. I will then have ...

  • 3513 Views
  • 2 replies
  • 0 kudos
Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Samy Syed​ Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you.Thanks!

  • 0 kudos
1 More Replies
Jfoxyyc
by Valued Contributor
  • 2195 Views
  • 2 replies
  • 0 kudos

DLT - deduplication pattern?

Say we have an incremental append happening using autoloader, where filename is being added to the dataframe and that's all. If we want to de-duplicate this data in a rolling window, we can do something like merge into logs using dedupedLogs on ...

  • 2195 Views
  • 2 replies
  • 0 kudos
Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Jordan Fox​ Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you.Thanks!

  • 0 kudos
1 More Replies

Join Us as a Local Community Builder!

Passionate about hosting events and connecting people? Help us grow a vibrant local community—sign up today to get started!

Sign Up Now
Labels