Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

thushar
by Contributor
  • 2026 Views
  • 4 replies
  • 0 kudos

Delta file partitions

We have a function that creates files with partitions, where the partition columns come from metadata (getPartitionColumns) that we maintain. In one table, two columns are marked as partition columns, say 'Team' and 'Speciality'. Wh...
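For reference, a partitioned Delta write (in PySpark, along the lines of `df.write.format("delta").partitionBy(*cols).save(path)`) by default places data files under Hive-style `column=value` directories, nested in the order the partition columns are listed. The plain-Python sketch below only models that directory layout, so you can check what the metadata-driven columns will produce. `get_partition_columns`, the metadata dictionary, and the sample rows are hypothetical stand-ins for the post's getPartitionColumns and data, and Spark's escaping of special characters in values is not reproduced.

```python
# Hypothetical metadata store: table name -> ordered partition columns.
PARTITION_METADATA = {"staff": ["Team", "Speciality"]}

# Directory value Spark uses when a partition column is NULL.
DEFAULT_PARTITION_NAME = "__HIVE_DEFAULT_PARTITION__"


def get_partition_columns(table):
    """Hypothetical stand-in for the post's getPartitionColumns lookup."""
    return PARTITION_METADATA.get(table, [])


def partition_path(row, partition_cols):
    """Build the Hive-style 'col=value/...' directory for one row.

    Models the layout of a partitioned write; Spark's escaping of
    special characters in values is not reproduced here.
    """
    parts = []
    for col in partition_cols:
        value = row[col]
        shown = DEFAULT_PARTITION_NAME if value is None else value
        parts.append(f"{col}={shown}")
    return "/".join(parts)


if __name__ == "__main__":
    cols = get_partition_columns("staff")
    rows = [
        {"Name": "A", "Team": "Blue", "Speciality": "Cardiology"},
        {"Name": "B", "Team": "Red", "Speciality": None},
    ]
    for r in rows:
        print(partition_path(r, cols))
```

Because the nesting follows the column order, swapping 'Team' and 'Speciality' in the metadata produces a different directory tree for the same data.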

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Thushar R, hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so we ...

3 More Replies
sedat
by New Contributor II
  • 3273 Views
  • 2 replies
  • 0 kudos

Rust support (?) in Databricks

Hi, for Kafka streams and integration, I have seen some presentations and documents suggesting that Rust is a good alternative to Spark. Is there native support for Rust in Databricks, or what is the best method to connect to Kafka resources within Databricks? Thanks fo...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Sedat EKSI, hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so we...

1 More Replies
Anjum
by New Contributor II
  • 3323 Views
  • 6 replies
  • 1 kudos

PGP encryption and decryption using gnupg

Hi, we are using the python-gnupg==0.4.8 package for encryption and decryption. It worked as expected on Databricks Runtime 9.1 LTS, but after we upgraded our runtime to 12.1 it stopped working with the error "gnupghome should be a d...
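The quoted error suggests python-gnupg was handed a gnupghome path that is not an existing directory on the upgraded cluster. Assuming that is the cause (the post is truncated, so this is a guess), a minimal sketch is to create the directory on local disk before constructing the GPG object. The helper name `ensure_gnupg_home` and the temp-directory path are hypothetical.

```python
import os
import tempfile


def ensure_gnupg_home(path):
    """Create the GnuPG home directory if it is missing; return its absolute path.

    Hypothetical helper: python-gnupg expects gnupghome to be an existing
    directory, so it is created up front with owner-only permissions
    (subject to the process umask), which GnuPG prefers for its home.
    """
    home = os.path.abspath(path)
    # exist_ok=True makes repeated calls safe; a plain file at this path
    # still raises FileExistsError.
    os.makedirs(home, mode=0o700, exist_ok=True)
    return home


if __name__ == "__main__":
    # Hypothetical location on the driver's local disk.
    home = ensure_gnupg_home(os.path.join(tempfile.gettempdir(), "gnupg_home"))
    print(home)
```

With python-gnupg installed, the returned path would then be passed as `gnupg.GPG(gnupghome=home)`.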

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Anjum Aara, hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so we...

5 More Replies
Prasann_gupta
by New Contributor
  • 4992 Views
  • 3 replies
  • 0 kudos

SQL CONTAINS Function is not working on Databricks

I am trying to use the SQL CONTAINS function in my SQL query, but it throws the following error: AnalysisException: Undefined function: 'CONTAINS'. This function is neither a registered temporary function nor a permanent function registered in the databa...
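Spark SQL gained a built-in contains function in Spark 3.3, so on a runtime with an older Spark version (or if the query was written for SQL Server, where CONTAINS is a full-text search predicate) a common workaround is a substring test such as `instr(col, 'x') > 0`, which Spark SQL supports. The sketch below uses Python's built-in sqlite3 purely as a runnable stand-in engine, because SQLite also provides instr with the same 1-based semantics; the table and rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite database, used only as a runnable stand-in for Spark SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, message TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, "disk error on node 3"), (2, "job finished"), (3, "network error")],
)

# instr returns the 1-based position of the substring, or 0 if it is absent.
# The equivalent Spark SQL predicate would be: WHERE instr(message, 'error') > 0
rows = conn.execute(
    "SELECT id FROM events WHERE instr(message, ?) > 0 ORDER BY id", ("error",)
).fetchall()
matching_ids = [r[0] for r in rows]
print(matching_ids)  # [1, 3]
```

`message LIKE '%error%'` is another option, but note that SQLite's LIKE is case-insensitive for ASCII by default while Spark's LIKE is case-sensitive, so instr is the closer match between the two engines here.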

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Prasann Gupta, hope all is well! Just wanted to check in to see whether you were able to resolve your issue. Would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Than...

2 More Replies
Abhradwip
by New Contributor II
  • 2452 Views
  • 3 replies
  • 0 kudos

How to create a Delta Live Table from JSON files using a custom schema? I am getting the error below for the attached code: org.apache.spark.sql.AnalysisException: Table has a user-specified schema that is incompatible with the schema

Code:

    # Code
    # Import DataType
    from pyspark.sql.types import StructType, StructField, TimestampType, IntegerType, StringType, FloatType, BooleanType, LongType

    # Define Custom Schema
    call_schema = StructType(
      [
        StructField("RecordType", StringType(),...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Abhradwip Mukherjee, hope all is well! Just wanted to check in to see whether you were able to resolve your issue. Would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from yo...

2 More Replies
Siebert_Looije
by Contributor
  • 1236 Views
  • 2 replies
  • 0 kudos

How to fix 'An error occurred while rendering this editor' in the GitHub UI in Databricks?

How do I fix the error 'An error occurred while rendering this editor.' in the GitHub UI from Databricks?

Kind regards,
Siebert Looije

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Siebert Looije, thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs. Please help us select the best solution by clicking on "Select As Best" if it does. Your feedba...

1 More Replies
najmead
by Contributor
  • 3486 Views
  • 2 replies
  • 1 kudos

Spark Settings in SQL Warehouse

I'm running a query that parses a string into a map, and I get the following error: org.apache.spark.SparkRuntimeException: Duplicate map key was found, please check the input data. If you want to remove the duplicated keys, you can set "spark.s...
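Spark's str_to_map splits a string into entries on ',' and each entry into key and value on ':' by default. Under Spark's default map-key dedup policy (EXCEPTION) a repeated key raises the error quoted above, while the LAST_WIN policy keeps the value from the last occurrence. The sketch below is a plain-Python model of those two behaviours, handy for spotting duplicate keys in the input before it reaches Spark. It is not Spark code: the function name merely mirrors the SQL function, and delimiters are treated as literal strings, whereas Spark interprets them as regular expressions.

```python
def str_to_map(text, pair_delim=",", kv_delim=":", dedup_policy="EXCEPTION"):
    """Plain-Python model of Spark SQL's str_to_map and its dedup policies.

    dedup_policy mirrors Spark's EXCEPTION (fail on a repeated key) and
    LAST_WIN (a later occurrence overwrites the earlier value).
    Delimiters are literal strings here, not regular expressions.
    """
    if dedup_policy not in ("EXCEPTION", "LAST_WIN"):
        raise ValueError(f"unknown dedup policy: {dedup_policy}")
    result = {}
    for entry in text.split(pair_delim):
        # Split on the first key/value delimiter only.
        key, sep, value = entry.partition(kv_delim)
        if key in result and dedup_policy == "EXCEPTION":
            raise ValueError(f"Duplicate map key {key!r} was found")
        # An entry without a key/value delimiter maps to None (NULL in Spark).
        result[key] = value if sep else None
    return result


if __name__ == "__main__":
    print(str_to_map("a:1,b:2,a:3", dedup_policy="LAST_WIN"))  # {'a': '3', 'b': '2'}
    try:
        str_to_map("a:1,b:2,a:3")
    except ValueError as exc:
        print(exc)
```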

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Nicholas Mead, thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs. Please help us select the best solution by clicking on "Select As Best" if it does. Your feedbac...

1 More Replies
Rob_79
by New Contributor II
  • 1362 Views
  • 2 replies
  • 0 kudos
Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Rabie Ash, hope all is well! Just wanted to check in to see whether you were able to resolve your issue. Would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Thanks!

1 More Replies
ssy
by New Contributor II
  • 2001 Views
  • 2 replies
  • 0 kudos

How to configure the pip config file to include libraries from a proxy location

I need to configure the pip config file to include login credentials so that libraries can be downloaded from our corporate Artifactory. I'm trying to learn how to open a config file within Databricks and add my credentials and package information. I will then have ...
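pip reads an INI-style config file in which an `index-url` under `[global]` points it at a private package index such as Artifactory; credentials can be embedded in that URL, and the `PIP_INDEX_URL` environment variable is an alternative to the file. On Databricks such a file is commonly written from a cluster init script, for example to the site-wide location /etc/pip.conf. The sketch below only renders the file contents with configparser. The host, repository path, helper name, and environment-variable names are hypothetical, and real credentials are better read from a secret store than hard-coded.

```python
import configparser
import io
import os
from urllib.parse import quote


def render_pip_conf(host, repo_path, user, token):
    """Render pip.conf text whose [global] index-url points at a private index.

    host and repo_path are hypothetical Artifactory coordinates. Credentials
    are percent-encoded so characters such as '@' or ':' cannot break the URL.
    """
    url = (
        f"https://{quote(user, safe='')}:{quote(token, safe='')}"
        f"@{host}/{repo_path.strip('/')}"
    )
    # interpolation=None keeps '%' from the encoded credentials literal.
    config = configparser.ConfigParser(interpolation=None)
    config["global"] = {"index-url": url}
    buf = io.StringIO()
    config.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    text = render_pip_conf(
        host="artifactory.example.com",  # hypothetical host
        repo_path="artifactory/api/pypi/pypi-remote/simple",  # hypothetical repo
        user=os.environ.get("ARTIFACTORY_USER", "svc-user"),
        token=os.environ.get("ARTIFACTORY_TOKEN", "not-a-real-token"),
    )
    print(text)
```

An init script could write this text to the config path before libraries are installed, so every pip call on the cluster picks up the private index.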

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Samy Syed, hope all is well! Just wanted to check in to see whether you were able to resolve your issue. Would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Thanks!

1 More Replies
Jfoxyyc
by Valued Contributor
  • 1433 Views
  • 2 replies
  • 0 kudos

DLT - deduplication pattern?

Say we have an incremental append happening with Auto Loader, where the file name is added to the DataFrame and nothing else. If we want to deduplicate this data within a rolling window, we can do something like merge into logs using dedupedLogs on ...
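The merge the post sketches resembles the insert-only deduplication pattern in the Delta Lake merge documentation: new rows are inserted only when no existing row with the same key matches, and the match can be restricted to a recent window so that only a slice of the target is scanned. The plain-Python model below illustrates those semantics only; it is neither DLT nor Delta code, and the key column (`unique_id`), the `date` column, and the 7-day window are assumptions.

```python
from datetime import date, timedelta


def insert_only_merge(target, updates, today, window_days=7, key="unique_id"):
    """Model of an insert-only MERGE restricted to a rolling window.

    Update rows are deduplicated among themselves, then inserted only if no
    target row with the same key falls inside the window. Update rows older
    than the window are skipped, as in the windowed merge pattern.
    """
    cutoff = today - timedelta(days=window_days)
    # Keys already present in the recent slice of the target.
    recent_keys = {row[key] for row in target if row["date"] > cutoff}
    merged = list(target)
    seen = set()
    for row in updates:
        k = row[key]
        if row["date"] <= cutoff or k in recent_keys or k in seen:
            continue
        seen.add(k)
        merged.append(row)
    return merged


if __name__ == "__main__":
    today = date(2023, 5, 10)
    target = [{"unique_id": "a", "date": date(2023, 5, 8)}]
    updates = [
        {"unique_id": "a", "date": date(2023, 5, 9)},  # duplicate of a recent row
        {"unique_id": "b", "date": date(2023, 5, 9)},  # new row
        {"unique_id": "b", "date": date(2023, 5, 9)},  # duplicate within the batch
    ]
    print([r["unique_id"] for r in insert_only_merge(target, updates, today)])
```

The trade-off of the window is that a duplicate of a row older than the window is not detected and gets inserted again.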

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Jordan Fox, hope all is well! Just wanted to check in to see whether you were able to resolve your issue. Would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Thanks!

1 More Replies
Nirbhay
by New Contributor II
  • 1128 Views
  • 3 replies
  • 0 kudos

Databricks Community Edition login issue

I am unable to log in to Databricks Community Edition with my login ID nirbhay.singh06@gmail.com. Please help me, or email me whatever the solution is. I need this for practice. What should I do? Why do I get an issue every time her...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Nirbhay Singh, thank you for reaching out, and we’re sorry to hear about this log-in issue! We have this Community Edition login troubleshooting post on Community. Please take a look, and follow the troubleshooting steps. If the steps do not reso...

2 More Replies
Akshith_Rajesh
by New Contributor III
  • 1537 Views
  • 4 replies
  • 1 kudos

Does Databricks lock a file in ADLS Gen2 before writing (appending) to it? If yes, how can we check whether the file is locked?

I have a requirement: I am running two notebooks in parallel, and both overwrite the same file. If the two notebooks try to overwrite the file at the same time, will I lose data because of the simultaneous overwrites? I want to overwr...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Rajesh Akshith, hope all is well! Just wanted to check in to see whether you were able to resolve your issue. Would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Tha...

3 More Replies
essentialDatabr
by New Contributor II
  • 1648 Views
  • 1 replies
  • 1 kudos

Confusion about {{run_id}} and {{parent_run_id}} variables for Databricks jobs (Azure)

In Databricks jobs on Azure, you can use the {{run_id}} and {{parent_run_id}} variables for a specific run: https://docs.databricks.com/workflows/jobs/jobs.html

For Databricks jobs with two or more tasks, {{run_id}} seems to correspond to task...
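One way to see both values in a multi-task job is to pass them into a task as parameters and log them. Assuming a Python script task whose parameters are configured as ["--run-id", "{{run_id}}", "--parent-run-id", "{{parent_run_id}}"] (the flag names and the helper `parse_run_ids` are hypothetical; only the {{...}} placeholders come from Databricks), the script can read them with argparse:

```python
import argparse


def parse_run_ids(argv=None):
    """Parse run identifiers passed in as task parameters.

    Assumes the job passes --run-id {{run_id}} and --parent-run-id
    {{parent_run_id}}. Per the reply below, in a multi-task job these
    resolve to the task run ID and the job run ID respectively.
    """
    parser = argparse.ArgumentParser(description="Log Databricks run identifiers")
    parser.add_argument("--run-id", default=None, help="value of {{run_id}}")
    parser.add_argument("--parent-run-id", default=None, help="value of {{parent_run_id}}")
    return parser.parse_args(argv)


if __name__ == "__main__":
    # Example values only; in a real task, call parse_run_ids() to read sys.argv.
    demo = parse_run_ids(["--run-id", "1002", "--parent-run-id", "1001"])
    print(f"task run id: {demo.run_id}, job run id: {demo.parent_run_id}")
```

Logging both values from two different tasks of the same run is a quick way to confirm which one changes per task and which one is shared.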

Latest Reply
Anonymous
Not applicable
  • 1 kudos

@Kasper H: Yes, you are correct in your understanding that in Databricks jobs with multiple tasks, the {{run_id}} variable corresponds to the task_run_id and the {{parent_run_id}} variable corresponds to the job_run_id. For Databricks jobs with only ...

asethia
by New Contributor
  • 3079 Views
  • 1 replies
  • 0 kudos

Delta Lake in Apache Spark

Hi, as per the documentation (https://docs.delta.io/latest/quick-start.html), we can configure DeltaCatalog using spark.sql.catalog.spark_catalog. Iceberg supports two catalog implementations (https://iceberg.apache.org/docs/latest/spark-configuration/#...
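For reference, the Delta Lake quick start for open-source Spark sets two session properties: the Delta SQL extension and DeltaCatalog registered as spark_catalog. A minimal sketch that applies them to a PySpark session builder is below; the helper name `with_delta_conf` is hypothetical, while `builder.config(key, value)` is the standard PySpark builder method, which returns the builder.

```python
# Session settings from the Delta Lake quick start (open-source Spark).
DELTA_SPARK_CONF = {
    "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
    "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog",
}


def with_delta_conf(builder, conf=DELTA_SPARK_CONF):
    """Apply each setting via builder.config(key, value); return the builder.

    Hypothetical helper; conf is only read, never modified.
    """
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder
```

Usage would be along the lines of `with_delta_conf(SparkSession.builder.appName("delta-demo")).getOrCreate()`, with the Delta jars on the classpath; when Delta is installed via pip, the quick start wraps the builder with `configure_spark_with_delta_pip` for that. A separately named catalog would use its own `spark.sql.catalog.<name>` key instead of replacing spark_catalog.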

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Arun Sethia: Yes, Delta Lake also supports custom catalogs. Delta Lake uses the Spark Catalog API, which allows for pluggable catalog implementations. You can implement your own custom catalog to use with Delta Lake. To use a custom catalog, you can...

kll
by New Contributor III
  • 3698 Views
  • 1 replies
  • 1 kudos

Resolved! OSError: Invalid argument when attempting to save a pandas DataFrame to CSV

I am attempting to save a pandas DataFrame as a CSV file to a directory I created in the Databricks workspace, or to the `cwd`.

    import pandas as pd
    import os

    df.to_csv("data.csv", index=False)
    df.to_csv(str(os.getcwd()) + "/data.csv", index=False)
    ...

Latest Reply
Ajay-Pandey
Esteemed Contributor III
  • 1 kudos

Hi @Keval Shah, you can save your DataFrame to CSV in DBFS storage. Please refer to the code below, which might help you:

    from io import StringIO
    import pandas as pd

    # data holds the CSV text to load
    df = pd.read_csv(StringIO(data), sep=',')
    # print(df)
    df.to_csv('/dbfs/FileStore/ajay/file1.txt')
