Hello, I've successfully completed the Databricks Lakehouse Fundamentals course and am looking for where the badge is. I found this post here, but I haven't received an email about my completion from <service.accredible.email@databricks.com> yet. I successfull...
Thank you all for the great responses. I eventually received the badge; it took around 30+ minutes, but I finally did get the email notification. I will mark this post as resolved.
Context: I am using pyspark.pandas in a Databricks Jupyter notebook and doing some text manipulation within the DataFrame. pyspark.pandas is the pandas API on Spark and can largely be used the same way as regular pandas. Error: PicklingError: Could not seria...
@Krishna Zanwar, I'm receiving the same error. For me, the behavior occurs when broadcasting a random forest (sklearn 1.2.0) recently loaded from MLflow and using a Pandas UDF to run predictions. However, the same code works perfectly on Spark 2....
I am trying to replicate my existing Spark pipeline in DLT, but I am not able to achieve the desired result with DLT. Current pipeline: source setup: CSV files ingested into bronze using SCP; frequency: monthly; bronze dir: /cntdlt/bronze/emp/year=2022 /...
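For what it's worth, a minimal DLT bronze-table sketch for a monthly CSV drop might look like the following. This is only an illustration, not the original pipeline: the table name, path, and columns are placeholders, and it runs only inside a Databricks DLT pipeline (where `spark` is provided).

```python
# Hypothetical DLT sketch; table/path names are illustrative placeholders.
import dlt
from pyspark.sql import functions as F

@dlt.table(name="bronze_emp", comment="Raw employee CSVs landed via SCP")
def bronze_emp():
    return (
        spark.read.format("csv")
        .option("header", "true")
        .load("/cntdlt/bronze/emp/")              # placeholder path
        .withColumn("ingest_ts", F.current_timestamp())
    )
```

For a monthly cadence, the DLT pipeline itself would typically run triggered on a monthly schedule rather than continuously.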
Hi @Anuj kumar sen, we haven't heard from you since the last response from @Kristian Foster, and I was checking back to see if his suggestions helped you. If you have found a solution, please share it with the community, as it can be helpful to...
Difference between " and ' in the Spark DataFrame API: you must tell the parser that you want to represent a string inside a string by using a different quote character for the inner string. Here is an example: " Name = "HARI" " — the above is wrong. Why? Because the in...
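A quick illustration in Python (the same rule applies inside Spark SQL expression strings): alternate the quote characters, or escape the inner quotes.

```python
# Use a different quote character for the inner string, or escape it.
expr1 = "Name = 'HARI'"      # double quotes outside, single quotes inside
expr2 = 'Name = "HARI"'      # single quotes outside, double quotes inside
expr3 = "Name = \"HARI\""    # or escape the inner double quotes

print(expr1)  # Name = 'HARI'
```

This is exactly why a filter like `df.filter("Name = 'HARI'")` uses single quotes for the SQL string literal inside the double-quoted Python string.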
This is what I am doing: enter all the details on page 1, click on "Get started with Community Edition"; after verification, I get the following error.
Hi @Abdul Jabbar Thank you for reaching out, and we’re sorry to hear about this log-in issue! We have this Community Edition login troubleshooting post on Community. Please take a look, and follow the troubleshooting steps. If the steps do not resol...
I am trying to refresh a Power BI dataset partition from Azure Databricks using the XMLA endpoint. I have Power BI Premium capacity with read/write enabled. I tried a few approaches found on Google, but none worked, for one reason or another. If any of y...
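One commonly used alternative (my suggestion, not from the thread) is the Power BI REST API's dataset-refresh endpoint, which on Premium supports "enhanced refresh" and can target specific tables/partitions in the request body. A rough sketch — the workspace/dataset IDs, partition names, and the Azure AD token acquisition are all placeholders you would supply:

```python
import requests

def build_refresh_url(group_id: str, dataset_id: str) -> str:
    # Power BI REST API endpoint for triggering a dataset refresh
    return (f"https://api.powerbi.com/v1.0/myorg/groups/{group_id}"
            f"/datasets/{dataset_id}/refreshes")

def trigger_partition_refresh(token: str, group_id: str, dataset_id: str):
    # token: an Azure AD access token for the Power BI service
    # (https://analysis.windows.net/powerbi/api scope)
    resp = requests.post(
        build_refresh_url(group_id, dataset_id),
        headers={"Authorization": f"Bearer {token}"},
        json={
            "type": "Full",
            # enhanced refresh: limit the refresh to specific objects
            "objects": [{"table": "Sales", "partition": "Sales2022"}],  # placeholders
        },
    )
    resp.raise_for_status()
```

If you specifically need TMSL/TOM-style control over partitions, the XMLA endpoint is still the route, but the REST call above avoids the XMLA client-library dependency from a Databricks notebook.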
Hi @Kris Koirala, great to meet you, and thanks for your question! Let's see if your peers in the community have an answer first; otherwise, Bricksters will get back to you soon. Thanks.
I would like to know the best practices for authenticating to a SQL database from Databricks/Python. I am most interested in token-based DB authentication methods rather than credential-based (username/password) ones.
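One token-based pattern (an assumption on my part, not confirmed by the thread) is Azure AD access-token authentication for Azure SQL via pyodbc: instead of a username/password, you pack an AAD access token into the driver's `SQL_COPT_SS_ACCESS_TOKEN` connection attribute. The server/database names below are placeholders.

```python
import struct

SQL_COPT_SS_ACCESS_TOKEN = 1256  # msodbcsql pre-connect attribute for AAD tokens

def pack_access_token(token: str) -> bytes:
    # The ODBC driver expects the token UTF-16-LE encoded,
    # prefixed with its byte length as a little-endian 4-byte integer.
    raw = token.encode("utf-16-le")
    return struct.pack("<I", len(raw)) + raw

def connect(server: str, database: str, token: str):
    # Hypothetical usage; requires pyodbc and the Microsoft ODBC driver installed.
    import pyodbc
    conn_str = (f"Driver={{ODBC Driver 17 for SQL Server}};"
                f"Server={server};Database={database}")
    return pyodbc.connect(
        conn_str,
        attrs_before={SQL_COPT_SS_ACCESS_TOKEN: pack_access_token(token)},
    )
```

The token itself would typically come from a service principal or managed identity (e.g. via `azure-identity`'s `DefaultAzureCredential`), so no password ever lives in the notebook.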
Most Python examples show the structure of the foreachBatch method as:

def foreachBatchFunc(batchDF, batchId):
    batchDF.createOrReplaceTempView('viewName')
    (
        batchDF
        ._jdf.sparkSession()
        .sql(
            ...
Just found a solution... You need to convert the Java DataFrame (jdf) returned by the internal session back into a Python DataFrame:

from pyspark import sql

def batchFunc(batchDF, batchId):
    batchDF.createOrReplaceTempView('viewName')
    sparkSession = batchDF._jdf.sparkSession()
    # sql() on the internal Java session returns a Java DataFrame, e.g.:
    resJdf = sparkSession.sql('select * from viewName')
    # wrap the Java DataFrame back into a Python DataFrame
    resDf = sql.DataFrame(resJdf, batchDF.sql_ctx)
I have been exploring Auto Loader to ingest gzipped JSON files from an S3 source. The notebook fails on the first run due to a schema mismatch; after re-running the notebook, the schema evolves and the ingestion runs successfully. On analysing the schema ...
Hi @Debayan Mukherjee, @Kaniz Fatma, thank you for replying to my question. I was able to figure out the issue: I was creating the schema and checkpoint folders in the same path as the Auto Loader's source location. This caused the schema to ch...
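For anyone hitting the same thing, the pattern that avoids it is keeping `cloudFiles.schemaLocation` and the checkpoint outside the source prefix. A hedged sketch (bucket and table names are placeholders; the stream itself runs only on Databricks, where the `cloudFiles` source and a `spark` session exist):

```python
# Illustrative Auto Loader layout; keep metadata paths OUTSIDE the source prefix.
source_path     = "s3://my-bucket/raw/events/"        # gzipped JSON lands here
schema_path     = "s3://my-bucket/_schemas/events/"   # NOT under source_path
checkpoint_path = "s3://my-bucket/_checkpoints/events/"

def start_stream(spark):
    # Databricks-only: cloudFiles is the Auto Loader source.
    return (spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")
            .option("cloudFiles.schemaLocation", schema_path)
            .load(source_path)
            .writeStream
            .option("checkpointLocation", checkpoint_path)
            .toTable("bronze_events"))
```

If the schema folder sits under the source path, Auto Loader will try to ingest its own schema-tracking files, which is how the schema appears to "change" between runs.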
Currently we are using an all-purpose compute cluster. When we tried to allocate the scheduled jobs to a job cluster, we were blocked by the following error: SUBNET_EXHAUSTED_FAILURE(CLOUD_FAILURE): azure_error_code: SubnetIsFull, azure_error_message: No mo...
Answering your questions: yes, your VNet/subnet is out of unoccupied IPs, and this can be fixed by allocating more IPs to your network address space. Each cluster node requires its own IP, so if none are available, the cluster simply cannot start.
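As a sanity check on sizing: Azure reserves 5 addresses in every subnet, and with Databricks VNet injection each node consumes an IP in both the host and the container subnet, so both must be large enough. A quick estimate of usable addresses per subnet:

```python
import ipaddress

def usable_ips(cidr: str) -> int:
    # Azure reserves 5 addresses per subnet
    # (network, broadcast, gateway, and two for Azure DNS).
    return ipaddress.ip_network(cidr).num_addresses - 5

print(usable_ips("10.0.1.0/26"))  # 64 - 5 = 59 addresses available for nodes
```

So a /26 caps you at 59 nodes per subnet across all concurrently running clusters; job clusters spinning up alongside the all-purpose cluster can easily exhaust that.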
Rather than joining features from different tables, I just want to use a single feature store table and select some of its features, but still log the model with the feature store. The problem I am facing is that I do not know how to create the train...
Hi, could you please refer to https://docs.databricks.com/machine-learning/feature-store/train-models-with-feature-store.html#create-a-trainingset-using-the-same-feature-multiple-times and let us know if this helps.
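Sketching the pattern from that docs page with placeholder names (this is Databricks-only and assumes the `databricks.feature_store` client; `label_df` and all table/column names are hypothetical): a single feature table can back a TrainingSet by listing it once in `feature_lookups` with only the columns you want.

```python
# Hypothetical sketch; Databricks-only API, all names are placeholders.
from databricks.feature_store import FeatureStoreClient, FeatureLookup

fs = FeatureStoreClient()

lookups = [
    FeatureLookup(
        table_name="feature_db.customer_features",   # the single feature table
        feature_names=["age", "tenure_days"],        # subset of its columns
        lookup_key="customer_id",
    )
]

# label_df holds only the lookup key and the label column
training_set = fs.create_training_set(
    df=label_df,
    feature_lookups=lookups,
    label="churned",
    exclude_columns=["customer_id"],
)
train_df = training_set.load_df()
# fs.log_model(..., training_set=training_set) then records the feature lineage
```

Because the model is logged with the `training_set`, the feature store still tracks which table and features fed it, even though only one table was used.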
Is setting taskValues in DLT notebooks supported? I tried setting a task value in a DLT notebook, but it does not seem to be supported, so downstream notebooks within the same Workflows job cannot consume the task value.
I have a pandas-on-Spark dataframe with 8 million rows and 20 columns. It took 3.48 minutes to run df.shape, and df.head also takes a long time (4.55 minutes). By contrast, df.var1.value_counts().reset_index() took only 0.18 sec...
The reason this is slow is that pandas needs an index column to perform `shape` or `head`. If you don't provide one, pyspark.pandas enumerates the entire dataframe to create a default one. For example, given columns A, B, and C in dataframe `d...
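Two mitigations follow from that explanation (a hedged sketch; it needs a running Spark session, and the path/column names are placeholders): pick a cheaper default index type, or promote an existing unique column to the index so pandas-on-Spark never has to enumerate rows.

```python
# Sketch for pandas-on-Spark; requires a Spark session (e.g. a Databricks notebook).
import pyspark.pandas as ps

# 'distributed' attaches a monotonically increasing id per partition without a
# global ordering pass, which is far cheaper than the default 'sequence' index
# on large frames (at the cost of non-consecutive index values).
ps.set_option("compute.default_index_type", "distributed")

psdf = ps.read_parquet("/data/events")   # placeholder path
psdf = psdf.set_index("event_id")        # or reuse an existing unique column
print(psdf.shape)
```

With an explicit index (or the `distributed` default), `shape` and `head` no longer trigger the full-dataframe enumeration described above.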