Data Engineering

Forum Posts

190809
by Contributor
  • 459 Views
  • 1 reply
  • 0 kudos

Trying to figure out what is causing non-null values in my bronze tables to be returned as NULL in silver tables.

I have a process which loads data from JSON to a bronze table. It then adds a couple of columns and creates a silver table. But the silver table has NULL values where there were values in the bronze tables. Process as follows: def load_to_silver(sourc...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Rachel Cunningham​: One possible reason for this issue could be a data type mismatch between the bronze and silver tables. It is possible that the column in the bronze table has a non-null value, but the data type of that column is different from th...
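A minimal sketch of that check (the table names bronze.events and silver.events are made up, not from the thread): compare the two schemas column by column, then cast explicitly, so a type mismatch surfaces instead of silently becoming NULL.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()

bronze_df = spark.table("bronze.events")              # hypothetical names
silver_schema = spark.table("silver.events").schema

# Report any column whose type differs between bronze and silver.
for field in silver_schema:
    if field.name in bronze_df.columns:
        bronze_type = bronze_df.schema[field.name].dataType
        if bronze_type != field.dataType:
            print(f"{field.name}: bronze={bronze_type}, silver={field.dataType}")

# Cast explicitly rather than relying on an implicit cast; in Spark a failed
# implicit cast yields NULL, which matches the symptom described above.
aligned_df = bronze_df.select(
    [col(f.name).cast(f.dataType) for f in silver_schema if f.name in bronze_df.columns]
)
```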

Harsh_Paliwal
by New Contributor
  • 1191 Views
  • 1 reply
  • 0 kudos

java.lang.Exception: Unable to start python kernel for ReplId-79217-e05fc-0a4ce-2, kernel exited with exit code 1.

I am running a parameterized Autoloader notebook in a workflow. This notebook is being called 29 times in parallel, and Unity Catalog (UC) is also enabled. I am facing this error: java.lang.Exception: Unable to start python kernel for ReplId-79217-e05fc-0a4ce-2, ke...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Harsh Paliwal​: The error message suggests that there might be a conflict with the xtables lock. One thing you could try is to add the -w option, as suggested by the error message. You can add the following command to the beginning of your notebook t...
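For reference, the command the reply is alluding to presumably looks something like this; a sketch only, since the thread never confirms that the xtables lock is really what blocks the kernel:

```python
import subprocess

# -w tells iptables to wait for the xtables lock instead of failing when
# many parallel runs touch it at once; run this early in the notebook.
subprocess.run(["iptables", "-w", "-L"], check=True)
```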

Chris_Konsur
by New Contributor III
  • 1482 Views
  • 1 reply
  • 0 kudos

Unit test with Nutter

When I run the simple test in a notebook, it works fine, but when I run it from the Azure ADO pipeline, it fails with the error. Code: def __init__(self): NutterFixture.__init__(self) from runtime.nutterfixture import NutterFixture, tag class uTestsDa...
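The snippet in the preview is flattened by the page; untangled, a minimal Nutter fixture normally looks like this (the class and method names are reconstructed, so treat them as illustrative):

```python
from runtime.nutterfixture import NutterFixture, tag

class uTestsDataPipeline(NutterFixture):
    def __init__(self):
        NutterFixture.__init__(self)

    # Nutter discovers run_/assertion_/after_ methods by naming convention.
    def assertion_smoke(self):
        assert 1 + 1 == 2

result = uTestsDataPipeline().execute_tests()
print(result.to_string())
```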

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Chris Konsur​: The error message suggests that there is an issue with the standard output buffer when the Python interpreter is shutting down, which could be related to daemon threads. This error is not specific to Databricks or Azure ADO pipelines, ...

danniely
by New Contributor II
  • 2675 Views
  • 1 reply
  • 2 kudos

Pyspark RDD fails with pytest

When I call RDD APIs during pytest, it seems like the module "serializer.py" cannot find any other modules under pyspark. I've already looked this up on the internet, and it seems like pyspark modules are not properly importing other referenced modules. I see ot...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

@hyunho lee​: It sounds like you are encountering an issue with PySpark's serializer not being able to find the necessary modules during testing with pytest. One solution you could try is to set the PYTHONPATH environment variable to include the pat...
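A sketch of that suggestion, assuming SPARK_HOME points at a Spark install; putting it in a conftest.py makes pytest apply it before pyspark is imported (the py4j zip name varies by version):

```python
import glob
import os
import sys

spark_home = os.environ.get("SPARK_HOME", "")
if spark_home:
    # Make pyspark and its bundled py4j importable.
    sys.path.insert(0, os.path.join(spark_home, "python"))
    for zip_path in glob.glob(os.path.join(spark_home, "python", "lib", "py4j-*-src.zip")):
        sys.path.insert(0, zip_path)
    # Propagate the same paths to worker processes via PYTHONPATH.
    os.environ["PYTHONPATH"] = os.pathsep.join(sys.path)
```

With a pip-installed pyspark, the findspark package (findspark.init()) does the same path wiring in one call.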

quakenbush
by Contributor
  • 2266 Views
  • 1 reply
  • 0 kudos

Is there something like Oracle's VPD-Feature in Databricks?

Since I am porting some code from Oracle to Databricks, I have another specific question. In Oracle there's something called Virtual Private Database, VPD. It's a simple security feature used to generate a WHERE clause which the system will add to a u...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Roger Bieri​: In Databricks, you can use the UserDefinedFunction (UDF) feature to create a custom function that will be applied to a DataFrame. You can use this feature to add a WHERE clause to a DataFrame based on the user context. Here's an exampl...
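The reply's example is cut off; a common alternative to a UDF for VPD-style filtering in Databricks is a dynamic view keyed on current_user(), sketched here with made-up table and column names:

```python
# Every query against the view is filtered for the querying user,
# much like a VPD policy appending a WHERE clause.
spark.sql("""
    CREATE OR REPLACE VIEW sales_for_current_user AS
    SELECT *
    FROM sales
    WHERE owner_email = current_user()
""")
```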

Fed
by New Contributor III
  • 4310 Views
  • 1 reply
  • 0 kudos

Setting checkpoint directory for checkpointInterval argument of estimators in pyspark.ml

Tree-based estimators in pyspark.ml have an argument called checkpointInterval: checkpointInterval = Param(parent='undefined', name='checkpointInterval', doc='set checkpoint interval (>= 1) or disable checkpoint (-1). E.g. 10 means that the cache will ...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Federico Trifoglio​: If sc.getCheckpointDir() returns None, it means that no checkpoint directory is set in the SparkContext. In this case, the checkpointInterval argument will indeed be ignored. To set a checkpoint directory, you can use the SparkC...
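Completing that thought with a short sketch (the checkpoint path is arbitrary): the directory must be set on the SparkContext before fitting, otherwise checkpointInterval has no effect.

```python
from pyspark.ml.classification import GBTClassifier

# Set before .fit(); without it, checkpointInterval is silently ignored.
spark.sparkContext.setCheckpointDir("dbfs:/tmp/ml_checkpoints")

gbt = GBTClassifier(
    labelCol="label",
    featuresCol="features",
    checkpointInterval=10,  # checkpoint the cached node IDs every 10 iterations
)
```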

Phani1
by Valued Contributor
  • 2106 Views
  • 1 reply
  • 0 kudos

Best practices/steps for Hive metastore backup and restore

Hi Team, could you share with us the best practices/steps for Hive metastore backup and restore? Regards, Phanindra

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Janga Reddy​: Certainly! Here are the steps for Hive metastore backup and restore on Databricks. Backup: stop all running Hive services and jobs on the Databricks cluster; create a backup directory in DBFS (Databricks File System) where the metadata fi...
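If the metastore is an external MySQL database, the backup step usually reduces to a database dump. A hedged sketch; host, user, password, and database name are all placeholders:

```python
import subprocess

# --single-transaction takes a consistent dump without locking the metastore.
with open("/dbfs/tmp/metastore_backup.sql", "w") as out:
    subprocess.run(
        ["mysqldump", "--single-transaction",
         "-h", "metastore-host.example.com",
         "-u", "backup_user", "-pREDACTED",
         "metastore_db"],
        stdout=out,
        check=True,
    )
```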

maaaxx
by New Contributor III
  • 1078 Views
  • 4 replies
  • 0 kudos

dbutils conflicts with a custom spark extension

Hello dear community, we have installed a custom Spark extension to filter the files allowed to be read into the notebook. It all works fine when using the Spark functions. However, the files are not filtered properly if the user uses, e.g., dbutils.f...

Latest Reply
Vartika
Moderator
  • 0 kudos

Hi @Yuan Gao​,Checking in. If @tayyab vohra​'s answer helped, would you let us know and mark the answer as best? If not, would you be happy to give us more information? Thanks!

3 More Replies
Direo
by Contributor
  • 595 Views
  • 1 reply
  • 0 kudos

Operations applied when running fs.write_table to overwrite an existing feature table in the Hive metastore

Hi, there was a need to query an older snapshot of a table, therefore I ran: deltaTable = DeltaTable.forPath(spark, 'dbfs:/<path>') display(deltaTable.history()) and noticed that every fs.write_table run triggers two operations: Write and CREATE OR REPLACE...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Direo Direo​: When you use the deltaTable.write() method to write a DataFrame into a Delta table, it actually triggers the Delta write operation internally. This operation performs two actions: it writes the new data to disk in the Delta format, and it at...
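For reference, the history and snapshot queries described in the question look roughly like this (the table path is a placeholder):

```python
from delta.tables import DeltaTable

path = "dbfs:/path/to/feature_table"  # placeholder

# Each fs.write_table run shows up here with its recorded operation(s).
delta_table = DeltaTable.forPath(spark, path)
delta_table.history().select("version", "timestamp", "operation").show(truncate=False)

# Query an older snapshot by version number (time travel).
old_df = spark.read.format("delta").option("versionAsOf", 3).load(path)
```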

harry546
by New Contributor III
  • 2059 Views
  • 6 replies
  • 3 kudos

Resolved! Security Analysis Tool (SAT) On Azure setup failed with error - [UNRESOLVED_COLUMN.WITHOUT_SUGGESTION] A column or function parameter with name `workspace_status` cannot be resolved.

Hi All, I was trying to set up the Security Analysis Tool (SAT) on an Azure Databricks cluster. I followed the setup steps mentioned here: https://github.com/databricks-industry-solutions/security-analysis-tool/blob/main/docs/setup.md I started to run "se...

Latest Reply
arun_pamulapati
New Contributor III
  • 3 kudos

For those who may be coming to this question later: thanks to @Arnold Souza​ and @Harish Koduru​, we not only updated our setup instructions at https://github.com/databricks-industry-solutions/security-analysis-tool/blob/main/docs/setup.md but we also c...

5 More Replies
Data_Analytics1
by Contributor III
  • 7126 Views
  • 8 replies
  • 2 kudos

TimeoutException: Futures timed out after [5 seconds]. I am getting this error while running a few parallel jobs at an interval of 5 minutes.

java.util.concurrent.TimeoutException: Futures timed out after [5 seconds] at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:259) at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:263) at scala.concurrent.Await$.$...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Mahesh Chahare​, Hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us s...

7 More Replies
tech2cloud
by New Contributor II
  • 1270 Views
  • 2 replies
  • 0 kudos

Databricks Autoloader streamReader does not include the partition column as part of output.

I have a folder structure at source such as /transaction/date_=2023-01-20/hr_=02/tras01.csv and /transaction/date_=2023-01-20/hr_=03/tras02.csv, where 'date_' and 'hr_' are my partitions and are present in the dataset as well. But the streamReader does not read th...
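Not from the thread, but Auto Loader has an option aimed at exactly this layout: cloudFiles.partitionColumns names the Hive-style directory keys to surface as output columns (the schema location is a placeholder):

```python
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "csv")
    # Surface the date_=... and hr_=... directory keys as columns.
    .option("cloudFiles.partitionColumns", "date_,hr_")
    .option("cloudFiles.schemaLocation", "dbfs:/tmp/autoloader_schema")
    .option("header", "true")
    .load("/transaction/")
)
```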

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Ravi Vishwakarma​, Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answ...

1 More Replies
zyang
by Contributor
  • 4803 Views
  • 2 replies
  • 1 kudos

Set owner when creating a view in Databricks SQL

Hi, I would like to set the owner when creating a view in Databricks SQL. Is it possible? I cannot find anything about it. Best

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @z yang​, Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your q...

1 More Replies
grazie
by Contributor
  • 1079 Views
  • 3 replies
  • 2 kudos

Do you need to be a workspace admin to create jobs?

We're using a setup where we use GitLab CI to deploy workflows using a service principal, via the Jobs API (2.1): https://docs.databricks.com/dev-tools/api/latest/jobs.html#operation/JobsCreate When we wanted to reduce the permissions of the CI to the minimu...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Geir Iversen​, Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers ...

2 More Replies
elgeo
by Valued Contributor II
  • 2629 Views
  • 2 replies
  • 0 kudos

Transform SQL Cursor using PySpark in Databricks

We have a cursor in DB2 which reads data from 2 tables in each loop. At the end of each loop, after inserting the data into a target table, we update the records related to that loop in these 2 tables before moving to the next loop. An indicative example i...
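The thread's replies don't show a rewrite, so here is a hedged sketch of the usual set-based translation: one join replaces the per-iteration reads, and a Delta MERGE replaces the per-loop UPDATE (all table and column names are invented):

```python
from delta.tables import DeltaTable

# One join replaces the row-by-row cursor reads from the two tables.
src = spark.table("table_a").join(spark.table("table_b"), "key")
src.write.mode("append").saveAsTable("target_table")

# One MERGE replaces the per-loop UPDATE marking rows as processed.
(
    DeltaTable.forName(spark, "table_a")
    .alias("t")
    .merge(src.alias("s"), "t.key = s.key")
    .whenMatchedUpdate(set={"processed_flag": "'Y'"})
    .execute()
)
```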

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @ELENI GEORGOUSI​, Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answe...

1 More Replies