Data Engineering

Forum Posts

190809
by Contributor
  • 459 Views
  • 1 reply
  • 0 kudos

Trying to figure out what is causing non-null values in my bronze tables to be returned as NULL in silver tables.

I have a process which loads data from JSON to a bronze table. It then adds a couple of columns and creates a silver table. But the silver table has NULL values where there were values in the bronze tables. Process as follows: def load_to_silver(sourc...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Rachel Cunningham​: One possible reason for this issue could be a data type mismatch between the bronze and silver tables. It is possible that the column in the bronze table has a non-null value, but the data type of that column is different from th...
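A minimal sketch of that check (the table names bronze.events and silver.events are made up, not from the thread): compare the two schemas column by column, then cast explicitly, so a type mismatch surfaces instead of silently becoming NULL.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()

bronze_df = spark.table("bronze.events")              # hypothetical names
silver_schema = spark.table("silver.events").schema

# Report any column whose type differs between bronze and silver.
for field in silver_schema:
    if field.name in bronze_df.columns:
        bronze_type = bronze_df.schema[field.name].dataType
        if bronze_type != field.dataType:
            print(f"{field.name}: bronze={bronze_type}, silver={field.dataType}")

# Cast explicitly rather than relying on an implicit cast; in Spark a failed
# implicit cast yields NULL, which matches the symptom described above.
aligned_df = bronze_df.select(
    [col(f.name).cast(f.dataType) for f in silver_schema if f.name in bronze_df.columns]
)
```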

Harsh_Paliwal
by New Contributor
  • 1191 Views
  • 1 reply
  • 0 kudos

java.lang.Exception: Unable to start python kernel for ReplId-79217-e05fc-0a4ce-2, kernel exited with exit code 1.

I am running a parameterized Autoloader notebook in a workflow. This notebook is being called 29 times in parallel, and Unity Catalog (UC) is also enabled. I am facing this error: java.lang.Exception: Unable to start python kernel for ReplId-79217-e05fc-0a4ce-2, ke...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Harsh Paliwal​: The error message suggests that there might be a conflict with the xtables lock. One thing you could try is to add the -w option, as suggested by the error message. You can add the following command to the beginning of your notebook t...
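For reference, the command the reply is alluding to presumably looks something like this; a sketch only, since the thread never confirms that the xtables lock is really what blocks the kernel:

```python
import subprocess

# -w tells iptables to wait for the xtables lock instead of failing when
# many parallel runs touch it at once; run this early in the notebook.
subprocess.run(["iptables", "-w", "-L"], check=True)
```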

Chris_Konsur
by New Contributor III
  • 1482 Views
  • 1 reply
  • 0 kudos

Unit test with Nutter

When I run the simple test in a notebook, it works fine, but when I run it from the Azure ADO pipeline, it fails with the error. Code: def __init__(self): NutterFixture.__init__(self) from runtime.nutterfixture import NutterFixture, tag class uTestsDa...
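The snippet in the preview is flattened by the page; untangled, a minimal Nutter fixture normally looks like this (the class and method names are reconstructed, so treat them as illustrative):

```python
from runtime.nutterfixture import NutterFixture, tag

class uTestsDataPipeline(NutterFixture):
    def __init__(self):
        NutterFixture.__init__(self)

    # Nutter discovers run_/assertion_/after_ methods by naming convention.
    def assertion_smoke(self):
        assert 1 + 1 == 2

result = uTestsDataPipeline().execute_tests()
print(result.to_string())
```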

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Chris Konsur​: The error message suggests that there is an issue with the standard output buffer when the Python interpreter is shutting down, which could be related to daemon threads. This error is not specific to Databricks or Azure ADO pipelines, ...

danniely
by New Contributor II
  • 2675 Views
  • 1 reply
  • 2 kudos

Pyspark RDD fails with pytest

When I call RDD APIs during pytest, it seems like the module "serializer.py" cannot find any other modules under pyspark. I've already looked this up on the internet, and it seems like pyspark modules are not properly importing other referenced modules. I see ot...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

@hyunho lee​: It sounds like you are encountering an issue with PySpark's serializer not being able to find the necessary modules during testing with pytest. One solution you could try is to set the PYTHONPATH environment variable to include the pat...
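A sketch of that suggestion, assuming SPARK_HOME points at a Spark install; putting it in a conftest.py makes pytest apply it before pyspark is imported (the py4j zip name varies by version):

```python
import glob
import os
import sys

spark_home = os.environ.get("SPARK_HOME", "")
if spark_home:
    # Make pyspark and its bundled py4j importable.
    sys.path.insert(0, os.path.join(spark_home, "python"))
    for zip_path in glob.glob(os.path.join(spark_home, "python", "lib", "py4j-*-src.zip")):
        sys.path.insert(0, zip_path)
    # Propagate the same paths to worker processes via PYTHONPATH.
    os.environ["PYTHONPATH"] = os.pathsep.join(sys.path)
```

With a pip-installed pyspark, the findspark package (findspark.init()) does the same path wiring in one call.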

quakenbush
by Contributor
  • 2266 Views
  • 1 reply
  • 0 kudos

Is there something like Oracle's VPD-Feature in Databricks?

Since I am porting some code from Oracle to Databricks, I have another specific question. In Oracle there's something called Virtual Private Database, VPD. It's a simple security feature used to generate a WHERE clause which the system will add to a u...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Roger Bieri​: In Databricks, you can use the UserDefinedFunction (UDF) feature to create a custom function that will be applied to a DataFrame. You can use this feature to add a WHERE clause to a DataFrame based on the user context. Here's an exampl...
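The reply's example is cut off; a common alternative to a UDF for VPD-style filtering in Databricks is a dynamic view keyed on current_user(), sketched here with made-up table and column names:

```python
# Every query against the view is filtered for the querying user,
# much like a VPD policy appending a WHERE clause.
spark.sql("""
    CREATE OR REPLACE VIEW sales_for_current_user AS
    SELECT *
    FROM sales
    WHERE owner_email = current_user()
""")
```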

Fed
by New Contributor III
  • 4310 Views
  • 1 reply
  • 0 kudos

Setting checkpoint directory for checkpointInterval argument of estimators in pyspark.ml

Tree-based estimators in pyspark.ml have an argument called checkpointInterval: checkpointInterval = Param(parent='undefined', name='checkpointInterval', doc='set checkpoint interval (>= 1) or disable checkpoint (-1). E.g. 10 means that the cache will ...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Federico Trifoglio​: If sc.getCheckpointDir() returns None, it means that no checkpoint directory is set in the SparkContext. In this case, the checkpointInterval argument will indeed be ignored. To set a checkpoint directory, you can use the SparkC...
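Completing that thought with a short sketch (the checkpoint path is arbitrary): the directory must be set on the SparkContext before fitting, otherwise checkpointInterval has no effect.

```python
from pyspark.ml.classification import GBTClassifier

# Set before .fit(); without it, checkpointInterval is silently ignored.
spark.sparkContext.setCheckpointDir("dbfs:/tmp/ml_checkpoints")

gbt = GBTClassifier(
    labelCol="label",
    featuresCol="features",
    checkpointInterval=10,  # checkpoint the cached node IDs every 10 iterations
)
```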

Phani1
by Valued Contributor
  • 2106 Views
  • 1 reply
  • 0 kudos

Best practices/steps for Hive metastore backup and restore

Hi Team, could you share with us the best practices/steps for Hive metastore backup and restore? Regards, Phanindra

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Janga Reddy​: Certainly! Here are the steps for Hive metastore backup and restore on Databricks. Backup: stop all running Hive services and jobs on the Databricks cluster; create a backup directory in DBFS (Databricks File System) where the metadata fi...
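If the metastore is an external MySQL database, the backup step usually reduces to a database dump. A hedged sketch; host, user, password, and database name are all placeholders:

```python
import subprocess

# --single-transaction takes a consistent dump without locking the metastore.
with open("/dbfs/tmp/metastore_backup.sql", "w") as out:
    subprocess.run(
        ["mysqldump", "--single-transaction",
         "-h", "metastore-host.example.com",
         "-u", "backup_user", "-pREDACTED",
         "metastore_db"],
        stdout=out,
        check=True,
    )
```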

maaaxx
by New Contributor III
  • 1078 Views
  • 4 replies
  • 0 kudos

dbutils conflicts with a custom spark extension

Hello dear community, we have installed a custom Spark extension to filter the files allowed to be read into the notebook. It all works fine when using the Spark functions. However, the files are not filtered properly if the user uses, e.g., dbutils.f...

Latest Reply
Vartika
Moderator
  • 0 kudos

Hi @Yuan Gao​,Checking in. If @tayyab vohra​'s answer helped, would you let us know and mark the answer as best? If not, would you be happy to give us more information? Thanks!

3 More Replies
Direo
by Contributor
  • 595 Views
  • 1 reply
  • 0 kudos

Operations applied when running fs.write_table to overwrite an existing feature table in the Hive metastore

Hi, there was a need to query an older snapshot of a table, therefore I ran: deltaTable = DeltaTable.forPath(spark, 'dbfs:/<path>') display(deltaTable.history()) and noticed that every fs.write_table run triggers two operations: Write and CREATE OR REPLACE...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Direo Direo​: When you use the deltaTable.write() method to write a DataFrame into a Delta table, it actually triggers the Delta write operation internally. This operation performs two actions: it writes the new data to disk in the Delta format, and it at...
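For reference, the history and snapshot queries described in the question look roughly like this (the table path is a placeholder):

```python
from delta.tables import DeltaTable

path = "dbfs:/path/to/feature_table"  # placeholder

# Each fs.write_table run shows up here with its recorded operation(s).
delta_table = DeltaTable.forPath(spark, path)
delta_table.history().select("version", "timestamp", "operation").show(truncate=False)

# Query an older snapshot by version number (time travel).
old_df = spark.read.format("delta").option("versionAsOf", 3).load(path)
```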

harry546
by New Contributor III
  • 2059 Views
  • 6 replies
  • 3 kudos

Resolved! Security Analysis Tool (SAT) On Azure setup failed with error - [UNRESOLVED_COLUMN.WITHOUT_SUGGESTION] A column or function parameter with name `workspace_status` cannot be resolved.

Hi All, I was trying to set up the Security Analysis Tool (SAT) on an Azure Databricks cluster. I followed the setup steps mentioned here: https://github.com/databricks-industry-solutions/security-analysis-tool/blob/main/docs/setup.md I started to run "se...

Latest Reply
arun_pamulapati
New Contributor III
  • 3 kudos

For those who may be coming to this question later: thanks to @Arnold Souza​ and @Harish Koduru​, we not only updated our setup instructions at https://github.com/databricks-industry-solutions/security-analysis-tool/blob/main/docs/setup.md but we also c...

5 More Replies
Data_Analytics1
by Contributor III
  • 7126 Views
  • 8 replies
  • 2 kudos

TimeoutException: Futures timed out after [5 seconds]. I am getting this error while running a few parallel jobs at an interval of 5 minutes.

java.util.concurrent.TimeoutException: Futures timed out after [5 seconds] at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:259) at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:263) at scala.concurrent.Await$.$...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Mahesh Chahare​, Hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us s...

7 More Replies
tech2cloud
by New Contributor II
  • 1270 Views
  • 2 replies
  • 0 kudos

Databricks Autoloader streamReader does not include the partition column as part of output.

I have a folder structure at source such as /transaction/date_=2023-01-20/hr_=02/tras01.csv and /transaction/date_=2023-01-20/hr_=03/tras02.csv, where 'date_' and 'hr_' are my partitions and are present in the dataset as well. But the streamReader does not read th...
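Not from the thread, but Auto Loader has an option aimed at exactly this layout: cloudFiles.partitionColumns names the Hive-style directory keys to surface as output columns (the schema location is a placeholder):

```python
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "csv")
    # Surface the date_=... and hr_=... directory keys as columns.
    .option("cloudFiles.partitionColumns", "date_,hr_")
    .option("cloudFiles.schemaLocation", "dbfs:/tmp/autoloader_schema")
    .option("header", "true")
    .load("/transaction/")
)
```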

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Ravi Vishwakarma​, Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answ...

1 More Replies
zyang
by Contributor
  • 4803 Views
  • 2 replies
  • 1 kudos

Set owner when creating a view in Databricks SQL

Hi, I would like to set the owner when creating a view in Databricks SQL. Is it possible? I cannot find anything about it. Best

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @z yang​, Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your q...

1 More Replies
grazie
by Contributor
  • 1079 Views
  • 3 replies
  • 2 kudos

Do you need to be a workspace admin to create jobs?

We're using a setup where we use GitLab CI to deploy workflows using a service principal, via the Jobs API (2.1): https://docs.databricks.com/dev-tools/api/latest/jobs.html#operation/JobsCreate When we wanted to reduce the permissions of the CI to the minimu...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Geir Iversen​, Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers ...

2 More Replies
elgeo
by Valued Contributor II
  • 2629 Views
  • 2 replies
  • 0 kudos

Transform SQL Cursor using PySpark in Databricks

We have a cursor in DB2 which reads data from 2 tables in each loop. At the end of each loop, after inserting the data into a target table, we update the records related to that loop in these 2 tables before moving to the next loop. An indicative example i...
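The thread's replies don't show a rewrite, so here is a hedged sketch of the usual set-based translation: one join replaces the per-iteration reads, and a Delta MERGE replaces the per-loop UPDATE (all table and column names are invented):

```python
from delta.tables import DeltaTable

# One join replaces the row-by-row cursor reads from the two tables.
src = spark.table("table_a").join(spark.table("table_b"), "key")
src.write.mode("append").saveAsTable("target_table")

# One MERGE replaces the per-loop UPDATE marking rows as processed.
(
    DeltaTable.forName(spark, "table_a")
    .alias("t")
    .merge(src.alias("s"), "t.key = s.key")
    .whenMatchedUpdate(set={"processed_flag": "'Y'"})
    .execute()
)
```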

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @ELENI GEORGOUSI​, Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answe...

1 More Replies