cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Forum Posts

Fed
by New Contributor III
  • 9793 Views
  • 1 replies
  • 0 kudos

Setting checkpoint directory for checkpointInterval argument of estimators in pyspark.ml

Tree-based estimators in pyspark.ml have an argument called checkpointIntervalcheckpointInterval = Param(parent='undefined', name='checkpointInterval', doc='set checkpoint interval (>= 1) or disable checkpoint (-1). E.g. 10 means that the cache will ...

  • 9793 Views
  • 1 replies
  • 0 kudos
Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Federico Trifoglio​ :If sc.getCheckpointDir() returns None, it means that no checkpoint directory is set in the SparkContext. In this case, the checkpointInterval argument will indeed be ignored. To set a checkpoint directory, you can use the SparkC...

  • 0 kudos
Phani1
by Databricks MVP
  • 5828 Views
  • 1 replies
  • 0 kudos

best practices/steps for hive meta store backup and restore.

Hi Team,Could you share with us the best practices/steps for hive meta store backup and restore?Regards,Phanindra

  • 5828 Views
  • 1 replies
  • 0 kudos
Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Janga Reddy​ :Certainly! Here are the steps for Hive metastore backup and restore on Databricks:Backup:Stop all running Hive services and jobs on the Databricks cluster.Create a backup directory in DBFS (Databricks File System) where the metadata fi...

  • 0 kudos
maaaxx
by New Contributor III
  • 3924 Views
  • 4 replies
  • 0 kudos

dbutils conflicts with a custom spark extension

Hello dear community,we have installed a custom spark extension to filter the files allowed to be read into the notebook. It was all good if we use the spark functions.However, the files are not filtered properly if the user would use e.g., dbutils.f...

  • 3924 Views
  • 4 replies
  • 0 kudos
Latest Reply
Vartika
Databricks Employee
  • 0 kudos

Hi @Yuan Gao​,Checking in. If @tayyab vohra​'s answer helped, would you let us know and mark the answer as best? If not, would you be happy to give us more information? Thanks!

  • 0 kudos
3 More Replies
Direo
by Contributor II
  • 2518 Views
  • 1 replies
  • 0 kudos

Operations applied when running fs.write_table to overwrite existing feature table in hive metastore

Hi,there was a need to query an older snapshot of a table. Therefore ran:deltaTable = DeltaTable.forPath(spark, 'dbfs:/<path>') display(deltaTable.history())and noticed that every fs.write_table run triggers two operations:Write and CREATE OR REPLACE...

image
  • 2518 Views
  • 1 replies
  • 0 kudos
Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Direo Direo​ :When you use deltaTable.write() method to write a DataFrame into a Delta table, it actually triggers the Delta write operation internally. This operation performs two actions:It writes the new data to disk in the Delta format, andIt at...

  • 0 kudos
harry546
by New Contributor III
  • 7810 Views
  • 6 replies
  • 3 kudos

Resolved! Security Analysis Tool (SAT) On Azure setup failed with error - [UNRESOLVED_COLUMN.WITHOUT_SUGGESTION] A column or function parameter with name `workspace_status` cannot be resolved.

Hi All,I was trying to setup Security Analysis Tool (SAT) on Azure Databricks cluster. I followed the setup steps motioned over here - https://github.com/databricks-industry-solutions/security-analysis-tool/blob/main/docs/setup.mdI started to run "se...

  • 7810 Views
  • 6 replies
  • 3 kudos
Latest Reply
arun_pamulapati
Databricks Employee
  • 3 kudos

For those who may be coming here to this questions, thanks to @Arnold Souza​ and @Harish Koduru​  We not only updated our setup instructions https://github.com/databricks-industry-solutions/security-analysis-tool/blob/main/docs/setup.md but we also c...

  • 3 kudos
5 More Replies
Data_Analytics1
by Contributor III
  • 23239 Views
  • 8 replies
  • 2 kudos

TimeoutException: Futures timed out after [5 seconds]. I am getting this error while running few parallel jobs at an interval of 5 minutes.

java.util.concurrent.TimeoutException: Futures timed out after [5 seconds] at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:259) at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:263) at scala.concurrent.Await$.$...

  • 23239 Views
  • 8 replies
  • 2 kudos
Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Mahesh Chahare​ Hope everything is going great.Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us s...

  • 2 kudos
7 More Replies
tech2cloud
by New Contributor II
  • 3889 Views
  • 2 replies
  • 0 kudos

Databricks Autoloader streamReader does not include the partition column as part of output.

I have folder structure at source such as/transaction/date_=2023-01-20/hr_=02/tras01.csv/transaction/date_=2023-01-20/hr_=03/tras02.csvWhere 'date_' and 'hr_' are my partitions and present in the dataset as well. But the streamReader does not read th...

image
  • 3889 Views
  • 2 replies
  • 0 kudos
Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Ravi Vishwakarma​ Thank you for posting your question in our community! We are happy to assist you.To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answ...

  • 0 kudos
1 More Replies
zyang
by Contributor II
  • 11485 Views
  • 2 replies
  • 1 kudos

Set owner when creating a view in databricks sql

Hi,I would like to set the owner when create a view in databricks sql.Is it possible? I cannot find anything about it.Best

  • 11485 Views
  • 2 replies
  • 1 kudos
Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @z yang​ Thank you for posting your question in our community! We are happy to assist you.To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your q...

  • 1 kudos
1 More Replies
elgeo
by Valued Contributor II
  • 6913 Views
  • 2 replies
  • 0 kudos

Trasform SQL Cursor using Pyspark in Databricks

We have a Cursor in DB2 which reads in each loop data from 2 tables. At the end of each loop, after inserting the data to a target table, we update records related to each loop in these 2 tables before moving to the next loop. An indicative example i...

  • 6913 Views
  • 2 replies
  • 0 kudos
Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @ELENI GEORGOUSI​ Thank you for posting your question in our community! We are happy to assist you.To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answe...

  • 0 kudos
1 More Replies
Rahul2025
by Databricks Partner
  • 6465 Views
  • 4 replies
  • 4 kudos

Make environment variables defined in init script available to Spark JVM job?

Hi,We're using Databricks Runtime version 11.3LTS and executing a Spark Java Job using a Job Cluster. To automate the execution of this job, we need to define (source in from bash config files) some environment variables through an init script (clust...

  • 6465 Views
  • 4 replies
  • 4 kudos
Latest Reply
Anonymous
Not applicable
  • 4 kudos

Hi @Rahul K​ Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you.Thanks!

  • 4 kudos
3 More Replies
bluesky
by New Contributor II
  • 4841 Views
  • 2 replies
  • 1 kudos

Identity error Spark Sql:not enough data columns;target has 3 but the inserted data has 2, it's the identity column which is missing here

While inserting into target table i am getting an error '"not enough data columns;target has 3 but the inserted data has 2" but it's the identity column which is the 8th column ".insert into table A(col 1,col 2,col3)select col2,col3from table Bjoin t...

  • 4841 Views
  • 2 replies
  • 1 kudos
Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @sky blue​ Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you.Thanks!

  • 1 kudos
1 More Replies
KaushikMaji
by New Contributor II
  • 4458 Views
  • 2 replies
  • 0 kudos

Error org.apache.spark.SparkSQLException: Unsupported type TIMESTAMP_WITH_TIMEZONE

Hi Guys,I am trying to load data from a custom JDBC data source like below. I am getting this error, in spite of specifying "customSchema" attribute on the timestamp column. Do you know how to resolve this?material_master = spark.read \.format("jdbc"...

  • 4458 Views
  • 2 replies
  • 0 kudos
Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Kaushik Maji​ Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you.Thank...

  • 0 kudos
1 More Replies
DBX-Beginer
by New Contributor
  • 6663 Views
  • 2 replies
  • 0 kudos

Display count of records in all tables in hive meta store based on one of the column value.

I have a DB name called Test in Hive meta store of data bricks. This DB contains around 100 tables. Each table has the column name called sourcesystem and many other columns. Now I need to display the count of records in each table group by source sy...

  • 6663 Views
  • 2 replies
  • 0 kudos
Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Krish K​ Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you.Thanks!

  • 0 kudos
1 More Replies
doalmeida
by New Contributor
  • 6241 Views
  • 2 replies
  • 0 kudos

Endpoint not found when creating automl experiment through API and UI

Hi everyone, I'm getting this exact error as shown bellow, when trying to create an automl experiment. This happens both through the UI and the API with my code or with databrick's example code. I've tried looking into this but had no luck finding an...

image
  • 6241 Views
  • 2 replies
  • 0 kudos
Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Diogo Almeida​ Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you.Than...

  • 0 kudos
1 More Replies
Dataengineer_mm
by New Contributor
  • 7062 Views
  • 2 replies
  • 0 kudos

Passing a date parameter through workflow

Hi , when we pass the parameter through workflows in DB, should we need to manually provide the parameter all the time? or any dynamic way of passing the parameter?

  • 7062 Views
  • 2 replies
  • 0 kudos
Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Menaka Murugesan​ Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you.T...

  • 0 kudos
1 More Replies
Labels