Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

brickster_2018
by Databricks Employee
  • 1397 Views
  • 1 replies
  • 0 kudos

Resolved! When should I run the FSCK REPAIR command on my Delta table

Is it a good practice to run the FSCK REPAIR command on a regular basis? I have Optimize and VACUUM commands scheduled to run every day. 

Latest Reply
brickster_2018
Databricks Employee
  • 0 kudos

Unlike OPTIMIZE and VACUUM, FSCK REPAIR is not an operational command that has to be executed on a regular basis. FSCK REPAIR is useful for repairing the Delta metadata and removing references to files that are no longer accessible from the metadata...
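
For reference, a minimal sketch of how an ad hoc run might look; the table name is a placeholder, not taken from this thread:

# Illustrative only: FSCK REPAIR is run ad hoc (e.g. after files were removed
# outside of Delta), not on a schedule. "my_db.events" is a placeholder name.
spark.sql("FSCK REPAIR TABLE my_db.events DRY RUN")  # preview which missing file references would be removed
spark.sql("FSCK REPAIR TABLE my_db.events")          # remove the stale file references from the Delta metadata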

User16826994223
by Honored Contributor III
  • 1300 Views
  • 1 replies
  • 0 kudos

Resolved! How to update table using merge from value rather than from a table

My question is how we can do an upsert directly, that is, without using a source table. I would like to provide the values myself directly. Is there a simple way to do that for Delta tables?

Latest Reply
User16826994223
Honored Contributor III
  • 0 kudos

A source table can be a subquery, so the following should give you what you're after:

MERGE INTO events
  USING (VALUES(...))  -- round brackets are required to denote a subquery
  ON false             -- an artificial merge condition
  WHEN NOT MATCHED ...
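
A fuller sketch of that pattern, assuming a hypothetical events table with eventId and eventType columns, and using a real match condition for an upsert rather than the insert-only ON false shown above:

# Hypothetical sketch: upsert literal values into a Delta table without a
# separate source table. Table and column names are assumptions.
spark.sql("""
  MERGE INTO events AS t
  USING (SELECT * FROM VALUES (1, 'click'), (2, 'view') AS v(eventId, eventType)) AS s
  ON t.eventId = s.eventId
  WHEN MATCHED THEN UPDATE SET t.eventType = s.eventType
  WHEN NOT MATCHED THEN INSERT (eventId, eventType) VALUES (s.eventId, s.eventType)
""")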

brickster_2018
by Databricks Employee
  • 1436 Views
  • 1 replies
  • 0 kudos

Resolved! Unable to drop a table

I have a table and I no longer have access to the underlying data. We do not need this dataset anymore, but I am unable to drop the table.

Latest Reply
brickster_2018
Databricks Employee
  • 0 kudos

Use the below code snippet to forcefully drop the table:

package org.apache.spark.sql.hive {
  import org.apache.spark.sql.hive.HiveUtils
  import org.apache.spark.SparkContext

  object utils {
    def dropTable(sc: SparkContext, dbName: String, tableName...

User16826994223
by Honored Contributor III
  • 1081 Views
  • 1 replies
  • 0 kudos

Delta Table to Spark Streaming to Synapse Table in Azure Databricks

Is there a way to keep my Synapse database always in sync with the latest data from a Delta table? I believe my Synapse database doesn't support streaming as a sink. Is there a workaround?

Latest Reply
User16826994223
Honored Contributor III
  • 0 kudos

You could try to keep the data in sync by appending the new data DataFrame in a foreachBatch on your write stream. This method allows for arbitrary ways to write data; you can connect to the data warehouse with JDBC if necessary: with your batch functi...
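
A rough sketch of that approach in PySpark, with placeholder connection details, paths, and table names (the generic JDBC writer is used here; a dedicated Synapse connector would work similarly):

# Hypothetical sketch: write each streaming micro-batch from a Delta source
# to the warehouse via JDBC inside foreachBatch. All names and URLs are placeholders.
def write_batch_to_synapse(batch_df, batch_id):
    (batch_df.write
        .format("jdbc")
        .option("url", "jdbc:sqlserver://<server>.database.windows.net:1433;database=<db>")
        .option("dbtable", "dbo.events")
        .option("user", "<user>")
        .option("password", "<password>")
        .mode("append")
        .save())

(spark.readStream
    .format("delta")
    .load("/mnt/delta/events")
    .writeStream
    .foreachBatch(write_batch_to_synapse)
    .option("checkpointLocation", "/mnt/checkpoints/synapse_sync")
    .start())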

brickster_2018
by Databricks Employee
  • 1401 Views
  • 2 replies
  • 0 kudos

Resolved! Unable to run any commands on the cluster.

All the commands get cancelled; even 1+1 is failing. The cluster is completely unusable.

Latest Reply
brickster_2018
Databricks Employee
  • 0 kudos

More details on similar issues here: https://kb.databricks.com/python/python-command-cancelled.html

1 More Replies
User16826994223
by Honored Contributor III
  • 882 Views
  • 1 replies
  • 0 kudos
Latest Reply
User16826994223
Honored Contributor III
  • 0 kudos

1. Analysis

The first phase of Spark SQL optimization is analysis. Spark SQL starts with a relation to be processed, which can arrive in two ways: either from an AST (abstract syntax tree) returned by the SQL parser, or, on the other hand, fro...
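
To see these phases for yourself, a small illustrative example (any DataFrame will do):

# Illustrative only: print the parsed, analyzed, and optimized logical plans
# plus the physical plan that Catalyst produces for a trivial query.
df = spark.range(10).filter("id > 5").selectExpr("id * 2 AS doubled")
df.explain(extended=True)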

User16826994223
by Honored Contributor III
  • 1647 Views
  • 1 replies
  • 0 kudos
Latest Reply
User16826994223
Honored Contributor III
  • 0 kudos

Set the parameter 'spark.cleaner.ttl', or divide the long-running jobs into different batches and write the intermediary results to disk.
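
A minimal sketch of the second suggestion (splitting work into batches and materializing intermediary results); the table name and paths are placeholders:

# Hypothetical sketch: persist an intermediary result to storage so the next
# batch can start from it instead of carrying the full lineage forward.
stage1 = spark.table("raw_events").where("event_date >= '2021-01-01'")
stage1.write.mode("overwrite").parquet("/tmp/intermediate/stage1")

stage2 = spark.read.parquet("/tmp/intermediate/stage1").groupBy("event_type").count()
stage2.write.mode("overwrite").parquet("/tmp/intermediate/stage2")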

User16826994223
by Honored Contributor III
  • 451 Views
  • 0 replies
  • 0 kudos

Avro file (June 11, 2021)

Apache Avro is a data serialization system. Avro provides: rich data structures; a compact, fast, binary data format; a container file, to store persistent data; remote procedure call (RPC); and simple integration with dynamic languages....
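
As a small illustrative example, reading and writing Avro with Spark on Databricks, where the Avro data source ships with the runtime (the paths are placeholders):

# Illustrative only: Avro read/write with the built-in "avro" data source.
df = spark.read.format("avro").load("/mnt/raw/users.avro")
(df.where("age >= 18")
   .write.format("avro")
   .mode("overwrite")
   .save("/mnt/curated/adult_users_avro"))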

User16765131552
by Contributor III
  • 2339 Views
  • 1 replies
  • 0 kudos

Resolved! Connect to Microstrategy

Can Azure Databricks be connected to Microstrategy?

Latest Reply
User16765131552
Contributor III
  • 0 kudos

Found this: Azure Databricks to Microstrategy JDBC/ODBC Setup Tips

Purpose
This is a quick reference for common Microstrategy configuration tips, tricks, and common pitfalls when setting up a connection to Databricks.

Networking
For Azure, we recommend...

User16826994223
by Honored Contributor III
  • 1605 Views
  • 1 replies
  • 1 kudos

File path Not recognisable for notebook jobs in DBFS

We are working in IDEs, and once the code is developed we put the .py file in DBFS. I am using that DBFS path (dbfs:/artifacts/kg/bootstrap.py) to create a job, but I get a "notebook not found" error. What could be the issue?

Latest Reply
User16826994223
Honored Contributor III
  • 1 kudos

The actual notebooks that you create are not stored in the data plane; they are stored in the control plane. You can import notebooks through Import in the Databricks UI or by using the API. A notebook placed in DBFS cannot be used to create a job.
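
For illustration, one way to push a local .py file into the workspace as a notebook through the Workspace API, so a notebook job can reference it; the host, token, and target path are placeholders, not values from this thread:

# Hypothetical sketch: import bootstrap.py as a workspace notebook.
# Host, token, and paths are placeholders.
import base64
import requests

host = "https://<workspace-url>"
token = "<personal-access-token>"

with open("bootstrap.py", "rb") as f:
    content = base64.b64encode(f.read()).decode("utf-8")

resp = requests.post(
    f"{host}/api/2.0/workspace/import",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "path": "/Users/<user>/bootstrap",
        "format": "SOURCE",
        "language": "PYTHON",
        "overwrite": True,
        "content": content,
    },
)
resp.raise_for_status()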

User16826994223
by Honored Contributor III
  • 1125 Views
  • 1 replies
  • 1 kudos

How do I see all the DataFrame columns if I have more than 1000 columns in the DataFrame

I tried printSchema() on a DataFrame in Databricks. The DataFrame has more than 1500 columns, and apparently the printSchema function is truncating results and displaying only 1000 items. How can I see all the columns?

Latest Reply
User16826994223
Honored Contributor III
  • 1 kudos

Databricks also shows the schema of the DataFrame when it's created: click on the icon next to the name of the variable that holds the DataFrame. If the output exceeds that limit, I would suggest writing the schema out to a file.
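
A small sketch of that idea (the output path is a placeholder):

# Illustrative only: list every field yourself when printSchema() truncates,
# and write the full schema to a file for inspection.
lines = [f"{f.name}: {f.dataType.simpleString()}" for f in df.schema.fields]
print(len(lines))  # should match the full column count, e.g. 1500+
dbutils.fs.put("/tmp/full_schema.txt", "\n".join(lines), True)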

Srikanth_Gupta_
by Valued Contributor
  • 1201 Views
  • 1 replies
  • 0 kudos
Latest Reply
Srikanth_Gupta_
Valued Contributor
  • 0 kudos

Yes, we can, using the below code snippet:

spark
  .readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "host1:port1,host2:port2")
  .option("subscribePattern", "topic.*")
  .load()

User16826994223
by Honored Contributor III
  • 578 Views
  • 0 replies
  • 0 kudos

VM bootstrap and authentication

When a VM boots up, it automatically authenticates with the Databricks control plane using Managed Identity (MI), a per-VM credential signed by Azure AD. Once authenticated, the VM fetches secrets from the control plane, in...

