Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

j02424
by New Contributor
  • 4196 Views
  • 1 reply
  • 4 kudos

Best practice to delete /dbfs/tmp?

What is the best practice regarding the /dbfs/tmp folder? We have a very large amount of data in that folder and are not sure whether to delete it, back it up, etc.

Latest Reply
Debayan
Databricks Employee
  • 4 kudos

/dbfs/tmp can contain many files, including temporary system files used for intermediate calculations, as well as subdirectories holding packages from user-defined installations. It is always better to back up the files before deleting anything.
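
A minimal sketch of one way to inspect and back up the folder before cleaning it up, using dbutils in a notebook (the backup path is a placeholder):

# Inspect what is actually in /dbfs/tmp before touching anything.
for f in dbutils.fs.ls("dbfs:/tmp"):
    print(f.path, f.size)

# Copy everything to a backup location first (placeholder path) ...
dbutils.fs.cp("dbfs:/tmp", "dbfs:/mnt/backup/tmp-snapshot", recurse=True)

# ... and only then remove the originals. Be careful: running jobs may
# still be writing intermediate files here.
dbutils.fs.rm("dbfs:/tmp", recurse=True)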

Akshith_Rajesh
by New Contributor III
  • 6970 Views
  • 3 replies
  • 6 kudos

Unable to write Data frame to Azure Synapse Table

When I am trying to insert records into the Azure Synapse table using JDBC, it throws the below error: com.microsoft.sqlserver.jdbc.SQLServerException: The statement failed. Column 'COMPANY_ADDRESS_STATE' has a data type that cannot participate ...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 6 kudos

Columns that use any of the following data types cannot be included in a columnstore index: nvarchar(max), varchar(max), and varbinary(max) (applies to SQL Server 2016 and prior versions, and nonclustered columnstore indexes), so the issue is on the Azu...
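
One hedged workaround, assuming the write goes through the Azure Synapse connector (com.databricks.spark.sqldw): set maxStrLength so string columns map to a bounded NVARCHAR instead of nvarchar(max). All connection values below are placeholders:

(df.write
    .format("com.databricks.spark.sqldw")
    .option("url", "jdbc:sqlserver://<server>.sql.azuresynapse.net:1433;database=<db>")  # placeholder
    .option("tempDir", "abfss://<container>@<account>.dfs.core.windows.net/tmp")  # placeholder staging dir
    .option("forwardSparkAzureStorageCredentials", "true")
    .option("dbTable", "dbo.COMPANY_TABLE")  # placeholder
    .option("maxStrLength", "255")  # bounded NVARCHAR(255) instead of nvarchar(max)
    .mode("append")
    .save())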

2 More Replies
him
by New Contributor III
  • 1783 Views
  • 1 reply
  • 3 kudos
Latest Reply
Debayan
Databricks Employee
  • 3 kudos

You can refer to the example below: https://docs.databricks.com/dev-tools/api/latest/examples.html#upload-a-big-file-into-dbfs
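
For reference, a minimal sketch of the chunked-upload pattern that page describes, using the DBFS create/add-block/close endpoints (host, token, and paths are placeholders):

import base64
import requests

HOST = "https://<databricks-instance>"  # placeholder workspace URL
TOKEN = "<personal-access-token>"       # placeholder
HEADERS = {"Authorization": f"Bearer {TOKEN}"}

def dbfs(endpoint, payload):
    r = requests.post(f"{HOST}/api/2.0/dbfs/{endpoint}", headers=HEADERS, json=payload)
    r.raise_for_status()
    return r.json()

# Open a handle, stream the file in 1 MB base64-encoded blocks, then close.
handle = dbfs("create", {"path": "/tmp/bigfile.bin", "overwrite": True})["handle"]
with open("bigfile.bin", "rb") as f:
    while chunk := f.read(1 << 20):
        dbfs("add-block", {"handle": handle, "data": base64.b64encode(chunk).decode()})
dbfs("close", {"handle": handle})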

Bharath_1610
by New Contributor
  • 3147 Views
  • 2 replies
  • 1 kudos

Resolved! Check Existence of table

Hi Team, how do we check the existence of a table in an ADF container using a SQL query in Databricks? Thanks in advance.

Latest Reply
Noopur_Nigam
Databricks Employee
  • 1 kudos

Hi, please elaborate on the issue for us to help you resolve it.
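
In the meantime, a minimal sketch of checking for a table registered in the metastore (database and table names are placeholders); data sitting in an ADF/ADLS container would first need to be mounted or registered as a table:

# Catalog API (accepts a qualified name on recent runtimes).
if spark.catalog.tableExists("mydb.mytable"):
    print("table exists")

# Pure SQL alternative: an empty result means the table does not exist.
spark.sql("SHOW TABLES IN mydb LIKE 'mytable'").show()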

1 More Replies
Mr__E
by Contributor II
  • 1978 Views
  • 1 reply
  • 3 kudos

Sync prod WS DBs to dev WS DBs

We have a couple of sources we'd already set up to stream to prod using a 3p system. Is there a way to sync this directly to our dev workspace to build pipelines? E.g. directly connecting to a cluster in prod and pulling with a job cluster, dumping to S3 and u...

Latest Reply
Debayan
Databricks Employee
  • 3 kudos

DBFS can be used in many ways. Please refer below:
  • Allows you to interact with object storage using directory and file semantics instead of cloud-specific API commands.
  • Allows you to mount cloud object storage locations so that you can map storage cre...
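
For example, a hedged mount sketch (the bucket and mount point are placeholders; credential handling depends on your cloud setup):

# Mount shared object storage so both workspaces see the same files.
dbutils.fs.mount(
    source="s3a://my-shared-bucket",  # placeholder bucket
    mount_point="/mnt/shared",
)

# Files are then addressable with ordinary paths from either workspace.
display(dbutils.fs.ls("/mnt/shared"))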

parthibsg
by New Contributor II
  • 2101 Views
  • 1 reply
  • 2 kudos

When to use the DataFrames API over Spark SQL

Hello Experts, I am new to Databricks. Building data pipelines, I have both batch and streaming data. Should I use the DataFrames API to read CSV files, convert to Parquet format, and then do the transformation? Or write to a table using CSV and then use Spark SQL...

Latest Reply
Debayan
Databricks Employee
  • 2 kudos

Hi Rathinam, it would be better to understand the pipeline more in this situation. Writing to a table from CSV and then using Spark SQL will be faster in a few cases than the other approach.
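
Both routes compile to the same Spark plans, so the choice is mostly ergonomic; a small sketch of each (paths and names are placeholders):

# Route 1: DataFrame API end to end.
df = spark.read.option("header", True).csv("/mnt/raw/input.csv")  # placeholder path
df.filter("amount > 0").write.format("parquet").save("/mnt/curated/out")

# Route 2: register the data as a view and transform with Spark SQL.
df.createOrReplaceTempView("raw_input")
(spark.sql("SELECT * FROM raw_input WHERE amount > 0")
    .write.format("parquet").save("/mnt/curated/out_sql"))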

lokeshr
by New Contributor
  • 2046 Views
  • 2 replies
  • 1 kudos

Clarity on usage STREAM while defining DLT tables

Hi, I am currently trying to learn Databricks and going through tutorials and learning materials. I came across this link: https://databricks.com/discover/pages/getting-started-with-delta-live-tables. While I get most of what is described in the page, I fin...

Latest Reply
jose_gonzalez
Databricks Employee
  • 1 kudos

Hi @Lokesh Raju, just a friendly follow-up. Did Tomasz's response help you to resolve your question? If it did, please mark it as best.

1 More Replies
yatharthmahesh
by New Contributor III
  • 4776 Views
  • 3 replies
  • 6 kudos

ENABLE CHANGE DATA FEED FOR EXISTING DELTA-TABLE

I have a Delta table already created, and now I want to enable the change data feed. I read that I have to set the delta.enableChangeDataFeed property to true. However, this cannot be done using the Scala API. I tried using this but it didn't work. I am ...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 6 kudos

'delta.enableChangeDataFeed' has to be without quotes: spark.sql("ALTER TABLE delta_training.onaudience_dpm SET TBLPROPERTIES (delta.enableChangeDataFeed = true)").show()
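
Once the property is set, the change feed can be read back, for example (a sketch assuming a runtime with CDF support; the starting version is a placeholder):

changes = (spark.read.format("delta")
    .option("readChangeFeed", "true")
    .option("startingVersion", 0)  # placeholder: only versions after enabling CDF are available
    .table("delta_training.onaudience_dpm"))
changes.show()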

2 More Replies
KumarShiv
by New Contributor III
  • 3135 Views
  • 2 replies
  • 2 kudos

Resolved! Databricks Spark SQL function "PERCENTILE_DISC()" output not accurate.

I am trying to get the percentile values on different splits, but I found that the result of the Databricks PERCENTILE_DISC() function is not accurate. I have run the same query on MS SQL but am getting a different result set. Here are both result sets for Pyspark ...

Latest Reply
artsheiko
Databricks Employee
  • 2 kudos

The reason might be that in SQL Server, PERCENTILE_DISC is nondeterministic.
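
For comparison, the Spark SQL form (available on recent runtimes; table and column names are placeholders) picks the smallest actual value whose cumulative distribution reaches the requested fraction, so ties can legitimately come out differently than in SQL Server:

spark.sql("""
    SELECT split_col,
           percentile_disc(0.5) WITHIN GROUP (ORDER BY value_col) AS median_value
    FROM my_table
    GROUP BY split_col
""").show()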

1 More Replies
Trung
by Contributor
  • 4628 Views
  • 5 replies
  • 5 kudos

Job fails due to Access Denied

Please help me solve the problem that my Databricks account cannot start the job, whether triggered manually or on a schedule, although I can run the script without error.

Latest Reply
Vivian_Wilfred
Databricks Employee
  • 5 kudos

Hi @trung nguyen, please check if you have the necessary instance profile attached to the job cluster. You are definitely missing something related to IAM.

4 More Replies
Anonymous
by Not applicable
  • 2210 Views
  • 4 replies
  • 4 kudos

Invalid shard address

I'm running PySpark through databricks-connect and getting an error saying: ERROR SparkClientManager: Fail to get the SparkClient java.util.concurrent.ExecutionException: com.databricks.service.SparkServiceConnectionException: Invalid shard address: ...

Latest Reply
Prabakar
Databricks Employee
  • 4 kudos

Hi @Marco Wong, was this working before and failing now? Are you behind a VPN or firewall? If so, can you check by disabling it? Enable traces in Wireshark and collect a dump to check whether there is traffic going to the workspace. Check if you can get curl wor...
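
A quick hedged reachability check from the same machine (workspace URL and token are placeholders; the URL should match the shard address in your databricks-connect config):

import requests

HOST = "https://<workspace-instance>"  # placeholder shard address
TOKEN = "<personal-access-token>"      # placeholder

# A 200 response means the address resolves and the token is accepted.
r = requests.get(f"{HOST}/api/2.0/clusters/list",
                 headers={"Authorization": f"Bearer {TOKEN}"},
                 timeout=10)
print(r.status_code)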

3 More Replies
krsimons
by New Contributor
  • 1719 Views
  • 3 replies
  • 0 kudos

How do I automate my Databricks script?

How do I automate my Databricks script?

Latest Reply
Vartika
Databricks Employee
  • 0 kudos

Hey there @Kayla Simons, hope everything is going great. Just wanted to check in to see if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell...

2 More Replies
fshimamoto
by New Contributor III
  • 3860 Views
  • 3 replies
  • 2 kudos

What are the best practices for schema drift using Delta Live Tables, in a scenario where the main source is a NoSQL database and we have a lot of ch...

What are the best practices for schema drift using Delta Live Tables, in a scenario where the main source is a NoSQL database and we have a lot of changes in the schema?

Latest Reply
Vartika
Databricks Employee
  • 2 kudos

Hey there @Fernando Martin, hope all is well! Just wanted to check in to see if you were able to resolve your issue. Would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from...

2 More Replies
RajeshRK
by Contributor II
  • 8397 Views
  • 10 replies
  • 4 kudos

Databricks job fails while creating table.

Hi Team, the Databricks job fails with the below error while creating an EXTERNAL table: com.simba.spark.jdbc41.internal.apache.http.wire - Error running query: MetaException(message:Got exception: org.apache.hadoop.fs.azure.AzureException com.microsoft.a...

Latest Reply
Vartika
Databricks Employee
  • 4 kudos

Hey there @Rajesh Kannan R, hope all is well! Just wanted to check in to see if you were able to resolve your issue. Would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from...

9 More Replies
159312
by New Contributor III
  • 3680 Views
  • 1 reply
  • 0 kudos

How to set pipelines.incompatibleViewCheck.enabled = false

I tried to load a static table as a source to a streaming DLT pipeline. I understand this is not optimal, but it provides the best path toward eventually having a fully streaming pipeline. When I do, I get the following error: pyspark.sql.utils.Analysis...

Latest Reply
kfoster
Contributor
  • 0 kudos

When you declare a table or view, you can pass something like this: @dlt.table(spark_conf={"pipelines.incompatibleViewCheck.enabled": "false"})
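
Spelled out a little more, a hedged sketch of a DLT table that batch-reads a static source with the check disabled (table and source names are placeholders):

import dlt

@dlt.table(
    name="static_source",  # placeholder
    spark_conf={"pipelines.incompatibleViewCheck.enabled": "false"},
)
def static_source():
    # Batch-read the static table that feeds the streaming pipeline.
    return spark.read.table("mydb.static_lookup")  # placeholder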

