Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Bharath_1610
by New Contributor
  • 3147 Views
  • 2 replies
  • 1 kudos

Resolved! Check Existence of table

Hi Team, how do we check the existence of a table in an ADF container using a SQL query in Databricks? Thanks in advance.

Latest Reply
Noopur_Nigam
Databricks Employee
  • 1 kudos

Hi, please elaborate on the issue so we can help you resolve it.
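
In the meantime, a minimal sketch of a table-existence check, assuming the table is registered in the metastore (`default.my_table` is a placeholder, not from the thread):

```python
# In Databricks notebooks, `spark` is predefined.
exists = spark.catalog.tableExists("default.my_table")  # PySpark 3.3+

# Pure-SQL alternative: a non-empty result means the table exists.
exists_sql = spark.sql("SHOW TABLES IN default LIKE 'my_table'").count() > 0
```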

1 More Replies
Mr__E
by Contributor II
  • 1978 Views
  • 1 reply
  • 3 kudos

Sync prod WS DBs to dev WS DBs

We have a couple of sources we'd already set up to stream to prod using a third-party system. Is there a way to sync this directly to our dev workspace to build pipelines? E.g., directly connecting to a cluster in prod and pulling with a job cluster, dumping to S3 and u...

Latest Reply
Debayan
Databricks Employee
  • 3 kudos

DBFS can be used in many ways. Please refer below:
  • Allows you to interact with object storage using directory and file semantics instead of cloud-specific API commands.
  • Allows you to mount cloud object storage locations so that you can map storage cre...
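
For the cross-workspace sync itself, a hedged sketch of the mount approach (the bucket name and mount point are placeholders, and auth configuration is omitted since it depends on the workspace setup):

```python
# A sketch only: mount the prod landing bucket into the dev workspace so
# dev pipelines can read the same data. All names below are hypothetical.
dbutils.fs.mount(
    source="s3a://prod-stream-landing",  # hypothetical prod bucket
    mount_point="/mnt/prod-data",
)

df = spark.read.format("delta").load("/mnt/prod-data/some_table")
```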

parthibsg
by New Contributor II
  • 2100 Views
  • 1 reply
  • 2 kudos

When to use the DataFrames API over Spark SQL

Hello Experts, I am new to Databricks. Building data pipelines, I have both batch and streaming data. Should I use the DataFrames API to read CSV files, then convert to Parquet format, then do the transformation? Or write to a table using CSV, then use Spark SQL...

Latest Reply
Debayan
Databricks Employee
  • 2 kudos

Hi Rathinam, it would be better to understand the pipeline more in this situation. Writing to a table using CSV and then using Spark SQL will be faster in a few cases than the other approach.
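
For reference, a minimal sketch of the DataFrame-API route (paths and options are placeholders). Both routes compile to the same Catalyst plans for equivalent logic, so readability and pipeline structure usually decide:

```python
# A sketch: CSV in, Parquet out, transformations in between.
df = (spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("/mnt/raw/events/"))  # placeholder path

cleaned = df.dropDuplicates().filter("event_ts IS NOT NULL")

cleaned.write.mode("overwrite").parquet("/mnt/curated/events/")
```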

lokeshr
by New Contributor
  • 2046 Views
  • 2 replies
  • 1 kudos

Clarity on using STREAM while defining DLT tables

Hi, I am currently trying to learn Databricks and going through tutorials and learning materials. I came across this link: https://databricks.com/discover/pages/getting-started-with-delta-live-tables. While I get most of what is described in the page, I fin...

Latest Reply
jose_gonzalez
Databricks Employee
  • 1 kudos

Hi @Lokesh Raju, just a friendly follow-up. Did Tomasz's response help you resolve your question? If it did, please mark it as best.
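
For readers landing here, a hedged sketch of what SQL's STREAM() maps to in the Python DLT API (table names are placeholders, not from the linked tutorial):

```python
import dlt

# SQL's SELECT ... FROM STREAM(LIVE.bronze_events) reads the source
# incrementally; dlt.read_stream is the Python analogue.
@dlt.table
def silver_events():
    return dlt.read_stream("bronze_events")
```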

1 More Replies
yatharthmahesh
by New Contributor III
  • 4776 Views
  • 3 replies
  • 6 kudos

ENABLE CHANGE DATA FEED FOR EXISTING DELTA-TABLE

I have a Delta table already created; now I want to enable the change data feed. I read that I have to set the delta.enableChangeDataFeed property to true. However, this cannot be done using the Scala API. I tried using this but it didn't work. I am ...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 6 kudos

'delta.enableChangeDataFeed' has to be without quotes: spark.sql("ALTER TABLE delta_training.onaudience_dpm SET TBLPROPERTIES (delta.enableChangeDataFeed = true)").show()
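
Once the property is set, the feed can be read back; a sketch (starting from version 0 is an assumption):

```python
# Read the change feed of the table, including the CDF metadata columns.
changes = (spark.read.format("delta")
           .option("readChangeFeed", "true")
           .option("startingVersion", 0)
           .table("delta_training.onaudience_dpm"))

changes.select("_change_type", "_commit_version").show()
```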

2 More Replies
KumarShiv
by New Contributor III
  • 3135 Views
  • 2 replies
  • 2 kudos

Resolved! Databricks Spark SQL function "PERCENTILE_DISC()" output not accurate.

I am trying to get the percentile values on different splits, but I found that the result of the Databricks PERCENTILE_DISC() function is not accurate. I have run the same query on MS SQL but am getting a different result set. Here are both result sets for PySpark ...

Latest Reply
artsheiko
Databricks Employee
  • 2 kudos

The reason might be that in MS SQL, PERCENTILE_DISC is nondeterministic.
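
Worth noting as well: percentile_disc returns an actual value from the column while percentile_cont interpolates, so the engines can legitimately differ at split boundaries. A sketch for comparing both on Databricks (table and column names are placeholders; the WITHIN GROUP syntax needs a recent runtime, roughly Spark 3.3+/DBR 11+):

```python
# Compare discrete vs. continuous percentiles in Spark SQL.
spark.sql("""
    SELECT split_id,
           percentile_disc(0.5) WITHIN GROUP (ORDER BY metric) AS p50_disc,
           percentile_cont(0.5) WITHIN GROUP (ORDER BY metric) AS p50_cont
    FROM measurements
    GROUP BY split_id
""").show()
```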

1 More Replies
Trung
by Contributor
  • 4621 Views
  • 5 replies
  • 5 kudos

Job fails due to Access Denied

Please help me solve the problem that my Databricks account cannot start the job by triggering it manually or on a schedule, although I can run the script without error.

Latest Reply
Vivian_Wilfred
Databricks Employee
  • 5 kudos

Hi @trung nguyen, please check if you have the necessary instance profile attached to the job cluster. You are definitely missing something related to IAM.
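
A sketch of where that setting lives in a Jobs API cluster spec (the ARN, runtime, and node type are placeholders):

```python
# Hypothetical new_cluster block for a job: the instance profile gives the
# job cluster the IAM permissions the interactive cluster already had.
new_cluster = {
    "spark_version": "11.3.x-scala2.12",
    "node_type_id": "i3.xlarge",
    "num_workers": 2,
    "aws_attributes": {
        "instance_profile_arn":
            "arn:aws:iam::123456789012:instance-profile/my-profile",
    },
}
```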

4 More Replies
Anonymous
by Not applicable
  • 2208 Views
  • 4 replies
  • 4 kudos

Invalid shard address

I'm running PySpark through databricks-connect and getting an error saying ```ERROR SparkClientManager: Fail to get the SparkClient java.util.concurrent.ExecutionException: com.databricks.service.SparkServiceConnectionException: Invalid shard address:`...

Latest Reply
Prabakar
Databricks Employee
  • 4 kudos

Hi @Marco Wong, was this working before and failing now? Are you behind a VPN or firewall? If so, can you check by disabling it? Enable traces in Wireshark and collect a dump to check if there is traffic going to the workspace. Check if you can get curl wor...
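
The shard address is the workspace URL in the databricks-connect settings; a sketch for inspecting it (the file location applies to legacy databricks-connect):

```python
import json
import pathlib

# Legacy databricks-connect stores its settings in ~/.databricks-connect.
cfg = json.loads((pathlib.Path.home() / ".databricks-connect").read_text())

# host should be the full workspace URL, e.g. https://<workspace>.cloud.databricks.com
print(cfg.get("host"))
```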

3 More Replies
krsimons
by New Contributor
  • 1719 Views
  • 3 replies
  • 0 kudos

How do I automate my Databricks script?

How do I automate my Databricks script?

Latest Reply
Vartika
Databricks Employee
  • 0 kudos

Hey there @Kayla Simons, hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell...
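
For anyone with the same question, the usual route is a scheduled job; a sketch against the Jobs API 2.1 (host, token, notebook path, cluster spec, and cron expression are all placeholders):

```python
import requests

# A sketch: create a job that runs a notebook nightly at 02:00 UTC.
resp = requests.post(
    "https://<workspace-host>/api/2.1/jobs/create",
    headers={"Authorization": "Bearer <personal-access-token>"},
    json={
        "name": "nightly-script",
        "schedule": {"quartz_cron_expression": "0 0 2 * * ?",
                     "timezone_id": "UTC"},
        "tasks": [{
            "task_key": "main",
            "notebook_task": {"notebook_path": "/Users/me/my_script"},
            "new_cluster": {"spark_version": "11.3.x-scala2.12",
                            "node_type_id": "i3.xlarge",
                            "num_workers": 1},
        }],
    },
)
resp.raise_for_status()
print(resp.json())  # contains the new job_id
```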

2 More Replies
fshimamoto
by New Contributor III
  • 3858 Views
  • 3 replies
  • 2 kudos

What are the best practices for schema drift using Delta Live Tables, in a scenario where the main source is a NoSQL database and we have a lot of ch...

What are the best practices for schema drift using Delta Live Tables, in a scenario where the main source is a NoSQL database and we have a lot of changes in the schema?

Latest Reply
Vartika
Databricks Employee
  • 2 kudos

Hey there @Fernando Martin, hope all is well! Just wanted to check in if you were able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from...
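
One commonly cited pattern (a sketch, not an official best practice): land the raw documents with Auto Loader and let it track and evolve the schema. Paths and options below are placeholders:

```python
import dlt

@dlt.table
def bronze_documents():
    # addNewColumns: fields newly appearing in the source documents are
    # added to the schema (the stream restarts to pick them up).
    return (spark.readStream
            .format("cloudFiles")
            .option("cloudFiles.format", "json")
            .option("cloudFiles.schemaLocation", "/mnt/schemas/bronze_documents")
            .option("cloudFiles.schemaEvolutionMode", "addNewColumns")
            .load("/mnt/landing/nosql-export/"))
```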

2 More Replies
RajeshRK
by Contributor II
  • 8394 Views
  • 10 replies
  • 4 kudos

Databricks job fails while creating table.

Hi Team, the Databricks job fails with the below error while creating an EXTERNAL table: com.simba.spark.jdbc41.internal.apache.http.wire - Error running query: MetaException(message:Got exception: org.apache.hadoop.fs.azure.AzureException com.microsoft.a...

Latest Reply
Vartika
Databricks Employee
  • 4 kudos

Hey there @Rajesh Kannan R, hope all is well! Just wanted to check in if you were able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from...
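
Since the thread did not record a fix: an AzureException on table creation usually points at storage authentication, so one hedged sketch is to confirm the cluster can reach the storage account before the CREATE TABLE (account, container, scope, and key names are placeholders; the exact config depends on Blob vs. ADLS Gen2 and the auth mode):

```python
# A sketch only: wire up an account key from a secret scope, then create
# the external table against that storage location.
spark.conf.set(
    "fs.azure.account.key.<storage-account>.blob.core.windows.net",
    dbutils.secrets.get(scope="my-scope", key="storage-account-key"),
)

spark.sql("""
    CREATE TABLE IF NOT EXISTS my_db.my_table (id INT)
    USING PARQUET
    LOCATION 'wasbs://<container>@<storage-account>.blob.core.windows.net/path'
""")
```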

9 More Replies
159312
by New Contributor III
  • 3680 Views
  • 1 reply
  • 0 kudos

How to set pipelines.incompatibleViewCheck.enabled = false

I tried to load a static table as a source to a streaming DLT pipeline. I understand this is not optimal, but it provides the best path toward eventually having a full streaming pipeline. When I do, I get the following error: pyspark.sql.utils.Analysis...

Latest Reply
kfoster
Contributor
  • 0 kudos

When you declare a table or view, you can pass something like this: @dlt.table(spark_conf={"pipelines.incompatibleViewCheck.enabled": "false"})
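
Spelled out a little more (a sketch; the table name and source are placeholders):

```python
import dlt

@dlt.table(
    spark_conf={"pipelines.incompatibleViewCheck.enabled": "false"}
)
def from_static_source():
    # Reading a static table into a streaming pipeline is exactly what the
    # check guards against; disabling it is a stopgap, as the question notes.
    return dlt.read_stream("static_source")
```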

PrebenOlsen
by New Contributor III
  • 2909 Views
  • 1 reply
  • 1 kudos

Resolved! Why does @dlt.table from a table give different results than from a view?

I have some data in silver that I read in as a view using the __apply_changes function. I create a table based on this, and I then want to create my gold table after doing a .groupBy() and .pivot(). The transformations I do in the gold table aren...

Latest Reply
PrebenOlsen
New Contributor III
  • 1 kudos

I have found a temporary solution to this. The .pivot("columnName") call should automatically grab all the values it can find, but for some reason it does not. I need to specify the values, using .pivot("group_name", ["group0", "group1", "group2", ...]) ...
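
For reference, a sketch of the explicit-values form (names are placeholders). Pinning the values is also the generally recommended pattern here, since Spark cannot scan a stream up front for the distinct pivot values:

```python
# Pivot with an explicit value list instead of value discovery.
gold = (silver_df
        .groupBy("user_id")
        .pivot("group_name", ["group0", "group1", "group2"])
        .count())
```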

SatishGunjal
by New Contributor
  • 3406 Views
  • 1 reply
  • 0 kudos

Dataframe takes a long time to print count of rows

We have a PySpark dataframe with 50 million records. We can display records from it, but it takes around 10 minutes to print the shape of the dataframe. We aim to use this data for modelling that will take some numerical features based on the final data fra...

Latest Reply
Hanna08
New Contributor II
  • 0 kudos

Thanks for the detailed explanation. For those who want to have constant technical support for their work processes, I recommend JD Young. Here is only the latest information about the update in the world of information technology solutions and cyber...
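
Back on the original question: display() computes only a handful of rows, while a row count scans the full lineage, hence the gap. A sketch of the usual mitigation (caching, or persisting the prepared frame once):

```python
# Cache so repeated actions (count, model fitting, etc.) reuse the scan.
df = df.cache()
print(df.count(), len(df.columns))  # first action materializes the cache

# Alternative: persist once as Delta and work from the saved table.
# df.write.mode("overwrite").format("delta").saveAsTable("prep.features")
```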

Cano
by New Contributor III
  • 1926 Views
  • 1 reply
  • 2 kudos

How to add notebook to my Databricks jdbc url?

Please, how do I add a notebook to the JDBC URL in order to run queries externally? jdbc:databricks://dbc-a1b2345c-d6e7.cloud.databricks.com:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/1234567890123456/1234-567890-reef123;AuthMech=3;...

Latest Reply
ranged_coop
Valued Contributor II
  • 2 kudos

Not sure if it is possible. Alternatively, you could try adding your notebook to a job and then triggering that job via the Jobs API. Please refer to the link below: Jobs API 2.1 | Databricks on AWS
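
A sketch of the suggested trigger (host, token, and job_id are placeholders):

```python
import requests

# Trigger an existing notebook job from outside the workspace.
resp = requests.post(
    "https://<workspace-host>/api/2.1/jobs/run-now",
    headers={"Authorization": "Bearer <personal-access-token>"},
    json={"job_id": 123},  # hypothetical job id
)
print(resp.json())  # contains run_id for polling the run's status
```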

