Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Christine
by Contributor II
  • 6690 Views
  • 9 replies
  • 5 kudos

Resolved! pyspark dataframe empties after it has been saved to delta lake.

Hi, I am facing a problem that I hope to get some help understanding. I have created a function that is supposed to check whether the input data already exists in a saved delta table and, if not, create some calculations and append the new data to...

Latest Reply
SharathE
New Contributor III
  • 5 kudos

Hi, I'm also having a similar issue. Does creating a temp view and reading it again after saving to a table work?
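
For example, a minimal sketch of that re-read approach (the table name and sample rows are placeholders, not from this thread):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical new data; in the original question this would come from the
# function's calculations.
new_rows_df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])

# Append to the Delta table first...
new_rows_df.write.format("delta").mode("append").saveAsTable("my_table")

# ...then read the table back instead of reusing the original DataFrame, so
# downstream steps work on the persisted rows rather than re-evaluating the
# plan that produced them.
refreshed_df = spark.read.table("my_table")
refreshed_df.show()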

8 More Replies
SaraCorralLou
by New Contributor III
  • 19353 Views
  • 5 replies
  • 2 kudos

Resolved! Error: The spark driver has stopped unexpectedly and is restarting. Your notebook will be automatically reattached.

What is the problem? I am getting this error every time I run a python notebook on my Repo in Databricks. Background: The notebook where I am getting the error is a notebook that creates a dataframe and the last step is to write the dataframe to a Delta ...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Sara Corral, Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers y...

4 More Replies
umair_hanif
by New Contributor II
  • 1812 Views
  • 2 replies
  • 1 kudos

Ingesting more than 7 million rows into a SQL Server Table

Hi All, I hope you're super well. I need your recommendations and a solution for my problem. I am using a Databricks instance DS12_v2 which has 28GB RAM and 4 cores. I am ingesting 7.2 million rows into a SQL Server table and it is taking 57 min - 1 hou...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

You can try to use BULK INSERT (https://learn.microsoft.com/en-us/sql/t-sql/statements/bulk-insert-transact-sql?view=sql-server-ver16). Also, using Data Factory instead of Databricks for the copy can be helpful.
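
If the write stays in Databricks, a minimal JDBC sketch along those lines (the server, database, table, credentials, and the batchsize/partition values are placeholders and assumptions to tune, not details from this thread):

# Source DataFrame; hypothetical table name for illustration.
df = spark.table("source_data")

# Write to SQL Server over JDBC; batched inserts run in parallel,
# one connection per partition.
jdbc_url = "jdbc:sqlserver://<server>.database.windows.net:1433;database=<db>"

(df.repartition(8)  # parallel connections; adjust to what the server tolerates
   .write
   .format("jdbc")
   .option("url", jdbc_url)
   .option("dbtable", "dbo.target_table")
   .option("user", "<user>")
   .option("password", "<password>")
   .option("batchsize", 10000)  # rows per insert batch
   .mode("append")
   .save())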

1 More Replies
js54123875
by New Contributor III
  • 2344 Views
  • 3 replies
  • 3 kudos

Setup for Unity Catalog, autoloader, three-level namespace, SCD2

I am trying to set up Delta Live Tables pipelines to ingest data into bronze and silver tables. Bronze and Silver are separate schemas. This will be triggered by a daily job. It appears to run fine when set as continuous, but fails when triggered. Table...
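
For reference, a minimal sketch of the kind of Auto Loader + SCD2 pipeline described above (the table names, landing path, key, and sequence column are hypothetical, not taken from the post):

import dlt
from pyspark.sql import functions as F

# Bronze: ingest raw files with Auto Loader.
@dlt.table(name="bronze_orders")
def bronze_orders():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/Volumes/raw/orders/")  # hypothetical landing path
    )

# Silver: keep full history as SCD type 2 via apply_changes.
dlt.create_streaming_table("silver_orders")

dlt.apply_changes(
    target="silver_orders",
    source="bronze_orders",
    keys=["order_id"],               # hypothetical business key
    sequence_by=F.col("ingest_ts"),  # hypothetical ordering column
    stored_as_scd_type=2,
)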

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Hi @Jennette Shepard, Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answ...

2 More Replies
del1000
by New Contributor III
  • 580 Views
  • 0 replies
  • 0 kudos

Problem with sparkContext.parallelize and volatile functions?

I have this code:

from time import sleep
from random import random
from operator import add

def f(a: int) -> float:
    sleep(0.1)
    return random()

rdd1 = sc.parallelize(range(20), 2)
rdd2 = sc.parallelize(range(20), 2)
rdd3 = sc.parallelize(rang...
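
One likely angle, offered as an assumption since the post is truncated: random() is non-deterministic, so each re-evaluation of an RDD built on it produces different values; caching pins one set of results. A minimal sketch:

from random import random
from time import sleep

def f(a: int) -> float:
    sleep(0.1)
    return random()

# sc is the SparkContext available in a Databricks notebook.
rdd = sc.parallelize(range(20), 2).map(f)

# Without caching, every action re-runs f(), so the random values can differ
# between actions. Caching materializes one set of values and reuses it
# (barring cache eviction).
rdd.cache()
first = rdd.collect()
second = rdd.collect()
assert first == second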

Paras
by New Contributor II
  • 2095 Views
  • 4 replies
  • 7 kudos
Latest Reply
Anonymous
Not applicable
  • 7 kudos

Hi @Paras Gadhiya, Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you. Than...

3 More Replies
Ulf
by New Contributor II
  • 1007 Views
  • 1 replies
  • 0 kudos

Github and task integration

I have the same problem as described in this post (https://community.databricks.com/s/question/0D58Y00009ObQgdSAF/running-jobs-using-notebooks-in-a-remote-azure-devops-services-repos-git-repository-is-generating-notebook-not-found-error) and get this...

Latest Reply
Debayan
Esteemed Contributor III
  • 0 kudos

Hi, Could you please check and let us know if this helps: https://community.databricks.com/s/question/0D53f00001GHVTNCA5/notebook-path-cant-be-in-dbfs

Wout
by Contributor
  • 5831 Views
  • 6 replies
  • 7 kudos

Resolved! Wrong X-Axis Order when Visualization is Put on Dashboard

I have a visualization in which the X-axis values are displayed correctly in the Query Editor, in the order produced by the SQL query. However, when I add the visualization to a dashboard, the values are suddenly not sorted anymore. How is this possib...

[Attached screenshots: correct vs. wrong X-axis order]
Latest Reply
Wout
Contributor
  • 7 kudos

We have further analyzed the visualization problem and found two solutions. The original visualization consists of 1 series and has aggregation enabled in the UI (but it is unused, since the query itself already aggregates). We found that the following tw...

5 More Replies
DataRabbit
by New Contributor II
  • 10517 Views
  • 2 replies
  • 0 kudos

Resolved! py4j.security.Py4JSecurityException: Constructor public org.apache.spark.ml.feature.VectorAssembler(java.lang.String) is not whitelisted.

Hello, I have a problem. When I try to run the MLlib Assembler (from pyspark.ml.feature import VectorAssembler) I get this error and I don't know what to do anymore. Please help.

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 0 kudos

Is it a High Concurrency cluster with credential passthrough enabled? In that case, you can use a different cluster mode (https://docs.azuredatabricks.net/spark/latest/data-sources/azure/adls-passthrough.html). This exception is thrown when you have access...
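
For context, a minimal sketch of the kind of VectorAssembler call that hits this whitelist check (the data and column names are hypothetical); the same code runs unchanged once the cluster mode is switched:

from pyspark.ml.feature import VectorAssembler

df = spark.createDataFrame([(1.0, 2.0), (3.0, 4.0)], ["x1", "x2"])

# Constructing VectorAssembler is what triggers the py4j whitelist check on
# High Concurrency clusters with credential passthrough enabled.
assembler = VectorAssembler(inputCols=["x1", "x2"], outputCol="features")
assembler.transform(df).show()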

1 More Replies
saira1122
by New Contributor
  • 408 Views
  • 0 replies
  • 0 kudos

bit.ly

If you are having trouble with any study problem, you should first find the source of the problem. https://bit.ly/3AoiotQ

davidvb
by New Contributor II
  • 1965 Views
  • 2 replies
  • 1 kudos

I have a big problem creating a community account

It is impossible for me to create a community account. I enter my data on the web and in the next step, when the website shows me the three sign-in options (Google, Amazon, etc.) and I click on "Get started with community account", the web shows me this. I have try...

Latest Reply
jose_gonzalez
Moderator
  • 1 kudos

Hi @david vazquez, It seems like the website was down due to maintenance. Next time you can check the status page to see why the website is down: https://status.databricks.com/

1 More Replies