Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Hello, I have a problem. When I try to run the MLlib VectorAssembler (from pyspark.ml.feature import VectorAssembler) I get this error and I don't know what to do anymore. Please help.
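For reference, a minimal VectorAssembler call looks like the sketch below; the DataFrame and column names are hypothetical placeholders, not taken from the original post.

from pyspark.ml.feature import VectorAssembler

# Hypothetical DataFrame with two numeric feature columns.
df = spark.createDataFrame([(1.0, 2.0), (3.0, 4.0)], ["f1", "f2"])

# Combine the feature columns into a single vector column named "features".
assembler = VectorAssembler(inputCols=["f1", "f2"], outputCol="features")
assembler.transform(df).show()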
How could we whitelist this error below with DBR 13.3 and above? Py4JError: An error occurred while calling None.org.apache.spark.ml.recommendation.ALS. Trace: py4j.security.Py4JSecurityException: Constructor public org.apache.spark.ml.recommendation...
Hi, I am facing a problem that I hope to get some help to understand. I have created a function that is supposed to check if the input data already exists in a saved Delta table and, if not, it should run some calculations and append the new data to...
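A minimal sketch of that pattern, assuming the table is keyed by an id column (the table and column names here are hypothetical), is to anti-join the new rows against the saved Delta table before appending:

# Hypothetical: new_df holds the freshly calculated rows, keyed by "id".
existing = spark.read.table("my_schema.my_delta_table")

# Keep only the rows whose key is not already present in the saved table.
rows_to_add = new_df.join(existing.select("id"), on="id", how="left_anti")

if rows_to_add.take(1):
    rows_to_add.write.format("delta").mode("append").saveAsTable("my_schema.my_delta_table")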
What is the problem? I am getting this error every time I run a Python notebook from my Repo in Databricks. Background: the notebook where I am getting the error creates a dataframe, and the last step is to write the dataframe to a Delta ...
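For context, the final write step described there typically looks something like the line below; the table name is a placeholder, not the one from the post.

# Hypothetical last step of the notebook: append the DataFrame to a Delta table.
df.write.format("delta").mode("append").saveAsTable("my_schema.my_table")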
Hi @Sara Corral, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers y...
Hi All, I hope you're super well. I need your recommendations and a solution for my problem. I am using a Databricks instance DS12_v2 which has 28GB RAM and 4 cores. I am ingesting 7.2 million rows into a SQL Server table and it is taking 57 min - 1 hou...
You can try to use BULK INSERT: https://learn.microsoft.com/en-us/sql/t-sql/statements/bulk-insert-transact-sql?view=sql-server-ver16. Also, using Data Factory instead of Databricks for the copy can be helpful.
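If the write stays in Databricks, one option worth measuring is the plain Spark JDBC writer with its standard write options; the sketch below uses placeholder connection details and secret names, and the repartition/batchsize values are generic tuning knobs rather than anything specific to this workload.

# Hypothetical connection details; credentials come from a Databricks secret scope.
jdbc_url = "jdbc:sqlserver://myserver.database.windows.net:1433;database=mydb"

(df.repartition(8)                      # write with several parallel connections
   .write
   .format("jdbc")
   .option("url", jdbc_url)
   .option("dbtable", "dbo.target_table")
   .option("user", dbutils.secrets.get("my-scope", "sql-user"))
   .option("password", dbutils.secrets.get("my-scope", "sql-password"))
   .option("batchsize", 10000)          # rows sent per batch to SQL Server
   .mode("append")
   .save())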
I am trying to set up Delta Live Tables pipelines to ingest data into bronze and silver tables. Bronze and Silver are in separate schemas. This will be triggered by a daily job. It appears to run fine when set as continuous, but fails when triggered. Table...
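For reference, a minimal shape of such a pipeline in Python is sketched below; the source path, file format, and table names are hypothetical, and which target schema each table lands in is set in the pipeline configuration rather than in the code.

import dlt
from pyspark.sql import functions as F

# Hypothetical bronze table: incremental ingestion of raw files with Auto Loader.
@dlt.table(name="bronze_events")
def bronze_events():
    return (spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")
            .load("/path/to/raw/events"))

# Hypothetical silver table: cleaned view of the bronze data.
@dlt.table(name="silver_events")
def silver_events():
    return dlt.read("bronze_events").where(F.col("event_id").isNotNull())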
Hi @Jennette Shepard, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answ...
I have this code:
from time import sleep
from random import random
from operator import add

def f(a: int) -> float:
    sleep(0.1)
    return random()

rdd1 = sc.parallelize(range(20), 2)
rdd2 = sc.parallelize(range(20), 2)
rdd3 = sc.parallelize(rang...
Hi @Paras Gadhiya, hope all is well! Just wanted to check in on whether you were able to resolve your issue and, if so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Than...
I have the same problem as described in this post (https://community.databricks.com/s/question/0D58Y00009ObQgdSAF/running-jobs-using-notebooks-in-a-remote-azure-devops-services-repos-git-repository-is-generating-notebook-not-found-error) and get this...
I have a visualization in which the X-axis values are displayed correctly in the Query Editor, in the order produced by the SQL query. However, when I add the visualization to a dashboard, the values are suddenly not sorted anymore. How is this possib...
We have further analyzed the visualization problem and found two solutions. The original visualization consists of 1 series and has aggregation enabled in the UI (but it is unused, since the query itself aggregates already). We found that the following tw...
I cannot create a community account. I entered my data on the web and, in the next step, when the website shows me the 3 sign-in options (Google, Amazon, etc.) and I click on "Get started with community account", the website shows me this. I have trie...
Hi @david vazquez, it seems like the website was down due to maintenance. Next time you can check the status page to see why the website is down: https://status.databricks.com/