Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

dbx-user7354
by New Contributor III
  • 2593 Views
  • 3 replies
  • 4 kudos

Create a Job via SDK with JobSettings Object

Hey, I want to create a Job via the Python SDK with a JobSettings object. import os import time from databricks.sdk import WorkspaceClient from databricks.sdk.service import jobs from databricks.sdk.service.jobs import JobSettings w = WorkspaceClien...

Latest Reply
nenetto
New Contributor II
  • 4 kudos

I just faced the same problem. The issue is that when you do JobSettings.as_dict() the settings are parsed to a dict where all the values are also parsed recursively. When you pass the parameters as **params, the create method again tries to parse...

2 More Replies
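For readers landing on this thread, here is a minimal sketch of the workaround described in the reply, assuming the databricks-sdk Python package is installed; the job name, notebook path, and cluster ID are hypothetical. The idea is to build the typed JobSettings once and pass its fields to jobs.create() directly, rather than expanding as_dict(), so create() receives SDK dataclasses instead of already-serialized dicts.

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()

# Hypothetical notebook path and cluster ID.
settings = jobs.JobSettings(
    name="my-sdk-job",
    tasks=[
        jobs.Task(
            task_key="main",
            notebook_task=jobs.NotebookTask(notebook_path="/Workspace/Users/me/my_notebook"),
            existing_cluster_id="1234-567890-abcdefgh",
        )
    ],
    timeout_seconds=3600,
)

# Pass the typed fields rather than **settings.as_dict(): as_dict() converts every
# nested value to a plain dict, which create() then tries to parse all over again.
created = w.jobs.create(
    name=settings.name,
    tasks=settings.tasks,
    timeout_seconds=settings.timeout_seconds,
)
print(created.job_id)
```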
noname123
by New Contributor III
  • 4336 Views
  • 1 reply
  • 0 kudos

Resolved! Delta table version protocol

I do: df.write.format("delta").mode("append").partitionBy("timestamp").option("mergeSchema", "true").save(destination). If the table doesn't exist, it creates a new table with "minReaderVersion":3,"minWriterVersion":7. Yesterday it was creating tables with "min...

Latest Reply
noname123
New Contributor III
  • 0 kudos

Thanks for the help. The issue was caused by the "Auto-Enable Deletion Vectors" setting.

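For anyone hitting the same protocol bump, a minimal sketch of checking a table's protocol and keeping new tables on the older one; the destination path is hypothetical, and the conf key shown is the Delta "default table properties" prefix applied to delta.enableDeletionVectors, which is what the workspace-level Auto-Enable Deletion Vectors setting toggles.

```python
# Inspect the protocol of an existing Delta table (path is hypothetical).
spark.sql("DESCRIBE DETAIL delta.`/mnt/dest/events`") \
    .select("minReaderVersion", "minWriterVersion") \
    .show()

# Deletion vectors require minReaderVersion 3 / minWriterVersion 7. Turning off
# the session default before the table is first created keeps the older protocol.
spark.conf.set(
    "spark.databricks.delta.properties.defaults.enableDeletionVectors", "false"
)

df.write.format("delta") \
    .mode("append") \
    .partitionBy("timestamp") \
    .option("mergeSchema", "true") \
    .save("/mnt/dest/events")
```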
nihar_ghude
by New Contributor II
  • 3303 Views
  • 1 reply
  • 0 kudos

OSError: [Errno 107] Transport endpoint is not connected

Hi, I am facing this error when performing a write operation in foreach() on a dataframe. The piece of code was working fine for over 3 months but started failing last week. To give some context, I have a dataframe extract_df which contains 2 colum...

Labels: Data Engineering, ADLS, azure, python, spark
GOW
by New Contributor II
  • 1626 Views
  • 1 reply
  • 0 kudos

Databricks to s3

I am new to data engineering in Databricks. I need some guidance on getting data from Databricks to S3. Can I get an example job or approach for doing this?

Latest Reply
GOW
New Contributor II
  • 0 kudos

Thank you for the reply. Can I apply this to dbt, or use a dbt macro to unload the data? That is, with dbt models running in Databricks?

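For context on the original question, a minimal sketch of unloading a table to S3 from a notebook or job task, assuming the cluster already has credentials for the bucket (an instance profile or equivalent); the table and bucket names are hypothetical.

```python
# Read the source table (hypothetical name) and write it out to S3.
df = spark.table("main.sales.orders")

(df.write
    .format("parquet")   # or "delta" / "csv", depending on what the consumer expects
    .mode("overwrite")
    .save("s3://my-bucket/exports/orders/"))
```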
dasiekr
by New Contributor II
  • 1568 Views
  • 3 replies
  • 0 kudos

Merge operation replaces most of the underlying parquets

Hello, I have the following situation which I would like to fully understand. I have a Delta table that consists of 10k active parquet files. Every day I run a merge operation based on new deliveries, joining by the product_id key attribute. I checked me...

Latest Reply
Ajay-Pandey
Esteemed Contributor III
  • 0 kudos

Hi @dasiekr, please refer to the below content that might help you. MERGE: under the hood. Delta Lake completes a MERGE in two steps: perform an inner join between the target table and source table to select all files that have matches, then perform an outer...

2 More Replies
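A minimal sketch of the usual mitigation for the rewrite behaviour described above: add a pruning predicate to the merge condition so the inner-join matching step only considers recent files. The table name, the new_deliveries_df source dataframe, the delivery_date column, and the 30-day window are all hypothetical.

```python
from delta.tables import DeltaTable

target = DeltaTable.forName(spark, "prod.sales.products")  # hypothetical table

(target.alias("t")
    .merge(
        new_deliveries_df.alias("s"),
        # Joining on the key alone can match (and rewrite) files across the whole
        # table; the extra predicate narrows which files the merge touches.
        "t.product_id = s.product_id AND t.delivery_date >= current_date() - INTERVAL 30 DAYS",
    )
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute())
```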
Gray
by Contributor
  • 32277 Views
  • 24 replies
  • 18 kudos

Resolved! Errors Using Selenium/Chromedriver in Databricks

Hello, I'm programming in a notebook and attempting to use the Python library Selenium to automate Chrome/chromedriver. I've successfully managed to install Selenium using %sh pip install selenium. I then attempt the following code, which results in the...

Latest Reply
aa_204
New Contributor II
  • 18 kudos

I also tried the script and am getting a similar error. Can anyone please suggest a resolution? The error is: Failed to fetch http://archive.ubuntu.com/ubuntu/pool/main/s/systemd/udev_245.4-4ubuntu3.18_amd64.deb and Unable to fetch some archives

23 More Replies
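A minimal sketch of driving headless Chrome from a notebook once the browser and a matching chromedriver are already present on the driver node (for example via a cluster init script, which is where the apt-get errors in this thread come in); the binary paths below are hypothetical.

```python
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

opts = Options()
opts.binary_location = "/usr/bin/chromium-browser"    # hypothetical browser path
opts.add_argument("--headless")
opts.add_argument("--no-sandbox")
opts.add_argument("--disable-dev-shm-usage")

driver = webdriver.Chrome(
    service=Service("/usr/local/bin/chromedriver"),   # hypothetical driver path
    options=opts,
)
driver.get("https://example.com")
print(driver.title)
driver.quit()
```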
William_Scardua
by Valued Contributor
  • 3216 Views
  • 3 replies
  • 1 kudos

Magic Pip Install Error

Hi guys, I receive this error when I try to use pip install, any idea? CalledProcessError Traceback (most recent call last) <command-3492276838775365> in <module> ----> 1 get_ipython().run_line_magic('pip', 'install /dbfs/File...

Latest Reply
Bartosz
New Contributor II
  • 1 kudos

Hi @William_Scardua! I changed the cluster runtime to 10.4 LTS and the error disappeared. Just letting you know, maybe it will help you too! Cheers!

2 More Replies
Brad
by Contributor II
  • 1309 Views
  • 1 reply
  • 1 kudos

Colon sign operator for JSON

Hi, I have a streaming source loading data into a raw table, which has a string-type column (whose value is JSON) to hold all the data. I want to use the colon sign operator to get fields from the JSON string. Is this going to have some perf issues vs. using a sch...

Latest Reply
Brad
Contributor II
  • 1 kudos

Thanks Kaniz. Yes, I did some testing. With a schema, I read the same data source and wrote the parsing results to different tables. For 586K rows, the perf diff is 9 sec vs. 37 sec. For 2.3 million rows, 16 sec vs. 133 sec.

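To make the comparison above concrete, a minimal sketch of the two approaches; the table name and JSON field names are hypothetical. The colon-sign operator extracts each field from the JSON string per expression, while from_json with an explicit schema parses the string once into a struct, which is consistent with the timings reported in the reply.

```python
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, LongType

raw = spark.table("bronze.events_raw")   # hypothetical table with a JSON string column `payload`

# 1) Colon-sign operator: schemaless, each field is pulled out of the string at query time.
ad_hoc = raw.selectExpr(
    "payload:user.id AS user_id",
    "payload:event_type AS event_type",
)

# 2) Explicit schema + from_json: the string is parsed once into a struct.
schema = StructType([
    StructField("user", StructType([StructField("id", LongType())])),
    StructField("event_type", StringType()),
])
typed = (raw
    .withColumn("parsed", F.from_json("payload", schema))
    .select(
        F.col("parsed.user.id").alias("user_id"),
        F.col("parsed.event_type").alias("event_type"),
    ))
```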
vemash
by New Contributor
  • 1697 Views
  • 1 reply
  • 0 kudos

How to create a Docker image to deploy and run in different environments in Databricks?

I am new to Databricks and trying to implement the task below. Task: once code merges to the main branch, the CI pipeline build is successful, and all tests have passed, a Docker build should start, create a Docker image, and push it to different environments (fro...

Latest Reply
MichTalebzadeh
Valued Contributor
  • 0 kudos

Hi, this is no different from building a Docker image for various environments. Let us try a simple high-level CI/CD pipeline for building Docker images and deploying them to different environments; it works in all environments, including Databricks. ...

Stellar
by New Contributor II
  • 1597 Views
  • 0 replies
  • 0 kudos

DLT DataPlane Error

Hi everyone, I am trying to build the pipeline, but when I run it I receive an error: DataPlaneException: Failed to start the DLT service on the cluster. Please check the driver logs for more details or contact Databricks support. This is from the driver ...

Surya0
by New Contributor III
  • 4570 Views
  • 3 replies
  • 0 kudos

Resolved! Unit hive-metastore.service not found

Hi everyone, I've encountered an issue while trying to make use of the hive-metastore capability in Databricks to create a new database and table for our latest use case. The specific command I used was "create database if not exists newDB". However, ...

Latest Reply
rakeshprasad1
New Contributor III
  • 0 kudos

@Surya0: I am facing the same issue. The stack trace is: Could not connect to address=(host=consolidated-northeuropec2-prod-metastore-2.mysql.database.azure.com)(port=3306)(type=master) : Socket fail to connect to host:consolidated-northeuropec2-prod-metast...

2 More Replies
alexgv12
by New Contributor III
  • 1276 Views
  • 1 reply
  • 0 kudos

How to deploy SQL functions in pool

We have some function definitions that we need to have available for our BI tools, e.g. CREATE FUNCTION CREATEDATE(year INT, month INT, day INT) RETURNS DATE RETURN make_date(year, month, day); how can we always have this function definition in our ...

Latest Reply
alexgv12
New Contributor III
  • 0 kudos

Looking at some alternatives with other Databricks components, I think a CI/CD process should be created where the function can be created through the Databricks API: https://docs.databricks.com/api/workspace/functions/create https://community.databr...

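A minimal sketch of the CI/CD idea mentioned in the reply: a small deployment step (a notebook or job task, or equivalently a call to the functions API) that (re)creates the shared functions in a governed schema so warehouses and BI tools always see them. The catalog and schema names are hypothetical.

```python
# Run as part of a deployment job so the definition is always in place.
spark.sql("CREATE SCHEMA IF NOT EXISTS main.shared_functions")

spark.sql("""
    CREATE OR REPLACE FUNCTION main.shared_functions.CREATEDATE(year INT, month INT, day INT)
    RETURNS DATE
    RETURN make_date(year, month, day)
""")

# BI tools can then call main.shared_functions.CREATEDATE(...) from any warehouse.
```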
dbal
by New Contributor III
  • 3989 Views
  • 2 replies
  • 0 kudos

Resolved! Spark job task fails with "java.lang.NoClassDefFoundError: org/apache/spark/SparkContext$"

Hi. I am trying to run a Spark job in Databricks (Azure) using the JAR type. I can't figure out why the job fails to run by not finding the SparkContext. Databricks Runtime: 14.3 LTS (includes Apache Spark 3.5.0, Scala 2.12). Error message: java.lang.NoCl...

Latest Reply
dbal
New Contributor III
  • 0 kudos

Update 2: I found the reason in the documentation. This is documented under "Access Mode", and it is a limitation of the Shared access mode. Link: https://learn.microsoft.com/en-us/azure/databricks/compute/access-mode-limitations#spark-api-limitations...

1 More Replies

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group