Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

lilo_z
by New Contributor III
  • 4121 Views
  • 2 replies
  • 0 kudos

Resolved! Databricks Asset Bundles - job-specific "run_as" user/service_principal

Was wondering if this was possible, since a use case came up in my team. Would it be possible to use a different service principal for a single job than what is specified for that target environment? For example: bundle: name: hello-bundle resource...

Latest Reply
lilo_z
New Contributor III
  • 0 kudos

Found a working solution, posting it here for anyone else hitting the same issue - the trick was to redefine "resources" under the target you want to make an exception for: bundle: name: hello_bundle include: - resources/*.yml targets: dev: w...

1 More Replies
dbx-user7354
by New Contributor III
  • 3771 Views
  • 3 replies
  • 4 kudos

Create a Job via SDK with JobSettings Object

Hey, I want to create a Job via the Python SDK with a JobSettings object. import os import time from databricks.sdk import WorkspaceClient from databricks.sdk.service import jobs from databricks.sdk.service.jobs import JobSettings w = WorkspaceClien...

Latest Reply
nenetto
New Contributor II
  • 4 kudos

I just faced the same problem. The issue is that when you do JobSettings.as_dict() the settings are parsed into a dict where all the values are also parsed recursively. When you pass the parameters as **params, the create method again tries to parse...

2 More Replies
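One workaround consistent with the reply above, sketched with the Python SDK (the job name, notebook path, and cluster ID are hypothetical): build the JobSettings object once and pass its already-typed fields to jobs.create() instead of unpacking JobSettings.as_dict(), so nested objects are not converted to dictionaries twice.

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()

# Build the job definition once as typed SDK objects.
settings = jobs.JobSettings(
    name="example-job",  # hypothetical job name
    tasks=[
        jobs.Task(
            task_key="main",
            notebook_task=jobs.NotebookTask(notebook_path="/Shared/example"),  # hypothetical path
            existing_cluster_id="0000-000000-abcdefgh",  # hypothetical cluster ID
        )
    ],
)

# jobs.create() accepts the same keyword arguments as JobSettings, so the typed
# fields can be passed straight through without a dict round-trip.
created = w.jobs.create(name=settings.name, tasks=settings.tasks)
print(created.job_id)
```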
nihar_ghude
by New Contributor II
  • 4718 Views
  • 1 replies
  • 0 kudos

OSError: [Errno 107] Transport endpoint is not connected

Hi, I am facing this error when performing a write operation in foreach() on a dataframe. The piece of code had been working fine for over 3 months but started failing last week. To give some context, I have a dataframe extract_df which contains 2 colum...

nihar_ghude_0-1710175215407.png
Data Engineering
ADLS
azure
python
spark
GOW
by New Contributor II
  • 2196 Views
  • 1 replies
  • 0 kudos

Databricks to s3

I am new to data engineering in Databricks. I need some guidance on getting data from Databricks to S3. Can I get an example job or approach to do this?

Latest Reply
GOW
New Contributor II
  • 0 kudos

Thank you for the reply. Can I apply this to dbt, or use a dbt macro to unload the data? That is, with dbt models running in Databricks?

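A minimal sketch of one common approach to the original question (the table name, bucket, and prefix are hypothetical, and the cluster is assumed to have an instance profile or credentials that can write to the bucket): read the data with Spark and write it to an S3 path with the DataFrame writer.

```python
# Hypothetical source table and S3 destination.
df = spark.read.table("main.default.orders")

(df.write
   .format("parquet")   # or "delta", depending on what the downstream consumer expects
   .mode("overwrite")
   .save("s3://my-example-bucket/exports/orders/"))
```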
dasiekr
by New Contributor II
  • 2850 Views
  • 3 replies
  • 0 kudos

Merge operation replaces most of the underlying parquets

Hello, I have the following situation which I would like to fully understand. I have a Delta table that consists of 10k active Parquet files. Every day I run a merge operation based on new deliveries, joining by the product_id key attribute. I checked me...

Latest Reply
Ajay-Pandey
Esteemed Contributor III
  • 0 kudos

Hi @dasiekr, please refer to the content below, which might help. MERGE: under the hood. Delta Lake completes a MERGE in two steps. Perform an inner join between the target table and source table to select all files that have matches. Perform an outer...

2 More Replies
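A minimal sketch of the kind of MERGE discussed in this thread (table and column names are hypothetical): because Delta rewrites the files that contain rows modified by the merge, adding a change-detection condition to the matched clause is one way to reduce how many rows are updated and, typically, how many files are rewritten.

```python
from delta.tables import DeltaTable

target = DeltaTable.forName(spark, "main.default.products")   # hypothetical target table
updates = spark.read.table("main.default.daily_delivery")     # hypothetical source of new deliveries

(target.alias("t")
 .merge(updates.alias("s"), "t.product_id = s.product_id")
 # Hypothetical columns: only update rows where something actually changed.
 .whenMatchedUpdateAll(condition="t.price <> s.price OR t.quantity <> s.quantity")
 .whenNotMatchedInsertAll()
 .execute())
```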
Gray
by Contributor
  • 45876 Views
  • 24 replies
  • 18 kudos

Resolved! Errors Using Selenium/Chromedriver in Databricks

Hello, I'm programming in a notebook and attempting to use the Python library Selenium to automate Chrome/chromedriver. I've successfully managed to install Selenium using %sh pip install selenium. I then attempt the following code, which results in the...

Latest Reply
aa_204
New Contributor II
  • 18 kudos

I also tried the script and am getting a similar error. Can anyone please suggest a resolution? The error is: Failed to fetch http://archive.ubuntu.com/ubuntu/pool/main/s/systemd/udev_245.4-4ubuntu3.18_amd64.deb and Unable to fetch some archives

23 More Replies
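A sketch of the kind of headless Selenium setup a Databricks driver node usually needs, assuming Chromium and a matching chromedriver have already been installed on the cluster (for example via an init script); the binary paths are hypothetical.

```python
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

opts = Options()
opts.add_argument("--headless=new")           # the driver node has no display
opts.add_argument("--no-sandbox")
opts.add_argument("--disable-dev-shm-usage")  # avoid /dev/shm exhaustion
opts.binary_location = "/usr/bin/chromium-browser"  # hypothetical Chromium path

driver = webdriver.Chrome(
    service=Service("/usr/local/bin/chromedriver"),  # hypothetical chromedriver path
    options=opts,
)
driver.get("https://example.com")
print(driver.title)
driver.quit()
```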
William_Scardua
by Valued Contributor
  • 4788 Views
  • 3 replies
  • 1 kudos

Magic Pip Install Error

Hi guys, I receive this error when I try to use pip install, any idea? CalledProcessError Traceback (most recent call last) <command-3492276838775365> in <module> ----> 1 get_ipython().run_line_magic('pip', 'install /dbfs/File...

Latest Reply
Bartosz
New Contributor II
  • 1 kudos

Hi @William_Scardua! I changed the cluster runtime to 10.4 LTS and the error disappeared. Just letting you know, maybe it will help you too! Cheers!

2 More Replies
Brad
by Contributor II
  • 1686 Views
  • 1 replies
  • 1 kudos

Colon sign operator for JSON

Hi, I have a streaming source loading data into a raw table, which has a string-type column (whose value is JSON) to hold all the data. I want to use the colon sign operator to get fields from the JSON string. Is this going to have perf issues vs. using a sch...

Latest Reply
Brad
Contributor II
  • 1 kudos

Thanks Kaniz. Yes, I did some testing. With a schema, I read the same data source and wrote the parsing results to different tables. For 586K rows, the perf difference is 9 sec vs. 37 sec; for 2.3 million rows, 16 sec vs. 133 sec.

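A sketch of the two approaches compared in this thread (table, column, and field names are hypothetical): the colon-sign operator extracts fields directly from the JSON string, while from_json() with an explicit schema parses the string once into typed columns.

```python
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, LongType

# Hypothetical raw table with a JSON string column named `payload`.
raw = spark.read.table("main.default.raw_events")

# 1) Colon-sign extraction (Databricks SQL syntax), no schema required.
colon_df = raw.select(F.expr("payload:order.id").alias("order_id"))

# 2) Explicit schema + from_json(): the string is parsed once into typed columns.
schema = StructType([
    StructField("order", StructType([StructField("id", LongType())])),
])
schema_df = (raw
             .select(F.from_json("payload", schema).alias("j"))
             .select(F.col("j.order.id").alias("order_id")))
```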
vemash
by New Contributor
  • 2569 Views
  • 1 replies
  • 0 kudos

How to create a Docker image to deploy and run in different environments in Databricks?

I am new to Databricks and trying to implement the task below. Task: once code merges to the main branch, the CI pipeline build is successful, and all tests pass, a Docker build should start, create a Docker image, and push it to different environments (fro...

Latest Reply
MichTalebzadeh
Valued Contributor
  • 0 kudos

Hi, this is no different from building a Docker image for any other environment. Let us try a simple high-level CI/CD pipeline for building Docker images and deploying them to different environments; it works in all environments, including Databricks ...

Stellar
by New Contributor II
  • 1942 Views
  • 0 replies
  • 0 kudos

DLT DataPlane Error

Hi everyone, I am trying to build the pipeline, but when I run it I receive an error: DataPlaneException: Failed to start the DLT service on the cluster. Please check the driver logs for more details or contact Databricks support. This is from the driver ...

Surya0
by New Contributor III
  • 6252 Views
  • 3 replies
  • 0 kudos

Resolved! Unit hive-metastore.service not found

Hi everyone, I've encountered an issue while trying to make use of the hive-metastore capability in Databricks to create a new database and table for our latest use case. The specific command I used was "create database if not exists newDB". However, ...

Latest Reply
rakeshprasad1
New Contributor III
  • 0 kudos

@Surya0: I am facing the same issue. The stack trace is: Could not connect to address=(host=consolidated-northeuropec2-prod-metastore-2.mysql.database.azure.com)(port=3306)(type=master) : Socket fail to connect to host:consolidated-northeuropec2-prod-metast...

2 More Replies
alexgv12
by New Contributor III
  • 1606 Views
  • 1 replies
  • 0 kudos

How to deploy SQL functions in a pool

We have some function definitions that we need to have available for our BI tools, e.g. CREATE FUNCTION CREATEDATE(year INT, month INT, day INT) RETURNS DATE RETURN make_date(year, month, day); how can we always have this function definition in our ...

Latest Reply
alexgv12
New Contributor III
  • 0 kudos

Looking at some alternatives with other Databricks components, I think a CI/CD process should be created where the function can be created through the Databricks API: https://docs.databricks.com/api/workspace/functions/create https://community.databr...

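A minimal sketch in the spirit of the API/CI-CD approach suggested above (the catalog and schema are hypothetical, and Unity Catalog is assumed): register the function at the catalog level with an idempotent CREATE OR REPLACE statement that a deployment job can re-run on every release, so it is available to BI tools regardless of which cluster or warehouse they use.

```python
# Hypothetical catalog/schema; CREATE OR REPLACE keeps the deployment idempotent.
spark.sql("""
    CREATE OR REPLACE FUNCTION main.default.CREATEDATE(year INT, month INT, day INT)
    RETURNS DATE
    RETURN make_date(year, month, day)
""")
```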
dbal
by New Contributor III
  • 6664 Views
  • 2 replies
  • 0 kudos

Resolved! Spark job task fails with "java.lang.NoClassDefFoundError: org/apache/spark/SparkContext$"

Hi. I am trying to run a Spark job in Databricks (Azure) using the JAR type. I can't figure out why the job fails to run by not finding the SparkContext. Databricks Runtime: 14.3 LTS (includes Apache Spark 3.5.0, Scala 2.12). Error message: java.lang.NoCl...

Latest Reply
dbal
New Contributor III
  • 0 kudos

Update 2: I found the reason in the documentation. This is documented under "Access Mode", and it is a limitation of the Shared access mode. Link: https://learn.microsoft.com/en-us/azure/databricks/compute/access-mode-limitations#spark-api-limitations...

1 More Replies
Tam
by New Contributor III
  • 1937 Views
  • 1 replies
  • 0 kudos

TABLE_REDIRECTION_ERROR in AWS Athena After Databricks Upgrade to 14.3 LTS

I have a Databricks pipeline set up to create Delta tables on AWS S3, using the Glue Catalog as the metastore. I was able to query the Delta table via Athena successfully. However, after upgrading the Databricks cluster from 13.3 LTS to 14.3 LTS, I began enc...

Tam_1-1707445843989.png
