Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

GOW
by New Contributor II
  • 1841 Views
  • 1 replies
  • 0 kudos

Databricks to S3

I am new to data engineering in Databricks and need some guidance on moving data from Databricks to S3. Could someone share an example job or approach for doing this?

Latest Reply
GOW
New Contributor II
  • 0 kudos

Thank you for the reply. Can I apply this with dbt, or use a dbt macro to unload the data? That is, with dbt models running in Databricks?

dasiekr
by New Contributor II
  • 2062 Views
  • 3 replies
  • 0 kudos

Merge operation replaces most of the underlying parquets

Hello, I have the following situation, which I would like to fully understand. I have a Delta table that consists of 10k active Parquet files. Every day I run a merge operation based on new deliveries, joining on the product_id key attribute. I checked me...

Latest Reply
Ajay-Pandey
Esteemed Contributor III
  • 0 kudos

Hi @dasiekr, please refer to the content below, which might help. MERGE: Under the hood. Delta Lake completes a MERGE in two steps. Perform an inner join between the target table and source table to select all files that have matches. Perform an outer...

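The two-step MERGE described in the reply helps explain the question above: every target file that contains at least one matched product_id is selected in the first step and, assuming classic copy-on-write rewrites (no deletion vectors), rewritten in full. The pure-Python simulation below is not Delta Lake code; the table size (1,000 files of 100 rows) and the 1% daily update rate are hypothetical. With keys spread uniformly across files, the expected fraction of files rewritten is about 1 - (1 - p)^r for update fraction p and r rows per file.

```python
import random

def files_touched(file_keys, incoming_keys):
    """Indices of target files containing at least one incoming key.

    Mirrors the first MERGE step (inner join to find files with matches);
    under copy-on-write, every such file is rewritten in full.
    """
    incoming = set(incoming_keys)
    return [i for i, keys in enumerate(file_keys) if not incoming.isdisjoint(keys)]

# Hypothetical table: 1,000 files x 100 rows, product_id values shuffled across files.
rng = random.Random(42)
num_files, rows_per_file = 1_000, 100
all_keys = list(range(num_files * rows_per_file))
rng.shuffle(all_keys)
file_keys = [set(all_keys[i * rows_per_file:(i + 1) * rows_per_file])
             for i in range(num_files)]

# Hypothetical daily delivery that updates 1% of product_ids.
incoming = rng.sample(all_keys, len(all_keys) // 100)

touched = files_touched(file_keys, incoming)
observed = len(touched) / num_files
expected = 1 - (1 - 0.01) ** rows_per_file  # about 0.634 for uniformly spread keys
print(f"rows updated: 1%, files rewritten: {observed:.1%} (expected ~{expected:.1%})")
```

Under these assumptions roughly 63% of files are rewritten even though only 1% of rows change, which is consistent with a MERGE replacing most of the underlying Parquet files when matching keys are scattered across them.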
Gray
by Contributor
  • 37812 Views
  • 24 replies
  • 18 kudos

Resolved! Errors Using Selenium/Chromedriver in Databricks

Hello, I’m programming in a notebook and attempting to use the Python library Selenium to automate Chrome/chromedriver. I’ve successfully managed to install Selenium using %sh pip install selenium. I then attempt the following code, which results in the...

Latest Reply
aa_204
New Contributor II
  • 18 kudos

I also tried the script and am getting a similar error. Can anyone please suggest a resolution? Error: Failed to fetch http://archive.ubuntu.com/ubuntu/pool/main/s/systemd/udev_245.4-4ubuntu3.18_amd64.deb and Unable to fetch some archives

William_Scardua
by Valued Contributor
  • 3761 Views
  • 3 replies
  • 1 kudos

Magic Pip Install Error

Hi guys, I receive this error when I try to use pip install. Any idea? CalledProcessError Traceback (most recent call last) <command-3492276838775365> in <module> ----> 1 get_ipython().run_line_magic('pip', 'install /dbfs/File...

Latest Reply
Bartosz
New Contributor II
  • 1 kudos

Hi @William_Scardua! I changed the cluster runtime to 10.4 LTS and the error disappeared. Just letting you know; maybe it will help you too! Cheers!

Brad
by Contributor II
  • 1466 Views
  • 1 replies
  • 1 kudos

Colon sign operator for JSON

Hi, I have a streaming source loading data into a raw table that has a string-type column (whose value is JSON) holding all the data. I want to use the colon sign operator to get fields from the JSON string. Will this have performance issues compared with using a sch...

Latest Reply
Brad
Contributor II
  • 1 kudos

Thanks Kaniz. Yes, I did some testing. With some schema, I read the same data source and wrote the parsing results to different tables. For 586K rows, the performance difference is 9 sec vs. 37 sec. For 2.3 million rows, it is 16 sec vs. 133 sec.

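Brad's figures work out to roughly 37 / 9 ≈ 4.1x at 586K rows and 133 / 16 ≈ 8.3x at 2.3 million rows between the two approaches. One plausible contributor, assuming each colon path expression parses the JSON string independently (an assumption about the engine, not something the thread confirms), is that extracting k fields costs about k parses per row, whereas a schema-based parse costs one. The pure-Python analogy below only counts json.loads calls; it is not Spark code, and it does not by itself explain why the gap widened as the row count grew.

```python
import json

class CountingParser:
    """Wraps json.loads and counts calls, standing in for per-row parse cost."""

    def __init__(self):
        self.calls = 0

    def loads(self, text):
        self.calls += 1
        return json.loads(text)

def extract_per_field(rows, fields, parser):
    # Analogy for path extraction: each requested field re-parses the string.
    return [{f: parser.loads(r).get(f) for f in fields} for r in rows]

def extract_with_schema(rows, fields, parser):
    # Analogy for a schema-based parse: one parse per row, then pick fields.
    out = []
    for r in rows:
        doc = parser.loads(r)
        out.append({f: doc.get(f) for f in fields})
    return out

# Hypothetical rows; field names are illustrative only.
rows = [json.dumps({"id": i, "name": f"p{i}", "price": i * 1.5}) for i in range(1_000)]
fields = ["id", "name", "price"]

per_field, schema = CountingParser(), CountingParser()
a = extract_per_field(rows, fields, per_field)
b = extract_with_schema(rows, fields, schema)
print(per_field.calls, schema.calls)  # 3000 1000
```

Both paths return identical results; only the number of parses differs.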
vemash
by New Contributor
  • 2044 Views
  • 1 replies
  • 0 kudos

How to create a Docker image to deploy and run in different environments in Databricks?

I am new to Databricks and am trying to implement the task below. Task: Once code merges to the main branch, the build succeeds in the CI pipeline, and all tests pass, a Docker build should start, create a Docker image, and push it to different environments (fro...

Latest Reply
MichTalebzadeh
Valued Contributor
  • 0 kudos

Hi, this is no different from building a Docker image for any other set of environments. Let us try a simple high-level CI/CD pipeline for building Docker images and deploying them to different environments. It works in all environments, including Databricks ...

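The pipeline in the reply (build after tests pass, then deploy to several environments) is often implemented as "build once, promote the same immutable image." The Python sketch below only computes image references and a promotion order; the registry name, image name, commit SHA, and environment list are hypothetical, and no Docker or Databricks API is called.

```python
from dataclasses import dataclass

ENVIRONMENTS = ["dev", "test", "prod"]  # hypothetical promotion order

@dataclass(frozen=True)
class Release:
    registry: str  # hypothetical registry host
    image: str     # hypothetical image name
    git_sha: str   # commit that passed CI

    @property
    def ref(self) -> str:
        # Tag by commit SHA so every environment runs the identical image.
        return f"{self.registry}/{self.image}:{self.git_sha[:12]}"

def promotion_plan(release: Release, tests_passed: bool, target: str):
    """Return (environment, image ref) pairs up to and including `target`.

    Returns an empty plan if CI tests failed, so nothing is promoted.
    """
    if not tests_passed:
        return []
    if target not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {target}")
    stop = ENVIRONMENTS.index(target) + 1
    return [(env, release.ref) for env in ENVIRONMENTS[:stop]]

rel = Release("myregistry.example.com", "etl-job", "3f9c2a7d81b04e6a9d2c")
for env, ref in promotion_plan(rel, tests_passed=True, target="test"):
    print(env, ref)
```

Tagging by commit rather than by environment keeps dev, test, and prod on byte-identical images; promotion then only changes which environment points at a given tag.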
Stellar
by New Contributor II
  • 1732 Views
  • 0 replies
  • 0 kudos

DLT DataPlane Error

Hi everyone, I am trying to build the pipeline, but when I run it I receive an error: DataPlaneException: Failed to start the DLT service on the cluster. Please check the driver logs for more details or contact Databricks support. This is from the driver ...

Surya0
by New Contributor III
  • 5242 Views
  • 3 replies
  • 0 kudos

Resolved! Unit hive-metastore.service not found

Hi everyone, I've encountered an issue while trying to use the Hive metastore in Databricks to create a new database and table for our latest use case. The specific command I used was "create database if not exists newDB". However, ...

Latest Reply
rakeshprasad1
New Contributor III
  • 0 kudos

@Surya0: I am facing the same issue. The stack trace is: Could not connect to address=(host=consolidated-northeuropec2-prod-metastore-2.mysql.database.azure.com)(port=3306)(type=master) : Socket fail to connect to host:consolidated-northeuropec2-prod-metast...

alexgv12
by New Contributor III
  • 1370 Views
  • 1 replies
  • 0 kudos

How to deploy SQL functions in a pool

We have some function definitions that we need to have available for our BI tools, e.g. CREATE FUNCTION CREATEDATE(year INT, month INT, day INT) RETURNS DATE RETURN make_date(year, month, day); how can we always have this function definition in our ...

Latest Reply
alexgv12
New Contributor III
  • 0 kudos

Looking at some alternatives with other Databricks components, I think a CI/CD process should be created where the view can be created through the Databricks API. https://docs.databricks.com/api/workspace/functions/create https://community.databr...

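In line with the CI/CD idea in the reply, one option is to keep function definitions in version control and have the pipeline emit idempotent CREATE OR REPLACE FUNCTION statements for a fixed catalog and schema, so every run leaves the same definitions in place. The sketch below only generates the SQL text; the catalog and schema names are hypothetical, and submitting the statements (for example through a SQL warehouse or the Functions API linked above) is left out.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SqlFunction:
    name: str
    params: str   # e.g. "year INT, month INT, day INT"
    returns: str  # e.g. "DATE"
    body: str     # SQL expression after RETURN

def render(fn: SqlFunction, catalog: str, schema: str) -> str:
    # CREATE OR REPLACE makes re-running the deployment idempotent.
    return (
        f"CREATE OR REPLACE FUNCTION {catalog}.{schema}.{fn.name}"
        f"({fn.params}) RETURNS {fn.returns} RETURN {fn.body};"
    )

# Definition taken from the post; the catalog/schema used below are hypothetical.
FUNCTIONS = [
    SqlFunction("CREATEDATE", "year INT, month INT, day INT", "DATE",
                "make_date(year, month, day)"),
]

for stmt in (render(f, "main", "bi_utils") for f in FUNCTIONS):
    print(stmt)
```

Registering the function in a shared schema, rather than per session, is what lets BI tools connecting later find it without re-running the definition.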
dbal
by New Contributor III
  • 5080 Views
  • 2 replies
  • 0 kudos

Resolved! Spark job task fails with "java.lang.NoClassDefFoundError: org/apache/spark/SparkContext$"

Hi. I am trying to run a Spark job in Databricks (Azure) using the JAR task type. I can't figure out why the job fails because it cannot find the SparkContext. Databricks Runtime: 14.3 LTS (includes Apache Spark 3.5.0, Scala 2.12). Error message: java.lang.NoCl...

Latest Reply
dbal
New Contributor III
  • 0 kudos

Update 2: I found the reason in the documentation. It is documented under "Access Mode" and is a limitation of the Shared access mode. Link: https://learn.microsoft.com/en-us/azure/databricks/compute/access-mode-limitations#spark-api-limitations...

Tam
by New Contributor III
  • 1715 Views
  • 1 replies
  • 0 kudos

TABLE_REDIRECTION_ERROR in AWS Athena After Databricks Upgrade to 14.3 LTS

I have a Databricks pipeline set up to create Delta tables on AWS S3, using Glue Catalog as the Metastore. I was able to query the Delta table via Athena successfully. However, after upgrading Databricks Cluster from 13.3 LTS to 14.3 LTS, I began enc...

Coders
by New Contributor II
  • 2583 Views
  • 1 replies
  • 0 kudos

How to perform a deep clone for data migration from one data lake to another?

I'm attempting to migrate data from Azure Data Lake to S3 using deep clone. The data in the source data lake is stored in Parquet format and partitioned. I've tried to follow the Databricks documentation, which suggests that I need to register ...

chakradhar545
by New Contributor
  • 979 Views
  • 0 replies
  • 0 kudos

DatabricksThrottledException Error

Hi, our scheduled job runs into the error below once in a while, and the job fails. Any leads or thoughts on why this happens intermittently and how to fix it? shaded.databricks.org.apache.hadoop.fs.s3a.DatabricksThrottledException: Instantiate s...

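The exception name suggests an S3 request was throttled, which would fit a failure that appears only once in a while. A common, generic mitigation for transient throttling is to retry with capped exponential backoff and jitter; Databricks job task retries are a coarser alternative. The sketch below is plain Python: ThrottledError is a hypothetical stand-in for the real exception, and the delay parameters are illustrative, not Databricks defaults.

```python
import random
import time

class ThrottledError(Exception):
    """Hypothetical stand-in for a transient throttling error."""

def retry_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=30.0,
                       retry_on=(ThrottledError,), sleep=time.sleep, rng=random.random):
    """Call fn(), retrying on `retry_on` with capped exponential backoff and full jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the original error
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(rng() * cap)  # full jitter spreads out retries from parallel tasks

# Usage: a call that is throttled twice, then succeeds.
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ThrottledError("slow down")
    return "ok"

delays = []
# rng pinned to 1.0 and sleep recorded so the schedule is visible: waits of 1.0 then 2.0.
result = retry_with_backoff(flaky, sleep=delays.append, rng=lambda: 1.0)
print(result, delays)  # ok [1.0, 2.0]
```

Injecting `sleep` and `rng` keeps the helper testable; in a real job the defaults apply and the waits are randomized.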
Poonam17
by New Contributor II
  • 1154 Views
  • 1 replies
  • 2 kudos

Not able to deploy cluster in Databricks Community Edition

Hello team, I am not able to launch a Databricks cluster in Community Edition; it gets terminated automatically. Can someone please help? Regards, Poonam

Latest Reply
kakalouk
New Contributor II
  • 2 kudos

I face the exact same problem. The message I get is: "Bootstrap Timeout: Node daemon ping timeout in 780000 ms for instance i-062042a9d4be8725e @ 10.172.197.194. Please check network connectivity between the data plane and the control plane."

