Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

J15S
by New Contributor III
  • 1758 Views
  • 4 replies
  • 4 kudos

RStudio on Databricks user experience

Is anybody actually using the RStudio app integration on Databricks? I'm surprised to find so little discussion in this forum. My team has been using it for about 3 months and it seems under-developed. 1) No automated backup, you have to do it yoursel...

Latest Reply
J15S
New Contributor III
  • 4 kudos

@Jonathan Dufault Thanks for the response, and glad I'm not alone. My problem (and this is probably just a preference thing) is that the 'reward' of using a full-fledged IDE is huge, compared to bouncing between notebooks in multiple tabs. The integ...

3 More Replies
Prototype998
by New Contributor III
  • 1240 Views
  • 0 replies
  • 0 kudos

Singleton Design Principle for pyspark database connector

A singleton is a design pattern that ensures that a class has only one instance, and provides a global access point to that instance. Here is an example of how you could implement a singleton d...

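The truncated post above describes the standard singleton pattern; a minimal sketch in plain Python might look like the following. The class name and JDBC URL are illustrative stand-ins, not the poster's actual connector:

```python
class DatabaseConnector:
    """Singleton: every instantiation returns the same object."""
    _instance = None

    def __new__(cls, *args, **kwargs):
        # Create the single instance on the first call, reuse it afterwards.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def __init__(self, jdbc_url="jdbc:sqlserver://example:1433"):
        # Guard so that re-running __init__ does not reset existing state.
        if not hasattr(self, "jdbc_url"):
            self.jdbc_url = jdbc_url


a = DatabaseConnector()
b = DatabaseConnector()
print(a is b)  # True: both names point to the same instance
```

In a Spark context the payoff is that expensive connection setup happens once per driver process, no matter how many code paths ask for a connector.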
Jfoxyyc
by Valued Contributor
  • 1968 Views
  • 2 replies
  • 2 kudos

How to use partial_parse.msgpack with workflow dbt task?

I'm looking for direction on how to get the dbt task in workflows to use the partial_parse.msgpack file to skip parsing files that haven't changed. I'm downloading my artifacts after each run and the partial_parse file is being saved back to ADLS. Wha...

Latest Reply
Debayan
Databricks Employee
  • 2 kudos

Hi, could you please confirm your expectation and the use case? Do you want the file to be saved somewhere else?

1 More Replies
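One common workaround for the question above (a sketch, not an official workflows feature; the paths and helper name are assumptions) is to restore the downloaded partial_parse.msgpack into the dbt project's target/ directory before dbt runs, since that is where dbt looks for it when deciding whether it can skip re-parsing unchanged files:

```python
import shutil
import tempfile
from pathlib import Path

def restore_partial_parse(downloaded: Path, dbt_project_dir: Path) -> Path:
    """Copy a previously saved partial_parse.msgpack into the dbt
    target/ directory so the next dbt invocation can reuse it."""
    target = dbt_project_dir / "target"
    target.mkdir(parents=True, exist_ok=True)
    dest = target / "partial_parse.msgpack"
    shutil.copyfile(downloaded, dest)
    return dest

# Demo with temporary paths; in practice `downloaded` would be the file
# pulled back from ADLS and `dbt_project_dir` the project root the
# workflow task executes in.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    saved = tmp / "partial_parse.msgpack"
    saved.write_bytes(b"\x00")  # stand-in for the real artifact
    dest = restore_partial_parse(saved, tmp / "my_dbt_project")
    print(dest.exists())  # True
```

Whether the managed dbt task exposes a hook to run such a step before parsing is exactly the open question in this thread.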
KVNARK
by Honored Contributor II
  • 3243 Views
  • 4 replies
  • 6 kudos

Resolved! Connecting Azure Synapse through Databricks notebooks

Hi All, happy new year! How can we connect to an Azure Synapse serverless SQL pool through Databricks notebooks and execute DDLs?

Latest Reply
daniel_sahal
Esteemed Contributor
  • 6 kudos

@KVNARK https://joeho.xyz/blog-posts/how-to-connect-to-azure-synapse-in-azure-databricks/

3 More Replies
APol
by New Contributor II
  • 2655 Views
  • 2 replies
  • 2 kudos

Read/Write concurrency issue

Hi. I assume it may be a concurrency issue (a read thread from Databricks and a write thread from another system). From the start: I read 12-16 CSV files (approximately 250 MB each) into a dataframe. df = spark.read.option("header", "False").opti...

Latest Reply
FerArribas
Contributor
  • 2 kudos

Hi @Anastasiia Polianska, I agree, it looks like a concurrency issue. Very possibly this concurrency problem is caused by an erroneous ETag in the HTTP call to the Azure Storage API (https://azure.microsoft.com/de-de/blog/managing-concurrency-in...

1 More Replies
maddy_081063
by New Contributor II
  • 4168 Views
  • 2 replies
  • 4 kudos
Latest Reply
FerArribas
Contributor
  • 4 kudos

Hi @maddy v, I recommend that you use the Databricks SQL module for this type of report and email alert. It is a very interesting module with multiple options for your use case. https://learn.microsoft.com/en-us/azure/databricks/sql/user/dashboards...

1 More Replies
pvm26042000
by New Contributor III
  • 952 Views
  • 1 replies
  • 3 kudos

Spark SQL & Spark ML

I am using Spark SQL to import data into a machine learning pipeline. Once the data is imported, I want to perform machine learning tasks using Spark ML. Which compute tools are best suited for this use case? Please help me! Thank you ...

Latest Reply
Debayan
Databricks Employee
  • 3 kudos

Hi, please refer to https://docs.databricks.com/machine-learning/index.html and let us know if this helps.

pvm26042000
by New Contributor III
  • 926 Views
  • 1 replies
  • 2 kudos

I am using Spark SQL to import data into a machine learning pipeline. Once the data is imported, I want to perform machine learning tasks using Spark...

I am using Spark SQL to import data into a machine learning pipeline. Once the data is imported, I want to perform machine learning tasks using Spark ML. Which compute tools are best suited for this use case? Please help me! Thank y...

Latest Reply
Debayan
Databricks Employee
  • 2 kudos

Hi, please refer to https://docs.databricks.com/machine-learning/index.html and let us know if this helps.

vishallakha
by New Contributor II
  • 1166 Views
  • 1 replies
  • 2 kudos

How to Enable Files in Repos in DBR 7.3 LTS ML ?

We need a custom version of a GPU cluster with the following requirements for a certain project: Ubuntu 18.04, CUDA 10.1, Tesla T4 GPU, and availability of the /Workspace/Repos folder. All of these requirements are available with DBR ML 7.3 LTS. But one critical compo...

Latest Reply
Debayan
Databricks Employee
  • 2 kudos

Hi, to work with non-notebook files in Databricks Repos, you must be running Databricks Runtime 8.4 or above. https://docs.databricks.com/files/workspace.html#configure-support-for-workspace-files

Azure_databric1
by New Contributor II
  • 1839 Views
  • 1 replies
  • 2 kudos

How to find the road distance between two cities using Azure Databricks and Azure Maps?

We will be given an Excel file containing the columns sender_city and destination_city. We have to find the distance between these two cities and write the calculated distance to a total_distance column. All these processes should be...

Latest Reply
sher
Valued Contributor II
  • 2 kudos

Hey, without using latitude and longitude it is hard to find, but you can try a distance-based algorithm.

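As a starting point for the distance-based approach suggested in the reply above, here is a minimal great-circle (haversine) sketch in plain Python. It assumes the city-to-coordinate lookup (e.g. via Azure Maps geocoding) has already happened, and note that actual road distance will be longer than this straight-line estimate:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    # Haversine formula; 6371 km is the mean Earth radius.
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

# Example: roughly Berlin -> Paris (coordinates are approximate)
print(round(haversine_km(52.52, 13.405, 48.8566, 2.3522)), "km")
```

Wrapped in a pandas UDF, this could populate the total_distance column directly from the Excel-derived dataframe once each city has coordinates.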
Benji0934
by New Contributor II
  • 2171 Views
  • 2 replies
  • 3 kudos

Auto Loader: Empty fields (discovery_time, commit_time, archive_time) in cloud_files_state

Hi! Why are the fields discovery_time, commit_time, and archive_time NULL in cloud_files_state? Do I need to configure anything when creating my Auto Loader? df = spark.readStream.format("cloudFiles") \ .option("cloudFiles.format", "json") \ ...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 3 kudos

Please be sure that the DBR version is 10.5 or higher. commit_time and archive_time can be null, but discovery_time is set as NOT NULL in the table definition, so it is a bit strange. Please change the DBR version first.

1 More Replies
Juhani
by New Contributor II
  • 2449 Views
  • 3 replies
  • 4 kudos

Resolved! Bug in Delta Live Tables when missing files option?

When using Delta Live Tables with SQL syntax, the ignoreMissingFiles option is not working and you get an error anyway (see picture below). Link to the feature: https://learn.microsoft.com/en-us/azure/databricks/ingestion/auto-loader/options#generic-option...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 4 kudos

You could also use inferSchema. The ignoreMissingFiles option handles files that were accidentally deleted before being fully processed, so it is not related to the schema.

2 More Replies
karthik_elavan
by New Contributor II
  • 4128 Views
  • 3 replies
  • 2 kudos

Azure Databricks New Job Cluster Libraries Installation Issues

Dear Team, we are trying to install runtime libraries from Azure Data Factory on Azure Databricks via a linked service, and we are using a New Job Cluster to spin up the notebooks that execute the Python code. We are using a third-party library, prophet...

Latest Reply
ramravi
Contributor II
  • 2 kudos

Init scripts are a great way to handle dependent library installation on a cluster. https://stackoverflow.com/questions/62516102/install-python-packages-using-init-scripts-in-a-databricks-cluster

2 More Replies
Raghu101
by New Contributor III
  • 2748 Views
  • 3 replies
  • 4 kudos

How to execute Windows commands (.cmd file) from Databricks?


Latest Reply
ramravi
Contributor II
  • 4 kudos

Databricks runs on Linux servers, so you cannot launch Windows cmd or Windows shell commands in Databricks; you can run only Linux commands.

2 More Replies
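To illustrate the reply above: since cluster nodes run Linux, shell steps are typically done with the %sh notebook magic or, from Python, via subprocess with Linux commands. A minimal sketch (the command itself is just an example):

```python
import subprocess

# Linux commands work; Windows .cmd files will not, because the cluster
# nodes run Linux. Run a shell command and capture its output.
result = subprocess.run(
    ["echo", "hello from linux"],
    capture_output=True, text=True, check=True,
)
print(result.stdout.strip())  # hello from linux
```

Anything that genuinely requires Windows (e.g. an existing .cmd script) would have to be ported to a shell/Python equivalent or run on a separate Windows host outside Databricks.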
