Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

rt-slowth
by Contributor
  • 864 Views
  • 1 reply
  • 0 kudos

How to write test code in Databricks

    from databricks.connect import DatabricksSession
    from data.dbx_conn_info import DbxConnInfo

    class SparkSessionManager:
        _instance = None
        _spark = None

        def __new__(cls):
            if cls._instance is None:
                cls._instance = s...
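The class is truncated above; assuming it implements the standard Python singleton pattern (everything past the truncation, including the spark property, is an assumption), a complete sketch could look like this:

    from databricks.connect import DatabricksSession

    class SparkSessionManager:
        """Singleton holder for a Databricks Connect session (sketch)."""
        _instance = None
        _spark = None

        def __new__(cls):
            # standard singleton: every caller gets the same instance
            if cls._instance is None:
                cls._instance = super().__new__(cls)
            return cls._instance

        @property
        def spark(self):
            # create the remote Spark session lazily, on first access
            if self._spark is None:
                self._spark = DatabricksSession.builder.getOrCreate()
            return self._spark

A wrapper like this can then be swapped for a local SparkSession or monkeypatched in unit tests, which fits the question's testing angle.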

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @rt-slowth, the databricks-connect module is not included by default in Python installations, so you must ensure it is installed before your code can import it. You can install databricks-connect using pip. Please run the following command in y...
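The rest of the command is cut off above; the usual pip invocation, with the version pinned to match the cluster's Databricks Runtime (the version shown is only illustrative), is:

    # install the Databricks Connect client; pin the version to your cluster's runtime
    pip install "databricks-connect==14.3.*"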

User16789201666
by Contributor II
  • 6701 Views
  • 3 replies
  • 4 kudos
Latest Reply
arun_pamulapati
New Contributor III
  • 4 kudos

Use Lakehouse Monitoring:  https://docs.databricks.com/en/lakehouse-monitoring/index.html Specifically:  https://docs.databricks.com/en/lakehouse-monitoring/monitor-output.html#drift-metrics-table

2 More Replies
MarcinO
by New Contributor II
  • 3708 Views
  • 3 replies
  • 3 kudos

InputWidgetNotDefined exception when running a notebook as a job

I have a notebook that reads the value of a text input in a Scala command:

    var startTimeStr = dbutils.widgets.get("Run Date")

What doesn't make any sense is that this notebook fails with an InputWidgetNotDefined error when scheduled as a job, but works j...

Latest Reply
berserkersap
Contributor
  • 3 kudos

Have you used dbutils.widgets.text() before dbutils.widgets.get()?
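For reference, the define-then-read ordering looks like this in a Python cell (the default value is illustrative; the Scala calls are analogous):

    # create the widget before reading it, so the notebook also works when scheduled as a job
    dbutils.widgets.text("Run Date", "2024-01-01")
    start_time_str = dbutils.widgets.get("Run Date")

When the notebook runs as a job, a job parameter named "Run Date" overrides the widget's default.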

2 More Replies
samye760
by New Contributor
  • 1036 Views
  • 0 replies
  • 0 kudos

Job Retry Wait Policy and Cluster Shutdown

Hi all, I have a Databricks Workflow job in which the final task makes an external API call. Sometimes this API will be overloaded and the call will fail. In the spirit of automation, I want this task to retry the call an hour later if it fails in the...

Data Engineering
clusters
jobs
retries
Workflows
js54123875
by New Contributor III
  • 8578 Views
  • 5 replies
  • 1 kudos

Connection to Azure SQL Server: ODBC Driver 18 for SQL Server

Task: set up a connection to Azure SQL Server. A couple of things have changed...
* We've started using Unity Catalog, so we need Unity Catalog-enabled clusters.
* Legacy init scripts have been deprecated, and that is how we had our pyodbc setup, etc. defined.
Code...

Latest Reply
diego_poggioli
Contributor
  • 1 kudos

Hi @js54123875, did you manage to find a solution for this? I'm facing a similar problem. Thanks, Diego

4 More Replies
RobiTakToRobi
by New Contributor II
  • 1373 Views
  • 1 reply
  • 1 kudos

How to allow non-ASCII characters to be stored in the view definition?

I've tried to create a view with a simple conditional statement containing Polish characters. The view is created without errors, but a select on the view returns question marks in place of the non-ASCII characters. Why? How can I fix it? Below on screens ...

Latest Reply
andreas7891
New Contributor II
  • 1 kudos

Any solutions on this? We have the same problem with Greek characters.

YS1
by Contributor
  • 2252 Views
  • 3 replies
  • 3 kudos

Live dashboard

Hello, I have a streaming dataset (I used Delta Live Tables), and I want to create a live dashboard that shows changes instantly, without having to query the table on a fixed schedule or refresh manually. What would be the best solution...

Latest Reply
Kaniz_Fatma
Community Manager
  • 3 kudos

Hi @YS1, Databricks Delta Lake and Delta Live Tables can be used to create a live dashboard for visualizing changes in streaming data.
- Delta Live Tables are designed for growing datasets and handle each row only once.
- Delta Lake can be used as a s...
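One simple way to get a continuously refreshing view inside a notebook, sketched under the assumption that the DLT output is readable as a streaming table (the table name is hypothetical):

    # display() on a streaming DataFrame keeps updating as new rows arrive,
    # so no manual refresh or polling is needed
    df = spark.readStream.table("catalog.schema.my_dlt_output")
    display(df)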

2 More Replies
User16826994223
by Honored Contributor III
  • 4525 Views
  • 3 replies
  • 2 kudos

How to attach multiple libraries to a cluster terraform in Databricks

I'm currently trying to attach more than one Maven artifact to a cluster in my Terraform configuration. How can I add more than one artifact to it?

Latest Reply
Simranarora
New Contributor III
  • 2 kudos

Hi @KunalGaurav, this can be done by using a dynamic configuration block inside your databricks_cluster resource definition. In variable.tf, make a library block as:

    variable "listOfMavenPackages" {
      type    = list(string)
      default = [
        "com.google.gua...

2 More Replies
Igor_100
by New Contributor
  • 4976 Views
  • 2 replies
  • 0 kudos

WORKSPACE IP RANGE

Hello, everybody! I need to know what the IP range of my Azure Databricks workspace is. My region is East US. Can anyone help me?

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

This is located in the Azure portal (I hope you have access to it), in your Databricks workspace settings. There you have 'virtual network' and 'private subnet name'. If you click on these, you get the address range (in CIDR notation; you can do a web ...

1 More Replies
Sambit_S
by New Contributor III
  • 976 Views
  • 3 replies
  • 0 kudos

Failed to convert the JSON string 'interval day to second' to a data type

I am trying to access a Delta Sharing table which has a field of data type interval day to second. Sample data in the table and the resulting error are shown in the attached screenshots. Any help in resolving this issue will be appreci...

Latest Reply
jose_gonzalez
Moderator
  • 0 kudos

Hi @Sambit_S, did you have time to check Kaniz's response? Could you please verify it and let us know if you still need help?

2 More Replies
RantoB
by Valued Contributor
  • 19787 Views
  • 8 replies
  • 4 kudos

Resolved! read csv directly from url with pyspark

I would like to load a CSV file directly into a Spark DataFrame in Databricks. I tried the following code:

    url = "https://opendata.reseaux-energies.fr/explore/dataset/eco2mix-national-tr/download/?format=csv&timezone=Europe/Berlin&lang=fr&use_labels_fo...

Latest Reply
MartinIsti
New Contributor III
  • 4 kudos

I know it's a 2-year-old thread, but I needed to find a solution to this very thing today. I had one notebook using SparkContext:

    from pyspark import SparkFiles
    from pyspark.sql.functions import *

    sc.addFile(url)

But according to the Runtime 14 release n...
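The reply above suggests the SparkContext route changed in Runtime 14. A simple fallback that avoids sc entirely (a sketch; it assumes the file fits in driver memory) is to fetch the CSV with pandas and convert:

    import pandas as pd

    # read the CSV over HTTP on the driver, then hand the result to Spark
    pdf = pd.read_csv(url)  # url as defined in the question
    df = spark.createDataFrame(pdf)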

7 More Replies
vlado101
by New Contributor II
  • 3065 Views
  • 1 reply
  • 1 kudos

Resolved! ANALYZE TABLE is not updating columns stats

Hello everyone, I am having an issue when running "ANALYZE TABLE COMPUTE STATISTICS FOR ALL COLUMNS". The way I understand it, this should update the min/max values for a column when you run it for all columns or for a single one. One way to verify it, from what I ...

Latest Reply
Priyanka_Biswas
Valued Contributor
  • 1 kudos

Hello @vlado101, the ANALYZE TABLE COMPUTE STATISTICS FOR ALL COLUMNS command in Databricks is used to compute statistics for all columns of a table. This information is persisted in the metastore and helps the query optimizer make decisions such as ...
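For reference, a minimal way to recompute and then inspect column statistics (table and column names are placeholders):

    # recompute statistics for every column, then check the stored min/max for one of them
    spark.sql("ANALYZE TABLE my_table COMPUTE STATISTICS FOR ALL COLUMNS")
    spark.sql("DESCRIBE EXTENDED my_table my_column").show()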

Hubert-Dudek
by Esteemed Contributor III
  • 1277 Views
  • 1 reply
  • 2 kudos

foreachBatch

With parameterized SQL queries in Structured Streaming's foreachBatch, there's no longer a need to create temp views for the MERGE command.
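The attached screenshot is not reproduced here; below is a minimal sketch of the pattern, assuming DBR 13+/Spark 3.4+, where spark.sql() accepts DataFrame arguments via {} placeholders (all table and column names are hypothetical):

    def upsert_batch(batch_df, batch_id):
        # {updates} binds the micro-batch DataFrame directly into the query,
        # so no temp view is needed for the MERGE
        batch_df.sparkSession.sql(
            """
            MERGE INTO target AS t
            USING {updates} AS u
            ON t.id = u.id
            WHEN MATCHED THEN UPDATE SET *
            WHEN NOT MATCHED THEN INSERT *
            """,
            updates=batch_df,
        )

    (spark.readStream.table("source_table")
          .writeStream
          .foreachBatch(upsert_batch)
          .option("checkpointLocation", "/tmp/checkpoints/upsert")  # illustrative path
          .start())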

Latest Reply
jose_gonzalez
Moderator
  • 2 kudos

Thank you for sharing the valuable information @Hubert-Dudek 

Hubert-Dudek
by Esteemed Contributor III
  • 1298 Views
  • 1 reply
  • 1 kudos

Structured Streaming Aggregation

Using Structured Streaming to read the change data feed from your Delta table lets you run incremental streaming aggregations, such as counts and sums.
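The attached screenshot is not reproduced here; below is a minimal sketch, assuming a Delta table with change data feed enabled (all names are hypothetical, and the filter is a simplification that ignores deletes):

    from pyspark.sql import functions as F

    # stream the change feed rather than the full table
    cdf = (spark.readStream
                .option("readChangeFeed", "true")
                .table("catalog.schema.orders"))

    # incrementally count and sum over inserted and updated rows
    totals = (cdf.filter(F.col("_change_type").isin("insert", "update_postimage"))
                 .groupBy("customer_id")
                 .agg(F.count("*").alias("order_count"),
                      F.sum("amount").alias("order_total")))

    (totals.writeStream
           .outputMode("complete")
           .queryName("order_totals")
           .format("memory")  # sink choice is illustrative
           .start())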

Latest Reply
jose_gonzalez
Moderator
  • 1 kudos

Thank you for sharing @Hubert-Dudek !!!

Kayla
by Valued Contributor
  • 3397 Views
  • 3 replies
  • 2 kudos

Resolved! Paramiko SFTP Get fails on databricks file system

I have an SFTP server I need to routinely download Excel files from and put into GCP cloud storage buckets. Every variation of the file path, to either my GCP path or just the built-in DBFS file system, gives an error of "[Errno 2] No such file or d...

Latest Reply
jose_gonzalez
Moderator
  • 2 kudos

Thank you for sharing the solution. Many more users will find this information very useful. 
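The accepted solution is truncated out of this excerpt. A common cause of [Errno 2] in this situation is handing Paramiko a dbfs:/ or gs:// URI as its local path: Paramiko writes with ordinary Python file I/O, which only understands local paths. A sketch of the usual workaround (host, credentials, and paths are placeholders):

    import paramiko

    transport = paramiko.Transport(("sftp.example.com", 22))
    transport.connect(username="user", password="secret")
    sftp = paramiko.SFTPClient.from_transport(transport)

    # download to the driver's local disk first...
    sftp.get("/remote/report.xlsx", "/tmp/report.xlsx")
    sftp.close()
    transport.close()

    # ...then copy the local file to cloud storage
    dbutils.fs.cp("file:/tmp/report.xlsx", "gs://my-bucket/report.xlsx")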

2 More Replies

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group