Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

rt-slowth
by Contributor
  • 894 Views
  • 0 replies
  • 0 kudos

How to write test code in databricks

    from databricks.connect import DatabricksSession
    from data.dbx_conn_info import DbxConnInfo

    class SparkSessionManager:
        _instance = None
        _spark = None

        def __new__(cls):
            if cls._instance is None:
                cls._instance = s...

User16789201666
by Contributor II
  • 7239 Views
  • 3 replies
  • 4 kudos
Latest Reply
arun_pamulapati
Contributor
  • 4 kudos

Use Lakehouse Monitoring: https://docs.databricks.com/en/lakehouse-monitoring/index.html
Specifically: https://docs.databricks.com/en/lakehouse-monitoring/monitor-output.html#drift-metrics-table

2 More Replies
MarcinO
by New Contributor II
  • 4028 Views
  • 2 replies
  • 2 kudos

InputWidgetNotDefined exception when running a notebook as a job

I have a notebook that reads the value of a text input in a Scala command:
var startTimeStr = dbutils.widgets.get("Run Date")
What doesn't make any sense is that this notebook fails with an InputWidgetNotDefined error when scheduled as a job, but works j...

Latest Reply
berserkersap
Contributor
  • 2 kudos

Have you called dbutils.widgets.text() before dbutils.widgets.get()?
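
The check suggested in this reply can be sketched in Python as follows. This is a minimal sketch, assuming the notebook's `dbutils` object is passed in explicitly; `get_widget_value` and its empty default are hypothetical names chosen for illustration. The original question uses Scala, where the widget calls have the same names.

```python
def get_widget_value(dbutils, name, default=""):
    """Define a text widget, then read its current value.

    `dbutils` is the object Databricks provides in notebooks; it is passed in
    here (an assumption of this sketch) so the helper can be exercised
    outside a notebook as well.
    """
    # Defining the widget first means get() has something to read even when
    # the run supplies no value for it; in that case the default comes back.
    dbutils.widgets.text(name, default)
    return dbutils.widgets.get(name)
```

In a notebook this would be called as `get_widget_value(dbutils, "Run Date")`. Whether this resolves the job failure from the original post is not confirmed in the excerpt shown here.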

1 More Replies
samye760
by New Contributor
  • 1203 Views
  • 0 replies
  • 0 kudos

Job Retry Wait Policy and Cluster Shutdown

Hi all, I have a Databricks Workflow job in which the final task makes an external API call. Sometimes this API is overloaded and the call fails. In the spirit of automation, I want this task to retry the call an hour later if it fails in the...

Data Engineering
clusters
jobs
retries
Workflows
js54123875
by New Contributor III
  • 9133 Views
  • 4 replies
  • 1 kudos

Connection to Azure SQL Server: ODBC Driver 18 for SQL Server

Task: Set up a connection to Azure SQL Server. A couple of things have changed...
* We've started using Unity Catalog, so we need Unity Catalog-enabled clusters
* Legacy init scripts have been deprecated, and this is how we had our pyodbc setup, etc. defined.
Code...

Latest Reply
diego_poggioli
Contributor
  • 1 kudos

Hi @js54123875, did you manage to find a solution for this? I'm facing a similar problem.
Thanks,
Diego

3 More Replies
RobiTakToRobi
by New Contributor II
  • 1469 Views
  • 1 reply
  • 1 kudos

How to allow non-ASCII characters to be stored in the view definition?

I've tried to create a view with a simple conditional statement containing Polish characters. The view is created without errors, but a select on the view returns question marks in place of the non-ASCII characters. Why? How can I fix it? Below on screens ...

example view_text
Latest Reply
andreas7891
New Contributor II
  • 1 kudos

Any solutions on this? We have the same problem with Greek characters.

YS1
by Contributor
  • 2604 Views
  • 2 replies
  • 2 kudos

Live dashboard

Hello, I have a streaming dataset (I used Delta Live Tables), and I want to create a live dashboard that shows changes instantly, without the need to query the table at a fixed interval or to refresh. What would be the best solution...

1 More Replies
User16826994223
by Honored Contributor III
  • 4726 Views
  • 3 replies
  • 2 kudos

How to attach multiple libraries to a cluster terraform in Databricks

I'm currently trying to attach more than one Maven artifact to my Terraform configuration of a cluster. How can I add more than one artifact to my Terraform configuration?

Latest Reply
Simranarora
New Contributor III
  • 2 kudos

Hi @KunalGaurav, this can be done by using a dynamic configuration block inside your databricks_cluster resource definition. In variable.tf, make a library block as:
variable "listOfMavenPackages" { type = list(string) default = [ "com.google.gua...

2 More Replies
Igor_100
by New Contributor
  • 5317 Views
  • 2 replies
  • 0 kudos

WORKSPACE IP RANGE

Hello, everybody! I need to know the IP range of my Azure Databricks workspace. My region is East US. Can anyone help me?

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

This is located in the Azure portal (I hope you have access to it), in your Databricks workspace settings. There you have 'virtual network' and 'private subnet name'. If you click on these, you get the address range (in CIDR notation, you can do a web ...
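
For reference, a CIDR block can be expanded into its first and last address with Python's standard `ipaddress` module. This is a small sketch; `cidr_to_range` is a hypothetical helper, and the CIDR value used is a made-up example, not the range of any actual workspace.

```python
import ipaddress


def cidr_to_range(cidr):
    """Return (first_address, last_address, address_count) for a CIDR block."""
    network = ipaddress.ip_network(cidr, strict=True)
    return str(network[0]), str(network[-1]), network.num_addresses


# Hypothetical subnet value as it might appear in the portal; replace with your own.
first, last, count = cidr_to_range("10.139.64.0/18")
print(f"{first} - {last} ({count} addresses)")
```

With `strict=True`, a value that has host bits set (for example `10.139.64.5/18`) raises a `ValueError`, which helps catch copy-and-paste mistakes.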

1 More Replies
Sambit_S
by New Contributor III
  • 1075 Views
  • 2 replies
  • 0 kudos

Failed_to_convert_the_JSON_string_'interval_day_to_second'_to_a_data_type

I am trying to access a Delta Share table that has a field of data type interval day to second, shown below.
Sample data in the table:
Accessing the above table through Delta Share gives the error below.
Any help in resolving this issue will be appreci...

Sambit_S_0-1696512262555.png Sambit_S_1-1696512337390.png Sambit_S_2-1696512405641.png
Latest Reply
jose_gonzalez
Moderator
  • 0 kudos

Hi @Sambit_S, did you have time to check Kaniz's response? Could you please verify it and let us know if you still need help?

1 More Replies
rt-slowth
by Contributor
  • 763 Views
  • 0 replies
  • 0 kudos

Notebook of Databricks's result

If there is no data abnormality in Redshift, but the data suddenly decreases when connecting to Redshift from Spark on a shared cluster in Databricks, what causes should I check? Also, is there any way to check the values of widgets or variables in the code on each execution?

RantoB
by Valued Contributor
  • 20632 Views
  • 7 replies
  • 4 kudos

Resolved! read csv directly from url with pyspark

I would like to load a CSV file directly into a Spark DataFrame in Databricks. I tried the following code:
url = "https://opendata.reseaux-energies.fr/explore/dataset/eco2mix-national-tr/download/?format=csv&timezone=Europe/Berlin&lang=fr&use_labels_fo...

Latest Reply
MartinIsti
New Contributor III
  • 4 kudos

I know it's a two-year-old thread, but I needed to find a solution to this very thing today. I had one notebook using SparkContext:
from pyspark import SparkFiles
from pyspark.sql.functions import *
sc.addFile(url)
But according to the runtime 14 release n...
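
One way to avoid relying on `sc.addFile` is to download the file with Python's standard library and then point Spark at the local copy. The sketch below is an illustration under assumptions, not the solution from the thread: `download_to_local` is a hypothetical helper, the `/tmp` path is a placeholder, and whether Spark can read the resulting path depends on the cluster's access mode.

```python
import shutil
import urllib.request
from pathlib import Path


def download_to_local(url, dest_path):
    """Stream the content at `url` into `dest_path` and return the path."""
    dest = Path(dest_path)
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Copy in chunks so a large CSV is never held fully in memory.
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return str(dest)


# In a notebook (not executed here; `spark` and `url` come from the notebook,
# and the destination path is a placeholder):
# local_path = download_to_local(url, "/tmp/downloaded.csv")
# df = spark.read.csv(f"file:{local_path}", header=True)
```

The delimiter and header options passed to `spark.read.csv` would need to match the actual file; they are not stated in the thread excerpt.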

6 More Replies
vlado101
by New Contributor II
  • 3302 Views
  • 1 reply
  • 1 kudos

Resolved! ANALYZE TABLE is not updating columns stats

Hello everyone, I am having an issue when running "ANALYZE TABLE COMPUTE STATISTICS FOR ALL COLUMNS". The way I understand it, this should update the min/max value for a column when you run it for all columns or for one column. One way to verify it from what I ...

Latest Reply
Priyanka_Biswas
Valued Contributor
  • 1 kudos

Hello @vlado101, the ANALYZE TABLE COMPUTE STATISTICS FOR ALL COLUMNS command in Databricks computes statistics for all columns of a table. This information is persisted in the metastore and helps the query optimizer make decisions such as ...
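
As a sketch of how the command and a follow-up check might be issued from a notebook, the helpers below only build the SQL strings. The function names, `my_schema.my_table`, and `my_column` are placeholders introduced for illustration. `DESCRIBE TABLE EXTENDED <table> <column>` is the Spark SQL statement that displays column-level statistics such as min and max once they have been collected.

```python
def analyze_all_columns_sql(table):
    """Build the statement that collects statistics for every column."""
    return f"ANALYZE TABLE {table} COMPUTE STATISTICS FOR ALL COLUMNS"


def describe_column_sql(table, column):
    """Build the statement that displays one column's metadata and statistics."""
    return f"DESCRIBE TABLE EXTENDED {table} {column}"


# In a notebook (not executed here; `spark` is provided by Databricks and the
# table and column names are placeholders):
# spark.sql(analyze_all_columns_sql("my_schema.my_table"))
# spark.sql(describe_column_sql("my_schema.my_table", "my_column")).show(truncate=False)
```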

Hubert-Dudek
by Esteemed Contributor III
  • 1364 Views
  • 1 reply
  • 1 kudos

Structured Streaming Aggregation

Utilizing structured streaming to read the change data feed from your Delta table empowers you to execute incremental streaming aggregations, such as counting and summing.
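
A minimal sketch of the idea, assuming a Delta table with the change data feed enabled. `count_changes_by_type` and the table name are hypothetical, and this is not the code from the attached image. It counts change events per `_change_type`, one of the metadata columns the change data feed adds to each row.

```python
def count_changes_by_type(spark, table_name):
    """Return a streaming DataFrame counting change events per change type.

    `spark` is an active SparkSession; `table_name` must refer to a Delta
    table whose change data feed is enabled (an assumption of this sketch).
    """
    changes = (
        spark.readStream
        .option("readChangeFeed", "true")  # read row-level changes, not the table snapshot
        .table(table_name)
    )
    # A running count per change type, e.g. insert, delete, update_postimage.
    return changes.groupBy("_change_type").count()


# In a notebook (not executed here; the table name is a placeholder):
# counts = count_changes_by_type(spark, "my_schema.my_table")
```

When the result is written out with `writeStream`, a streaming aggregation like this one needs an output mode such as `complete` or `update`.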

structured2.png
Latest Reply
jose_gonzalez
Moderator
  • 1 kudos

Thank you for sharing @Hubert-Dudek !!!

Kayla
by Valued Contributor
  • 3625 Views
  • 2 replies
  • 2 kudos

Resolved! Paramiko SFTP Get fails on databricks file system

I have an SFTP server that I need to routinely download Excel files from and put them into GCP Cloud Storage buckets. Every variation of the file path, to either my GCP path or just the built-in DBFS file system, gives an error of "[Errno 2] No such file or d...

Latest Reply
jose_gonzalez
Moderator
  • 2 kudos

Thank you for sharing the solution. Many more users will find this information very useful. 

1 More Replies
