Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

by pawelzak, New Contributor III
  • 2809 Views
  • 2 replies
  • 1 kudos

Dashboard update through API

Hi, I would like to create/update a dashboard definition based on a JSON file. How can one do it? I tried the following: databricks api post /api/2.0/preview/sql/dashboards/$dashboard_id --json @file.json  But it does not update the widgets...How can...

Latest Reply
Gamlet
New Contributor II
  • 1 kudos

To programmatically create/update dashboards in Databricks using a JSON file, you can use the Databricks REST API's workspace/export and workspace/import endpoints. Generate a JSON representation of your dashboard using workspace/export, modify it as...
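For illustration, a minimal sketch of that export/modify/import round trip, assuming the dashboard lives at a workspace path (the path and file name below are placeholders; /api/2.0/workspace/export and /api/2.0/workspace/import are the endpoints the reply refers to):

    import base64, json, os, requests

    host = os.environ["DATABRICKS_HOST"]
    headers = {"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"}

    # Export the dashboard definition as base64-encoded JSON.
    path = "/Workspace/Dashboards/my_dashboard.lvdash.json"  # placeholder path
    resp = requests.get(f"{host}/api/2.0/workspace/export",
                        headers=headers, params={"path": path, "format": "AUTO"})
    definition = json.loads(base64.b64decode(resp.json()["content"]))

    # ... modify the widget definitions in `definition` here ...

    # Import it back, overwriting the existing dashboard.
    requests.post(f"{host}/api/2.0/workspace/import", headers=headers,
                  json={"path": path, "format": "AUTO", "overwrite": True,
                        "content": base64.b64encode(json.dumps(definition).encode()).decode()})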

1 More Replies
by israelst, New Contributor III
  • 3332 Views
  • 3 replies
  • 1 kudos

Structured Streaming schema inference

I want to stream data from Kinesis using DLT. The data is in JSON format. How can I use Structured Streaming to automatically infer the schema? I know Auto Loader has this feature, but it doesn't make sense for me to use Auto Loader since my data is st...

Latest Reply
israelst
New Contributor III
  • 1 kudos

I wanted to use Databricks for this. I don't want to depend on AWS Glue, the same way I could do it with Auto Loader...
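One way to approximate schema inference with the Kinesis source (a sketch; the stream name, region, and sample path are placeholder assumptions) is to infer the schema once from a batch sample of the same JSON, then apply it with from_json:

    from pyspark.sql.functions import col, from_json

    # Infer the schema once from a representative batch sample of the same JSON.
    schema = spark.read.json("s3://my-bucket/kinesis-sample/").schema  # placeholder path

    raw = (spark.readStream.format("kinesis")
           .option("streamName", "my-stream")   # placeholder
           .option("region", "us-east-1")       # placeholder
           .option("initialPosition", "latest")
           .load())

    # Kinesis records arrive as binary; cast to string and parse with the schema.
    parsed = (raw.select(from_json(col("data").cast("string"), schema).alias("payload"))
                 .select("payload.*"))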

2 More Replies
by seefoods, New Contributor III
  • 1994 Views
  • 3 replies
  • 1 kudos

Resolved! Cluster metrics on Databricks Runtime 13.1

Hello everyone, how can I collect the metrics provided by the cluster metrics UI in Databricks Runtime 13.1 using a bash script?

Latest Reply
User16539034020
Databricks Employee
  • 1 kudos

Hi @Aubert: Currently, you can only use static downloadable snapshots. See https://docs.databricks.com/en/compute/cluster-metrics.html Regards,

2 More Replies
by Simha, New Contributor II
  • 2188 Views
  • 1 reply
  • 1 kudos

How to write only a single file to Blob or ADLS from Databricks?

Hi All, I am trying to write a CSV file to Blob and ADLS from a Databricks notebook using PySpark, but a separate folder is created with the given filename and a partition file is created within the folder. I want only the file to be written. Can anyone...

Latest Reply
Lakshay
Databricks Employee
  • 1 kudos

Hi @Simha, this is expected behavior. Spark always creates an output directory when writing data and divides the result into multiple part files, because multiple executors write the result into the output directory. We cannot make th...
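A common workaround, sketched below with placeholder paths: write a single part file with coalesce(1) into a temporary directory, then copy that part file to the exact target name with dbutils.fs:

    tmp_dir = "abfss://container@account.dfs.core.windows.net/tmp/csv_out"          # placeholder
    target  = "abfss://container@account.dfs.core.windows.net/exports/mydata.csv"   # placeholder

    (df.coalesce(1)               # force a single part file
       .write.mode("overwrite")
       .option("header", "true")
       .csv(tmp_dir))

    # Move the lone part file to the desired file name and clean up.
    part = [f.path for f in dbutils.fs.ls(tmp_dir) if f.name.startswith("part-")][0]
    dbutils.fs.cp(part, target)
    dbutils.fs.rm(tmp_dir, recurse=True)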

by GeorgeD, New Contributor
  • 1348 Views
  • 0 replies
  • 0 kudos

Uncaught Error: Script error for jupyter-widgets/base

I have been using ipywidgets for quite a while in several notebooks in Databricks, but today things have completely stopped working with the following error: Uncaught Error: Script error for "@jupyter-widgets/base" http://requirejs.org/docs/errors.htm...

by Mr__D, New Contributor II
  • 27959 Views
  • 7 replies
  • 1 kudos

Resolved! Writing modular code in Databricks

Hi All, could you please suggest the best way to write PySpark code in Databricks? I don't want to write my code in a Databricks notebook, but rather create Python files (a modular project) in VSCode and call only the primary function in the notebook (the res...

Latest Reply
Gamlet
New Contributor II
  • 1 kudos

Certainly! To write PySpark code in Databricks while maintaining a modular project in VSCode, you can organize your PySpark code into Python files in VSCode, with a primary function encapsulating the main logic. Then, upload these files to Databricks...
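As an illustration of that layout (the repo path and module names below are hypothetical), the notebook stays a thin wrapper around code maintained in VSCode:

    # src/etl/transforms.py -- developed and unit-tested in VSCode
    def run_pipeline(spark):
        df = spark.read.table("samples.nyctaxi.trips")  # illustrative source
        return df.groupBy("pickup_zip").count()

    # Notebook cell -- sync the repo to Databricks Repos, then call the entry point
    import sys
    sys.path.append("/Workspace/Repos/me@example.com/my_project/src")  # hypothetical path

    from etl.transforms import run_pipeline
    result = run_pipeline(spark)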

6 More Replies
by Danielsg94, New Contributor II
  • 36079 Views
  • 5 replies
  • 1 kudos

Resolved! How can I write a single file to a blob storage using a Python notebook, to a folder with other data?

When I use the following code:
df.coalesce(1).write.format("com.databricks.spark.csv").option("header", "true").save("/path/mydata.csv")
it writes several files, and when used with .mode("overwrite"), it will overwrite everything in th...

Latest Reply
Simha
New Contributor II
  • 1 kudos

Hi Daniel, may I know how you fixed this issue? I am facing a similar issue while writing CSV/Parquet to Blob/ADLS: it creates a separate folder with the filename and creates a partition file within that folder. I need to write just a file on to the b...

4 More Replies
by Cas, New Contributor III
  • 2881 Views
  • 1 reply
  • 1 kudos

Asset Bundles: Dynamic job cluster insertion in jobs

Hi! As we are migrating from dbx to asset bundles, we are running into some problems with the dynamic insertion of job clusters in the job definition. With dbx we did this nicely with Jinja and defined all the clusters in one place, and a change in th...

Data Engineering
asset bundles
jobs
by krocodl, Contributor
  • 3590 Views
  • 2 replies
  • 0 kudos

Resolved! Thread leakage when connection cannot be established

During the execution of the following code we can observe a lost thread that will never end:
@Test
public void pureConnectionErrorTest() throws Exception {
    try {
        DriverManager.getConnection(DATABRICKS_JDBC_URL, DATABRICKS_USERNAME, DATABRICKS_PASS...

Data Engineering
JDBC
resource leaking
threading
Latest Reply
krocodl
Contributor
  • 0 kudos

This issue is reported as fixed since v2.6.34. I validated version 2.6.36 and it works normally. Many thanks to the developers for the work done!

1 More Replies
by rt-slowth, Contributor
  • 1129 Views
  • 1 reply
  • 0 kudos

Delta Live Tables streaming pipeline

How do I do a simple left join of a static table and a streaming table under a catalog in a Delta Live Tables streaming pipeline?

Latest Reply
Priyanka_Biswas
Databricks Employee
  • 0 kudos

Hi @rt-slowth, I would like to share the Databricks documentation, which contains details about stream-static table joins: https://docs.databricks.com/en/delta-live-tables/transform.html#stream-static-joins Stream-static joins are a good choic...
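A minimal DLT sketch of the pattern in those docs, with placeholder table names (a streaming fact table left-joined to a static dimension):

    import dlt

    @dlt.table
    def enriched_orders():
        orders = spark.readStream.table("catalog.schema.orders")     # streaming side
        customers = spark.read.table("catalog.schema.customers")     # static side
        return orders.join(customers, on="customer_id", how="left")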

by JasonThomas, New Contributor III
  • 2294 Views
  • 2 replies
  • 0 kudos

Row-level Concurrency and Liquid Clustering compatibility

The documentation is a little ambiguous: "Row-level concurrency is only supported on tables without partitioning, which includes tables with liquid clustering." https://docs.databricks.com/en/release-notes/runtime/14.2.html Tables with liquid clusterin...

Latest Reply
JasonThomas
New Contributor III
  • 0 kudos

Cluster-on-write is something being worked on. The limitations at the moment have to do with accommodating streaming workloads. I found the following informative: https://www.youtube.com/watch?v=5t6wX28JC_M

1 More Replies
by RobsonNLPT, Contributor III
  • 2470 Views
  • 1 replies
  • 1 kudos

IDENTIFIER clause

Hi all. I'm just trying to implement ADB SQL scripts using the IDENTIFIER clause, but I get errors like the following with this example: DECLARE mytab = 'tab1'; CREATE TABLE IDENTIFIER(mytab) (c1 INT); [UNSUPPORTED_FEATURE.TEMP_VARIABLE_ON_DBSQL] The feature is not...

Data Engineering
identifier
Latest Reply
shan_chandra
Databricks Employee
  • 1 kudos

@RobsonNLPT - Engineering is still working on the feature that allows DECLARE statements in DBSQL, with a tentative ETA of Feb 20 on the preview channel.
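For reference, a sketch of the pattern the poster is after; it already runs from a notebook on recent runtimes where session variables are supported, and should work in DBSQL once the feature above lands:

    spark.sql("DECLARE mytab = 'tab1'")
    spark.sql("CREATE TABLE IDENTIFIER(mytab) (c1 INT)")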

by thedatacrew, New Contributor III
  • 6280 Views
  • 4 replies
  • 0 kudos

DLT - Delta Live Tables & Handling De-Duplication from Source Data

Hello, could anyone please help with the scenario below?
Scenario:
  • I'm using the DLT SQL language.
  • Parquet files are landed each day from a source system.
  • Each day, the data contains the 7 previous days of data. The source system can have very la...

Data Engineering
De-duplication
Delta Live Tables
dlt
Latest Reply
Lakshay
Databricks Employee
  • 0 kudos

Yes, it is available in DLT. Check this document: https://docs.databricks.com/en/delta-live-tables/cdc.html
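For the de-duplication scenario above, the CDC flow in that document boils down to APPLY CHANGES INTO (the SQL form); a minimal sketch via the equivalent Python API, where table and column names are placeholders and sequence_by keeps only the latest record per key:

    import dlt
    from pyspark.sql.functions import col

    dlt.create_streaming_table("orders_clean")

    dlt.apply_changes(
        target="orders_clean",
        source="orders_raw",          # placeholder staging view over the landed Parquet
        keys=["order_id"],            # de-duplicate on the business key
        sequence_by=col("ingest_ts"), # latest version per key wins
        stored_as_scd_type=1,
    )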

3 More Replies
by Phani1, Valued Contributor II
  • 2167 Views
  • 2 replies
  • 1 kudos

Snowflake connector

Hi Team, Databricks recommends storing data in a cloud storage location, but if we directly connect to Snowflake using the Snowflake connector, will we face any performance issues? Could you please suggest the best way to read a large volume of data f...

Latest Reply
Phani1
Valued Contributor II
  • 1 kudos

Thanks !!

1 More Replies
by Amit_Garg, New Contributor
  • 1298 Views
  • 1 reply
  • 1 kudos

Calling a .py function using a DataFrame from another file

I have created a file NBF_TextTranslation:

spark = SparkSession.builder.getOrCreate()
df_TextTranslation = spark.read.format('delta').load(textTranslation_path)

def getMediumText(TextID, PlantName):
    df1 = spark.sql("SELECT TextID, PlantName, Langu...

Latest Reply
Lakshay
Databricks Employee
  • 1 kudos

You should create a UDF on top of the getMediumText function and then use the UDF in the SQL statement.
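A sketch of that suggestion (column and table names follow the post; the lookup body and the MediumText column are illustrative). Note that a UDF body runs on executors, so it cannot call spark.sql itself; do the lookup in plain Python, as below, or restructure it as a join:

    from pyspark.sql.types import StringType

    # Build the lookup once on the driver (illustrative; assumes the table fits in memory).
    translations = {(r["TextID"], r["PlantName"]): r["MediumText"]
                    for r in df_TextTranslation.collect()}

    def getMediumText(text_id, plant_name):
        return translations.get((text_id, plant_name))

    spark.udf.register("getMediumText", getMediumText, StringType())
    spark.sql("SELECT TextID, getMediumText(TextID, PlantName) AS MediumText FROM texts")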

