Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Twilight
by New Contributor III
  • 5815 Views
  • 5 replies
  • 3 kudos

Resolved! Bug - Databricks requires extra escapes in repl string in regexp_replace (compared to Spark)

In Spark (but not Databricks), these work: regexp_replace('1234567890abc', '^(?<one>\\w)(?<two>\\w)(?<three>\\w)', '$3$2$1') and regexp_replace('1234567890abc', '^(?<one>\\w)(?<two>\\w)(?<three>\\w)', '${three}${two}${one}'). In Databricks, you have to use ...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

@Stephen Wilcoxon: No, it is not a bug. Databricks uses a different flavor of regular expression syntax than Apache Spark. In particular, Databricks uses Java's regular expression syntax, whereas Apache Spark uses Scala's regular expression syntax....
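For reference, a minimal PySpark sketch of the open-source behaviour quoted in the question, using numbered and named capture-group references in the replacement string (per this thread, the replacement may need extra escaping on Databricks):

```python
# Minimal sketch of the open-source Spark behaviour quoted above; on Databricks,
# per this thread, the replacement string may need additional escaping.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("1234567890abc",)], ["s"])

df.select(
    F.regexp_replace("s", r"^(?<one>\w)(?<two>\w)(?<three>\w)", "$3$2$1").alias("by_number"),
    F.regexp_replace("s", r"^(?<one>\w)(?<two>\w)(?<three>\w)", "${three}${two}${one}").alias("by_name"),
).show(truncate=False)
```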

4 More Replies
ChristianRRL
by Valued Contributor
  • 2085 Views
  • 1 reply
  • 1 kudos

Resolved! DLT Bronze: Incremental File Updates

Hi there, I would like to clarify if there's a way for bronze data to be ingested from "the same" CSV file if the file has been modified (i.e. new file with new records overwriting the old file)? Currently in my setup my bronze table is a `streaming ...

Latest Reply
Lakshay
Databricks Employee
  • 1 kudos

You can use the option "cloudFiles.allowOverwrites" in DLT. This option will allow you to read the same csv file again but you should use it cautiously, as it can lead to duplicate data being loaded.
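For illustration, a minimal DLT (Python) sketch of that option; the landing path and CSV format here are assumptions, not the poster's actual setup:

```python
# Minimal sketch (assumed landing path and CSV format): allow Auto Loader in DLT to
# re-read files that were overwritten in place. This can re-ingest rows and create duplicates.
import dlt

@dlt.table(name="bronze_raw")
def bronze_raw():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "csv")
        .option("cloudFiles.allowOverwrites", "true")  # pick up modified/overwritten files
        .option("header", "true")
        .load("/Volumes/main/landing/csv_files/")      # hypothetical landing location
    )
```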

otum
by New Contributor II
  • 2779 Views
  • 6 replies
  • 0 kudos

[Errno 2] No such file or directory

I am reading a JSON file at the location below, using the following code: file_path = "/dbfs/mnt/platform-data/temp/ComplexJSON/sample.json" # replace with the file path f = open(file_path, "r") print(f.read()) but it is failing for no such file...

Latest Reply
Debayan
Databricks Employee
  • 0 kudos

Hi, as Shan mentioned, could you please cat the file and see if it exists?
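For example, a quick sanity check from a notebook (the path is taken from the question; dbutils and display are only available inside Databricks notebooks):

```python
# Minimal sketch: verify the file is actually visible at the FUSE path before open().
import os

file_path = "/dbfs/mnt/platform-data/temp/ComplexJSON/sample.json"
print(os.path.exists(file_path))  # False usually means a wrong mount/path or no FUSE access

# List the directory through dbutils to see what is really there.
display(dbutils.fs.ls("dbfs:/mnt/platform-data/temp/ComplexJSON/"))
```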

5 More Replies
ac0
by Contributor
  • 5210 Views
  • 3 replies
  • 0 kudos

Resolved! Setting environment variables to use in a SQL Delta Live Table Pipeline

I'm trying to use the Global Init Scripts in Databricks to set an environment variable to use in a Delta Live Table Pipeline. I want to be able to reference a value passed in as a path versus hard coding it. Here is the code for my pipeline: CREATE ST...

Latest Reply
ac0
Contributor
  • 0 kudos

I was able to accomplish this by creating a Cluster Policy that put in place the scripts, config settings, and environment variables I needed.
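As a hedged sketch of that approach (the attribute paths follow the cluster policy definition format; the script path and variable name are hypothetical, not the poster's actual policy):

```python
# Minimal sketch (hypothetical init script path and variable name): a cluster-policy
# definition that pins an init script and fixes an environment variable.
import json

policy_definition = {
    "init_scripts.0.workspace.destination": {
        "type": "fixed",
        "value": "/Shared/init/set_env.sh",
    },
    "spark_env_vars.BASE_PATH": {
        "type": "fixed",
        "value": "abfss://landing@myaccount.dfs.core.windows.net/raw",
    },
}
print(json.dumps(policy_definition, indent=2))  # paste into the policy definition (UI or API)
```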

2 More Replies
ChrisS
by New Contributor III
  • 5325 Views
  • 7 replies
  • 8 kudos

How to get data scraped from the web into your data storage

I am learning Databricks for the first time, following a book copyrighted in 2020, so I imagine it might be a little outdated at this point. What I am trying to do is move data from an online source (in this specific case using shell script but ...

Latest Reply
CharlesReily
New Contributor III
  • 8 kudos

In Databricks, you can install external libraries by going to the Clusters tab, selecting your cluster, and then adding the Maven coordinates for Deequ. In your notebook or script, y...

6 More Replies
aockenden
by New Contributor III
  • 2180 Views
  • 2 replies
  • 0 kudos

Switching SAS Tokens Mid-Script With Spark Dataframes

Hey all, my team has settled on using directory-scoped SAS tokens to provision access to data in our Azure Gen2 Datalakes. However, we have encountered an issue when switching from a first SAS token (which is used to read a first parquet table in the...

Latest Reply
aockenden
New Contributor III
  • 0 kudos

Bump

1 More Replies
pyter
by New Contributor III
  • 6271 Views
  • 5 replies
  • 2 kudos

Resolved! [13.3] Vacuum on table fails if shallow clone without write access exists

Hello everyone, we use Unity Catalog, separating our dev, test and prod data into individual catalogs. We run weekly vacuums on our prod catalog using a service principal that only has (read+write) access to this production catalog, but no access to ou...

Latest Reply
Lakshay
Databricks Employee
  • 2 kudos

Are you using Unity Catalog in single user access mode? If yes, could you try using shared access mode?

4 More Replies
pawelzak
by New Contributor III
  • 2579 Views
  • 2 replies
  • 1 kudos

Dashboard update through API

Hi, I would like to create/update a dashboard definition based on a JSON file. How can one do it? I tried the following: databricks api post /api/2.0/preview/sql/dashboards/$dashboard_id --json @file.json But it does not update the widgets... How can...

Latest Reply
Gamlet
New Contributor II
  • 1 kudos

To programmatically create/update dashboards in Databricks using a JSON file, you can use the Databricks REST API's workspace/export and workspace/import endpoints. Generate a JSON representation of your dashboard using workspace/export, modify it as...
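A minimal sketch of those two calls using the REST API directly (host, token, and the dashboard's workspace path are assumptions, and whether a given dashboard type round-trips this way depends on the workspace):

```python
# Minimal sketch (assumed host/token/path): export a workspace object, edit it, re-import it.
import base64
import os
import requests

host = os.environ["DATABRICKS_HOST"]    # e.g. https://<workspace>.cloud.databricks.com
headers = {"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"}
path = "/Users/someone@example.com/my_dashboard"   # hypothetical workspace path

resp = requests.get(f"{host}/api/2.0/workspace/export",
                    headers=headers, params={"path": path, "format": "SOURCE"})
payload = base64.b64decode(resp.json()["content"])
# ... modify the decoded payload (JSON) here ...
requests.post(f"{host}/api/2.0/workspace/import", headers=headers,
              json={"path": path, "format": "SOURCE", "overwrite": True,
                    "content": base64.b64encode(payload).decode()})
```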

1 More Replies
israelst
by New Contributor II
  • 2880 Views
  • 3 replies
  • 1 kudos

structured streaming schema inference

I want to stream data from Kinesis using DLT. The data is in JSON format. How can I use Structured Streaming to automatically infer the schema? I know Auto Loader has this feature, but it doesn't make sense for me to use Auto Loader since my data is st...

Latest Reply
israelst
New Contributor II
  • 1 kudos

I wanted to use Databricks for this. I don't want to depend on AWS Glue. Same way I could do it with AutoLoader...

2 More Replies
seefoods
by New Contributor III
  • 1767 Views
  • 3 replies
  • 1 kudos

Resolved! cluster metrics databricks runtime 13.1

Hello everyone, how can I collect the metrics provided by the cluster metrics feature in Databricks Runtime 13.1 using a bash script?

Latest Reply
User16539034020
Databricks Employee
  • 1 kudos

Hi @Aubert: Currently, you can only use static downloadable snapshots: https://docs.databricks.com/en/compute/cluster-metrics.html Regards,

2 More Replies
Simha
by New Contributor II
  • 2014 Views
  • 1 reply
  • 1 kudos

How to write only file on to the Blob or ADLS from Databricks?

Hi All, I am trying to write a CSV file to Blob and ADLS from a Databricks notebook using PySpark, but a separate folder is created with the given filename and a partition file is created within that folder. I want only the file to be written. Can anyone...

Latest Reply
Lakshay
Databricks Employee
  • 1 kudos

Hi @Simha, this is expected behavior. Spark always creates an output directory when writing data and divides the result into multiple part files, because multiple executors write the result into the output directory. We cannot make th...
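If a single named file is required, a common workaround (sketched below with assumed paths; not taken from this thread) is to coalesce to one partition and then copy the part file out:

```python
# Minimal sketch (assumed paths, `df` is your DataFrame): write one part file, then rename it.
out_dir = "dbfs:/mnt/tmp/report_out"  # temporary directory Spark will create

df.coalesce(1).write.mode("overwrite").option("header", "true").csv(out_dir)

# Find the single part file and copy it to the desired single-file location.
part = [f.path for f in dbutils.fs.ls(out_dir) if f.name.startswith("part-")][0]
dbutils.fs.cp(part, "dbfs:/mnt/reports/report.csv")
dbutils.fs.rm(out_dir, recurse=True)  # clean up the temporary directory
```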

GeorgeD
by New Contributor
  • 1241 Views
  • 0 replies
  • 0 kudos

Uncaught Error: Script error for jupyter-widgets/base

I have been using ipywidgets for quite a while in several notebooks in Databricks, but today things have completely stopped working with the following error: Uncaught Error: Script error for "@jupyter-widgets/base" http://requirejs.org/docs/errors.htm...

Mr__D
by New Contributor II
  • 24076 Views
  • 7 replies
  • 1 kudos

Resolved! Writing modular code in Databricks

Hi All, could you please suggest the best way to write PySpark code in Databricks? I don't want to write my code in a Databricks notebook, but rather create Python files (a modular project) in VSCode and call only the primary function in the notebook (the res...

Latest Reply
Gamlet
New Contributor II
  • 1 kudos

Certainly! To write PySpark code in Databricks while maintaining a modular project in VSCode, you can organize your PySpark code into Python files in VSCode, with a primary function encapsulating the main logic. Then, upload these files to Databricks...
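A minimal sketch of that layout (module, function, and table names are hypothetical):

```python
# etl/transforms.py -- developed and unit-tested in VSCode, then synced to a Repo/workspace.
from pyspark.sql import DataFrame
from pyspark.sql import functions as F

def add_ingest_date(df: DataFrame) -> DataFrame:
    """Pure transformation that is easy to test outside Databricks."""
    return df.withColumn("ingest_date", F.current_date())


# Notebook cell -- only the primary call lives in Databricks:
# from etl.transforms import add_ingest_date
# result = add_ingest_date(spark.read.table("my_catalog.my_schema.my_table"))
```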

6 More Replies
Danielsg94
by New Contributor II
  • 34562 Views
  • 5 replies
  • 1 kudos

Resolved! How can I write a single file to a blob storage using a Python notebook, to a folder with other data?

When I use the following code: df.coalesce(1).write.format("com.databricks.spark.csv").option("header", "true").save("/path/mydata.csv") it writes several files, and when used with .mode("overwrite"), it will overwrite everything in th...

Latest Reply
Simha
New Contributor II
  • 1 kudos

Hi Daniel, may I know how you fixed this issue? I am facing a similar issue while writing CSV/Parquet to Blob/ADLS; it creates a separate folder with the filename and creates a partition file within that folder. I need to write just a file to the b...

4 More Replies
Cas
by New Contributor III
  • 2567 Views
  • 1 reply
  • 1 kudos

Asset Bundles: Dynamic job cluster insertion in jobs

Hi! As we are migrating from dbx to Asset Bundles, we are running into some problems with the dynamic insertion of job clusters in the job definition. With dbx we did this nicely with Jinja and defined all the clusters in one place, and a change in th...

Data Engineering
asset bundles
jobs
