Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

Mathias
by New Contributor II
  • 15 Views
  • 2 replies
  • 0 kudos

Connecting to Blob storage using abfss not working with serverless compute

I tried to follow the instructions found here: Connect to Azure Data Lake Storage Gen2 and Blob Storage - Azure Databricks | Microsoft Learn. E.g. this code: spark.conf.set("fs.azure.account.key.<storage-account>.dfs.core.windows.net", dbutils.secrets.ge...
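For context, the session-level pattern the post is following typically looks like the sketch below (storage account, secret scope, and key names are hypothetical). Note that on serverless compute, session-scoped storage-key settings via spark.conf are generally not supported, and Unity Catalog external locations are the usual route instead.

```python
# Hypothetical storage account name, for illustration only.
storage_account = "mystorageacct"

# The Spark config key that carries the account key for ABFSS access.
conf_key = f"fs.azure.account.key.{storage_account}.dfs.core.windows.net"

# On a classic (non-serverless) cluster you would then run, with a
# hypothetical secret scope/key (requires a Databricks runtime):
# spark.conf.set(conf_key, dbutils.secrets.get(scope="my-scope", key="storage-key"))
print(conf_key)
```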

Latest Reply
Mathias
New Contributor II
  • 0 kudos

Can you point me to some documentation on how to do that?

1 More Replies
ggsmith
by New Contributor III
  • 438 Views
  • 4 replies
  • 3 kudos

dlt Streaming Checkpoint Not Found

I am using Delta Live Tables and have my pipeline defined using the code below. My understanding is that a checkpoint is automatically set when using Delta Live Tables. I am using the Unity Catalog and Schema settings in the pipeline as the storage d...

Latest Reply
szymon_dybczak
Contributor III
  • 3 kudos

Hi @ggsmith, if you use Delta Live Tables then checkpoints are stored under the storage location specified in the DLT settings. Each table gets a dedicated directory under storage_location/checkpoints/<dlt_table_name>.
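The layout described in the reply can be sketched as follows; the storage location and table name below are hypothetical stand-ins for the pipeline's actual settings:

```python
# Hypothetical DLT pipeline storage location (from the pipeline settings)
# and streaming table name.
storage_location = "abfss://dlt@mystorageacct.dfs.core.windows.net/pipelines/my_pipeline"
table_name = "my_streaming_table"

# Per the reply, each table gets its own checkpoint directory here:
checkpoint_dir = f"{storage_location}/checkpoints/{table_name}"

# On Databricks you could then inspect it with:
# dbutils.fs.ls(checkpoint_dir)
print(checkpoint_dir)
```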

3 More Replies
clentin
by Contributor
  • 346 Views
  • 4 replies
  • 0 kudos

Import Py File

How do I import a .py file in the Databricks environment? Any help will be appreciated.

Latest Reply
tejaswi24
New Contributor
  • 0 kudos

Hi @clentin, did you get an answer to this? I'm looking for a similar thing. The API only imports .py files as notebooks; I want them to be deployed as the file type in the workspace.
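On the API level, the Workspace import endpoint can be pointed at a plain workspace file rather than a notebook by using the RAW format. A minimal sketch of the request body, assuming the /api/2.0/workspace/import endpoint and field names as commonly documented (verify against your workspace; the path and code are hypothetical):

```python
import base64
import json

# Build the request body for POST /api/2.0/workspace/import. With
# format="RAW" the content is imported as a workspace file instead of
# being converted to a notebook ("SOURCE" is the notebook-oriented
# format) -- verify this against the current Workspace API docs.
source = "def add(a, b):\n    return a + b\n"
payload = {
    "path": "/Workspace/Users/someone@example.com/utils.py",  # hypothetical
    "format": "RAW",
    "overwrite": True,
    "content": base64.b64encode(source.encode()).decode(),  # API expects base64
}
body = json.dumps(payload)
```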

3 More Replies
AngadSingh
by New Contributor II
  • 10 Views
  • 0 replies
  • 0 kudos

Delete delta live table without deleting DLT pipeline

Hi, I am wondering how I can delete a managed DLT table without deleting the DLT pipeline? I tried commenting out the code for the table definition, but the DLT pipeline complains that the source code has no table to create or update. Thanks in advance.#datae...

BaburamShrestha
by Visitor
  • 57 Views
  • 1 reply
  • 0 kudos

File Arrival Trigger

We are using Databricks in combination with Azure platforms, specifically working with Azure Blob Storage (Gen2). We frequently mount Azure containers in the Databricks file system and leverage external locations and volumes for Azure containers. Our ...

Latest Reply
szymon_dybczak
Contributor III
  • 0 kudos

Hi @BaburamShrestha,
1. To set up a file arrival trigger you can follow these guides: File Arrival Triggers in Databricks Workflows (linkedin.com); Trigger jobs when new files arrive | Databricks on AWS.
2. To capture the file path that triggered th...
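The job settings the reply points to boil down to a small trigger block in the job definition. A sketch, assuming the Jobs 2.1 API field names as commonly documented (verify against current docs; the external location URL is hypothetical):

```python
# Sketch of the Jobs API "trigger" settings for a file arrival trigger.
# The abfss URL must be an existing Unity Catalog external location.
trigger = {
    "file_arrival": {
        "url": "abfss://landing@mystorageacct.dfs.core.windows.net/incoming/",  # hypothetical
        "min_time_between_triggers_seconds": 60,
    },
    "pause_status": "UNPAUSED",
}

# Inside the triggered job, the arriving file's location is commonly read
# via the parameter reference {{job.trigger.file_arrival.location}}
# (assumption -- check the dynamic value references docs).
```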

omsurapu
by Visitor
  • 25 Views
  • 2 replies
  • 0 kudos

Can one workspace connect to multiple AWS accounts/regions?

Hi, I'd like to know if one workspace can be used to connect to multiple accounts (account A and account B) / regions. I know that multiple accounts/regions can't be selected during setup. Is it possible?

Latest Reply
omsurapu
Visitor
  • 0 kudos

OK, thanks! There is no official Databricks documentation available for this requirement. I assume it can be done with cross-account IAM roles, but I've never tested it. Any leads?

1 More Replies
sticky
by New Contributor
  • 159 Views
  • 1 reply
  • 0 kudos

Running a cell with R-script keeps waiting status

So, I have an R notebook with different cells and a '15.4 LTS ML (includes Apache Spark 3.5.0, Scala 2.12)' cluster. If I select 'run all', all cells run immediately and the run finishes quickly and fine. But if I would like to run the cells one...

Latest Reply
sticky
New Contributor
  • 0 kudos

Update: I used different types of clusters (single node, other drivers) with the same result. When I checked my R code by running it cell by cell and line by line (in the cell responsible for this problem), it turns out that the glm functions of th...

camilo_s
by Contributor
  • 1327 Views
  • 7 replies
  • 3 kudos

Git credentials for service principals running Jobs

I know the documentation for setting up Git credentials for service principals: you have to use a PAT from your Git provider, which is inevitably tied to a user and has a lifecycle of its own. Doesn't this kind of defeat the purpose of running a job...

Latest Reply
camilo_s
Contributor
  • 3 kudos

That would be a valid workaround (with the caveat that if a job's tasks run long enough, the token may expire). I also agree that this authentication flow should be ideally provided by Databricks as a feature, given that likely many customers face th...
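The PAT-based setup the thread describes amounts to registering the provider token for the service principal via the Git credentials REST API. A minimal sketch of the request body, assuming the /api/2.0/git-credentials endpoint and field names as commonly documented (verify against current docs; all values are hypothetical placeholders):

```python
import json

# Request body for POST /api/2.0/git-credentials, executed as (or on
# behalf of) the service principal. Provider enum value and field names
# are assumptions to verify; the username and token are placeholders.
payload = {
    "git_provider": "gitHub",
    "git_username": "my-sp-bot",
    "personal_access_token": "ghp_xxxxxxxxxxxx",
}
body = json.dumps(payload)
```

As the reply notes, the token's own expiry remains the weak point of this flow: long-running tasks can outlive it, so rotation has to happen outside Databricks.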

6 More Replies
pthaenraj
by New Contributor III
  • 4943 Views
  • 13 replies
  • 8 kudos

Resolved! Databricks Certified Professional Data Scientist Exam Question Types

Hello, I am not seeing a lot of information regarding the Databricks Certified Professional Data Scientist exam. I took the Associate Developer in Apache Spark exam last year and the materials for the exam seemed much more focused than what I found for...

Latest Reply
ivanabaquero
  • 8 kudos

Hello! I understand your concerns, and having recently cleared the Databricks Certified Professional Data Scientist exam, I can share some insights. From my experience, the exam primarily focuses on machine learning and data science theory. The questi...

12 More Replies
SagarJi
by New Contributor II
  • 44 Views
  • 2 replies
  • 1 kudos

SQL merge to update one of the nested columns

I have an existing Delta Lake table as the target, and a small set of records at hand as CURRENT_BATCH. I have a requirement to update the dateTimeUpdated column inside parent2, using the following merge query: MERGE INTO mydataset AS target USING CURRENT_BA...

Latest Reply
filipniziol
New Contributor III
  • 1 kudos

Hi @SagarJi, according to the documentation, updates to nested columns are not supported. What you can do is construct the whole struct and update the parent: MERGE INTO mydataset AS target USING CURRENT_BATCH AS incoming ON target.parent1.comp...
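The workaround in the reply can be sketched as below: rebuild the entire parent2 struct in the UPDATE SET clause, copying the untouched fields from the target. All table, column, and join-key names here are hypothetical stand-ins for the thread's actual schema:

```python
# Hedged sketch: since a single nested field can't be updated directly,
# reassign the whole struct with named_struct, carrying over unchanged
# fields. Names are illustrative only.
merge_sql = """
MERGE INTO mydataset AS target
USING CURRENT_BATCH AS incoming
  ON target.id = incoming.id
WHEN MATCHED THEN UPDATE SET target.parent2 = named_struct(
  'dateTimeUpdated', incoming.dateTimeUpdated,
  'otherField',      target.parent2.otherField
)
"""
# On Databricks you would execute it with: spark.sql(merge_sql)
```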

1 More Replies
Fz1
by New Contributor III
  • 7588 Views
  • 6 replies
  • 3 kudos

Resolved! SQL Warehouse Serverless - Not able to access the external tables in the hive_metastore

I have DLT tables created under the hive_metastore with external data stored in ADLS Gen2. The ADLS blob storage is mounted into /mnt/<storage-account>. The tables are successfully created and accessible from my notebooks, as well as the ADLS storage. I have c...

Latest Reply
TjommeV-Vlaio
New Contributor II
  • 3 kudos

Can this be done using Terraform as well?

5 More Replies
jfpatenaude
by Visitor
  • 71 Views
  • 1 reply
  • 1 kudos

MalformedInputException when using extended ascii characters in dbutils.notebook.exit()

I have a specific use case where I call another notebook using the dbutils.notebook.run() function. The other notebook does some processing and returns a string via the dbutils.notebook.exit() function to the caller notebook. The returned string has some...

Latest Reply
jennie258fitz
New Contributor II
  • 1 kudos

@jfpatenaude wrote: I have a specific use case where I call another notebook using the dbutils.notebook.run() function. The other notebook does some processing and returns a string in the dbutils.notebook.exit() function to the caller...

Kody_Devl
by New Contributor II
  • 18832 Views
  • 3 replies
  • 2 kudos

Resolved! Export to Excel xlsx

Hi all, does anyone have some code or an example of how to export my Databricks SQL results directly to an existing spreadsheet? Many thanks, Kody_Devl

Latest Reply
Emit
Visitor
  • 2 kudos

There is an add-on that imports tables directly to a spreadsheet: https://workspace.google.com/marketplace/app/bricksheet/979793077657
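For a code-based route rather than the add-on, a common pattern is to collect the query results and write them out with a spreadsheet library. True .xlsx appends need third-party packages (noted in the comments), so this is a minimal stdlib sketch with hypothetical data that writes CSV, which Excel opens directly:

```python
import csv
import io

# Hypothetical query results; on Databricks you would obtain rows via
# spark.sql("...").collect() or the SQL Statement Execution API.
rows = [("2024-01-01", 42), ("2024-01-02", 17)]

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["date", "count"])  # header row
writer.writerows(rows)

# For appending into an *existing* .xlsx workbook, the usual tool is
# pandas: pd.ExcelWriter(path, engine="openpyxl", mode="a") -- requires
# the pandas and openpyxl packages (assumption; check their docs).
print(buf.getvalue())
```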

2 More Replies
Brad
by Contributor
  • 85 Views
  • 3 replies
  • 0 kudos

How to control file size by OPTIMIZE

Hi, I have a Delta table under UC, no partitioning, no liquid clustering. I tried: OPTIMIZE foo; -- OR ALTER TABLE foo SET TBLPROPERTIES (delta.targetFileSize = '128mb'); OPTIMIZE foo; I expected the files to change after the above, but the OP...

Latest Reply
filipniziol
New Contributor III
  • 0 kudos

Hi @Brad, Databricks is a big data processing engine. Instead of testing 3 files, try testing 3000 files. OPTIMIZE isn't merging your small files because there may not be enough files or data for it to act upon. Regarding why DESC DETAIL shows 3 files...
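The intuition in the reply can be illustrated with a toy bin-packing model (this is not Delta's actual algorithm, just a sketch with made-up file sizes): OPTIMIZE coalesces small files toward a target size, and with only a handful of files there may be nothing worth rewriting.

```python
# Toy illustration only: greedily pack small files (sizes in MB, made up)
# into bins of at most target_mb, mimicking how compaction coalesces
# many small files into fewer large ones.
target_mb = 128
file_sizes = [4, 7, 3, 110, 9]  # hypothetical current file sizes

bins, current = [], []
for size in sorted(file_sizes):
    # Start a new bin when adding this file would exceed the target.
    if current and sum(current) + size > target_mb:
        bins.append(current)
        current = []
    current.append(size)
if current:
    bins.append(current)

# With only 5 files, compaction yields just 2 output files -- at this
# scale there is little for OPTIMIZE to do, matching the reply's point.
print(bins)
```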

2 More Replies

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group