Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

asingamaneni
by New Contributor II
  • 475 Views
  • 1 replies
  • 0 kudos

Databricks Summit 2023

Databricks Summit 2023 has been fantastic, and I got a chance to meet many authors and industry leaders whom I admire in the Data Engineering community! #DataAISummit

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @asingamaneni, We're thrilled to hear that you had a great experience at DAIS 2023! Your feedback is valuable to us, and we appreciate you taking the time to share it on the community platform. We wanted to let you know that the Databricks Communi...

Tidaldata
by New Contributor
  • 427 Views
  • 1 replies
  • 0 kudos

Loving Databricks Summit

Loving the summit so far, awesome keynote speakers, great trainers and paid courses. Finished certification #databrickslearning

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @Tidaldata, We're thrilled to hear that you had a great experience at DAIS 2023! Your feedback is valuable to us, and we appreciate you taking the time to share it on the community platform. We wanted to let you know that the Databricks Community ...

ws4100e
by New Contributor III
  • 2409 Views
  • 8 replies
  • 0 kudos

DLT pipelines with UC

I am trying to run a (very simple) DLT pipeline in which the resulting materialized table is published to a UC schema with a managed storage location defined (within an existing EXTERNAL LOCATION). According to the documentation: Publishing to schemas that speci...

Latest Reply
DataGeek_JT
New Contributor II
  • 0 kudos

Did this get resolved?  I am getting the same issue.

7 More Replies
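For readers hitting the same wall, here is a minimal sketch of the setup the post describes (catalog, schema, and storage paths are placeholders, not taken from the thread): the target schema gets its managed location inside an existing external location, and the DLT pipeline's catalog/target settings point at that schema so the materialized table is stored there.

```python
# Hedged sketch: one-off setup, run outside the DLT pipeline.
# The schema's managed location must sit inside an existing EXTERNAL LOCATION.
spark.sql("""
    CREATE SCHEMA IF NOT EXISTS main.bronze
    MANAGED LOCATION 'abfss://data@mystorageacct.dfs.core.windows.net/bronze'
""")
```

In the pipeline itself the table definition stays ordinary; the storage location is inherited from the schema chosen in the pipeline's destination settings.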
Phani1
by Valued Contributor
  • 172 Views
  • 1 replies
  • 0 kudos

Databricks Platform Cleanup and baseline activities.

Hi Team, Kindly share the best practices for managing Databricks Platform Cleanup and baseline activities.

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @Phani1, Here are some best practices for managing Databricks Platform Cleanup and baseline activities: Platform Administration: Regularly monitor and manage your Databricks platform to ensure optimal performance. Compute Creation: Choose the ri...

dataslicer
by Contributor
  • 416 Views
  • 2 replies
  • 0 kudos

How to export/clone Databricks Notebook without results via web UI?

When a Databricks Notebook exceeds the size limit, it suggests `clone/export without results`. This is exactly what I want to do, but the current web UI does not provide the ability to bypass/skip the results in either the `clone` or `export` context...

Latest Reply
dataslicer
Contributor
  • 0 kudos

Thank you @Yeshwanth for the response. I am looking for a way without clearing the current outputs. This is necessary because I want to preserve the existing outputs and fork off another notebook instance to run with a few parameter changes and come...

1 More Replies
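For anyone searching later, one workaround that leaves the original notebook's outputs untouched (a sketch, not an official recipe; host, token, and path are placeholders) is to export the source only through the Workspace API, which returns the code without any cell results:

```python
import base64
import requests

host = "https://<your-workspace>.azuredatabricks.net"   # placeholder
token = "<personal-access-token>"                        # placeholder
path = "/Users/me@example.com/my_notebook"               # placeholder

# format=SOURCE exports only the notebook code, so large results are skipped
resp = requests.get(
    f"{host}/api/2.0/workspace/export",
    headers={"Authorization": f"Bearer {token}"},
    params={"path": path, "format": "SOURCE"},
)
resp.raise_for_status()

with open("my_notebook.py", "wb") as f:
    f.write(base64.b64decode(resp.json()["content"]))
```

The exported copy can then be re-imported as the forked notebook while the original keeps its results.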
Ramana
by Contributor
  • 1017 Views
  • 3 replies
  • 0 kudos

SHOW GROUPS is not giving groups available at the account level

I am trying to capture all the Databricks groups and their mapping to user/AD group(s). I tried to do this by using show groups, show users, and show grants by following the examples mentioned in the below article, but the show groups command only fetc...

Latest Reply
Ramana
Contributor
  • 0 kudos

Yes, I can use the REST API, but I am looking for a SQL or programming way to do this rather than doing the API calls, building the complex-datatype DataFrame, and then saving it as a table. Thanks, Ramana

2 More Replies
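For reference, the REST route mentioned above can still end in a plain table with little code. A rough sketch (account host, account id, token, and target table are placeholders, and SCIM pagination is omitted for brevity):

```python
import requests

account_host = "https://accounts.azuredatabricks.net"   # placeholder (Azure account console)
account_id = "<account-id>"                              # placeholder
token = "<account-admin-token>"                          # placeholder

resp = requests.get(
    f"{account_host}/api/2.0/accounts/{account_id}/scim/v2/Groups",
    headers={"Authorization": f"Bearer {token}"},
)
resp.raise_for_status()

rows = [
    (g["displayName"], m.get("display"))
    for g in resp.json().get("Resources", [])
    for m in g.get("members", [])
]

# Land it as a queryable table instead of hand-building complex types
df = spark.createDataFrame(rows, ["group_name", "member"])
df.write.mode("overwrite").saveAsTable("admin.audit.account_groups")  # placeholder target
```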
kseyser
by New Contributor II
  • 500 Views
  • 2 replies
  • 1 kudos

Predicting compute required to run Spark jobs

I'm working on a project to predict the compute (cores) required to run Spark jobs. Has anyone worked on this or something similar before? How did you get started?

Latest Reply
Yeshwanth
Honored Contributor
  • 1 kudos

@kseyser good day, This documentation might help you in your use-case: https://docs.databricks.com/en/compute/cluster-config-best-practices.html#compute-sizing-considerations Kind regards, Yesh

1 More Replies
Lea
by New Contributor II
  • 1719 Views
  • 1 replies
  • 2 kudos

Resolved! Advice for generic file processing for ingestion of multiple data formats

Hello, we are using Delta Live Tables to ingest data from multiple business groups, each with different input file formats and parsing requirements. The input files are ingested from Azure Blob Storage. Right now, we are only servicing three busines...

Latest Reply
raphaelblg
Contributor III
  • 2 kudos

Hello @Lea , I'd like to inform you that our platform does not currently provide a built-in feature for ingesting multiple or interchangeable file formats. However, we highly value your input and encourage you to share your ideas through Databricks' ...

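One pattern worth considering for this (only a hedged sketch; the source list, paths, and formats below are invented): drive Auto Loader from a small config so each business group's format and path produce its own bronze DLT table from a single piece of code.

```python
import dlt

# Hypothetical per-source config; in practice this could live in a table or JSON file
SOURCES = [
    {"name": "group_a", "path": "abfss://landing@acct.dfs.core.windows.net/group_a/",
     "format": "csv", "options": {"header": "true"}},
    {"name": "group_b", "path": "abfss://landing@acct.dfs.core.windows.net/group_b/",
     "format": "json", "options": {}},
]

def make_bronze(src):
    # Wrapping the decorator in a function avoids Python late-binding issues in the loop
    @dlt.table(name=f"bronze_{src['name']}")
    def bronze():
        return (
            spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", src["format"])
            .options(**src["options"])
            .load(src["path"])
        )

for src in SOURCES:
    make_bronze(src)
```

Group-specific parsing can then live in downstream silver tables rather than in the ingestion step.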
thiagoawstest
by New Contributor III
  • 4362 Views
  • 3 replies
  • 2 kudos

Resolved! Migration Azure to AWS

Hello, today I use Azure Databricks and I want to migrate my workspaces to AWS Databricks. What is the best practice, and which path should I follow? I didn't find anything in the documentation. Thanks.

Latest Reply
thiagoawstest
New Contributor III
  • 2 kudos

Hello, as I already have a working Databricks environment on Azure, would the best way be to use tool-databricks-migrate?

2 More Replies
orangepepino
by New Contributor II
  • 4212 Views
  • 2 replies
  • 1 kudos

SFTP connection using private key on Azure Databricks

I need to connect to a server to retrieve some files using Spark and a private SSH key. However, to manage the private key safely I need to store it as a secret in Azure Key Vault, which means I don't have the key as a file to pass down in the keyFil...

Latest Reply
Kaniz
Community Manager
  • 1 kudos

Hi @orangepepino,  Instead of specifying the keyFilePath, you can pass the private key as a PEM string directly. This approach avoids the need for a physical key file.Since you’re already using Azure Key Vault, consider storing the private key as a s...

1 More Replies
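To make the suggestion above concrete, here is a rough sketch using paramiko with a Key Vault-backed secret scope (scope, secret, host, user, and file names are placeholders; paramiko must be installed on the cluster). The key is read straight from the secret, so it never has to exist as a file:

```python
import io
import paramiko

# Hypothetical secret scope/key backed by Azure Key Vault
pem_str = dbutils.secrets.get(scope="kv-scope", key="sftp-private-key")
pkey = paramiko.RSAKey.from_private_key(io.StringIO(pem_str))  # use Ed25519Key/ECDSAKey if needed

transport = paramiko.Transport(("sftp.example.com", 22))  # placeholder host
transport.connect(username="svc_user", pkey=pkey)          # placeholder user
sftp = paramiko.SFTPClient.from_transport(transport)

# Pull the file locally, then hand it to Spark
sftp.get("/remote/path/data.csv", "/tmp/data.csv")
sftp.close()
transport.close()

df = spark.read.option("header", "true").csv("file:/tmp/data.csv")
```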
deng_dev
by New Contributor III
  • 424 Views
  • 3 replies
  • 0 kudos

Autoloader ignore one folder in path

Hi everyone! I am trying to set up Autoloader to read a JSON file with a specific name from all subfolders under the path except one. Could someone advise how this can be achieved? For example, I need to read from .../*/specific_name.json, but ignore test f...

Latest Reply
standup1
New Contributor III
  • 0 kudos

I think you can use REGEXP to achieve this. This might not be the best way, but it should get the job done. It's all about filtering that file in the df from getting loaded. Try something like this: df.select("*", "_metadata").select("*", "_metadata.file...

2 More Replies
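Building on that suggestion, a minimal sketch (the container path and the excluded folder name are placeholders): glob for the specific file name, expose the _metadata column, and filter out anything that came from the unwanted subfolder.

```python
from pyspark.sql import functions as F

df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    # the glob keeps only files named specific_name.json in any subfolder
    .load("abfss://container@acct.dfs.core.windows.net/root/*/specific_name.json")
    .select("*", "_metadata")
    # drop rows whose source file sits under the test folder (placeholder name)
    .filter(~F.col("_metadata.file_path").rlike("/test/"))
)
```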
Devsql
by New Contributor II
  • 612 Views
  • 3 replies
  • 1 kudos

How to find whether a given Parquet file got imported into the Bronze Layer?

Hi Team, recently we created a new Databricks project/solution (based on the Medallion architecture) with Bronze-Silver-Gold layer tables. We have created a Delta Live Tables based pipeline for the Bronze layer implementation. Source files are Parqu...

Data Engineering
Azure Databricks
Bronze Job
Delta Live Table
Delta Live Table Pipeline
Latest Reply
raphaelblg
Contributor III
  • 1 kudos

Hello @Devsql , It appears that you are creating DLT bronze tables using a standard spark.read operation. This may explain why the DLT table doesn't include "new files" during a REFRESH operation. For incremental ingestion of bronze layer data into y...

2 More Replies
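To illustrate that suggestion (a sketch only; the landing path and table name are placeholders): define the bronze DLT table on a cloudFiles stream instead of a batch spark.read, and carry the source file path along so you can later check whether a given Parquet file made it into the Bronze layer.

```python
import dlt
from pyspark.sql import functions as F

@dlt.table(name="bronze_events", comment="Incremental bronze ingest of Parquet drops")
def bronze_events():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "parquet")
        .load("abfss://landing@acct.dfs.core.windows.net/events/")  # placeholder path
        .select("*", F.col("_metadata.file_path").alias("source_file"))
    )
```

A simple `SELECT DISTINCT source_file FROM bronze_events` then answers whether a particular file was ingested.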
shreya_20202
by New Contributor II
  • 530 Views
  • 1 replies
  • 1 kudos

copy file structure including files from one storage to another incrementally using pyspark

I have a storage account dexflex and two containers, source and destination. The source container has directories and files as below: results search 03 Module19111.json Module19126.json 04 Module11291...

Latest Reply
Kaniz
Community Manager
  • 1 kudos

Hi @shreya_20202, It looks like you’re trying to incrementally copy data from the source container to the destination container in Azure Databricks. To achieve this, you’ll need to compare the files in the source and destination directories and co...

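A rough sketch of the compare-and-copy approach described above (the container URLs are placeholders; the listing walks subdirectories with dbutils.fs, which is fine for modest file counts):

```python
# Hypothetical source/destination roots in the dexflex storage account
src_root = "abfss://source@dexflex.dfs.core.windows.net/results/"
dst_root = "abfss://destination@dexflex.dfs.core.windows.net/results/"

def list_files(root):
    """Recursively list file paths under root."""
    files = []
    for info in dbutils.fs.ls(root):
        if info.isDir():
            files += list_files(info.path)
        else:
            files.append(info.path)
    return files

existing = {p.replace(dst_root, src_root) for p in list_files(dst_root)}

# Copy only files missing on the destination, preserving the folder structure
for path in list_files(src_root):
    if path not in existing:
        dbutils.fs.cp(path, path.replace(src_root, dst_root))
```

For large containers, Auto Loader or azcopy scales better than per-file dbutils.fs.cp calls.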
youssefmrini
by Honored Contributor III
  • 585 Views
  • 0 replies
  • 2 kudos

Delta Lake Liquid Clustering

Support for liquid clustering is now generally available using Databricks Runtime 15.2 and above. Getting started with Delta Lake liquid clustering: https://lnkd.in/eaCZyhbF #DeltaLake #Databricks

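For anyone looking for the syntax, a minimal example (catalog, schema, table, and column names are placeholders) of creating a table with liquid clustering and re-clustering it as data arrives:

```python
# Create a Delta table clustered by the chosen keys (placeholder names)
spark.sql("""
    CREATE TABLE IF NOT EXISTS main.sales.orders (
        order_id    BIGINT,
        customer_id BIGINT,
        order_date  DATE
    )
    CLUSTER BY (customer_id, order_date)
""")

# OPTIMIZE incrementally clusters newly written data
spark.sql("OPTIMIZE main.sales.orders")
```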
thecodecache
by New Contributor II
  • 465 Views
  • 3 replies
  • 0 kudos

Transpile a SQL Script into PySpark DataFrame API equivalent code

Input SQL Script (assume any dialect): SELECT b.se10, b.se3, b.se_aggrtr_indctr, b.key_swipe_ind FROM (SELECT se10, se3, se_aggrtr_indctr, ROW_NUMBER() OVER (PARTITION BY SE10 ...

Latest Reply
thecodecache
New Contributor II
  • 0 kudos

Hi @Kaniz, Thanks for your response. I'm looking for a utility or an automated way of translating any generic SQL into PySpark DataFrame code. So, the translate function should look like below: def translate(input_sql): # translate/convert it into p...

2 More Replies
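One pragmatic sketch of such a translate() (not a full SQL-to-DataFrame-API code generator): use the open-source sqlglot library to transpile the incoming dialect to Spark SQL and hand the result to spark.sql(), which returns a DataFrame you can keep composing with the DataFrame API. The dialect, table, and column names below are assumptions.

```python
import sqlglot  # third-party: pip install sqlglot

def translate(input_sql: str, source_dialect: str = "tsql"):
    """Transpile the SQL to Spark's dialect and return it as a DataFrame."""
    spark_sql = sqlglot.transpile(input_sql, read=source_dialect, write="spark")[0]
    return spark.sql(spark_sql)

df = translate("SELECT TOP 10 se10, se3 FROM swipes ORDER BY se10")  # hypothetical table
df.show()
```

Emitting literal .select()/.filter() source code would mean walking sqlglot's parsed AST yourself; as far as I know, nothing does that out of the box.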