Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Ajay-Pandey
by Databricks MVP
  • 2969 Views
  • 2 replies
  • 2 kudos

Why does Azure Databricks need to store data in temp storage in Azure before writing to Synapse?

I was following the tutorial about data transformation with Azure Databricks, and it says that before loading data into Azure Synapse Analytics, the data transformed by Azure Databricks would be saved to temp storage in Azure Blob Storage first before loa...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

@Ajay Pandey: Saving the transformed data to temporary storage in Azure Blob Storage before loading into Azure Synapse Analytics provides a number of benefits to ensure that the data is accurate, optimized, and performs well in the target environmen...
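
For illustration, here is a minimal PySpark sketch of this staging pattern using the Azure Synapse connector (com.databricks.spark.sqldw); df is a hypothetical transformed DataFrame, and the server, container, storage account, and table names are placeholders:

# The connector stages df under tempDir and has Synapse ingest it from
# there (PolyBase/COPY), which is far faster than row-by-row JDBC inserts.
(df.write
    .format("com.databricks.spark.sqldw")
    .option("url", "jdbc:sqlserver://<server>.database.windows.net:1433;database=<db>")
    .option("tempDir", "wasbs://<container>@<account>.blob.core.windows.net/tmp")
    .option("forwardSparkAzureStorageCredentials", "true")
    .option("dbTable", "dbo.my_table")
    .mode("append")
    .save())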

1 More Replies
chhavibansal
by New Contributor III
  • 1605 Views
  • 1 reply
  • 0 kudos

What is the upper bound for dataSkippingNumIndexedCols, which keeps stats in the Delta log file?

Is there an upper bound on the number that I can assign to delta.dataSkippingNumIndexedCols for computing statistics? Is there some tradeoff benchmark available for increasing this number beyond 32?

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Chhavi Bansal: The delta.dataSkippingNumIndexedCols configuration property controls the maximum number of columns that Delta Lake collects statistics on, which are used for data skipping. By default, this value is set to 32. There is no hard upper bound on th...
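
As a quick sketch (the table name is hypothetical), the property is set per table, and only the first N schema columns get file-level min/max stats:

# Keep stats on the first 8 schema columns only, so put the columns
# you filter on most often among the first 8.
spark.sql("""
    ALTER TABLE my_db.events
    SET TBLPROPERTIES ('delta.dataSkippingNumIndexedCols' = '8')
""")

Rather than raising the limit far past 32 (which inflates the transaction log and slows writes), it is usually cheaper to reorder the schema so the filtered columns fall inside the indexed range.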

nounou
by New Contributor II
  • 7370 Views
  • 1 reply
  • 1 kudos

How can I export my dashboard in HTML format using the Databricks API?

Hi everyone, I would like to export my dashboard in HTML format and embed it in the body of my email in order to send it to my team. Here is my Python code for the Databricks API, and I got this error, and when I put my HTML in the body of my message I...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

@mathild noun:

import databricks.workspace as workspace_api
import requests

# set up your Databricks workspace credentials
domain = "<your Databricks workspace domain>"
token = "<your Databricks API token>"

# set up the workspace client
workspac...
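
The snippet above is cut off (and the databricks.workspace import may not resolve), so here is a hedged sketch using only the requests library against the Workspace export endpoint; the domain, token, and path are placeholders, and this assumes the object is exportable as HTML via the Workspace API:

import base64
import requests

domain = "https://<your-workspace>.azuredatabricks.net"  # placeholder
token = "<your Databricks API token>"                     # placeholder

# Export a workspace object as HTML
resp = requests.get(
    f"{domain}/api/2.0/workspace/export",
    headers={"Authorization": f"Bearer {token}"},
    params={"path": "/Users/me@example.com/my-dashboard", "format": "HTML"},
)
resp.raise_for_status()

# The response carries the HTML as base64 in the "content" field
html = base64.b64decode(resp.json()["content"]).decode("utf-8")

with open("dashboard.html", "w") as f:
    f.write(html)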

VinayEmmadi
by New Contributor
  • 12337 Views
  • 1 reply
  • 2 kudos

How does hash shuffle join work in Spark?

Hi all, I am trying to understand the internals of shuffle hash join. I want to check whether my understanding of it is correct. Let’s say I have two tables t1 and t2 joined on column country (8 distinct values). If I set the number of shuffle partitions as ...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

@Vinay Emmadi: In Spark, a shuffle hash join is a join strategy used when joining two data sets on a common key. The data is first partitioned by the join key, and then each partition is shuffled and sent to a node in the cluster. The ...
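
A small sketch of steering Spark toward this strategy with the SHUFFLE_HASH join hint (table names are hypothetical):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical tables joined on "country"
t1 = spark.table("t1")
t2 = spark.table("t2")

# Ask Spark to prefer a shuffle hash join over sort-merge: both sides are
# repartitioned by the join key, then each task builds a hash table from
# the hinted (build) side and probes it with the other side.
joined = t1.join(t2.hint("SHUFFLE_HASH"), "country")
joined.explain()  # physical plan should show ShuffledHashJoin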

Bartek
by Contributor
  • 4607 Views
  • 1 reply
  • 1 kudos

Save Spark DataFrame to shapefile (.shp format)

Hello, I know how to create a .shp file from a GeoPandas dataframe using code similar to this, also mentioned on SO: gpd_df = geopandas.GeoDataFrame(pandas_df, geometry='geom'); gpd_df.to_file("username/nh.shp"). However, I have .parquet files that I can load...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

@Bartosz Maciejewski: Spark does not have native support for writing shapefiles directly. However, you can use a third-party library such as GeoPandas or PyShp to write your Spark DataFrame to a shapefile. Here's an example of how to use GeoPandas to...
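
Since the reply's example is cut off, here is a hedged sketch of the GeoPandas route, assuming a Spark DataFrame spark_df with a WKT geometry column named geom (both names are hypothetical):

import geopandas
from shapely import wkt

# Collect to the driver; workable only when the data fits in driver memory
pdf = spark_df.toPandas()
pdf["geometry"] = pdf["geom"].apply(wkt.loads)

gdf = geopandas.GeoDataFrame(pdf.drop(columns=["geom"]), geometry="geometry")

# Shapefiles are written to a local path; copy to DBFS/cloud storage afterwards
gdf.to_file("/tmp/nh.shp")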

KVNARK
by Honored Contributor II
  • 7329 Views
  • 1 reply
  • 4 kudos

Resolved! Query related to Storage account authentication

Use Case: Copy data from a SharePoint List to Blob using Power Automate.
Short Description: To access the Blob storage account from Power Automate, there are three authentication types:
1. Access Key
2. Service Principal
3. Azure AD Integrated
Which authentica...

Latest Reply
Anonymous
Not applicable
  • 4 kudos

@KVNARK: It's recommended to use the Azure AD Integrated authentication type. This authentication type allows you to use Azure Active Directory (AD) to authenticate and manage access to Blob Storage resources at the folder or container level using...

Lu_Wang_SA_DBX
by Databricks Employee
  • 1486 Views
  • 0 replies
  • 2 kudos

We will host the first Databricks Bay Area User Group meeting in the Databricks Mountain View office on March 14, 2:30-5:00pm PT. We'll have Dave Ma...

We will host the first Databricks Bay Area User Group meeting in the Databricks Mountain View office on March 14, 2:30-5:00pm PT. We'll have Dave Mariani, CTO & Founder at AtScale, and Riley Phillips, Enterprise Solution Engineer at Matillion, to shar...

aki1
by New Contributor II
  • 3165 Views
  • 2 replies
  • 1 kudos

How to download a file in DBFS that contains multibyte characters in the file path?

I would like to download a file in DBFS using the FileStore endpoint. If the file or folder name contains multibyte characters, the file path cannot be specified due to URL encoding and an error occurs. Question 1: If a file or folder name contains mul...

Latest Reply
Debayan
Databricks Employee
  • 1 kudos

Hi, the Databricks CLI can be used to download a file from DBFS: https://docs.databricks.com/dev-tools/cli/index.html. Also, you can refer to https://stackoverflow.com/questions/49019706/databricks-download-a-dbfs-filestore-file-to-my-local-machine, which ...
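
Another hedged option that sidesteps the URL-encoding problem is the DBFS read endpoint, which takes the path as a query parameter that requests percent-encodes for you. The domain, token, and file path below are placeholders, and each call returns at most 1 MB (use the offset/length parameters to page through larger files):

import base64
import requests

domain = "https://<your-workspace>.cloud.databricks.com"  # placeholder
token = "<your Databricks API token>"                      # placeholder

resp = requests.get(
    f"{domain}/api/2.0/dbfs/read",
    headers={"Authorization": f"Bearer {token}"},
    params={"path": "/FileStore/データ/結果.csv"},  # hypothetical multibyte path
)
resp.raise_for_status()

# File contents come back base64-encoded in the "data" field
with open("result.csv", "wb") as f:
    f.write(base64.b64decode(resp.json()["data"]))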

1 More Replies
Hubert-Dudek
by Esteemed Contributor III
  • 2721 Views
  • 0 replies
  • 4 kudos

lnkd.in

Databricks has introduced a new feature that allows users to send SQL statements to their database via REST API. Users can easily integrate this feature with any tool by simply posting their queries to the /api/2.0/sql/statements/ endpoint. With this...
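
A minimal sketch of calling the endpoint with Python's requests, assuming a running SQL warehouse; the domain, token, and warehouse ID are placeholders:

import requests

domain = "https://<your-workspace>.cloud.databricks.com"  # placeholder
token = "<your Databricks API token>"                      # placeholder

# Submit a SQL statement to a SQL warehouse via the Statement Execution API
resp = requests.post(
    f"{domain}/api/2.0/sql/statements/",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "warehouse_id": "<your warehouse id>",  # placeholder
        "statement": "SELECT 1 AS one",
        "wait_timeout": "30s",  # wait synchronously for up to 30 seconds
    },
)
resp.raise_for_status()
result = resp.json()
print(result["status"]["state"])       # e.g. SUCCEEDED
print(result["result"]["data_array"])  # rows, e.g. [["1"]]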

Tewks
by New Contributor
  • 3860 Views
  • 2 replies
  • 5 kudos

Resolved! Databricks SQL External Connections

Lakehouse architectures seem enticing, especially from the standpoint of querying the data lake directly as it sits (as opposed to first migrating the data to an external data warehouse). While documentation and support seem pretty clear regarding ...

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 5 kudos

These are really awesome details.

1 More Replies
Erik_L
by Contributor II
  • 3321 Views
  • 1 reply
  • 1 kudos

Resolved! Python 3.10 custom Docker image

Goal: To use Python 3.10.4+.
Why: We have Python repos we want to use that are not backward compatible.
What: I have created an image from Databricks' example experimental containers, already with Ubuntu 22.04 (2 major versions newer than the curre...

Latest Reply
Erik_L
Contributor II
  • 1 kudos

After searching for an hour, I realized what I needed to look for. It's importing Iterable from collections, which was removed in 3.10. I guess Databricks hasn't migrated the code yet, in which case I'm at a crossroads: Databricks 3.9, local...
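
For reference, a tiny sketch of the fix in question; the ABC aliases were dropped from the collections namespace in Python 3.10:

# Fails on Python 3.10+ (removed after a long deprecation):
# from collections import Iterable

# Works on Python 3.3+, including 3.10+:
from collections.abc import Iterable

print(isinstance([1, 2, 3], Iterable))  # True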

dotan
by New Contributor II
  • 2494 Views
  • 2 replies
  • 1 kudos

Resolved! How do I reduce the size of a Hive table's S3 bucket?

I have a Hive table in Delta format with over 1B rows. When I check the Data Explorer in the SQL section of Databricks, it notes that the table size is 139.3 GiB with 401 files, but when I check the S3 bucket where the files are located (dbfs:/user/hive...

Latest Reply
apingle
Contributor
  • 1 kudos

When you run updates, deletes, etc. on a Delta table, new files are created. However, the old files are not automatically deleted; this allows for features like time travel on Delta tables. In order to delete older files for a Delta table, you...
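
The reply is cut off, but it is presumably heading toward VACUUM; a short sketch with a hypothetical table name:

# Preview which unreferenced files would be deleted
spark.sql("VACUUM my_db.my_table DRY RUN")

# Delete files no longer referenced by the Delta log and older than
# the retention window (168 hours = the default 7 days)
spark.sql("VACUUM my_db.my_table RETAIN 168 HOURS")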

1 More Replies
Hubert-Dudek
by Esteemed Contributor III
  • 1419 Views
  • 1 reply
  • 5 kudos

Exciting news for Databricks users! #databricks launched a new feature that allows users to run job workflows continuously. Setting up a continuous jo...

Exciting news for Databricks users! #databricks launched a new feature that allows users to run job workflows continuously. Setting up a continuous job workflow is straightforward: create a job and select the continuous trigger option in the scheduli...
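
For those who prefer the API over the UI, a hedged sketch of adding the continuous trigger to an existing job via Jobs API 2.1; the domain, token, and job_id are placeholders:

import requests

domain = "https://<your-workspace>.cloud.databricks.com"  # placeholder
token = "<your Databricks API token>"                      # placeholder

# Switch an existing job to a continuous trigger: a new run starts
# as soon as the previous one finishes.
resp = requests.post(
    f"{domain}/api/2.1/jobs/update",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "job_id": 123,  # placeholder
        "new_settings": {"continuous": {"pause_status": "UNPAUSED"}},
    },
)
resp.raise_for_status()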

Latest Reply
jose_gonzalez
Databricks Employee
  • 5 kudos

Thank you for sharing!!!

Sujitha
by Databricks Employee
  • 1821 Views
  • 1 reply
  • 1 kudos

Weekly Release Notes Recap: Here's a quick recap of the latest release note updates from the past week. Databricks platform release notes, February 2...

Weekly Release Notes Recap: Here's a quick recap of the latest release note updates from the past week. Databricks platform release notes, February 21 - 28, 2023. Ray on Databricks (Public Preview): With Databricks Runtime 12.0 and above, you can create ...
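
As a hedged taste of the Ray preview, a minimal sketch using the ray-on-spark utilities that ship with the ray package (the worker count is an arbitrary placeholder):

from ray.util.spark import setup_ray_cluster, shutdown_ray_cluster
import ray

# Start a Ray cluster on the Spark cluster's workers (DBR 12.0+)
setup_ray_cluster(num_worker_nodes=2)

ray.init()                      # connects to the cluster started above
print(ray.cluster_resources())

shutdown_ray_cluster()          # tear down when finished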

Latest Reply
jose_gonzalez
Databricks Employee
  • 1 kudos

Thank you for sharing!!!

SteveGPT
by New Contributor III
  • 5155 Views
  • 3 replies
  • 3 kudos

How to bypass SSL cert verification when using Repos with Azure DevOps

Hi all, after some time working with DevOps and Repos and getting used to the convenience, our SSL cert situation got jacked up somehow. While not ideal, I'd like to be able to temporarily bypass cert verification. There are ways to do this in the she...

Latest Reply
SteveGPT
New Contributor III
  • 3 kudos

Guess I'm out of luck on this one...

2 More Replies
