Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

guangyi
by Contributor III
  • 2445 Views
  • 4 replies
  • 0 kudos

Resolved! What is the correct way to measure the performance of a Databricks notebook?

Here is my code for converting one column field of a data frame to time data type:  col_value = df.select(df.columns[0]).first()[0] start_time = time.time() col_value = datetime.strftime(col_value, "%Y-%m-%d %H:%M:%S") \ if isinstance(co...
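
A side note on the timing approach in the excerpt: timing a single conversion with time.time() is dominated by clock noise. A minimal sketch of a steadier measurement, using time.perf_counter and repetition (the function name and run count here are made up for illustration):

```python
import time
from datetime import datetime

# Hypothetical helper: repeat the operation many times and time the batch
# with a high-resolution monotonic clock instead of one time.time() sample.
def time_conversion(value, runs=10_000):
    """Time formatting `value` (a datetime) as a string, `runs` times."""
    start = time.perf_counter()
    for _ in range(runs):
        formatted = datetime.strftime(value, "%Y-%m-%d %H:%M:%S")
    elapsed = time.perf_counter() - start
    return formatted, elapsed

formatted, elapsed = time_conversion(datetime(2024, 1, 2, 3, 4, 5))
print(formatted)  # 2024-01-02 03:04:05
```

Note that in Spark this only measures the driver-side Python call; transformations are lazy, so query performance should be measured around an action (e.g. a count or write), not around building the plan.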

Latest Reply
Lakshay
Databricks Employee
  • 0 kudos

How many columns do you have?

3 More Replies
Vetrivel
by Contributor
  • 4685 Views
  • 7 replies
  • 2 kudos

Connection Challenges with Azure Databricks and SQL Server on a VM in Serverless Compute

We have established an Azure Databricks workspace within our central subscription, which hosts all common platform resources. Additionally, we have a SQL Server running on a virtual machine in a separate sandbox subscription, containing data that nee...

Latest Reply
Vetrivel
Contributor
  • 2 kudos

@Mo I have tried it and got the error below: "Private access to resource type 'Microsoft.Compute/virtualMachines' is not supported with group id 'sqlserver'." It seems this is supported only when the destination is Blob, ADLS, or Azure SQL.

6 More Replies
Erik_L
by Contributor II
  • 7284 Views
  • 4 replies
  • 4 kudos

Resolved! Support for Parquet brotli compression or a work around

Spark 3.3.1 supports the Brotli compression codec, but when I use it to read Parquet files from S3, I get: INVALID_ARGUMENT: Unsupported codec for Parquet page: BROTLI. Example code: df = (spark.read.format("parquet") .option("compression", "brotli")...

Latest Reply
Erik_L
Contributor II
  • 4 kudos

Given the new information I appended, I looked into Delta caching and I can disable it with .option("spark.databricks.io.cache.enabled", False). This works as a workaround while I read these files in to save them locally in DBFS, but does it have perfo...

3 More Replies
Mystagon
by New Contributor III
  • 5105 Views
  • 4 replies
  • 3 kudos

Performance Issues with Unity Catalog

Hey, I need some help / suggestions troubleshooting this. I have two Databricks workspaces, Common and Lakehouse. The major differences between them are: Lakehouse is using Unity Catalog; Lakehouse is using External Locations, whereas cre...

Latest Reply
arjun_kr
Databricks Employee
  • 3 kudos

- Listing directories in Common is at least 4-8 times faster than in the Lakehouse environment. Are you able to replicate the issue using a simple dbutils list operation (dbutils.fs.ls) or by performing a sample file (say 100 MB file) copy using dbutils.f...

3 More Replies
Brad
by Contributor II
  • 5468 Views
  • 2 replies
  • 1 kudos

Resolved! How to disable all cache

Hi, I'm trying to test some SQL performance. I run the following first: spark.conf.set('spark.databricks.io.cache.enabled', False). However, the second run of the same query is still much faster than the first run. Is there a way to make the query start from a clean...

Latest Reply
Brad
Contributor II
  • 1 kudos

Thanks @VZLA. How do I run spark.sparkContext.getPersistentRDDs.values.foreach(_.unpersist()) from a Databricks notebook?

1 More Replies
farbodr
by New Contributor II
  • 6657 Views
  • 5 replies
  • 1 kudos

Shapley Progressbar

The Shapley progress bar, or tqdm progress bars in general, don't show in notebooks. Do I need to set something special to get these or other similar widgets to work?

Latest Reply
richk7
New Contributor II
  • 1 kudos

I think you're looking for tqdm.notebook:
from time import sleep
from tqdm.notebook import tqdm
for _ in tqdm(range(20)):
    sleep(5)

4 More Replies
JacobLi_LN
by New Contributor II
  • 4168 Views
  • 1 reply
  • 1 kudos

Resolved! Where can I find those delta table log files?

I created a Delta table with the SQL command CREATE TABLE and inserted several records into it with INSERT statements. Now it can be seen in the catalog. But I want to understand how Delta works and would like to see where those log files are stored. Even...

Latest Reply
Alberto_Umana
Databricks Employee
  • 1 kudos

To locate the log files for your Delta table, please first note that Delta Lake stores its transaction log files in a specific directory within the table's storage location. These log files are for maintaining the ACID properties and enabling feature...
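
As a rough illustration of that layout (a local mock, not an actual Delta table — the directory and file contents here are simplified stand-ins): each commit to a Delta table writes a zero-padded, 20-digit JSON file under the table location's _delta_log directory.

```python
import json
import os
import tempfile

# Mock a Delta table's on-disk layout: <table location>/_delta_log/.
# On a real table the location would be a cloud path (abfss://, s3://, ...),
# but the naming scheme of the commit files is the same.
root = tempfile.mkdtemp()
log_dir = os.path.join(root, "my_table", "_delta_log")
os.makedirs(log_dir)

# Commit files are named by version, zero-padded to 20 digits.
for version in range(3):
    name = f"{version:020d}.json"
    with open(os.path.join(log_dir, name), "w") as f:
        json.dump({"commitInfo": {"operation": "WRITE"}}, f)

commits = sorted(os.listdir(log_dir))
print(commits[0])  # 00000000000000000000.json
```

On Databricks, you can find the real location with DESCRIBE DETAIL <table> and then list <location>/_delta_log with dbutils.fs.ls.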

Terraformuser
by New Contributor
  • 1670 Views
  • 1 reply
  • 0 kudos

Azure Databricks - Terraform errors while using workspace level provider

Hello all, I have a question about deploying Azure Databricks with Terraform. Does Databricks have any API call limits? I can deploy an external location and a storage credential, and it's tested and confirmed working. But when I try to deploy 2 additional e...

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hello @Terraformuser, could you try enabling debug output while applying your Terraform? That would give you more context on the failure: TF_LOG=DEBUG DATABRICKS_DEBUG_TRUNCATE_BYTES=250000 terraform apply -no-color 2>&1 | tee tf-debug.log

TamD
by Contributor
  • 2161 Views
  • 1 reply
  • 1 kudos

TIME data type

Our business does a LOT of reporting and analysis by time-of-day and clock times, independent of day or date. As far as I can see, Databricks does not support the TIME data type. If I attempt to import data recorded as a time (e.g., 02:59:59.000)...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @TamD, basically it's just like you've written. There is no TIME data type, so you have the 2 options you already mentioned:
- you can use the Timestamp data type and ignore its date part
- store it as a string and do a conversion each time you need it
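
A minimal sketch of the string option in plain Python (in Spark SQL the analogous step would be a to_timestamp / date_format round trip; the function and sample values here are illustrative, not from the thread):

```python
from datetime import datetime, time

# Store clock times as 'HH:MM:SS.mmm' strings and convert on demand so
# they become comparable, independent of any date.
def parse_clock(s: str) -> time:
    """Parse an 'HH:MM:SS.mmm' clock string into a comparable time object."""
    return datetime.strptime(s, "%H:%M:%S.%f").time()

cutoff = parse_clock("02:59:59.000")
samples = ["01:30:00.000", "03:15:42.500", "02:59:59.000"]
after_cutoff = [s for s in samples if parse_clock(s) > cutoff]
print(after_cutoff)  # ['03:15:42.500']
```

The conversion cost is why the reply suggests it only "each time you need it"; for heavy time-of-day analytics, the timestamp-with-fixed-date option avoids re-parsing on every comparison.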

Phani1
by Databricks MVP
  • 2644 Views
  • 2 replies
  • 0 kudos

Code Review tools

Could you kindly recommend any Code Review tools that would be suitable for our Databricks tech stack?

Data Engineering
code review
Latest Reply
Phani1
Databricks MVP
  • 0 kudos

You can explore SonarQube.

1 More Replies
TinasheChinyati
by New Contributor III
  • 3494 Views
  • 3 replies
  • 1 kudos

Resolved! Retention window from DLT created Delta tables

Hi guys, I am working with data ingested from Azure Event Hubs using Delta Live Tables in Databricks. Our data architecture follows the medallion approach. Our current requirement is to retain only the most recent 14 days of data in the silver layer. To...

Data Engineering
data engineer
Delta Live Tables
Latest Reply
TinasheChinyati
New Contributor III
  • 1 kudos

Hi @MuthuLakshmi, thank you for sharing the configurations. Here is a bit more clarity on our current workflow. DELETE and VACUUM workflow: our workflow involves the following: 1. DELETE operation: we delete records matching a specific predicate to mark th...
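
A minimal sketch of the 14-day cutoff step of such a workflow, computed driver-side (the table and column names are hypothetical; on Databricks the resulting statement would be run with spark.sql, with VACUUM removing the underlying files later, once they age past the retention threshold):

```python
from datetime import datetime, timedelta, timezone

# Hypothetical table/column names; only the 14-day window comes from the thread.
def retention_delete_sql(table: str, ts_col: str, days: int, now: datetime) -> str:
    """Build a DELETE statement removing rows older than `days` days."""
    cutoff = now - timedelta(days=days)
    return (f"DELETE FROM {table} "
            f"WHERE {ts_col} < '{cutoff.strftime('%Y-%m-%d %H:%M:%S')}'")

now = datetime(2024, 6, 15, 12, 0, 0, tzinfo=timezone.utc)
sql = retention_delete_sql("silver.events", "event_ts", 14, now)
print(sql)  # DELETE FROM silver.events WHERE event_ts < '2024-06-01 12:00:00'
```

Note that the DELETE only logically removes rows; storage is reclaimed by VACUUM, subject to the table's deleted-file retention setting.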

2 More Replies
sathya08
by New Contributor III
  • 5027 Views
  • 9 replies
  • 4 kudos

Resolved! Trigger queries to SQL warehouse from Databricks notebook

Hello, I am trying to explore triggering SQL queries from a Databricks notebook against a serverless SQL warehouse, along with the nest-asyncio module. Both of these are very new to me and I need help with them. For triggering the API from the notebook, I am using...

Latest Reply
sathya08
New Contributor III
  • 4 kudos

Thank you, it really helped.

8 More Replies
ashap551
by New Contributor II
  • 4021 Views
  • 2 replies
  • 1 kudos

Best practices for code organization in large-scale Databricks ETL projects: Modular vs. Scripted

I’m curious about data engineering best practices for a large-scale data engineering project using Databricks to build a Lakehouse architecture (Bronze -> Silver -> Gold layers). I’m presently comparing two approaches to writing code to engineer the s...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @ashap551, I would vote for the modular approach, which lets you reuse code and write unit tests in a simpler manner. For me, notebooks are only "clients" of these shared modules. You can take a look at the official documentation, where they're following a simila...
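
As a tiny sketch of that split (the module and function names are made up): transformations live as plain functions in a shared module, a notebook merely imports and calls them, and the same functions can be unit tested without a cluster.

```python
# transforms.py -- hypothetical shared module; a notebook would just
# `from transforms import add_total`. Pure-Python stand-in for a DataFrame
# transformation so it stays unit-testable.
def add_total(rows):
    """Add a 'total' = quantity * price field to each row dict."""
    return [{**r, "total": r["quantity"] * r["price"]} for r in rows]

# A notebook "client" (or a unit test) then simply calls the function:
rows = [{"quantity": 2, "price": 5.0}, {"quantity": 1, "price": 3.5}]
result = add_total(rows)
print(result[0])  # {'quantity': 2, 'price': 5.0, 'total': 10.0}
```

The same shape works with PySpark: keep DataFrame-in, DataFrame-out functions in modules and test them against a local SparkSession.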

1 More Replies
aahil824
by New Contributor
  • 1154 Views
  • 3 replies
  • 0 kudos

How to read zip folder that contains 4 .csv files

Hello Community, I have uploaded one zip archive, "dbfs:/FileStore/tables/bike_sharing.zip". I was trying to unzip it and read the 4 .csv files, but was unable to do it. Any help from your side will be greatly appreciated!

Latest Reply
SenthilRT
New Contributor III
  • 0 kudos

Hope this link helps. You can use a shell cell command within a notebook to unzip (assuming you have access to the path where you want to unzip the file): https://stackoverflow.com/questions/74196011/databricks-reading-from-a-zip-file
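
Alongside the shell approach, a pure-Python sketch with the stdlib zipfile module (the archive contents here are made up so the example is self-contained; on Databricks you would open the archive via a /dbfs/... path instead):

```python
import csv
import io
import zipfile

# Build a small in-memory archive standing in for the uploaded zip;
# in practice: zipfile.ZipFile("/dbfs/FileStore/tables/bike_sharing.zip").
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("rides.csv", "id,count\n1,10\n2,20\n")

# Read every .csv member without extracting to disk.
tables = {}
with zipfile.ZipFile(buf) as zf:
    for name in zf.namelist():
        if name.endswith(".csv"):
            with zf.open(name) as f:
                reader = csv.DictReader(io.TextIOWrapper(f, "utf-8"))
                tables[name] = list(reader)

print(tables["rides.csv"][0])  # {'id': '1', 'count': '10'}
```

After extraction (or in-memory reading), each CSV can be handed to spark.read.csv or pandas as usual.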

2 More Replies
brickster_2018
by Databricks Employee
  • 15863 Views
  • 2 replies
  • 0 kudos
Latest Reply
lchari
New Contributor II
  • 0 kudos

Is the limit per "table/dataframe" or for all tables/dataframes put together? The driver collects the data from all executors (those holding the respective table or dataframe) and distributes it to all executors. When will the memory be released in bo...

1 More Replies