Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

shoumitra
by New Contributor
  • 1541 Views
  • 1 reply
  • 0 kudos

Resolved! Pathway advice on how to Data Engineer Associate

Hi everyone, I am new to this community and I am a BI/Data Engineer by trade in a Microsoft Azure/on-prem context. I want some advice on how to become a certified Data Engineer Associate in Databricks. The training, lessons or courses to be eligible for tak...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @shoumitra, you can register at Databricks Academy. There are plenty of free learning paths depending on what you're interested in: https://customer-academy.databricks.com/ For example, below you can find the free Data Engineer Learning Plan that will pr...

jar
by Contributor
  • 2499 Views
  • 1 reply
  • 0 kudos

Disable Photon for serverless SQL DW

Hello. Is it possible to disable Photon for a serverless SQL DW? If yes, how? Best, Johan.

Latest Reply
CURIOUS_DE
Valued Contributor
  • 0 kudos

No, it is not possible to disable Photon for Databricks Serverless SQL Warehouses. Why Photon cannot be disabled: Photon is always enabled on Serverless SQL Warehouses as part of Databricks' architecture. Serverless SQL is built on Photon to ensure high...

seefoods
by Valued Contributor
  • 1485 Views
  • 3 replies
  • 2 kudos

Resolved! batch process autoloader

My job continues to run after it has finished successfully. This is my case; I enabled useNotification: if self.autoloader_config.use_autoloader: logger_file_ingestion.info("debut d'ecriture en mode streaming") if self.write_mode.value.lower() == "...

Latest Reply
MariuszK
Valued Contributor III
  • 2 kudos

Hi @seefoods, if it works, you can mark my answer as a solution so that if someone has the same problem, it will be easier to find an answer.

2 More Replies
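
For threads like this, the usual way to make an Auto Loader job stop on its own once all available files are processed is trigger(availableNow=True). A minimal sketch, assuming a Databricks notebook where spark is predefined; the source path, checkpoint location, and target table below are placeholders, not taken from the thread:

    # Batch-style Auto Loader run: trigger(availableNow=True) processes the
    # files currently in the source path and then stops the stream instead of
    # running indefinitely. All paths and names are hypothetical.
    source_path = "s3://my-bucket/landing/"
    checkpoint = "/Volumes/main/default/checkpoints/ingest"
    target_table = "main.default.bronze_events"

    (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .option("cloudFiles.useNotifications", "true")  # as enabled in the original post
        .load(source_path)
        .writeStream
        .option("checkpointLocation", checkpoint)
        .trigger(availableNow=True)
        .toTable(target_table)
        .awaitTermination()
    )
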
rpshgupta
by New Contributor III
  • 3151 Views
  • 11 replies
  • 5 kudos

How to find the source code for the data engineering learning path?

Hi everyone, I am taking the Data Engineering learning path on customer-academy.databricks.com. I am not able to find any source code attached to the course. Can you please help me find it so that I can try it hands-on as well? Thanks, Rupesh

Latest Reply
sselvaganapathy
New Contributor II
  • 5 kudos

Please refer to the link below; Databricks no longer provides the demo code: https://community.databricks.com/t5/databricks-academy-learners/how-to-download-demo-notebooks-for-data-engineer-learning-plan/td-p/105362

10 More Replies
Bob-
by New Contributor II
  • 2815 Views
  • 3 replies
  • 4 kudos

Resolved! Upload Screenshot

I am new to the Databricks Free Edition. I am trying to upload a screenshot to be able to put it in a table and run some AI functions against it. It is not letting me upload a .png file. After several attempts I am being told that the root cause is p...

Latest Reply
Sharanya13
Contributor III
  • 4 kudos

@Bob- Can you explain your use case? I'm not sure I understand "I am trying to upload a screenshot to be able to put it in a table and run some AI functions against it." Are you trying to perform OCR?

2 More Replies
Phani1
by Databricks MVP
  • 2509 Views
  • 4 replies
  • 2 kudos

Potential Challenges of Using Iceberg Format (Databricks + Iceberg)

Hi team, what are the potential challenges of using the Iceberg format instead of Delta for saving data in Databricks? Regards, Phani

Latest Reply
sridharplv
Valued Contributor II
  • 2 kudos

Hi @Phani1, please find the link below, which details how to maintain Iceberg metadata along with Delta metadata: https://community.databricks.com/t5/technical-blog/read-delta-tables-with-snowflake-via-unity-catalog/ba-p/115877

3 More Replies
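
A related point when weighing Iceberg against Delta on Databricks is UniForm, which keeps a Delta table readable by Iceberg clients by writing Iceberg metadata alongside the Delta log. A hedged sketch of enabling it on an existing table; the table name is a placeholder and the exact property requirements depend on your Databricks Runtime, so check the UniForm documentation first:

    # Enable Iceberg metadata generation (UniForm) on an existing Delta table.
    # Table name is hypothetical; properties follow the UniForm docs.
    table_name = "main.default.sales"

    spark.sql(f"""
        ALTER TABLE {table_name} SET TBLPROPERTIES (
            'delta.enableIcebergCompatV2' = 'true',
            'delta.universalFormat.enabledFormats' = 'iceberg'
        )
    """)
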
stevewb
by New Contributor III
  • 1082 Views
  • 1 reply
  • 0 kudos

Setting shuffle partitions in Databricks SQL Warehouse

I think it used to be possible to set shuffle partitions in a Databricks SQL warehouse through e.g. SET spark.sql.shuffle.partitions=20000. However, when I run this now, I get the error: [CONFIG_NOT_AVAILABLE] Configuration spark.sql.shuffle.partitions...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @stevewb, it's not available anymore. According to the documentation: "Databricks SQL allows admins to configure Spark properties for data access in the workspace settings menu. See Data access configurations. Other than data access configurations, Da...

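
On classic (non-serverless) compute the property can still be set per session, so one workaround is to run the shuffle-heavy query on a cluster, or to lean on adaptive query execution instead of a fixed partition count. A small sketch, assuming a notebook attached to classic compute:

    # Session-level setting on a classic cluster (not available on SQL warehouses).
    spark.conf.set("spark.sql.shuffle.partitions", "20000")

    # Or let Adaptive Query Execution choose shuffle partition counts automatically.
    spark.conf.set("spark.sql.adaptive.enabled", "true")
    spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "true")
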
Somia
by New Contributor III
  • 3328 Views
  • 7 replies
  • 2 kudos

Resolved! sql query is not returning _sqldf.

Notebooks in my workspace are not returning _sqldf when a SQL query is run. If I run this code, it gives an error in the second cell that _sqldf is not defined. First cell: %sql select * from some_table limit 10. Second cell: %sql select * from _sqldf. Howev...

Latest Reply
Somia
New Contributor III
  • 2 kudos

Changing the notebook to default Python and all-purpose compute has fixed the issue. I am able to access _sqldf in subsequent SQL or Python cells.

6 More Replies
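
For reference, once the notebook language is Python and it runs on all-purpose compute, _sqldf is available in the cells that follow a %sql cell. A small sketch; the column name is hypothetical:

    # Previous cell (a %sql cell) ran: select * from some_table limit 10
    # In this Python cell, _sqldf holds that result as a PySpark DataFrame.
    filtered = _sqldf.where("some_column IS NOT NULL")  # hypothetical column
    display(filtered)
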
anilsampson
by New Contributor III
  • 2155 Views
  • 2 replies
  • 3 kudos

Resolved! How to get previous version of the table in databricks sql dynamically

Hello, I'm trying to get the previous version of a Delta table using a timestamp, but Databricks SQL does not allow the use of variables. The only thing I can do is use TIMESTAMP AS OF CURRENT_DATE() - 1 if I have refreshed the table today. Please let me know i...

Latest Reply
anilsampson
New Contributor III
  • 3 kudos

Thank you @Vidhi_Khaitan. Is there an upgrade or use case in the works where we can pass parameters via a workflow while triggering a Databricks dashboard?

1 More Replies
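
When the timestamp has to be computed dynamically, one workaround is to do the time travel from Python and pass the value as a reader option instead of interpolating it into a Databricks SQL statement. A minimal sketch with placeholder names:

    from datetime import date, timedelta

    table_name = "main.default.orders"  # hypothetical
    as_of = (date.today() - timedelta(days=1)).isoformat()

    # Delta time travel via reader options; "versionAsOf" works the same way.
    previous_df = (
        spark.read
        .option("timestampAsOf", as_of)
        .table(table_name)
    )
    display(previous_df.limit(10))
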
Divya_Bhadauria
by New Contributor III
  • 1430 Views
  • 1 reply
  • 0 kudos

Update databricks job parameter with CLI

Use case: Updating a Databricks job with multiple tasks can be time-consuming and error-prone when changes (such as adding new parameters) need to be applied to each task manually. Possible solutions: 1. Using the Databricks CLI – jobs reset command. You can ...

Latest Reply
anilsampson
New Contributor III
  • 0 kudos

Hello Divya, could you also try YAML, update your task accordingly, and deploy it as part of an asset bundle? Let me know if you feel both are the same. Regards, Anil.

zach
by New Contributor III
  • 1159 Views
  • 1 reply
  • 0 kudos

Get the total amount of S3 storage used per user

In Databricks, is it possible to get the total amount of Delta Lake storage being used in the Parquet format per user? Subsequently, what are the best practices for making sure that users saving Delta files are not taking up storage unnecessarily, for ...

Latest Reply
Sharanya13
Contributor III
  • 0 kudos

Hi @zach, can you expand on why you need to know the total storage per user? Best practices: if you use Databricks managed tables, optimization is taken care of. https://docs.databricks.com/aws/en/optimizations/predictive-optimization

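
There is no built-in per-user storage breakdown, but per-table sizes are easy to collect and can then be grouped by table owner. A rough sketch using DESCRIBE DETAIL; the table names are placeholders:

    # sizeInBytes from DESCRIBE DETAIL is the size of the current Delta snapshot
    # (it does not include history that has not been vacuumed yet).
    tables = ["main.default.orders", "main.default.events"]  # hypothetical list

    for t in tables:
        detail = spark.sql(f"DESCRIBE DETAIL {t}").select("name", "sizeInBytes").first()
        print(detail["name"], round(detail["sizeInBytes"] / 1024**3, 2), "GiB")
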
AbhayAgarwal
by New Contributor
  • 3782 Views
  • 1 reply
  • 0 kudos

DB to Snowflake connection error

We are getting the error below after upgrading Databricks to version 15 while making a connection to Snowflake in a notebook: "Bad request; operation not supported." Has anyone got this error? Any pointers on how to fix it?

Latest Reply
kamal_ch
Databricks Employee
  • 0 kudos

Hi Abhay, here are some general troubleshooting steps and pointers to help you resolve this issue: ensure that you are using the correct connection configurations for Snowflake. Configuration mismatches can sometimes lead to operation errors. Using ...

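
As a starting point for checking the connection configuration, a Snowflake read from a Databricks notebook usually looks roughly like the sketch below. Host, credentials, and object names are placeholders, and the credentials should come from a secret scope rather than literals:

    # Hedged sketch of a Snowflake read; all values are hypothetical.
    sf_options = {
        "sfUrl": "myaccount.snowflakecomputing.com",
        "sfUser": dbutils.secrets.get("my_scope", "sf_user"),
        "sfPassword": dbutils.secrets.get("my_scope", "sf_password"),
        "sfDatabase": "ANALYTICS",
        "sfSchema": "PUBLIC",
        "sfWarehouse": "COMPUTE_WH",
    }

    df = (
        spark.read.format("snowflake")
        .options(**sf_options)
        .option("dbtable", "MY_TABLE")
        .load()
    )
    display(df.limit(10))
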
pooja_bhumandla
by New Contributor III
  • 1517 Views
  • 2 replies
  • 2 kudos

Resolved! Small Files Persist After OPTIMIZE with Target File Size Set to 100MB – Seeking Possible Reasons

I'm currently working on optimizing a Delta table in Databricks. As part of this, I've increased the target file size from ~33MB to 100MB using the OPTIMIZE command. However, after running the OPTIMIZE operation, I still observe a large number ...

Latest Reply
Brahmareddy
Esteemed Contributor
  • 2 kudos

Hi pooja_bhumandla, great question! How are you doing today? Even after running the OPTIMIZE command with a higher target file size like 100MB, it's common to still see some small files in your Delta table, especially in partitions with very little dat...

1 More Replies
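
For anyone reproducing this, the target file size is typically pinned with the delta.targetFileSize table property before rerunning OPTIMIZE; the table name below is a placeholder, and partitions with little data can still legitimately end up with files below the target:

    # Set a ~100MB target file size and recompact. Small partitions may still
    # produce files below the target because there is not enough data to fill one.
    table_name = "main.default.big_table"  # hypothetical

    spark.sql(f"ALTER TABLE {table_name} SET TBLPROPERTIES ('delta.targetFileSize' = '100mb')")
    spark.sql(f"OPTIMIZE {table_name}")
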
PeSe
by New Contributor
  • 1187 Views
  • 2 replies
  • 1 kudos

How to fast sync large files (> 100GB)

I want to sync large files (>100GB) from my local system to a DBX Volume. I see two options with different problems; do you have suggestions? Option 1 needs to open the file completely -> memory issues: with open(local_file_path, 'rb') as file: ...

Latest Reply
Brahmareddy
Esteemed Contributor
  • 1 kudos

Hi PeSe, how are you doing today? As I understand it, you're absolutely right to think through both options carefully. Option 1 runs into memory issues because it tries to read the whole large file into memory at once, which doesn't work well ...

1 More Replies
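
One way to avoid reading the whole file into memory is to stream it to a Unity Catalog Volume through the Databricks SDK's Files API (the CLI's fs cp does the same thing from a shell). A hedged sketch assuming the databricks-sdk package and placeholder paths; check the SDK documentation for the exact signature on your version:

    # Stream a large local file into a Volume without loading it fully into memory.
    from databricks.sdk import WorkspaceClient

    w = WorkspaceClient()  # auth from environment variables or ~/.databrickscfg

    local_path = "/data/huge_file.parquet"                           # hypothetical
    volume_path = "/Volumes/main/default/landing/huge_file.parquet"  # hypothetical

    with open(local_path, "rb") as f:
        # The SDK uploads from the file handle rather than materializing the bytes.
        w.files.upload(volume_path, f, overwrite=True)
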
Sainath368
by Contributor
  • 1799 Views
  • 6 replies
  • 2 kudos

Clarification Needed: COMPUTE STATISTICS vs COMPUTE DELTA STATISTICS on Delta Tables

Hi everyone, I'm trying to understand the difference between two commands in Databricks: ANALYZE TABLE <table_name> COMPUTE STATISTICS and ANALYZE TABLE <table_name> COMPUTE DELTA STATISTICS. Specifically, what exactly does each command do, and how do they...

Latest Reply
Louis_Frolio
Databricks Employee
  • 2 kudos

If you want to know more about query optimization, I suggest you look into Spark's Catalyst Optimizer and Adaptive Query Execution (AQE). You should always run ANALYZE TABLE ... COMPUTE STATISTICS because this will help Spark's query optimization converge o...

5 More Replies
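
For quick reference, the two statements target different kinds of statistics: COMPUTE STATISTICS feeds the catalog statistics used by the cost-based optimizer, while COMPUTE DELTA STATISTICS recomputes the per-file min/max values Delta uses for data skipping. A small sketch with a placeholder table name:

    table_name = "main.default.sales"  # hypothetical

    # Catalog statistics for the optimizer (table level plus per-column stats).
    spark.sql(f"ANALYZE TABLE {table_name} COMPUTE STATISTICS FOR ALL COLUMNS")

    # Delta file-level statistics (per-column min/max) used for data skipping,
    # e.g. after changing which columns collect statistics.
    spark.sql(f"ANALYZE TABLE {table_name} COMPUTE DELTA STATISTICS")
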
