Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

shoumitra
by New Contributor
  • 936 Views
  • 1 reply
  • 0 kudos

Resolved! Pathway advice on how to become a Data Engineer Associate

Hi everyone, I am new to this community and I am a BI/Data Engineer by trade in a Microsoft Azure/on-prem context. I want some advice on how to become a certified Data Engineer Associate in Databricks. The training, lessons or courses to be eligible for tak...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @shoumitra, you can register at Databricks Academy. There are plenty of free learning paths depending on what you're interested in: https://customer-academy.databricks.com/ For example, below you can find the free Data Engineer Learning Plan that will pr...

jar
by Contributor
  • 2320 Views
  • 1 reply
  • 0 kudos

Disable Photon for serverless SQL DW

Hello. Is it possible to disable Photon for a serverless SQL DW? If yes, how? Best, Johan.

Latest Reply
CURIOUS_DE
Contributor III
  • 0 kudos

No, it is not possible to disable Photon for Databricks Serverless SQL Warehouses. Why Photon cannot be disabled: Photon is always enabled on Serverless SQL Warehouses as part of Databricks' architecture. Serverless SQL is built on Photon to ensure high...

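
To make the workaround concrete: while Photon is fixed on serverless warehouses, it can be toggled when creating a classic pro warehouse. A minimal sketch using the Databricks Python SDK, assuming a configured workspace profile; the warehouse name and sizing are illustrative, not from the thread:

    from databricks.sdk import WorkspaceClient
    from databricks.sdk.service.sql import CreateWarehouseRequestWarehouseType

    w = WorkspaceClient()

    # Create a pro (classic) warehouse with Photon off; on serverless this
    # flag is not honored, which is the behavior described above.
    wh = w.warehouses.create(
        name="no-photon-wh",  # hypothetical name
        cluster_size="Small",
        max_num_clusters=1,
        auto_stop_mins=10,
        enable_photon=False,
        enable_serverless_compute=False,
        warehouse_type=CreateWarehouseRequestWarehouseType.PRO,
    ).result()
    print(wh.id)
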
seefoods
by Valued Contributor
  • 1266 Views
  • 3 replies
  • 2 kudos

Resolved! batch process autoloader

My job continues to run after it has finished successfully. This is my case; I enabled useNotifications: if self.autoloader_config.use_autoloader: logger_file_ingestion.info("start of writing in streaming mode") if self.write_mode.value.lower() == "...

Latest Reply
MariuszK
Valued Contributor III
  • 2 kudos

Hi @seefoods, if it works, you can mark my answer as a solution so that if someone has the same problem, it will be easier to find an answer.

2 More Replies
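
For readers hitting the same symptom: an Auto Loader stream keeps running unless it is given a batch-style trigger. A minimal sketch, assuming a Databricks notebook where spark is available; paths, file format, and table name are illustrative. Trigger availableNow processes the pending backlog and then stops on its own:

    # Read new files incrementally with Auto Loader (cloudFiles).
    df = (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")            # adjust to your files
        .option("cloudFiles.useNotifications", "true")  # as enabled in the post
        .option("cloudFiles.schemaLocation",
                "/Volumes/main/raw/_schemas/events")    # hypothetical path
        .load("/Volumes/main/raw/events")               # hypothetical source
    )

    # availableNow gives batch semantics: process everything pending, then
    # stop, so the job ends instead of running indefinitely.
    (
        df.writeStream.format("delta")
        .option("checkpointLocation",
                "/Volumes/main/raw/_checkpoints/events")  # hypothetical path
        .trigger(availableNow=True)
        .toTable("main.bronze.events")                    # hypothetical table
        .awaitTermination()
    )
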
rpshgupta
by New Contributor III
  • 2688 Views
  • 11 replies
  • 5 kudos

How to find the source code for the data engineering learning path?

Hi everyone, I am taking the data engineering learning path on customer-academy.databricks.com. I am not able to find any source code attached to the course. Can you please help me find it so that I can try it hands-on as well? Thanks, Rupesh

Latest Reply
sselvaganapathy
New Contributor II
  • 5 kudos

Please refer to the link below; demo code is no longer provided by Databricks: https://community.databricks.com/t5/databricks-academy-learners/how-to-download-demo-notebooks-for-data-engineer-learning-plan/td-p/105362

10 More Replies
Bob-
by New Contributor II
  • 2616 Views
  • 3 replies
  • 4 kudos

Resolved! Upload Screenshot

I am new to the Databricks Free Edition. I am trying to upload a screenshot to be able to put it in a table and run some AI functions against it. It is not letting me upload a .png file. After several attempts I am being told that the root cause is p...

Latest Reply
Sharanya13
Contributor III
  • 4 kudos

@Bob- Can you explain your use case? I'm not sure I understand "I am trying to upload a screenshot to be able to put it in a table and run some AI functions against it." Are you trying to perform OCR?

2 More Replies
yzhang
by New Contributor III
  • 2025 Views
  • 5 replies
  • 0 kudos

iceberg with partitionedBy option

I am able to create a Unity Catalog Iceberg-format table: df.writeTo(full_table_name).using("iceberg").create() However, if I add the partitionedBy option I get an error: df.writeTo(full_table_name).using("iceberg").partitionedBy("ingest_dat...

Latest Reply
yzhang
New Contributor III
  • 0 kudos

I am not trying to alter the table with the partitionedBy option. To clarify, I wanted to create a (new) table with the partitionedBy option and Iceberg format, but it failed due to a Databricks error. I had to create the table without partitionedBy with iceb...

4 More Replies
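
For anyone reproducing this: in PySpark, the DataFrameWriterV2 partitionedBy method expects Column expressions (optionally partition transforms), not plain strings, which is one common source of errors with this API. A minimal sketch of the intended call, assuming df and a Unity Catalog table name, both illustrative; whether managed Iceberg tables accept a given partition spec can still depend on the runtime:

    from pyspark.sql.functions import col, days

    full_table_name = "main.default.events_iceberg"  # hypothetical table

    (
        df.writeTo(full_table_name)
        .using("iceberg")
        .partitionedBy(col("ingest_date"))  # or days(col("ingest_ts")) as a transform
        .create()
    )
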
Phani1
by Valued Contributor II
  • 1928 Views
  • 4 replies
  • 2 kudos

Potential Challenges of Using Iceberg Format (Databricks + Iceberg)

Hi Team, what are the potential challenges of using the Iceberg format instead of Delta for saving data in Databricks? Regards, Phani

Latest Reply
sridharplv
Valued Contributor II
  • 2 kudos

Hi @Phani1, please find the link below, which details maintaining Iceberg metadata along with Delta metadata: https://community.databricks.com/t5/technical-blog/read-delta-tables-with-snowflake-via-unity-catalog/ba-p/115877

3 More Replies
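
One middle ground behind the linked blog is Delta UniForm: the data stays in Delta, but Iceberg metadata is generated alongside it so Iceberg readers such as Snowflake can consume the same table. A minimal sketch, assuming Unity Catalog and a runtime that supports UniForm; the table and columns are illustrative:

    # Create a Delta table that also exposes Iceberg metadata (UniForm).
    spark.sql("""
        CREATE TABLE main.default.orders_uniform (
            order_id BIGINT,
            order_ts TIMESTAMP
        )
        TBLPROPERTIES (
            'delta.enableIcebergCompatV2' = 'true',
            'delta.universalFormat.enabledFormats' = 'iceberg'
        )
    """)
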
stevewb
by New Contributor III
  • 916 Views
  • 1 reply
  • 0 kudos

Setting shuffle partitions in Databricks SQL Warehouse

I think it used to be possible to set shuffle partitions in a Databricks SQL warehouse through, e.g., SET spark.sql.shuffle.partitions=20000. However, when I run this now, I get the error: [CONFIG_NOT_AVAILABLE] Configuration spark.sql.shuffle.partitions...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @stevewb, it's not available anymore. According to the documentation: "Databricks SQL allows admins to configure Spark properties for data access in the workspace settings menu. See Data access configurations. Other than data access configurations, Da...

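
For contrast, the same setting still works on classic all-purpose or job compute, where session configuration is not managed for you. A minimal sketch in a notebook (spark provided); the value mirrors the post:

    # Settable on classic compute; on SQL warehouses this raises
    # CONFIG_NOT_AVAILABLE because the service manages it automatically.
    spark.conf.set("spark.sql.shuffle.partitions", 20000)

    # On recent Databricks runtimes, "auto" reportedly lets AQE choose the
    # partition count; verify against your runtime's docs.
    # spark.conf.set("spark.sql.shuffle.partitions", "auto")
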
Somia
by New Contributor III
  • 2760 Views
  • 7 replies
  • 2 kudos

Resolved! sql query is not returning _sqldf.

Notebooks in my workspace are not returning _sqldf when a SQL query is run. If I run this code, it gives an error in the second cell that _sqldf is not defined. First cell: %sql select * from some_table limit 10. Second cell: %sql select * from _sqldf. Howev...

Latest Reply
Somia
New Contributor III
  • 2 kudos

Changing the notebook default language to Python and using all-purpose compute fixed the issue. I am able to access _sqldf in subsequent SQL or Python cells.

6 More Replies
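
For reference, this is the documented behavior the fix restores: in a Python-default notebook on all-purpose compute, the result of each %sql cell is captured as a PySpark DataFrame named _sqldf for later cells. A minimal sketch; the table is illustrative:

    # Cell 1 (%sql) - its result is captured as _sqldf:
    #   %sql
    #   SELECT * FROM some_table LIMIT 10

    # Cell 2 (Python) - reuse the captured DataFrame:
    display(_sqldf.limit(5))

    # Cell 2 alternative (%sql) - _sqldf is also queryable from SQL:
    #   %sql
    #   SELECT COUNT(*) FROM _sqldf
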
anilsampson
by New Contributor III
  • 1491 Views
  • 2 replies
  • 3 kudos

Resolved! How to get previous version of the table in databricks sql dynamically

Hello, I'm trying to get the previous version of a Delta table using a timestamp, but Databricks SQL does not allow the use of variables. The only thing I can do is use TIMESTAMP AS OF CURRENT_DATE() - 1 if I have refreshed the table today. Please let me know i...

Latest Reply
anilsampson
New Contributor III
  • 3 kudos

Thank you @Vidhi_Khaitan. Is there an upgrade or use case in the works where we can pass parameters via a workflow while triggering a Databricks dashboard?

1 More Replies
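
One way to get "the previous version" dynamically, since Databricks SQL won't take a computed literal after TIMESTAMP AS OF: resolve the latest version number from the Delta history in Python, then time-travel by version. A minimal sketch; the table name is illustrative:

    table_name = "main.default.my_table"  # hypothetical table

    # DESCRIBE HISTORY returns newest entries first, so row 0 holds the
    # current version; subtract one for the previous version.
    latest = spark.sql(f"DESCRIBE HISTORY {table_name} LIMIT 1").collect()[0]["version"]

    prev_df = spark.read.option("versionAsOf", latest - 1).table(table_name)
    prev_df.show()
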
Divya_Bhadauria
by New Contributor III
  • 1245 Views
  • 1 reply
  • 0 kudos

Update Databricks job parameters with the CLI

Use case: Updating a Databricks job with multiple tasks can be time-consuming and error-prone when changes (such as adding new parameters) need to be applied to each task manually. Possible solutions: 1. Using the Databricks CLI – jobs reset command. You can ...

Latest Reply
anilsampson
New Contributor III
  • 0 kudos

Hello Divya, could you also try YAML, update your task accordingly, and deploy it as part of asset bundles? Let me know if you feel both are the same. Regards, Anil.

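
A minimal sketch of the reset approach via the Databricks Python SDK (the programmatic equivalent of the CLI's jobs reset): read the current job spec, add a parameter to every notebook task, and write the whole spec back. The job ID and parameter are illustrative:

    from databricks.sdk import WorkspaceClient

    w = WorkspaceClient()
    job_id = 123456789  # hypothetical job ID

    job = w.jobs.get(job_id=job_id)
    settings = job.settings
    for task in settings.tasks or []:
        if task.notebook_task is not None:
            params = task.notebook_task.base_parameters or {}
            params["run_date"] = "2025-01-01"  # hypothetical new parameter
            task.notebook_task.base_parameters = params

    # reset replaces the full job specification in one call.
    w.jobs.reset(job_id=job_id, new_settings=settings)
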
zach
by New Contributor III
  • 1067 Views
  • 1 reply
  • 0 kudos

Get the total amount of S3 storage used per user

In Databricks, is it possible to get the total amount of Delta Lake storage being used in the Parquet format per user? Subsequently, what are the best practices for making sure that users saving Delta files are not taking up storage unnecessarily, for ...

Latest Reply
Sharanya13
Contributor III
  • 0 kudos

Hi @zach, can you expand on why you need to know the total storage per user? Best practices: if you use Databricks managed tables, optimization is taken care of. https://docs.databricks.com/aws/en/optimizations/predictive-optimization

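
If path-based accounting is still needed, one rough approach is to sum file sizes under per-user folders. A minimal sketch, assuming a notebook where dbutils is available and a per-user directory layout, which is purely illustrative; for managed tables, DESCRIBE DETAIL exposes per-table size instead:

    def dir_size_bytes(path: str) -> int:
        """Recursively sum file sizes under a path."""
        total = 0
        for f in dbutils.fs.ls(path):
            total += dir_size_bytes(f.path) if f.isDir() else f.size
        return total

    # Hypothetical convention: one top-level folder per user.
    for user_dir in dbutils.fs.ls("s3://my-bucket/users/"):
        print(user_dir.name, round(dir_size_bytes(user_dir.path) / 1e9, 2), "GB")
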
Loinguyen318
by New Contributor II
  • 1990 Views
  • 2 replies
  • 0 kudos

Resolved! Public DBFS root is disabled in Databricks free edition

I am using a notebook to execute a sample Spark job that writes a Delta table to DBFS on the Free Edition. However, I face an issue: I cannot access the public DBFS after the code is executed. The Spark code is, for example: data = spark.range(0, 5) data.write.format("d...

Latest Reply
Sharanya13
Contributor III
  • 0 kudos

Can you use UC Volumes instead of DBFS? Databricks will disable DBFS as it moves to a serverless approach. I would use UC Volumes - convenient and governed by UC.

1 More Replies
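
To make the suggestion concrete, a minimal sketch of the two usual alternatives to the DBFS root; catalog, schema, and volume names are illustrative:

    data = spark.range(0, 5)

    # Option 1: a Unity Catalog managed table - no path required, and the
    # natural home for tabular (Delta) data.
    data.write.mode("overwrite").saveAsTable("main.default.demo_table")

    # Option 2: file-based writes go to a UC volume path instead of the
    # DBFS root (volumes are intended for path-based, non-tabular data).
    data.write.mode("overwrite").parquet("/Volumes/main/default/scratch/demo_parquet")
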
AbhayAgarwal
by New Contributor
  • 3315 Views
  • 1 reply
  • 0 kudos

DB to Snowflake connection error

We are getting the error mentioned below after upgrading Databricks to runtime version 15 - an error while making a connection to Snowflake in a notebook: "Bad request; operation not supported." Has anyone got this error? Any pointers on how to fix it?

Latest Reply
kamal_ch
Databricks Employee
  • 0 kudos

Hi Abhay, Here are some general troubleshooting steps and pointers to help you resolve this issue: Ensure that you are using the correct connection configurations for Snowflake. Configuration mismatches can sometimes lead to operation errors.  Using ...

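
As a starting point for the configuration check, a minimal sketch of a notebook read using the Spark Snowflake connector's documented options; the account URL, secret scope, and object names are placeholders:

    sf_options = {
        "sfUrl": "myaccount.snowflakecomputing.com",  # placeholder account URL
        "sfUser": dbutils.secrets.get("my-scope", "sf-user"),      # hypothetical scope/key
        "sfPassword": dbutils.secrets.get("my-scope", "sf-pass"),  # hypothetical scope/key
        "sfDatabase": "MY_DB",
        "sfSchema": "PUBLIC",
        "sfWarehouse": "MY_WH",
    }

    df = (
        spark.read.format("snowflake")
        .options(**sf_options)
        .option("dbtable", "MY_TABLE")  # or .option("query", "SELECT ...")
        .load()
    )
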
pooja_bhumandla
by New Contributor III
  • 1121 Views
  • 2 replies
  • 2 kudos

Resolved! Small Files Persist After OPTIMIZE with Target File Size Set to 100MB – Seeking Possible Reasons

I'm currently working on optimizing a Delta table in Databricks. As part of this, I've increased the target file size from the default (~33 MB) to 100 MB using the OPTIMIZE command. However, after running the OPTIMIZE operation, I still observe a large number ...

Latest Reply
Brahmareddy
Esteemed Contributor
  • 2 kudos

Hi pooja_bhumandla, great question! How are you doing today? Even after running the OPTIMIZE command with a higher target file size like 100 MB, it's common to still see some small files in your Delta table, especially in partitions with very little dat...

1 More Replies
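
For anyone checking the same thing, a minimal sketch of the knobs involved; the table name is illustrative. Note that OPTIMIZE only rewrites files it considers worth compacting, so partitions with little data can legitimately keep small files:

    # Raise the target file size for future compactions.
    spark.sql("""
        ALTER TABLE main.default.events
        SET TBLPROPERTIES ('delta.targetFileSize' = '100mb')
    """)

    # Rewrite/compact eligible files.
    spark.sql("OPTIMIZE main.default.events")

    # Inspect the resulting layout.
    spark.sql("DESCRIBE DETAIL main.default.events") \
        .select("numFiles", "sizeInBytes").show()
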
