Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Hubert-Dudek
by Esteemed Contributor III
  • 8158 Views
  • 1 reply
  • 1 kudos

The perfect table

Unlock the Power of #Databricks: The Perfect Table in 8 Simple Steps! 

Latest Reply
jose_gonzalez
Databricks Employee
  • 1 kudos

Hi @Hubert-Dudek, thank you for sharing this great post.

Madhur
by New Contributor
  • 1233 Views
  • 1 reply
  • 0 kudos

Difference between Auto Optimize set on Spark Session and on Delta Table

Latest Reply
jose_gonzalez
Databricks Employee
  • 0 kudos

Hi @Madhur, The difference between Auto Optimize set on Spark Session and the one set on Delta Table lies in their scope and precedence. Auto Optimize on Spark Session will apply to all Delta tables in the current session. It is a global configuratio...

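
For readers hitting the same question, a minimal sketch of the two scopes (schema and table names are placeholders, not from the thread):

```python
# Session scope: applies to every Delta table written in this Spark session.
spark.conf.set("spark.databricks.delta.optimizeWrite.enabled", "true")
spark.conf.set("spark.databricks.delta.autoCompact.enabled", "true")

# Table scope: persisted with the table and applied regardless of session settings.
spark.sql("""
    ALTER TABLE my_schema.my_table SET TBLPROPERTIES (
        'delta.autoOptimize.optimizeWrite' = 'true',
        'delta.autoOptimize.autoCompact'   = 'true'
    )
""")
```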
krishnaarige
by New Contributor
  • 2069 Views
  • 1 reply
  • 0 kudos

OperationalError: 250003: Failed to get the response. Hanging? method: get

OperationalError: 250003: Failed to get the response. Hanging? method: get, url: https://cdodataplatform.east-us-2.privatelink.snowflakecomputing.com:443/queries/01ae7ab6-0c04-e4bd-011c-e60552f6cf63/result?request_guid=315c25b7-f17d-4123-a2e5-6d82605...

Latest Reply
jose_gonzalez
Databricks Employee
  • 0 kudos

Could you please share the full error stack trace?

igorgatis
by New Contributor II
  • 3597 Views
  • 1 reply
  • 1 kudos

How to improve Spark UI Job Description for pyspark?

I find it quite hard to understand the Spark UI for my pyspark pipelines. For example, when one writes `spark.read.table("sometable").show()` it shows the attached screenshots. I learned that the `DataFrame` API may actually spawn jobs before running the actual job. In the example ab...

[attached screenshots: Spark UI job view]
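One relevant lever, sketched under the assumption of a Databricks notebook where `spark` is predefined: PySpark can attach a custom description to the jobs a block of code triggers, and that text appears in the Spark UI's job list.

```python
sc = spark.sparkContext

# Label the jobs spawned by the next actions; the label shows up as the
# job description in the Spark UI.
sc.setJobDescription("Preview sometable")
spark.read.table("sometable").show()

# Clear the label so later, unrelated jobs are not tagged with it.
sc.setJobDescription(None)
```
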
Latest Reply
jose_gonzalez
Databricks Employee
  • 1 kudos

Hi @igorgatis, a polite reminder: have you had a chance to review my colleague's reply? Please let us know if it helps resolve your query.

pygreg
by New Contributor
  • 1760 Views
  • 0 replies
  • 0 kudos

Workflows "Run now with different parameters" UI proposal

Hello everyone! I've been working with the Databricks platform for a few months now and I have a suggestion/proposal regarding the Workflows UI. First, let me explain what I find not so ideal. Let's say we have a job with three Notebook Tas...

Rafal9
by New Contributor II
  • 4382 Views
  • 1 reply
  • 1 kudos

DAB: NameError: name '__file__' is not defined

Hi everyone, I am running a job task using an Asset Bundle. The bundle has been validated and deployed according to: https://learn.microsoft.com/en-us/azure/databricks/dev-tools/bundles/work-tasks. Part of the databricks.yml: `bundle: name: etldatabricks resourc...`

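
The usual trigger for this error is that `__file__` is only defined when Python executes a file; when bundle code runs in a notebook context it does not exist. A hedged workaround sketch (not from the thread):

```python
import os

try:
    # Defined when the task runs as a plain Python file.
    project_root = os.path.dirname(os.path.abspath(__file__))
except NameError:
    # Notebook/REPL context: __file__ is undefined, so fall back.
    project_root = os.getcwd()
```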
Akshay9
by New Contributor
  • 806 Views
  • 0 replies
  • 0 kudos

Databricks Optimization

I am trying to read 30 XML files and create a dataframe from the data of each node, but it takes a lot of time (approximately 8 minutes) to run those files. What can I do to optimize the Databricks notebook? I append the data to a Databricks Delta table.

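
A hedged sketch of one common fix, assuming the files are currently read one by one: let a single read pick up all the files at once so Spark parallelizes them across the cluster (paths, the `rowTag` element, and the table name are placeholders):

```python
# One read over a glob instead of 30 separate reads. Requires an XML reader
# (the spark-xml library or Databricks' native XML support).
df = (spark.read
      .format("xml")
      .option("rowTag", "record")          # assumed XML element per row
      .load("/mnt/raw/xml_files/*.xml"))

df.write.format("delta").mode("append").saveAsTable("my_schema.xml_data")
```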
ilarsen
by Contributor
  • 4862 Views
  • 2 replies
  • 0 kudos

Resolved! Dynamically detect if any dataframe column is an array type, to perform logic on that column

Hi, I'd like to put this out here in case there are some helpful suggestions to be found. What am I trying to achieve? Generate a hash of certain columns in a dataframe (as in a row hash, but not the whole row) where currently one of the columns is an...

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

That is totally possible, f.e. here is a function that trims all string columns in a dataframe. You can change it to your needs:

```python
from pyspark.sql import DataFrame, functions as f
from pyspark.sql.types import StringType

def trim_all_string_columns(df: DataFrame) -> DataFrame:
    for c in df.schema.fields:
        if isinstance(c.dataType, StringType):
            df = df.withColumn(c.name, f.trim(f.col(c.name)))
    return df
```

1 More Replies
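
As a follow-on, a hedged sketch of that pattern adapted to the original question, detecting `ArrayType` columns and normalizing them before hashing (the function and column handling are illustrative, not from the thread):

```python
from pyspark.sql import DataFrame, functions as f
from pyspark.sql.types import ArrayType

def hash_columns(df: DataFrame, cols: list) -> DataFrame:
    normalized = []
    for c in cols:
        if isinstance(df.schema[c].dataType, ArrayType):
            # Sort so element order does not change the hash, then stringify.
            normalized.append(f.sort_array(f.col(c)).cast("string"))
        else:
            normalized.append(f.col(c).cast("string"))
    return df.withColumn("row_hash", f.sha2(f.concat_ws("||", *normalized), 256))
```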
Ruby8376
by Valued Contributor
  • 1409 Views
  • 0 replies
  • 0 kudos

LOB1 databricks lakehouse to LOB2 databricks lakehouse

Currently in our organisation, data is streamed from Salesforce to Databricks (Delta tables). Now the requirement is that another LOB wants to access and query this data in our Delta tables on demand from their lakehouse. How can this be done? One option is to u...

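
A hedged sketch of one common approach, Databricks-to-Databricks Delta Sharing, run from the providing (LOB1) workspace; share, recipient, and table names are placeholders, and the truncated post may have had a different option in mind:

```python
spark.sql("CREATE SHARE IF NOT EXISTS lob1_sales_share")
spark.sql("ALTER SHARE lob1_sales_share ADD TABLE lob1_catalog.crm.salesforce_accounts")

# The recipient is identified by the LOB2 metastore's sharing identifier.
spark.sql("CREATE RECIPIENT IF NOT EXISTS lob2 USING ID '<lob2-sharing-identifier>'")
spark.sql("GRANT SELECT ON SHARE lob1_sales_share TO RECIPIENT lob2")
```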
Chris_sh
by New Contributor II
  • 2164 Views
  • 0 replies
  • 0 kudos

[STREAMING_TABLE_OPERATION_NOT_ALLOWED.REQUIRES_SHARED_COMPUTE]

Currently trying to refresh a Delta Live Table using a Full Refresh but an error keeps coming up saying that we have to use a shared cluster or a SQL warehouse. I've tried both a shared cluster and a SQL warehouse and the same error keeps coming up. ...

tj-cycyota
by Databricks Employee
  • 9550 Views
  • 2 replies
  • 1 kudos

Whats the difference between magic commands %pip and %sh pip

In Databricks you can do either %pip or %sh pip. What's the difference? Is there a recommended approach?

Latest Reply
stefnhuy
New Contributor III
  • 1 kudos

Hey there, User16776431030. Great question about those magic commands in Databricks! Let me shed some light on this mystical matter. The %pip and %sh pip commands may seem similar on the surface, but they're quite distinct in their powers. %sh pip is l...

1 More Replies
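
A compact way to see the distinction, as a sketch of the generally documented behavior rather than an excerpt from the replies:

```
# Cell 1 – notebook-scoped install; the environment is replicated to the
# executors, so the package is importable in distributed code:
%pip install lxml

# Cell 2 – plain shell pip on the driver node only; executors never see it:
%sh pip install lxml
```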
parimalpatil28
by New Contributor III
  • 10405 Views
  • 2 replies
  • 2 kudos

Resolved! Caused by: org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme "s3"

Hello, I am facing an issue with an insert query and with .saveAsTable. The error thrown by the query is: Caused by: org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme "s3" org.apache.spark.SparkException: [TASK_WRITE_FAILED] Task...

Latest Reply
parimalpatil28
New Contributor III
  • 2 kudos

Hello @Retired_mod, thanks for the help. We have also investigated internally and found the root cause: our product's configuration was overwriting the Databricks default spark.executor.extraClassPath confs. Because of this our clusters were not a...

1 More Replies
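
For anyone debugging a similar failure, a small diagnostic sketch (assuming a notebook where `spark` is predefined) to check whether a custom value has replaced the Databricks default:

```python
# Prints the executor classpath conf the cluster actually received; if a
# cluster conf or init script overwrote it, the Databricks default entries
# (which provide the "s3" filesystem, per this thread) will be missing.
print(spark.conf.get("spark.executor.extraClassPath", "<not set>"))
```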
gopeshr
by New Contributor
  • 1890 Views
  • 0 replies
  • 0 kudos

Databricks <> snowflake connectivity

We are trying to establish a connection between Databricks and Snowflake from a cluster in our Databricks workspace. Initially we assumed the firewall/network was blocking the traffic and tried to add a firewall rule, but even after ...

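
A hedged smoke-test sketch using the Snowflake connector bundled with Databricks (all option values are placeholders): if this fails with a timeout, something in the network path between the cluster and Snowflake is still blocking traffic.

```python
options = {
    "sfUrl": "<account>.snowflakecomputing.com",
    "sfUser": "<user>",
    "sfPassword": "<password>",
    "sfDatabase": "<database>",
    "sfSchema": "<schema>",
    "sfWarehouse": "<warehouse>",
}

# A trivial query that round-trips to Snowflake without touching real tables.
df = (spark.read.format("snowflake")
      .options(**options)
      .option("query", "SELECT CURRENT_VERSION()")
      .load())
df.show()
```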
boriste
by New Contributor II
  • 10604 Views
  • 11 replies
  • 10 kudos

Resolved! Upload to Volume inside unity catalog not possible?

I want to upload a simple CSV file to a volume which was created in our Unity Catalog. We are using secure cluster connectivity and our storage account (metastore) is not publicly accessible; the storage is VNet-injected. I am getting the fol...

Latest Reply
jeroenvs
New Contributor III
  • 10 kudos

@AdrianaIspas We are running into the same issue. It took a while to figure out that the error message is related to this limitation. Any updates on when we can expect the limitation to be taken away? We want to secure access to our storage accounts ...

10 More Replies
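
For context, a minimal sketch of a programmatic upload once networking allows it (catalog, schema, and volume names are placeholders):

```python
# Copy a local CSV into a Unity Catalog volume from a notebook; the /Volumes
# path resolves to the metastore's storage account, so the cluster must be
# able to reach that account over the network.
dbutils.fs.cp(
    "file:/tmp/sample.csv",
    "/Volumes/my_catalog/my_schema/my_volume/sample.csv",
)
```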
