Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

igorgatis
by New Contributor II
  • 3731 Views
  • 1 reply
  • 1 kudos

How to improve Spark UI Job Description for pyspark?

I find it quite hard to understand the Spark UI for my pyspark pipelines. For example, when one writes `spark.read.table("sometable").show()` it shows: I learned that the `DataFrame` API may actually spawn jobs before running the actual job. In the example ab...
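One thing that usually helps here (a sketch, not from the thread itself): pyspark lets you label the jobs an action triggers via `SparkContext.setJobDescription`, so the Spark UI's Jobs table shows a readable description instead of a bare call site. The table name below is a placeholder.

```python
# Label every Spark job triggered inside the block, then clear the label again,
# so unrelated follow-up actions don't inherit it.
from contextlib import contextmanager

@contextmanager
def job_description(sc, text):
    """Temporarily set the Spark UI job description for this thread."""
    sc.setJobDescription(text)
    try:
        yield
    finally:
        sc.setJobDescription(None)

# On a cluster:
# with job_description(spark.sparkContext, "preview sometable"):
#     spark.read.table("sometable").show()
```

This also makes the "extra" jobs spawned by the DataFrame API easier to attribute, since they appear under the same label as the action that caused them.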

Latest Reply
jose_gonzalez
Databricks Employee
  • 1 kudos

Hi @igorgatis, A polite reminder. Have you had a chance to review my colleague's reply? Please inform us if it contributes to resolving your query.

pygreg
by New Contributor
  • 1886 Views
  • 0 replies
  • 0 kudos

Workflows "Run now with different parameters" UI proposal

Hello everyone! I've been working with the Databricks platform for a few months now and I have a suggestion/proposal regarding the UI of Workflows. First, let me explain what I find not so ideal. Let's say we have a job with three Notebook Tas...

  • 1886 Views
  • 0 replies
  • 0 kudos
Rafal9
by New Contributor II
  • 4623 Views
  • 1 reply
  • 1 kudos

DAB: NameError: name '__file__' is not defined

Hi everyone, I am running a job task using an Asset Bundle. The bundle has been validated and deployed according to: https://learn.microsoft.com/en-us/azure/databricks/dev-tools/bundles/work-tasks. Part of the databricks.yml bundle: name: etldatabricks resourc...
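A hedged workaround sketch (not from the linked docs): `__file__` is only defined when code runs as a regular module or script; notebook-style execution of a bundle task leaves it undefined, which is exactly what produces this `NameError`. A fallback keeps the task runnable in both modes.

```python
import os

def script_dir():
    """Directory of the running file, or the working directory as a fallback
    when __file__ is undefined (e.g. notebook-style execution)."""
    try:
        return os.path.dirname(os.path.abspath(__file__))
    except NameError:
        return os.getcwd()
```

Paths derived from `script_dir()` then work whether the task runs as a wheel/script or as a notebook.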

Akshay9
by New Contributor
  • 852 Views
  • 0 replies
  • 0 kudos

Databricks Optimization

I am trying to read 30 XML files and create a dataframe of the data of each node, but it takes a lot of time (approximately 8 minutes) to run those files. What can I do to optimize the Databricks notebook? I append the data to a Databricks Delta table.
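One common optimization, sketched under the assumption that the spark-xml library is attached to the cluster: read all 30 files in a single load with an explicit schema instead of looping per file, since repeated schema inference is usually where the minutes go. The paths and row tag below are placeholders.

```python
def xml_read_options(row_tag):
    """Option map for one batched spark-xml read over a whole directory."""
    return {"rowTag": row_tag}

# On a cluster:
# df = (spark.read.format("xml")
#         .options(**xml_read_options("record"))
#         .schema(known_schema)           # explicit schema: no inference pass
#         .load("/mnt/raw/xml/*.xml"))    # one read covers all 30 files
# df.write.mode("append").saveAsTable("bronze.records")
```

A single append of the combined dataframe is also cheaper than 30 small appends to the Delta table.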

ilarsen
by Contributor
  • 5225 Views
  • 2 replies
  • 0 kudos

Resolved! Dynamically detect if any dataframe column is an array type, to perform logic on that column

Hi, I'd like to put this out here in case there are some helpful suggestions to be found. What am I trying to achieve? Generate a hash of certain columns in a dataframe (as in a row hash, but not the whole row) where currently one of the columns is an...

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

That is totally possible, e.g. here is a function that trims all string columns in a dataframe. You can change it to your needs:

def trim_all_string_columns(df: DataFrame) -> DataFrame:
    for c in df.schema.fields:
        if isinstance(c.dataType, StringType):
            df = df.withColumn(c.name, trim(col(c.name)))
    return df

1 More Replies
Ruby8376
by Valued Contributor
  • 1443 Views
  • 0 replies
  • 0 kudos

LOB1 databricks lakehouse to LOB2 databricks lakehouse

Currently in our organisation, data is streamed from Salesforce to Databricks (Delta tables). Now the requirement is that another LOB wants to access and query this data in our Delta tables on demand from their lakehouse. How can this be done? One option is to u...
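One option, sketched here with Delta Sharing (Databricks-to-Databricks); all object names and the metastore id are placeholders. The other LOB then queries the shared tables in place rather than copying them into their lakehouse.

```python
# Unity Catalog SQL run from the provider (LOB1) side: create a share, add the
# Delta tables, register the other LOB's metastore as a recipient, and grant.
statements = [
    "CREATE SHARE IF NOT EXISTS salesforce_share",
    "ALTER SHARE salesforce_share ADD TABLE lob1.sales.accounts",
    "CREATE RECIPIENT IF NOT EXISTS lob2_lakehouse "
    "USING ID 'azure:westeurope:<metastore-uuid>'",
    "GRANT SELECT ON SHARE salesforce_share TO RECIPIENT lob2_lakehouse",
]
# On a Unity Catalog-enabled cluster:
# for s in statements:
#     spark.sql(s)
```

On the consumer (LOB2) side the share then appears as a catalog they can mount and query directly.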

tj-cycyota
by Databricks Employee
  • 10065 Views
  • 2 replies
  • 1 kudos

Whats the difference between magic commands %pip and %sh pip

In Databricks you can do either %pip or %sh pip. What's the difference? Is there a recommended approach?

Latest Reply
stefnhuy
New Contributor III
  • 1 kudos

Hey there, User16776431030. Great question about those magic commands in Databricks! Let me shed some light on this mystical matter. The %pip and %sh pip commands may seem similar on the surface, but they're quite distinct in their powers. %sh pip is l...

1 More Replies
parimalpatil28
by New Contributor III
  • 10840 Views
  • 2 replies
  • 2 kudos

Resolved! Caused by: org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme "s3"

Hello, I am facing an issue with an Insert query or with .saveAsTable. The error thrown by the query is: Caused by: org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme "s3" org.apache.spark.SparkException: [TASK_WRITE_FAILED] Task...

Latest Reply
parimalpatil28
New Contributor III
  • 2 kudos

Hello @Retired_mod, thanks for the help. We also investigated internally and found the root cause: our product's configuration was overwriting the Databricks default spark.executor.extraClassPath confs. Because of this, our clusters were not a...
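For reference, a hypothetical shape of that kind of fix (paths are placeholders): append the product jars to the existing classpath rather than replacing the property outright, so the Databricks-provided S3 filesystem classes stay visible to executors.

```
spark.executor.extraClassPath /databricks/jars/*:/opt/ourproduct/jars/*
```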

1 More Replies
gopeshr
by New Contributor
  • 1975 Views
  • 0 replies
  • 0 kudos

Databricks <> snowflake connectivity

We are trying to establish a connection between Databricks and Snowflake through the Databricks workspace running on a cluster. Initially we assumed it was the firewall/network blocking the traffic and tried to add a firewall rule, but even after ...
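For comparison, a minimal sketch of the Spark Snowflake connector options (all values are placeholders; in practice pull the password from a secret scope). A read that still hangs with options like these set usually points back at network egress rules rather than the connector.

```python
# Option map for the Snowflake connector bundled with Databricks runtimes.
sf_options = {
    "sfUrl": "myaccount.snowflakecomputing.com",
    "sfUser": "svc_databricks",
    "sfPassword": "<from-secret-scope>",   # e.g. dbutils.secrets.get(...)
    "sfDatabase": "ANALYTICS",
    "sfSchema": "PUBLIC",
    "sfWarehouse": "WH_SMALL",
}

# On a cluster:
# df = (spark.read.format("snowflake").options(**sf_options)
#         .option("dbtable", "ORDERS").load())
```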

boriste
by New Contributor II
  • 11079 Views
  • 11 replies
  • 10 kudos

Resolved! Upload to Volume inside unity catalog not possible?

I want to upload a simple csv file to a volume which was created in our unity catalog. We are using secure cluster connectivity and our storage account (metastore) is not publicly accessible. We injected the storage into our VNet. I am getting the fol...

Latest Reply
jeroenvs
New Contributor III
  • 10 kudos

@AdrianaIspas We are running into the same issue. It took a while to figure out that the error message is related to this limitation. Any updates on when we can expect the limitation to be taken away? We want to secure access to our storage accounts ...

10 More Replies
hujohnso
by New Contributor II
  • 1535 Views
  • 2 replies
  • 0 kudos

Databricks Connect V2 Never Returning Anything

I am trying to use Databricks Connect V2 with Azure Databricks from PyCharm. I have: created a cluster with runtime 13.2 in Shared Access Mode; enabled Unity Catalog for the workspace (I am the account admin); created a .databrickscfg fil...
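For comparison, a hypothetical .databrickscfg profile of the shape Databricks Connect v2 expects (all values are placeholders); a missing or mismatched cluster_id is a common reason sessions appear to hang:

```
[DEFAULT]
host       = https://adb-1234567890123456.7.azuredatabricks.net
token      = dapi<personal-access-token>
cluster_id = 0123-456789-abcdefgh
```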

Latest Reply
jackson-nline
New Contributor III
  • 0 kudos

We are also seeing related issues, see https://community.databricks.com/t5/get-started-discussions/databricks-connect-13-1-0-limitations/td-p/37096. However, that issue also highlights that the heartbeat that lets you know a job is running on Databricks Con...

1 More Replies
Priyag1
by Honored Contributor II
  • 4102 Views
  • 2 replies
  • 7 kudos

Migration of all Databricks SQL content to the workspace browser

Databricks will force-migrate all Databricks SQL content (dashboards, queries, alerts) to the workspace browser. Visit My Queries, My Alerts, and My Dashboards and look for any un-migra...

Data Engineering
Dashboards
Databricks SQL
Visualization
Latest Reply
joseheliomuller
New Contributor III
  • 7 kudos

The ability to easily migrate queries and dashboards across Databricks workspaces is extremely important. In my company we have dev, stg and production workspaces, with the same pipeline creating the data. We create our dashboards in DEV and then we have to...

1 More Replies
clapton79
by New Contributor II
  • 13155 Views
  • 5 replies
  • 7 kudos

Resolved! on-behalf-of token creation (for SPN)

I am trying to create an on-behalf-of token for an SPN on my Azure Databricks Premium instance. The response is a FEATURE_DISABLED error message ("On-behalf-of token creation for service principals is not enabled for this workspace"). How do I turn on ...

Latest Reply
alexott
Databricks Employee
  • 7 kudos

There is no On-behalf-of token on Azure - just generate an AAD token for the Service Principal and use it to create PAT (make sure that SP has permission to use PATs). The easiest way of doing it is to use the new Databricks CLI that supports unified...
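The flow the reply describes can be sketched roughly as follows (tenant/client values are placeholders; the long GUID is the well-known AzureDatabricks resource id used as the token scope):

```python
# Client-credentials request that yields an AAD token for the service principal;
# that token can then be exchanged for a Databricks PAT via the token API.
AZURE_DATABRICKS_SCOPE = "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d/.default"

def aad_token_request(tenant_id, client_id, client_secret):
    """URL and form fields for the AAD client-credentials token request."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    data = {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": AZURE_DATABRICKS_SCOPE,
    }
    return url, data

# With the AAD token in hand (e.g. requests.post(url, data=data)):
# POST {workspace-url}/api/2.0/token/create
#   with header  Authorization: Bearer <aad-token>
# to mint a PAT as the service principal.
```

As the reply notes, the new Databricks CLI wraps these steps for you.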

4 More Replies
harish446
by New Contributor
  • 1794 Views
  • 1 reply
  • 0 kudos

Can a not null constraint be applied on a identity column

I had a table creation script as follows, for example:

CREATE TABLE default.test2 (
  id BIGINT GENERATED BY DEFAULT AS IDENTITY(),
  name STRING
) USING DELTA
LOCATION "/mnt/datalake/xxxx"

What are the possible ways to apply not n...
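Two commonly used approaches, sketched here with the table name from the post (worth verifying against your runtime version): declare NOT NULL in the column spec at creation time, or add the constraint afterwards with ALTER TABLE.

```python
# DDL strings to run via spark.sql on a cluster; shown here as plain strings.
create_ddl = """
CREATE TABLE default.test2 (
  id   BIGINT NOT NULL GENERATED BY DEFAULT AS IDENTITY,
  name STRING
) USING DELTA
"""

# For an existing table, add the constraint after creation:
alter_ddl = "ALTER TABLE default.test2 ALTER COLUMN id SET NOT NULL"

# On a cluster: spark.sql(create_ddl)  or  spark.sql(alter_ddl)
```

With GENERATED BY DEFAULT, the constraint matters because callers may still insert explicit values; GENERATED ALWAYS identity columns never receive user-supplied nulls in the first place.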

Data Engineering
data engineering
Databricks
Delta Lake
Delta tables
spark
Latest Reply
Krishnamatta
New Contributor III
  • 0 kudos

Hi Harish, here is the documentation for this issue: https://docs.databricks.com/en/tables/constraints.html

