Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Sascha
by New Contributor III
  • 4684 Views
  • 7 replies
  • 2 kudos

Resolved! Unable to connect to Confluent from Databricks

I'm facing the same issue as this post: https://community.databricks.com/s/question/0D58Y00009DE82zSAD/databricks-kafka-read-not-connecting. In my case I'm connecting to Confluent Cloud. I'm able to ping the bootstrap server, I'm able to netstat succes...

Latest Reply
Sascha
New Contributor III
  • 2 kudos

Hi @Debayan Mukherjee, no I haven't. But with the help of Confluent I changed the statement to the below, and somehow this solved it. inputDF = (spark .readStream .format("kafka") .option("kafka.bootstrap.servers", host) .option("kafka.ssl.en...
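For reference, a minimal sketch of what a Structured Streaming read from Confluent Cloud typically looks like. The poster's exact fix is truncated above; the host, credentials, and topic name below are placeholders, and the SASL_SSL settings are the commonly documented ones rather than this thread's confirmed answer.

# Minimal sketch, assuming a Databricks notebook where `spark` is predefined.
host = "pkc-xxxxx.westeurope.azure.confluent.cloud:9092"   # placeholder bootstrap server
api_key = "<CONFLUENT_API_KEY>"                            # placeholder credential
api_secret = "<CONFLUENT_API_SECRET>"                      # placeholder credential

jaas = (
    "org.apache.kafka.common.security.plain.PlainLoginModule required "
    "username='{}' password='{}';".format(api_key, api_secret)
)

inputDF = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", host)
    .option("kafka.security.protocol", "SASL_SSL")   # Confluent Cloud uses SASL over TLS
    .option("kafka.sasl.mechanism", "PLAIN")
    .option("kafka.sasl.jaas.config", jaas)
    .option("subscribe", "my_topic")                 # placeholder topic name
    .option("startingOffsets", "earliest")
    .load()
)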

6 More Replies
db-avengers2rul
by Contributor II
  • 3021 Views
  • 2 replies
  • 2 kudos

Resolved! Unable to replace null with 0 in dataframe using PySpark Databricks notebook (community edition)

Hello Experts, I am unable to replace nulls with 0 in a dataframe, please refer to the screenshot. from pyspark.sql.functions import col emp_csv_df = emp_csv_df.na.fill(0).withColumn("Total_Sal",col('sal')+col('comm')) display(emp_csv_df) error desired ...
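For context, a minimal sketch of the usual null-fill-then-add pattern, assuming sal and comm are numeric columns; the sample rows below are hypothetical stand-ins for the poster's emp_csv_df.

from pyspark.sql.functions import col

# Hypothetical stand-in for the poster's emp_csv_df.
emp_csv_df = spark.createDataFrame([(1000.0, None), (1500.0, 200.0)], ["sal", "comm"])

# Fill nulls with 0 in the numeric columns first, then compute the total.
# Assumption: na.fill(0) only touches numeric columns; if comm were read from CSV
# as a string, it would need a cast to a numeric type before this works.
emp_csv_df = emp_csv_df.na.fill(0, subset=["sal", "comm"])
emp_csv_df = emp_csv_df.withColumn("Total_Sal", col("sal") + col("comm"))
emp_csv_df.show()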

Latest Reply
Kaniz_Fatma
Community Manager
  • 2 kudos

Hi @Rakesh Reddy Gopidi, We haven’t heard from you since the last response from @Hubert Dudek, and I was checking back to see if you have a resolution yet. If you have any solution, please share it with the community as it can be helpful to othe...

1 More Replies
Stita
by New Contributor II
  • 2674 Views
  • 3 replies
  • 3 kudos

Resolved! How do we pass the row tags dynamically while reading a XML file into a dataframe?

I have a set of xml files where the row tags change dynamically. How can we achieve this scenario in Databricks? df1=spark.read.format('xml').option('rootTag','XRoot').option('rowTag','PL1PLLL').load("dbfs:/FileStore/tables/ins/") We need to pass a val...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 3 kudos

If it is dynamic for the whole file, you can just use a variable: tag = 'PL1PLLL' df1=spark.read.format('xml').option('rootTag','XRoot').option('rowTag', tag).load("dbfs:/FileStore/tables/ins/file.xml")
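A slightly expanded sketch of the same idea for several files whose row tags differ; it assumes the spark-xml library is attached to the cluster, and the file list and per-file tags are hypothetical.

# Hypothetical mapping of file -> row tag; in practice this could come from a
# config table or be derived from the file name.
row_tags = {
    "dbfs:/FileStore/tables/ins/file1.xml": "PL1PLLL",
    "dbfs:/FileStore/tables/ins/file2.xml": "PL2PLLL",
}

dfs = []
for path, tag in row_tags.items():
    df = (
        spark.read.format("xml")
        .option("rootTag", "XRoot")
        .option("rowTag", tag)      # the row tag is just a variable here
        .load(path)
    )
    dfs.append(df)
# The per-file DataFrames can then be processed individually or unioned if their schemas match.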

2 More Replies
Taha_Hussain
by Valued Contributor II
  • 2166 Views
  • 3 replies
  • 8 kudos

Register for Databricks Office Hours October 12: 8:00 - 9:00 AM PT | 3:00 - 4:00 PM GMT October 26: 11:00 AM - 12:00 PM PT | 6:00 - 7:00 PM GMT Databric...

Register for Databricks Office Hours October 12: 8:00 - 9:00 AM PT | 3:00 - 4:00 PM GMT October 26: 11:00 AM - 12:00 PM PT | 6:00 - 7:00 PM GMT Databricks Office Hours connects you directly with experts to answer all your Databricks questions. Join us to...

Latest Reply
Taha_Hussain
Valued Contributor II
  • 8 kudos

Here are some of the Questions and Answers from the 10/12 Office Hours (note: certain questions and answers have been condensed for reposting purposes): Q: What is the best approach for moving data from on-prem S3 storage into cloud blob storage into ...

2 More Replies
Carlton
by Contributor
  • 4031 Views
  • 8 replies
  • 1 kudos

Resolved! How to Use the CharIndex with Databricks SQL

When applying the following T-SQL I don't get any errors on MS SQL Server: SELECT DISTINCT * FROM dbo.account LEFT OUTER JOIN dbo.crm2cburl_lookup ON account.Id = CRM2CBURL_Lookup.[Key] LEFT OUTER JOIN dbo.organizations ON CRM2CBURL_Lookup.CB_UR...
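For readers porting similar T-SQL, a minimal sketch of the Spark-side equivalents of CHARINDEX (locate and instr); the sample data below is hypothetical, not the poster's schema.

from pyspark.sql import functions as F

# Hypothetical one-column example.
df = spark.createDataFrame([("a@example.com",), ("no-at-sign",)], ["email"])

# locate(substr, col) is 1-based and returns 0 when the substring is absent,
# which matches T-SQL CHARINDEX semantics; instr(col, substr) behaves the same way.
df = df.withColumn("at_pos", F.locate("@", F.col("email")))
df.show()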

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

CROSS APPLY is not a function in Databricks SQL.

7 More Replies
Sulfikkar
by Contributor
  • 12732 Views
  • 5 replies
  • 3 kudos

Resolved! Install a custom Python package from an Azure DevOps artifact feed to a Databricks cluster

I am trying to install a package, which was uploaded to an Azure DevOps artifact feed, on the Databricks cluster by using pip.conf. Basically, below are the steps I followed. (Step 1: install in local IDE) Uploaded the package to the Azure DevOps feed using ...

Latest Reply
Sulfikkar
Contributor
  • 3 kudos

Thanks for your time @Debayan Mukherjee and @Kaniz Fatma. We figured out the issue along with the infra team: we had to whitelist the public IP of the Databricks clusters in Azure. I have checked the IP address from the Spark cluster U...

4 More Replies
kfoster
by Contributor
  • 1580 Views
  • 1 reply
  • 0 kudos

Resolved! DLT Pipelines call same table

Orchestration of when DLT runs is handled by Azure Data Factory. There are scenarios where a table within a DLT pipeline needs to run on a different schedule. Is there a pipeline configuration option that allows the same table to be run by two diff...

Latest Reply
Vivian_Wilfred
Honored Contributor
  • 0 kudos

Hi @Kristian Foster​ , It should not be possible. Every pipeline owns its table and multiple pipelines cannot write to the same table.

hare
by New Contributor III
  • 3739 Views
  • 3 replies
  • 6 kudos

"Databricks" - "PySpark" - Read "JSON" file - Azure Blob container - "APPEND BLOB"

Hi All, We are getting JSON files in an Azure blob container and the Blob Type is Append Blob. We are getting an error "AnalysisException: Unable to infer schema for JSON. It must be specified manually." when we try to read using the below-mentioned scr...

Latest Reply
User16856839485
New Contributor II
  • 6 kudos

There currently does not appear to be direct support for append blob reads; however, converting the append blob to a block blob [and then Parquet or Delta, etc.] is a viable option: https://kb.databricks.com/en_US/data-sources/wasb-check-blob-types?_ga...

2 More Replies
leos1
by New Contributor II
  • 1333 Views
  • 2 replies
  • 0 kudos

Resolved! Question regarding ZORDER option of OPTIMIZE

Is the order of the columns in ZORDER important? For example, do ZORDER BY (product, site) and ZORDER BY (site, product) produce the same results?
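For context, the two commands being compared look like the sketch below (the table and column names are placeholders); the thread's answer itself is in the replies.

# Illustrative OPTIMIZE/ZORDER syntax on a Delta table; names are placeholders.
spark.sql("OPTIMIZE sales_delta ZORDER BY (product, site)")
spark.sql("OPTIMIZE sales_delta ZORDER BY (site, product)")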

Latest Reply
leos1
New Contributor II
  • 0 kudos

thanks for the quick reply

1 More Replies
Matt101122
by Contributor
  • 1650 Views
  • 1 reply
  • 1 kudos

Resolved! Why aren't RDDs using all available cores of the executor?

I'm extracting data from a custom format by day of month using a 32-core executor. I'm using RDDs to distribute work across the cores of the executor. I'm seeing an intermittent issue where for a run sometimes I see 31 cores being used as expected and ot...

Latest Reply
Matt101122
Contributor
  • 1 kudos

I may have figured this out! I'm explicitly setting the number of slices instead of using the default. days_rdd = sc.parallelize(days_to_process, len(days_to_process))
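A small self-contained sketch of the fix described above: one partition per day so each day can land on its own core. The process_day function and the day list are placeholders, not the poster's extraction code.

# Assumes a Databricks notebook where `sc` (SparkContext) is predefined.
def process_day(day):
    # placeholder for the custom-format extraction logic
    return (day, "processed")

days_to_process = list(range(1, 32))
# Explicit numSlices instead of the default parallelism.
days_rdd = sc.parallelize(days_to_process, len(days_to_process))
results = days_rdd.map(process_day).collect()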

enavuio
by New Contributor II
  • 1323 Views
  • 2 replies
  • 3 kudos

Count on External Table to Azure Data Storage is taking too long

I have created an external table to Azure Data Lake Storage Gen2. The container has about 200K JSON files. The structure of the JSON files is created with ```CREATE EXTERNAL TABLE IF NOT EXISTS dbo.table( ComponentInfo STRUCT<ComponentHost: STRING, ...
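Not necessarily the resolution in this thread, but a common mitigation for slow scans over many small JSON files is to read with an explicit schema and compact into Delta; a sketch with placeholder names and paths (the real schema is the STRUCT definition in the question).

from pyspark.sql.types import StructType, StructField, StringType

# Placeholder schema and path.
component_schema = StructType([
    StructField("ComponentInfo", StructType([
        StructField("ComponentHost", StringType()),
    ])),
])
json_path = "abfss://<container>@<account>.dfs.core.windows.net/<path>/"

# An explicit schema avoids a schema-inference pass over all 200K files, and
# compacting into a Delta table makes subsequent counts and scans much cheaper.
df = spark.read.schema(component_schema).json(json_path)
df.write.format("delta").mode("overwrite").saveAsTable("events_compacted")
spark.sql("SELECT COUNT(*) FROM events_compacted").show()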

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Hi @Ena Vu, Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you. Thanks!

1 More Replies
parthsalvi
by Contributor
  • 1845 Views
  • 3 replies
  • 1 kudos

Unable to update permissions in Unity Catalog object in Single User Mode DBR 11.2

We're trying to update permissions on catalogs in Single User cluster mode but are running into the following error. We were able to update permissions in Shared mode. We used Shared mode to create the objects, but using Single User mode to update permissions seems...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Parth Salvi, Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you. Thanks...

2 More Replies
plynton
by New Contributor II
  • 1406 Views
  • 2 replies
  • 2 kudos

Resolved! Dataframe to update subset of fields in table...

I have a table that I'll update with multiple inputs (csv). Is there a simple way to update my target when the source fields won't be a 1:1 match? Another challenge I've run into is that my sources don't have a header field, though I guess I could ...
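Not the thread's accepted answer, but a common pattern for this kind of partial update is an explicit schema for the header-less CSV plus a Delta MERGE restricted to the columns the source carries; all names in the sketch below are hypothetical.

from pyspark.sql.types import StructType, StructField, IntegerType, StringType
from delta.tables import DeltaTable

# Header-less CSV: column names come from an explicit schema (hypothetical columns).
src_schema = StructType([
    StructField("id", IntegerType()),
    StructField("status", StringType()),
])
src = spark.read.csv("dbfs:/FileStore/updates/", schema=src_schema, header=False)

# MERGE updates only the fields the source actually carries; the target's
# other columns are left untouched.
target = DeltaTable.forName(spark, "my_target_table")
(
    target.alias("t")
    .merge(src.alias("s"), "t.id = s.id")
    .whenMatchedUpdate(set={"status": "s.status"})
    .whenNotMatchedInsert(values={"id": "s.id", "status": "s.status"})
    .execute()
)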

Latest Reply
Kaniz_Fatma
Community Manager
  • 2 kudos

Hi @Peter Ott, We haven’t heard from you since the last response from @Hubert Dudek, and I was checking back to see if you have a resolution yet. If you have any solution, please share it with the community as it can be helpful to others. Otherw...

1 More Replies
AJMorgan591
by New Contributor II
  • 2524 Views
  • 4 replies
  • 0 kudos

Temporarily disable Photon

Is it possible to temporarily disable Photon? I have a large workload that greatly benefits from Photon apart from a specific operation therein that is actually slowed by Photon. It's not worth creating a separate cluster for this operation, however, s...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Aaron Morgan, Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you. Thank...

3 More Replies

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group