Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

Stita
by New Contributor II
  • 2960 Views
  • 1 reply
  • 2 kudos

Resolved! How do we pass the row tags dynamically while reading an XML file into a DataFrame?

I have a set of XML files where the row tags change dynamically. How can we achieve this scenario in Databricks? df1 = spark.read.format('xml').option('rootTag', 'XRoot').option('rowTag', 'PL1PLLL').load("dbfs:/FileStore/tables/ins/") We need to pass a val...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 2 kudos

If it is dynamic for the whole file, you can just use a variable: tag = 'PL1PLLL' df1 = spark.read.format('xml').option('rootTag', 'XRoot').option('rowTag', tag).load("dbfs:/FileStore/tables/ins/file.xml")

  • 2 kudos
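The accepted reply above can be sketched as a small, self-contained snippet. The option-building helper is a hypothetical convenience (not from the thread), and the actual read (commented out) assumes a cluster with the spark-xml library attached.

```python
# Sketch: pass the row tag as a variable when reading XML.
# xml_read_options is a hypothetical helper; tag values and paths
# match the ones quoted in the thread.

def xml_read_options(root_tag, row_tag):
    """Build the options dict for the spark-xml reader."""
    return {"rootTag": root_tag, "rowTag": row_tag}

row_tag = "PL1PLLL"  # could come from a config table or a notebook widget
opts = xml_read_options("XRoot", row_tag)

# On a cluster with spark-xml attached:
# df1 = (spark.read.format("xml")
#        .options(**opts)
#        .load("dbfs:/FileStore/tables/ins/"))
```

If the row tag differs per file rather than per run, the same helper can be called inside a loop over file paths, producing one DataFrame per tag.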
Taha_Hussain
by Databricks Employee
  • 2359 Views
  • 2 replies
  • 8 kudos

Register for Databricks Office Hours

Register for Databricks Office Hours. October 12: 8:00 - 9:00 AM PT | 3:00 - 4:00 PM GMT. October 26: 11:00 AM - 12:00 PM PT | 6:00 - 7:00 PM GMT. Databricks Office Hours connects you directly with experts to answer all your Databricks questions. Join us to...

Latest Reply
Taha_Hussain
Databricks Employee
  • 8 kudos

Here are some of the Questions and Answers from the 10/12 Office Hours (note: certain questions and answers have been condensed for reposting purposes): Q: What is the best approach for moving data from on-prem S3 storage into cloud blob storage into ...

  • 8 kudos
1 More Replies
Carlton
by Contributor
  • 4642 Views
  • 8 replies
  • 1 kudos

Resolved! How to Use the CharIndex with Databricks SQL

When applying the following T-SQL I don't get any errors on MS SQL Server: SELECT DISTINCT * FROM dbo.account LEFT OUTER JOIN dbo.crm2cburl_lookup ON account.Id = CRM2CBURL_Lookup.[Key] LEFT OUTER JOIN dbo.organizations ON CRM2CBURL_Lookup.CB_UR...

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

CROSS APPLY is not a function in Databricks SQL.

  • 1 kudos
7 More Replies
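For readers landing here from the title, a hedged sketch of the usual Spark SQL stand-ins for the two T-SQL constructs in this thread; the table and column names below are hypothetical, not the asker's actual schema.

```python
# T-SQL CHARINDEX(substr, str) maps to locate(substr, str) (or instr(str, substr))
# in Spark SQL: 1-based position, 0 when the substring is not found.
charindex_sql = """
SELECT Id, locate('@', email) AS at_pos
FROM dbo_account
"""

# T-SQL CROSS APPLY over a collection column maps to a LATERAL VIEW with explode():
cross_apply_sql = """
SELECT a.Id, t.tag
FROM dbo_account a
LATERAL VIEW explode(a.tags) t AS tag
"""

# On a cluster: spark.sql(charindex_sql), spark.sql(cross_apply_sql)
```

Recent Databricks runtimes also accept `charindex` directly, but `locate`/`instr` are the portable Spark SQL spellings.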
Sulfikkar
by Contributor
  • 14250 Views
  • 4 replies
  • 3 kudos

Resolved! install a custom python package from azure devops artifact to databricks cluster

I am trying to install a package which was uploaded to an Azure DevOps artifact feed onto the Databricks cluster by using pip.conf. Basically, below are the steps I followed. (Step 1: install in local IDE) Uploaded the package to the Azure DevOps feed using ...

Latest Reply
Sulfikkar
Contributor
  • 3 kudos

Thanks for your time @Debayan Mukherjee and @Kaniz Fatma. We have figured out the issue along with the infra team: we had to do a public IP whitelisting of the Databricks clusters in Azure. I have checked the IP address from the Spark cluster U...

  • 3 kudos
3 More Replies
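The thread's resolution was network whitelisting, but for the pip.conf side of the question, here is a minimal sketch of the shape an Azure DevOps feed entry typically takes. The organization, project, and feed names are hypothetical, and the `<PAT>` placeholder stands in for a token that would normally come from a Databricks secret scope, never plain text.

```python
# Hypothetical pip.conf contents for an Azure DevOps artifact feed.
pip_conf = (
    "[global]\n"
    "extra-index-url="
    "https://<PAT>@pkgs.dev.azure.com/my-org/my-project"
    "/_packaging/my-feed/pypi/simple/\n"
)

# Written to /etc/pip.conf by a cluster init script, this lets pip on the
# cluster resolve private packages, e.g.:  %pip install my-custom-package
```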
joselita
by New Contributor III
  • 22935 Views
  • 4 replies
  • 8 kudos

AnalysisException: is not a Delta table.

Hello, I changed the DBR from 7.2 to 10.4 and I receive the following error: AnalysisException: is not a Delta table. The table is created using DELTA, so it is surely a Delta table, even though I read that from version 8 all tables are De...

STG_DATA_LOAD
Latest Reply
Ryan_Chynoweth
Esteemed Contributor
  • 8 kudos

Hi @JOSELITA MOLTISANTI, can you run the following commands and share the output?
table_name = "stg_data_load"
path = spark.sql(f"describe detail {table_name}").select("location").collect()[0][0].replace('dbfs:', '')
dbutils.fs.ls(path)

  • 8 kudos
3 More Replies
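The diagnostic in the reply above can be sketched as below. The cluster-only calls are left as comments, and `strip_dbfs_scheme` is a hypothetical helper for the path-normalization step, not part of the original answer.

```python
# Sketch: locate a table's storage path and confirm it is a Delta table.

table_name = "stg_data_load"  # the table from the thread

def strip_dbfs_scheme(location):
    """Normalize a dbfs:/ URI into the path form dbutils.fs.ls expects."""
    return location.replace("dbfs:", "")

# On a cluster:
# location = (spark.sql(f"DESCRIBE DETAIL {table_name}")
#             .select("location").collect()[0][0])
# dbutils.fs.ls(strip_dbfs_scheme(location))   # a Delta table contains _delta_log/
#
# A more direct check with the Delta Lake API:
# from delta.tables import DeltaTable
# DeltaTable.isDeltaTable(spark, location)
```

If `_delta_log/` is missing at that location, the error message is accurate: whatever is registered under that name is not (or is no longer) a Delta table.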
kfoster
by Contributor
  • 1740 Views
  • 1 reply
  • 0 kudos

Resolved! DLT Pipelines call same table

Orchestration of when DLT runs is handled by Azure Data Factory. There are scenarios where a table within a DLT pipeline needs to run on a different schedule. Is there a pipeline configuration option to be set to allow the same table to be run by two diff...

Latest Reply
Vivian_Wilfred
Databricks Employee
  • 0 kudos

Hi @Kristian Foster​ , It should not be possible. Every pipeline owns its table and multiple pipelines cannot write to the same table.

  • 0 kudos
StephanieAlba
by Databricks Employee
  • 8698 Views
  • 6 replies
  • 9 kudos

Resolved! How do I kick off Azure Data Factory from within Databricks?

I want to kick off ingestion in ADF from Databricks. When ADF ingestion is done, my DBX bronze-silver-gold pipeline follows within DBX. I see it is possible to call Databricks notebooks from ADF. Can I also go the other way? I want to start the ingest...

Latest Reply
KKo
Contributor III
  • 9 kudos

Are you looking to pass output of databricks notebook to ADF?

  • 9 kudos
5 More Replies
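One hedged way to go the other direction, as the question asks, is the Data Factory REST API's createRun endpoint. The subscription, resource group, factory, and pipeline names below are hypothetical, and authentication (a service principal bearer token) is assumed rather than shown.

```python
# Sketch: trigger an ADF pipeline run from a Databricks notebook via REST.

def adf_create_run_url(subscription, resource_group, factory, pipeline):
    """Build the ADF 'Pipelines - Create Run' management URL."""
    return (
        "https://management.azure.com"
        f"/subscriptions/{subscription}"
        f"/resourceGroups/{resource_group}"
        "/providers/Microsoft.DataFactory"
        f"/factories/{factory}"
        f"/pipelines/{pipeline}/createRun"
        "?api-version=2018-06-01"
    )

url = adf_create_run_url("my-sub", "my-rg", "my-factory", "ingest_pipeline")

# On a cluster, with `token` obtained for a service principal:
# import requests
# run = requests.post(url, headers={"Authorization": f"Bearer {token}"}).json()
# run["runId"] can then be polled until the ingestion finishes.
```

Polling the run status before continuing keeps the bronze-silver-gold steps from starting while ADF ingestion is still in flight.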
hare
by New Contributor III
  • 4033 Views
  • 1 reply
  • 5 kudos

"Databricks" - "PySpark" - Read "JSON" file - Azure Blob container - "APPEND BLOB"

Hi All, we are getting JSON files in an Azure blob container whose "Blob Type" is "Append Blob". We are getting an error "AnalysisException: Unable to infer schema for JSON. It must be specified manually." when we try to read using the below mentioned scr...

Latest Reply
User16856839485
Databricks Employee
  • 5 kudos

There currently does not appear to be direct support for append blob reads; however, converting the append blob to a block blob [and then parquet or delta, etc.] is a viable option: https://kb.databricks.com/en_US/data-sources/wasb-check-blob-types?_ga...

  • 5 kudos
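Since the error itself says the schema "must be specified manually", a hedged sketch of that workaround: bypass inference by passing the schema explicitly. The STRUCT fields come from the CREATE TABLE snippet in the question; the storage path is hypothetical.

```python
# Sketch: read JSON with an explicit schema instead of relying on inference.
# DataFrameReader.schema accepts a DDL-formatted string like this one.

json_schema = "ComponentInfo STRUCT<ComponentHost: STRING, ComponentName: STRING>"

# On a cluster:
# df = (spark.read
#       .schema(json_schema)
#       .json("wasbs://container@account.blob.core.windows.net/path/"))
```

This helps only when the read itself succeeds; if the blob type blocks the read entirely, the block-blob conversion from the reply above is still needed first.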
leos1
by New Contributor II
  • 1509 Views
  • 2 replies
  • 0 kudos

Resolved! Question regarding ZORDER option of OPTIMIZE

Is the order of the columns in ZORDER important? For example, does ZORDER BY (product, site) and ZORDER BY (site, product) produce the same results?

Latest Reply
leos1
New Contributor II
  • 0 kudos

thanks for the quick reply

  • 0 kudos
1 More Reply
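On the question itself: Z-ordering maps the listed columns onto a single space-filling curve rather than sorting them hierarchically, so ZORDER BY (product, site) and ZORDER BY (site, product) should produce essentially the same clustering. That is a general property of the technique, not a statement from this thread. A sketch with a hypothetical table name:

```python
# Sketch: build and (on a cluster) run an OPTIMIZE ... ZORDER BY statement.

def optimize_zorder_sql(table, columns):
    """Compose the OPTIMIZE statement for the given Z-order columns."""
    return f"OPTIMIZE {table} ZORDER BY ({', '.join(columns)})"

sql = optimize_zorder_sql("events", ["product", "site"])
# spark.sql(sql)
```

What does matter is how many columns you list: each additional Z-order column dilutes the clustering benefit for the others, so keep the list to the columns actually used in filters.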
Trey
by New Contributor III
  • 2699 Views
  • 2 replies
  • 6 kudos

Resolved! Is it a good idea to use a managed delta table as a temporal table?

Hi all! I would like to use a managed delta table as a temporal table, meaning: to create a managed table in the middle of the ETL process, and to drop the managed table right after the process. This way I can perform merge, insert, or delete operations better than...

Latest Reply
karthik_p
Esteemed Contributor
  • 6 kudos

@Kwangwon Yi Instead of performance, the main issue with a managed table is that whenever you delete the table, the data under it gets deleted. If you have a good use case on reporting, the best approach is to go with an external storage location to store your managed t...

  • 6 kudos
1 More Reply
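The create-merge-drop pattern discussed above can be sketched as below. Table and column names are hypothetical, and the reply's caveat applies: dropping a managed table also deletes its underlying data, which is exactly what makes it suitable for scratch use and unsuitable for anything you need to keep.

```python
# Sketch: a managed Delta table as a short-lived scratch table during ETL.

create_sql = "CREATE TABLE etl_scratch (id BIGINT, payload STRING) USING DELTA"

merge_sql = """
MERGE INTO target t
USING etl_scratch s ON t.id = s.id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *
"""

drop_sql = "DROP TABLE IF EXISTS etl_scratch"  # managed: data is deleted too

# On a cluster:
# spark.sql(create_sql)
# ... load staging data into etl_scratch ...
# spark.sql(merge_sql)
# spark.sql(drop_sql)
```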
Matt101122
by Contributor
  • 1996 Views
  • 1 reply
  • 1 kudos

Resolved! why aren't rdds using all available cores of executor?

I'm extracting data from a custom format by day of month using a 32 core executor. I'm using rdds to distribute work across cores of the executor. I'm seeing an intermittent issue where for a run sometimes I see 31 cores being used as expected and ot...

Latest Reply
Matt101122
Contributor
  • 1 kudos

I may have figured this out! I'm explicitly setting the number of slices instead of using the default:
days_rdd = sc.parallelize(days_to_process, len(days_to_process))

  • 1 kudos
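The accepted fix above pins the partition count so each day lands in its own task instead of depending on the cluster's default parallelism. A sketch, with the day list as a hypothetical stand-in for the poster's data:

```python
# Sketch: explicit numSlices for sc.parallelize, one slice per work item.

days_to_process = list(range(1, 32))  # hypothetical: one entry per day of month

num_slices = len(days_to_process)

# On a cluster:
# days_rdd = sc.parallelize(days_to_process, num_slices)
# days_rdd.getNumPartitions()  # one partition, and so one task, per day
```

Without the second argument, Spark splits the list into `sc.defaultParallelism` slices, which can leave cores idle when the list length and core count don't divide evenly.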
enavuio
by New Contributor II
  • 1485 Views
  • 2 replies
  • 3 kudos

Count on External Table to Azure Data Storage is taking too long

I have created an external table to Azure Data Lake Storage Gen2. The container has about 200K JSON files. The structure of the JSON files is created with: CREATE EXTERNAL TABLE IF NOT EXISTS dbo.table( ComponentInfo STRUCT<ComponentHost: STRING, ...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Hi @Ena Vu, hope all is well! Just wanted to check in if you were able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you. Thanks!

  • 3 kudos
1 More Reply
parthsalvi
by Contributor
  • 2079 Views
  • 3 replies
  • 1 kudos

Unable to update permissions in Unity Catalog object in Single User Mode DBR 11.2

We're trying to update permissions of catalogs in Single User cluster mode but are running into the following error. We were able to update permissions in Shared mode. We used Shared mode to create objects, but using Single User mode to update permissions seems...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Parth Salvi, hope all is well! Just wanted to check in if you were able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you. Thanks...

  • 1 kudos
2 More Replies
AJMorgan591
by New Contributor II
  • 3027 Views
  • 4 replies
  • 0 kudos

Temporarily disable Photon

Is it possible to temporarily disable Photon? I have a large workload that greatly benefits from Photon apart from a specific operation therein that is actually slowed by Photon. It's not worth creating a separate cluster for this operation, however, s...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Aaron Morgan, hope all is well! Just wanted to check in if you were able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you. Thank...

  • 0 kudos
3 More Replies
refint650
by New Contributor II
  • 7063 Views
  • 4 replies
  • 0 kudos

Resolved! String conversion to datetimestamp format

Hello, I'm converting HANA SQL code in Databricks. We have 4 columns, all in string format: start date, start time, end date, end time. 1) What expression can I use to convert values of startdate & starttime from string format to datetime format wit...

Latest Reply
refint650
New Contributor II
  • 0 kudos

Hello Matt, the concat & to_timestamp functions partially worked; values with the 24 timestamp format were not converted. Any other approach I can think of?

  • 0 kudos
3 More Replies
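The sticking point in the last reply is the "24" hour that HANA emits for end-of-day, which `to_timestamp` rejects. A sketch of one way to handle it, with the rollover logic shown in plain Python and the Spark equivalent left as a comment; the column names are hypothetical.

```python
# Sketch: combine 'yyyyMMdd' date and 'HHmmss' time strings into a timestamp,
# rolling the HANA-style '240000' (end of day) over to midnight of the next day.

from datetime import datetime, timedelta

def to_ts(date_str, time_str):
    """Parse date + time strings; '240000' becomes 00:00:00 of the next day."""
    if time_str == "240000":
        return datetime.strptime(date_str, "%Y%m%d") + timedelta(days=1)
    return datetime.strptime(date_str + time_str, "%Y%m%d%H%M%S")

# Spark equivalent for the normal case (on a cluster):
# from pyspark.sql import functions as F
# df = df.withColumn("start_ts",
#     F.to_timestamp(F.concat("startdate", "starttime"), "yyyyMMddHHmmss"))
# The '240000' rows would need a when()/otherwise() branch mirroring to_ts.
```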

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group