Data Engineering

Forum Posts

irispan
by New Contributor II
  • 2148 Views
  • 4 replies
  • 1 kudos

Recommended Hive metastore pattern for Trino integration

Hi, I have several questions regarding Trino integration: Is it recommended to use an external Hive metastore or to rely on the Databricks-maintained Hive metastore when enabling external query engines such as Trino? When I tried to use ex...

Latest Reply
JunlinZeng
New Contributor II
  • 1 kudos

> Is it recommended to use an external Hive metastore or leverage on the databricks-maintained Hive metastore when it comes to enabling external query engines such as Trino?
The Databricks-maintained Hive metastore is not recommended for external use. ...

3 More Replies
CDICSteph
by New Contributor
  • 1397 Views
  • 2 replies
  • 0 kudos

Need pattern for loading a million small XML files

Hi, I'm looking for the right solution pattern for this scenario: we have millions of relatively small XML files (currently sitting in ADLS) that we have to load into Delta Lake. Each XML file has to be read, parsed, and pivoted before writing to a delta...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Steph Swierenga, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answe...

1 More Replies
Jfoxyyc
by Valued Contributor
  • 1296 Views
  • 2 replies
  • 0 kudos

DLT - deduplication pattern?

Say we have an incremental append happening using Auto Loader, where the filename is added to the DataFrame and that's all. If we want to de-duplicate this data in a rolling window, we can do something like merge into logs using dedupedLogs on ...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Jordan Fox, hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Thanks!

1 More Replies
Srikanth_Gupta_
by Valued Contributor
  • 757 Views
  • 1 reply
  • 0 kudos
Latest Reply
Srikanth_Gupta_
Valued Contributor
  • 0 kudos

Yes, we can, using the code snippet below:

spark
  .readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "host1:port1,host2:port2")
  .option("subscribePattern", "topic.*")
  .load()
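To get a feel for which topics a pattern like `topic.*` would pick up, here is a small sketch in plain Python. It is only an approximation: the Spark Kafka source documents `subscribePattern` as a Java regex, so Python's `re` module stands in for it here, and the sketch assumes the pattern has to match the whole topic name. The topic names and the `matching_topics` helper are hypothetical, made up for illustration.

```python
import re

# Hypothetical topic names, used only for illustration.
topics = ["topic", "topic.orders", "topic-clicks", "other.topic", "audit"]

# Same pattern string as in the reply above.
# Assumption: the pattern must match the entire topic name, so
# re.fullmatch is used rather than re.search.
pattern = re.compile(r"topic.*")


def matching_topics(names, regex):
    """Return the names that the regex matches in full, in input order."""
    return [name for name in names if regex.fullmatch(name)]


matched = matching_topics(topics, pattern)
print(matched)
```

Note that the unescaped `.` matches any character, so `topic-clicks` is selected as well. If only dot-separated names are wanted, a stricter pattern such as `topic\..*` may be closer to the intent.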
