Data Engineering

Forum Posts

alexiswl
by Contributor
  • 3794 Views
  • 4 replies
  • 0 kudos

Resolved! Create a UDF Table Function with DLT in UC

Hello, I am trying to generate a DLT but need to use a UDF Table Function in the process. This is what I have so far; everything works (without the CREATE OR REFRESH LIVE TABLE wrapper): ```sql CREATE OR REPLACE FUNCTION silver.portal.get_workflows_from_...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @alexiswl , I want to express my gratitude for your effort in selecting the most suitable solution. It's great to hear that your query has been successfully resolved. Thank you for your contribution. 

3 More Replies
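A minimal sketch of the pattern, assuming the SQL table function is created outside the pipeline; `silver.portal.get_workflows` and its argument are hypothetical stand-ins for the truncated names above:

```python
# Reference a Unity Catalog SQL table function from a DLT Python dataset.
# The function itself is registered separately with CREATE OR REPLACE FUNCTION.
import dlt

@dlt.table(name="workflows")
def workflows():
    # DLT materializes whatever this query returns.
    return spark.sql("SELECT * FROM silver.portal.get_workflows('example_arg')")
```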
JonLaRose
by New Contributor III
  • 2150 Views
  • 4 replies
  • 0 kudos

Resolved! Max amount of tables

Hi! What is the maximum number of tables that can be created in a Unity Catalog? Is there any difference between managed and external tables? If so, what is the limit for external tables? Thanks, Jonathan.

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @JonLaRose , I want to express my gratitude for your effort in selecting the most suitable solution. It's great to hear that your query has been successfully resolved. Thank you for your contribution. 

3 More Replies
js54123875
by New Contributor III
  • 4391 Views
  • 3 replies
  • 0 kudos

Resolved! Power BI - Databricks Connection using Service Principal PAT Refresh

What is best practice for automatically refreshing a service principal PAT in Power BI for a connection to a Databricks dataset? Ideally, when the PAT is updated it will automatically be stored in Azure Key Vault; is there a way that Power BI can pick it...

Data Engineering
Azure Key Vault
Personal Access Token
Power BI
Service Principal
Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @js54123875 , Certainly! Refreshing a Power BI dataset with a Service Principal and managing PATs can be achieved through a combination of best practices. Let's explore some approaches: Service Principal and Azure Key Vault: Create a Service ...

2 More Replies
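A hedged sketch of the rotation piece, assuming the job runs as the service principal; the vault URL and secret name are placeholders, and how Power BI re-reads the secret depends on the connection setup:

```python
# Mint a fresh Databricks PAT for the calling principal and store it in Azure
# Key Vault. Requires databricks-sdk, azure-identity, azure-keyvault-secrets.
from databricks.sdk import WorkspaceClient
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

w = WorkspaceClient()  # authenticates as the service principal running the job
pat = w.tokens.create(comment="powerbi-refresh", lifetime_seconds=30 * 24 * 3600)

kv = SecretClient(vault_url="https://my-vault.vault.azure.net",  # placeholder
                  credential=DefaultAzureCredential())
kv.set_secret("databricks-powerbi-pat", pat.token_value)
```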
rt-slowth
by Contributor
  • 2598 Views
  • 2 replies
  • 0 kudos

Resolved! how to run @dlt pipeline in vscode

I want to test a pipeline created using DLT and Python in VS Code.

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @rt-slowth , To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question? This will also help other community members who may have similar ques...

1 More Replies
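Since the dlt module only resolves inside a pipeline run, one common workaround (a sketch, not the only approach) is to keep the transformations as plain PySpark functions and unit-test those locally in VS Code:

```python
# Pure transformation functions are testable with a local SparkSession;
# the @dlt.table wrappers that call them run only on Databricks.
from pyspark.sql import SparkSession, DataFrame
from pyspark.sql import functions as F

def clean_orders(df: DataFrame) -> DataFrame:
    return df.filter(F.col("amount") > 0)

if __name__ == "__main__":
    spark = SparkSession.builder.master("local[2]").getOrCreate()
    sample = spark.createDataFrame([(1, 10.0), (2, -5.0)], ["id", "amount"])
    assert clean_orders(sample).count() == 1  # the negative-amount row is dropped
```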
dev_puli
by New Contributor III
  • 3509 Views
  • 3 replies
  • 0 kudos

tracing the history of a workflow

Hi! I use Databricks in Azure and I find it inconvenient not knowing the last modified user and modified time. How can I trace the history of modified times and user details? Would it be possible to deploy the workflows into higher environments? Thanks!

Data Engineering
azure
Workflows
Latest Reply
dev_puli
New Contributor III
  • 0 kudos

Sorry! I added another issue at the end without mentioning it was a new issue I encountered. I had challenges in changing the owner of a workflow when I created a workflow. I ended up seeking help from another user with admin privileges to change the...

2 More Replies
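If system tables are enabled on the workspace, job modifications can be traced from the audit log; a hedged sketch (exact action names can vary):

```python
# Query the audit log for job-service events to see who changed a workflow and when.
history = spark.sql("""
    SELECT event_time,
           user_identity.email AS changed_by,
           action_name
    FROM system.access.audit
    WHERE service_name = 'jobs'
    ORDER BY event_time DESC
""")
history.show(truncate=False)
```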
UtkarshTrehan
by New Contributor
  • 3652 Views
  • 3 replies
  • 1 kudos

Inconsistent Results When Writing to Oracle DB with Spark's dropDuplicates and foreachPartition

It's more a Spark question than a Databricks question. I'm encountering an issue when writing data to an Oracle database using Apache Spark. My workflow involves removing duplicate rows from a DataFrame and then writing the deduplicated DataFrame to ...

Latest Reply
Kaniz
Community Manager
  • 1 kudos

Hi @UtkarshTrehan , To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question? This will also help other community members who may have similar ...

2 More Replies
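One plausible explanation, sketched under assumptions rather than confirmed from the thread: dropDuplicates keeps an arbitrary row per key, and foreachPartition work can be re-executed on task retries, so different attempts can insert different rows. Materializing the deduplicated frame once and using the built-in JDBC writer sidesteps both issues (connection values are placeholders):

```python
# Deduplicate, pin the result, then let Spark's JDBC writer handle Oracle inserts.
deduped = df.dropDuplicates(["id"]).persist()
deduped.count()  # materialize so retries see identical rows

(deduped.write
    .format("jdbc")
    .option("url", "jdbc:oracle:thin:@//host:1521/service")  # placeholder
    .option("dbtable", "target_table")                       # placeholder
    .option("user", "user")
    .option("password", "password")
    .mode("append")
    .save())
```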
carlosna
by New Contributor II
  • 12410 Views
  • 2 replies
  • 1 kudos

Resolved! Recover files from previous cluster execution

I saved a file with results by just opening a file via fopen("filename.csv", "a"). Once the execution ended (and the cluster shut down) I couldn't retrieve the file. I found that the file was stored in "/databricks/driver", and that folder empties w...

Latest Reply
Kaniz
Community Manager
  • 1 kudos

Hi @carlosna , I want to express my gratitude for your effort in selecting the most suitable solution. It's great to hear that your query has been successfully resolved. Thank you for your contribution. 

1 More Replies
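A minimal sketch of the usual remedy: append to a DBFS path (or a UC volume) instead of the driver's local filesystem, which is discarded when the cluster terminates; the path below is illustrative:

```python
# /dbfs/... maps to durable DBFS storage, so the file survives cluster shutdown.
with open("/dbfs/FileStore/results/filename.csv", "a") as f:
    f.write("col_a,col_b\n")
```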
Joe1912
by New Contributor III
  • 1770 Views
  • 3 replies
  • 0 kudos

Resolved! Consume 2 kafka topic with different schemas on 1 cluster databricks

Hi everyone, I have a concern: is there any way to read streams from 2 different Kafka topics with 2 different schemas in 1 job or on the same cluster? Or do we need to create 2 separate jobs for it? (The job will need to process continually.)

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @Joe1912 , It's certainly reasonable to run a number of concurrent streams per driver node. Each .start() consumes a certain amount of driver resources in Spark. Your limiting factor will be the load on the driver node and its available resour...

2 More Replies
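A hedged sketch of the single-job option: each topic gets its own schema, checkpoint, and .start(), and both queries share the cluster; broker, topics, schemas, and paths are assumptions:

```python
# Two independent streams in one job, one per Kafka topic/schema.
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, LongType

schema_a = StructType([StructField("id", LongType()), StructField("name", StringType())])
schema_b = StructType([StructField("sku", StringType()), StructField("qty", LongType())])

def start_stream(topic, schema, target, checkpoint):
    raw = (spark.readStream.format("kafka")
           .option("kafka.bootstrap.servers", "broker:9092")  # placeholder
           .option("subscribe", topic)
           .load())
    parsed = (raw.select(F.from_json(F.col("value").cast("string"), schema).alias("v"))
                 .select("v.*"))
    return (parsed.writeStream
            .option("checkpointLocation", checkpoint)
            .toTable(target))

q1 = start_stream("topic_a", schema_a, "bronze.topic_a", "/chk/topic_a")
q2 = start_stream("topic_b", schema_b, "bronze.topic_b", "/chk/topic_b")
```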
Joe1912
by New Contributor III
  • 867 Views
  • 2 replies
  • 1 kudos

Resolved! Strategy to add new table base on silver data

I have a merge function for streaming foreachBatch, something like: def mergedf(df, i): merge_func_1(df, i); merge_func_2(df, i). Then I want to add a new merge_func_3 into it. Is there any best practice for this case? When streaming always runs, how can I process...

Latest Reply
Kaniz
Community Manager
  • 1 kudos

Hi @Joe1912 , I want to express my gratitude for your effort in selecting the most suitable solution. It's great to hear that your query has been successfully resolved. Thank you for your contribution. 

1 More Replies
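A small sketch of one way to structure it, reusing the post's function names: keep the merges in a list so adding merge_func_3 is a one-line change, then stop and restart the stream (same checkpoint) so foreachBatch picks up the new code:

```python
# merge_func_1/2/3 are the post's functions; each consumes the same micro-batch.
merge_funcs = [merge_func_1, merge_func_2, merge_func_3]

def mergedf(df, batch_id):
    df.persist()  # avoid recomputing the batch for every merge
    for fn in merge_funcs:
        fn(df, batch_id)
    df.unpersist()
```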
DavidStarosta
by New Contributor II
  • 1825 Views
  • 3 replies
  • 0 kudos

Resolved! Databricks Asset Bundles Jobs Updated instead of Create

Hello, is it possible to just update parameter values in different workspaces? YAML source code taken from workflow jobs always creates a new job. I'd like to just change/update parameter values when I deploy the bundle to different workspaces/environments...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @DavidStarosta , I want to express my gratitude for your effort in selecting the most suitable solution. It's great to hear that your query has been successfully resolved. Thank you for your contribution. 

2 More Replies
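A hedged databricks.yml sketch: bundles match jobs by resource key, so redeploying the same bundle updates the job in place rather than creating a new one, and per-target variables swap only the values; all names are illustrative:

```yaml
variables:
  env_name:
    default: dev

resources:
  jobs:
    my_job:            # stable resource key, so deploys update rather than create
      name: my-job-${var.env_name}

targets:
  dev:
    workspace:
      host: https://adb-1111111111111111.11.azuredatabricks.net  # placeholder
  prod:
    variables:
      env_name: prod
```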
JohnJustus
by New Contributor III
  • 8843 Views
  • 3 replies
  • 0 kudos

Resolved! Accessing Excel file from Databricks

Hi, I am trying to access an Excel file that is stored in Azure Blob Storage via Databricks. In my understanding, it is not possible to access it using PySpark, so accessing it through pandas is the option. Here is my code: %pip install openpyxl import pandas as p...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @JohnJustus ,  To resolve the FileNotFoundError when reading from Azure Blob Storage in Databricks, you need to use the "wasbs" protocol for the file path reference instead of the local file system path. Here's a summary of the steps to address th...

2 More Replies
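A minimal sketch of the pandas route, with placeholder account, container, and file names: copy the blob to driver-local storage first, since pandas cannot open wasbs:// paths directly:

```python
import pandas as pd

src = "wasbs://container@account.blob.core.windows.net/reports/file.xlsx"  # placeholder
dbutils.fs.cp(src, "file:/tmp/file.xlsx")  # stage locally for pandas
df = pd.read_excel("/tmp/file.xlsx", engine="openpyxl")
```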
dbuser1234
by New Contributor
  • 1191 Views
  • 1 reply
  • 0 kudos

Resolved! How to readstream from multiple sources?

Hi, I am trying to readStream from 2 sources and join them into a target table. How can I do this in PySpark? E.g. t1 + t2 as my bronze tables. I want to readStream from t1 and t2, and merge the changes into t3 (silver table).

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @dbuser1234 , Certainly! To read stream data from two sources, join them, and merge the changes into a target table in PySpark, you can follow these steps:   Read Stream Data from Sources (t1 and t2): Use spark.readStream to read data from both t1...

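A hedged sketch of the merge step, with assumed table and key names: stream each bronze table and upsert its micro-batches into the silver table with a Delta MERGE inside foreachBatch:

```python
from delta.tables import DeltaTable

def upsert(batch_df, batch_id):
    (DeltaTable.forName(spark, "silver.t3").alias("t")   # assumed target table
        .merge(batch_df.alias("s"), "t.id = s.id")       # assumed key column
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute())

for src in ["bronze.t1", "bronze.t2"]:                   # assumed source tables
    (spark.readStream.table(src)
        .writeStream
        .foreachBatch(upsert)
        .option("checkpointLocation", f"/chk/{src}")
        .start())
```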
Abel_Martinez
by Contributor
  • 7467 Views
  • 9 replies
  • 6 kudos

Resolved! Why I'm getting connection timeout when connecting to MongoDB using MongoDB Connector for Spark 10.x from Databricks

I'm able to connect to MongoDB using org.mongodb.spark:mongo-spark-connector_2.12:3.0.2 and this code: df = spark.read.format("com.mongodb.spark.sql.DefaultSource").option("uri", jdbcUrl). It works well, but if I install the latest MongoDB Spark Connector ve...

Latest Reply
Kaniz
Community Manager
  • 6 kudos

Hi @Abel_Martinez, I want to express my gratitude for your effort in selecting the most suitable solution. It's great to hear that your query has been successfully resolved. Thank you for your contribution. 

8 More Replies
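For reference, the 10.x connector renamed the data source and its URI option, and reusing the 3.x format string is a common cause of timeouts; a hedged sketch with a placeholder URI:

```python
# mongo-spark-connector 10.x: format is "mongodb", URI option is "connection.uri".
df = (spark.read.format("mongodb")
      .option("connection.uri", "mongodb+srv://user:pass@cluster.example.net")  # placeholder
      .option("database", "db")
      .option("collection", "coll")
      .load())
```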
7cb15
by New Contributor
  • 7338 Views
  • 1 reply
  • 0 kudos

Resolved! org.apache.spark.SparkException: Job aborted due to stage failure while saving to s3

Hello, I am having issues saving a Spark DataFrame generated in a Databricks notebook to an S3 bucket. The DataFrame contains approximately 1.1M rows and 5 columns. The error is as follows: org.apache.spark.SparkException: Job aborted due to stage fa...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @7cb15, I understand you’re encountering issues while saving a Spark DataFrame to an S3 bucket. Let’s troubleshoot this together! Here are some steps and recommendations to address the problem: Check S3 Permissions: Ensure that the IAM role or us...

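As a first diagnostic (a sketch with a placeholder bucket), evening out partition sizes before the write helps rule out a single oversized task as the cause before digging into IAM or data issues:

```python
# ~1.1M rows across 8 evenly sized files keeps each write task small.
(df.repartition(8)
   .write
   .mode("overwrite")
   .parquet("s3a://my-bucket/exports/run1/"))
```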
svrdragon
by New Contributor
  • 888 Views
  • 1 reply
  • 0 kudos

Resolved! optimizeWrite takes too long

Hi, we have a Spark job that writes data into a Delta table for the last 90 date partitions. We have enabled spark.databricks.delta.autoCompact.enabled and delta.autoOptimize.optimizeWrite. The job takes 50 mins to complete; of that, the logic takes 12 mins and optimizeWri...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @svrdragon, It's great you're using Delta Lake features to optimize your Spark job. Let's explore some strategies to potentially reduce the total job time: Optimize Write: You've already enabled delta.autoOptimize.optimizeWrite, which is a go...

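One option, sketched with a placeholder table name: optimizeWrite adds a shuffle before every write, so for a heavy scheduled append it can pay to disable the automatic rewrite and run OPTIMIZE off the critical path instead:

```python
spark.conf.set("spark.databricks.delta.optimizeWrite.enabled", "false")
spark.conf.set("spark.databricks.delta.autoCompact.enabled", "false")

(df.write.format("delta")
   .mode("append")
   .saveAsTable("catalog.schema.sales_daily"))  # placeholder table

# Compact later, outside the job's critical path:
spark.sql("OPTIMIZE catalog.schema.sales_daily")
```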