Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Phani1
by Valued Contributor II
  • 2092 Views
  • 2 replies
  • 1 kudos

Snowflake connector

Hi Team, Databricks recommends storing data in a cloud storage location, but if we connect directly to Snowflake using the Snowflake connector, will we face any performance issues? Could you please suggest the best way to read a large volume of data f...

Latest Reply
Phani1
Valued Contributor II

Thanks!!

1 More Replies
Amit_Garg
by New Contributor
  • 1261 Views
  • 1 reply
  • 1 kudos

Calling a .py Function using DF from another file

I have created a file NBF_TextTranslation:

    spark = SparkSession.builder.getOrCreate()
    df_TextTranslation = spark.read.format('delta').load(textTranslation_path)

    def getMediumText(TextID, PlantName):
        df1 = spark.sql("SELECT TextID, PlantName, Langu...

Latest Reply
Lakshay
Databricks Employee

You should create a UDF on top of the getMediumText function and then use the UDF in the SQL statement.
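
A minimal sketch of that approach, assuming getMediumText can be rewritten to take plain column values and return a string (the table name text_translation is a placeholder):

    from pyspark.sql import SparkSession
    from pyspark.sql.types import StringType

    spark = SparkSession.builder.getOrCreate()

    def getMediumText(text_id, plant_name):
        # Placeholder for the lookup logic from the original post. Note that a
        # UDF body runs on the executors, so it cannot call spark.sql itself.
        return f"{text_id}-{plant_name}"

    # Register the function so Spark SQL can refer to it by name.
    spark.udf.register("getMediumText", getMediumText, StringType())

    spark.sql(
        "SELECT TextID, PlantName, getMediumText(TextID, PlantName) AS MediumText "
        "FROM text_translation"
    ).show()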

Volker
by Contributor
  • 10493 Views
  • 4 replies
  • 2 kudos

Persisting and managing tables and table schemas in Unity Catalog

Hello Databricks Community, we are currently looking for a way to persist and manage our Unity Catalog tables in an IaC manner. That is, we want to trace any changes to a table's schema and properties and ideally be able to roll back those changes sea...

Latest Reply
CharlesReily
New Contributor III

As you mentioned, using notebooks with Data Definition Language (DDL) scripts is a viable option. You can create notebooks that contain the table creation scripts and version control these notebooks along with your application code.
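
As a sketch, such a version-controlled DDL cell could look roughly like this (it assumes the notebook-provided spark session; catalog, schema, and table names are placeholders):

    # Hypothetical DDL cell, versioned in Git alongside the application code.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS main.analytics.orders (
            order_id BIGINT,
            order_ts TIMESTAMP,
            amount   DECIMAL(18, 2)
        )
        USING DELTA
        COMMENT 'Managed via version-controlled DDL notebook'
    """)

Schema changes then become reviewable diffs in the repository, and on the data side Delta's own history (DESCRIBE HISTORY, RESTORE TABLE) covers rollback.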

3 More Replies
rt-slowth
by Contributor
  • 899 Views
  • 1 reply
  • 0 kudos

Handling files used more than once in a streaming pipeline

I am implementing Structured Streaming using Delta Live Tables. I want to delete the parquet files once they are used. What options should I set so that the files loaded in S3 are not deleted?

Latest Reply
brockb
Databricks Employee

Hi, It sounds like your Structured Streaming source is S3, in which case the easiest solution is likely to manage the stream source using an S3 Lifecycle Configuration (https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-lifecycle-mgmt.html)...
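
A sketch of such a lifecycle rule via boto3, assuming the stream reads from a landing/ prefix and that seven days comfortably exceeds the stream's maximum lag (bucket name and prefix are placeholders):

    import boto3

    s3 = boto3.client("s3")

    # Expire objects under the stream-source prefix after 7 days so
    # consumed files are cleaned up automatically by S3.
    s3.put_bucket_lifecycle_configuration(
        Bucket="my-stream-source-bucket",
        LifecycleConfiguration={
            "Rules": [
                {
                    "ID": "expire-consumed-stream-files",
                    "Filter": {"Prefix": "landing/"},
                    "Status": "Enabled",
                    "Expiration": {"Days": 7},
                }
            ]
        },
    )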

Heman2
by Valued Contributor II
  • 16864 Views
  • 4 replies
  • 21 kudos

Resolved! How to export the output data in the Excel format into the dbfs location

Is there any way to export the output data in Excel format into DBFS? I'm only able to do it in CSV format.

Latest Reply
Sobreiro
New Contributor II

The easiest way I found is to create a dashboard and export from there. It enables a context menu with options to export to several file types, including CSV and Excel.
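
If the export needs to happen in code instead of through the dashboard UI, one common alternative (not mentioned in the thread) is converting the result to pandas and writing an .xlsx file through the /dbfs mount. A sketch, assuming the result fits in driver memory, openpyxl is installed, and the table/path placeholders are replaced:

    # %pip install openpyxl   # needed by pandas for .xlsx output
    # Uses the notebook-provided spark session; names/paths are placeholders.
    df = spark.table("main.analytics.report")
    df.toPandas().to_excel("/dbfs/FileStore/exports/report.xlsx", index=False)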

3 More Replies
Esther_Tomi
by New Contributor
  • 1625 Views
  • 0 replies
  • 0 kudos

Unable to Install Cluster-Scoped Libraries on Runtime >13.3

Hello team, I'm trying to upgrade our Databricks runtime from 9.1 to 13.3, but I've been having issues installing libraries on the compute from our internal Artifactory. However, when I tried this on a Unity Catalog-enabled workspace, it works seamless...

pgruetter
by Contributor
  • 7936 Views
  • 7 replies
  • 2 kudos

Run Task as Service Principal with Code in Azure DevOps Repo

Hi all, I have a task of type Notebook whose source is Git (Azure DevOps). This task runs fine with my user, but if I change the owner to a service principal, I get the following error: Run result unavailable: run failed with error message Failed to checkout...

Latest Reply
Anonymous
Not applicable

@pgruetter: To enable a service principal to access a specific Azure DevOps repository, you need to grant it the necessary permissions at both the organization and repository levels. Here are the steps to grant the service principal the necessary per...

6 More Replies
sher
by Valued Contributor II
  • 1569 Views
  • 2 replies
  • 1 kudos

How to read column mapping metadata for Delta tables

I want to read the column mapping metadata: https://github.com/delta-io/delta/blob/master/PROTOCOL.md#column-mapping. In the above link we can find a code block with JSON data, and I want to read the same data in PySpark. Is there any option to read that ...

Latest Reply
brockb
Databricks Employee

Hi, information about a Delta table, such as its history, can be found by running `describe history table_name`. A `rename column` operation shows up in the `operation` column with the value `RENAME COLUMN`. If you then look at the ...
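
A sketch of that lookup in PySpark, using the notebook-provided spark session (the table name is a placeholder):

    from pyspark.sql import functions as F

    # Table history records schema-evolution operations such as column renames.
    history = spark.sql("DESCRIBE HISTORY main.analytics.events")
    (history
        .filter(F.col("operation") == "RENAME COLUMN")
        .select("version", "timestamp", "operation", "operationParameters")
        .show(truncate=False))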

1 More Replies
QQ
by New Contributor III
  • 3874 Views
  • 2 replies
  • 0 kudos

Resolved! How to fix (AWS SSO) Test Connection failed

What did I configure incorrectly in the SSO settings in my Databricks account? What troubleshooting should I do? I don't see any error message. I followed the step-by-step instructions from AWS, linked below: View step-by-step instructions...

Latest Reply
QQ
New Contributor III

I got the solution: I forgot to create SaaS users with the same subject as the AD users. Preprovisioned users means the users must already exist in the downstream SaaS application. For instance, you may need to create SaaS users with the s...

1 More Replies
BobEng
by New Contributor
  • 2455 Views
  • 0 replies
  • 0 kudos

Delta Live Tables are dropped when pipeline is deleted

I created a simplistic DLT pipeline that creates one table. When I delete the pipeline, the table is dropped as well. That's not really desired behavior. As I remember, there was a strong distinction between data (stored in tables) and processing (spa...

dwfchu1
by New Contributor II
  • 2065 Views
  • 1 reply
  • 1 kudos

UC Volume access for spark and other config files

Hi All, wondering if anyone else is getting this problem: we are trying to host krb5.conf and jaas.conf for our compute to be able to connect to Kerberised JDBC sources. We are attempting to store these files in Catalog volumes, but at run time, when initiating th...

Latest Reply
mbendana
New Contributor II

I haven't been able to access the volume path when using the JDBC format.

Sas
by New Contributor II
  • 968 Views
  • 1 reply
  • 0 kudos

Delta Lake performance

Hi, I am new to Databricks and I am trying to understand the use case of the Data Lakehouse. Is it a good idea to build a data warehouse using the Delta Lake architecture? Is it going to give the same performance as an RDBMS cloud data warehouse like Snowflake? Whic...

Latest Reply
Miguel_Suarez
Databricks Employee

Hi @Sas, one of the benefits of the Data Lakehouse architecture is that it combines the best of both Data Warehouses and Data Lakes on one unified platform, helping you reduce costs and deliver on your data and AI initiatives faster. It brings t...

djhs
by New Contributor III
  • 2576 Views
  • 1 reply
  • 0 kudos

Resolved! Installing a private pypi package from Gitlab on a cluster

I have published a PyPI package in a private GitLab repository and I want to install it in my notebook, but I don't know how, and the documentation doesn't help me much either. I have created a GitLab token that I use in the index URL, and I try to inst...

Latest Reply
djhs
New Contributor III

This problem was solved by removing the `python>=3.11` requirement.
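
For anyone landing here, the install pattern for a private GitLab PyPI index in a notebook looks roughly like the following; the package name, project ID, and token are placeholders, and the token is better served from a secret scope than hard-coded:

    # Databricks notebook cell; <project-id> and <deploy-token> are placeholders.
    %pip install my-private-package --index-url https://__token__:<deploy-token>@gitlab.com/api/v4/projects/<project-id>/packages/pypi/simple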

DaveLeach
by New Contributor III
  • 5470 Views
  • 2 replies
  • 0 kudos

Resolved! Remove ZOrdering

Hi, I am trying to demonstrate the effectiveness of Z-ordering, but to do this I would like to remove the existing Z-ordering first. So my plan is:
1. Remove existing Z-ordering
2. Run a query and show the explain plan
3. Add Z-ordering to the column used for Joi...

Latest Reply
shan_chandra
Databricks Employee

@DaveLeach - you can try dropping the table and creating it again instead of #1.
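
A sketch of that reset, using the notebook-provided spark session and placeholder table names; recreating the table rewrites the files, so the effect of any earlier OPTIMIZE ... ZORDER BY is gone:

    # 1. Recreate the table so no Z-ordering remains.
    spark.sql("CREATE OR REPLACE TABLE demo.sales_reset AS SELECT * FROM demo.sales")

    # 2. Inspect the plan of a representative query.
    spark.sql("SELECT * FROM demo.sales_reset WHERE customer_id = 42").explain()

    # 3. Re-apply Z-ordering on the join/filter column and compare.
    spark.sql("OPTIMIZE demo.sales_reset ZORDER BY (customer_id)")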

1 More Replies
rahulmadnawat
by New Contributor II
  • 2896 Views
  • 3 replies
  • 2 kudos

Resolved! Columns tab in Data Explorer doesn't reflect schema changes to table

Hey team, we've noticed that schema changes to a table after creation aren't reflected in the "Columns" tab in the Data Explorer. For example, we added a column called signal_description to a table but its addition isn't reflected in the UI. Is this ...

Latest Reply
claudius_hini
New Contributor II

@Tharun-Kumar Is this the default behavior when a schema change happens on a table registered in Unity Catalog? In that case I would have to run the repair command regularly in order to ensure that the schema displayed is actually the one ...
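
Assuming the repair command referred to here is the SYNC METADATA variant of REPAIR TABLE used on Unity Catalog external tables, the call would look roughly like this (table name is a placeholder):

    # Re-syncs the schema shown in the catalog with the underlying table data.
    spark.sql("MSCK REPAIR TABLE main.analytics.events SYNC METADATA")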

2 More Replies
