Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

sher
by Valued Contributor II
  • 1331 Views
  • 2 replies
  • 1 kudos

How to read column mapping metadata for Delta tables

I want to read the column mapping metadata described at https://github.com/delta-io/delta/blob/master/PROTOCOL.md#column-mapping. In the above link we can find a code block with JSON data. I want to read the same data in PySpark. Is there any option to read that ...

  • 1331 Views
  • 2 replies
  • 1 kudos
Latest Reply
brockb
Databricks Employee
  • 1 kudos

Hi, information about the Delta table, such as its history, can be found by running `describe history table_name`. A rename column operation can be found in the `operation` column with a value of `RENAME COLUMN`. If you then look at the ...

  • 1 kudos
1 More Replies
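For the column-mapping question above, the per-column mapping lives in the field metadata of the schema JSON, under the keys `delta.columnMapping.id` and `delta.columnMapping.physicalName` (see the PROTOCOL.md link in the post). A minimal pure-Python sketch of extracting it, assuming you have obtained the table's `schemaString` (for example, from the `metaData` action in a `_delta_log` commit file); the sample schema and physical name below are illustrative, not taken from a real table:

```python
import json

# Illustrative schema string in the shape shown in PROTOCOL.md's column-mapping
# section; in practice you would pull this from the `metaData.schemaString` field
# of a _delta_log commit file read with spark.read.json(...).
schema_string = json.dumps({
    "type": "struct",
    "fields": [
        {
            "name": "a",
            "type": "integer",
            "nullable": True,
            "metadata": {
                "delta.columnMapping.id": 1,
                "delta.columnMapping.physicalName": "col-a7f4159c",
            },
        },
    ],
})

def column_mapping(schema_string):
    """Return {logical_name: (column_id, physical_name)} from a Delta schema JSON string."""
    schema = json.loads(schema_string)
    return {
        f["name"]: (
            f["metadata"].get("delta.columnMapping.id"),
            f["metadata"].get("delta.columnMapping.physicalName"),
        )
        for f in schema["fields"]
    }

print(column_mapping(schema_string))
```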
QQ
by New Contributor III
  • 3559 Views
  • 2 replies
  • 0 kudos

Resolved! How to fix (AWS SSO) Test Connection failed

What did I configure incorrectly in the SSO settings in my Databricks account? What troubleshooting should I do? I don't see any error message. I followed the step-by-step instructions from AWS, linked below. View step-by-step instructions...

  • 3559 Views
  • 2 replies
  • 0 kudos
Latest Reply
QQ
New Contributor III
  • 0 kudos

I found the solution: I forgot to create SaaS users with the same subject as the AD users. Preprovisioned users means users must already exist in the downstream SaaS application. For instance, you may need to create SaaS users with the s...

  • 0 kudos
1 More Replies
BobEng
by New Contributor
  • 1967 Views
  • 0 replies
  • 0 kudos

Delta Live Tables are dropped when pipeline is deleted

I created a simplistic DLT pipeline that creates one table. When I delete the pipeline, the table is dropped as well. That's not really desired behavior. As I remember, there was a strong distinction between data (stored in tables) and processing (spa...

  • 1967 Views
  • 0 replies
  • 0 kudos
JKR
by Contributor
  • 2377 Views
  • 0 replies
  • 0 kudos

Databricks sql variables and if/else workflow

I have 2 tasks in a Databricks job workflow. The first task is of type SQL, and the SQL task is a query. In that query I've declared 2 variables and SET the values by running the query, e.g.: DECLARE VARIABLE max_timestamp TIMESTAMP DEFAULT '1970-01-01'; SET VARIABLE max_...

Data Engineering
databricks-sql
Workflows
  • 2377 Views
  • 0 replies
  • 0 kudos
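The SQL-task pattern described in the post can be sketched as below (table and column names are illustrative; SQL session variables require a recent Databricks runtime). Note that session variables live only within the SQL task's own session, so passing the value to a downstream if/else task typically requires a different mechanism such as task values; that part is an assumption about the intended workflow, not something the post confirms.

```sql
-- Declare a variable with a default, then set it from a query result:
DECLARE VARIABLE max_timestamp TIMESTAMP DEFAULT '1970-01-01';
SET VARIABLE max_timestamp = (SELECT max(event_ts) FROM my_catalog.my_schema.events);

-- The variable can then be used in later statements of the same session:
SELECT * FROM my_catalog.my_schema.events WHERE event_ts > max_timestamp;
```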
dwfchu1
by New Contributor II
  • 1718 Views
  • 1 replies
  • 1 kudos

UC Volume access for Spark and other config files

Hi All, wondering if anyone else is getting this problem: we are trying to host krb5.conf and jaas.conf for our compute to be able to connect to Kerberized JDBC sources. We are attempting to store these files in Catalog volumes, but at run time, when initiating th...

  • 1718 Views
  • 1 replies
  • 1 kudos
Latest Reply
mbendana
New Contributor II
  • 1 kudos

Haven't been able to access volume path when using jdbc format.

  • 1 kudos
Sas
by New Contributor II
  • 795 Views
  • 1 replies
  • 0 kudos

Delta Lake performance

Hi, I am new to Databricks and I am trying to understand the use case of the data lakehouse. Is it a good idea to build a data warehouse using the Delta Lake architecture? Is it going to give the same performance as an RDBMS cloud data warehouse like Snowflake? Whic...

  • 795 Views
  • 1 replies
  • 0 kudos
Latest Reply
Miguel_Suarez
Databricks Employee
  • 0 kudos

Hi @Sas, one of the benefits of the Data Lakehouse architecture is that it combines the best of both Data Warehouses and Data Lakes on one unified platform, to help you reduce costs and deliver on your data and AI initiatives faster. It brings t...

  • 0 kudos
djhs
by New Contributor III
  • 2098 Views
  • 1 replies
  • 0 kudos

Resolved! Installing a private pypi package from Gitlab on a cluster

I have published a PyPI package in a private GitLab repository and I want to install it in my notebook, but I don't know how, and the documentation doesn't help me much either. I have created a GitLab token that I use in the index URL and I try to inst...

  • 2098 Views
  • 1 replies
  • 0 kudos
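For reference, installing from a private GitLab package registry generally uses GitLab's PyPI index URL in the form below. This is a sketch: the project ID, package name, and token variable are placeholders, not values from the post.

pip install mypackage \
  --index-url "https://__token__:${GITLAB_TOKEN}@gitlab.com/api/v4/projects/12345/packages/pypi/simple"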
Latest Reply
djhs
New Contributor III
  • 0 kudos

This problem was solved by removing the `python>=3.11` requirement.

  • 0 kudos
DaveLeach
by New Contributor III
  • 5237 Views
  • 2 replies
  • 0 kudos

Resolved! Remove ZOrdering

Hi, I am trying to demonstrate the effectiveness of ZOrdering, but to do this I would like to remove the existing ZOrdering first. So my plan is:
1. Remove existing ZOrdering
2. Run a query and show the explain plan
3. Add ZOrdering to the column used for joi...

  • 5237 Views
  • 2 replies
  • 0 kudos
Latest Reply
shan_chandra
Databricks Employee
  • 0 kudos

@DaveLeach - you can try dropping the table and creating it again instead of #1.

  • 0 kudos
1 More Replies
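The drop-and-recreate approach suggested in the reply can be sketched in SQL as follows (table, schema, and column names are illustrative):

```sql
-- Recreate the table as a fresh copy, which carries no Z-Ordering:
CREATE OR REPLACE TABLE main.demo.trips_noz AS
SELECT * FROM main.demo.trips;

-- Baseline: inspect the plan before clustering:
EXPLAIN SELECT * FROM main.demo.trips_noz WHERE pickup_zip = 10001;

-- Apply Z-Ordering on the join/filter column, then compare plans and file pruning:
OPTIMIZE main.demo.trips_noz ZORDER BY (pickup_zip);
EXPLAIN SELECT * FROM main.demo.trips_noz WHERE pickup_zip = 10001;
```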
rahulmadnawat
by New Contributor II
  • 2402 Views
  • 3 replies
  • 2 kudos

Resolved! Columns tab in Data Explorer doesn't reflect schema changes to table

Hey team, we've noticed that schema changes to a table after creation aren't reflected in the "Columns" tab in the Data Explorer. For example, we added a column called signal_description to a table but its addition isn't reflected in the UI. Is this ...

  • 2402 Views
  • 3 replies
  • 2 kudos
Latest Reply
claudius_hini
New Contributor II
  • 2 kudos

@Tharun-Kumar Is this the default behavior in case a schema change happens on a table registered in Unity Catalog? In that case I would have to run the repair command regularly in order to ensure that the schema displayed is actually the one ...

  • 2 kudos
2 More Replies
rt-slowth
by Contributor
  • 2445 Views
  • 1 replies
  • 0 kudos

User: anonymous is not authorized to perform: sqs:receivemessage on resource

from pyspark.sql import functions as F
from pyspark.sql import types as T
from pyspark.sql import DataFrame, Column
from pyspark.sql.types import Row
import dlt

S3_PATH = 's3://datalake-lab/xxxx/'
S3_SCHEMA = 's3://datalake-lab/xxxx/schemas/'

@dl...

  • 2445 Views
  • 1 replies
  • 0 kudos
SimhadriRaju
by New Contributor
  • 51152 Views
  • 7 replies
  • 0 kudos

How to check file exists in databricks

I have a while loop where I have to check whether a file exists or not: if it exists, read the file into a data frame, else go to another file.

  • 51152 Views
  • 7 replies
  • 0 kudos
Latest Reply
Amit_Dass
New Contributor II
  • 0 kudos

How to check if a file exists in DBFS? Let's write a Python function to check if the file exists or not:

def file_exists(path):
    try:
        dbutils.fs.ls(path)
        return True
    except ...

  • 0 kudos
6 More Replies
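A variant of the helper above with an injectable listing function makes it easy to exercise outside Databricks, where `dbutils` does not exist (assumption: in a Databricks notebook you would pass `dbutils.fs.ls`):

```python
import os

def file_exists(path, ls=None):
    """Return True if `path` can be listed, False if listing raises.

    On Databricks, pass ls=dbutils.fs.ls; locally any lister such as
    os.listdir works for trying the function out.
    """
    if ls is None:
        ls = os.listdir  # in a Databricks notebook you would use dbutils.fs.ls
    try:
        ls(path)
        return True
    except Exception:
        return False

print(file_exists("."))                     # existing directory
print(file_exists("/no/such/path_xyz"))     # missing path
```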
AlexWeh
by New Contributor II
  • 13611 Views
  • 1 replies
  • 2 kudos

Universal Azure Credential Passthrough

At the moment, Azure Databricks has the feature to use Azure AD login for the workspace and to create single-user clusters with Azure Data Lake Storage credential passthrough. But this can only be used for Data Lake Storage. Is there already a way, or are...

  • 13611 Views
  • 1 replies
  • 2 kudos
Latest Reply
polivbr
New Contributor II
  • 2 kudos

I have exactly the same issue. I need to call a protected API within a notebook but have no access to the current user's access token. I've had to resort to nasty workarounds involving installing and running the Azure CLI from within the not...

  • 2 kudos
neha_ayodhya
by New Contributor II
  • 2659 Views
  • 1 replies
  • 0 kudos

pytesseract.pytesseract.TesseractNotFoundError in databricks notebook

I'm trying to extract the text data from an image file in a Databricks notebook. I have installed the below libraries using the pip command: %pip install pytesseract tesseract pillow --upgrade, but it didn't work and threw the below error: pytesseract.pytesseract.Tessera...

  • 2659 Views
  • 1 replies
  • 0 kudos
Latest Reply
shan_chandra
Databricks Employee
  • 0 kudos

Hi @neha_ayodhya - can you please try the following via an init script on the Databricks cluster:

sudo apt-get update -y
sudo apt-get install -y tesseract-ocr
sudo apt-get install -y libtesseract-dev
/databricks/python/bin/pip install pytesseract

a...

  • 0 kudos
wwarner
by New Contributor
  • 887 Views
  • 1 replies
  • 0 kudos

Delete S3 files after vacuum

Hi, I'm trying to purge a table of stale data. My Databricks host is on cloud.databricks.com. I've set delta.deletedFileRetentionDuration=interval 7 days, deleted many (billions of) rows, and followed up with VACUUM tablename RETAIN 168 HOURS; however my ...

  • 887 Views
  • 1 replies
  • 0 kudos
Latest Reply
shan_chandra
Databricks Employee
  • 0 kudos

Hi Bill - based on the Delta deleted file retention period, the files will no longer be there after 7 days. Thanks, Shan

  • 0 kudos
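The retention-and-vacuum sequence discussed above can be sketched as below (table name is illustrative). One caveat worth noting: VACUUM only removes files whose deletion is older than the retention window at the time it runs, so a VACUUM issued immediately after the DELETE leaves the files in S3 until it is re-run once the window has elapsed.

```sql
-- Set the retention window on the table:
ALTER TABLE main.demo.events
  SET TBLPROPERTIES ('delta.deletedFileRetentionDuration' = 'interval 7 days');

-- Delete the stale rows:
DELETE FROM main.demo.events WHERE event_date < '2020-01-01';

-- Re-run after the 7-day window has elapsed to physically remove the files:
VACUUM main.demo.events RETAIN 168 HOURS;
```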
achistef
by New Contributor III
  • 1599 Views
  • 3 replies
  • 0 kudos

Resolved! Refresh the Spark UI automatically

Hello! I'm using Databricks with Azure. On a daily basis, I check the status of numerous jobs through the Spark UI. At the moment, the Spark UI does not refresh by itself; I have to refresh the webpage to get the latest status. I wonder if there is a ...

  • 1599 Views
  • 3 replies
  • 0 kudos
Latest Reply
shan_chandra
Databricks Employee
  • 0 kudos

@achistef - you are welcome!!!

  • 0 kudos
2 More Replies

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group
Labels