Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

rt-slowth
by Contributor
  • 2561 Views
  • 1 replies
  • 0 kudos

User: anonymous is not authorized to perform: sqs:receivemessage on resource

    from pyspark.sql import functions as F
    from pyspark.sql import types as T
    from pyspark.sql import DataFrame, Column
    from pyspark.sql.types import Row
    import dlt

    S3_PATH = 's3://datalake-lab/xxxx/'
    S3_SCHEMA = 's3://datalake-lab/xxxx/schemas/'

    @dl...

SimhadriRaju
by New Contributor
  • 52684 Views
  • 7 replies
  • 0 kudos

How to check if a file exists in Databricks

I have a while loop in which I have to check whether a file exists: if it exists, read the file into a DataFrame; otherwise, go on to another file.

Latest Reply
Amit_Dass
New Contributor II
  • 0 kudos

How to check if a file exists in DBFS? Let's write a Python function to check whether the file exists:

    def file_exists(path):
        try:
            dbutils.fs.ls(path)
            return True
        except ...
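The loop described in the question can be built on top of that helper. A minimal sketch, assuming (as the reply's try/except does) that `dbutils.fs.ls` raises an exception for a missing path; `first_existing` is a hypothetical helper name, and `ls` is passed in as a parameter so the code can be tried outside Databricks:

```python
def file_exists(path, ls):
    """Return True if ls(path) succeeds, False if it raises.

    In a Databricks notebook you would pass dbutils.fs.ls as `ls`;
    the reply relies on it raising when the path does not exist.
    """
    try:
        ls(path)
        return True
    except Exception:
        return False


def first_existing(paths, ls):
    """Walk candidate paths in order and return the first that exists, else None."""
    for path in paths:
        if file_exists(path, ls):
            return path
    return None


# In a Databricks notebook (paths are hypothetical):
# path = first_existing(["dbfs:/tmp/a.csv", "dbfs:/tmp/b.csv"], dbutils.fs.ls)
# if path is not None:
#     df = spark.read.csv(path, header=True)
```

Catching a broad `Exception` also treats permission errors as "missing", so narrowing the handler may be worth it once you know which error your runtime raises.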

AlexWeh
by New Contributor II
  • 13716 Views
  • 1 replies
  • 2 kudos

Universal Azure Credential Passthrough

At the moment, Azure Databricks has the feature to use AzureAD login for the workspace and create single user clusters with Azure Data Lake Storage credential passthrough. But this can only be used for Data Lake Storage.Is there already a way, or are...

Latest Reply
polivbr
New Contributor II
  • 2 kudos

I have exactly the same issue. I have the need to call a protected API within a notebook but have no access to the current user's access token. I've had to resort to nasty workarounds involving installing and running the Azure CLI from within the not...

neha_ayodhya
by New Contributor II
  • 2854 Views
  • 1 replies
  • 0 kudos

pytesseract.pytesseract.TesseractNotFoundError in databricks notebook

I'm trying to extract text from an image file in a Databricks notebook. I installed the libraries below using pip: %pip install pytesseract tesseract pillow --upgrade but it didn't work and threw the error below: pytesseract.pytesseract.Tessera...

Latest Reply
shan_chandra
Databricks Employee
  • 0 kudos

Hi @neha_ayodhya - can you please try the following via an init script on the Databricks cluster:

    sudo apt-get update -y
    sudo apt-get install -y tesseract-ocr
    sudo apt-get install -y libtesseract-dev
    /databricks/python/bin/pip install pytesseract  a...
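Why the apt-get step matters: `pytesseract` is a wrapper that calls the `tesseract` command-line program, so `TesseractNotFoundError` appears when that executable is not installed on the node, regardless of which pip packages are present. A small pre-flight check, as a sketch (`find_tesseract` is a hypothetical helper; `pytesseract.pytesseract.tesseract_cmd` is the module attribute pytesseract reads for the executable path):

```python
import shutil


def find_tesseract(binary="tesseract"):
    """Return the full path of the Tesseract executable, or raise if it is missing.

    pip-installing pytesseract does not provide this program; it must come
    from the OS package (e.g. tesseract-ocr installed by an init script).
    """
    path = shutil.which(binary)
    if path is None:
        raise FileNotFoundError(
            f"{binary!r} not found on PATH; install the tesseract-ocr OS package "
            "(for example from a cluster init script) before calling pytesseract"
        )
    return path


# In a notebook, after the init script has run (assumes pytesseract is installed):
# import pytesseract
# pytesseract.pytesseract.tesseract_cmd = find_tesseract()
```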

wwarner
by New Contributor
  • 985 Views
  • 1 replies
  • 0 kudos

Delete S3 files after vacuum

Hi, I'm trying to purge a table of stale data. My Databricks host is on cloud.databricks.com. I've set delta.deletedFileRetentionDuration=interval 7 days, deleted many (billions of) rows, and followed up with VACUUM tablename RETAIN 168 HOURS; however, my ...

Latest Reply
shan_chandra
Databricks Employee
  • 0 kudos

Hi Bill - Based on the Delta deleted-file retention period, the files will no longer be there after 7 days. Thanks, Shan
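To make the timing concrete: 168 hours is exactly the 7-day `delta.deletedFileRetentionDuration`, and VACUUM deletes a file the table no longer references only once it has been removed for longer than that window. Files tombstoned by a DELETE today are therefore not removed by a VACUUM run immediately afterwards, only by one run more than 7 days later. A simplified model of that rule, not Delta's implementation (`vacuum_eligible` is a hypothetical helper):

```python
from datetime import datetime, timedelta


def vacuum_eligible(removed_at, now, retain_hours=168):
    """Return True if a file removed from the table at `removed_at` is old enough
    to be deleted by VACUUM ... RETAIN <retain_hours> HOURS (simplified model)."""
    return now - removed_at > timedelta(hours=retain_hours)
```

For example, with `now = datetime(2024, 1, 15)`, a file removed on 2024-01-07 is eligible, while one removed on 2024-01-10 is not.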

achistef
by New Contributor III
  • 1746 Views
  • 3 replies
  • 0 kudos

Resolved! Refresh the Spark UI automatically

Hello! I'm using Databricks on Azure. On a daily basis, I check the status of numerous jobs through the Spark UI. At the moment, the Spark UI does not refresh by itself; I have to reload the webpage to get the latest status. I wonder if there is a ...

Latest Reply
shan_chandra
Databricks Employee
  • 0 kudos

@achistef - you are welcome!!!

therealchainman
by New Contributor II
  • 4085 Views
  • 4 replies
  • 1 kudos

Resolved! Databricks Auto Loader cloudFiles.backfillInterval

Hello, I have been reading the Databricks Auto Loader documentation about the cloudFiles.backfillInterval configuration, and I still have a question about a specific detail of how it works. I was only able to find examples of it being set to 1 day or 1 week. ...

Data Engineering
Auto Loader
backfillInterval
cloudFiles
Latest Reply
saipujari_spark
Databricks Employee
  • 1 kudos

Hey @therealchainman The time of the last backfill (lastBackfillFinishTimeMs) is recorded in the checkpoint's offset files. This lets Auto Loader know when the last backfill was triggered and when to trigger the next periodic backfill. Hope this an...
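The scheduling decision the reply describes boils down to comparing the recorded finish time with the configured interval. A simplified sketch under that reading; `next_backfill_due` is hypothetical and is not Auto Loader's code (only the `cloudFiles.backfillInterval` option name and `lastBackfillFinishTimeMs` come from the thread):

```python
DAY_MS = 24 * 60 * 60 * 1000  # "1 day" expressed in milliseconds


def next_backfill_due(last_backfill_finish_ms, interval_ms, now_ms):
    """Return True once `interval_ms` has elapsed since the last backfill finished.

    `last_backfill_finish_ms` stands in for the lastBackfillFinishTimeMs value
    that, per the reply, Auto Loader records in the checkpoint's offset files.
    """
    return now_ms - last_backfill_finish_ms >= interval_ms


# The option itself is set on the stream, e.g.:
# spark.readStream.format("cloudFiles").option("cloudFiles.backfillInterval", "1 day")
```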

Nathant93
by New Contributor III
  • 1056 Views
  • 1 replies
  • 0 kudos

alter function owner in UC

I have a number of functions in a schema in a Unity Catalog catalog. Is there a programmatic way to change the owner of these functions without doing it manually via the GUI?

Latest Reply
ossinova
Contributor II
  • 0 kudos

Check out this notebook; I assume you can adapt it a bit to do what you want: https://docs.databricks.com/en/_extras/notebooks/source/set-owners-notebook.html I assume you can loop through the rows in the resulting df (that has the ALTER statements),...
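The same idea can also be applied to the functions directly. A sketch that builds one `ALTER FUNCTION ... OWNER TO` statement per function, assuming your workspace and runtime support that statement for Unity Catalog functions; `alter_owner_statements`, `quote_ident`, and the catalog, schema, function, and principal names are all hypothetical:

```python
def quote_ident(name):
    """Backtick-quote one identifier part, escaping embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def alter_owner_statements(catalog, schema, functions, new_owner):
    """Build one ALTER FUNCTION ... OWNER TO statement per function name."""
    return [
        f"ALTER FUNCTION {quote_ident(catalog)}.{quote_ident(schema)}.{quote_ident(fn)} "
        f"OWNER TO {quote_ident(new_owner)}"
        for fn in functions
    ]


# In a notebook, run each generated statement:
# for stmt in alter_owner_statements("main", "sales", ["f1", "f2"], "data-engineers"):
#     spark.sql(stmt)
```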

191522
by New Contributor
  • 1188 Views
  • 1 replies
  • 0 kudos

Hostname could not be verified

Hi all, We have a job that combines historical tables with live tables to give us up-to-date information. It works for almost all of the tables in our source Postgres database, but one table keeps giving the following error. Any ideas why...

Latest Reply
jose_gonzalez
Databricks Employee
  • 0 kudos

Could you please share the full error stack trace? Also, test your connectivity by running "%sh nc -zv {hostname} {port}".
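If `nc` is not available on the cluster image, a rough Python equivalent of `nc -zv {hostname} {port}` is a plain TCP connect. A sketch using only the standard library (`port_open` is a hypothetical helper); note that it only shows the port is reachable and says nothing about TLS certificate or hostname verification:

```python
import socket


def port_open(host, port, timeout=5.0):
    """Try a TCP connection to host:port, roughly like `nc -zv host port`.

    Returns True if the connection succeeds, False on refusal, timeout,
    or DNS failure (all of which surface as OSError subclasses).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example (hostname is hypothetical): port_open("my-postgres.example.com", 5432)
```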

janacc
by New Contributor
  • 1796 Views
  • 1 replies
  • 0 kudos

Error when running Spark-DL notebooks

I'm trying several Spark Deep Learning inference notebooks on Windows. I run Spark in standalone mode with 1 worker with 12 cores (driver-memory and executor-memory are both set to 8G). I always get the same error when applying the deep learning model to ...

Data Engineering
Deep Learning
DL
Latest Reply
jose_gonzalez
Databricks Employee
  • 0 kudos

This is a connectivity issue. Check your connectivity by doing "%sh nc -zv {hostname} {port}" from your notebook

Gcabrera
by New Contributor
  • 1131 Views
  • 1 replies
  • 0 kudos

Issue importing library deltalake

Hello, I'm currently seeing a rather cryptic error message whenever I try to import the deltalake library into Databricks (without doing anything else): import deltalake "ImportError: /local_disk0/.ephemeral_nfs/envs/pythonEnv-cbe496f6-d064-40ae...

Latest Reply
jose_gonzalez
Databricks Employee
  • 0 kudos

Are you trying to import this library into a Databricks notebook, or are you using open-source Spark on your local machine?

deng_dev
by New Contributor III
  • 1060 Views
  • 1 replies
  • 0 kudos

Getting "Job aborted" exception while saving data to the database

Hi! We have a job that runs every hour. It extracts data from an API and saves it to a Databricks table. Sometimes the job fails with the error "org.apache.spark.SparkException". Here is the full error: An error occurred while calling o7353.saveAsTable. : org.ap...

Latest Reply
jose_gonzalez
Databricks Employee
  • 0 kudos

Do you have any NULL values in your data? Please verify that your data is valid.
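One way to act on that suggestion is to check the API records before calling `saveAsTable`. A sketch assuming the records arrive as Python dicts; `rows_with_nulls` and the column names in the usage comment are hypothetical:

```python
def rows_with_nulls(records, required):
    """Return (index, missing_columns) for each record whose required
    fields are absent or None."""
    problems = []
    for i, rec in enumerate(records):
        missing = [col for col in required if rec.get(col) is None]
        if missing:
            problems.append((i, missing))
    return problems


# Before writing (column names are hypothetical):
# bad = rows_with_nulls(api_records, ["id", "event_ts"])
# if bad:
#     raise ValueError(f"records with NULL required fields: {bad[:10]}")
```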

532664
by New Contributor III
  • 5042 Views
  • 11 replies
  • 3 kudos

Resolved! Replay (backfill) DLT CDC using Kafka

Hello, We are receiving DB CDC binlogs through Kafka and synchronizing tables in an OLAP system using the apply_changes function in Delta Live Tables (DLT). A month ago, a column was added to our table, but due to a type mismatch, it's being stored incorr...

Latest Reply
jcozar
Contributor
  • 3 kudos

Thank you @532664 for your detailed response! That seems like a very good solution to me, and it also helps with my doubts.

Prashant777
by New Contributor II
  • 5878 Views
  • 4 replies
  • 0 kudos

Error in SQL statement: UnsupportedOperationException: Cannot perform Merge as multiple source rows matched and attempted to modify the same

My code:

    CREATE OR REPLACE TEMPORARY VIEW preprocessed_source AS
    SELECT
      Key_ID,
      Distributor_ID,
      Customer_ID,
      Customer_Name,
      Channel
    FROM integr_masterdata.Customer_Master;
    -- Step 2: Perform the merge operation using the preprocessed source table
    M...

Latest Reply
Tread
New Contributor II
  • 0 kudos

Hey, as previously stated, you could drop the duplicates on the columns that contain them (code for this is easy to find online). I have had this problem myself; it came up when creating a temporary view from a dataframe, the dataframe ...
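The MERGE error means more than one source row matches the same target row, so the source has to be unique on the merge key first. In PySpark, `df.dropDuplicates(["Key_ID"])` keeps an arbitrary row per key; when a particular row should win, rank the rows (for example with a window function) and keep the top one. The rule itself, sketched on plain dicts (`dedupe_latest` and the `updated_at` ordering column are hypothetical; the original view has no such column):

```python
def dedupe_latest(rows, key, order_by):
    """Keep one row per `key` value: the one with the largest `order_by` value.

    After this, each merge key appears at most once in the source, so no
    target row can be matched by multiple source rows.
    """
    latest = {}
    for row in rows:
        k = row[key]
        if k not in latest or row[order_by] > latest[k][order_by]:
            latest[k] = row
    return list(latest.values())
```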


Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.
