Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

ashraf1395
by Honored Contributor
  • 237 Views
  • 1 reply
  • 2 kudos

Migrating data from Hive metastore to Unity Catalog; data workflow is handled in Fivetran

In a UC migration project, we have a Fivetran connection which handles most of the ETL processes and writes data into the Hive metastore. We have migrated the schemas related to Fivetran to UC. The workspace where Fivetran was running had default catal...

Latest Reply
saurabh18cs
Valued Contributor III
  • 2 kudos

Hi @ashraf1395, I can think of the following: Fivetran needs to be aware of the new catalog structure. This typically involves updating the destination settings in Fivetran to point to Unity Catalog. Navigate to the destination settings for your Datab...

jb1z
by Contributor
  • 452 Views
  • 5 replies
  • 0 kudos

Resolved! Query separate data loads from python spark.readStream

I am using Python spark.readStream in a Delta Live Tables pipeline to read JSON data files from an S3 folder path. Each load is a daily snapshot of a very similar set of products showing changes in price and inventory. How do I distinguish and query e...

Latest Reply
jb1z
Contributor
  • 0 kudos

The problem was fixed by adding the import from pyspark.sql import functions as F and then using F.lit() instead of F.col: .withColumn('ingestion_date', F.lit(folder_date)). Sorry, code formatting is not working at the moment.
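A minimal sketch of the fix described above; the S3 path, schema, and folder_date value are placeholders, not taken from the thread:

```python
# Tag each daily snapshot with a literal ingestion_date so loads can be
# distinguished and queried separately. folder_date and the path are hypothetical.
from pyspark.sql import functions as F

folder_date = "2025-01-15"  # assumed: parsed from the S3 folder name

df = (
    spark.readStream
        .format("json")
        .schema("product_id STRING, price DOUBLE, inventory INT")  # assumed schema
        .load(f"s3://my-bucket/products/{folder_date}/")            # hypothetical path
        .withColumn("ingestion_date", F.lit(folder_date))           # F.lit, not F.col
)
```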

4 More Replies
abhinandan084
by New Contributor III
  • 24228 Views
  • 19 replies
  • 13 kudos

Community Edition signup issues

I am trying to sign up for the Community Edition (https://databricks.com/try-databricks) for use with a Databricks Academy course. However, I am unable to sign up and I receive the following error (image attached). On going to the login page (link in ora...

Latest Reply
brokeTechBro
New Contributor II
  • 13 kudos

Hello, I get "An error occurred, try again". I am exhausted from trying... also from solving the puzzle to prove I'm not a robot.

18 More Replies
minhhung0507
by Contributor
  • 810 Views
  • 6 replies
  • 5 kudos

Resolved! Issue with DeltaFileNotFoundException After Vacuum and Missing Data Changes in Delta Log

Dear Databricks experts, I encountered the following error in Databricks: `com.databricks.sql.transaction.tahoe.DeltaFileNotFoundException: [DELTA_EMPTY_DIRECTORY] No file found in the directory: gs://cimb-prod-lakehouse/bronze-layer/losdb/pl_message/_...

Latest Reply
hari-prasad
Valued Contributor II
  • 5 kudos

Hi @minhhung0507, the VACUUM command on a Delta table does not delete the _delta_log folder, as this folder contains all the metadata related to the Delta table. The _delta_log folder acts as a pointer where all changes are tracked. In the event that ...
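As a rough illustration of the reply, a sketch of inspecting the table history kept in _delta_log and running VACUUM with an explicit retention window; the table path is a placeholder, not the original gs:// location:

```python
# Check which versions still exist, then VACUUM while retaining 7 days of files
# so recently-read data files are not removed out from under running queries.
table_path = "gs://my-bucket/bronze-layer/my_table"  # hypothetical path

spark.sql(f"DESCRIBE HISTORY delta.`{table_path}`").show(truncate=False)  # versions tracked in _delta_log
spark.sql(f"VACUUM delta.`{table_path}` RETAIN 168 HOURS")                # keep the default 7 days of files
```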

5 More Replies
ErikJ
by New Contributor III
  • 2102 Views
  • 7 replies
  • 2 kudos

Errors calling databricks rest api /api/2.1/jobs/run-now with job_parameters

Hello! I have been using the Databricks REST API for running workflows using this endpoint: /api/2.1/jobs/run-now. But now I wanted to also include job_parameters in my API call. I have put job parameters inside my workflow: param1, param2, and in my...

Latest Reply
slkdfuba
New Contributor II
  • 2 kudos

I encountered a null job_id in my post when a notebook parameter was set in the job GUI. But it runs just fine (I get a valid job_id with an active run) if I delete the notebook parameter in the job GUI. Is this documented behavior, or a bug? If it's ...
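For context, a minimal sketch of the run-now call with job_parameters that the thread is about; the host, token, and job_id below are placeholders:

```python
# Trigger a run via POST /api/2.1/jobs/run-now, passing job-level parameters.
import requests

host = "https://my-workspace.cloud.databricks.com"  # hypothetical workspace URL
token = "dapi..."                                    # hypothetical personal access token

resp = requests.post(
    f"{host}/api/2.1/jobs/run-now",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "job_id": 123456,                                      # hypothetical job id
        "job_parameters": {"param1": "foo", "param2": "bar"},  # must match job-level params defined on the workflow
    },
)
resp.raise_for_status()
print(resp.json()["run_id"])  # id of the triggered run
```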

6 More Replies
diegohMoodys
by New Contributor
  • 288 Views
  • 1 reply
  • 0 kudos

JDBC RDBMS Table Overwrite Transaction Incomplete

Spark version: spark-3.4.1-bin-hadoop3. JDBC driver: mysql-connector-j-8.4.0.jar. Assumptions: we have all the proper read/write permissions; the dataset isn't large (~2 million records); we are reading flat files and writing to a database; it does not read from the database at al...

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hi @diegohMoodys, Can you try in debug mode? spark.sparkContext.setLogLevel("DEBUG")
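Building on the reply, a minimal sketch of the debug setting alongside a JDBC overwrite; the connection details, table name, and sample data are placeholders:

```python
# Raise the log level while troubleshooting the incomplete write, then overwrite
# a MySQL table over JDBC.
spark.sparkContext.setLogLevel("DEBUG")

df = spark.createDataFrame([(1, 9.99), (2, 4.50)], "id INT, price DOUBLE")  # stand-in for the flat-file data

(df.write
   .format("jdbc")
   .option("url", "jdbc:mysql://db-host:3306/mydb")   # hypothetical host/database
   .option("driver", "com.mysql.cj.jdbc.Driver")
   .option("dbtable", "products")                      # hypothetical target table
   .option("user", "etl_user")                         # hypothetical credentials
   .option("password", "...")
   .mode("overwrite")
   .save())
```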

stevomcnevo007
by New Contributor III
  • 2418 Views
  • 16 replies
  • 2 kudos

agents.deploy NOT_FOUND: The directory being accessed is not found. error

I keep getting the following error, although the model definitely does exist and the version and model names are correct: RestException: NOT_FOUND: The directory being accessed is not found. It occurs when calling # Deploy the model to the review app and a model...
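For orientation, the kind of call the post refers to looks roughly like the sketch below; the model name and version are placeholders, and it assumes the databricks-agents package is installed and the model is registered in Unity Catalog:

```python
# agents.deploy expects a Unity Catalog registered model (catalog.schema.model).
from databricks import agents

uc_model_name = "main.my_schema.my_agent"  # hypothetical three-level UC name
deployment = agents.deploy(uc_model_name, model_version=1)
print(deployment)
```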

Latest Reply
ezermoysis
New Contributor III
  • 2 kudos

Does the model need to be served before deployment?

15 More Replies
Aatma
by New Contributor
  • 1039 Views
  • 1 reply
  • 0 kudos

Resolved! DABs require library dependencies from a GitHub private repository.

Developing a Python wheel file using DABs which requires library dependencies from a GitHub private repository. Please help me understand how to set up the git user and token in the resource.yml file and how to authenticate the GitHub package. pip install...

Latest Reply
Satyadeepak
Databricks Employee
  • 0 kudos

To install dependencies from a private GitHub repository in a Databricks Asset Bundle, you need to set up the GitHub user and token in the resource.yml file and authenticate the GitHub package. Here are the steps: Generate a GitHub Personal Access T...

tonypiazza
by New Contributor II
  • 1061 Views
  • 1 reply
  • 0 kudos

Resolved! Databricks Asset Bundle - Job Cluster - JDBC HTTP Path

I am currently working on deploying dbt jobs using a Databricks Asset Bundle. In my existing job configuration, I am using an all-purpose cluster and the JDBC HTTP Path was manually copied from the web UI. Now that I am trying to switch to using a jo...

Latest Reply
Satyadeepak
Databricks Employee
  • 0 kudos

To reference the HTTP Path using substitutions in Databricks Asset Bundles and job clusters, you can use the variables section in your databricks.yml configuration file. In your databricks.yml file, you can define a variable for the HTTP Path. For e...

Filippo
by New Contributor
  • 1305 Views
  • 1 reply
  • 0 kudos

Resolved! Issue with View Ownership Reassignment in Unity Catalog

Hello, It appears that the ownership rules for views and functions in Unity Catalog do not align with the guidelines provided in the “Manage Unity Catalog object ownership” documentation on Microsoft Learn. When attempting to reassign the ownership of ...

Latest Reply
Satyadeepak
Databricks Employee
  • 0 kudos

Hi @Filippo, to prevent privilege escalations, only a metastore admin can transfer ownership of a view, function, or model to any user, service principal, or group in the account. Current owners and users with the MANAGE privilege are restricted to tr...
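As a small illustration of the reassignment the reply describes, run as a metastore admin; the view and group names are placeholders:

```python
# Transfer ownership of a Unity Catalog view to a group.
spark.sql("ALTER VIEW main.my_schema.my_view OWNER TO `data-engineering-admins`")
```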

lbdatauser
by New Contributor II
  • 1014 Views
  • 1 reply
  • 0 kudos

Resolved! dbx with serverless clusters

With dbx, is it impossible to create tasks that run on serverless clusters? Is it necessary to use Databricks bundles for it? https://dbx.readthedocs.io/en/latest/reference/deployment/ https://learn.microsoft.com/en-us/azure/databricks/jobs/run-serverl...

Latest Reply
Satyadeepak
Databricks Employee
  • 0 kudos

It is possible to create tasks that run on serverless clusters using dbx. Also, please note that Databricks recommends that you use Databricks Asset Bundles instead of dbx by Databricks Labs. See What are Databricks Asset Bundles and Migrate from dbx...

Roy
by New Contributor II
  • 61008 Views
  • 6 replies
  • 0 kudos

Resolved! dbutils.notebook.exit() executing from except in try/except block even if there is no error.

I am using Python notebooks as part of a concurrently running workflow with Databricks Runtime 6.1. Within the notebooks I am using try/except blocks to return an error message to the main concurrent notebook if a section of code fails. However I h...

Latest Reply
tonyliken
New Contributor II
  • 0 kudos

Because dbutils.notebook.exit() raises an 'Exception', it will always trigger the except Exception as e: part of the code. We can use this to our advantage to solve the problem by adding an 'if else' to the except block. query = "SELECT 'a' as Colum...
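A minimal sketch in the spirit of the reply's workaround, restructured so exit() is called outside the try/except and therefore is never swallowed by the handler; the query is a placeholder:

```python
# Capture any genuine failure, then call dbutils.notebook.exit() exactly once
# after the try/except, so the intentional exit never lands in the except path.
error_message = None
try:
    result = spark.sql("SELECT 'a' AS Column1").collect()[0][0]  # placeholder workload
except Exception as e:
    error_message = str(e)

dbutils.notebook.exit(error_message if error_message else "SUCCESS")
```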

5 More Replies
ghofigjong
by New Contributor
  • 8021 Views
  • 4 replies
  • 2 kudos

Resolved! How does partition pruning work on a merge into statement?

I have a delta table that is partitioned by Year, Date and Month. I'm trying to merge data into this on all three partition columns + an extra column (an ID). My merge statement is below: MERGE INTO delta.<path of delta table> oldData USING df newData ...

Latest Reply
Umesh_S
New Contributor II
  • 2 kudos

Isn't the suggested idea only filtering the input dataframe (resulting in a smaller amount of data to match across the whole delta table) rather than pruning the delta table for the relevant partitions to scan?
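Addressing that question, a rough sketch of putting static partition predicates directly into the MERGE condition, rather than only filtering the source DataFrame, so the target table's partitions can be pruned; the path, column names, and sample batch are placeholders:

```python
# Stand-in for the incoming batch; register it so the SQL MERGE can reference it.
df = spark.createDataFrame(
    [(2025, 1, 15, 42, 9.99)],
    "Year INT, Month INT, Date INT, Id INT, Price DOUBLE",
)
df.createOrReplaceTempView("newData")

spark.sql("""
  MERGE INTO delta.`/mnt/lake/my_table` AS oldData
  USING newData
  ON  oldData.Year  = newData.Year
  AND oldData.Month = newData.Month
  AND oldData.Date  = newData.Date
  AND oldData.Id    = newData.Id
  AND oldData.Year  = 2025 AND oldData.Month = 1 AND oldData.Date = 15  -- static predicates enable pruning
  WHEN MATCHED THEN UPDATE SET *
  WHEN NOT MATCHED THEN INSERT *
""")
```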

3 More Replies
halox6000
by New Contributor III
  • 1132 Views
  • 1 reply
  • 0 kudos

How do I stop PySpark from outputting text

I am using a tqdm progress bar to monitor the amount of data records I have collected via API. I am temporarily writing them to a file in the DBFS, then uploading to a Spark DataFrame. Each time I write to a file, I get a message like 'Wrote 8873925 ...

Latest Reply
MathieuDB
Databricks Employee
  • 0 kudos

Hello @halox6000, You could temporarily redirect console output to a null device for these write operations. Try this out: @contextlib.contextmanager def silence_dbutils(): with contextlib.redirect_stdout(io.StringIO()): yield # Usage in...
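A fuller version of the reply's idea, as a hedged sketch; the DBFS path and payload are placeholders, and it assumes the noisy "Wrote N bytes." message goes to stdout:

```python
# Context manager that swallows stdout around the chatty write call.
import contextlib
import io

@contextlib.contextmanager
def silence_stdout():
    with contextlib.redirect_stdout(io.StringIO()):
        yield

payload = '{"records": []}'  # stand-in for the API batch being written

with silence_stdout():
    dbutils.fs.put("/tmp/api_batch.json", payload, overwrite=True)  # write message is suppressed
```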

Nathant93
by New Contributor III
  • 1113 Views
  • 2 replies
  • 0 kudos

remove empty folders with pyspark

Hi, I am trying to search a mount point for any empty folders and remove them. Does anyone know of a way to do this? I have tried dbutils.fs.walk but this does not seem to work. Thanks

Latest Reply
MathieuDB
Databricks Employee
  • 0 kudos

Hello @Nathant93, You could use dbutils.fs.ls and iterate on all the directories found to accomplish this task. Something like this: def find_empty_dirs(path): directories = dbutils.fs.ls(path) for directory in directories: if directo...
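To round out the reply, a hedged sketch of the full recursion; the mount path is a placeholder:

```python
# Depth-first walk with dbutils.fs.ls, removing any directory left empty.
def remove_empty_dirs(path: str) -> None:
    for entry in dbutils.fs.ls(path):
        if entry.isDir():
            remove_empty_dirs(entry.path)            # clean children first
            if not dbutils.fs.ls(entry.path):        # nothing left inside
                dbutils.fs.rm(entry.path, recurse=True)

remove_empty_dirs("/mnt/my_mount")  # hypothetical mount point
```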

1 More Replies
