Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

RamaTeja
by New Contributor II
  • 2266 Views
  • 2 replies
  • 1 kudos

Unity Catalog metastore list is showing empty

Hi, I am not able to list the metastores in the Databricks CLI using the command below:
databricks unity-catalog metastores list
It returns {}, but when I tried databricks unity-catalog metastores get-summary I was able to get the metastore info (a hedged sketch of an alternative check follows this thread). Can anyone help me ...

Latest Reply
RamaTeja
New Contributor II
  • 1 kudos

Hi @Kaniz Fatma, Unity Catalog is enabled in my workspace and I have been assigned both metastore admin and account admin.
databricks unity-catalog metastores list --debug
HTTP debugging enabled
send: b'GET /api/2.1/unity-catalog/metastores HTTP/1.1\r...

1 More Replies
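For checks like the one in this thread, the Databricks SDK for Python offers a second way to hit the same endpoint. This is a minimal sketch, assuming the databricks-sdk package and an already-configured auth profile; verify the metastores API names against your SDK version:

    from databricks.sdk import WorkspaceClient

    w = WorkspaceClient()  # reuses the same authentication as the CLI profile

    # same endpoint as `databricks unity-catalog metastores get-summary`
    print(w.metastores.summary())

    # same endpoint as `databricks unity-catalog metastores list`
    # (GET /api/2.1/unity-catalog/metastores); an empty result while
    # get-summary works usually points at account-level permissions
    for m in w.metastores.list():
        print(m.metastore_id, m.name)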
Jake2
by New Contributor III
  • 23270 Views
  • 2 replies
  • 2 kudos

Failed to Merge Fields Error on Delta Live Tables

I'm running into an issue during the "Setting up Tables" phase of our DLT pipelines where I'm told a particular field cannot be merged due to incompatible datatypes (a hedged sketch of the usual fix follows this thread). See this example: org.apache.spark.sql.AnalysisException: Failed to merge fiel...

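The usual fix for "Failed to merge fields" is to pin the offending column to a single type before DLT materializes the table, so every load presents the same schema. A minimal sketch, assuming a streaming source table raw.events and a drifting amount column (both names are hypothetical):

    import dlt
    from pyspark.sql import functions as F

    @dlt.table(name="events_typed")
    def events_typed():
        return (
            spark.readStream.table("raw.events")                    # hypothetical source
            .withColumn("amount", F.col("amount").cast("double"))   # pin one datatype
        )

Casting in the earliest layer keeps every downstream merge working against one stable schema.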
data_loader
by New Contributor II
  • 3061 Views
  • 2 replies
  • 0 kudos

Creating External Table from partitioned parquet table

I am trying to create an external table in the catalog using Parquet, where the Parquet files are partitioned. I have tried the syntax below:
%sql
CREATE TABLE table_name (col1 type1, col2 type2, col3 type3)
USING parquet
PARTITIONED BY (col4 type4)
LOCA...

Latest Reply
Chizzy
New Contributor II
  • 0 kudos

@data_loader, how were you able to fix this problem? I am having the same issue now. (A hedged sketch follows this thread.)

1 More Replies
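One common cause of trouble with external partitioned Parquet tables is that existing partition directories are not registered at CREATE TABLE time, so the table appears empty. A hedged sketch (catalog, columns, and path are placeholders):

    spark.sql("""
        CREATE TABLE IF NOT EXISTS my_catalog.my_schema.my_table (
            col1 INT,
            col2 STRING,
            col4 DATE
        )
        USING PARQUET
        PARTITIONED BY (col4)
        LOCATION 'abfss://container@account.dfs.core.windows.net/path/'
    """)

    # register the partition directories that already exist under LOCATION
    spark.sql("MSCK REPAIR TABLE my_catalog.my_schema.my_table")

Depending on the runtime and catalog, the supported spelling may instead be ALTER TABLE ... RECOVER PARTITIONS or MSCK REPAIR TABLE ... SYNC PARTITIONS.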
ArttuHer
by New Contributor III
  • 1590 Views
  • 2 replies
  • 0 kudos

Resolved! Error pushing changes Remote ref update was rejected. Make sure you have write access to this remote

Hello! I'm receiving the following error when pushing from Databricks to GitHub. What I have done:
  • Access token is set up
  • Repository is public
  • I can pull from the repository to Databricks
  • I'm using the standard tier
"Error pushing changes Remote ref update w...

Latest Reply
ArttuHer
New Contributor III
  • 0 kudos

Hello! Thanks for the answer. I just figured it out: it turned out that I accidentally had secrets in my code, which GitHub does not allow. After removing them, everything worked. Arttu

1 More Replies
rushi29
by New Contributor III
  • 1622 Views
  • 2 replies
  • 0 kudos

sparkContext in Runtime 15.3

Hello all, our Azure Databricks cluster is running under the "Legacy Shared Compute" policy with the 15.3 runtime. One of the Python notebooks is used to connect to an Azure SQL database to read/insert data (a hedged sketch follows this thread). The following snippet of code is responsible for r...

Latest Reply
jayct
New Contributor II
  • 0 kudos

Hi @rushi29, did you ever get a solution to this? @Retired_mod there never was a response to the issue there.

1 More Replies
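On shared access modes the sparkContext (and py4j JVM access) is restricted, which commonly breaks older JDBC helper code. If that is what this snippet does, the DataFrame JDBC reader avoids sparkContext entirely. A sketch, assuming Azure SQL with SQL authentication; server, database, secret scope, and table names are placeholders:

    df = (
        spark.read.format("jdbc")
        .option("url", "jdbc:sqlserver://myserver.database.windows.net:1433;database=mydb")
        .option("dbtable", "dbo.my_table")
        .option("user", dbutils.secrets.get("my-scope", "sql-user"))
        .option("password", dbutils.secrets.get("my-scope", "sql-password"))
        .load()
    )

Writes work the same way through df.write.format("jdbc"), so no sparkContext access is needed in either direction.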
KacperG
by New Contributor III
  • 425 Views
  • 1 reply
  • 0 kudos

%sh fails to install mdbtools locally

Hi, I have a notebook that uses mdbtools:
%sh sudo apt-get -y -S install mdbtools
However, when I want to run it locally it returns this error:
sudo: a terminal is required to read the password; either use the -S option to read from standard input or configure...

Latest Reply
filipniziol
Contributor III
  • 0 kudos

Hi @KacperG, most likely the job is running with different privileges than you. Also, in your command -S needs to be specified before apt-get. Have you tried running:
%sh sudo -S apt-get -y install mdbtools
If it is just a typo, here are the options: 1. Try to run t...

semsim
by Contributor
  • 6227 Views
  • 9 replies
  • 1 kudos

Resolved! Init Script Failing

I am getting an error when I try to run a cluster-scoped init script. The script itself is as follows:
#!/bin/bash
sudo apt update && sudo apt upgrade -y
sudo apt install libreoffice-common libreoffice-java-common libreoffice-writer openjdk-8-jre-head...

Latest Reply
zmsoft
New Contributor III
  • 1 kudos

Hi @semsim, @jacovangelder, I added the code you mentioned at the beginning of the script, but I still got errors.
#!/bin/bash
sudo rm -r /var/lib/apt/lists/*
sudo apt clean && sudo apt update --fix-missing -y
if ! [[ "18.04 20.04 22.04 23.04 24.04...

8 More Replies
pragarwal
by New Contributor II
  • 3968 Views
  • 6 replies
  • 1 kudos

Adding a member to a group using the account-level Databricks REST API

Hi all, I want to add a member to a group at the Databricks account level using the REST API (https://docs.databricks.com/api/azure/account/accountgroups/patch) as mentioned in this link. I was able to authenticate, but not able to add a member while using belo...

Latest Reply
Nikos
New Contributor II
  • 1 kudos

Does the above work? I still can't quite figure it out; any help would be much appreciated. I know authentication is not an issue, as I can use a lot of the other endpoints. I just can't figure out the correct body syntax to add a member to a group (a hedged sketch follows this thread). url...

5 More Replies
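The account groups PATCH endpoint follows SCIM 2.0, so the body that usually works is a PatchOp against the members attribute. A hedged sketch using Python requests, assuming an Azure Databricks account and a token with account admin rights; the IDs and token are placeholders:

    import requests

    ACCOUNT_ID = "<account-id>"
    GROUP_ID = "<group-id>"
    USER_ID = "<numeric-user-id>"
    TOKEN = "<token>"

    resp = requests.patch(
        f"https://accounts.azuredatabricks.net/api/2.0/accounts/{ACCOUNT_ID}"
        f"/scim/v2/Groups/{GROUP_ID}",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={
            "schemas": ["urn:ietf:params:scim:api:messages:2.0:PatchOp"],
            "Operations": [
                {"op": "add", "path": "members", "value": [{"value": USER_ID}]}
            ],
        },
    )
    resp.raise_for_status()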
sashikanth
by New Contributor II
  • 660 Views
  • 2 replies
  • 0 kudos

Streaming or Batch Processing

How do you decide whether to go for streaming or batch processing when the upstream is a Delta table? Please share suggestions to optimize the load timings. (A hedged sketch follows this thread.)

Latest Reply
gchandra
Databricks Employee
  • 0 kudos

Structured Streaming is one of the options: spark.readStream.format("delta")

1 More Replies
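Because the upstream is Delta, there is also a middle ground: a streaming read with an availableNow trigger runs on a batch schedule but only processes commits that arrived since the last run. A minimal sketch; table and checkpoint names are placeholders:

    (
        spark.readStream.table("catalog.schema.source")   # Delta upstream
        .writeStream
        .option("checkpointLocation", "/Volumes/catalog/schema/chk/source_to_target")
        .trigger(availableNow=True)   # drain whatever is new, then stop
        .toTable("catalog.schema.target")
    )

Scheduled as a job, this gives batch-style cost control with incremental, exactly-once processing; dropping the trigger turns the same code into a continuous stream.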
priyanananthram
by New Contributor II
  • 8082 Views
  • 4 replies
  • 1 kudos

Delta Live Tables for a large number of tables

Hi there, I am hoping for some guidance. I have some 850 tables that I need to ingest using a DLT pipeline. When I do this, my event log shows that the driver node dies or becomes unresponsive, likely due to GC. Can DLT be used to ingest a large number of tables (a hedged sketch follows this thread)? I...

Latest Reply
Sidhant07
Databricks Employee
  • 1 kudos

Delta Live Tables (DLT) can indeed be used to ingest a large number of tables. However, if you're experiencing issues with the driver node becoming unresponsive due to garbage collection (GC), it might be a sign that the resources allocated to the dr...

3 More Replies
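For table counts in the hundreds, the usual DLT pattern is to generate table definitions in a loop instead of hand-writing them, and to size the driver for the extra planning work. A sketch of that metaprogramming pattern; the source schema and naming are hypothetical:

    import dlt

    TABLE_NAMES = ["orders", "customers"]  # ...extend to all 850 source tables

    def define_bronze(name: str):
        @dlt.table(name=f"bronze_{name}")
        def _bronze():
            return spark.readStream.table(f"source_catalog.raw.{name}")

    for name in TABLE_NAMES:
        define_bronze(name)  # the factory function captures `name` per table

If the driver still struggles with GC at this scale, splitting the tables across a few pipelines is a common fallback.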
badari_narayan
by New Contributor II
  • 1360 Views
  • 6 replies
  • 1 kudos

How to create SQL functions using PySpark on a local machine

I am trying to create a Spark SQL function in a particular schema, i.e. spark.sql("CREATE OR REPLACE FUNCTION <spark_catalog>.<schema_name>.<function_name()> RETURNS STRING RETURN <value>"). This works perfectly fine on Databricks using notebooks. But I n...

Latest Reply
filipniziol
Contributor III
  • 1 kudos

Hi @badari_narayan, in general you may run a PySpark project locally, but with limitations (a hedged sketch follows this thread):
  • Create a virtual environment
  • Install PySpark in your virtual environment (the same version you have on your cluster)
Since Spark version 2.x you do not even need to ...

5 More Replies
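To make those steps concrete, here is a minimal local sketch. One caveat, hedged: SQL scalar functions (CREATE FUNCTION ... RETURNS ... RETURN) have long been supported on Databricks but arrived in open-source Spark much later, so on an older local PySpark this exact syntax can fail even though the session starts fine. Schema and function names are placeholders:

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[*]")
        .appName("local-sql-functions")
        .getOrCreate()
    )

    spark.sql("CREATE DATABASE IF NOT EXISTS my_schema")
    spark.sql("""
        CREATE OR REPLACE FUNCTION my_schema.greeting()
        RETURNS STRING RETURN 'hello'
    """)
    spark.sql("SELECT my_schema.greeting()").show()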
sms101
by New Contributor
  • 953 Views
  • 1 reply
  • 0 kudos

Table lineage visibility in Databricks

I've observed differences in table lineage visibility in Databricks based on how data is referenced, and I would like to confirm whether this is the expected behavior (a hedged illustration follows this thread). 1. When referencing a Delta table as the source in a query (e.g., df = spark.table("cata...

Latest Reply
Brahmareddy
Honored Contributor
  • 0 kudos

Hi @sms101, how are you doing today? As per my understanding, it is correct that lineage tracking in Databricks works primarily at the table level, meaning that when you reference a Delta table directly, the lineage is properly captured. However, when you u...

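To illustrate the distinction described above (names are placeholders): reads that go through the catalog feed lineage, while path-based reads bypass it.

    # lineage captured: the read resolves through the Unity Catalog table name
    df_tracked = spark.table("catalog.schema.orders")

    # lineage not captured: a path-based read bypasses the catalog
    df_untracked = (
        spark.read.format("delta")
        .load("abfss://container@account.dfs.core.windows.net/tables/orders")
    )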
Bilel
by New Contributor
  • 1033 Views
  • 1 reply
  • 1 kudos

Python library not installed when compute is resized

Hi, I have a Python notebook workflow that uses a job cluster. The cluster lost at least one node (due to spot instance termination) and did an upsize. After that I got an error in my job, "Module not found", but the Python module was being used before ...

Latest Reply
Brahmareddy
Honored Contributor
  • 1 kudos

Hi @Bilel, how are you doing today? As per my understanding, consider installing the library at the cluster level to ensure it's automatically applied across all nodes when a new one is added. You could also try using init scripts to guarantee the requ...

fperry
by New Contributor III
  • 547 Views
  • 1 reply
  • 0 kudos

Question about stateful processing

I'm experiencing an issue that I don't understand. I am using Python's arbitrary stateful processing with Structured Streaming to calculate metrics for each item/ID (a hedged sketch of this pattern follows this thread). A timeout is set, after which I clear the state for that item/ID and display each ID...

Latest Reply
Brahmareddy
Honored Contributor
  • 0 kudos

Hi @fperry, how are you doing today? As per my understanding, consider checking for any differences in how the stateful streaming function is writing and persisting data. It's possible that while the state is cleared after the timeout, some state might...

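For reference, this is the clear-on-timeout shape the question describes, sketched with applyInPandasWithState. It assumes `events` is a streaming DataFrame with an `id` column; the state schema, metric, and 60-second timeout are illustrative. An ID reappearing after its state was cleared is expected whenever new rows for that ID arrive and re-create the state:

    import pandas as pd
    from pyspark.sql.streaming.state import GroupState, GroupStateTimeout

    def track(key, pdfs, state: GroupState):
        if state.hasTimedOut:
            (count,) = state.get
            state.remove()  # clear state for this item/ID
            yield pd.DataFrame({"id": [key[0]], "count": [count], "expired": [True]})
        else:
            count = state.get[0] if state.exists else 0
            for pdf in pdfs:
                count += len(pdf)
            state.update((count,))
            state.setTimeoutDuration(60_000)  # restart the 60 s inactivity window
            yield pd.DataFrame({"id": [key[0]], "count": [count], "expired": [False]})

    out = events.groupBy("id").applyInPandasWithState(
        track,
        outputStructType="id string, count long, expired boolean",
        stateStructType="count long",
        outputMode="update",
        timeoutConf=GroupStateTimeout.ProcessingTimeTimeout,
    )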
gabrieleladd
by New Contributor II
  • 2386 Views
  • 3 replies
  • 1 kudos

Clearing data stored by pipelines

Hi everyone! I'm new to Databricks and taking my first steps with Delta Live Tables, so please forgive my inexperience. I'm building my first DLT pipeline and there's something that I can't really grasp: how to clear all the objects generated or upda...

Data Engineering
Data Pipelines
Delta Live Tables
Latest Reply
ChKing
New Contributor II
  • 1 kudos

To clear all objects generated or updated by the DLT pipeline, you can drop the tables manually using the DROP command, as you've mentioned (a hedged sketch follows this thread). However, to get a completely clean slate, including metadata like the tracking of already-processed files in t...

2 More Replies
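Beyond dropping tables, a full refresh makes the pipeline forget what it has already processed (for example, the file-tracking state of streaming sources). It can be triggered from the pipeline UI or via the REST API; a hedged sketch where host, token, and pipeline ID are placeholders:

    import requests

    HOST = "https://<workspace-host>"
    TOKEN = "<personal-access-token>"
    PIPELINE_ID = "<pipeline-id>"

    resp = requests.post(
        f"{HOST}/api/2.0/pipelines/{PIPELINE_ID}/updates",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={"full_refresh": True},  # reprocess all inputs from scratch
    )
    resp.raise_for_status()
    print(resp.json())  # includes the update_id of the triggered run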
