Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Hubert-Dudek
by Esteemed Contributor III
  • 33601 Views
  • 11 replies
  • 30 kudos

Selenium Chrome driver on the Databricks driver

On the Databricks community, I see repeated problems regarding the Selenium installation on the Databricks driver. Installing Selenium on Databricks can be surprising, but for example, sometimes we need to g...

Labels: init, install_library, results, import
Latest Reply
iSinnerman
New Contributor II
  • 30 kudos

Hi Hubert-Dudek, are there any updates to your article? I have been struggling to get Databricks to recognise a SeleniumBase driver. I think the error might actually be a permissions problem, as the error is: WebDriverException: Message: Can not connect to ...

10 More Replies
RamaTeja
by New Contributor II
  • 2662 Views
  • 2 replies
  • 1 kudos

Unity Catalog metastore list is showing empty

Hi, I am not able to list the metastores in the Databricks CLI using the below command: databricks unity-catalog metastores list, which returns {}. But when I tried databricks unity-catalog metastores get-summary, I am able to get the metastore info. Can anyone help me ...

Latest Reply
RamaTeja
New Contributor II
  • 1 kudos

Hi @Kaniz Fatma​, Unity Catalog is enabled in my workspace, and I have been assigned metastore admin and account admin as well. databricks unity-catalog metastores list --debug gives: HTTP debugging enabled, send: b'GET /api/2.1/unity-catalog/metastores HTTP/1.1\r...
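For anyone comparing against this debug output, here is a minimal stdlib-only sketch that builds the same GET request the CLI is issuing. The workspace URL and token are placeholders, and the endpoint path is taken from the debug line above; verify both against your own workspace before using this.

```python
import urllib.request

# Placeholder workspace URL and token -- both are assumptions, use your own.
HOST = "https://adb-1234567890123456.7.azuredatabricks.net"
TOKEN = "dapi-example"

def build_metastores_request(host: str, token: str) -> urllib.request.Request:
    """Build the same GET request the CLI debug output shows for `metastores list`."""
    return urllib.request.Request(
        url=f"{host}/api/2.1/unity-catalog/metastores",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )

req = build_metastores_request(HOST, TOKEN)
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would perform the call; an empty {} response here
# usually means the identity behind the token cannot see any metastores,
# which points at a permissions or account-vs-workspace endpoint mismatch.
```

Calling the endpoint directly like this helps separate CLI issues from permission issues: if the raw request also returns an empty object, the CLI is not the problem.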

1 More Replies
Jake2
by New Contributor III
  • 29418 Views
  • 2 replies
  • 2 kudos

Failed to Merge Fields Error on Delta Live Tables

I'm running into an issue during the "Setting up Tables" phase of our DLT pipelines where I'm told a particular field is unable to be merged due to incompatible datatypes. See this example: org.apache.spark.sql.AnalysisException: Failed to merge fiel...

data_extractor
by New Contributor II
  • 3613 Views
  • 2 replies
  • 0 kudos

Creating External Table from partitioned parquet table

I am trying to create an external table in the catalog using Parquet, where the Parquet file is partitioned. I have tried using the below syntax: %sql CREATE TABLE table_name (col1 type1, col2 type2, col3 type3) USING parquet PARTITIONED BY (col4 type4) LOCA...

Latest Reply
Chizzy
New Contributor II
  • 0 kudos

@data_extractor, how were you able to fix this problem? I am having the same issue now.

1 More Replies
ArttuHer
by New Contributor III
  • 2577 Views
  • 2 replies
  • 0 kudos

Resolved! Error pushing changes Remote ref update was rejected. Make sure you have write access to this remote

Hello! I'm receiving the following error when pushing from Databricks to GitHub. What I have done: the access token is set up, the repository is public, I can pull from the repository to Databricks, and I'm using the standard tier. "Error pushing changes Remote ref update w...

Latest Reply
ArttuHer
New Contributor III
  • 0 kudos

Hello! Thanks for the answer. Just figured it out. It turned out that I accidentally had secrets in my code, and that was not allowed by GitHub. After removing them, all worked out. Arttu

1 More Replies
KacperG
by New Contributor III
  • 572 Views
  • 1 reply
  • 0 kudos

%sh fails to install mdbtools locally

Hi, I have a notebook that uses mdbtools: %sh sudo apt-get -y -S install mdbtools. However, when I want to run it locally it returns an error: sudo: a terminal is required to read the password; either use the -S option to read from standard input or configure...

Latest Reply
filipniziol
Esteemed Contributor
  • 0 kudos

Hi @KacperG, most likely the job is using different privileges than you. In your command, -S needs to be specified before apt-get, since it is an option for sudo (read the password from standard input), not for apt-get. Have you tried running: %sh sudo -S apt-get -y install mdbtools? If it is just a typo, here are the options: 1. Try to run t...

semsim
by Contributor
  • 8632 Views
  • 9 replies
  • 1 kudos

Resolved! Init Script Failing

I am getting an error when I try to run the cluster-scoped init script. The script itself is as follows:
#!/bin/bash
sudo apt update && sudo apt upgrade -y
sudo apt install libreoffice-common libreoffice-java-common libreoffice-writer openjdk-8-jre-head...

Latest Reply
zmsoft
Contributor
  • 1 kudos

Hi @semsim, @jacovangelder, I added the code you mentioned at the beginning of the script, but I still got errors.
#!/bin/bash
sudo rm -r /var/lib/apt/lists/*
sudo apt clean && sudo apt update --fix-missing -y
if ! [[ "18.04 20.04 22.04 23.04 24.04...

8 More Replies
pragarwal
by New Contributor II
  • 5284 Views
  • 6 replies
  • 1 kudos

Adding Member to group using account databricks rest api

Hi all, I want to add a member to a group at the Databricks account level using the REST API (https://docs.databricks.com/api/azure/account/accountgroups/patch), as mentioned in this link. I was able to authenticate but not able to add a member while using belo...

Latest Reply
Nikos
New Contributor II
  • 1 kudos

Does the above work? I still can't quite figure it out. Any help would be much appreciated. I know authentication is not an issue, as I can use a lot of the other endpoints. I just can't figure out the correct body syntax to add a member to a group. url...
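On the body-syntax question: the account-level Groups endpoint is SCIM-based, so a SCIM 2.0 PatchOp body is a reasonable starting point. A minimal sketch of that payload follows; the exact operation shape and the user ID are assumptions to verify against the current API reference for your cloud.

```python
import json

def add_member_patch_body(user_id: str) -> dict:
    """SCIM 2.0 PatchOp body that adds one member to a group.

    The shape (schemas + Operations with op "add" and a members list) follows
    the SCIM PatchOp convention; confirm it against the Databricks docs.
    """
    return {
        "schemas": ["urn:ietf:params:scim:api:messages:2.0:PatchOp"],
        "Operations": [
            {
                "op": "add",
                "value": {"members": [{"value": user_id}]},
            }
        ],
    }

# Sent as, e.g. (placeholders, not a verified URL):
# PATCH https://accounts.azuredatabricks.net/api/2.0/accounts/<account_id>/scim/v2/Groups/<group_id>
print(json.dumps(add_member_patch_body("1234567890"), indent=2))
```

Note that "value" inside "members" is the numeric user ID, not the user's email; passing the email here is a common reason the PATCH silently does nothing.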

5 More Replies
sashikanth
by New Contributor II
  • 836 Views
  • 2 replies
  • 0 kudos

Streaming or Batch Processing

How do you decide whether to go for streaming or batch processing when the upstream is a Delta table? Please share suggestions to optimize the load timings.

Latest Reply
gchandra
Databricks Employee
  • 0 kudos

Structured Streaming is one of the options: spark.readStream.format("delta").

1 More Replies
priyanananthram
by New Contributor II
  • 8731 Views
  • 4 replies
  • 1 kudos

Delta live tables for large number of tables

Hi there, I am hoping for some guidance. I have some 850 tables that I need to ingest using a DLT pipeline. When I do this, my event log shows that the driver node becomes unresponsive, likely due to GC. Can DLT be used to ingest a large number of tables? I...

Latest Reply
Sidhant07
Databricks Employee
  • 1 kudos

Delta Live Tables (DLT) can indeed be used to ingest a large number of tables. However, if you're experiencing issues with the driver node becoming unresponsive due to garbage collection (GC), it might be a sign that the resources allocated to the dr...

3 More Replies
badari_narayan
by New Contributor II
  • 1926 Views
  • 6 replies
  • 1 kudos

How to create SQL functions using PySpark on a local machine

I am trying to create a Spark SQL function in a particular schema, i.e. spark.sql("CREATE OR REPLACE FUNCTION <spark_catalog>.<schema_name>.<function_name()> RETURNS STRING RETURN <value>"). This works perfectly fine on Databricks using notebooks. But I n...

Latest Reply
filipniziol
Esteemed Contributor
  • 1 kudos

Hi @badari_narayan, in general you may run a PySpark project locally, but with limitations:
1. Create a virtual environment.
2. Install PySpark in your virtual environment (the same version you have on your cluster).
Since Spark version 2.x you do not even need to ...

5 More Replies
sms101
by New Contributor
  • 1185 Views
  • 1 reply
  • 0 kudos

Table lineage visibility in Databricks

I’ve observed differences in table lineage visibility in Databricks based on how data is referenced, and I would like to confirm if this is the expected behavior.1. When referencing a Delta table as the source in a query (e.g., df = spark.table("cata...

Latest Reply
Brahmareddy
Honored Contributor III
  • 0 kudos

Hi @sms101, how are you doing today? As per my understanding, it is correct that lineage tracking in Databricks works primarily at the table level, meaning when you reference a Delta table directly, the lineage is properly captured. However, when you u...

Bilel
by New Contributor II
  • 1262 Views
  • 1 reply
  • 2 kudos

Python library not installed when compute is resized

Hi, I have a Python notebook workflow that uses a job cluster. The cluster lost at least a node (due to spot instance termination) and did an upsize. After that I got an error in my job, "Module not found", but the Python module was being used before ...

Latest Reply
Brahmareddy
Honored Contributor III
  • 2 kudos

Hi @Bilel, how are you doing today? As per my understanding, consider installing the library at the cluster level to ensure it's automatically applied across all nodes when a new one is added. You could also try using init scripts to guarantee the requ...

fperry
by New Contributor III
  • 670 Views
  • 1 reply
  • 0 kudos

Question about stateful processing

I'm experiencing an issue that I don't understand. I am using Python's arbitrary stateful processing with structured streaming to calculate metrics for each item/ID. A timeout is set, after which I clear the state for that item/ID and display each ID...

Latest Reply
Brahmareddy
Honored Contributor III
  • 0 kudos

Hi @fperry, how are you doing today? As per my understanding, consider checking for any differences in how the stateful streaming function is writing and persisting data. It's possible that while the state is cleared after the timeout, some state might...

gabrieleladd
by New Contributor II
  • 2732 Views
  • 3 replies
  • 1 kudos

Clearing data stored by pipelines

Hi everyone! I'm new to Databricks and taking my first steps with Delta Live Tables, so please forgive my inexperience. I'm building my first DLT pipeline and there's something that I can't really grasp: how to clear all the objects generated or upda...

Labels: Data Engineering, Data Pipelines, Delta Live Tables
Latest Reply
ChKing
New Contributor II
  • 1 kudos

To clear all objects generated or updated by the DLT pipeline, you can drop the tables manually using the DROP command as you've mentioned. However, to get a completely clean slate, including metadata like the tracking of already processed files in t...
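Besides dropping tables, a pipeline's "already processed files" tracking is also reset by starting an update with full refresh, which can be triggered from the UI or via the Pipelines REST API. Below is a stdlib-only sketch of building that API call; the endpoint path (/api/2.0/pipelines/{id}/updates) and the full_refresh field are stated as I recall them from the Pipelines API, so verify both against the current reference before relying on them.

```python
import json
import urllib.request

def build_full_refresh_request(host: str, pipeline_id: str, token: str) -> urllib.request.Request:
    """Build a POST that starts a DLT pipeline update with full_refresh.

    A full refresh reprocesses all source data and resets the pipeline's
    record of already-processed files (endpoint and field hedged, see above).
    """
    return urllib.request.Request(
        url=f"{host}/api/2.0/pipelines/{pipeline_id}/updates",
        data=json.dumps({"full_refresh": True}).encode(),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )

# Placeholders only -- substitute your workspace URL, pipeline ID, and token.
req = build_full_refresh_request(
    "https://adb-1234567890123456.7.azuredatabricks.net", "my-pipeline-id", "dapi-example"
)
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would actually start the update.
```

Note that a full refresh re-ingests everything from the sources, so on large pipelines it can be expensive; dropping the tables alone does not reset the file-tracking state the reply above mentions.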

2 More Replies
