Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

naveenreddy1
by New Contributor II
  • 18091 Views
  • 4 replies
  • 0 kudos

Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages. Driver stacktrace

We are using a 3-node Databricks cluster with 32 GB of memory. It works fine most of the time, but sometimes it throws the error: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues.

Latest Reply
RodrigoDe_Freit
New Contributor II
  • 0 kudos

If your job fails, check the following. According to https://docs.databricks.com/jobs.html#jar-job-tips: "Job output, such as log output emitted to stdout, is subject to a 20MB size limit. If the total output has a larger size, the run will be canceled and ma...

  • 0 kudos
3 More Replies
Confused
by New Contributor III
  • 34807 Views
  • 6 replies
  • 3 kudos

Resolved! Configuring pip index-url and using artifacts-keyring

Hi, I would like to use an Azure Artifacts feed as my default index-url when doing a pip install on a Databricks cluster. I understand I can achieve this by updating the pip.conf file with my artifact feed as the index-url. Does anyone know where i...

Latest Reply
murtazahzaveri
New Contributor II
  • 3 kudos

For authentication, you can set the following in the cluster's Spark environment variables: PIP_EXTRA_INDEX_URL=https://username:password@pkgs.sample.com/sample/_packaging/artifactory_name/pypi/simple/ (you can also store the value in a Databricks secret).
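
One detail the reply above leaves implicit: if the username or password contains characters such as `@`, `:` or `/`, they have to be percent-encoded before being embedded in the URL, otherwise pip may parse the host incorrectly. A minimal sketch of building the value, assuming the placeholder feed URL from the reply; the helper name `build_extra_index_url` and the sample password are made up for illustration:

```python
from urllib.parse import quote, urlsplit, urlunsplit


def build_extra_index_url(feed_url: str, username: str, password: str) -> str:
    """Return feed_url with percent-encoded credentials embedded in it.

    Hypothetical helper for illustration; not part of pip or Databricks.
    """
    parts = urlsplit(feed_url)
    # safe='' encodes every reserved character, including '@', ':' and '/',
    # so the credentials cannot be mistaken for URL delimiters.
    userinfo = f"{quote(username, safe='')}:{quote(password, safe='')}"
    netloc = f"{userinfo}@{parts.hostname}"
    if parts.port is not None:
        netloc += f":{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


# Placeholder feed URL taken from the reply; the password is an invented example.
url = build_extra_index_url(
    "https://pkgs.sample.com/sample/_packaging/artifactory_name/pypi/simple/",
    "username",
    "p@ss:word",
)
print(f"PIP_EXTRA_INDEX_URL={url}")
```

If the value is kept in a Databricks secret instead, the environment variable can, as far as I know, reference it as `PIP_EXTRA_INDEX_URL={{secrets/<scope-name>/<secret-name>}}`; the scope and secret names are whatever you created, so check the secrets documentation for your workspace.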

  • 3 kudos
5 More Replies
ae20cg
by New Contributor III
  • 3753 Views
  • 5 replies
  • 9 kudos

Databricks Cluster Web terminal different permissions with tmux and xterm.

I am launching the web terminal on my Databricks cluster, and when I use the ephemeral xterm instance I can easily navigate to the desired directory in `Workspace` and run anything, for example `ls ./`. When I switch to tmux so that I can preserv...

Latest Reply
alenka
New Contributor III
  • 9 kudos

Hey there, fellow data explorer pals! I totally get your excitement when launching that web terminal on your Databricks cluster and feeling the power of running commands like 'ls ./' in the ephemeral xterm instance. It's like traversing the vast univ...

  • 9 kudos
4 More Replies
negrinij
by New Contributor
  • 26190 Views
  • 3 replies
  • 0 kudos

Understanding Used Memory in Databricks Cluster

Hello, I wonder if anyone could give me any insight into used memory and how I could change my code to "release" some memory as the code runs. I am using a Databricks notebook. Basically, what we need to do is perform a query, create a spark sql...

Latest Reply
JKR
Contributor
  • 0 kudos

Did anyone find the solution for mentioned issue?

  • 0 kudos
2 More Replies
RajeshRK
by Contributor
  • 7976 Views
  • 3 replies
  • 0 kudos

Need help to analyze databricks logs for a long-running job.

Hi Team, we have a job that completes in 3 minutes on one Databricks cluster, but if we run the same job on another Databricks cluster it takes 3 hours to complete. I am quite new to Databricks and need your guidance on how to find out where databricks s...

Latest Reply
AmitKP
New Contributor II
  • 0 kudos

Hi @Retired_mod, I am saving the logs of my Databricks job compute from ADF. How can I open the files that are present in the DBFS location?

  • 0 kudos
2 More Replies
pokus
by New Contributor III
  • 7333 Views
  • 2 replies
  • 2 kudos

Resolved! use DeltaLog class in databricks cluster

I need to use the DeltaLog class in my code to get the AddFiles dataset. I have to keep the implemented code in a repo and run it on a Databricks cluster. Some docs say to use the org.apache.spark.sql.delta.DeltaLog class, but it seems databricks gets rid of ...

Latest Reply
dbal
New Contributor III
  • 2 kudos

Thanks for providing a solution, @pokus. What I don't understand is why Databricks cannot provide the DeltaLog at runtime. How can this be the official solution? We need a better solution for this instead of depending on reflection.

  • 2 kudos
1 More Replies
vinaykumar
by New Contributor III
  • 6016 Views
  • 3 replies
  • 6 kudos

Reading an Iceberg table in S3 from the Databricks console using Spark gives a None error

Hi Team, I am facing an issue while reading an Iceberg table from S3 and getting a None error when I read the data. Below are the steps I followed. 1. Added Iceberg Spark connector library to your Databricks cluster. 2. Cluster Configuration to Enable Iceberg ...

Latest Reply
Ambesh
New Contributor III
  • 6 kudos

Hi @Retired_mod, I am using Databricks Runtime 10.4 (Spark 3.2), so I have downloaded "iceberg-spark-runtime-3.2_2.12". Also, the table exists in the S3 bucket. The error message is: java.util.NoSuchElementException: None.get. I am also attaching a screenshot ...

  • 6 kudos
2 More Replies
Rajaniesh
by New Contributor III
  • 2525 Views
  • 2 replies
  • 1 kudos

URGENT HELP NEEDED: Python functions deployed in the cluster throwing the error

Hi, I have created a Python wheel with the following code, and the package name is rule_engine. """The entry point of the Python Wheel""" import sys from pyspark.sql.functions import expr, col def get_rules(tag): """  loads data quality rules from a table ...

Latest Reply
jose_gonzalez
Databricks Employee
  • 1 kudos

You can find more details and examples here https://docs.databricks.com/en/workflows/jobs/how-to/use-python-wheels-in-workflows.html#use-a-python-wheel-in-a-databricks-job

  • 1 kudos
1 More Replies
dukebaslangic
by New Contributor II
  • 1986 Views
  • 3 replies
  • 3 kudos

Resolved! Databricks performance related documentation/books

Hi, do you know any good resources on Databricks performance improvements (such as improving query performance and monitoring or resolving performance bottlenecks)? Thanks

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Hi @Ömer Özsakarya, we haven't heard from you since the last response from @Lakshay Goel, and I was checking back to see if her suggestions helped you. Otherwise, if you have a solution, please share it with the community, as it can be helpful to ...

  • 3 kudos
2 More Replies
Kaijser
by New Contributor II
  • 1846 Views
  • 1 replies
  • 2 kudos

Installing private python Azure DevOps repository without revealing personal access token in pyproject.toml

I want to install a .whl file on my Databricks cluster which includes a private Azure DevOps repository as a dependency in its pyproject.toml file, i.e.: [project] name = "test" description = "test_description." version = "0.1.0" authors = [ { name ...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Aaron Kaijser, great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question. Thanks.

  • 2 kudos
Phani1
by Valued Contributor II
  • 2544 Views
  • 2 replies
  • 1 kudos

Integration Dolly with Databricks

Hi Databricks Team, could you please share any links, docs, or sample notebooks for integrating Dolly with Databricks? Our aim is to generate SQL queries from free text and execute them via a Databricks cluster or SQL warehouse.

Latest Reply
sean_owen
Databricks Employee
  • 1 kudos

https://www.dbdemos.ai/demo.html?demoName=llm-dolly-chatbot is a good demonstration of Dolly (or really any LLM) for question answering. LLMs like this are not for SQL generation, but other LLMs are, like starcoderbase

  • 1 kudos
1 More Replies
ros
by New Contributor III
  • 2459 Views
  • 2 replies
  • 3 kudos

Apache Hudi Table creation using hudi maven library

I installed the Hudi Maven library org.apache.hudi:hudi-spark3.3-bundle_2.12:0.13.0 on Databricks Runtime version 12.2 LTS (includes Apache Spark 3.3.2, Scala 2.12) with the Spark config: spark.sql.catalog.spark_catalog org.apache.spark.sql.hudi.catalog.HoodieCat...

Latest Reply
ros
New Contributor III
  • 3 kudos

@Shanmugavel Chandrakasu %sql create table hudi_cow_pt_tbl ( id bigint, name string, ts bigint, dt string, hh string ) using hudi tblproperties ( type = 'cow', primaryKey = 'id', preCombineField = 'ts' ) partitioned by (dt, hh) location '/mnt/data/h...

  • 3 kudos
1 More Replies
de-hru
by New Contributor III
  • 25515 Views
  • 4 replies
  • 1 kudos

Resolved! How to add pre-commit hook to the Git Client on Databricks Cluster?

I'd like to add a Git pre-commit hook to the Databricks cluster. This pre-commit hook should be executed when pushing to GitHub. Why would I need a pre-commit hook on a Databricks cluster? My goal is to run blackbricks and format all notebooks automatic...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Dejan Hrubenja, hope all is well! Just wanted to check in to see if you were able to resolve your issue, and whether you would be happy to share the solution or mark an answer as best. Otherwise, please let us know if you need more help. We'd love to hear from you. Tha...

  • 1 kudos
3 More Replies
Anonymous
by Not applicable
  • 7329 Views
  • 8 replies
  • 0 kudos

Not able to connect to On-Prem Oracle from Databricks cluster

Hi everyone, I was trying to connect to an Oracle instance from a Databricks cluster and it gives the error below: java.sql.SQLTimeoutException: ORA-12170: Cannot connect. TCP connect timeout of 30000ms for host xx.x.x.*** port 1521. (CONNECTION_ID=CgM7V7UB...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Satya89: The error message you received indicates that the TCP connection to the Oracle database timed out. This could be caused by a number of factors, such as network issues, firewall restrictions, or the database being overloaded. Here are a few ste...
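
Since the error is a TCP connect timeout, one way to separate network problems from JDBC or Oracle configuration problems is a plain socket check run from a notebook on the same cluster. This is only a sketch: the helper name `can_connect` is made up, and the host in the usage comment is a hypothetical placeholder because the real address is masked in the post.

```python
import socket


def can_connect(host: str, port: int, timeout_s: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout_s.

    Hypothetical helper for illustration; it only tests reachability and
    says nothing about Oracle credentials or listener configuration.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers timeouts, refused connections, and unresolvable host names.
        return False


# Example call (placeholder host, port 1521 as in the error message):
# can_connect("oracle.example.internal", 1521)
```

If this returns False from the cluster but True from a machine inside the on-prem network, the cause is more likely routing or a firewall between the Databricks workspace network and the database than the JDBC settings.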

  • 0 kudos
7 More Replies
johnb1
by Contributor
  • 2403 Views
  • 3 replies
  • 0 kudos

Cluster Configuration for ML Model Training

Hi! I am training a Random Forest (pyspark.ml.classification.RandomForestClassifier) on Databricks with 1,000,000 training examples and 25 features. I employ a cluster with one driver (16 GB Memory, 4 Cores), 2-6 workers (32-96 GB Memory, 8-24 Cores),...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @John B, hope everything is going great. Just wanted to check in to see if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so we can...

  • 0 kudos
2 More Replies