cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Forum Posts

vinaykumar
by New Contributor III
  • 7431 Views
  • 6 replies
  • 6 kudos

Reading Iceberg table present in S3 from databricks console using spark given none error .

Hi Team , I am facing issue while reading iceberg table from S3 and getting none error when read the data . below steps I followed .Added Iceberg Spark connector library to your Databricks cluster. 2. Cluster Configuration to Enable Iceberg ...

image image
  • 7431 Views
  • 6 replies
  • 6 kudos
Latest Reply
Ohad-upriver
New Contributor II
  • 6 kudos

I want to use both unity catalog and iceberg that is in a S3 path.To use unity catalog I can't use access mode "No isolation shared".Is there a solution for this?

  • 6 kudos
5 More Replies
negrinij
by New Contributor II
  • 30734 Views
  • 4 replies
  • 2 kudos

Understanding Used Memory in Databricks Cluster

Hello, I wonder if anyone could give me any insights regarding used memory and how could I change my code to "release" some memory as the code runs. I am using a Databricks Notebook.Basically, what we need to do is perform a query, create a spark sql...

image.png image
  • 30734 Views
  • 4 replies
  • 2 kudos
Latest Reply
loic
New Contributor III
  • 2 kudos

I have exactly the same kind of problem.I really do not understand why my driver goes out of memory meanwhile I do not cache anything in Spark.Since I don't cache anything, I expect references to objects that are not used anymore to be freed.Even a s...

  • 2 kudos
3 More Replies
michaelh
by New Contributor III
  • 4283 Views
  • 5 replies
  • 4 kudos

Resolved! AWS Databricks Cluster terminated.Reason:Container launch failure

We're developing custom runtime for databricks cluster. We need to version and archive our clusters for client. We made it run successfully in our own environment but we're not able to make it work in client's environment. It's large corporation with...

  • 4283 Views
  • 5 replies
  • 4 kudos
Latest Reply
NandiniN
Databricks Employee
  • 4 kudos

This appears to be an issue with the security group. Kindly review security group inbound/outbound rules.

  • 4 kudos
4 More Replies
naveenreddy1
by New Contributor II
  • 18559 Views
  • 4 replies
  • 0 kudos

Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages. Driver stacktrace

We are using the databricks 3 node cluster with 32 GB memory. It is working fine but some times it automatically throwing the error: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues.

  • 18559 Views
  • 4 replies
  • 0 kudos
Latest Reply
RodrigoDe_Freit
New Contributor II
  • 0 kudos

If your job fails follow this:According to https://docs.databricks.com/jobs.html#jar-job-tips: "Job output, such as log output emitted to stdout, is subject to a 20MB size limit. If the total output has a larger size, the run will be canceled and ma...

  • 0 kudos
3 More Replies
Confused
by New Contributor III
  • 47288 Views
  • 6 replies
  • 3 kudos

Resolved! Configuring pip index-url and using artifacts-keyring

Hi I would like to use the azure artifact feed as my default index-url when doing a pip install on a Databricks cluster. I understand I can achieve this by updating the pip.conf file with my artifact feed as the index-url. Does anyone know where i...

  • 47288 Views
  • 6 replies
  • 3 kudos
Latest Reply
murtazahzaveri
New Contributor II
  • 3 kudos

For Authentication you can provide below config on cluster's Spark Environment Variables,PIP_EXTRA_INDEX_URL=https://username:password@pkgs.sample.com/sample/_packaging/artifactory_name/pypi/simple/.Also, you can store the value in Databricks secret

  • 3 kudos
5 More Replies
ae20cg
by New Contributor III
  • 4337 Views
  • 5 replies
  • 9 kudos

Databricks Cluster Web terminal different permissions with tmux and xterm.

I am launching web terminal on my databricks cluster and when I am using the ephemeral xterm instance I am easily able to navigate to desired directory in `Workspace` and run anything... for example `ls ./` When I switch to tmux so that I can preserv...

  • 4337 Views
  • 5 replies
  • 9 kudos
Latest Reply
alenka
New Contributor III
  • 9 kudos

Hey there, fellow data explorer pals! I totally get your excitement when launching that web terminal on your Databricks cluster and feeling the power of running commands like 'ls ./' in the ephemeral xterm instance. It's like traversing the vast univ...

  • 9 kudos
4 More Replies
RajeshRK
by Contributor II
  • 8545 Views
  • 3 replies
  • 0 kudos

Need help to analyze databricks logs for a long-running job.

Hi Team,We have a job it completes in 3 minutes in one Databricks cluster, if we run the same job in another databricks cluster it is taking 3 hours to complete.I am quite new to Databricks and need your guidance on how to find out where databricks s...

  • 8545 Views
  • 3 replies
  • 0 kudos
Latest Reply
AmitKP
New Contributor II
  • 0 kudos

Hi @Retired_mod ,I am saving logs of my databricks Job Compute From ADF, How can i open those files that present in dbfs location.

  • 0 kudos
2 More Replies
pokus
by New Contributor III
  • 8196 Views
  • 2 replies
  • 2 kudos

Resolved! use DeltaLog class in databricks cluster

I need to use DeltaLog class in the code to get the AddFiles dataset. I have to keep the implemented code in a repo and run it in databricks cluster. Some docs say to use org.apache.spark.sql.delta.DeltaLog class, but it seems databricks gets rid of ...

  • 8196 Views
  • 2 replies
  • 2 kudos
Latest Reply
dbal
New Contributor III
  • 2 kudos

Thanks for providing a solution @pokus .What I dont understand is why Databricks cannot provide the DeltaLog at runtime. How can this be the official solution? We need a better solution for this instead of depending on reflections.

  • 2 kudos
1 More Replies
Rajaniesh
by New Contributor III
  • 2806 Views
  • 2 replies
  • 1 kudos

URGENT HELP NEEDED: Python functions deployed in the cluster throwing the error

Hi,I have created a python wheel with the following code. And the package name is rule_engine"""The entry point of the Python Wheel"""import sysfrom pyspark.sql.functions import expr, coldef get_rules(tag): """  loads data quality rules from a table ...

  • 2806 Views
  • 2 replies
  • 1 kudos
Latest Reply
jose_gonzalez
Databricks Employee
  • 1 kudos

You can find more details and examples here https://docs.databricks.com/en/workflows/jobs/how-to/use-python-wheels-in-workflows.html#use-a-python-wheel-in-a-databricks-job

  • 1 kudos
1 More Replies
dukebaslangic
by New Contributor II
  • 2231 Views
  • 3 replies
  • 3 kudos

Resolved! Databricks performance related documentation/books

Hi,Do you know any good resources about Databricks performance improvements(like improving query performances, monitoring/resolving performance bottlenecks etc)?Thanks

  • 2231 Views
  • 3 replies
  • 3 kudos
Latest Reply
Anonymous
Not applicable
  • 3 kudos

Hi @Ömer Özsakarya​  We haven't heard from you since the last response from @Lakshay Goel​ ​, and I was checking back to see if her suggestions helped you.Or else, If you have any solution, please share it with the community, as it can be helpful to ...

  • 3 kudos
2 More Replies
Kaijser
by New Contributor II
  • 2073 Views
  • 1 replies
  • 2 kudos

Installing private python Azure DevOps repository without revealing personal access token in pyproject.toml

I want to install a .whl file on my Databricks cluster which includes a private Azure DevOps repository as a dependency in its pyproject.toml file, i.e.:[project] name = "test" description = "test_description." version = "0.1.0" authors = [ { name ...

  • 2073 Views
  • 1 replies
  • 2 kudos
Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Aaron Kaijser​  Great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question. Thanks.

  • 2 kudos
Phani1
by Valued Contributor II
  • 3061 Views
  • 2 replies
  • 1 kudos

Integration Dolly with Databricks

Hi Databricks Team,Could you please share any links /docs/Sample notebooks to integrate Dolly with Databricks, our aim is to generate SQL queries based on the free text and execute it via databricks cluster/SQL warehouse.

  • 3061 Views
  • 2 replies
  • 1 kudos
Latest Reply
sean_owen
Databricks Employee
  • 1 kudos

https://www.dbdemos.ai/demo.html?demoName=llm-dolly-chatbot is a good demonstration of Dolly (or really any LLM) for question answering. LLMs like this are not for SQL generation, but other LLMs are, like starcoderbase

  • 1 kudos
1 More Replies
ros
by New Contributor III
  • 2801 Views
  • 2 replies
  • 3 kudos

Apache Hudi Table creation using hudi maven library

I installed hudi maven library org.apache.hudi:hudi-spark3.3-bundle_2.12:0.13.0 in Dbricks Runtime Ver : 12.2 LTS (includes Apache Spark 3.3.2, Scala 2.12) with spark config :spark.sql.catalog.spark_catalog org.apache.spark.sql.hudi.catalog.HoodieCat...

  • 2801 Views
  • 2 replies
  • 3 kudos
Latest Reply
ros
New Contributor III
  • 3 kudos

@Shanmugavel Chandrakasu​ %sql create table hudi_cow_pt_tbl ( id bigint, name string, ts bigint, dt string, hh string ) using hudi tblproperties ( type = 'cow', primaryKey = 'id', preCombineField = 'ts' ) partitioned by (dt, hh) location '/mnt/data/h...

  • 3 kudos
1 More Replies
de-hru
by New Contributor III
  • 26892 Views
  • 4 replies
  • 1 kudos

Resolved! How to add pre-commit hook to the Git Client on Databricks Cluster?

I'd like to add a Git pre-commit hook to the Databricks Cluster.This pre-commit hook should be executed when pushing to GitHub.Why would I need a pre-commit hook on a Databricks Cluster?My goal is to run blackbricks and format all notebooks automatic...

  • 26892 Views
  • 4 replies
  • 1 kudos
Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Dejan Hrubenja​ Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you.Tha...

  • 1 kudos
3 More Replies
Anonymous
by Not applicable
  • 8621 Views
  • 8 replies
  • 0 kudos

Not able to connect to On-Prem Oracle from Databricks cluster

Hi Everyone,I was trying to connect to Oracle Instance from Databricks cluster and it is giving below error:java.sql.SQLTimeoutException: ORA-12170: Cannot connect. TCP connect timeout of 30000ms for host xx.x.x.*** port 1521. (CONNECTION_ID=CgM7V7UB...

  • 8621 Views
  • 8 replies
  • 0 kudos
Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Satya89:The error message you received indicates that the TCP connection to the Oracle database timed out. This could be caused by a number of factors such as network issues, firewall restrictions, or the database being overloaded.Here are a few ste...

  • 0 kudos
7 More Replies
Labels