Data Engineering

Forum Posts

RajeshRK
by Contributor
  • 5424 Views
  • 6 replies
  • 0 kudos

Resolved! Need help to analyze databricks logs for a long-running job.

Hi Team, we have a job that completes in 3 minutes on one Databricks cluster, but the same job takes 3 hours on another Databricks cluster. I am quite new to Databricks and need your guidance on how to find out where Databricks s...

Latest Reply
AmitKP
New Contributor II
  • 0 kudos

Hi @Kaniz, I am saving the logs of my Databricks job compute from ADF. How can I open those files that are present in the DBFS location?

5 More Replies
pokus
by New Contributor III
  • 3415 Views
  • 3 replies
  • 2 kudos

Resolved! Use the DeltaLog class in a Databricks cluster

I need to use the DeltaLog class in my code to get the AddFiles dataset. I have to keep the implemented code in a repo and run it on a Databricks cluster. Some docs say to use the org.apache.spark.sql.delta.DeltaLog class, but it seems Databricks gets rid of ...

Latest Reply
dbal
New Contributor III
  • 2 kudos

Thanks for providing a solution, @pokus. What I don't understand is why Databricks cannot provide the DeltaLog at runtime. How can this be the official solution? We need a better solution than depending on reflection.

2 More Replies
vinaykumar
by New Contributor III
  • 3431 Views
  • 4 replies
  • 3 kudos

Reading an Iceberg table in S3 from the Databricks console using Spark gives a None error

Hi Team, I am facing an issue while reading an Iceberg table from S3: I get a None error when I read the data. Below are the steps I followed. 1. Added Iceberg Spark connector library to your Databricks cluster. 2. Cluster Configuration to Enable Iceberg ...

Latest Reply
Ambesh
New Contributor III
  • 3 kudos

Hi @Kaniz, I am using Databricks Runtime 10.4 (Spark 3.2), so I have downloaded "iceberg-spark-runtime-3.2_2.12". Also, the table exists in the S3 bucket. The error message is: java.util.NoSuchElementException: None.get. I am also attaching a screenshot for re...

3 More Replies
Rajaniesh
by New Contributor III
  • 1405 Views
  • 3 replies
  • 1 kudos

URGENT HELP NEEDED: Python functions deployed in the cluster throwing the error

Hi, I have created a Python wheel with the following code. And the package name is rule_engine.

"""The entry point of the Python Wheel"""
import sys
from pyspark.sql.functions import expr, col

def get_rules(tag):
    """  loads data quality rules from a table ...

Latest Reply
jose_gonzalez
Moderator
  • 1 kudos

You can find more details and examples here https://docs.databricks.com/en/workflows/jobs/how-to/use-python-wheels-in-workflows.html#use-a-python-wheel-in-a-databricks-job

2 More Replies
ae20cg
by New Contributor III
  • 2058 Views
  • 4 replies
  • 4 kudos

Databricks Cluster Web terminal different permissions with tmux and xterm.

I am launching the web terminal on my Databricks cluster. When I use the ephemeral xterm instance, I can easily navigate to the desired directory in `Workspace` and run anything, for example `ls ./`. When I switch to tmux so that I can preserv...

Latest Reply
alenka
New Contributor II
  • 4 kudos

Hey there, fellow data explorer pals! I totally get your excitement when launching that web terminal on your Databricks cluster and feeling the power of running commands like 'ls ./' in the ephemeral xterm instance. It's like traversing the vast univ...

3 More Replies
dukebaslangic
by New Contributor II
  • 1067 Views
  • 3 replies
  • 3 kudos

Resolved! Databricks performance related documentation/books

Hi, do you know any good resources about Databricks performance improvements (like improving query performance, monitoring/resolving performance bottlenecks, etc.)? Thanks

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Hi @Ömer Özsakarya, we haven't heard from you since the last response from @Lakshay Goel, and I was checking back to see if her suggestions helped you. Otherwise, if you have a solution, please share it with the community, as it can be helpful to ...

2 More Replies
Kaijser
by New Contributor II
  • 1040 Views
  • 1 reply
  • 2 kudos

Installing private python Azure DevOps repository without revealing personal access token in pyproject.toml

I want to install a .whl file on my Databricks cluster which includes a private Azure DevOps repository as a dependency in its pyproject.toml file, i.e.:

[project]
name = "test"
description = "test_description."
version = "0.1.0"
authors = [ { name ...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Aaron Kaijser, great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question. Thanks.

negrinij
by New Contributor
  • 10928 Views
  • 2 replies
  • 0 kudos

Understanding Used Memory in Databricks Cluster

Hello, I wonder if anyone could give me any insights regarding used memory and how I could change my code to "release" some memory as the code runs. I am using a Databricks Notebook. Basically, what we need to do is perform a query, create a spark sql...

Latest Reply
shan_chandra
Esteemed Contributor
  • 0 kudos

@Juliana Negrini - With respect to your sample code, you can use Spark's distributed query capabilities to run the query in Spark instead of pandas, so you don't have to toggle between the pandas DataFrame and the Spark DataFrame.

1 More Replies
Phani1
by Valued Contributor
  • 1525 Views
  • 2 replies
  • 1 kudos

Integrating Dolly with Databricks

Hi Databricks Team, could you please share any links, docs, or sample notebooks for integrating Dolly with Databricks? Our aim is to generate SQL queries from free text and execute them via a Databricks cluster/SQL warehouse.

Latest Reply
sean_owen
Honored Contributor II
  • 1 kudos

https://www.dbdemos.ai/demo.html?demoName=llm-dolly-chatbot is a good demonstration of Dolly (or really any LLM) for question answering. LLMs like this are not for SQL generation, but other LLMs are, like starcoderbase

1 More Replies
ros
by New Contributor III
  • 1609 Views
  • 2 replies
  • 3 kudos

Apache Hudi table creation using the Hudi Maven library

I installed the Hudi Maven library org.apache.hudi:hudi-spark3.3-bundle_2.12:0.13.0 on Databricks Runtime 12.2 LTS (includes Apache Spark 3.3.2, Scala 2.12) with the Spark config: spark.sql.catalog.spark_catalog org.apache.spark.sql.hudi.catalog.HoodieCat...

Latest Reply
ros
New Contributor III
  • 3 kudos

@Shanmugavel Chandrakasu

%sql
create table hudi_cow_pt_tbl (
  id bigint,
  name string,
  ts bigint,
  dt string,
  hh string
) using hudi
tblproperties (
  type = 'cow',
  primaryKey = 'id',
  preCombineField = 'ts'
)
partitioned by (dt, hh)
location '/mnt/data/h...

1 More Replies
de-hru
by New Contributor III
  • 10001 Views
  • 4 replies
  • 1 kudos

Resolved! How to add a pre-commit hook to the Git client on a Databricks cluster?

I'd like to add a Git pre-commit hook to the Databricks cluster. This pre-commit hook should be executed when pushing to GitHub. Why would I need a pre-commit hook on a Databricks cluster? My goal is to run blackbricks and format all notebooks automatic...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Dejan Hrubenja, hope all is well! Just wanted to check in to see if you were able to resolve your issue. Would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Tha...

3 More Replies
Kaniz
by Community Manager
  • 2421 Views
  • 5 replies
  • 11 kudos
Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 11 kudos

Thanks, @Kaniz Fatma, for selecting this as the best answer. Keep adding questions so that we can share our views, people can get guidance, and the Databricks community can grow.

4 More Replies
Anonymous
by Not applicable
  • 4074 Views
  • 8 replies
  • 0 kudos

Not able to connect to on-prem Oracle from a Databricks cluster

Hi Everyone, I was trying to connect to an Oracle instance from a Databricks cluster, and it gives the error below: java.sql.SQLTimeoutException: ORA-12170: Cannot connect. TCP connect timeout of 30000ms for host xx.x.x.*** port 1521. (CONNECTION_ID=CgM7V7UB...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Satya89: The error message you received indicates that the TCP connection to the Oracle database timed out. This could be caused by a number of factors such as network issues, firewall restrictions, or the database being overloaded. Here are a few ste...
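
Since the reply points at network issues and firewall restrictions as likely causes, one quick check is whether the cluster can open a TCP connection to the database port at all. The following is only a minimal sketch using Python's standard library, not the steps from the truncated reply above; the function name `can_reach` and the default timeout are assumptions made for illustration.

```python
import socket


def can_reach(host: str, port: int, timeout_s: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout_s.

    Hypothetical helper for illustration: it only tests network
    reachability, not Oracle authentication or listener configuration.
    """
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers timeouts, refused connections, DNS failures,
        # and unreachable hosts.
        return False
```

Run from a notebook cell on the cluster, a call such as `can_reach("<oracle-host>", 1521)` (with the real host substituted) returning False would suggest a routing or firewall problem between the cluster and the on-prem network rather than a JDBC or driver issue.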

7 More Replies
johnb1
by New Contributor III
  • 1414 Views
  • 3 replies
  • 0 kudos

Cluster Configuration for ML Model Training

Hi! I am training a Random Forest (pyspark.ml.classification.RandomForestClassifier) on Databricks with 1,000,000 training examples and 25 features. I employ a cluster with one driver (16 GB Memory, 4 Cores), 2-6 workers (32-96 GB Memory, 8-24 Cores),...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @John B, hope everything is going great. Just wanted to check in to see if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so we can...

2 More Replies
zeta_load
by New Contributor II
  • 765 Views
  • 1 reply
  • 1 kudos

Resolved! Unique ID of table values is not unique anymore after merge every x-times

I have two tables with unique IDs:

ID  val    ID  val
1   10     1   10
2   11     2   10
3   13     ...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

@Lukas Goldschmied: There are a few reasons why you might be experiencing this issue: Data Skew: Data skew is a common problem in distributed computing when one or more nodes in the cluster have more data to process than others. This can lead to long...
