Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

devashishraverk
by New Contributor II
  • 2473 Views
  • 2 replies
  • 2 kudos

Not able to create SQL Endpoint in Databricks SQL (Databricks 14-day free trial)

Hi, I am not able to create a SQL Endpoint; I am getting the error below. I have selected cluster size 2X-Small on the Azure platform: Clusters are failing to launch. Cluster launch will be retried. Details for the latest failure: Error: Error code: PublicIPCountLi...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hey there @Devashish Raverkar, hope all is well! Just wanted to check in: were you able to resolve your issue? If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear f...

1 More Replies
Direo
by Contributor II
  • 10982 Views
  • 3 replies
  • 2 kudos

Default indentation for Python has changed after migration to the new workspace

In our old workspace the default indentation was 2 spaces; in the new one it has changed to 4 spaces. You can manually change the setting back to 2 spaces, but it does not take effect. Does anyone know how to solve this issue?

Latest Reply
ranged_coop
Valued Contributor II
  • 2 kudos

You do have that option under Settings --> User Settings (or Admin Settings? Not sure - I don't have admin access) --> Notebook Settings --> Default indentation for Python cells (in spaces). This will change the indentation for newer cells, but existing one...

2 More Replies
Michael_Galli
by Contributor III
  • 4076 Views
  • 4 replies
  • 2 kudos

Resolved! Unittest in PySpark - how to read XML with Maven com.databricks.spark.xml?

When writing unit tests with unittest / pytest in PySpark, reading mock datasources with built-in datatypes like CSV and JSON (spark.read.format("json")) works just fine. But when reading XMLs with spark.read.format("com.databricks.spark.xml") in the ...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 2 kudos

Please install spark-xml from Maven. Since it comes from Maven, you need to install it on the cluster you are using via the cluster settings (alternatively via the API or CLI): https://mvnrepository.com/artifact/com.databricks/spark-xml
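For local unittest/pytest runs, a minimal sketch of wiring that Maven package into a test session; the Scala suffix, package version, row tag, and fixture path below are assumptions to adapt to your setup:

    # Sketch: a local SparkSession for unit tests that pulls spark-xml from Maven.
    # Coordinates assume Scala 2.12; match them to your cluster/runtime.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[2]")
        .appName("xml-unit-test")
        # Resolved and downloaded from Maven when the session starts.
        .config("spark.jars.packages", "com.databricks:spark-xml_2.12:0.15.0")
        .getOrCreate()
    )

    df = (
        spark.read.format("com.databricks.spark.xml")
        .option("rowTag", "record")  # hypothetical row tag for the mock XML file
        .load("tests/fixtures/sample.xml")
    )
    assert df.count() > 0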

3 More Replies
mj2022
by New Contributor III
  • 3219 Views
  • 2 replies
  • 2 kudos

Spark Streaming with SASL_SSL Kafka throwing java.nio.file.NoSuchFileException: dbfs:/mnt/**/kafka.client.truststore.imported.jks

I am testing Spark Streaming with a SASL_SSL-enabled Kafka broker in a notebook. As per this guide https://docs.databricks.com/spark/latest/structured-streaming/kafka.html I have copied the JKS files to an S3 bucket and mounted it in DBFS. In the notebook wh...

Latest Reply
mj2022
New Contributor III
  • 2 kudos

Thanks. Yes, the '/dbfs/mnt/xxxx/kafka.client.truststore.imported.jks' path worked. Another workaround that got it working for us: copy the file from S3 to the local filesystem using an init script and use that file path.
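A minimal sketch of that working configuration (broker, topic, and secret scope/key names are placeholders), using the POSIX-style /dbfs path so the Kafka client's plain JVM file access can open the truststore:

    # Sketch: Structured Streaming from a SASL_SSL Kafka broker on Databricks.
    # Note the /dbfs/... local path rather than the dbfs:/ URI, which the Kafka
    # client cannot open as a regular file.
    df = (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker1.example.com:9093")  # placeholder
        .option("subscribe", "my-topic")                                # placeholder
        .option("kafka.security.protocol", "SASL_SSL")
        .option("kafka.ssl.truststore.location",
                "/dbfs/mnt/xxxx/kafka.client.truststore.imported.jks")
        .option("kafka.ssl.truststore.password",
                dbutils.secrets.get(scope="kafka", key="truststore-pwd"))  # placeholder scope/key
        .load()
    )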

1 More Replies
whatthespark
by New Contributor II
  • 5997 Views
  • 4 replies
  • 1 kudos

Inconsistent duplicated row with Spark (Databricks on MS Azure)

I'm seeing weird behavior with Apache Spark, which I run in a Python notebook on Azure Databricks. I have a dataframe with some data, with two columns of interest: name and ftime. I found that I sometimes have duplicated values, sometimes not, depending ...

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

I would like to see how you create the df dataframe. In PySpark you can get weird results if you do not clear state, or when you reuse dataframe names.
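One way to illustrate why the result can differ between runs (a sketch; the source path and columns are hypothetical): a DataFrame is recomputed on every action, so any nondeterminism upstream can return different rows from one action to the next, and persisting pins down a single materialization:

    # Sketch: cache the result so repeated actions reuse one materialization
    # instead of recomputing a potentially nondeterministic lineage.
    df = spark.read.parquet("/mnt/source/data")           # hypothetical source
    result = df.dropDuplicates(["name", "ftime"]).cache()
    result.count()    # forces materialization into the cache
    result.display()  # later actions read the cached data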

3 More Replies
jwilliam
by Contributor
  • 3670 Views
  • 3 replies
  • 2 kudos

Resolved! Cannot use Web Terminal when creating cluster with Custom container.

I followed this guide to create a cluster with a custom container: https://docs.databricks.com/clusters/custom-containers.html However, once the cluster was created, I couldn't access the web terminal. It resulted in a 502 Bad Gateway.

Latest Reply
Ravi
Databricks Employee
  • 2 kudos

This is a limitation at the moment. Enabling Docker Container Services disables the web terminal: https://docs.databricks.com/clusters/web-terminal.html#limitations

2 More Replies
pantelis_mare
by Contributor III
  • 7132 Views
  • 6 replies
  • 2 kudos

Resolved! Delta log statistics - timestamp type not working

Hello team! As per the documentation, I understand that table statistics (e.g. min, max, count) can be fetched from the Delta log in order to avoid reading the underlying data of a Delta table. This is the case for numerical types, and timestamp is sup...

Latest Reply
-werners-
Esteemed Contributor III
  • 2 kudos

Are you sure the timestamp column is a valid Spark timestamp type?
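A quick way to check (a sketch; the table and column names are hypothetical):

    # Sketch: verify the column is a real TimestampType rather than a string,
    # and cast it before writing if it is not.
    from pyspark.sql import functions as F
    from pyspark.sql.types import TimestampType

    df = spark.read.table("my_schema.my_table")   # hypothetical table
    print(df.schema["event_time"].dataType)       # expect TimestampType()

    # If it prints StringType(), cast so Delta can collect min/max stats:
    df = df.withColumn("event_time", F.col("event_time").cast(TimestampType()))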

5 More Replies
niels
by New Contributor III
  • 4865 Views
  • 3 replies
  • 10 kudos

Resolved! Change cluster mid-pipeline

I have a notebook functioning as a pipeline, where multiple notebooks are chained together. The issue I'm facing is that some of the notebooks are Spark-optimized and others aren't, and what I want is to use one cluster for the former and another for the ...

Latest Reply
Prabakar
Databricks Employee
  • 10 kudos

Yes, you can achieve this by setting up two different job clusters. In the screenshot, you can see I have used two job clusters, PipelineTest and pipelinetest2. You can refer to the doc https://docs.databricks.com/jobs.html#cluster-config-tips
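A sketch of what those job settings look like as a Jobs API 2.1 payload (the spark versions, node types, and notebook paths are placeholder values):

    # Sketch: one job, two job clusters; each task selects the cluster it needs.
    # POST this to /api/2.1/jobs/create, or configure the same thing in the Jobs UI.
    job_settings = {
        "name": "multi-cluster-pipeline",
        "job_clusters": [
            {
                "job_cluster_key": "general",
                "new_cluster": {
                    "spark_version": "11.3.x-scala2.12",  # placeholder
                    "node_type_id": "Standard_DS3_v2",    # placeholder
                    "num_workers": 2,
                },
            },
            {
                "job_cluster_key": "spark_optimized",
                "new_cluster": {
                    "spark_version": "11.3.x-scala2.12",  # placeholder
                    "node_type_id": "Standard_DS5_v2",    # placeholder
                    "num_workers": 8,
                },
            },
        ],
        "tasks": [
            {
                "task_key": "prepare",
                "job_cluster_key": "general",
                "notebook_task": {"notebook_path": "/pipeline/prepare"},    # placeholder
            },
            {
                "task_key": "transform",
                "depends_on": [{"task_key": "prepare"}],
                "job_cluster_key": "spark_optimized",
                "notebook_task": {"notebook_path": "/pipeline/transform"},  # placeholder
            },
        ],
    }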

2 More Replies
ThomasKastl
by Contributor
  • 6215 Views
  • 4 replies
  • 4 kudos

Calling Databricks API from Databricks notebook

A similar question has already been added, but the reply is very confusing to me. Basically, for automated jobs, I want to log the following information from inside a Python notebook that runs in the job: - What is the cluster configuration (most im...

Latest Reply
jose_gonzalez
Databricks Employee
  • 4 kudos

Hi @Thomas Kastl, just a friendly follow-up. Did any of the responses help you resolve your question? If so, please mark it as best. Otherwise, please let us know if you still need help.
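One common, though unofficial, pattern for the original question, sketched below: reuse the running notebook's own API URL and token from the notebook context, then call the Clusters API. The context accessors are internal and may change between releases:

    # Sketch: query the REST API from inside a job's notebook to log the
    # cluster configuration. Uses internal context accessors (unofficial).
    import requests

    ctx = dbutils.notebook.entry_point.getDbutils().notebook().getContext()
    host = ctx.apiUrl().get()
    token = ctx.apiToken().get()
    cluster_id = spark.conf.get("spark.databricks.clusterUsageTags.clusterId")

    resp = requests.get(
        f"{host}/api/2.0/clusters/get",
        headers={"Authorization": f"Bearer {token}"},
        params={"cluster_id": cluster_id},
    )
    resp.raise_for_status()
    conf = resp.json()
    print(conf["spark_version"], conf["node_type_id"])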

3 More Replies
shawncao
by New Contributor II
  • 2483 Views
  • 3 replies
  • 2 kudos

Best practice of using the Databricks API

Hello, I'm building a Databricks connector to allow users to issue commands/SQL from a web app. In general, I think the REST API is okay to work with, though it's pretty tedious to write wrapper code for each API call. [Q1] Is there an official (or semi-off...

Latest Reply
shawncao
New Contributor II
  • 2 kudos

I don't know if I fully understand DBX; it sounds like a job client for managing jobs and deployments, and I don't see NodeJS support in this project yet. My question was about how to "stream" query results back from Databricks in a NodeJS application, curr...

2 More Replies
rgrosskopf
by New Contributor II
  • 6504 Views
  • 2 replies
  • 1 kudos

How to access secrets in Hashicorp Vault from Databricks notebooks?

I see in this blog post that Databricks supports Hashicorp Vault for secrets storage but I've been unable to find any additional details on how that would work. Specifically, how would I authenticate to Vault from within a Databricks notebook?

Latest Reply
jose_gonzalez
Databricks Employee
  • 1 kudos

Hi @Ryan Grosskopf, just a friendly follow-up. Did any of Prabakar's responses help you resolve your question? If so, please mark it as best. Otherwise, please let us know if you still need help.
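One possible approach to the original question, sketched under the assumption that the workspace can reach your Vault server and that a Vault token is kept in a Databricks secret scope (the URL, scope, key, and secret path below are all hypothetical):

    # Sketch: read a secret from HashiCorp Vault with the hvac client.
    # The Vault token is itself kept in a Databricks secret scope so it
    # never appears in the notebook.
    import hvac

    client = hvac.Client(
        url="https://vault.example.com:8200",  # hypothetical Vault address
        token=dbutils.secrets.get(scope="bootstrap", key="vault-token"),
    )
    assert client.is_authenticated()
    secret = client.secrets.kv.v2.read_secret_version(path="databricks/app")
    creds = secret["data"]["data"]  # the stored key/value pairs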

1 More Replies
KamKam
by New Contributor
  • 1658 Views
  • 2 replies
  • 0 kudos

How to write to a folder in an Azure Data Lake container using Delta?

Hi all, how do I write to a folder in an Azure Data Lake container using Delta? When I run:

    write_mode = 'overwrite'
    write_format = 'delta'
    save_path = '/mnt/container-name/folder-name'

    df.write \
        .mode(write_mode) \
        .format(write_format) \
        ....

Latest Reply
jose_gonzalez
Databricks Employee
  • 0 kudos

Hi @Kamalen Reddy, could you share the error message, please?
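In the meantime, a complete version of that write as a sketch (the mount path is the placeholder from the original post, and it assumes the container is already mounted):

    # Sketch: write a DataFrame as Delta into a folder of a mounted ADLS container.
    write_mode = "overwrite"
    write_format = "delta"
    save_path = "/mnt/container-name/folder-name"  # placeholder mount path

    (df.write
       .mode(write_mode)
       .format(write_format)
       .save(save_path))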

1 More Replies
dududu
by New Contributor II
  • 1542 Views
  • 1 replies
  • 0 kudos

How to explain the huge time latency between two jobs? How to optimize the job to reduce the latency?

I have run into a problem, which you can see in the attached picture: there is a long delay between some jobs. I don't understand what happened or how to optimize the job. Can anybody help me? Thanks a lot.

Latest Reply
jose_gonzalez
Databricks Employee
  • 0 kudos

Hi @jieping zhang, did you check the driver's logs? Do you see any error messages? Please provide more details.

wyzer
by Contributor II
  • 5917 Views
  • 8 replies
  • 4 kudos

Unable to read an XML file of 9 GB

Hello, we have a large XML file (9 GB) that we can't read. We get this error: VM size limit. But how can we change the VM size limit? We have tested many clusters, but none of them can read this file. Thank you for your help.

Latest Reply
jose_gonzalez
Databricks Employee
  • 4 kudos

Hi @Salah K., just a friendly follow-up. Did any of the responses help you resolve your question? If so, please mark it as best. Otherwise, please let us know if you still need help.

7 More Replies
MarcJustice
by New Contributor
  • 1897 Views
  • 2 replies
  • 3 kudos

Is the promise of a data lake simply about data science, data analytics and data quality, or can it also be an integral part of core transaction processing?

Upfront, I want to let you know that I'm not a veteran data jockey, so I apologize if this topic has been covered already or is simply too basic or narrow for this community. That said, I do need help, so please feel free to point me in another direc...

Latest Reply
Aashita
Databricks Employee
  • 3 kudos

@Marc Barnett, Databricks' Lakehouse architecture is the ideal data architecture for data-driven organizations. It combines the best qualities of data warehouses and data lakes to provide a single solution for all major data workloads and supports ...

1 More Replies
