Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

pantelis_mare
by Contributor III
  • 6002 Views
  • 6 replies
  • 2 kudos

Resolved! Delta log statistics - timestamp type not working

Hello team! As per the documentation, I understand that table statistics (e.g. min, max, count) can be fetched from the Delta log so that the underlying data of a Delta table does not have to be read. This is the case for numerical types, and timestamp is sup...
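
For readers unfamiliar with where these statistics live, here is a minimal sketch of reading per-file stats from a Delta commit file. It assumes the usual layout of an `add` action, whose `stats` field is a JSON-encoded string holding `numRecords`, `minValues`, `maxValues` and `nullCount`. The sample line, the column names `id` and `event_ts`, and the helper names are invented for illustration, and the sketch ignores `remove` actions and checkpoints.

```python
import json

# Hypothetical line from a _delta_log/*.json commit file (all values invented).
sample_log_line = json.dumps({
    "add": {
        "path": "part-00000.snappy.parquet",
        "size": 1024,
        "dataChange": True,
        # Per-file statistics are stored as a JSON-encoded string.
        "stats": json.dumps({
            "numRecords": 3,
            "minValues": {"id": 1, "event_ts": "2022-01-01T00:00:00.000Z"},
            "maxValues": {"id": 3, "event_ts": "2022-01-03T00:00:00.000Z"},
            "nullCount": {"id": 0, "event_ts": 0},
        }),
    }
})


def file_stats(log_line):
    """Return the parsed stats dict of an 'add' action, or None if there is none."""
    action = json.loads(log_line)
    add = action.get("add")
    if not add or not add.get("stats"):
        return None
    return json.loads(add["stats"])


def column_max(log_lines, column):
    """Largest maxValues entry for `column` over all add actions that record one."""
    values = []
    for line in log_lines:
        stats = file_stats(line)
        if stats and column in stats.get("maxValues", {}):
            values.append(stats["maxValues"][column])
    return max(values) if values else None
```

If a column has no entry under `maxValues`, `column_max` returns `None`, which is one quick way to check whether statistics were collected for that column at all.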

Latest Reply
-werners-
Esteemed Contributor III
  • 2 kudos

Are you sure the timestamp column is a valid Spark timestamp type?

5 More Replies
niels
by New Contributor III
  • 4132 Views
  • 3 replies
  • 10 kudos

Resolved! Change cluster mid-pipeline

I have a notebook functioning as a pipeline, where multiple notebooks are chained together. The issue I'm facing is that some of the notebooks are Spark-optimized and others aren't, and what I want is to use one cluster for the former and another for the ...

Latest Reply
Prabakar
Databricks Employee
  • 10 kudos

Yes, you can achieve this by setting up two different job clusters. In the screenshot, you can see I have used two job clusters, PipelineTest and pipelinetest2. You can refer to the doc: https://docs.databricks.com/jobs.html#cluster-config-tips
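
As a rough illustration of the same idea outside the UI, the sketch below builds a Jobs API 2.1 style job definition in which each task names its cluster through `job_cluster_key`. The cluster keys come from the reply above; the job name, task keys, notebook paths, runtime version, node types and worker counts are placeholders, not recommendations.

```python
import json


def job_cluster(key, node_type_id, num_workers):
    """Build one entry of the job-level `job_clusters` list."""
    return {
        "job_cluster_key": key,
        "new_cluster": {
            "spark_version": "10.4.x-scala2.12",  # placeholder runtime version
            "node_type_id": node_type_id,         # placeholder node type
            "num_workers": num_workers,
        },
    }


def notebook_task(task_key, cluster_key, notebook_path, depends_on=()):
    """Build one task that runs a notebook on the named job cluster."""
    task = {
        "task_key": task_key,
        "job_cluster_key": cluster_key,
        "notebook_task": {"notebook_path": notebook_path},
    }
    if depends_on:
        task["depends_on"] = [{"task_key": key} for key in depends_on]
    return task


job_spec = {
    "name": "two-cluster-pipeline",  # hypothetical job name
    "job_clusters": [
        job_cluster("PipelineTest", "i3.xlarge", 4),   # larger cluster for Spark-heavy notebooks
        job_cluster("pipelinetest2", "m5.large", 1),   # small cluster for the rest
    ],
    "tasks": [
        notebook_task("spark_step", "PipelineTest", "/Pipelines/spark_step"),
        notebook_task("python_step", "pipelinetest2", "/Pipelines/python_step",
                      depends_on=["spark_step"]),
    ],
}

# JSON body that could be sent to the job-creation endpoint.
payload = json.dumps(job_spec, indent=2)
```

Splitting the chained notebooks into separate tasks is what makes the per-step cluster choice possible; a single notebook that calls the others would run everything on one cluster.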

2 More Replies
ThomasKastl
by Contributor
  • 5496 Views
  • 4 replies
  • 4 kudos

Calling Databricks API from Databricks notebook

A similar question has already been added, but the reply is very confusing to me. Basically, for automated jobs, I want to log the following information from inside a Python notebook that runs in the job: - What is the cluster configuration (most im...
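
One possible approach, sketched under assumptions: call the Clusters API `clusters/get` endpoint with the ID of the cluster the notebook runs on. The host, token and cluster ID below are placeholders, and the function name is hypothetical. Inside a notebook the cluster ID is commonly read from the Spark conf key `spark.databricks.clusterUsageTags.clusterId`, which is treated here as an assumption rather than a documented guarantee.

```python
import urllib.parse
import urllib.request


def build_cluster_get_request(host, token, cluster_id):
    """Prepare (but do not send) a GET request for /api/2.0/clusters/get."""
    query = urllib.parse.urlencode({"cluster_id": cluster_id})
    url = f"{host.rstrip('/')}/api/2.0/clusters/get?{query}"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


# Placeholder values; in a real job these would come from a secret scope and
# from the Spark conf key mentioned above (assumption).
req = build_cluster_get_request(
    "https://example.cloud.databricks.com",
    "dapi-EXAMPLE",
    "0123-456789-abcde",
)

# To actually send it:
#   import json
#   with urllib.request.urlopen(req) as resp:
#       cluster_info = json.load(resp)
```

The returned JSON describes the cluster (node types, worker counts and so on), which could then be written to whatever log the job already uses.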

Latest Reply
jose_gonzalez
Databricks Employee
  • 4 kudos

Hi @Thomas Kastl, just a friendly follow-up. Did any of the responses help you resolve your question? If so, please mark it as best. Otherwise, please let us know if you still need help.

3 More Replies
shawncao
by New Contributor II
  • 2079 Views
  • 3 replies
  • 2 kudos

Best practice for using the Databricks API

Hello, I'm building a Databricks connector to allow users to issue commands/SQL from a web app. In general, I think the REST API is okay to work with, though it's pretty tedious to write wrapper code for each API call. [Q1] Is there an official (or semi-off...

Latest Reply
shawncao
New Contributor II
  • 2 kudos

I don't know if I fully understand DBX. It sounds like a job client to manage jobs and deployment, and I don't see NodeJS support for this project yet. My question was about how to "stream" query results back from Databricks in a NodeJS application, curr...

2 More Replies
rgrosskopf
by New Contributor II
  • 5459 Views
  • 2 replies
  • 1 kudos

How to access secrets in Hashicorp Vault from Databricks notebooks?

I see in this blog post that Databricks supports Hashicorp Vault for secrets storage but I've been unable to find any additional details on how that would work. Specifically, how would I authenticate to Vault from within a Databricks notebook?

Latest Reply
jose_gonzalez
Databricks Employee
  • 1 kudos

Hi @Ryan Grosskopf, just a friendly follow-up. Did any of Prabakar's responses help you resolve your question? If so, please mark it as best. Otherwise, please let us know if you still need help.

1 More Replies
KamKam
by New Contributor
  • 1324 Views
  • 2 replies
  • 0 kudos

How to write to a folder in an Azure Data Lake container using Delta?

Hi all, how do I write to a folder in an Azure Data Lake container using Delta? When I run:
write_mode = 'overwrite'
write_format = 'delta'
save_path = '/mnt/container-name/folder-name'
df.write \
  .mode(write_mode) \
  .format(write_format) \
  ....

Latest Reply
jose_gonzalez
Databricks Employee
  • 0 kudos

Hi @Kamalen Reddy, could you share the error message, please?

1 More Replies
dududu
by New Contributor II
  • 1245 Views
  • 1 reply
  • 0 kudos

How to explain the huge latency between two jobs, and how to optimize the job to reduce it?

I have run into a problem, which you can see in the picture below: there is a long delay between some jobs. I don't understand what happened or how to optimize the job. Can anybody help me? Thanks a lot.

Latest Reply
jose_gonzalez
Databricks Employee
  • 0 kudos

Hi @jieping zhang, did you check the driver's logs? Do you see any error messages? Please provide more details.

wyzer
by Contributor II
  • 5031 Views
  • 8 replies
  • 4 kudos

Unable to read an XML file of 9 GB

Hello, we have a large XML file (9 GB) that we can't read. We get this error: VM size limit. How can we change the VM size limit? We have tested many clusters, but none of them can read this file. Thank you for your help.

Latest Reply
jose_gonzalez
Databricks Employee
  • 4 kudos

Hi @Salah K., just a friendly follow-up. Did any of the responses help you resolve your question? If so, please mark it as best. Otherwise, please let us know if you still need help.

7 More Replies
MarcJustice
by New Contributor
  • 1626 Views
  • 2 replies
  • 3 kudos

Is the promise of a data lake simply about data science, data analytics and data quality, or can it also be an integral part of core transaction processing?

Upfront, I want to let you know that I'm not a veteran data jockey, so I apologize if this topic has been covered already or is simply too basic or narrow for this community. That said, I do need help so please feel free to point me in another direc...

Latest Reply
Aashita
Databricks Employee
  • 3 kudos

@Marc Barnett, Databricks' Lakehouse architecture is the ideal data architecture for data-driven organizations. It combines the best qualities of data warehouses and data lakes to provide a single solution for all major data workloads and supports ...

1 More Replies
celerity12
by New Contributor II
  • 6015 Views
  • 7 replies
  • 4 kudos

Pulling list of running jobs using JOBS API 2.1

I need to find all jobs that are currently running, without getting the other jobs. The command below fetches all the jobs: curl --location --request GET 'https://xxxxxx.gcp.databricks.com/api/2.1/jobs/list?active_only=true&expand_tasks=true&run_type=JOB_RUN...

Latest Reply
User16764241763
Honored Contributor
  • 4 kudos

Hi @Sumit Rohatgi, it seems that active_only=true only applies to the jobs/runs/list API and not to jobs/list. Can you please try the jobs/runs/list API?
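
A minimal sketch of that suggestion, assuming the response carries a `runs` array whose entries include `job_id`. The helper names are hypothetical and the request itself is left to whatever HTTP client and authentication you already use.

```python
from urllib.parse import urlencode


def active_runs_url(host):
    """URL for jobs/runs/list restricted to currently active runs."""
    params = {"active_only": "true"}
    return f"{host.rstrip('/')}/api/2.1/jobs/runs/list?{urlencode(params)}"


def running_job_ids(response):
    """Distinct job IDs that have at least one run in the given response body."""
    return sorted({run["job_id"] for run in response.get("runs", [])})
```

For example, `active_runs_url("https://xxxxxx.gcp.databricks.com")` yields the runs/list URL with `active_only=true`, and passing the decoded JSON body to `running_job_ids` collapses several active runs of the same job into a single job ID.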

6 More Replies
C_1
by New Contributor III
  • 4850 Views
  • 5 replies
  • 4 kudos

Resolved! Databricks notebook command logging

Hello Community, I am trying to find a Databricks notebook command logging feature for compliance purposes. My requirement is to log the exact Spark SQL fired by the user. I didn't find Spark SQL (notebook commands) tracked under these Azure diagnostic logs....

Latest Reply
Noopur_Nigam
Databricks Employee
  • 4 kudos

Hi @C P, we don't have this feature implemented; however, there is already an existing idea in our idea portal (https://databricks.aha.io/features/DB-7583). You can check it and vote for it.

4 More Replies
CHANDY
by New Contributor
  • 1393 Views
  • 1 reply
  • 0 kudos

Real Time data processing

Say I am getting a customer record from a website. I want to read the message and then insert/update it into a Snowflake table. Depending on whether the insert/update is successful, I need to respond with a success/failure message within, say, 1 sec. ...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hey @CHANDAN NANDY, just checking in with you. Does @Kaniz Fatma's answer help? If it does, would you be happy to mark it as best? If it doesn't, please tell us so we can help you further. Thanks!

JohnB
by New Contributor II
  • 3016 Views
  • 1 reply
  • 1 kudos

Are there implications of moving a Managed Table and mounting it as External?

The scenario is: "A substantial amount of data needs to be moved from a legacy Databricks that has Managed Tables to a new E2 Databricks. The new bucket will be a dedicated data lake rather than the workspace bucket, so they will be External Tables." U...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hey there @John Brandborg, hope everything is going great! Just wanted to check in: if you were able to resolve your issue, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to ...

Ravi96
by New Contributor II
  • 3878 Views
  • 4 replies
  • 5 kudos

How can we sort out the timeout issue in Databricks?

We are creating a denormalized table based on a JSON ingestion, but a complex table is getting generated. When we try to flatten the JSON rows, it takes more than 5 hours and the error message is a timeout error. Is there any way that we could resolv...

Latest Reply
Anonymous
Not applicable
  • 5 kudos

Hey @Raviteja Paluri, hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. Thanks!

3 More Replies
ta_db
by New Contributor
  • 1766 Views
  • 1 reply
  • 0 kudos

Databricks SQL Endpoint Failing to create an external table on a parquet file with Decimal or Timestamp datatype

I'm using the Databricks SQL Endpoint and I'm attempting to create an external table on top of an existing parquet file. I can do this so long as my table definition does not include a reference to a decimal or timestamp/date datatype. E.g., this works: C...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hey there @T A, hope everything is going great! Does @Kaniz Fatma's response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? If not, would you be happy to give us more info...

