Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Deloitte_DS
by Databricks Partner
  • 11260 Views
  • 5 replies
  • 1 kudos

Resolved! Unable to install poppler-utils

Hi, I'm trying to install the system-level package "poppler-utils" on the cluster. I added the following line to the init.sh script: sudo apt-get -f -y install poppler-utils. I got the following error: PDFInfoNotInstalledError: Unable to get page count. Is ...

Latest Reply
Raghavan93513
Databricks Employee
  • 1 kudos

Hi Team, if you use a single-user cluster with the init script below, it will work:
sudo rm -r /var/lib/apt/lists/*
sudo apt clean && sudo apt update --fix-missing -y
sudo apt-get install poppler-utils tesseract-ocr -y
But if you are using a shared...

4 More Replies
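
For readers hitting the same error: PDFInfoNotInstalledError is raised by the pdf2image Python package when it cannot find Poppler's pdfinfo binary on the PATH. The excerpt does not say which library is in use, so treat that as an assumption. A minimal sketch for checking from a notebook whether the init script actually put the Poppler binaries in place (the helper name poppler_status is hypothetical):

```python
import shutil


def poppler_status(binaries=("pdfinfo", "pdftoppm")):
    """Map each Poppler binary name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in binaries}


status = poppler_status()
missing = [name for name, path in status.items() if path is None]
if missing:
    # The init script did not install Poppler on this node, or PATH differs.
    print("Missing Poppler binaries:", ", ".join(missing))
else:
    print("Poppler binaries found:", status)
```

This only inspects the node the code runs on (the driver, in a notebook); it says nothing about worker nodes.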
vvk
by New Contributor II
  • 6790 Views
  • 2 replies
  • 0 kudos

Unable to upload a wheel file in Azure DevOps pipeline

Hi, I am trying to upload a wheel file to a Databricks workspace using an Azure DevOps release pipeline so that I can use it on an interactive cluster. I tried the "databricks workspace import" command, but it looks like it does not support .whl files. Hence, I tried to u...

Latest Reply
Satyadeepak
Databricks Employee
  • 0 kudos

Hi @vvk - The HTTP 403 error typically indicates a permissions issue. Ensure that the service principal (SP) has the necessary permissions to perform the fs cp operation on the specified path. Verify that the path specified in the fs cp command is correct and that the v...

1 More Replies
stvayers
by New Contributor
  • 7049 Views
  • 1 reply
  • 0 kudos

How to mount AWS EFS via NFS on a Databricks Cluster

I'm trying to read ~500 million small JSON files into a Spark Auto Loader pipeline, and I seem to be slowed down massively by S3 request limits, so I want to explore using AWS EFS instead. I found this blog post: https://www.databricks.com/blog/20...

Latest Reply
Satyadeepak
Databricks Employee
  • 0 kudos

Hi @stvayers, please refer to this doc: https://docs.databricks.com/api/workspace/clusters/create It has instructions on how to mount using EFS.

Bepposbeste1993
by New Contributor III
  • 2818 Views
  • 4 replies
  • 0 kudos

Resolved! select 1 query not finishing

Hello, I have an issue where even a query like "select 1" does not finish. The SQL warehouse runs indefinitely. I have no idea where to look because I can't see any error in the Spark UI. What is interesting is that all-purpose clusters also (...

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hi @Bepposbeste1993, do you have the ID of the case raised for this issue?

3 More Replies
cmilligan
by Contributor II
  • 5130 Views
  • 4 replies
  • 0 kudos

Undescriptive error when trying to insert overwrite into a table

I have a query whose results I'm trying to insert overwrite into a table. In an effort to speed up the query, I added a range join hint. After adding it, I started getting the error below. I can get around this, though, by creating a temporary view of the ...

[Image: Screenshot_20230118_104626]
Latest Reply
jose_gonzalez
Databricks Employee
  • 0 kudos

Could you share your code and the full error stack trace please? Check the driver logs for the full stack trace.

3 More Replies
pranitha
by New Contributor II
  • 1055 Views
  • 3 replies
  • 0 kudos
Latest Reply
MadhuB
Valued Contributor
  • 0 kudos

Hi @pranitha, use this query to get the cluster details along with cost info:
WITH hourly_metrics AS (
  SELECT
    date_trunc('hour', usage_start_time) as hour,
    usage_metadata.cluster_id,
    sku_name,
    MAX(usage_quantity) as max_usage,
    ...

2 More Replies
abelian-grape
by New Contributor III
  • 1392 Views
  • 1 reply
  • 0 kudos

Near-real-time processing with CDC from Snowflake to Databricks

Hi, I would like to configure near-real-time streaming on Databricks to process data as soon as new data finishes processing on Snowflake, e.g. with DLT pipelines and Auto Loader. Which option would be better for this setup? Option A) Export the Snowpark...

Latest Reply
saurabh18cs
Honored Contributor III
  • 0 kudos

It is a trade-off between latency on one side and complexity and cost on the other. You have to choose for yourself; to me, option A sounds reasonable.

Sans
by New Contributor III
  • 6028 Views
  • 9 replies
  • 3 kudos

Unable to create new compute in community databricks

Hi Team, I am unable to create compute in Databricks Community due to the error below. Please advise. Bootstrap Timeout: Node daemon ping timeout in 780000 ms for instance i-0ab6798b2c762fb25 @ 10.172.246.217. Please check network connectivity between the ...

Latest Reply
drag7ter
Contributor
  • 3 kudos

Same here: I get this error regularly in an eu-west-1 workspace. So many issues. Has Databricks looked into this, as it could be a bug? No response so far?

8 More Replies
jyothib
by New Contributor II
  • 2952 Views
  • 2 replies
  • 3 kudos

Resolved! System tables latency

What is the latency of system tables? #unitycatalog

Latest Reply
raphaelblg
Databricks Employee
  • 3 kudos

@jyothib at the moment, system tables are still in Public Preview (more details at: https://docs.databricks.com/en/admin/system-tables/index.html). We don’t offer data freshness SLOs for system tables at this point and there are no pla...

1 More Replies
Kanna
by New Contributor II
  • 2347 Views
  • 1 reply
  • 4 kudos

Resolved! Autoloader clarification

Hi team, good day! I would like to know how we can perform an incremental load using Autoloader. I am uploading one file to DBFS and writing it into a table. When I upload a similar file to the same directory, it does not perform an incremental load; i...

Latest Reply
boitumelodikoko
Databricks Partner
  • 4 kudos

Hi @Kanna, good day! Based on the issue you’re encountering, I believe the problem stems from missing deduplication or upsert logic in your current implementation. Here's an approach that combines the power of Databricks Autoloader and Delta Lake to h...

harlemmuniz
by New Contributor II
  • 4880 Views
  • 8 replies
  • 1 kudos

Issue with Job Versioning with “Run Job” tasks and Deployments between environments

Hello, I am writing to bring to your attention an issue that we have encountered while working with Databricks, and to seek your assistance in resolving it. When running a Workflows job with a "Run Job" task and clicking on "View YAML/JSON," we have ob...

Latest Reply
saurabh18cs
Honored Contributor III
  • 1 kudos

Hi, sorry if I don't understand your use case. Are you trying to start/stop a Databricks job via Terraform? Is that why you want to hardcode the job ID?

7 More Replies
Kumar4567
by New Contributor II
  • 8232 Views
  • 4 replies
  • 0 kudos

Disable downloading files for a specific group of users?

I see we can disable/enable the download button for the entire workspace using the "download button for notebook results" setting. Is there a way to disable/enable this just for a specific group of users?

Latest Reply
_anonymous
New Contributor II
  • 0 kudos

To future adventurers: the feature described by the responder to the OP does not exist.

3 More Replies
MauricioS
by Databricks Partner
  • 2210 Views
  • 3 replies
  • 2 kudos

Delta Live Tables - Dynamic Target Schema

Hi all, I have a requirement where I need to migrate a few jobs from standard Databricks notebooks that are orchestrated by Azure Data Factory to DLT pipelines; pretty straightforward so far. The tricky part is that the data tables in the catalog are...

[Image: image.png]
Latest Reply
fmadeiro
Contributor II
  • 2 kudos

@MauricioS Great question! Databricks Delta Live Tables (DLT) pipelines are very flexible, but by default, the target schema specified in the pipeline configuration (such as target or schema) is fixed. That said, you can implement strategies to enable...

2 More Replies
Jfoxyyc
by Valued Contributor
  • 7740 Views
  • 6 replies
  • 2 kudos

Is there a way to catch the cancel button or the interrupt button in a Databricks notebook?

I'm using the oracledb package, which uses sessions. When you cancel a running query, it doesn't close the session even if you have a try/catch block, because a cancel or interrupt issues a kill command on the process. Is there a method to catch the canc...

Latest Reply
gustavo_woiler
New Contributor II
  • 2 kudos

I was having the same issue and I think I was finally able to solve it! When you simply catch the KeyboardInterrupt signal and do not re-raise it, the notebook gets into an endless cycle of "interrupting..." and never does anything. However, ...

5 More Replies
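
The reply above is cut off, so what follows is not the poster's code. It is a minimal sketch of the general pattern the visible part points at: clean up the session when a KeyboardInterrupt arrives, but re-raise rather than swallow it. It assumes the notebook's cancel is delivered to Python as KeyboardInterrupt, as the reply describes. FakeSession and run_with_cleanup are hypothetical names, and FakeSession stands in for a real database session; it is not the oracledb API.

```python
class FakeSession:
    """Hypothetical stand-in for a database session (not the oracledb API)."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


def run_with_cleanup(session, work):
    """Run work(session), always close the session, and let interrupts propagate."""
    try:
        return work(session)
    except KeyboardInterrupt:
        # Interrupt-specific handling could go here; re-raise so the caller
        # (for example, the notebook runtime) still sees the interrupt.
        raise
    finally:
        # Runs on success, on error, and on interrupt.
        session.close()


def interrupted_work(session):
    # Simulates the user pressing cancel while a query is running.
    raise KeyboardInterrupt


session = FakeSession()
try:
    run_with_cleanup(session, interrupted_work)
except KeyboardInterrupt:
    print("interrupted; session closed:", session.closed)
```

The finally clause does the cleanup. The explicit except/raise is there only to show where interrupt-specific logic would sit without suppressing the signal.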
pranitha
by New Contributor II
  • 1232 Views
  • 3 replies
  • 0 kudos

instance_id in compute.node_timelines

I am trying to fetch active worker nodes from system tables using code like the following:
select count(distinct instance_id)
from system.compute.node_timelines
where cluster_id = "xx"
group by instance_id, start_time, end_times
It gives an output like 20 but...

Latest Reply
pranitha
New Contributor II
  • 0 kudos

Hi @Alberto_Umana, thanks for replying. Even if we add the driver node, it should be around 16-17, right, not 20. I checked all the clusters; for every cluster there is a difference of 5-7 nodes between the max_worker count and count(distinct insta...

2 More Replies
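
The thread excerpt does not state the cause of the gap. One possibility, offered only as an assumption, is that a distinct count over a whole time window includes every instance that ever belonged to the cluster, including nodes that were later replaced, while the number of nodes running at any one moment is smaller. The sketch below illustrates that effect with a toy in-memory SQLite table. The table, its columns, and its rows are made up for the example and do not reproduce the real system table's schema or data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE node_timeline "
    "(cluster_id TEXT, instance_id TEXT, start_time TEXT, end_time TEXT)"
)
rows = [
    # Interval 1: three nodes are running.
    ("xx", "i-a", "10:00", "10:01"),
    ("xx", "i-b", "10:00", "10:01"),
    ("xx", "i-c", "10:00", "10:01"),
    # Interval 2: i-c has been replaced by i-d; still three nodes running.
    ("xx", "i-a", "10:01", "10:02"),
    ("xx", "i-b", "10:01", "10:02"),
    ("xx", "i-d", "10:01", "10:02"),
]
conn.executemany("INSERT INTO node_timeline VALUES (?, ?, ?, ?)", rows)

# Distinct instances over the whole window: counts replaced nodes too.
total_distinct = conn.execute(
    "SELECT COUNT(DISTINCT instance_id) FROM node_timeline "
    "WHERE cluster_id = 'xx'"
).fetchone()[0]

# Distinct instances per interval: the concurrent node count.
per_interval = conn.execute(
    "SELECT start_time, COUNT(DISTINCT instance_id) FROM node_timeline "
    "WHERE cluster_id = 'xx' GROUP BY start_time ORDER BY start_time"
).fetchall()

print(total_distinct)  # 4
print(per_interval)    # [('10:00', 3), ('10:01', 3)]
```

Grouping by the time interval and then taking the maximum of the per-interval counts gives a peak concurrent node count, which is the figure that can be compared against a cluster's configured maximum.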