Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Kanna1706
by New Contributor III
  • 3315 Views
  • 3 replies
  • 0 kudos

DBFS option

I can't find the DBFS option in my free Databricks Community Edition when I try to see the location of a table.

Latest Reply
gchandra
Databricks Employee
  • 0 kudos

It's fixed. You can continue to use Upload.

2 More Replies
Maverick11
by New Contributor
  • 535 Views
  • 1 reply
  • 0 kudos

FROM databricksruntime/standard:11.3-LTS as application

The command below was working for the 11.3-LTS base image until yesterday. It started failing today (17 Oct 2024, 11 AM IST): FROM databricksruntime/standard:11.3-LTS as application, followed by RUN apt-get install -y python3-venv. It throws the error: Reading package lis...

Latest Reply
gchandra
Databricks Employee
  • 0 kudos

Any specific reason you are trying to containerize this? Can you try adding an apt-get update before installing python3-venv: RUN apt-get update && apt-get install -y python3-venv
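A minimal sketch of that fix as a Dockerfile, using the base image from the original post; the package index baked into the image goes stale over time, so it has to be refreshed before installing:

```dockerfile
# Base image from the original post.
FROM databricksruntime/standard:11.3-LTS AS application

# Refresh the stale package index before installing; without the update,
# apt-get install can fail once the cached package lists expire upstream.
RUN apt-get update && \
    apt-get install -y python3-venv
```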

GodSpeed
by New Contributor
  • 626 Views
  • 1 reply
  • 0 kudos

Jenkins Alternatives for Data Pipeline Automation?

I’ve been managing data pipelines with Jenkins but would like to explore other options that might work better for data-centric projects. Has anyone tried GitLab CI or Azure DevOps for similar use cases? Looking for insights into what might offer bett...

Latest Reply
Stefan-Koch
Valued Contributor II
  • 0 kudos

You can use GitHub Actions or Azure DevOps as an alternative. I use Azure DevOps Pipelines for my projects and have had very good experiences. You can find instructions on how to do the whole thing with GitHub Actions, for example, in the official docu...

Volker
by Contributor
  • 2859 Views
  • 1 reply
  • 0 kudos

Structured Streaming schemaTrackingLocation does not work with starting_version

Hello Community, I came across some strange behaviour when using Structured Streaming on top of a Delta table. I have a stream that I wanted to start from a specific version of a Delta table using option("starting_version", x), because I did no...

Data Engineering
Delta Lake
schemaTrackingLocation
starting_version
structured streaming
Latest Reply
Volker
Contributor
  • 0 kudos

I found that it actually is not related to specifying the starting_version. I think I found the flaw in how the schema is updated in the schemaTrackingLocation: on the first readStream operation the _schema_log_... gets created; on the first wri...
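For reference, a minimal sketch of the pattern under discussion, with placeholder paths and an arbitrary version number (in the DataFrame reader the Delta option is spelled startingVersion):

```python
# Sketch only: paths and the version number are placeholders.
# schemaTrackingLocation lets the stream follow non-additive schema
# changes (e.g. renamed columns) on the source Delta table.
df = (
    spark.readStream.format("delta")
    .option("startingVersion", 2)  # start reading from this table version
    .option("schemaTrackingLocation", "/checkpoints/my_stream/_schema_log")
    .load("/delta/source_table")
)

(
    df.writeStream.format("delta")
    .option("checkpointLocation", "/checkpoints/my_stream")
    .start("/delta/target_table")
)
```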

MihaiTache
by New Contributor II
  • 1672 Views
  • 4 replies
  • 1 kudos

Resolved! Get job run_id of run_job_task in an orchestration job

Hi, I have a Databricks job which orchestrates the runs of two jobs, job1 and job2, using run_job_task. job2 depends on job1 and would need to use the run_id of job1 as a parameter. How can this be done? I see that you can only easily access the task run ...

Latest Reply
Panda
Valued Contributor
  • 1 kudos

@MihaiTache You can achieve this by utilizing a combination of dbutils.widgets.get and dbutils.jobs.taskValues.set. The approach involves extracting the run_id from Job1 and passing it as a value to Job2 using taskValues.set. This allows seamless com...
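A minimal sketch of that approach, assuming Job1 receives its run ID as a job parameter (for example via a {{job.run_id}} parameter reference) and a later task reads it back; the parameter and task key names here are hypothetical:

```python
# In a notebook task of Job1: the run ID is assumed to arrive as a job
# parameter named "run_id" (populated with a {{job.run_id}} reference).
job1_run_id = dbutils.widgets.get("run_id")

# Publish the value for downstream consumers.
dbutils.jobs.taskValues.set(key="job1_run_id", value=job1_run_id)

# In a downstream task (task key "run_job1" is a placeholder), read it back;
# debugValue is what you get when running the notebook interactively.
upstream_run_id = dbutils.jobs.taskValues.get(
    taskKey="run_job1", key="job1_run_id", debugValue="0"
)
```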

3 More Replies
tj-cycyota
by Databricks Employee
  • 3323 Views
  • 4 replies
  • 1 kudos

Can you use the Databricks API from a notebook?

I want to test out different APIs directly from a Databricks notebook instead of using Postman or CURL. Is this possible?

Latest Reply
Boris2
New Contributor II
  • 1 kudos

@Panda There is no REST API for Databricks. "RE" in REST stands for Ready Everywhere. You cannot connect to the API in workspace 1 from a notebook in workspace 2. Therefore it is not Ready Everywhere. Workspace 1 cannot resolve the hostname for Works...

3 More Replies
sashikanth
by New Contributor II
  • 1186 Views
  • 2 replies
  • 1 kudos

Job optimization

How can we increase resource efficiency in Databricks jobs? We see that idle cost is higher than utilization cost. Any guidelines would be helpful; please share some examples.

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

My main improvements are:
  • use single-node job clusters for small data
  • cluster reuse (use the same job cluster for multiple tasks, in parallel or serial)
  • use autoscaling only when it is very hard to find a good fixed sizing, otherwise go for fixe...

1 More Replies
sandeephenkel23
by New Contributor III
  • 1327 Views
  • 1 reply
  • 1 kudos

StringIndexer method fails with shared compute

Dear Team, the StringIndexer method of the MLflow library works when running code on a No Isolation Shared access mode Databricks cluster, but it fails on a Unity Catalog-enabled Databricks cluster with Shared access mode. Here is the library name: from...

Latest Reply
shashank853
Databricks Employee
  • 1 kudos

Hi, The issue you're encountering with the StringIndexer method from the MLflow library failing on a Unity Catalog-enabled Databricks cluster with Shared access mode is likely due to the limitations associated with Shared access mode in Unity Catalog...

User16752245312
by Databricks Employee
  • 22858 Views
  • 3 replies
  • 3 kudos

How can I make Databricks API calls from notebook?

Access to Databricks APIs requires the user to authenticate. This usually means creating a PAT (Personal Access Token). Conveniently, a token is readily available to you when you are using a Databricks notebook. databricksURL = dbutils.notebook....

Latest Reply
Panda
Valued Contributor
  • 3 kudos

@User16752245312  You can use Databricks Secret Scope to manage sensitive data such as personal access tokens (PATs) securely. Storing your token in a secret scope ensures you don’t hard-code credentials in your notebook, making it more secure.For mo...
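A minimal sketch of calling a Databricks REST API from a notebook with a token kept in a secret scope, as suggested above; the workspace URL, the scope and key names, and the chosen endpoint are placeholders:

```python
import requests

# Placeholder workspace URL; use your own.
host = "https://<your-workspace>.cloud.databricks.com"

# Read the PAT from a secret scope instead of hard-coding it
# (scope and key names here are hypothetical).
token = dbutils.secrets.get(scope="my-scope", key="databricks-pat")

# Example call: list the clusters in the workspace.
resp = requests.get(
    f"{host}/api/2.0/clusters/list",
    headers={"Authorization": f"Bearer {token}"},
)
resp.raise_for_status()
print(resp.json())
```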

2 More Replies
slakshmanan
by New Contributor III
  • 609 Views
  • 1 reply
  • 0 kudos

how to get access_token from REST API without a user password

Using the REST API endpoint /oauth2/token, how do I get an access_token programmatically?

Latest Reply
LauJohansson
Contributor
  • 0 kudos

Have you read this blog post? https://medium.com/@wernkai95/generate-azure-databricks-workspace-personal-access-token-for-azure-service-principal-8dff72d045c6 Also refer to the API docs: https://docs.databricks.com/api/azure/workspace/tokens/create
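For Azure, a minimal sketch of the client-credentials flow the blog post covers, exchanging a service principal's credentials for a token without any user password; the tenant, client ID, and secret are placeholders, and the scope uses the well-known Azure Databricks resource application ID:

```python
import requests

# Placeholders: your Azure AD tenant and service principal credentials.
tenant_id = "<tenant-id>"
client_id = "<sp-client-id>"
client_secret = "<sp-client-secret>"

# Client-credentials grant against Azure AD; no user password involved.
resp = requests.post(
    f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token",
    data={
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d/.default",
    },
)
resp.raise_for_status()
access_token = resp.json()["access_token"]
```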

jonathanjone
by New Contributor
  • 937 Views
  • 1 reply
  • 0 kudos

Large Dataset Processing: How AI-Powered PCs Measure Up Against Cloud Solutions

Hey community, I've been diving deep into AI-powered PCs and their growing capabilities, particularly when it comes to processing large datasets. As cloud solutions like AWS, Google Cloud, and Azure have been the go-to for scaling data-heavy tasks, I'...

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

My answer to the same question in another topic: https://community.databricks.com/t5/data-engineering/how-does-an-ai-powered-pc-handle-large-datasets-compared-to/m-p/91889#M38291

nadia
by New Contributor II
  • 28399 Views
  • 4 replies
  • 2 kudos

Resolved! Executor heartbeat timed out

Hello, I'm trying to read a table located in PostgreSQL that contains 28 million rows. I get the following result: "SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in sta...

Latest Reply
SparkJun
Databricks Employee
  • 2 kudos

Please also review the Spark UI to see the failed Spark job and Spark stage. Please check on the GC time and data spill to memory and disk. See if there is any error in the failed task in the Spark stage view. This will confirm data skew or GC/memory...
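Alongside the Spark UI checks, a minimal sketch of a partitioned JDBC read that often prevents a single task from pulling all 28 million rows (a common cause of heartbeat timeouts); the connection details and the partition column/bounds are assumptions:

```python
# Sketch only: host, credentials, table, and partition bounds are placeholders.
# Partitioning the read spreads the rows across parallel tasks instead of one
# executor fetching everything (which can stall long enough to miss heartbeats).
df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://<host>:5432/<database>")
    .option("dbtable", "<schema>.<table>")
    .option("user", "<user>")
    .option("password", "<password>")
    .option("partitionColumn", "id")   # an indexed numeric column
    .option("lowerBound", 1)
    .option("upperBound", 28000000)
    .option("numPartitions", 16)
    .load()
)
```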

3 More Replies
amitprasad01
by New Contributor II
  • 931 Views
  • 4 replies
  • 2 kudos

I want to consolidate the delta tables historical versions and write to another table in append mode

Hi Team, I have a table, let's say employee, and it has 5 versions. Versions 0 and 1 have column A and column B; from version 2, column A has been changed to column C and column B to column D. I want to ingest version by version without manual intervention....

Latest Reply
uday_satapathy
Databricks Employee
  • 2 kudos

Have you checked out deltaTable.cloneAtVersion()?
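An alternative sketch using Delta time travel to replay each version into a second table in append mode; the table names are placeholders, and the rename from columns A/B to C/D would still need explicit mapping before the append:

```python
# Sketch only: replays every historical version of a Delta table into a
# target table. Table names are placeholders; non-additive schema changes
# across versions (renamed columns) still require explicit handling.
latest = spark.sql("DESCRIBE HISTORY employee LIMIT 1").collect()[0]["version"]

for v in range(latest + 1):
    batch = spark.read.option("versionAsOf", v).table("employee")
    (
        batch.write.mode("append")
        .option("mergeSchema", "true")  # tolerate columns added over time
        .saveAsTable("employee_history")
    )
```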

3 More Replies
demo0404
by New Contributor II
  • 707 Views
  • 1 reply
  • 1 kudos

Upload files to DBFS

Hello, since yesterday I cannot upload files to DBFS; only the S3 option appears. I am a little desperate about this because I have to teach some courses and the tool does not work for me. Is there a way to upload the CSV files? Thank you!

Latest Reply
gchandra
Databricks Employee
  • 1 kudos

https://community.databricks.com/t5/data-engineering/suddenly-can-t-find-the-option-to-uplaod-files-into-databricks/m-p/94359/highlight/true#M38884

