Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

sriram_kumar
by New Contributor II
  • 3541 Views
  • 4 replies
  • 5 kudos

To do Optimization on the real time delta table

Hi Team, We have a few prod tables created in an S3 bucket that have now grown very large. These tables receive real-time data continuously from round-the-clock Databricks workflows. We would like to run the optimization commands (optimize, zord...

Latest Reply
Anonymous
Not applicable
  • 5 kudos

Hi @Sriram Kumar, we haven't heard from you since the last response from @Suteja Kanuri. Kindly share the information with us, and in return, we will provide you with the necessary solution. Thanks and Regards

  • 5 kudos
3 More Replies
jole3112
by New Contributor III
  • 14452 Views
  • 7 replies
  • 9 kudos

virtual environment on azure databricks compute cluster

I'm using Azure Databricks and I'd like to create a project virtual environment, persisted on a shared compute cluster. As the cluster is shared for many projects, it is necessary to have virtual environments if I want to execute code runs from withi...

Latest Reply
Anonymous
Not applicable
  • 9 kudos

Hi @Joshua L, we haven't heard from you since the last response from @Debayan Mukherjee, and I was checking back to see if her suggestions helped you. Or else, if you have any solution, please share it with the community, as it can be helpful to ot...

  • 9 kudos
6 More Replies
Matt1209
by New Contributor II
  • 1812 Views
  • 1 replies
  • 3 kudos

How to execute requests later for a number of times that exceeds the Maximum concurrent runs?

I am trying to start the same job multiple times using the Python SDK's "run_now" command. If the number of requests exceeds the Maximum concurrent runs, the status of the run will be Skipped and the run will not be executed. Is there any way to queue...

Latest Reply
Debayan
Databricks Employee
  • 3 kudos

Hi, we have a private preview feature for queueing which will be enabled shortly. Please tag me (@Debayan Mukherjee) with your next update so that I get notified.

  • 3 kudos
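Until a built-in queueing feature is available, the question from Matt1209 above could be approached with client-side retries. The following is only a minimal sketch under stated assumptions: `trigger` is a hypothetical callable that you would write yourself to wrap the SDK's "run_now" call and return the resulting run status as a string, and `"SKIPPED"` is an assumed label for a run rejected because of the Maximum concurrent runs limit. How the status is read back from the SDK is not shown here.

```python
import time
from typing import Callable, Iterable, List, Tuple


def run_with_retry(
    trigger: Callable[[str], str],
    requests: Iterable[str],
    max_attempts: int = 5,
    wait_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Tuple[str, str]]:
    """Submit each request in order, retrying while it is reported as skipped.

    `trigger` is a hypothetical wrapper (not part of any SDK) that starts one
    run for the given request and returns its status string.
    """
    results = []
    for request in requests:
        status = "SKIPPED"
        for attempt in range(1, max_attempts + 1):
            status = trigger(request)
            if status != "SKIPPED":
                break  # the run was accepted, move on to the next request
            if attempt < max_attempts:
                sleep(wait_seconds)  # wait for a concurrency slot to free up
        # Record the last status seen, which is "SKIPPED" if all attempts failed.
        results.append((request, status))
    return results
```

Passing `sleep` as a parameter keeps the waiting strategy replaceable, for example with a no-op in tests or with a backoff function in production.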
sevvalmehder
by New Contributor II
  • 3437 Views
  • 3 replies
  • 3 kudos

Databricks run-time 12.2 LTS drop function problem

I am getting an error from PySpark's `drop` function on a cluster running 12.2 LTS. When I checked the error, I saw that Spark has fixed that bug; see SPARK-42444. Also, when I checked the maintenance updates page, I saw that this fix is included in the Databricks R...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Hi @Sevval Mehder, elevate our community by acknowledging exceptional contributions. Your participation in marking the best answers is a testament to our collective pursuit of knowledge.

  • 3 kudos
2 More Replies
RamdasP
by New Contributor
  • 2416 Views
  • 2 replies
  • 3 kudos

Resolved! Implement & Test DR Plan

Hi, Can you direct me to any documentation on how to implement and test Disaster Recovery for Databricks (PaaS) on Azure? Thx & Rgds, Ramdas

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Hi @Ramdas Panicher, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answe...

  • 3 kudos
1 More Replies
ayush1900
by New Contributor II
  • 1967 Views
  • 1 replies
  • 1 kudos
Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Ayush Raj, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers you...

  • 1 kudos
reachbharathan
by New Contributor III
  • 5069 Views
  • 3 replies
  • 4 kudos

Resolved! How to checkout specific commit version via databricks UI

I have integrated GitLab with my Azure Databricks repo, and I am able to push and pull commits from the Databricks UI. I want to check out a specific commit version via the Databricks UI. Note: I am aware that via the gitlab i have checkout to specific vers...

Latest Reply
reachbharathan
New Contributor III
  • 4 kudos

After getting more context on Databricks repos in detail: currently Databricks doesn't support checking out a repo at a specific commit. Databricks provides only the limited functionality mentioned below:
  • Add a repo and connect remotely later
  • Clone a repo connecte...

  • 4 kudos
2 More Replies
fhmessas
by New Contributor II
  • 2958 Views
  • 2 replies
  • 2 kudos

Trigger.AvailableNow getting stuck when there is no event

Hi, I have several streaming jobs; one of them uses Trigger.AvailableNow. The issue is that it gets stuck when there are no events or when it finishes ingesting all events. The expected behavior would be for the job to shut down. I've already checked...

Latest Reply
fhmessas
New Contributor II
  • 2 kudos

Hi, the source is an S3 bucket using file notification with SQS. There are no errors or warnings in the logs; the AvailableNow trigger just gets stuck.

  • 2 kudos
1 More Replies
andrew0117
by Contributor
  • 2532 Views
  • 1 replies
  • 0 kudos

what is best practice to handle the concurrency issue in batch processing?

Normally, our ELT framework takes in batches one by one and loads the data into target tables. But if more than one batch comes in at the same time, the framework will break due to the concurrency issue that multiple sources are trying to write the ...

Latest Reply
jose_gonzalez
Databricks Employee
  • 0 kudos

You can partition your table to reduce the chances of getting this exception.

  • 0 kudos
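One way to read the reply from jose_gonzalez above: if each batch writes to its own partition, and the write condition names that partition explicitly, concurrent writers are less likely to touch the same files. The sketch below only builds such a condition string. Every name in it is hypothetical and not from the thread (`merge_condition`, the `batch_date` partition column, the `t` and `s` aliases), and it assumes string-valued partitions.

```python
from typing import Sequence


def quote(value: str) -> str:
    """Quote a string literal for SQL, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def merge_condition(
    key_cols: Sequence[str],
    partition_col: str,
    partition_values: Sequence[str],
    target: str = "t",
    source: str = "s",
) -> str:
    """Build a MERGE condition that pins the target to explicit partitions."""
    if not key_cols or not partition_values:
        raise ValueError("key_cols and partition_values must be non-empty")
    # Restrict the target side to the partitions this batch is allowed to touch.
    parts = ", ".join(quote(v) for v in partition_values)
    # Match rows on the business keys as usual.
    keys = " AND ".join(f"{target}.{c} = {source}.{c}" for c in key_cols)
    return f"{target}.{partition_col} IN ({parts}) AND {keys}"
```

For example, `merge_condition(["id"], "batch_date", ["2023-06-01"])` yields a condition that limits the merge to a single date partition. Whether this removes the exception depends on how the batches actually overlap.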
jwu1
by Databricks Employee
  • 1644 Views
  • 1 replies
  • 3 kudos

www.databricks.com

Attention Community! For a limited period, we are offering a generous 50% discount on training at the Data + AI Summit. Simply apply the code FLS4vop5ep during the registration process. Hurry, though, as this offer will expire on June 12, 2023. Don'...

Latest Reply
jose_gonzalez
Databricks Employee
  • 3 kudos

Thank you for sharing this @Juliet Wu!!!

  • 3 kudos
Sas
by New Contributor II
  • 2919 Views
  • 1 replies
  • 0 kudos

A streaming job going into infinite looping

Hi, below I am trying to read data from Kafka, determine whether it's fraud or not, and then I need to write it back to MongoDB. Below is my code, read_kafka.py:
from pyspark.sql import SparkSession
from pyspark.sql.functions import *
from pyspark.sql.types i...

Latest Reply
swethaNandan
Databricks Employee
  • 0 kudos

Hi Saswata, Can you remove the filter and see if it is printing output to console?
kafka_df5=kafka_df4.filter(kafka_df4.status=="FRAUD")
Thanks and Regards, Swetha Nandajan

  • 0 kudos
Qwetroman
by New Contributor
  • 2572 Views
  • 1 replies
  • 0 kudos

AutoML runs fail after 5 seconds

Hi everyone, I am exploring AutoML and have run into a strange problem: after I launch a classification experiment on my personal newly created cluster (screenshot attached), it successfully performs data exploration, but after that, all runs fail after appro...

Latest Reply
swethaNandan
Databricks Employee
  • 0 kudos

Hi Qwetroman, we can see the following error message in the notebook: ExecutionTimeoutError: Execution timed out before any trials could be successfully run. Please increase the timeout for AutoML to run some trials. What's the size of the dataset? St...

  • 0 kudos
Nikhil3107
by New Contributor III
  • 2863 Views
  • 1 replies
  • 2 kudos

Deploy model to AWS Sagemaker: ModuleNotFoundError: No module named 'docker'

Greetings, When trying to run the following command:
%sh mlflow sagemaker build-and-push-container
I get the following error:
/databricks/python3/lib/python3.9/site-packages/click/core.py:2309: UserWarning: Virtualenv support is still experimental and ...

BenLambert
by Contributor
  • 3075 Views
  • 2 replies
  • 2 kudos

Table Refresh UI Error

Within the UI it is possible to "Select tables for refresh" for a specific Delta Live Tables Workflow. I often use it to make a full refresh on smaller tables during development. Unfortunately, when an error occurs during the full refresh on selected...

Latest Reply
jose_gonzalez
Databricks Employee
  • 2 kudos

Could you please share the full error stack trace? It will help us narrow down the issue.

  • 2 kudos
1 More Replies
Mado
by Valued Contributor II
  • 3434 Views
  • 1 replies
  • 1 kudos

How to set timezone for SQL Warehouse?

Hi, I want to change the default time zone for the SQL Warehouse in the SQL Persona. When I try to edit the SQL warehouse settings in the "SQL Warehouses" section, I am not able to find any setting where I can set the time zone. I am aware that I can set ...

Latest Reply
Mado
Valued Contributor II
  • 1 kudos

Thanks. I am aware of the SET TIME ZONE command but I need to run this command every time I start the SQL warehouse. I am looking for a way to change the default time zone of the SQL warehouse. Something like "spark.sql.session.timeZone GMT+10" that ...

  • 1 kudos
