Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

andrew0117
by Contributor
  • 2309 Views
  • 1 replies
  • 0 kudos

What is the best practice for handling concurrency issues in batch processing?

Normally, our ELT framework takes in batches one by one and loads the data into target tables. But if more than one batch comes in at the same time, the framework will break due to a concurrency issue: multiple sources are trying to write the ...

Latest Reply
jose_gonzalez
Databricks Employee
  • 0 kudos

You can partition your table to reduce the chances of hitting this exception.

  • 0 kudos
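Partitioning helps because concurrent writes only conflict when they touch the same files; a complementary pattern is to retry the write when a write-conflict exception is raised. A plain-Python sketch of that retry wrapper, where the exception class and writer function are stand-ins for illustration, not the actual Delta Lake API:

```python
import random
import time

class ConcurrentWriteError(Exception):
    """Stand-in for a write-conflict exception such as Delta's ConcurrentAppendException."""

def write_with_retry(write_fn, max_attempts=5, base_delay=0.01):
    """Retry a batch write with exponential backoff on write conflicts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return write_fn()
        except ConcurrentWriteError:
            if attempt == max_attempts:
                raise
            # Back off with jitter so competing batches de-synchronize.
            time.sleep(base_delay * (2 ** attempt) * random.random())

# Example: a writer that fails twice before succeeding.
attempts = {"n": 0}
def flaky_write():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConcurrentWriteError()
    return "committed"

result = write_with_retry(flaky_write)
```

In practice the batches should also write to disjoint partitions (e.g. via `replaceWhere` scoped to a batch date) so that retries rarely trigger at all.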
jwu1
by Databricks Employee
  • 1454 Views
  • 1 replies
  • 3 kudos

www.databricks.com

Attention Community! For a limited period, we are offering a generous 50% discount on training at the Data + AI Summit. Simply apply the code FLS4vop5ep during the registration process. Hurry, though, as this offer will expire on June 12, 2023. Don'...

Latest Reply
jose_gonzalez
Databricks Employee
  • 3 kudos

Thank you for sharing this, @Juliet Wu!

  • 3 kudos
Sas
by New Contributor II
  • 2683 Views
  • 1 replies
  • 0 kudos

A streaming job going into an infinite loop

Hi, below I am trying to read data from Kafka, determine whether it's fraud or not, and then write it back to MongoDB. Below is my code (read_kafka.py): from pyspark.sql import SparkSession from pyspark.sql.functions import * from pyspark.sql.types i...

Latest Reply
swethaNandan
Databricks Employee
  • 0 kudos

Hi Saswata, can you remove the filter and see if it is printing output to the console? kafka_df5 = kafka_df4.filter(kafka_df4.status == "FRAUD") Thanks and regards, Swetha Nandajan

  • 0 kudos
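The debugging step suggested in the reply can be sketched as follows: write the raw, pre-filter stream to the console sink, so you can see whether rows arrive at all before the FRAUD filter is applied. This is a sketch only (the bootstrap server and topic are placeholders, and it needs a Spark cluster with the Kafka connector to run):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("fraud-debug").getOrCreate()

kafka_df = (spark.readStream
            .format("kafka")
            .option("kafka.bootstrap.servers", "host:9092")  # placeholder broker
            .option("subscribe", "transactions")             # placeholder topic
            .load())

# Print the raw stream first; if nothing appears here, the problem
# is upstream of the FRAUD filter, not in the filter itself.
query = (kafka_df.selectExpr("CAST(value AS STRING) AS value")
         .writeStream
         .format("console")
         .option("truncate", "false")
         .start())
query.awaitTermination()
```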
Qwetroman
by New Contributor
  • 2328 Views
  • 1 replies
  • 0 kudos

AutoML runs fail after 5 seconds

Hi everyone, I am exploring AutoML and met a strange problem: after I launch a classification experiment on my personal newly created cluster (screenshot attached), it successfully performs data exploration, but after that, all runs fail after appro...

Latest Reply
swethaNandan
Databricks Employee
  • 0 kudos

Hi Qwetroman, we can see the following error message in the notebook: ExecutionTimeoutError: Execution timed out before any trials could be successfully run. Please increase the timeout for AutoML to run some trials. What's the size of the dataset? St...

  • 0 kudos
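The timeout mentioned in the error can be raised through the AutoML Python API's `timeout_minutes` argument. A sketch for reference (the dataset and target column names are placeholders, and this only runs on a Databricks cluster with AutoML available):

```python
from databricks import automl

# Give AutoML more time so at least some trials can finish
# before the execution timeout is reached.
summary = automl.classify(
    dataset=train_df,        # placeholder DataFrame
    target_col="label",      # placeholder target column
    timeout_minutes=60,
)
```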
Nikhil3107
by New Contributor III
  • 2490 Views
  • 1 replies
  • 2 kudos

Deploy model to AWS Sagemaker: ModuleNotFoundError: No module named 'docker'

Greetings, when trying to run the following command: %sh mlflow sagemaker build-and-push-container I get the following error: /databricks/python3/lib/python3.9/site-packages/click/core.py:2309: UserWarning: Virtualenv support is still experimental and ...

BenLambert
by Contributor
  • 2560 Views
  • 2 replies
  • 2 kudos

Table Refresh UI Error

Within the UI it is possible to "Select tables for refresh" for a specific Delta Live Tables Workflow. I often use it to make a full refresh on smaller tables during development. Unfortunately, when an error occurs during the full refresh on selected...

Latest Reply
jose_gonzalez
Databricks Employee
  • 2 kudos

Could you please share the full error stack trace? It will help us narrow down the issue.

  • 2 kudos
1 More Replies
Mado
by Valued Contributor II
  • 3157 Views
  • 1 replies
  • 1 kudos

How to set timezone for SQL Warehouse?

Hi, I want to change the default time zone for SQL Warehouse in the SQL Persona. When I try to edit the SQL warehouse settings in the "SQL Warehouses" section, I am not able to find any setting where I can set the time zone. I am aware that I can set ...

Latest Reply
Mado
Valued Contributor II
  • 1 kudos

Thanks. I am aware of the SET TIME ZONE command but I need to run this command every time I start the SQL warehouse. I am looking for a way to change the default time zone of the SQL warehouse. Something like "spark.sql.session.timeZone GMT+10" that ...

  • 1 kudos
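For reference, the two session-scoped approaches discussed in this thread look like the following in a notebook. Both must be re-applied in every new session, which is exactly the limitation being described; check the current Databricks docs for whether a warehouse-level default is now configurable.

```python
# Session-scoped time zone settings (must be re-run per session).
spark.conf.set("spark.sql.session.timeZone", "GMT+10")

# Equivalent SQL form:
spark.sql("SET TIME ZONE 'GMT+10'")
```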
iptkrisna
by New Contributor III
  • 2676 Views
  • 2 replies
  • 4 kudos

Error while rendering UI editor

Hi, is anyone facing an issue where the editor fails to render in a Databricks notebook? It looks like this:

Latest Reply
Debayan
Databricks Employee
  • 4 kudos

Hi, this looks like a browser issue. Could you please try it with some other browser? Or clear the cookies and caches of the same browser and confirm? Please tag @Debayan with your next comment so that I will get notified. Thank you!

  • 4 kudos
1 More Replies
Rishabh_T
by New Contributor III
  • 5429 Views
  • 4 replies
  • 5 kudos

Resolved! DLT pipeline is unable to process struct with hyphen in nested column name

Hello, I have some nested columns with a hyphen (e.g. sample-1) in a struct column; recently the DLT pipeline has started throwing a syntax error. Before May 24, 2023, this was working fine. Is this a new bug in the May 2023 release? After clearing the table and the table's da...

Latest Reply
Anonymous
Not applicable
  • 5 kudos

Hi @Rishabh Tomar, we haven't heard from you since the last response from @Kaniz Fatma. Kindly share the information with us, and in return, we will provide you with the necessary solution. Thanks and regards

  • 5 kudos
3 More Replies
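For reference, hyphenated identifiers generally have to be backtick-quoted in Spark expressions, since an unquoted `sample-1` parses as the subtraction `sample - 1`. Whether quoting works around this particular DLT regression is not confirmed in the thread. A quoting sketch (the `payload` struct and `events` table names are hypothetical; runs only on Spark):

```python
from pyspark.sql.functions import col

# Backticks quote identifiers containing hyphens.
df.select(col("payload.`sample-1`").alias("sample_1"))

# SQL equivalent:
spark.sql("SELECT payload.`sample-1` AS sample_1 FROM events")
```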
Dinu2
by New Contributor III
  • 7373 Views
  • 7 replies
  • 5 kudos

Timestamps in Databricks are getting converted to a different timezone

Timestamp columns extracted from source databases using JDBC reads are getting converted to a different timezone and do not match the source timestamps. Could anyone suggest how we can get the same timestamp data as the source?

Latest Reply
Anonymous
Not applicable
  • 5 kudos

Hi @Dinu Sukumara, we haven't heard from you since the last response from @Werner Stinckens. Kindly share the information with us, and in return, we will provide you with the necessary solution. Thanks and regards

  • 5 kudos
6 More Replies
Anonymous
by Not applicable
  • 3040 Views
  • 4 replies
  • 2 kudos

Dear Community, I want to understand from you all: how do you debug your code when using Databricks? Have you tried the Variable Explorer of Databr...

Dear Community, I want to understand from you all: how do you debug your code when using Databricks? Have you tried the Variable Explorer of Databricks? This allows users to view at-a-glance all the variables defined in their notebooks, inspect...

Latest Reply
etsyal1e2r3
Honored Contributor
  • 2 kudos

I just create code in notebooks that allows me to check outputs at different steps. These methods usually include print statements or .display() of dataframes. If you're working with lots of data, .show(truncate=100, vertical=True) may help you. I ha...

  • 2 kudos
3 More Replies
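The print/display approach above generalizes to inserting checkpoints between transformation steps, so each intermediate result can be inspected without changing the pipeline's output. A plain-Python illustration of the pattern (no Spark required; the transformations are toy stand-ins):

```python
def checkpoint(label, data, verbose=True):
    """Inspect an intermediate result, then pass it through unchanged."""
    if verbose:
        print(f"{label}: {len(data)} rows, sample={data[:3]}")
    return data

rows = [{"amount": a} for a in (5, 50, 500, 5000)]

filtered = checkpoint("after filter",
                      [r for r in rows if r["amount"] > 10])
enriched = checkpoint("after enrich",
                      [{**r, "flag": r["amount"] > 1000} for r in filtered])
```

With a dataframe, the same `checkpoint` shape works by calling `.display()` or `.show()` inside it instead of `print`.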
CloudBull
by New Contributor
  • 4001 Views
  • 3 replies
  • 2 kudos
Latest Reply
Anonymous
Not applicable
  • 2 kudos

@Gerard Blackburns: Calculating the cost or billing for Azure SQL Server instances involves considering the Azure SQL Database Unit (DBU) pricing model. DBUs are the unit of measure for the consumption of Azure SQL Database resources. To calculate t...

  • 2 kudos
2 More Replies
js54123875
by New Contributor III
  • 4570 Views
  • 3 replies
  • 3 kudos

Setup for Unity Catalog, autoloader, three-level namespace, SCD2

I am trying to set up Delta Live Tables pipelines to ingest data into bronze and silver tables. Bronze and Silver are separate schemas. This will be triggered by a daily job. It appears to run fine when set as continuous, but fails when triggered. Table...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Hi @Jennette Shepard, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answ...

  • 3 kudos
2 More Replies
akshay_patni228
by New Contributor II
  • 9722 Views
  • 2 replies
  • 3 kudos

Missing Credential Scope - Unable to call databrick(Scala) notebook from ADF

Hi team, I am using a job cluster while setting up a Linked Service in ADF to call a Databricks notebook activity. Cluster details: Policy - Unrestricted; Access Mode - Single user; Unity Catalog enabled; Databricks runtime - 12.2 LTS (includes Apache Spark 3.3.2...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Hi @Akshay Patni, we haven't heard from you since the last response from @Debayan Mukherjee. Kindly share the information with us, and in return, we will provide you with the necessary solution. Thanks and regards

  • 3 kudos
1 More Replies
Michael42
by New Contributor III
  • 19624 Views
  • 4 replies
  • 7 kudos

Resolved! Want to load a high volume of CSV rows in the fastest way possible (in excess of 5 billion rows). I want the best approach, in terms of speed, for loading into the bronze table.

My source can only deliver CSV format (pipe delimited). My source has the ability to generate multiple CSV files and transfer them to a single upload folder. All rows must go to the same target bronze Delta table. I do not care about the order in which ...

Latest Reply
Anonymous
Not applicable
  • 7 kudos

Hi @Michael Popp, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers ...

  • 7 kudos
3 More Replies
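For this kind of bulk CSV-to-bronze load, a commonly recommended route is many similarly sized files read in parallel with the schema supplied up front, since schema inference over billions of rows is expensive. A sketch under those assumptions (the path, schema, and table name are hypothetical; runs only on a Spark cluster):

```python
from pyspark.sql.types import StructType, StructField, StringType, LongType

# Supplying the schema avoids a costly inference pass over huge inputs.
schema = StructType([
    StructField("id", LongType()),
    StructField("payload", StringType()),
])

(spark.read
 .option("sep", "|")          # pipe-delimited source files
 .option("header", "false")
 .schema(schema)
 .csv("/mnt/upload/incoming/")    # placeholder upload folder
 .write
 .format("delta")
 .mode("append")
 .saveAsTable("bronze.events"))   # placeholder bronze table
```

Since row order does not matter here, appending from all files in one parallel read is usually the fastest path.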