Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Sreekanth1
by New Contributor II
  • 1110 Views
  • 2 replies
  • 0 kudos

How to pass job task parameters to another task in Scala

Hi Team, I have a requirement in a workflow job. The job has two tasks: a Python task and a Scala task (each running on its own cluster). I have defined dbutils.jobs.taskValues in Python, but the value cannot be read in Scala because o...

Latest Reply
Ajay-Pandey
Esteemed Contributor III
  • 0 kudos

Hi @Sreekanth Nallapa, please refer to this link; it might help you with this.

  • 0 kudos
1 More Replies
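For reference, a minimal sketch of the Python side of task values (the task key "python_task" and the key name are hypothetical; note the API is dbutils.jobs.taskValues, not dbutils.job.taskValue):

# In the upstream Python task (task key assumed to be "python_task"):
dbutils.jobs.taskValues.set(key="my_value", value="hello-from-python")

# In a downstream Python task, read the value by upstream task key;
# debugValue is only used when the notebook runs outside a job.
v = dbutils.jobs.taskValues.get(taskKey="python_task",
                                key="my_value",
                                default="",
                                debugValue="debug")
print(v)

Whether this utility is reachable from a Scala task depends on the runtime, which is the crux of the question above; the linked answer in the reply covers that.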
ridrasura
by New Contributor III
  • 2022 Views
  • 1 reply
  • 5 kudos

Optimal Batch Size for Batch Insert Queries using JDBC for Delta Tables

Hi, I am currently experimenting with databricks-jdbc 2.6.29 and trying to execute batch insert queries. What is the optimal batch size recommended by Databricks for performing batch insert queries? Currently it seems that values are inserted row by r...

Latest Reply
ridrasura
New Contributor III
  • 5 kudos

Just an observation: by using the auto optimize table-level property, I was able to see batch inserts writing records into a single file. https://docs.databricks.com/optimizations/auto-optimize.html

  • 5 kudos
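For anyone landing here, a minimal sketch of the table properties mentioned in the reply (the table name is a placeholder):

spark.sql("""
    ALTER TABLE my_delta_table SET TBLPROPERTIES (
        'delta.autoOptimize.optimizeWrite' = 'true',
        'delta.autoOptimize.autoCompact'   = 'true'
    )
""")  # coalesces small batch inserts into fewer, larger files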
BkP
by Contributor
  • 5895 Views
  • 15 replies
  • 9 kudos

Suggestion Needed for an Orchestrator/Scheduler to schedule and execute Jobs in an automated way

Hello Friends, we have an application which extracts data from various tables in Azure Databricks into Postgres tables (Postgres installed on top of Azure VMs). After extraction we apply transformations on those datasets in Postgres tabl...

Latest Reply
VaibB
Contributor
  • 9 kudos

You can leverage Airflow, which provides a connector for the Databricks Jobs API, or you can use Databricks Workflows to orchestrate your jobs, where you can define several tasks and set dependencies accordingly.

  • 9 kudos
14 More Replies
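A minimal Airflow sketch of the first option in the reply, assuming the apache-airflow-providers-databricks package is installed and a Databricks job already exists (the connection ID and job ID are placeholders):

from datetime import datetime
from airflow import DAG
from airflow.providers.databricks.operators.databricks import DatabricksRunNowOperator

with DAG("databricks_etl",
         start_date=datetime(2023, 1, 1),
         schedule_interval="@daily",
         catchup=False) as dag:
    # Triggers an existing Databricks workflow job via the Jobs API.
    run_job = DatabricksRunNowOperator(
        task_id="run_databricks_job",
        databricks_conn_id="databricks_default",  # Airflow connection to the workspace
        job_id=12345,                             # placeholder job ID
    )

Alternatively, Databricks Workflows alone can express the same task dependencies without an external scheduler.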
nk76
by New Contributor III
  • 6344 Views
  • 11 replies
  • 5 kudos

Resolved! Custom library import fails randomly with error: not found: value it

Hello, I have an issue with the import of a custom library in Azure Databricks. Roughly 95% of the time it works fine, but sometimes it fails. I searched the internet and this community with no luck so far. It is a Scala library in a Scala notebook,...

Latest Reply
Naskar
New Contributor II
  • 5 kudos

I also encountered the same error. While importing a file, I get the error: "Import failed with error: Could not deserialize: Exceeded 16777216 bytes (current = 16778609)".

  • 5 kudos
10 More Replies
Chris_Konsur
by New Contributor III
  • 2364 Views
  • 3 replies
  • 1 kudos

Resolved! Configure Autoloader with the file notification mode for production

I configured ADLS Gen2 standard storage and successfully configured Auto Loader with the file notification mode. In this document, https://docs.databricks.com/ingestion/auto-loader/file-notification-mode.html, "ADLS Gen2 provides different event notificati...

Latest Reply
Ryan_Chynoweth
Esteemed Contributor
  • 1 kudos

Hi @Chris Konsur. You do not need to do anything with the FlushWithClose event REST API; that is just the event type that we listen to. As for the backfill setting, it is for handling late data or late events that are being triggered. This setting largely de...

  • 1 kudos
2 More Replies
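For context, a minimal Auto Loader sketch with file notification mode enabled (the path, file format, checkpoint location, and table name are placeholders):

df = (spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "json")            # format of the incoming files
        .option("cloudFiles.useNotifications", "true")  # file notification mode
        .load("abfss://container@account.dfs.core.windows.net/input/"))

(df.writeStream
   .option("checkpointLocation", "/tmp/checkpoints/autoloader_demo")  # placeholder
   .toTable("bronze_events"))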
899572
by New Contributor II
  • 1571 Views
  • 4 replies
  • 1 kudos

"approxQuantile" not working as part of a delta live table workflow pipeline.

I am trying to compute outliers using approxQuantile on a DataFrame. It works fine in a Databricks notebook, but the call doesn't work correctly when it is part of a Delta Live Tables pipeline. (This is Python.) Here is the line that isn't working a...

Latest Reply
899572
New Contributor II
  • 1 kudos

Good recommendation. I was able to do something similar that appears to work:
# Step 2. Compute the outlier sessions based on duration_minutes
lc = session_agg_df.selectExpr("percentile(duration_minutes, 0.25) lower_quartile")
session_agg_d...

  • 1 kudos
3 More Replies
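Reconstructing that workaround as a hedged sketch (the DataFrame and column names follow the snippet in the reply): compute quartiles with the SQL percentile function, which evaluates inside the pipeline unlike the approxQuantile action, then flag rows outside 1.5 * IQR.

from pyspark.sql import functions as F

quartiles = session_agg_df.selectExpr(
    "percentile(duration_minutes, 0.25) AS lower_quartile",
    "percentile(duration_minutes, 0.75) AS upper_quartile")

# Attach the quartiles to every row, then mark values outside the fences.
flagged = (session_agg_df.crossJoin(quartiles)
    .withColumn("iqr", F.col("upper_quartile") - F.col("lower_quartile"))
    .withColumn("is_outlier",
        (F.col("duration_minutes") > F.col("upper_quartile") + 1.5 * F.col("iqr")) |
        (F.col("duration_minutes") < F.col("lower_quartile") - 1.5 * F.col("iqr"))))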
mattjones
by New Contributor II
  • 482 Views
  • 0 replies
  • 0 kudos

DEC 13 MEETUP: Arbitrary Stateful Stream Processing in PySpark (www.meetup.com)

For folks in the Bay Area: Dr. Karthik Ramasamy, Databricks' Head of Streaming, will be joined by engineering experts on the streaming and PySpark teams at Databricks for this in-person me...

karthik_p
by Esteemed Contributor
  • 2965 Views
  • 5 replies
  • 7 kudos

How to properly convert DBUs consumed into a dollar amount in Databricks on AWS/GCP/Azure

Hi Team, I have gone through a lot of articles, but it looks like there is some gap on pricing. Can anyone please let me know the accurate way to convert DBU pricing into dollars? As per my understanding, Total DBU Cost = DBU/hour * total job runtime in hours (Shows a...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 7 kudos

DBUs are consumed per VM, and every VM type has a different DBU rate.

  • 7 kudos
4 More Replies
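As a worked example of the arithmetic (all numbers below are hypothetical; your actual per-DBU rate depends on cloud, pricing tier, and workload type, and cloud VM costs are billed separately):

dbu_per_hour = 2.75    # DBU rate of the chosen VM type, per VM per hour (placeholder)
num_vms = 4            # driver + workers
runtime_hours = 3.0    # how long the job ran
usd_per_dbu = 0.15     # $/DBU for your cloud/tier/workload (placeholder)

dbu_cost = dbu_per_hour * num_vms * runtime_hours * usd_per_dbu
print(f"DBU cost: ${dbu_cost:.2f}")   # 2.75 * 4 * 3.0 * 0.15 = $4.95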
Christine
by Contributor II
  • 8577 Views
  • 1 reply
  • 2 kudos

ADD COLUMN IF NOT EXISTS does not recognize "IF NOT EXISTS". How do I add a column to an existing Delta table with SQL if the column does not already exist?

How do I add a column to an existing Delta table with SQL if the column does not already exist? I am using the following code: %sql ALTER TABLE table_name ADD COLUMN IF NOT EXISTS column_name type; but it prints the error: [PARSE_SYNTAX_ERROR] Synta...

Latest Reply
UmaMahesh1
Honored Contributor III
  • 2 kudos

Hi @Christine Pedersen, according to the documentation, IF NOT EXISTS / IF EXISTS can only be used in conjunction with DROP or PARTITION clauses. If you want the same kind of check, you can do it using a try/except block in PySpark or, as per your l...

  • 2 kudos
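A minimal PySpark sketch of the check-first approach suggested above (table and column names are placeholders; note Delta's clause is ADD COLUMNS, without an IF NOT EXISTS option):

table_name = "my_schema.my_table"   # placeholder
column_name = "new_col"             # placeholder

# Check the current schema first, since ADD COLUMNS has no IF NOT EXISTS clause.
if column_name not in spark.table(table_name).columns:
    spark.sql(f"ALTER TABLE {table_name} ADD COLUMNS ({column_name} STRING)")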
Ovi
by New Contributor III
  • 2340 Views
  • 5 replies
  • 10 kudos

Construct Dataframe or RDD from S3 bucket with Delta tables

Hi all! I have an S3 bucket with Delta parquet files/folders, each with a different schema. I need to create an RDD or DataFrame from all those Delta tables that contains the path, name, and schema of each. How could I do that? Thank you! P...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 10 kudos

You can mount the S3 bucket or read directly from it:
access_key = dbutils.secrets.get(scope = "aws", key = "aws-access-key")
secret_key = dbutils.secrets.get(scope = "aws", key = "aws-secret-key")
sc._jsc.hadoopConfiguration().set("fs.s3a.access.key", ac...

  • 10 kudos
4 More Replies
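Building on the reply, a hedged sketch that lists the top-level folders under a bucket prefix and collects each Delta table's path, name, and schema (the base path is a placeholder, and one-folder-per-table is an assumption):

base_path = "s3a://my-bucket/delta/"   # placeholder

rows = []
for entry in dbutils.fs.ls(base_path):
    if entry.name.endswith("/"):       # directories are listed with a trailing slash
        schema = spark.read.format("delta").load(entry.path).schema.simpleString()
        rows.append((entry.path, entry.name.rstrip("/"), schema))

# One row per Delta table: path, name, schema.
inventory_df = spark.createDataFrame(rows, ["path", "name", "schema"])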
aaronpetry
by New Contributor III
  • 2907 Views
  • 2 replies
  • 3 kudos

%run not printing notebook output when using 'Run All' command

I have been using the %run command to run auxiliary notebooks from an "orchestration" notebook. I like using %run over dbutils.notebook.run because of the variable inheritance, the ease of troubleshooting, and the printing of output from the auxiliary n...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Hi @Aaron Petry, great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question first; otherwise the Bricksters will get back to you soon. Thanks!

  • 3 kudos
1 More Replies
Nayan7276
by Valued Contributor II
  • 2377 Views
  • 5 replies
  • 29 kudos

Resolved! Databricks Community points

I have 461 points in the Databricks Community, but the rewards store is only reflecting 23 points. Can anyone look into this issue?

Latest Reply
Ajay-Pandey
Esteemed Contributor III
  • 29 kudos

Hi, the rewards account needs to be created with the same email ID, and points may take a week to reflect in your rewards account.

  • 29 kudos
4 More Replies
isaac_gritz
by Valued Contributor II
  • 1966 Views
  • 4 replies
  • 8 kudos

Databricks Runtime Support

How long are Databricks runtimes supported for? How often are they updated? You can learn more about the Databricks runtime support lifecycle here (AWS | Azure | GCP). Long Term Support (LTS) runtimes are released every 6 months and supported for 2 yea...

Latest Reply
Ajay-Pandey
Esteemed Contributor III
  • 8 kudos

Thanks for the update.

  • 8 kudos
3 More Replies
Saikrishna2
by New Contributor III
  • 4989 Views
  • 7 replies
  • 11 kudos

Databricks SQL is allowing only 10 queries?

• Power BI is a publisher that uses AD group authentication to publish result sets. Since the publisher's credentials are maintained, the same user can access the Databricks database.
• A number of users are retrieving the data from Power BI or i...

Latest Reply
VaibB
Contributor
  • 11 kudos

I believe 10 is the limit as of now. See if you can increase the concurrency limit from the source.

  • 11 kudos
6 More Replies
User16835756816
by Valued Contributor
  • 3353 Views
  • 4 replies
  • 11 kudos

How can I extract data from different sources and transform it into a fresh, reliable data pipeline?

Tip: These steps are built out for AWS accounts and workspaces that are using Delta Lake. If you would like to learn more, watch this video and reach out to your Databricks sales representative for more information. Step 1: Create your own notebook or ...

Latest Reply
Ajay-Pandey
Esteemed Contributor III
  • 11 kudos

Thanks @Nithya Thangaraj

  • 11 kudos
3 More Replies
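As a minimal illustration of the extract-transform-load flow described above (paths, table, and column names are placeholders):

# Extract: read raw CSV files from cloud storage.
raw_df = (spark.read.format("csv")
            .option("header", "true")
            .load("s3://my-bucket/raw/orders/"))   # placeholder path

# Transform: light cleanup as an example.
clean_df = raw_df.dropDuplicates().na.drop(subset=["order_id"])

# Load: persist as a Delta table for downstream consumers.
clean_df.write.format("delta").mode("overwrite").saveAsTable("bronze.orders")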
