Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

nk76
by New Contributor III
  • 10389 Views
  • 7 replies
  • 5 kudos

Resolved! Custom library import fails randomly with error: not found: value it

Hello, I have an issue with the import of a custom library in Azure Databricks. Roughly 95% of the time it works fine, but sometimes it fails. I searched the internet and this community with no luck so far. It is a Scala library in a Scala notebook,...

Latest Reply
Naskar
New Contributor II
  • 5 kudos

I also encountered the same error. While importing a file I get the error: "Import failed with error: Could not deserialize: Exceeded 16777216 bytes (current = 16778609)"

6 More Replies
Chris_Konsur
by New Contributor III
  • 3773 Views
  • 2 replies
  • 0 kudos

Resolved! Configure Autoloader with the file notification mode for production

I configured ADLS Gen2 standard storage and successfully configured Autoloader with the file notification mode. In this document, https://docs.databricks.com/ingestion/auto-loader/file-notification-mode.html: "ADLS Gen2 provides different event notificati...

Latest Reply
Ryan_Chynoweth
Databricks Employee
  • 0 kudos

Hi, @Chris Konsur. You do not need to do anything with the FlushWithClose event REST API; that is just the event type that we listen to. As for the backfill setting, this is for handling late data or late events that are being triggered. This setting largely de...

1 More Replies
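For readers landing on this thread: a minimal sketch of an Auto Loader stream in file notification mode, assuming hypothetical ADLS Gen2 paths and table names. The FlushWithClose detail and the backfill behavior discussed above are driven by the cloudFiles options; you do not call the event REST API yourself.

```python
# Minimal Auto Loader sketch in file notification mode (paths are hypothetical).
# cloudFiles.useNotifications switches discovery from directory listing to
# storage events; cloudFiles.backfillInterval periodically lists the source
# anyway, to pick up events that were missed or arrived late.
df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "json")
      .option("cloudFiles.useNotifications", "true")
      .option("cloudFiles.backfillInterval", "1 day")
      .load("abfss://container@account.dfs.core.windows.net/input"))

(df.writeStream
   .option("checkpointLocation",
           "abfss://container@account.dfs.core.windows.net/_checkpoints/ingest")
   .trigger(availableNow=True)
   .toTable("bronze.ingested_events"))
```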
899572
by New Contributor II
  • 2542 Views
  • 4 replies
  • 1 kudos

"approxQuantile" not working as part of a delta live table workflow pipeline.

I am trying to compute outliers using approxQuantile on a DataFrame. It works fine in a Databricks notebook, but the call doesn't work correctly when it is part of a Delta Live Tables pipeline. (This is Python.) Here is the line that isn't working a...

Latest Reply
899572
New Contributor II
  • 1 kudos

Good recommendation. I was able to do something similar that appears to work: # Step 2. Compute the outlier sessions based on duration_minutes lc = session_agg_df.selectExpr("percentile(duration_minutes, 0.25) lower_quartile") session_agg_d...

3 More Replies
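Expanding the accepted workaround into a self-contained sketch (table and column names are hypothetical): instead of DataFrame.approxQuantile, which is a driver-side action and reportedly misbehaves inside a Delta Live Tables pipeline, the quartiles are computed with the SQL percentile function via selectExpr, so the whole computation stays in the query plan.

```python
# Quartile-based outlier flagging in the style of the thread's workaround.
# Table and column names are hypothetical.
from pyspark.sql import functions as F

session_agg_df = spark.table("session_aggregates")

# One-row DataFrame holding the quartiles, computed inside the plan.
quartiles = session_agg_df.selectExpr(
    "percentile(duration_minutes, 0.25) AS lower_quartile",
    "percentile(duration_minutes, 0.75) AS upper_quartile",
)

# Broadcast the quartiles onto every row and flag 1.5*IQR outliers.
flagged = (session_agg_df.crossJoin(quartiles)
           .withColumn("iqr", F.col("upper_quartile") - F.col("lower_quartile"))
           .withColumn(
               "is_outlier",
               (F.col("duration_minutes") < F.col("lower_quartile") - 1.5 * F.col("iqr"))
               | (F.col("duration_minutes") > F.col("upper_quartile") + 1.5 * F.col("iqr"))))
```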
mattjones
by Databricks Employee
  • 914 Views
  • 0 replies
  • 0 kudos

DEC 13 MEETUP: Arbitrary Stateful Stream Processing in PySpark (www.meetup.com)

DEC 13 MEETUP: Arbitrary Stateful Stream Processing in PySpark. For folks in the Bay Area: Dr. Karthik Ramasamy, Databricks' Head of Streaming, will be joined by engineering experts on the streaming and PySpark teams at Databricks for this in-person me...

Sujitha
by Databricks Employee
  • 2442 Views
  • 3 replies
  • 5 kudos

Weekly Release Notes Recap. Here's a quick recap of the latest release notes updates from the past week. Databricks platform release notes...

Weekly Release Notes Recap. Here's a quick recap of the latest release notes updates from the past week. Databricks platform release notes, December 1-6, 2022. Partner Connect supports connecting to AtScale: You can now easily create a connection betwe...

Latest Reply
karthik_p
Esteemed Contributor
  • 5 kudos

@Uma Maheswara Rao Desula If I am not wrong, the ideas portal below should help you: Ideas Portal | Databricks on AWS

2 More Replies
karthik_p
by Esteemed Contributor
  • 6133 Views
  • 4 replies
  • 7 kudos

How to properly convert DBUs consumed into a dollar amount in Databricks on AWS/GCP/Azure

Hi Team, I have gone through a lot of articles, but it looks like there is some gap on pricing. Can anyone please let me know an accurate way to convert DBU pricing into dollars? As per my understanding, total DBU cost = DBU/hour * total job run time in hours (shows a...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 7 kudos

DBUs are consumed per VM, and every VM type has a different DBU price.

3 More Replies
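To make the arithmetic concrete, a toy calculation with made-up numbers; the real DBU rate for a VM type and the $/DBU price come from the Databricks pricing page for your cloud and your contract tier, not from this example.

```python
# Toy DBU-to-dollar calculation; every number below is hypothetical.
# dollars = DBU rate of the VM type * number of VMs * hours run * $ per DBU,
# and the cloud provider bills the VMs themselves separately on top.
dbu_per_hour_per_vm = 0.75  # DBU rate of the chosen VM type
num_vms = 3                 # e.g. 1 driver + 2 workers of the same type
hours = 2.5                 # how long the job ran
usd_per_dbu = 0.15          # depends on cloud, tier, and workload type

dbu_cost = dbu_per_hour_per_vm * num_vms * hours * usd_per_dbu
print(f"DBU cost: ${dbu_cost:.2f}")  # cloud VM cost is billed separately
```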
Christine
by Contributor II
  • 15233 Views
  • 1 replies
  • 2 kudos

ADD COLUMN IF NOT EXISTS does not recognize "IF NOT EXISTS". How do I add a column to an existing Delta table with SQL if the column does not already exist?

How do I add a column to an existing Delta table with SQL if the column does not already exist? I am using the following code: %sql ALTER TABLE table_name ADD COLUMN IF NOT EXISTS column_name type; but it prints the error: [PARSE_SYNTAX_ERROR] Synta...

Latest Reply
UmaMahesh1
Honored Contributor III
  • 2 kudos

Hi @Christine Pedersen. I guess IF NOT EXISTS or IF EXISTS can be used in conjunction with DROP or PARTITIONS according to the documentation. If you want to do the same kind of check, you can use a try/catch block in PySpark or, as per your l...

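A minimal PySpark sketch of the check-first approach suggested above (table and column names are hypothetical): look the column up in the table's schema and only issue the ALTER TABLE when it is missing, since the ADD COLUMN IF NOT EXISTS form fails to parse, as shown in the question.

```python
# Add a column to a Delta table only if it is not already present.
# Table and column names are hypothetical.
table_name = "my_schema.my_table"
column_name = "new_col"
column_type = "STRING"

existing = [f.name.lower() for f in spark.table(table_name).schema.fields]
if column_name.lower() not in existing:
    # ADD COLUMNS (plural) is the form that parses for Delta tables.
    spark.sql(f"ALTER TABLE {table_name} ADD COLUMNS ({column_name} {column_type})")
```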
Ovi
by New Contributor III
  • 4176 Views
  • 4 replies
  • 9 kudos

Construct DataFrame or RDD from S3 bucket with Delta tables

Hi all! I have an S3 bucket with Delta parquet files/folders, each with a different schema. I need to create an RDD or DataFrame from all those Delta tables that contains the path, name, and schema of each. How could I do that? Thank you! P...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 9 kudos

You can mount the S3 bucket or read directly from it: access_key = dbutils.secrets.get(scope = "aws", key = "aws-access-key") secret_key = dbutils.secrets.get(scope = "aws", key = "aws-secret-key") sc._jsc.hadoopConfiguration().set("fs.s3a.access.key", ac...

3 More Replies
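Building on the reply above, a sketch that walks a bucket prefix and collects the path, name, and schema of each Delta table into a single DataFrame. The s3a path is hypothetical, access is assumed to be configured already (for example via the secret-scope setup in the reply), and each top-level folder under the prefix is assumed to be a Delta table.

```python
# Collect (path, name, schema) for every Delta table folder under a prefix.
rows = []
for entry in dbutils.fs.ls("s3a://my-bucket/delta/"):  # hypothetical bucket
    if entry.isDir():
        try:
            schema = spark.read.format("delta").load(entry.path).schema
            rows.append((entry.path, entry.name.rstrip("/"), schema.simpleString()))
        except Exception:
            pass  # folder was not a readable Delta table; skip it

tables_df = spark.createDataFrame(rows, "path STRING, name STRING, schema STRING")
display(tables_df)
```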
aaronpetry
by New Contributor III
  • 7125 Views
  • 2 replies
  • 3 kudos

%run not printing notebook output when using 'Run All' command

I have been using the %run command to run auxiliary notebooks from an "orchestration" notebook. I like using %run over dbutils.notebook.run because of the variable inheritance, ease of troubleshooting, and the printing of the output from the auxiliary n...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Hi @Aaron Petry, great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question first. Otherwise, Bricksters will get back to you soon. Thanks!

1 More Replies
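While this one waits for answers, a quick sketch of the two invocation styles the question contrasts; ./aux_notebook is a hypothetical path. Only %run inherits variables and streams the child notebook's output into the caller.

```python
# %run is a magic and must be the only content of its cell; it imports the
# child notebook's variables and functions and normally prints its output:
#
#   %run ./aux_notebook
#
# dbutils.notebook.run executes the child as an ephemeral job instead:
# no variable inheritance, no inline output, only the exit value comes back.
result = dbutils.notebook.run("./aux_notebook", 60)  # path, timeout in seconds
print(result)  # whatever the child passed to dbutils.notebook.exit(...)
```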
Nayan7276
by Valued Contributor II
  • 4307 Views
  • 5 replies
  • 29 kudos

Resolved! Databricks Community points

I have 461 points in the Databricks Community, but only 23 points are reflected in the rewards store. Can anyone look into this issue?

Latest Reply
Ajay-Pandey
Databricks MVP
  • 29 kudos

Hi, the rewards account needs to be created with the same email ID, and points may take a week to reflect in your rewards account.

4 More Replies
isaac_gritz
by Databricks Employee
  • 3645 Views
  • 4 replies
  • 8 kudos

Databricks Runtime Support

How long are Databricks runtimes supported for? How often are they updated? You can learn more about the Databricks runtime support lifecycle here (AWS | Azure | GCP). Long Term Support (LTS) runtimes are released every 6 months and supported for 2 yea...

Latest Reply
Ajay-Pandey
Databricks MVP
  • 8 kudos

Thanks for the update.

3 More Replies
Saikrishna2
by New Contributor III
  • 8204 Views
  • 7 replies
  • 11 kudos

Databricks SQL is allowing only 10 queries?

• Power BI is a publisher that uses AD group authentication to publish result sets. Since the publisher's credentials are maintained, the same user can access the Databricks database. • A number of users are retrieving the data from Power BI or i...

Latest Reply
VaibB
Contributor
  • 11 kudos

I believe 10 is the limit as of now. See if you can increase the concurrency limit from the source.

6 More Replies
User16835756816
by Databricks Employee
  • 8000 Views
  • 4 replies
  • 11 kudos

How can I extract data from different sources and transform it into a fresh, reliable data pipeline?

Tip: These steps are built out for AWS accounts and workspaces that are using Delta Lake. If you would like to learn more, watch this video and reach out to your Databricks sales representative for more information. Step 1: Create your own notebook or ...

Latest Reply
Ajay-Pandey
Databricks MVP
  • 11 kudos

Thanks @Nithya Thangaraj

3 More Replies
chhavibansal
by New Contributor III
  • 5630 Views
  • 4 replies
  • 1 kudos

ANALYZE TABLE showing NULLs for all statistics in Spark

var df2 = spark.read.format("csv").option("sep", ",").option("header", "true").option("inferSchema", "true").load("src/main/resources/datasets/titanic.csv") df2.createOrReplaceTempView("titanic") spark.table("titanic").cach...

Latest Reply
chhavibansal
New Contributor III
  • 1 kudos

Can you share what *newtitanic* is? I think you would have done something similar: spark.sql("create table newtitanic as select * from titanic"). Something like this works for me, but the issue is I first make a temp view and then again create a tab...

3 More Replies
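Following the reply's direction, a sketch of the pattern that yields populated statistics (the titanic names mirror the thread; the CSV path is the asker's): materialize a metastore table instead of analyzing only a temp view, then compute and inspect the stats.

```python
# ANALYZE TABLE stores statistics in the metastore, so it needs a real
# table; per the thread, the temp-view-only route left the stats NULL.
df = (spark.read.format("csv")
      .option("sep", ",")
      .option("header", "true")
      .option("inferSchema", "true")
      .load("src/main/resources/datasets/titanic.csv"))
df.createOrReplaceTempView("titanic")

# Materialize as a managed table, then compute and read back the statistics.
spark.sql("CREATE TABLE IF NOT EXISTS newtitanic AS SELECT * FROM titanic")
spark.sql("ANALYZE TABLE newtitanic COMPUTE STATISTICS FOR ALL COLUMNS")
spark.sql("DESCRIBE EXTENDED newtitanic").show(truncate=False)
```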
