Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

nav
by New Contributor II
  • 7338 Views
  • 8 replies
  • 0 kudos

R packages not getting installed on cluster when creating cluster from dockerfile

I'm trying to use a Dockerfile to create a cluster that has Robyn (https://facebookexperimental.github.io/Robyn/) and other R libraries installed, but the R libraries fail to install on the cluster. When I run the container in interactive mod...

Latest Reply
workingtogetdbw
New Contributor II
  • 0 kudos

What, there has been no answer here? @Debayan Mukherjee @Vartika Nain I am running into this same problem. The idea of having to wait 45 minutes for libraries to install is absolutely wild, and I have done everything outside of working w...

  • 0 kudos
7 More Replies
Dave_Nithio
by Contributor II
  • 10567 Views
  • 1 replies
  • 3 kudos

Delta Live Table Schema Error

I'm using Delta Live Tables to load a set of CSV files in a directory. I am pre-defining the schema to avoid issues with schema inference. This works with Auto Loader on a regular Delta table, but is failing for Delta Live Tables. Below is an example ...

Latest Reply
shagun
New Contributor III
  • 3 kudos

I was facing a similar issue loading JSON files through Auto Loader for Delta Live Tables. I was able to fix it with this option: .option("cloudFiles.inferColumnTypes", "True"). From the docs: "For formats that don’t encode data types (JSON and CSV), Auto Load...

  • 3 kudos
Kannan1206
by New Contributor II
  • 2832 Views
  • 4 replies
  • 0 kudos

Databricks Certification Exam Got Suspended. Need help in resolving the issue

Hi Team, I took the online exam for Databricks Certified Associate Developer for Apache Spark 3.0 - Python on 21-May-2023 at 6:30. During the exam my session was suspended by the proctor even though I was in my seat and looking at the camera. Again I cou...

Latest Reply
Kannan1206
New Contributor II
  • 0 kudos

Hi @Vidula Khanna, I got the relevant details from the team and was able to complete the certification as well. Thanks for the help.

  • 0 kudos
3 More Replies
sindh
by New Contributor II
  • 2695 Views
  • 3 replies
  • 0 kudos

Session suspended for the Databricks exam: how to restart it?

Session suspended, please enable the launch option.

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @sindhu goyal, thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs. Please help us select the best solution by clicking on "Select As Best" if it does. Your feedback...

  • 0 kudos
2 More Replies
Enzo_Bahrami
by New Contributor III
  • 4802 Views
  • 2 replies
  • 0 kudos

Resolved! Input File Path from Autoloader in Delta Live Tables

Hello everyone! I was wondering if there is any way to get the subdirectories in which a file resides while loading it using Auto Loader with DLT. For example: def customer(): return ( spark.readStream.format('cloudfiles') .option('clou...
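A minimal sketch of the path-handling half of this question, assuming the full source path of each ingested file is already available as a string (Databricks file sources expose a _metadata.file_path column, but whether that fits this particular pipeline is an assumption). The helper name subdirectories and the example paths are hypothetical.

```python
def subdirectories(file_path: str, base_path: str) -> list[str]:
    """Return the directory names between base_path and the file itself."""
    # Normalize the base so that "/landing" does not match "/landing_old".
    base = base_path.rstrip("/") + "/"
    if not file_path.startswith(base):
        raise ValueError(f"{file_path!r} is not under {base_path!r}")
    relative = file_path[len(base):]
    # Drop the last component, which is the file name.
    return relative.split("/")[:-1]


# Hypothetical landing location and file path, for illustration only.
base = "dbfs:/mnt/landing/customers"
path = "dbfs:/mnt/landing/customers/2023/05/part-0001.csv"
print(subdirectories(path, base))
```

The same logic could be applied per row inside a pipeline once the path column is selected; it is kept as plain Python here so it can be checked without a cluster.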

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Parsa Bahraminejad, we haven't heard from you since the last response from @Vigneshraja Palaniraj, and I was checking back to see if her suggestions helped you. Or else, if you have any solution, please share it with the community, as it can be...

  • 0 kudos
1 More Replies
ros
by New Contributor III
  • 3691 Views
  • 2 replies
  • 2 kudos

merge vs MERGE INTO

From version 10.4 LTS onward we have low shuffle merge, so merge is faster. But what about the MERGE INTO statement that we run in a Databricks SQL notebook? Is there any performance difference when we use the Databricks PySpark ".merge" function vs databricks...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Roshan RC, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers you...

  • 2 kudos
1 More Replies
erickeniuk
by New Contributor II
  • 3422 Views
  • 2 replies
  • 1 kudos

Search for Databricks Jobs By Name

The Databricks CLI has the ability to list jobs by exact name using “databricks jobs list --name my_job”. Is there a way to search for jobs using this same method, where I could put a partial name of a job and get all the jobs that match? Ex: “databri...
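One possible workaround is to list all jobs and filter client-side. The sketch below assumes the job list has already been loaded into Python as a list of dictionaries shaped like the jobs array of a Jobs API list response (job_id plus settings.name); the function name and the sample data are hypothetical.

```python
from typing import Any


def find_jobs_by_partial_name(
    jobs: list[dict[str, Any]], fragment: str
) -> list[dict[str, Any]]:
    """Return jobs whose settings.name contains fragment, ignoring case."""
    needle = fragment.lower()
    return [
        job
        for job in jobs
        if needle in job.get("settings", {}).get("name", "").lower()
    ]


# Hypothetical sample, shaped like the "jobs" array of a list response.
sample_jobs = [
    {"job_id": 1, "settings": {"name": "my_job_daily"}},
    {"job_id": 2, "settings": {"name": "my_job_hourly"}},
    {"job_id": 3, "settings": {"name": "other_pipeline"}},
]

matches = find_jobs_by_partial_name(sample_jobs, "my_job")
print([job["job_id"] for job in matches])
```

Substring matching is used for simplicity; a glob or regular expression could be swapped in if wildcard patterns are needed.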

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Eric Keniuk, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers y...

  • 1 kudos
1 More Replies
Nishant1307056
by New Contributor
  • 1421 Views
  • 0 replies
  • 0 kudos

I have completed the "Lakehouse Fundamentals" course and assessment and received the certificate instantly. How long will it take for the Ba...

I have completed the "Lakehouse Fundamentals" course and assessment and received the certificate instantly. How long will it take for the badge to be generated, and what is the process to get it?

vijaykumarbotla
by New Contributor III
  • 6683 Views
  • 5 replies
  • 1 kudos

Resolved! Getting error: AnalysisException: Column Is There a PO#17748 are ambiguous. It's probably because you joined several Datasets together, and some of these Datasets are the same. This column points to one of the Datasets but Spark.

AnalysisException: Column Is There a PO#17748 are ambiguous. It's probably because you joined several Datasets together, and some of these Datasets are the same. This column points to one of the Datasets but Spark is unable to figure out which one. ...

Latest Reply
vijaykumarbotla
New Contributor III
  • 1 kudos

Hi All, the solution for this problem is very strange. It was caused by the version of the Databricks runtime. We are using Runtime version 7.0 with Apache Spark 3.0.0. In PRD we are using Runtime version 11.3 LTS with Apache Spark 3.3.0 ver...

  • 1 kudos
4 More Replies
darioAnt
by New Contributor II
  • 2581 Views
  • 1 replies
  • 2 kudos

Filtering delta table by CONCAT of a partition column and a non-partition one

Hi, I know that filtering a Delta table on a partition column is a very powerful time-saving approach, but what if this column appears inside a CONCAT in the WHERE clause? To explain my case: I have a Delta table with only one partition column, say called co...

Latest Reply
darioAnt
New Contributor II
  • 2 kudos

I ran a test myself and the answer is no: with a CONCAT filter, Spark SQL does not know I am filtering on a partition column, so it scans the whole table.
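Given that result, one option is to split the concatenated value back into its parts and filter on the partition column directly. The sketch below assumes the two values are joined with a known separator that never occurs in the partition value; the column names country and customer_id and the function name are hypothetical, not taken from the original post.

```python
def split_concat_predicate(
    concat_value: str, partition_col: str, other_col: str, sep: str = "_"
) -> str:
    """Rewrite CONCAT(partition_col, sep, other_col) = concat_value
    as two plain equality predicates, one per column."""
    partition_value, found, other_value = concat_value.partition(sep)
    if not found:
        raise ValueError(f"separator {sep!r} not found in {concat_value!r}")

    def quote(value: str) -> str:
        # Escape single quotes for use inside a SQL string literal.
        return "'" + value.replace("'", "''") + "'"

    return (
        f"{partition_col} = {quote(partition_value)} "
        f"AND {other_col} = {quote(other_value)}"
    )


# Hypothetical column names and key, for illustration only.
print(split_concat_predicate("IT_12345", "country", "customer_id"))
# prints: country = 'IT' AND customer_id = '12345'
```

Because the rewritten predicate compares the partition column to a literal, the engine has a chance to prune partitions instead of evaluating CONCAT on every row.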

  • 2 kudos
Altay
by New Contributor II
  • 1110 Views
  • 0 replies
  • 0 kudos

Delta merge drops cached variables

Hi Everyone, I have an ingestion script where I use the Delta merge to update and append newly incoming data, in DataFrame format, to an existing Delta table. I am experiencing an issue where all the variables that have been used previously lose their d...

konda1
by New Contributor
  • 1562 Views
  • 0 replies
  • 0 kudos

Getting Executor lost due to stage failure error on writing data frame to a delta table or any file like parquet or csv or avro

We are working on multiline nested (multilevel). The file is read and flattened using PySpark, and the DataFrame shows data using the display() method. When saving the same DataFrame, it gives an executor lost failure error. For some files it is givi...

martindlarsson
by New Contributor III
  • 1519 Views
  • 0 replies
  • 0 kudos

Autoloader and deletion vectors (Predictive IO)

We are looking into enabling Predictive IO on our Delta tables. In the ingest process we are using Auto Loader, and I am wondering if Auto Loader will get a flag to enable deletion vectors at table creation. Deletion vectors are a requirement for Predic...

ros
by New Contributor III
  • 3904 Views
  • 2 replies
  • 3 kudos

Apache Hudi Table creation using hudi maven library

I installed the Hudi Maven library org.apache.hudi:hudi-spark3.3-bundle_2.12:0.13.0 on Databricks Runtime version 12.2 LTS (includes Apache Spark 3.3.2, Scala 2.12) with Spark config: spark.sql.catalog.spark_catalog org.apache.spark.sql.hudi.catalog.HoodieCat...

Latest Reply
ros
New Contributor III
  • 3 kudos

@Shanmugavel Chandrakasu​ %sql create table hudi_cow_pt_tbl ( id bigint, name string, ts bigint, dt string, hh string ) using hudi tblproperties ( type = 'cow', primaryKey = 'id', preCombineField = 'ts' ) partitioned by (dt, hh) location '/mnt/data/h...

  • 3 kudos
1 More Replies
Anonymous
by Not applicable
  • 1394 Views
  • 0 replies
  • 2 kudos

Hello Everyone, I am thrilled to announce that we have our 6th winner for the raffle contest: @Bolanle Adesanya. Please join me in congratulating h...

Hello Everyone, I am thrilled to announce that we have our 6th winner for the raffle contest: @Bolanle Adesanya. Please join me in congratulating her on this remarkable achievement! Your dedication and hard work have paid off, and we are delighted t...
