Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

AnaMocanu
by Contributor
  • 1367 Views
  • 1 reply
  • 0 kudos

Compute pools max capacity and ideal compute settings

Hi there, I'm having a difficult time understanding the compute side of our jobs under the hood, and I checked the documentation but don't have clear answers so far, so hopefully someone will provide some clarity. I set up pools to use for our overnigh...

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

I moved away from pools since cluster reuse is possible in Databricks jobs. Why? More control over your workers, no need to find a good waiting time, and you can even run multiple tasks on a single cluster. Why your jobs fail is not clear to me. When no ...

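The cluster-reuse approach werners describes maps onto a Jobs API 2.1 job spec where every task points at one shared job cluster instead of drawing from a pool. A minimal sketch; the job name, node type, and notebook paths are hypothetical placeholders:

```python
# Sketch of a Jobs API 2.1 job where several tasks reuse one job cluster
# instead of pulling instances from a pool. All names are hypothetical.
job_spec = {
    "name": "nightly-etl",
    "job_clusters": [{
        "job_cluster_key": "shared_etl_cluster",
        "new_cluster": {
            "spark_version": "14.3.x-scala2.12",
            "node_type_id": "Standard_DS3_v2",  # example Azure node type
            "num_workers": 4,
        },
    }],
    "tasks": [
        {
            "task_key": "ingest",
            "job_cluster_key": "shared_etl_cluster",
            "notebook_task": {"notebook_path": "/ETL/ingest"},
        },
        {
            "task_key": "transform",
            "depends_on": [{"task_key": "ingest"}],
            "job_cluster_key": "shared_etl_cluster",  # same cluster, no pool wait
            "notebook_task": {"notebook_path": "/ETL/transform"},
        },
    ],
}

# Every task references the single shared cluster definition:
shared = {t["job_cluster_key"] for t in job_spec["tasks"]}
```

Because both tasks name the same `job_cluster_key`, the second task starts on the already-warm cluster, which is the "no waiting time to tune" benefit mentioned above.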
JakubMlacki
by Databricks Partner
  • 1309 Views
  • 1 reply
  • 0 kudos

After catalog recreation I cannot create folders with names that previously had existed

I have a catalog with multiple schemas, tables and volumes. Each volume contains two folders with specific names that I cannot change. When I drop the whole catalog (CASCADE) and create a catalog with the same name, with the same schemas, tables and vol...

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

I also had issues a while back with (what I believe is) some kind of cache. I have no knowledge of how to override this. What I did is what you tried: waiting.

AskMe55
by New Contributor
  • 1672 Views
  • 1 reply
  • 0 kudos

Allowing Azure Databricks to query a local/private database

Hello, I am trying to set up a simple machine learning pipeline where I want to generate example data on my computer, save this data into a MariaDB database on my computer, and then allow Azure Databricks to access my local database to train a model w...

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

To do that, Databricks needs access to your local LAN. This means configuring network security groups or a firewall. Setting up a private endpoint is also a good idea. You also have to make sure that your Databricks cluster can connect to your on-prem d...

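Once that network path exists, the read itself is a plain JDBC call. A minimal sketch; the host, database, table, and credentials are hypothetical placeholders (in practice the password would come from a Databricks secret scope):

```python
# Sketch: reading an on-prem MariaDB table over JDBC, assuming network
# connectivity (VPN/peering + firewall rules) is already in place.
# Host, database, table, and credentials are hypothetical placeholders.
jdbc_options = {
    "url": "jdbc:mariadb://10.0.0.12:3306/training_db",  # private LAN address
    "dbtable": "example_data",
    "user": "dbx_reader",
    "password": "<from-a-secret-scope>",
    "driver": "org.mariadb.jdbc.Driver",
}

# On a cluster (with the MariaDB JDBC driver library installed) this would be:
# df = spark.read.format("jdbc").options(**jdbc_options).load()
```

Note the URL points at a private IP, which is exactly why the networking setup above (NSGs, firewall, private endpoint) has to come first.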
standup1
by Contributor
  • 1373 Views
  • 1 reply
  • 1 kudos

How to use identifier() as reference

Hello, I am trying to build some dynamic SQL scripts using the IDENTIFIER() clause. This works fine when it is used with [ select * from identifier(mytable) ]. However, when I try to use this identifier as a reference to a foreign key table, it does...

Latest Reply
stil
Databricks Partner
  • 1 kudos

Is this something you are working to resolve? We would also like to use IDENTIFIER() with our REFERENCES constraints. If not, is it possible to get an explanation of why the limitation exists?

Hertz
by New Contributor II
  • 3891 Views
  • 3 replies
  • 0 kudos

Serverless Compute Cost Monitoring (System Tables)

Hello, I have developed a dashboard for monitoring compute costs using system tables, allowing tracking of expenses by Cluster Name (user-created name), Job Name, or Warehouse Name. However, with the introduction of the new shared serverless compute, ...

Latest Reply
augustsc
New Contributor II
  • 0 kudos

Hi! We're also facing the same problem. We don't have any materialized views or streaming tables, but we are still seeing PREMIUM_JOBS_SERVERLESS_COMPUTE_EU_NORTH with billing_origin_product = SHARED_SERVERLESS_COMPUTE generated each day at a time wh...

2 More Replies
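For serverless workloads, a starting point for attribution is the `usage_metadata` struct in `system.billing.usage`, which carries job and notebook IDs. A sketch of such a query; the column names follow the documented system-table schema, but verify them against your workspace before relying on this:

```python
# Sketch: attribute serverless DBUs by job/notebook via system tables.
# Column names follow the documented system.billing.usage schema; verify
# them in your own workspace before relying on this.
query = """
    SELECT usage_date,
           billing_origin_product,
           usage_metadata.job_id      AS job_id,
           usage_metadata.notebook_id AS notebook_id,
           SUM(usage_quantity)        AS dbus
    FROM system.billing.usage
    WHERE billing_origin_product = 'SHARED_SERVERLESS_COMPUTE'
    GROUP BY ALL
    ORDER BY usage_date
"""
# On a cluster: display(spark.sql(query))
```

Rows where both `job_id` and `notebook_id` are null (like the daily charge described above) cannot be attributed this way, which is the gap the posters are running into.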
Policepatil
by New Contributor III
  • 4233 Views
  • 5 replies
  • 3 kudos

Best Way to process large number of records from multiple files

Hi, I have input files in S3 with the below structure:
/mnt/<mount_name>/test/<company_id>/sales/file_1.json
/mnt/<mount_name>/test/<company_id>/sales/file_2.json
/mnt/<mount_name>/test/<company_id>/sales/file_<n>.json
Number of companies = 15. Number of files ...

Latest Reply
lprevost
Contributor III
  • 3 kudos

Are the JSON files compressed? If they are in .gz, they are unsplittable, which means you lose some of Spark's parallel magic.

4 More Replies
ADB0513
by Databricks Partner
  • 3999 Views
  • 5 replies
  • 1 kudos

Load tables from JDBC in parallel

I have a list of about 80 tables that I need to load from an Oracle database into Databricks via JDBC. I would like to do this in parallel, instead of looping through one table at a time. I have a function defined to ingest the data: def ingest_data...

Latest Reply
Retired_mod
Esteemed Contributor III
  • 1 kudos

Hi @ADB0513, Thanks for reaching out! Please review the response and let us know if it answers your question. Your feedback is valuable to us and the community. If the response resolves your issue, kindly mark it as the accepted solution. This will h...

4 More Replies
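A common pattern for this is a thread pool on the driver: JDBC reads mostly wait on the database, so threads (not processes) are enough to overlap the 80 loads. A sketch; `ingest_data` below is a stand-in stub for the poster's function, and the table names are hypothetical:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Sketch: fan ~80 JDBC table loads out over a thread pool on the driver.
# `ingest_data` is a stand-in for the poster's function; on Databricks it
# would call spark.read.jdbc(...) and write a Delta table.
def ingest_data(table_name: str) -> str:
    return f"loaded {table_name}"

tables = ["CUSTOMERS", "ORDERS", "ORDER_ITEMS"]  # hypothetical Oracle tables

results = {}
with ThreadPoolExecutor(max_workers=8) as pool:
    futures = {pool.submit(ingest_data, t): t for t in tables}
    for fut in as_completed(futures):
        results[futures[fut]] = fut.result()  # re-raises ingest errors here
```

Keep `max_workers` modest so 80 simultaneous JDBC sessions do not overwhelm the Oracle side; the pool drains the full table list through that fixed number of connections.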
l_c_s
by New Contributor II
  • 3049 Views
  • 3 replies
  • 1 kudos

Random errors SparkException: Job aborted due to stage failure

Hi, we are trying to run some workflows on a shared cluster, with Databricks runtime version 14.3 LTS, and we randomly receive the error: SparkException: Job aborted due to stage failure: Task 2 in stage 78.0 failed 4 times, most recent failure: Lost...

Latest Reply
Retired_mod
Esteemed Contributor III
  • 1 kudos

Hi @l_c_s, Thanks for reaching out! Please review the response and let us know if it answers your question. Your feedback is valuable to us and the community. If the response resolves your issue, kindly mark it as the accepted solution. This will hel...

2 More Replies
seeker
by New Contributor II
  • 2021 Views
  • 4 replies
  • 1 kudos

Get metadata of files present in a zip

I have a .zip file present on an ADLS path which contains multiple files of different formats. I want to get metadata of the files, like file name and modification time, without unzipping it. I have code which works for smaller zips but run...

Latest Reply
Retired_mod
Esteemed Contributor III
  • 1 kudos

Hi @seeker, there are only two ways I can think of to do it: write a UDF, or write customized MapReduce logic instead of using Spark SQL. But they are kind of the same, so I would say a UDF is a good solution.

3 More Replies
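Listing member metadata without extraction is what the zip central directory is for, and Python's `zipfile` reads only that directory, so it stays cheap even for large archives. A self-contained sketch (the demo archive is built in memory; on ADLS you would pass a seekable file-like object over the blob instead):

```python
import io
import zipfile

# Sketch: list member name, size, and modification time from a zip's
# central directory, without extracting any member.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:  # build a small demo archive
    zf.writestr("a.csv", "id,val\n1,x\n")
    zf.writestr("b.json", "{}")

with zipfile.ZipFile(buf) as zf:
    metadata = [
        {"name": i.filename, "size": i.file_size, "modified": i.date_time}
        for i in zf.infolist()  # central directory only; members stay compressed
    ]
```

Because only the directory at the tail of the archive is parsed, memory use does not grow with the size of the members, which is the failure mode described for large zips.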
subhas_1729
by New Contributor II
  • 1186 Views
  • 2 replies
  • 2 kudos

CSV file and partitions

Hi, I want to know whether CSV files can be partitioned or not. I read in a book that only file types like Parquet and Avro can be partitioned. Regards, Subhas

Latest Reply
Retired_mod
Esteemed Contributor III
  • 2 kudos

Hi @subhas_1729, Thanks for reaching out! Please review the response and let us know if it answers your question. Your feedback is valuable to us and the community. If the response resolves your issue, kindly mark it as the accepted solution. This wi...

1 More Replies
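Worth noting on the question itself: in Spark, partitioning is a directory layout, not a property of the file format, so `df.write.partitionBy("country").csv(path)` works for CSV too; what such books usually mean is that Parquet/Avro additionally offer splittable compression and column pruning. A stdlib sketch of the layout that call produces (column name and values hypothetical):

```python
import csv
import os
import tempfile

# Sketch: hive-style partitioning is just a directory naming convention
# (country=US/part-0.csv), so it applies to CSV as much as to Parquet/Avro.
# This reproduces the layout df.write.partitionBy("country").csv(path) emits.
rows = [
    {"country": "US", "amount": 10},
    {"country": "DE", "amount": 7},
    {"country": "US", "amount": 3},
]
out = tempfile.mkdtemp()
for row in rows:
    part_dir = os.path.join(out, f"country={row['country']}")
    os.makedirs(part_dir, exist_ok=True)
    with open(os.path.join(part_dir, "part-0.csv"), "a", newline="") as f:
        # the partition column lives in the directory name, not in the file
        csv.writer(f).writerow([row["amount"]])

partitions = sorted(os.listdir(out))
```

Readers that understand this convention (Spark among them) recover the `country` column from the directory names when scanning the tree.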
amit119
by Databricks Partner
  • 4023 Views
  • 0 replies
  • 0 kudos

Not able to access partner-academy

Hi, I have used my company email to register an account for customer-academy.databricks.com a while back. Now what I need to do is create an account with partner-academy.databricks.com using my company email too. However, when I register at partner-acade...

Ravikumashi
by Contributor
  • 3625 Views
  • 3 replies
  • 0 kudos

Extract cluster usage tags from databricks cluster init script

Is it possible to extract cluster usage tags from a Databricks cluster init script? I am specifically interested in spark.databricks.clusterUsageTags.clusterAllTags. I tried to extract it from /databricks/spark/conf/spark.conf and /databricks/spark/conf/sp...

Data Engineering
Azure Databricks
Latest Reply
Debayan
Databricks Employee
  • 0 kudos

Hi, for reference: https://community.databricks.com/t5/data-engineering/pull-cluster-tags/td-p/19216. Could you please confirm the key expectation here? Extracting as such?

2 More Replies
hemanthtirumala
by New Contributor II
  • 3782 Views
  • 0 replies
  • 0 kudos

Free $200 voucher at upcoming events? Please send me a note about it

I need info about any upcoming events where Databricks will provide a free voucher for the Azure platform architect exam. If anyone knows the timing or has a hunch about it, please ping me the details. I will stay tuned at that point in time...

ksenija
by Contributor
  • 4153 Views
  • 5 replies
  • 5 kudos

DLT pipeline - SCD type 2

I created my table using SCD Type 2 in SQL. I need to do a full refresh to load all of the data. Whenever I update data in my source table, in my new table scd_target I see only the latest record; history is not being saved. CREATE OR REFRESH STREAMING...

Latest Reply
Rishabh-Pandey
Databricks MVP
  • 5 kudos

Hi @ksenija, I got your use case, but could you please tell me what you mean by "sources_test.sources_test.source"?

4 More Replies
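One common cause of the symptom described (only the latest record surviving) is that APPLY CHANGES defaults to SCD Type 1 unless Type 2 is requested explicitly. A minimal DLT SQL sketch; the source path, key column `id`, and sequence column `updated_at` are hypothetical placeholders:

```sql
-- Minimal Delta Live Tables sketch; `id` and `updated_at` stand in for the
-- real key and ordering columns, and the source name is a placeholder.
CREATE OR REFRESH STREAMING TABLE scd_target;

APPLY CHANGES INTO live.scd_target
FROM stream(my_catalog.my_schema.source)
KEYS (id)
SEQUENCE BY updated_at
STORED AS SCD TYPE 2;  -- without this clause the default is SCD TYPE 1
```

With `STORED AS SCD TYPE 2`, the target keeps one row per change with `__START_AT`/`__END_AT` validity columns instead of overwriting in place.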