Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Forum Posts

rameshybr
by New Contributor II
  • 2819 Views
  • 2 replies
  • 0 kudos

How to get files one by one from Blob Storage using PySpark/Python

How do I write PySpark/Python code to get files one by one from Blob Storage?

Latest Reply
Rishabh-Pandey
Esteemed Contributor
  • 0 kudos

@rameshybr
# List files in a directory
files = dbutils.fs.ls("/mnt/<mount-name>/path/to/directory")
for file in files:
    file_path = file.path
    # Read each file into a DataFrame; if your file format is parquet, for example:
    df = s...
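[Editor's note] A minimal runnable sketch of the same idea, assuming a hypothetical mount path and Parquet files; dbutils.fs.ls returns FileInfo objects whose path field can be passed straight to the reader:

# Minimal sketch (hypothetical mount point, Parquet files assumed).
files = dbutils.fs.ls("/mnt/<mount-name>/path/to/directory")
for f in files:
    if not f.name.endswith("/"):             # skip sub-directories
        df = spark.read.parquet(f.path)      # read one file at a time
        print(f.path, df.count())            # placeholder for per-file processing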

1 More Replies
alxsbn
by Contributor
  • 6202 Views
  • 5 replies
  • 7 kudos

How to change the SQL editor / schema browser default catalog / database

In the SQL editor / schema browser, is there a way to change the default catalog / database? Mine is always fixed on my Unity Catalog.

Latest Reply
Debayan
Databricks Employee
  • 7 kudos

Hi, from the dropdown you can get the data objects: https://docs.databricks.com/sql/user/queries/queries.html#browse-data-objects-in-sql-editor Please let us know if this helps. Also, please tag @Debayan with your next comment so that I will get notif...
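[Editor's note] If the goal is to change the session defaults rather than the dropdown, a minimal sketch with hypothetical catalog/schema names; USE CATALOG and USE SCHEMA set the defaults for subsequent statements:

# Minimal sketch (hypothetical names): set the session's default catalog
# and schema so unqualified table names resolve against them.
spark.sql("USE CATALOG my_catalog")
spark.sql("USE SCHEMA my_schema")
spark.sql("SELECT * FROM my_table").show()  # resolves to my_catalog.my_schema.my_table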

4 More Replies
alexgv12
by New Contributor III
  • 1064 Views
  • 2 replies
  • 0 kudos

Isolated Databricks cluster call from Synapse or Azure Data Factory

https://learn.microsoft.com/en-us/answers/questions/1919424/isolated-databricks-cluster-call-from-synapses-or How can I create a job in Databricks with isolated-cluster parameters from Synapse or Azure Data Factory? I cannot find any option that a...

Captura de pantalla 2024-08-20 122534.png
Latest Reply
alexgv12
New Contributor III
  • 0 kudos

Hi werners, thanks for your question. I'm sharing the updated linked service in Synapse. Currently we have a pool in Databricks; what we do with the linked service is that it creates a job and brings up an instance with the resources of our pool, but to uplo...
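[Editor's note] For reference, a minimal sketch of creating such a job through the Jobs API 2.1, with hypothetical host, token, notebook path, and pool ID; new_cluster.instance_pool_id is what ties the job's cluster to an existing pool:

import requests

# Minimal sketch (hypothetical host, token, notebook path, and pool ID).
host = "https://<workspace-host>"
payload = {
    "name": "adf-triggered-job",
    "tasks": [{
        "task_key": "main",
        "notebook_task": {"notebook_path": "/Workspace/path/to/notebook"},
        "new_cluster": {
            "spark_version": "14.3.x-scala2.12",
            "instance_pool_id": "<pool-id>",  # draw workers from the pool
            "num_workers": 2,
        },
    }],
}
resp = requests.post(f"{host}/api/2.1/jobs/create",
                     headers={"Authorization": "Bearer <token>"},
                     json=payload)
print(resp.json())  # returns the new job_id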

1 More Replies
AnaMocanu
by Contributor
  • 1078 Views
  • 1 replies
  • 0 kudos

Compute pools max capacity and ideal compute settings

Hi there, I'm having a difficult time understanding the compute side of our jobs under the hood, and I checked the documentation but don't have clear answers so far, so hopefully someone will provide some clarity. I set up pools to use for our overnigh...

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

I moved away from pools since cluster reuse is possible in Databricks jobs. Why? More control over your workers, no need to find a good waiting time, and you can even run multiple tasks on a single cluster. Why your jobs fail is not clear to me. When no ...
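[Editor's note] A minimal sketch of the reuse pattern, with hypothetical names; in the Jobs API a job_clusters entry is defined once and each task points at it via job_cluster_key, so all tasks share one cluster:

# Minimal sketch (hypothetical names): one job cluster shared by two tasks.
job_spec = {
    "name": "nightly-pipeline",
    "job_clusters": [{
        "job_cluster_key": "shared",
        "new_cluster": {"spark_version": "14.3.x-scala2.12", "num_workers": 4},
    }],
    "tasks": [
        {"task_key": "extract", "job_cluster_key": "shared",
         "notebook_task": {"notebook_path": "/Workspace/etl/extract"}},
        {"task_key": "load", "job_cluster_key": "shared",
         "depends_on": [{"task_key": "extract"}],
         "notebook_task": {"notebook_path": "/Workspace/etl/load"}},
    ],
}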

JakubMlacki
by New Contributor III
  • 1040 Views
  • 1 replies
  • 0 kudos

After catalog recreation I cannot create folders with names that previously had existed

I have a catalog with multiple schemas, tables and volumes. Each volume contains two folders with specific names that I cannot change. When I drop the whole catalog (CASCADE) and create a catalog with the same name, with the same schemas, tables and vol...

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

I also had issues a while back with (what I believe is) some kind of cache. I don't know how to override this. What I did is what you tried: waiting.

AskMe55
by New Contributor
  • 1154 Views
  • 1 replies
  • 0 kudos

Allowing Azure Databricks to query a local/private database

Hello, I am trying to set up a simple machine learning pipeline where I want to generate example data on my computer, save this data into a MariaDB database on my computer, and then allow Azure Databricks to access my local database to train a model w...

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

To do that, Databricks needs access to your local LAN. This means configuring network security groups or a firewall. Setting up a private endpoint is also a good idea. You also have to make sure that your Databricks cluster can connect to your on-prem d...
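[Editor's note] Once the network path exists, a minimal read sketch, with hypothetical host, database, and credentials, assuming the MariaDB JDBC driver is installed on the cluster:

# Minimal sketch (hypothetical host, database, credentials); requires the
# MariaDB JDBC driver to be available on the cluster.
df = (spark.read.format("jdbc")
      .option("url", "jdbc:mariadb://<host>:3306/<database>")
      .option("driver", "org.mariadb.jdbc.Driver")
      .option("dbtable", "training_data")
      .option("user", "<user>")
      .option("password", "<password>")
      .load())
df.show()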

standup1
by Contributor
  • 1041 Views
  • 1 replies
  • 1 kudos

How to use identifier() as reference

Hello, I am trying to build some dynamic SQL scripts using the IDENTIFIER() clause. This works fine when it is used as [ SELECT * FROM IDENTIFIER(mytable) ]. However, when I try to use this identifier as a reference to a foreign-key table, it does...

Latest Reply
stil
New Contributor II
  • 1 kudos

Is this something you are working to resolve? We would also like to use IDENTIFIER() with our REFERENCES constraints. If not, is it possible to get an explanation of why the limitation exists?
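[Editor's note] For illustration, a minimal sketch of the working case with a named parameter marker (hypothetical table name); the constraint case from the thread is shown only as a comment since, per the posters, IDENTIFIER() is not accepted there:

# Minimal sketch (hypothetical table name): IDENTIFIER() works in a query
# with a named parameter marker (Spark 3.4+ / recent Databricks runtimes).
spark.sql("SELECT * FROM IDENTIFIER(:tbl)", args={"tbl": "main.sales.orders"}).show()

# Per the thread, the same pattern is NOT accepted inside a constraint, e.g.:
# ALTER TABLE child ADD CONSTRAINT fk FOREIGN KEY (id)
#     REFERENCES IDENTIFIER(:tbl) (id)   -- fails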

dnchankov
by New Contributor II
  • 9115 Views
  • 3 replies
  • 4 kudos

Why can't my notebook created in a Repo be opened safely?

I've cloned a Repo during "Get Started with Data Engineering on Databricks". Then I'm trying to run another notebook from a cell with a magic %run command, but I get an error that the file can't be opened safely. Here is my code: notebook_aname = "John" print(f"Hello ...

Latest Reply
petermeissner
New Contributor II
  • 4 kudos

It could be that you need to put the %run in a cell all by itself. Suggested here: https://stackoverflow.com/a/72833400/1144966
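[Editor's note] A minimal sketch of that layout, with a hypothetical relative path; the magic must be the only content of its cell:

# Cell 1 -- the %run magic alone in its own cell (hypothetical path):
# %run ./other_notebook

# Cell 2 -- regular Python can then use anything the other notebook defined:
notebook_aname = "John"
print(f"Hello {notebook_aname}")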

2 More Replies
Hertz
by New Contributor II
  • 3141 Views
  • 3 replies
  • 0 kudos

Serverless Compute Cost Monitoring (System Tables)

Hello, I have developed a dashboard for monitoring compute costs using system tables, allowing tracking of expenses by Cluster Name (user-created name), Job Name, or Warehouse Name. However, with the introduction of the new shared serverless compute, ...

Latest Reply
augustsc
New Contributor II
  • 0 kudos

Hi! We're also facing the same problem. We don't have any materialized views or streaming tables, but we are still seeing PREMIUM_JOBS_SERVERLESS_COMPUTE_EU_NORTH with billing_origin_product = SHARED_SERVERLESS_COMPUTE generated each day at a time wh...
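[Editor's note] For anyone debugging the same line item, a minimal sketch of slicing the billing system table by billing_origin_product, assuming the workspace has access to system.billing.usage:

# Minimal sketch; assumes access to the system.billing.usage table.
spark.sql("""
    SELECT usage_date,
           sku_name,
           SUM(usage_quantity) AS dbus
    FROM system.billing.usage
    WHERE billing_origin_product = 'SHARED_SERVERLESS_COMPUTE'
    GROUP BY usage_date, sku_name
    ORDER BY usage_date
""").show()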

2 More Replies
Policepatil
by New Contributor III
  • 3118 Views
  • 5 replies
  • 3 kudos

Best Way to process large number of records from multiple files

Hi, I have input files in S3 with the structure below:
/mnt/<mount_name>/test/<company_id>/sales/file_1.json
/mnt/<mount_name>/test/<company_id>/sales/file_2.json
/mnt/<mount_name>/test/<company_id>/sales/file_<n>.json
Number of companies = 15
Number of files ...

Latest Reply
lprevost
Contributor II
  • 3 kudos

Are the json files compressed?  If they are in .gz, this is unsplittable which means you lose some of spark's parallel magic.
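[Editor's note] If the files are splittable (or simply numerous), a minimal sketch that reads everything in one pass and recovers the company from the path (hypothetical mount name); one job over a glob usually beats looping file by file:

from pyspark.sql import functions as F

# Minimal sketch (hypothetical mount name): read all companies at once and
# derive company_id from the file path instead of looping per file.
df = (spark.read.json("/mnt/<mount_name>/test/*/sales/*.json")
      .withColumn("company_id",
                  F.regexp_extract(F.input_file_name(), r"/test/([^/]+)/sales/", 1)))
df.groupBy("company_id").count().show()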

4 More Replies
ADB0513
by New Contributor III
  • 2922 Views
  • 5 replies
  • 1 kudos

Load tables from JDBC in parallel

I have a list of about 80 tables that I need to load from an Oracle database into Databricks via JDBC. I would like to do this in parallel, instead of looping through one table at a time. I have a function defined to ingest the data: def ingest_data...
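[Editor's note] One common pattern, sketched minimally with a hypothetical ingest function, connection details, and table list: drive the per-table loads from a thread pool, so each JDBC read is submitted from its own thread while Spark schedules the work:

from concurrent.futures import ThreadPoolExecutor

# Minimal sketch (hypothetical URL, credentials, target schema); this stands
# in for the poster's ingest_data function.
def ingest_table(table):
    (spark.read.format("jdbc")
         .option("url", "jdbc:oracle:thin:@//<host>:1521/<service>")
         .option("dbtable", table)
         .option("user", "<user>")
         .option("password", "<password>")
         .load()
         .write.mode("overwrite")
         .saveAsTable(f"bronze.{table}"))

tables = ["table_1", "table_2"]  # ... ~80 entries
with ThreadPoolExecutor(max_workers=8) as pool:
    list(pool.map(ingest_table, tables))  # list() surfaces any exception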

Latest Reply
Retired_mod
Esteemed Contributor III
  • 1 kudos

Hi @ADB0513, Thanks for reaching out! Please review the response and let us know if it answers your question. Your feedback is valuable to us and the community. If the response resolves your issue, kindly mark it as the accepted solution. This will h...

4 More Replies
l_c_s
by New Contributor II
  • 2374 Views
  • 3 replies
  • 1 kudos

Random errors SparkException: Job aborted due to stage failure

Hi, we are trying to run some workflows on a shared cluster, with Databricks runtime version 14.3 LTS, and we randomly receive the error: SparkException: Job aborted due to stage failure: Task 2 in stage 78.0 failed 4 times, most recent failure: Lost...

error_sandbox.png
Latest Reply
Retired_mod
Esteemed Contributor III
  • 1 kudos

Hi @l_c_s, Thanks for reaching out! Please review the response and let us know if it answers your question. Your feedback is valuable to us and the community. If the response resolves your issue, kindly mark it as the accepted solution. This will hel...

2 More Replies
seeker
by New Contributor II
  • 1470 Views
  • 4 replies
  • 1 kudos

Get metadata of files present in a zip

I have a .zip file present on an ADLS path which contains multiple files of different formats. I want to get metadata of the files in it, like file name and modification time, without unzipping it. I have code which works for smaller zips but run...

Latest Reply
Retired_mod
Esteemed Contributor III
  • 1 kudos

Hi @seeker, there are only two ways I can think of to do it: write a UDF, or write customized MapReduce logic instead of using Spark SQL. But they are kind of the same, so I would say a UDF is a good solution.
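[Editor's note] A minimal sketch of the metadata read itself, assuming a hypothetical FUSE-mounted path; Python's zipfile only parses the central directory at the end of the archive, so no entries are extracted:

import zipfile

# Minimal sketch (hypothetical path): list entry names, timestamps, and
# sizes from the zip's central directory without extracting anything.
with zipfile.ZipFile("/dbfs/mnt/<mount_name>/archive.zip") as zf:
    for info in zf.infolist():
        print(info.filename, info.date_time, info.file_size)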

3 More Replies
subhas_1729
by New Contributor II
  • 903 Views
  • 2 replies
  • 2 kudos

CSV file and partitions

Hi, I want to know whether CSV files can be partitioned or not. I read in a book that only file types like Parquet and Avro can be partitioned. Regards, Subhas
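[Editor's note] For what it's worth, a minimal sketch (hypothetical DataFrame, column, and path) showing that Spark's partitioned writes also work for CSV output:

# Minimal sketch (hypothetical df, column, path): partitioned CSV write,
# producing one sub-directory per distinct value of "country".
(df.write
   .partitionBy("country")
   .mode("overwrite")
   .option("header", True)
   .csv("/mnt/<mount_name>/out/sales_csv"))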

Latest Reply
Retired_mod
Esteemed Contributor III
  • 2 kudos

Hi @subhas_1729, Thanks for reaching out! Please review the response and let us know if it answers your question. Your feedback is valuable to us and the community. If the response resolves your issue, kindly mark it as the accepted solution. This wi...

1 More Replies
