Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

Tahseen0354
by Valued Contributor
  • 1445 Views
  • 1 reply
  • 1 kudos

Configure CLI on databricks on GCP

Hi, I have a service account in my GCP project, and the service account is added as a user in my Databricks GCP account. Is it possible to configure the CLI on Databricks on GCP using that service account? Something similar to: databricks configure ---tok...

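For readers hitting the same question: the Databricks CLI reads connection profiles from an INI file (`~/.databrickscfg`), so one option is to write a profile for the service account directly rather than running the interactive prompt. The sketch below is an assumption-laden illustration, not official tooling; the host, token, and profile name are placeholders, and obtaining a token for a GCP service account is outside its scope.

```python
# Minimal sketch: write a CLI profile into the INI file the Databricks CLI
# reads. Host/token values are placeholders, not real credentials.
import configparser
from pathlib import Path

def write_databricks_profile(cfg_path: str, profile: str, host: str, token: str) -> None:
    cfg = configparser.ConfigParser()
    path = Path(cfg_path)
    if path.exists():
        cfg.read(path)  # keep any existing profiles intact
    cfg[profile] = {"host": host, "token": token}
    with path.open("w") as f:
        cfg.write(f)
```

The CLI can then be pointed at the profile with its `--profile` flag.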
LukaszJ
by Contributor III
  • 4292 Views
  • 4 replies
  • 4 kudos

Resolved! Terraform: get metastore id without creating new metastore

Hello, I want to create a database (schema) and tables in my Databricks workspace using Terraform. I found this resource: databricks_schema. It requires databricks_catalog, which requires metastore_id. However, I have databricks_workspace and I did not cre...

Latest Reply
Atanu
Esteemed Contributor
  • 4 kudos

https://registry.terraform.io/providers/databrickslabs/databricks/latest/docs/resources/schema I think this is for UC. https://docs.databricks.com/data-governance/unity-catalog/index.html

3 More Replies
Juniper_AIML
by New Contributor
  • 3886 Views
  • 3 replies
  • 0 kudos

How to access the virtual environment directory where the databricks notebooks are running?

How to get access to a separate virtual environment space and its storage location on Databricks, so that we can move our created libraries into it without waiting for their installation each time the cluster is brought up. What we want, basically, is a ...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hey there @Aman Gaurav, thank you for posting your question. Just wanted to check in: were you able to resolve your issue, or do you need more help? We'd love to hear from you. Thanks!

2 More Replies
alejandrofm
by Valued Contributor
  • 2964 Views
  • 4 replies
  • 4 kudos

Resolved! Are there any recommended spark config settings for Delta/Databricks?

Hi! I'm starting to test configs on Databricks, for example, to avoid corrupting data if two processes try to write at the same time: .config('spark.databricks.delta.multiClusterWrites.enabled', 'false'). Or if I need more partitions than the default: .confi...

Latest Reply
Anonymous
Not applicable
  • 4 kudos

Hey there @Alejandro Martinez, hope everything is going well. Just wanted to see if you were able to find an answer to your question. If yes, would you be happy to let us know and mark it as best so that other members can find the solution more quickl...

3 More Replies
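For readers collecting a starting point, a sketch of the kind of settings this thread discusses. The first key is the one quoted in the question; the second is a standard Spark setting. The values are illustrative only, not recommendations; validate each against your workload and the runtime documentation before adopting it.

```python
# Illustrative config map (values are starting points, not recommendations);
# these would be applied via SparkSession.builder.config(...) or in cluster
# settings.
delta_conf_sketch = {
    # setting quoted in the question: disables multi-cluster writes
    "spark.databricks.delta.multiClusterWrites.enabled": "false",
    # standard Spark setting: shuffle partition count (default is 200)
    "spark.sql.shuffle.partitions": "400",
}
```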
DejanSunderic
by New Contributor III
  • 12685 Views
  • 11 replies
  • 3 kudos

is command stuck?

I created some ETL using DataFrames in Python. It used to run in ~180 sec, but it is now taking ~1200 sec. I have been changing it, so it could be something that I introduced, or something in the environment. Part of the process is appending results into...

Latest Reply
Carneiro
New Contributor II
  • 3 kudos

I am having a very similar problem. Since yesterday, without a known reason, some commands that used to run daily are now stuck in a "Running command" state. Commands like: dataframe.show(n=1), dataframe.toPandas(), dataframe.description(), dataframe.wr...

10 More Replies
Thefan
by New Contributor II
  • 858 Views
  • 0 replies
  • 1 kudos

Koalas dropna in DLT

Greetings! I've been trying out DLT for a few days, but I'm running into an unexpected issue when trying to use the Koalas dropna in my pipeline. My goal is to drop all columns that contain only null/NA values before writing. Current code is this: @dlt...

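The intent in the question (drop columns that are entirely null before writing) can be sketched without DLT or Koalas at all. With a pandas-style API this is `dropna(axis=1, how="all")`; the plain-Python version below makes the logic explicit, with lists of row dicts standing in for a real dataframe:

```python
# Drop every column whose value is None in all rows -- a stand-in for
# Koalas/pandas dropna(axis=1, how="all").
def drop_all_null_columns(rows: list) -> list:
    if not rows:
        return rows
    columns = rows[0].keys()
    # keep a column only if at least one row has a non-null value in it
    keep = [c for c in columns if any(r.get(c) is not None for r in rows)]
    return [{c: r.get(c) for c in keep} for r in rows]
```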
shawncao
by New Contributor II
  • 3680 Views
  • 0 replies
  • 0 kudos

REST api to execute SQL query and read output

Hi there, I'm using these two APIs to execute SQL statements and read the output back when they finish. However, it seems to always return only 1000 rows, even though I need all the results (millions of rows). Is there a solution for this? Execute SQL: htt...

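The 1000-row cap the post describes is typical of APIs that page results in chunks, so the usual fix is a client-side loop over chunk links until the server stops returning one. A hedged sketch of that loop, with `fetch` standing in for an authenticated HTTP GET returning one chunk as parsed JSON; the field names below are assumptions for illustration, not a documented contract:

```python
# Generic chunk-following loop: keep requesting the next chunk link until
# the server stops returning one. Field names are illustrative placeholders.
def collect_all_rows(fetch, first_link):
    rows, link = [], first_link
    while link is not None:
        chunk = fetch(link)                      # one page of results
        rows.extend(chunk.get("data_array", []))
        link = chunk.get("next_chunk_link")      # None on the final chunk
    return rows
```

Check your API's result-disposition options as well; some statement-execution endpoints can hand back presigned links to full result files instead of inline pages.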
Jackie
by New Contributor II
  • 5602 Views
  • 3 replies
  • 6 kudos

Resolved! speed up a for loop in python (azure databrick)

Code example:
# a list of file paths
list_files_path = ["/dbfs/mnt/...", ..., "/dbfs/mnt/..."]
# copy all files above to this folder
dest_path = "/dbfs/mnt/..."
for file_path in list_files_path:
    # copy function
    copy_file(file_path, dest_path)
I am runni...

Latest Reply
Hemant
Valued Contributor II
  • 6 kudos

@Jackie Chan, what's the data size you want to copy? If it's large, then use ADF (Azure Data Factory).

2 More Replies
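Since each copy in the loop above is independent and I/O-bound, it can be fanned out over a thread pool. A sketch under those assumptions; the paths are placeholders, and `copy_one` here is plain `shutil.copy`, not the poster's helper:

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def copy_one(src: str, dest_dir: str) -> str:
    """Copy a single file into dest_dir and return the destination path."""
    dest = str(Path(dest_dir) / Path(src).name)
    shutil.copy(src, dest)
    return dest

def parallel_copy(paths: list, dest_dir: str, workers: int = 8) -> list:
    # Threads (not processes) suffice: the work is I/O-bound file copying,
    # so the GIL is not the bottleneck.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda p: copy_one(p, dest_dir), paths))
```

Tune `workers` to the storage backend; cloud object stores usually tolerate more concurrency than a local disk.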
818674
by New Contributor III
  • 7269 Views
  • 10 replies
  • 8 kudos

Resolved! How to perform a cross-check for data in multiple columns in same table?

I am trying to check whether a certain datapoint exists in multiple locations. This is what my table looks like: I am checking whether the same datapoint is in two locations. The idea is that this datapoint should exist in BOTH locations, and be counte...

Table Examples of Results for Cross-Checking
Latest Reply
818674
New Contributor III
  • 8 kudos

Hi, thank you very much for following up. I no longer need assistance with this issue.

9 More Replies
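The table from the post isn't reproduced here, so as a hedged illustration of the general pattern: classify each datapoint by whether it appears in both location columns or only one. The column names below are made up for the example:

```python
# Classify datapoints: present in both location columns, or only one.
def cross_check(rows: list, col_a: str, col_b: str) -> dict:
    in_a = {r[col_a] for r in rows if r.get(col_a) is not None}
    in_b = {r[col_b] for r in rows if r.get(col_b) is not None}
    return {
        "both": in_a & in_b,     # counted as a valid match
        "only_a": in_a - in_b,   # flagged: missing from location B
        "only_b": in_b - in_a,   # flagged: missing from location A
    }
```

In SQL the same idea is typically a `FULL OUTER JOIN` of the two projections with a `CASE` on which side is null.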
deisou
by New Contributor
  • 2625 Views
  • 4 replies
  • 2 kudos

Resolved! What is the best strategy for backing up a large Databricks Delta table that is stored in Azure blob storage?

I have a large Delta table that I would like to back up, and I am wondering what the best practice is for backing it up. The goal is to be protected if there is any accidental corruption or data loss, either at the Azure blob storage level or within Databricks...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @deisou, just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark the answer as best? If not, please tell us so we can help you. Cheers!

3 More Replies
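One approach often suggested for Delta backups is `DEEP CLONE`, which copies both the data files and the table metadata to a new location. As a sketch, the helper below only builds the SQL string that would be passed to `spark.sql(...)` on a cluster; the table and storage names are placeholders, and scheduling/retention are left out:

```python
# Build a Delta DEEP CLONE statement for a point-in-time backup.
# Names and the storage URI are placeholders for illustration.
def deep_clone_sql(source_table: str, backup_table: str, location: str = None) -> str:
    stmt = f"CREATE OR REPLACE TABLE {backup_table} DEEP CLONE {source_table}"
    if location:
        # land the backup in a separate storage account/container so a
        # storage-level incident cannot take out both copies
        stmt += f" LOCATION '{location}'"
    return stmt
```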
Kyle
by New Contributor II
  • 23846 Views
  • 5 replies
  • 4 kudos

Resolved! What's the best way to manage multiple versions of the same datasets?

We have use cases that require multiple versions of the same datasets to be available. For example, we have a knowledge graph made of entities and relations, and we have multiple versions of the knowledge graph, distinguished by schema names ri...

Latest Reply
Anonymous
Not applicable
  • 4 kudos

Hey there @Kyle Gao, hope you are doing well. Thank you for posting your query. Just wanted to check in if you were able to resolve your issue or do you need more help? We'd love to hear from you. Cheers!

4 More Replies
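Where the versions are snapshots of one evolving table (rather than divergent schemas), Delta time travel is a lighter-weight alternative: readers pin a table version with `VERSION AS OF`. A sketch of a small registry mapping logical dataset versions to pinned Delta versions; the table name and version numbers are invented for illustration:

```python
# Map logical dataset versions to pinned Delta table versions, and build
# the time-travel query a reader would run. Numbers are placeholders.
KG_VERSIONS = {"v1": 12, "v2": 47}  # logical name -> Delta table version

def read_version_sql(table: str, logical_version: str) -> str:
    table_version = KG_VERSIONS[logical_version]
    return f"SELECT * FROM {table} VERSION AS OF {table_version}"
```

Note that time-travel history is bounded by the table's retention settings, so long-lived versions still need cloning or export.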
Darshana_Ganesh
by New Contributor II
  • 2678 Views
  • 4 replies
  • 2 kudos

Resolved! Post upgrading the Azure Databricks cluster from 8.3 (includes Apache Spark 3.1.1, Scala 2.12) to 9.1 LTS (includes Apache Spark 3.1.2, Scala 2.12), I am getting an intermittent error.

The error is below, and it is intermittent: e.g. the same code throws the issue for run 3 but not for run 4, then throws it again for run 5. An error occurred while calling o1509.getCause. Trace: py4j.security.Py4JSecur...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hey @Darshana Ganesh, just wanted to check in if you were able to resolve your issue or do you need more help? We'd love to hear from you. Thanks!

3 More Replies
Development
by New Contributor III
  • 4355 Views
  • 5 replies
  • 5 kudos

Delta Table with 130 columns taking time

Hi all, we are facing an unusual issue while loading data into a Delta table using Spark SQL. We have one Delta table which has around 135 columns and is also PARTITIONED BY. We are trying to load 15 million rows into it, but it's not loading ...

Latest Reply
Development
New Contributor III
  • 5 kudos

@Kaniz Fatma @Parker Temple, I found the root cause: it's serialization. We are using a UDF to derive a column on the dataframe; when we try to load data into the Delta table or write data into a Parquet file, we face the serialization issue ....

4 More Replies
manasa
by Contributor
  • 4096 Views
  • 7 replies
  • 2 kudos

Resolved! Recursive view error while using spark 3.2.0 version

This happens while creating a temp view using the code block below: latest_data.createOrReplaceGlobalTempView("e_test"). Ideally this command should replace the view if e_test already exists; instead it is throwing "Recursive view `global_temp`.`e_test` detecte...

Latest Reply
shan_chandra
Esteemed Contributor
  • 2 kudos

Hi, @Manasa​, could you please check SPARK-38318 and use Spark 3.1.2, Spark 3.2.2, or Spark 3.3.0 to allow cyclic reference?

6 More Replies
