Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Anonymous
by Not applicable
  • 936 Views
  • 0 replies
  • 4 kudos

Happy August! On August 25th we are hosting another Community Social - we're doing these monthly! We want to make sure that we all have the chance to connect as a community often. Come network, talk data, and just get social! Join us for our August ...

AP
by New Contributor III
  • 4996 Views
  • 5 replies
  • 3 kudos

Resolved! AutoOptimize, OPTIMIZE command and Vacuum command : Order, production implementation best practices

So Databricks gives us a great toolkit in the form of OPTIMIZE and VACUUM. But in terms of operationalizing them, I am really confused about the best practice. Should we enable "optimized writes" by setting the following at a workspace level? spark.conf.set...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

@AKSHAY PALLERLA​ Just checking in to see if you got a solution to the issue you shared above. Let us know! Thanks to @Werner Stinckens​ for jumping in, as always!

4 More Replies
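For the thread above, the usual operational pattern is to enable optimized writes and then run OPTIMIZE and VACUUM on a schedule, per table. A minimal sketch of such a maintenance job (the table names and the 7-day retention are illustrative assumptions, not from the thread):

```python
# Hedged sketch: build nightly maintenance statements for a set of Delta
# tables. 168 hours (7 days) is Delta's default VACUUM retention threshold.
def maintenance_commands(table, retain_hours=168):
    """Return the OPTIMIZE and VACUUM statements for one table."""
    return [
        f"OPTIMIZE {table}",
        f"VACUUM {table} RETAIN {retain_hours} HOURS",
    ]

for table in ["silver.events", "gold.daily_summary"]:  # hypothetical tables
    for cmd in maintenance_commands(table):
        print(cmd)  # in a notebook you would run spark.sql(cmd)
```

Optimized writes can also be enabled per session with `spark.conf.set("spark.databricks.delta.optimizeWrite.enabled", "true")`; whether to set it workspace-wide is exactly the judgment call the question raises.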
Jayesh
by New Contributor III
  • 3175 Views
  • 5 replies
  • 3 kudos

Resolved! How can we copy data from Databricks SQL using a notebook?

Hi Team, we have a scenario where we have to connect to Databricks SQL instance 1 from another Databricks instance 2 using a notebook or Azure Data Factory. Can you please help?

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Thanks for jumping in to help, @Arvind Ravish​, @Hubert Dudek​, and @Artem Sheiko​!

4 More Replies
Jeade
by New Contributor II
  • 3663 Views
  • 3 replies
  • 1 kudos

Resolved! Pulling data from Azure Boards into Databricks

Looking for best practices/examples on how to pull data (epics, features, PBIs) from Azure Boards into Databricks for analysis. Any ideas/help appreciated!

Latest Reply
artsheiko
Databricks Employee
  • 1 kudos

You can use export to CSV (link), then push the file to storage mounted to Databricks, or simply upload the obtained file to DBFS.

2 More Replies
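Once an Azure Boards export lands in DBFS, reading it is ordinary CSV parsing. A small sketch (the column names "Work Item Type", "Title", and "State" are assumptions based on a typical Boards export, not from the thread):

```python
# Hedged sketch: parse a CSV exported from Azure Boards. A StringIO stands
# in for the exported file; in Databricks you would open the DBFS path instead.
import csv
import io

sample_export = io.StringIO(
    "Work Item Type,Title,State\n"
    "Epic,Data platform,Active\n"
    "Feature,Ingestion,New\n"
)

def load_work_items(fh):
    """Return the export as a list of dicts, one per work item."""
    return list(csv.DictReader(fh))

items = load_work_items(sample_export)
epics = [r["Title"] for r in items if r["Work Item Type"] == "Epic"]
print(epics)  # → ['Data platform']
```

For larger exports you would more likely read the uploaded file with `spark.read.csv` and analyze it as a DataFrame; the parsing logic is the same idea.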
cralle
by New Contributor II
  • 7414 Views
  • 7 replies
  • 2 kudos

Resolved! Cannot display DataFrame when I filter by length

I have a DataFrame that I have created based on a couple of datasets and multiple operations. The DataFrame has multiple columns, one of which is an array of strings. But when I take the DataFrame and try to filter based upon the size of this array co...

Latest Reply
-werners-
Esteemed Contributor III
  • 2 kudos

Strange, it works fine here. What version of Databricks are you on? To identify the issue, you could output the query plan (.explain). Creating a new df for each transformation could also help; that way you can check step by step where...

6 More Replies
tej1
by New Contributor III
  • 4499 Views
  • 5 replies
  • 7 kudos

Resolved! Trouble accessing `_metadata` column using cloudFiles in Delta Live Tables

We are building a Delta Live Tables pipeline where we ingest CSV files from AWS S3 using cloudFiles, and it is necessary to access the file modification timestamp of each file. As documented here, we tried selecting the `_metadata` column in a task in the delta live p...

Latest Reply
tej1
New Contributor III
  • 7 kudos

Update: We were able to test `_metadata` column feature in DLT "preview" mode (which is DBR 11.0). Databricks doesn't recommend production workloads when using "preview" mode, but nevertheless, glad to be using this feature in DLT.

4 More Replies
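For context, selecting the file metadata inside an Auto Loader DLT table looks roughly like the fragment below. This is a sketch only: it runs solely inside a DLT pipeline (where `dlt` and `spark` are provided), it assumes the "preview" channel / DBR 11.0+ that the reply mentions, and the S3 path is hypothetical.

```python
import dlt
from pyspark.sql import functions as F

@dlt.table
def raw_events():
    # _metadata is only resolvable on DBR 11.0+, i.e. the DLT "preview" channel
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "csv")
        .load("s3://my-bucket/raw/")  # hypothetical path
        .select(
            "*",
            F.col("_metadata.file_modification_time").alias("file_mtime"),
        )
    )
```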
alexgv12
by New Contributor III
  • 3082 Views
  • 2 replies
  • 3 kudos

Delta table: separate gold zones for different tenants

Hello, currently we have a process that builds the bronze and silver zones with Delta tables. When data reaches gold, we must create specific zones for each client because the schema changes; for this we create separate databases and tables, but when ...

Latest Reply
Noopur_Nigam
Databricks Employee
  • 3 kudos

Hi @alexander grajales vanegas​ Are you creating all the databases and tables in the gold zone manually? If so, please check out DLT https://docs.databricks.com/data-engineering/delta-live-tables/index.html, it will take care of your complete pipeline by ...

1 More Replies
GKKarthi
by New Contributor
  • 6146 Views
  • 6 replies
  • 2 kudos

Resolved! Databricks - Simba SparkJDBCDriver 500550 exception

We have a Denodo big data platform hosted on Databricks. Recently we have been facing an exception with message '[Simba][SparkJDBCDriver](500550)', which interrupts the Databricks connection after a certain time interval, usuall...

Latest Reply
PFBOLIVEIRA
New Contributor II
  • 2 kudos

Hi All, we are also experiencing the same behavior: [Simba][SimbaSparkJDBCDriver] (500550) The next rowset buffer is already marked as consumed. The fetch thread might have terminated unexpectedly. Foreground thread ID: xxxx. Background thread ID: yyyy...

5 More Replies
pankaj92
by New Contributor II
  • 4901 Views
  • 4 replies
  • 0 kudos

Extract latest files from an ADLS Gen2 mount point in Databricks using PySpark

Hi Team, I am trying to get the latest files from an ADLS mount point directory. I am not sure how to extract the latest files by last-modified date using PySpark from an ADLS Gen2 storage account. Please let me know asap. Thanks! I am looking forward to your re...

Latest Reply
Sha_1890
New Contributor III
  • 0 kudos

Hi @pankaj92​, I wrote Python code to pick the latest file from the mnt location:

import os

path = "/dbfs/mnt/xxxx"
filelist = []
for file_item in os.listdir(path):
    filelist.append(file_item)
file = len(filelist)
print(filelist[file - 1])

Thanks

3 More Replies
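One caveat about the reply above: os.listdir gives no ordering guarantee, so taking the last entry may not return the newest file. Since the question asks for the latest file by modification date, a safer sketch sorts on os.path.getmtime (the directory here is a temp folder standing in for a /dbfs/mnt path):

```python
# Hedged sketch: pick the most recently modified file in a directory by
# sorting on os.path.getmtime rather than relying on listdir order.
import os
import tempfile
import time

def latest_file(path):
    """Return the full path of the most recently modified file, or None."""
    files = [os.path.join(path, f) for f in os.listdir(path)]
    files = [f for f in files if os.path.isfile(f)]
    return max(files, key=os.path.getmtime) if files else None

# Demo on a temp directory standing in for /dbfs/mnt/xxxx
with tempfile.TemporaryDirectory() as d:
    for name in ["old.csv", "new.csv"]:
        with open(os.path.join(d, name), "w") as fh:
            fh.write("x")
        time.sleep(0.05)  # ensure distinct modification times
    print(os.path.basename(latest_file(d)))  # → new.csv
```

On DBFS mounts the same idea works through the /dbfs fuse path; for very large directories, dbutils.fs.ls (which returns modification times directly) avoids one stat call per file.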
ivanychev
by Contributor II
  • 9534 Views
  • 5 replies
  • 2 kudos

Resolved! How to find out why the cluster is in PENDING state for so long?

I'm using Databricks on AWS. Our clusters are typically in PENDING state for 5-8 minutes after they are created. I would like to find out why (ec2 instance provisioning? docker image download is slow? ...?). The cluster logs are not helpful enough be...

Latest Reply
Prabakar
Databricks Employee
  • 2 kudos

Hi @Sergey Ivanychev​, while the cluster is starting, you can see the status on the compute page. Hover the mouse pointer over the green rotating circle to the left of the cluster name. It will give a notification of what is happening on the cluster. Wh...

4 More Replies
118004
by New Contributor II
  • 2263 Views
  • 1 reply
  • 2 kudos

Resolved! Installing pdpbox plugin on cluster

Hello, we are having issues installing the pdpbox library on a fresh cluster. This includes trying to upload and install a .whl file, and using pip in a notebook. I have attached an example of an error received. Can anybody assist with installing the...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 2 kudos

PDPbox is updated rarely, and it requires older versions of matplotlib (3.1.1): https://github.com/SauceCat/PDPbox. It tries to install but fails because matplotlib requires pkgconfig. The solution is to use the Machine Learning runtime. There it will...

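In practice that usually means pinning the old matplotlib before installing PDPbox; on an ML runtime the pin is often already satisfied. A hedged notebook sketch (the versions come from the reply above and are not independently verified):

```
%pip install matplotlib==3.1.1
%pip install pdpbox
```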
PSY
by New Contributor III
  • 5663 Views
  • 5 replies
  • 2 kudos

Resolved! Updating git token fails

When updating an expired Azure DevOps personal access token (PAT) for git integration, I get the error message "Failed to save. Please try again.". The error persists with different tokens. Previously (months ago), updating the token did not result i...

Latest Reply
Atanu
Databricks Employee
  • 2 kudos

Is this happening for all users, @Pencho Yordanov​?

4 More Replies
al_joe
by Contributor
  • 6625 Views
  • 3 replies
  • 6 kudos

Resolved! Can I use Databricks CLI with community edition?

I installed the CLI but am unable to configure it to connect to my instance, as I cannot find the "Generate Access tokens" option under the User Settings page. The documentation does not say whether this feature is disabled for the community edition.

Latest Reply
Prabakar
Databricks Employee
  • 6 kudos

Hi @Al Jo​, we understand your interest in learning Databricks. However, the community edition is limited in features; certain features are available only in the paid version. If you are interested in using the full features, then I would suggest you g...

2 More Replies
Ryan512
by New Contributor III
  • 1773 Views
  • 2 replies
  • 2 kudos

Autoloader (GCP) Custom PubSub Queue

I want to know if what I describe below is possible with Auto Loader on Google Cloud Platform. Problem description: we have GCS buckets for every client/account. Inside these buckets is a path/blob for each client's instances of our platform. A clie...

Latest Reply
Noopur_Nigam
Databricks Employee
  • 2 kudos

Hello @Ryan Ebanks​, please let us know if more help is needed on this.

1 More Replies
