Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

165036
by New Contributor III
  • 1446 Views
  • 3 replies
  • 1 kudos

Resolved! Error message when editing schedule cron expression on job

When attempting to edit the schedule cron expression on one of our jobs, we receive the following error message: Cluster validation error: Validation failed for spark_conf, spark.databricks.acl.dfAclsEnabled must be false (is "true") The spark.databric...

Latest Reply
165036
New Contributor III
  • 1 kudos

FYI this was a temporary Databricks bug. Seems to be resolved now.

2 More Replies
AP
by New Contributor III
  • 3197 Views
  • 5 replies
  • 3 kudos

Resolved! AutoOptimize, OPTIMIZE command and Vacuum command : Order, production implementation best practices

So Databricks gives us a great toolkit in the form of optimization and vacuum. But in terms of operationalizing them, I am really confused about the best practice. Should we enable "optimized writes" by setting the following at a workspace level? spark.conf.set...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

@AKSHAY PALLERLA Just checking in to see if you got a solution to the issue you shared above. Let us know! Thanks to @Werner Stinckens for jumping in, as always!

4 More Replies
Jayesh
by New Contributor III
  • 1973 Views
  • 5 replies
  • 3 kudos

Resolved! How can we do data copy from Databricks SQL using notebook?

Hi Team, we have a scenario where we have to connect to Databricks SQL instance 1 from another Databricks instance 2 using a notebook or Azure Data Factory. Can you please help?

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Thanks for jumping in to help, @Arvind Ravish, @Hubert Dudek and @Artem Sheiko!

4 More Replies
Jeade
by New Contributor II
  • 2069 Views
  • 3 replies
  • 1 kudos

Resolved! Pulling data from Azure Boards into databricks

Looking for best practises/examples on how to pull data (epics, features, PBIs) from Azure Boards into Databricks for analysis. Any ideas/help appreciated!

Latest Reply
artsheiko
Valued Contributor III
  • 1 kudos

You can use export to CSV (link), then push the file to storage mounted to Databricks or just import the resulting file into DBFS.

2 More Replies
cralle
by New Contributor II
  • 4377 Views
  • 7 replies
  • 2 kudos

Resolved! Cannot display DataFrame when I filter by length

I have a DataFrame that I have created based on a couple of datasets and multiple operations. The DataFrame has multiple columns, one of which is an array of strings. But when I take the DataFrame and try to filter based upon the size of this array co...

Latest Reply
-werners-
Esteemed Contributor III
  • 2 kudos

Strange, works fine here. What version of Databricks are you on? What you could do to identify the issue is to output the query plan (.explain). Creating a new df for each transformation could also help. That way you can check step by step where...

6 More Replies
tej1
by New Contributor III
  • 2673 Views
  • 6 replies
  • 7 kudos

Resolved! Trouble accessing `_metadata` column using cloudFiles in Delta Live Tables

We are building a Delta Live Tables pipeline where we ingest CSV files from AWS S3 using cloudFiles, and we need to access each file's modification timestamp. As documented here, we tried selecting the `_metadata` column in a task in delta live p...

Latest Reply
tej1
New Contributor III
  • 7 kudos

Update: We were able to test the `_metadata` column feature in DLT "preview" mode (which is DBR 11.0). Databricks doesn't recommend running production workloads in "preview" mode, but nevertheless, we are glad to be using this feature in DLT.

5 More Replies
alexgv12
by New Contributor III
  • 2109 Views
  • 2 replies
  • 3 kudos

delta table separate gold zone by different tenant

Hello, we currently have a process that builds the bronze and silver zones with Delta tables. When it reaches gold, we must create specific zones for each client because the schema changes; for this we create separate databases and tables, but when ...

Latest Reply
Noopur_Nigam
Valued Contributor II
  • 3 kudos

Hi @alexander grajales vanegas, are you creating all the databases and tables in the gold zone manually? If so, please check out DLT (https://docs.databricks.com/data-engineering/delta-live-tables/index.html); it will take care of your complete pipeline by ...

1 More Replies
GKKarthi
by New Contributor
  • 3871 Views
  • 7 replies
  • 2 kudos

Resolved! Databricks - Simba SparkJDBCDriver 500550 exception

We have a Denodo big data platform hosted on Databricks. Recently we have been facing an exception with the message '[Simba][SparkJDBCDriver](500550)' from Databricks, which interrupts the Databricks connection after a certain time interval usuall...

Latest Reply
PFBOLIVEIRA
New Contributor II
  • 2 kudos

Hi All, we are also experiencing the same behavior: [Simba][SimbaSparkJDBCDriver] (500550) The next rowset buffer is already marked as consumed. The fetch thread might have terminated unexpectedly. Foreground thread ID: xxxx. Background thread ID: yyyy...

6 More Replies
pankaj92
by New Contributor II
  • 3588 Views
  • 4 replies
  • 0 kudos

extract latest files from ADLS Gen2 mount point in databricks using pyspark

Hi Team, I am trying to get the latest files from an ADLS mount point directory. I am not sure how to extract the latest files and their last modified date using PySpark from an ADLS Gen2 storage account. Please let me know asap. Thanks! I am looking forward to your re...

Latest Reply
Sha_1890
New Contributor III
  • 0 kudos

Hi @pankaj92, I wrote a Python code to pick a latest file from mnt location:

import os
path = "/dbfs/mnt/xxxx"
filelist = []
for file_item in os.listdir(path):
  filelist.append(file_item)
file = len(filelist)
print(filelist[file-1])

Thanks
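
A note on the reply above: `os.listdir` returns names in arbitrary order, so taking the last element is not guaranteed to give the newest file. Below is a minimal sketch of one alternative that compares modification times instead. It assumes the mount is readable through a local path such as `/dbfs/mnt/...`, as in the reply; the helper name `latest_file` is hypothetical, and the demo uses a temporary directory as a stand-in for the mount.

```python
import os
import tempfile


def latest_file(directory):
    """Return the full path of the most recently modified regular file.

    Returns None when the directory contains no regular files.
    `latest_file` is an illustrative name, not a Databricks API.
    """
    # Collect full paths of regular files only (subdirectories are skipped).
    candidates = [entry.path for entry in os.scandir(directory) if entry.is_file()]
    # Directory listings come back in arbitrary order, so pick by mtime.
    return max(candidates, key=os.path.getmtime, default=None)


if __name__ == "__main__":
    # Demo on a temporary directory standing in for a /dbfs/mnt/... path.
    with tempfile.TemporaryDirectory() as demo_dir:
        for name, mtime in (("older.csv", 1_000), ("newer.csv", 2_000)):
            file_path = os.path.join(demo_dir, name)
            open(file_path, "w").close()
            # Set access and modification times explicitly for the demo.
            os.utime(file_path, (mtime, mtime))
        print(latest_file(demo_dir))
```

On a cluster, the call would look like `latest_file("/dbfs/mnt/xxxx")`, with the path from the reply standing in for the real mount location.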

3 More Replies
ivanychev
by Contributor
  • 5680 Views
  • 5 replies
  • 2 kudos

Resolved! How to find out why the cluster is in PENDING state for so long?

I'm using Databricks on AWS. Our clusters are typically in PENDING state for 5-8 minutes after they are created. I would like to find out why (EC2 instance provisioning? Docker image download is slow? ...?). The cluster logs are not helpful enough be...

Latest Reply
Prabakar
Esteemed Contributor III
  • 2 kudos

Hi @Sergey Ivanychev, while the cluster is starting, you can see the status on the compute page. Hover the mouse pointer over the green rotating circle to the left of the cluster name. It will show a notification of what is happening on the cluster. Wh...

4 More Replies
118004
by New Contributor II
  • 1425 Views
  • 1 reply
  • 2 kudos

Resolved! Installing pdpbox plugin on cluster

Hello, we are having issues installing the pdpbox library on a fresh cluster. This includes trying to upload and install a whl file, or using pip in a workbook. I have attached an example of an error received. Can anybody assist with installing the...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 2 kudos

PDPbox is rarely updated, and it requires an older version of matplotlib (3.1.1) (https://github.com/SauceCat/PDPbox). It tries to install but fails because matplotlib requires pkgconfig. The solution is to use the Machine Learning runtime. There it will...

al_joe
by Contributor
  • 3488 Views
  • 4 replies
  • 3 kudos

Resolved! Can I use Databricks CLI with community edition?

I installed the CLI but am unable to configure it to connect to my instance, as I cannot find the "Generate Access tokens" option under the User Settings page. The documentation does not say whether this feature is disabled for Community Edition.

Latest Reply
Kaniz_Fatma
Community Manager
  • 3 kudos

Hi @al_joe (Customer), we haven't heard from you on the last response from @Prabakar. Also, please don't forget to click on the "Select As Best" button whenever the information provided helps resolve your question.

3 More Replies
PSY
by New Contributor III
  • 3437 Views
  • 5 replies
  • 2 kudos

Resolved! Updating git token fails

When updating an expired Azure DevOps personal access token (PAT) for git integration, I get the error message "Failed to save. Please try again.". The error persists with different tokens. Previously (months ago), updating the token did not result i...

Latest Reply
Atanu
Esteemed Contributor
  • 2 kudos

Is this happening for all users, @Pencho Yordanov?

4 More Replies