Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Harsh1
by New Contributor II
  • 2234 Views
  • 2 replies
  • 1 kudos

Query on DBFS migration

We are doing a DBFS migration. In the legacy workspace, the root DBFS has a folder 'user' holding 5.8 TB of data. We ran AWS CLI sync/cp from Legacy to Target, and then again from the Target bucket to the Target DBFS. While implemen...

Latest Reply
Harsh1
New Contributor II
  • 1 kudos

Thanks for the quick response. Regarding the suggested AWS data sync approach: we have tried data sync in multiple ways, but it creates folders in the S3 bucket itself, not on DBFS, and our task is to copy from the bucket to DBFS. It seems that it only supports ...

1 More Replies
Niha1
by New Contributor III
  • 1582 Views
  • 0 replies
  • 1 kudos

Not able to install the Airbnb dataset when running the "Scalable ML" notebook. I am getting the error below: AnalysisException: Path does not exist:

file_path = f"{datasets_dir}/airbnb/sf-listings/sf-listings-2019-03-06-clean.parquet/"
airbnb_df = spark.read.format("parquet").load(file_path)
display(airbnb_df)

AnalysisException: Path does not exist: dbfs:/user/nniha9188@gmail.com/dbacademy/machi...
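When a read fails with "Path does not exist", it can help to confirm the path before calling `spark.read`. The sketch below is only an illustration: `path_exists` is a hypothetical helper, and it assumes that the listing function passed in (on Databricks this would typically be `dbutils.fs.ls`) raises an exception for a missing path.

```python
def path_exists(ls, path):
    """Return True if ls(path) succeeds, False if it raises.

    `ls` is any callable that lists a path and raises when the path is
    missing. On Databricks, dbutils.fs.ls is assumed to fit this role.
    """
    try:
        ls(path)
        return True
    except Exception:
        return False
```

In a notebook this could be called as `path_exists(dbutils.fs.ls, file_path)` before loading the Parquet files. If it returns False, the course dataset has probably not been installed at that location yet.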

trang_le
by Databricks Employee
  • 1068 Views
  • 0 replies
  • 1 kudos

Are you a university student or faculty member? Are you interested in getting trained by Databricks experts and getting Databricks accredited? The Dat...

Are you a university student or faculty member? Are you interested in getting trained by Databricks experts and getting Databricks accredited? The Databricks Lakehouse Platform Fundamentals Learning Plan will give you an overview of the platform and p...

Somi
by New Contributor III
  • 1372 Views
  • 0 replies
  • 1 kudos

How to set SparkTrials? I am receiving this TypeError: cannot pickle '_thread.lock' object

Hey Sara, this is Somayeh from VINN Automotive. As I had already shared with you, I am trying to distribute hyperparameter tuning using hyperopt on a tensorflow.keras model. I am using SparkTrials in my fmin: spark_trials = SparkTrials(parallelism=4)...be...
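One common cause of this error is an objective function that captures an object holding a thread lock, because SparkTrials has to serialize the objective to send it to the workers. The truncated post does not confirm that this is the cause here. The sketch below reproduces the failure with the standard library only. `ModelHolder`, `is_picklable`, and `build_and_score` are hypothetical names, and `ModelHolder` merely stands in for a model object that owns a lock internally.

```python
import pickle
import threading


class ModelHolder:
    """Hypothetical stand-in for an object that owns a thread lock."""

    def __init__(self):
        self.lock = threading.Lock()
        self.units = 32


def is_picklable(obj):
    """Return True if obj can be serialized with pickle.dumps."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError):
        return False


def build_and_score(params):
    """Objective that builds the unpicklable object inside the function.

    Only `params` (a plain dict) has to cross the process boundary.
    """
    holder = ModelHolder()  # created on the worker, never serialized
    return (params["units"] - holder.units) ** 2
```

The design point is that anything created inside the objective never needs to be pickled. With hyperopt, that would mean building and compiling the Keras model inside the function passed to `fmin` rather than referencing a model defined outside it.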

saira1122
by New Contributor
  • 744 Views
  • 0 replies
  • 0 kudos

bit.ly

If you are having trouble with any study problem, you should first find the source of the problem. https://bit.ly/3AoiotQ

Dhara
by New Contributor III
  • 3273 Views
  • 3 replies
  • 1 kudos

Access multiple .mdb files using Python

Hi, I want to access multiple .mdb Access files stored in Azure Data Lake Storage (ADLS) or on the Databricks File System using Python. Can anyone help me with how to do it? It would be great if you could share some code snippets for the sa...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hey there @Dhara Mandal, hope everything is going great. Just wanted to check in to see if you were able to resolve your issue. Would you be happy to mark an answer as best, or do you need more help? We'd love to hear from you. Thanks!

2 More Replies
isaac_gritz
by Databricks Employee
  • 2079 Views
  • 0 replies
  • 1 kudos

Data Mesh with Databricks

Where to Learn More about Databricks for Data Mesh
We recommend checking out our Data & AI Summit Talk on how the Databricks Lakehouse platform is the best platform for distributed architectures like Data Mesh. We would also recommend checking out thi...

isaac_gritz
by Databricks Employee
  • 2930 Views
  • 0 replies
  • 4 kudos

CI/CD Best Practices

Best Practices for CI/CD on Databricks
For CI/CD and software engineering best practices with Databricks notebooks, we recommend checking out this best practices guide (AWS, Azure, GCP). For CI/CD and local development using an IDE, we recommend dbx, a ...

weldermartins
by Honored Contributor
  • 12598 Views
  • 9 replies
  • 13 kudos

Resolved! Delta table upsert - databricks community

Hello guys, I'm trying to use upsert via Delta Lake following the documentation, but the command doesn't update or insert new lines. Scenario: my source table is kept separately in the bronze layer, and the updates or inserts are in the silver layer.
from delta.tables impo...

Latest Reply
weldermartins
Honored Contributor
  • 13 kudos

I managed to find the solution. In insert and update I was setting the target. Thanks @Werner Stinckens!
delta_df = DeltaTable.forPath(spark, 'dbfs:/mnt/silver/vendas/')
delta_df.alias('target').m...
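The reply is cut off, so the full fix is not visible. As a rough sketch of the pattern it describes, the update and insert clauses of a Delta merge should take their values from the source alias rather than the target. The helper names `source_mapping` and `upsert` and the key column `id` are assumptions for illustration and do not come from the post. The code assumes the `delta-spark` package is available wherever `upsert` is called.

```python
def source_mapping(columns, source_alias="source"):
    """Map each target column to the same-named column on the source alias."""
    return {col: f"{source_alias}.{col}" for col in columns}


def upsert(spark, source_df, target_path, key="id"):
    """Merge source_df into the Delta table at target_path, matching on `key`.

    `key` is a hypothetical join column; replace it with the real business key.
    """
    from delta.tables import DeltaTable  # lazy import; needs delta-spark

    mapping = source_mapping(source_df.columns)
    (
        DeltaTable.forPath(spark, target_path)
        .alias("target")
        .merge(source_df.alias("source"), f"target.{key} = source.{key}")
        .whenMatchedUpdate(set=mapping)        # values read from source
        .whenNotMatchedInsert(values=mapping)  # values read from source
        .execute()
    )
```

If the mapping values pointed at `target.<column>` instead, matched rows would be overwritten with their own existing values, which would look like an upsert that does nothing.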

8 More Replies
isaac_gritz
by Databricks Employee
  • 2007 Views
  • 0 replies
  • 2 kudos

Connecting Applications and BI Tools to Databricks SQL

Access Data in Databricks Using an Application or your Favorite BI Tool
You can leverage Partner Connect for easy, low-configuration connections to some of the most popular BI tools through our optimized connectors. Alternatively, you can follow these...

isaac_gritz
by Databricks Employee
  • 1414 Views
  • 0 replies
  • 3 kudos

Optimize Azure VM / AWS EC2 / GKE Cloud Infrastructure Costs

Tips on Reducing Cloud Compute Infrastructure Costs for Azure VM, AWS EC2, and GCP GKE on Databricks
Databricks takes advantage of the latest Azure VM / AWS EC2 / GKE VM/instance types to ensure you get the best price performance for your workloads on...

isaac_gritz
by Databricks Employee
  • 13079 Views
  • 4 replies
  • 3 kudos

Performance Tuning Best Practices

Recommendations for performance tuning best practices on Databricks
We recommend also checking out this article from my colleague @Franco Patano on best practices for performance tuning on Databricks. Performance tuning your workloads is an important...

Latest Reply
isaac_gritz
Databricks Employee
  • 3 kudos

Let us know in the comments if you have any other performance tuning tips & tricks.

3 More Replies
438037
by New Contributor
  • 1580 Views
  • 0 replies
  • 0 kudos

Databricks VPC - EKS VPC security groups

Hi, we have a Databricks deployment in our AWS account in a dedicated VPC, for which we created a VPC peering to our EKS VPC. In the EKS main security group we added a rule that opens all TCP ports from the Databricks VPC, and now it's working. Once I try t...

