Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

User16835756816
by Valued Contributor
  • 6999 Views
  • 1 replies
  • 6 kudos

How can I simplify my data ingestion by processing the data as it arrives in cloud storage?

This post will help you simplify your data ingestion by utilizing Auto Loader, Delta Optimized Writes, Delta Write Jobs, and Delta Live Tables. Pre-Req: You are using JSON data and Delta Writes commands. Step 1: Simplify ingestion with Auto Loader Delt...

Latest Reply
youssefmrini
Honored Contributor III
  • 6 kudos

This post will help you simplify your data ingestion by utilizing Auto Loader, Delta Optimized Writes, Delta Write Jobs, and Delta Live Tables. Pre-Req: You are using JSON data and Delta Writes commands. Step 1: Simplify ingestion with Auto Loader Delta...

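A minimal PySpark sketch of the Auto Loader step the post names, assuming a Databricks runtime where the `cloudFiles` source is available. The paths, the table name, and the helper names `autoloader_options` and `start_ingest` are hypothetical and do not come from the post.

```python
from typing import Any, Dict


def autoloader_options(schema_location: str, file_format: str = "json") -> Dict[str, str]:
    """Build reader options for an Auto Loader (cloudFiles) stream."""
    return {
        # Format of the files arriving in cloud storage (the post assumes JSON).
        "cloudFiles.format": file_format,
        # Where Auto Loader keeps the inferred schema between runs.
        "cloudFiles.schemaLocation": schema_location,
    }


def start_ingest(spark: Any, source_path: str, target_table: str,
                 schema_location: str, checkpoint_location: str) -> Any:
    """Incrementally load new files from source_path into a Delta table.

    `spark` is the SparkSession that a Databricks notebook provides.
    All paths and the table name are supplied by the caller (hypothetical values).
    """
    stream = (
        spark.readStream.format("cloudFiles")
        .options(**autoloader_options(schema_location))
        .load(source_path)
    )
    return (
        stream.writeStream
        .option("checkpointLocation", checkpoint_location)
        # Process the files that are available now, then stop.
        .trigger(availableNow=True)
        .toTable(target_table)
    )
```

In a notebook this would be called as `start_ingest(spark, "<source path>", "<table>", "<schema path>", "<checkpoint path>")`. The `availableNow` trigger is one possible choice for scheduled batch-style runs; a continuously running stream would omit it.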
ricperelli
by New Contributor II
  • 1991 Views
  • 0 replies
  • 1 kudos

How can I save a parquet file using pandas with a data factory orchestrated notebook?

Hi guys, this is my first question, so feel free to correct me if I'm doing something wrong. Anyway, I'm facing a really strange problem: I have a notebook in which I'm performing some pandas analysis, and after that I save the resulting dataframe in a parque...

venkad
by Contributor
  • 1076 Views
  • 0 replies
  • 4 kudos

Default location for Schema/Database in Unity

Hello Bricksters, We organize the delta lake in multiple storage accounts: one storage account per data domain and one container per database. This helps us to isolate the resources and cost on the business domain level. Earlier, when a schema/database...

vizoso
by New Contributor III
  • 1283 Views
  • 1 replies
  • 3 kudos

Cluster list in Microsoft.Azure.Databricks.Client fails because ClusterSource enum does not include MODELS. When you have a model serving cluster, Clu...

Cluster list in Microsoft.Azure.Databricks.Client fails because ClusterSource enum does not include MODELS. When you have a model serving cluster, ClustersApiClient.List method fails to deserialize the API response because that cluster has MODELS as C...

Latest Reply
  • 3 kudos

Cluster list in Microsoft.Azure.Databricks.Client fails because ClusterSource enum does not include MODELS. When you have a model serving cluster, ClustersApiClient.List method fails to deserialize the API response because that cluster has MODELS as C...

saurabh12521
by New Contributor II
  • 2254 Views
  • 3 replies
  • 4 kudos

Unity through terraform

I am working on automating Unity through Terraform. I referred to the link below to get started: https://registry.terraform.io/providers/databricks/databricks/latest/docs/guides/unity-catalog-azure I am facing an issue when I create a metastore using...

Latest Reply
Pat
Honored Contributor III
  • 4 kudos

Not sure if you got this working, but I noticed you are using the provider `databrickslabs/databricks`, which is why this is not available. You should be using the new provider `databricks/databricks`: https://registry.terraform.io/providers/databricks/datab...

2 More Replies
DataBricks_2022
by New Contributor III
  • 1190 Views
  • 1 replies
  • 1 kudos

Resolved! How to get started with Auto Loader using the Partner Academy portal? Are there any videos and step-by-step materials?

I need videos and step-by-step documentation on Auto Loader, as well as on how to build an end-to-end data pipeline.

Latest Reply
karthik_p
Esteemed Contributor
  • 1 kudos

@raja iqbal The course below provides an overview of Auto Loader. Course name: How to Use Databricks' Auto Loader for Incremental ETL with the Databricks Data Science and Data Engineering Workspace. If you register for Data Engineer Catalog, then you ...

cvantassel
by New Contributor III
  • 5668 Views
  • 7 replies
  • 8 kudos

Is there any way to propagate errors from dbutils?

I have a master notebook that runs a few different notebooks on a schedule using the dbutils.notebook.run() function. Occasionally, these child notebooks will fail (due to API connections or whatever). My issue is, when I attempt to catch the errors ...

Latest Reply
wdphilli
New Contributor III
  • 8 kudos

I have the same issue. I see no reason why Databricks couldn't propagate the internal exception back through their WorkflowException.

6 More Replies
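One workaround sometimes used for this, sketched here and not taken from the thread: the child notebook catches its own exception and returns a JSON status string through `dbutils.notebook.exit()`, and the parent parses that string and re-raises. The names `child_result`, `run_child`, and `ChildNotebookError` are hypothetical; `run` stands in for `dbutils.notebook.run`, so the sketch can be tried outside Databricks.

```python
import json
from typing import Any, Callable, Dict, Optional


class ChildNotebookError(RuntimeError):
    """Raised in the parent when a child notebook reports a failure."""


def child_result(work: Callable[[], Any]) -> str:
    """Run in the child notebook: execute `work` and serialize the outcome.

    The returned string is meant to be passed to dbutils.notebook.exit(...).
    The value returned by `work` must be JSON-serializable.
    """
    try:
        return json.dumps({"status": "ok", "value": work()})
    except Exception as exc:
        return json.dumps({
            "status": "error",
            "error_type": type(exc).__name__,
            "message": str(exc),
        })


def run_child(run: Callable[[str, int, Dict[str, str]], str], path: str,
              timeout_seconds: int = 600,
              arguments: Optional[Dict[str, str]] = None) -> Any:
    """Run in the parent notebook: call the child and surface its error.

    Pass dbutils.notebook.run as `run`; it returns the child's exit string.
    """
    raw = run(path, timeout_seconds, arguments or {})
    result = json.loads(raw)
    if result.get("status") != "ok":
        raise ChildNotebookError(
            f"{path} failed: {result.get('error_type')}: {result.get('message')}"
        )
    return result.get("value")
```

In the child, the last cell would be `dbutils.notebook.exit(child_result(main))` for some function `main`; in the parent, `run_child(dbutils.notebook.run, "<child path>")`. This only covers errors raised inside `work`; a child that crashes before reaching `exit` would still surface as a generic failure.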
parulpaul
by New Contributor III
  • 3647 Views
  • 1 replies
  • 2 kudos

AnalysisException: Multiple sources found for bigquery (com.google.cloud.spark.bigquery.BigQueryRelationProvider, com.google.cloud.spark.bigquery.v2.BigQueryTableProvider), please specify the fully qualified class name.

While reading data from BigQuery into Databricks, I get the error: AnalysisException: Multiple sources found for bigquery (com.google.cloud.spark.bigquery.BigQueryRelationProvider, com.google.cloud.spark.bigquery.v2.BigQueryTableProvider), please spe...

Latest Reply
Debayan
Esteemed Contributor III
  • 2 kudos

Hi @Parul Paul, could you please check if this is the scenario: https://stackoverflow.com/questions/68623803/load-to-bigquery-via-spark-job-fails-with-an-exception-for-multiple-sources-foun Also, you can refer to: https://github.com/GoogleCloudDatapro...

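The error message itself suggests one way around the ambiguity: pass a fully qualified class name to `format()` instead of the short name `bigquery`. A minimal sketch under that assumption follows; the class name is copied from the error text, the helper name `read_bigquery_table` is hypothetical, and it may still be worth checking whether two versions of the connector are installed on the cluster.

```python
from typing import Any

# One of the two providers named in the error message.
BIGQUERY_V1_PROVIDER = "com.google.cloud.spark.bigquery.BigQueryRelationProvider"


def read_bigquery_table(spark: Any, table: str,
                        provider: str = BIGQUERY_V1_PROVIDER) -> Any:
    """Read a BigQuery table, naming the data source class explicitly.

    `spark` is the notebook's SparkSession; `table` is the table identifier
    to read (a caller-supplied value, not taken from the post).
    """
    return spark.read.format(provider).option("table", table).load()
```

Calling `read_bigquery_table(spark, "<project>.<dataset>.<table>")` avoids the short-name lookup that triggers the "Multiple sources found" exception.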
740209
by New Contributor II
  • 1979 Views
  • 4 replies
  • 1 kudos

Bug in db.fs.utils

When using db.fs.utils on an S3 bucket titled "${sometext}.${sometext}.${somenumber}${sometext}-${sometext}-${sometext}" we receive an error. PLEASE understand this is an issue with how it encodes the .${somenumber} because we verified with boto3 that...

Latest Reply
740209
New Contributor II
  • 1 kudos

@Debayan Mukherjee All the information is there; please read it carefully. I am not going to give you the actual bucket name I am using on a public forum. As I said above, here is the command: dbutils.fs.ls("s3a://${bucket_name_here_follow_above_format}"...

3 More Replies
ramankr48
by Contributor II
  • 9234 Views
  • 3 replies
  • 6 kudos
Latest Reply
Anonymous
Not applicable
  • 6 kudos

Hi @Raman Gupta Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you. Thanks...

2 More Replies
parulpaul
by New Contributor III
  • 2361 Views
  • 2 replies
  • 7 kudos
Latest Reply
parulpaul
New Contributor III
  • 7 kudos

No solution found

1 More Replies
gud4eve
by New Contributor III
  • 4247 Views
  • 5 replies
  • 5 kudos

Resolved! Why is Databricks on AWS cluster start time less than 5 mins and EMR cluster start time is 15 mins?

We are migrating from AWS EMR to Databricks. One thing that we have noticed during the POCs is that a Databricks cluster of the same size and instance type takes much less time to start compared to EMR. My understanding is Databricks also would be request...

Latest Reply
karthik_p
Esteemed Contributor
  • 5 kudos

@gud4eve What kind of cluster are you using, and have you configured pools? If not, as @Werner Stinckens said, there is a chance that Databricks has worked hard to provision instances faster.

4 More Replies
Raagavi
by New Contributor
  • 2201 Views
  • 1 replies
  • 1 kudos

Is there a way to read the CSV files automatically from on-premises network locations and write back to the same from Databricks?

Is there a way to read the CSV files automatically from on-premises network locations and write back to the same from Databricks? 

Latest Reply
Debayan
Esteemed Contributor III
  • 1 kudos

Hi @Raagavi Rajagopal, you can access files on mounted object storage (just an example) or files, please refer to: https://docs.databricks.com/files/index.html#access-files-on-mounted-object-storage And in DBFS, CSV files can be read and write fr...

mattjones
by New Contributor II
  • 715 Views
  • 0 replies
  • 1 kudos

Hi all - Matt Jones here, I’m on the Data Streaming team at Databricks and wanted to share a few takeaways from last week’s Current 2022 data streamin...

Hi all - Matt Jones here, I’m on the Data Streaming team at Databricks and wanted to share a few takeaways from last week’s Current 2022 data streaming event (formerly Kafka Summit) in Austin. By far the most common question we got at the booth was ho...

Ross
by New Contributor II
  • 1607 Views
  • 1 replies
  • 0 kudos

Failed R install package of survminer in Databricks 10.4 LTS

I am trying to install the survminer package but I get a non-zero exit status. It may be due to the jpeg package which is a pre-requisite but this also fails when installing independently. install.packages("survminer", repos = "https://cran.microsoft....

Latest Reply
shan_chandra
Esteemed Contributor
  • 0 kudos

@Ross Hamilton - Please follow the steps below in the given order. Run the init script below in an isolated notebook and add the init script to the issue cluster > Advanced options > Init Scripts. %python dbutils.fs.put("/tmp/test/init_script.sh",""" #...

