Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Tahseen0354
by Valued Contributor
  • 5665 Views
  • 2 replies
  • 4 kudos

Resolved! How do I track Databricks cluster users?

Hi, is there a way to find out/monitor which users have used my cluster, for how long, and how many times in an Azure Databricks workspace?

Latest Reply
youssefmrini
Databricks Employee
  • 4 kudos

Hello, you can activate audit logs (more specifically, cluster logs): https://learn.microsoft.com/en-us/azure/databricks/administration-guide/account-settings/azure-diagnostic-logs. They can be very helpful for tracking all the metrics.

  • 4 kudos
1 More Replies
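A minimal sketch of the audit-log approach suggested in the reply above, once diagnostic logs are delivered to a storage location as JSON. The path and the field names (`category`, `identity`, `operationName`, `time`) are assumptions based on the Azure diagnostic log schema; verify them against your own log delivery configuration.

```python
# Hypothetical sketch: summarize who used a cluster from Azure Databricks
# diagnostic (audit) logs exported to storage. Path and field names are
# assumptions -- verify them against your log delivery setup.
from pyspark.sql import functions as F

logs = spark.read.json("abfss://logs@<storage-account>.dfs.core.windows.net/databricks/")

cluster_events = (
    logs
    .filter(F.col("category") == "clusters")   # cluster-related audit events
    .select(
        F.col("identity").alias("user"),       # who performed the action
        F.col("operationName"),                # e.g. start, restart, terminate
        F.col("time"),
    )
)

# How many times each user touched the cluster, by operation
cluster_events.groupBy("user", "operationName").count().show()
```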
ramankr48
by Contributor II
  • 42757 Views
  • 6 replies
  • 11 kudos

Resolved! How to find the size of a table in Python or SQL?

Let's suppose there is a database db containing many tables, and I want to get the size of each table. How can I get it in either SQL, Python, or PySpark? Even if I have to get them one by one, that's fine.

Latest Reply
shan_chandra
Databricks Employee
  • 11 kudos

@Raman Gupta​ - could you please try the below?
%python
spark.sql("describe detail delta-table-name").select("sizeInBytes").collect()

  • 11 kudos
5 More Replies
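Building on the reply above, a sketch that loops over every table in a database and collects `sizeInBytes` from `DESCRIBE DETAIL`. This works for Delta tables only; `db` is a placeholder database name.

```python
# List all tables in the database, then fetch each table's size.
# DESCRIBE DETAIL works for Delta tables; non-Delta tables will error.
tables = spark.sql("SHOW TABLES IN db").collect()

for t in tables:
    size = (spark.sql(f"DESCRIBE DETAIL db.{t.tableName}")
            .select("sizeInBytes")
            .collect()[0]["sizeInBytes"])
    print(t.tableName, size)
```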
User16835756816
by Valued Contributor
  • 8694 Views
  • 1 replies
  • 6 kudos

How can I simplify my data ingestion by processing the data as it arrives in cloud storage?

This post will help you simplify your data ingestion by utilizing Auto Loader, Delta Optimized Writes, Delta Write Jobs, and Delta Live Tables. Pre-Req: You are using JSON data and Delta Writes commands. Step 1: Simplify ingestion with Auto Loader Delt...

Latest Reply
youssefmrini
Databricks Employee
  • 6 kudos

This post will help you simplify your data ingestion by utilizing Auto Loader, Delta Optimized Writes, Delta Write Jobs, and Delta Live Tables. Pre-Req: You are using JSON data and Delta Writes commands. Step 1: Simplify ingestion with Auto Loader Delta...

  • 6 kudos
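Since the excerpt above is truncated, here is a minimal Auto Loader sketch for the JSON scenario it describes. The input path, schema/checkpoint locations, and target table name are placeholders.

```python
# Incrementally ingest JSON files as they arrive in cloud storage
# using Auto Loader (cloudFiles), writing to a Delta table.
(spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/tmp/schemas/events")  # schema tracking
    .load("/mnt/raw/events/")
    .writeStream
    .option("checkpointLocation", "/tmp/checkpoints/events")
    .trigger(availableNow=True)  # process the backlog, then stop
    .toTable("bronze_events"))
```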
ricperelli
by New Contributor II
  • 2630 Views
  • 0 replies
  • 1 kudos

How can I save a Parquet file using pandas with a Data Factory orchestrated notebook?

Hi guys, this is my first question, feel free to correct me if I'm doing something wrong. Anyway, I'm facing a really strange problem. I have a notebook in which I'm performing some pandas analysis; after that, I save the resulting dataframe in a parque...

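The question is truncated, but for the scenario it describes a common pattern is to write the pandas DataFrame through the local `/dbfs` mount point so the file lands in DBFS rather than on the driver's ephemeral disk. The path below is a placeholder.

```python
# Hypothetical sketch: persist a pandas DataFrame as Parquet on DBFS.
# Requires pyarrow or fastparquet to be installed on the cluster.
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})
df.to_parquet("/dbfs/tmp/analysis_result.parquet")
```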
venkad
by Contributor
  • 1682 Views
  • 0 replies
  • 4 kudos

Default location for Schema/Database in Unity

Hello Bricksters, we organize the delta lake across multiple storage accounts: one storage account per data domain and one container per database. This helps us isolate resources and cost at the business-domain level. Earlier, when a schema/database...

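The post is truncated, but a hedged sketch of what it appears to be after: Unity Catalog lets you set a managed location per schema, so each database's managed tables land in their own container. Catalog, schema, and location names below are placeholders.

```python
# Create a schema whose managed tables are stored in a specific container.
spark.sql("""
    CREATE SCHEMA IF NOT EXISTS my_catalog.sales
    MANAGED LOCATION 'abfss://sales@<storage-account>.dfs.core.windows.net/'
""")
```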
vizoso
by New Contributor III
  • 1779 Views
  • 1 replies
  • 3 kudos

Cluster list in Microsoft.Azure.Databricks.Client fails because ClusterSource enum does not include MODELS. When you have a model serving cluster, Clu...

Cluster list in Microsoft.Azure.Databricks.Client fails because ClusterSource enum does not include MODELS. When you have a model serving cluster, ClustersApiClient.List method fails to deserialize the API response because that cluster has MODELS as C...

saurabh12521
by New Contributor II
  • 3960 Views
  • 3 replies
  • 4 kudos

Unity through Terraform

I am working on automating Unity through Terraform. I referred to the link below to get started: https://registry.terraform.io/providers/databricks/databricks/latest/docs/guides/unity-catalog-azure. I am facing an issue when I create the metastore using...

Latest Reply
Pat
Esteemed Contributor
  • 4 kudos

Not sure if you got this working, but I noticed you are using the provider `databrickslabs/databricks`, which is why this is not available. You should be using the new provider `databricks/databricks`: https://registry.terraform.io/providers/databricks/datab...

  • 4 kudos
2 More Replies
DataBricks_2022
by New Contributor III
  • 1633 Views
  • 1 replies
  • 1 kudos

Resolved! How to get started with Auto Loader using the Partner Academy portal? Are there any videos and step-by-step materials?

Need video and step-by-step documentation on Auto Loader, as well as on how to build an end-to-end data pipeline.

Latest Reply
karthik_p
Esteemed Contributor
  • 1 kudos

@raja iqbal​ the course below will provide an overview of Auto Loader. Course name: How to Use Databricks' Auto Loader for Incremental ETL with the Databricks Data Science and Data Engineering Workspace. If you register for the Data Engineer Catalog, then you ...

  • 1 kudos
cvantassel
by New Contributor III
  • 13608 Views
  • 7 replies
  • 8 kudos

Is there any way to propagate errors from dbutils?

I have a master notebook that runs a few different notebooks on a schedule using the dbutils.notebook.run() function. Occasionally, these child notebooks will fail (due to API connections or whatever). My issue is, when I attempt to catch the errors ...

Latest Reply
wdphilli
New Contributor III
  • 8 kudos

I have the same issue. I see no reason that Databricks couldn't propagate the internal exception back through their WorkflowException.

  • 8 kudos
6 More Replies
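A common workaround sketch for the propagation problem discussed in this thread: have the child notebook serialize its own error details through `dbutils.notebook.exit`, since the `WorkflowException` raised in the master hides the original stack trace. `run_etl` and the notebook path are hypothetical.

```python
import json

# --- in the child notebook ---
try:
    run_etl()  # hypothetical workload
    dbutils.notebook.exit(json.dumps({"status": "ok"}))
except Exception as e:
    # Return the real error message to the caller instead of losing it
    dbutils.notebook.exit(json.dumps({"status": "error", "message": str(e)}))

# --- in the master notebook ---
result = json.loads(dbutils.notebook.run("child_notebook", 600))
if result["status"] == "error":
    raise RuntimeError(f"Child notebook failed: {result['message']}")
```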
parulpaul
by New Contributor III
  • 4837 Views
  • 1 replies
  • 2 kudos

AnalysisException: Multiple sources found for bigquery (com.google.cloud.spark.bigquery.BigQueryRelationProvider, com.google.cloud.spark.bigquery.v2.BigQueryTableProvider), please specify the fully qualified class name.

While reading data from BigQuery into Databricks, I am getting the error: AnalysisException: Multiple sources found for bigquery (com.google.cloud.spark.bigquery.BigQueryRelationProvider, com.google.cloud.spark.bigquery.v2.BigQueryTableProvider), please spe...

Latest Reply
Debayan
Databricks Employee
  • 2 kudos

Hi @Parul Paul​, could you please check if this is the scenario: https://stackoverflow.com/questions/68623803/load-to-bigquery-via-spark-job-fails-with-an-exception-for-multiple-sources-foun Also, you can refer to: https://github.com/GoogleCloudDatapro...

  • 2 kudos
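Per the error message itself, one way out is to name the provider by its fully qualified class instead of the ambiguous short name `bigquery`. A sketch; the table name is a placeholder.

```python
# Disambiguate the BigQuery source by using the fully qualified
# provider class listed in the exception.
df = (spark.read
      .format("com.google.cloud.spark.bigquery.v2.BigQueryTableProvider")
      .option("table", "my-project.my_dataset.my_table")
      .load())
```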
740209
by New Contributor II
  • 2674 Views
  • 4 replies
  • 1 kudos

Bug in dbutils.fs

When using dbutils.fs on an S3 bucket named "${sometext}.${sometext}.${somenumber}${sometext}-${sometext}-${sometext}", we receive an error. Please understand this is an issue with how it encodes the .${somenumber}, because we verified with boto3 that...

Latest Reply
740209
New Contributor II
  • 1 kudos

@Debayan Mukherjee​ All the information is there, please read carefully. I am not going to give the actual bucket name I am using on a public forum. As I said above, here is the command: dbutils.fs.ls("s3a://${bucket_name_here_follow_above_format}"...

  • 1 kudos
3 More Replies
ramankr48
by Contributor II
  • 12218 Views
  • 3 replies
  • 6 kudos
Latest Reply
Anonymous
Not applicable
  • 6 kudos

Hi @Raman Gupta​ Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you. Thanks...

  • 6 kudos
2 More Replies
parulpaul
by New Contributor III
  • 3679 Views
  • 2 replies
  • 7 kudos
Latest Reply
parulpaul
New Contributor III
  • 7 kudos

No solution found

  • 7 kudos
1 More Replies
gud4eve
by New Contributor III
  • 6783 Views
  • 5 replies
  • 5 kudos

Resolved! Why is the Databricks on AWS cluster start time less than 5 minutes while the EMR cluster start time is 15 minutes?

We are migrating from AWS EMR to Databricks. One thing we noticed during the POCs is that a Databricks cluster of the same size and instance type takes much less time to start compared to EMR. My understanding is that Databricks would also be request...

Latest Reply
karthik_p
Esteemed Contributor
  • 5 kudos

@gud4eve​ what kind of cluster are you using, and have you configured pools? If not, as @Werner Stinckens​ said, chances are Databricks has worked hard to make instance provisioning faster.

  • 5 kudos
4 More Replies
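A hedged sketch of the pools suggestion in the reply: a pre-warmed instance pool keeps idle VMs ready, so clusters attached to it start much faster. Workspace URL, token, and node type are placeholders; the endpoint is the Instance Pools API 2.0.

```python
import requests

# Create a pool that keeps two instances warm for fast cluster startup.
resp = requests.post(
    "https://<workspace-url>/api/2.0/instance-pools/create",
    headers={"Authorization": "Bearer <token>"},
    json={
        "instance_pool_name": "warm-pool",
        "node_type_id": "i3.xlarge",
        "min_idle_instances": 2,  # instances kept idle, ready to attach
    },
)
print(resp.json())  # contains the instance_pool_id to reference in cluster configs
```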
Raagavi
by New Contributor
  • 3025 Views
  • 1 replies
  • 1 kudos

Is there a way to read the CSV files automatically from on-premises network locations and write back to the same from Databricks?

Is there a way to read the CSV files automatically from on-premises network locations and write back to the same from Databricks? 

Latest Reply
Debayan
Databricks Employee
  • 1 kudos

Hi @Raagavi Rajagopal​, you can access files on mounted object storage (just an example); please refer to: https://docs.databricks.com/files/index.html#access-files-on-mounted-object-storage. And in DBFS, CSV files can be read and written fr...

  • 1 kudos
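A short sketch of the mounted-storage approach from the reply: once the on-premises data is synced to object storage that is mounted in the workspace, Spark can read and write the CSV files directly. The mount path is a placeholder.

```python
# Read CSVs from a mounted location and write results back to it.
df = spark.read.option("header", True).csv("/mnt/onprem-share/input/")
df.write.mode("overwrite").option("header", True).csv("/mnt/onprem-share/output/")
```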
