Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Dunken
by New Contributor III
  • 5624 Views
  • 7 replies
  • 3 kudos

Resolved! Databricks and CD4ML

I would like to use Databricks in a CD4ML way (see also https://martinfowler.com/articles/cd4ml.html). Is this possible? I would like to develop and train models in one environment; once qualified, I would like to deploy the model with the application...

Latest Reply
Atanu
Databricks Employee
  • 3 kudos

Is something like the below what you are looking for, @Armin Galliker?

brickster_2018
by Databricks Employee
  • 6187 Views
  • 2 replies
  • 2 kudos
Latest Reply
MoJaMa
Databricks Employee
  • 2 kudos

Since Workflows (Multi-Task jobs) is now GA, one way to work around the 1000 concurrent jobs limit is to use tasks within a job. Each job can have 100 tasks, and these tasks do not count toward the concurrent job limit.
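
A minimal sketch of that workaround, assuming the request-body shape of the Jobs API 2.1 (`name`, `tasks`, `task_key`, `notebook_task`, `existing_cluster_id`). The helper name `build_multitask_jobs`, the notebook paths, and the cluster ID are hypothetical, and the 100-task limit is taken from the reply above, so check it against the current documentation. The code only builds the payloads; submitting them is left out.

```python
from typing import Dict, List

# Per-job task limit as quoted in the reply above (assumption: may change).
MAX_TASKS_PER_JOB = 100


def build_multitask_jobs(notebook_paths: List[str], cluster_id: str,
                         name_prefix: str = "batch") -> List[Dict]:
    """Pack notebooks into as few multi-task job definitions as possible.

    Each returned dict is a hypothetical job-creation payload holding at
    most MAX_TASKS_PER_JOB notebook tasks.
    """
    jobs = []
    for start in range(0, len(notebook_paths), MAX_TASKS_PER_JOB):
        chunk = notebook_paths[start:start + MAX_TASKS_PER_JOB]
        tasks = [
            {
                "task_key": f"task_{start + i}",  # must be unique within a job
                "notebook_task": {"notebook_path": path},
                "existing_cluster_id": cluster_id,
            }
            for i, path in enumerate(chunk)
        ]
        jobs.append({
            "name": f"{name_prefix}_{start // MAX_TASKS_PER_JOB}",
            "tasks": tasks,
        })
    return jobs
```

With 250 notebooks this yields three job definitions instead of 250 single-task jobs, so only three runs count toward the concurrent job limit.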

alejandrofm
by Valued Contributor
  • 3865 Views
  • 4 replies
  • 5 kudos

Resolved! Show Vacuum operation result (files deleted) without DRY RUN

Hi, I'm running some scheduled vacuum jobs and would like to know how many files were deleted without doing all the computation twice (once with and once without DRY RUN). Is there a way to accomplish this? Thanks!

Latest Reply
RKNutalapati
Valued Contributor
  • 5 kudos

We have to enable logging to capture the logs for vacuum: spark.conf.set("spark.databricks.delta.vacuum.logging.enabled", "true")
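
One way to read the result afterwards, sketched under assumptions: with logging enabled, the vacuum run is expected to appear in the table history (`DESCRIBE HISTORY`) as `VACUUM START` and `VACUUM END` entries, with the latter carrying a `numDeletedFiles` operation metric. Those operation and metric names are assumed from the Delta Lake table-history documentation and should be verified on your runtime. The helper name, the table name `my_table`, and the sample rows are hypothetical.

```python
from typing import Iterable, Mapping, Optional


def deleted_files_from_history(rows: Iterable[Mapping]) -> Optional[int]:
    """Return numDeletedFiles from the most recent 'VACUUM END' entry.

    `rows` are history records as dicts, newest first (the order in which
    DESCRIBE HISTORY returns them). Returns None if no entry is found.
    """
    for row in rows:
        if row.get("operation") == "VACUUM END":
            metrics = row.get("operationMetrics") or {}
            value = metrics.get("numDeletedFiles")
            # History metrics are string-valued, so convert explicitly.
            return int(value) if value is not None else None
    return None


# On a cluster (not run here), the rows could come from something like:
#   spark.conf.set("spark.databricks.delta.vacuum.logging.enabled", "true")
#   spark.sql("VACUUM my_table")
#   rows = [r.asDict() for r in spark.sql("DESCRIBE HISTORY my_table").collect()]

# Made-up history rows, for illustration only.
sample_rows = [
    {"version": 12, "operation": "VACUUM END",
     "operationMetrics": {"numDeletedFiles": "37", "numVacuumedDirectories": "4"}},
    {"version": 11, "operation": "VACUUM START",
     "operationMetrics": {"numFilesToDelete": "37"}},
    {"version": 10, "operation": "WRITE", "operationMetrics": {"numFiles": "5"}},
]
deleted = deleted_files_from_history(sample_rows)
```

This avoids a second DRY RUN pass because the count is read from the history of the vacuum that already ran.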

Oliver_Floyd
by Contributor
  • 2974 Views
  • 2 replies
  • 3 kudos

Resolved! How to update external metastore cluster configuration on the fly ?

Hello, in my use case, my data is pushed to an ADLS Gen2 container called ingest. After some data processing on a Databricks cluster of the ingest workspace, I declare the associated table in an external metastore for this workspace. At the end of this pr...

Latest Reply
Oliver_Floyd
Contributor
  • 3 kudos

Hello @Atanu Sarkar, thank you for your answer. I have created a feature request. I hope it will be accepted soon ^^

Mradula
by New Contributor
  • 1271 Views
  • 0 replies
  • 0 kudos

Displaying the queried data from mounted data from Azure Blob storage to databricks is slow

I have mounted my Azure Blob storage JSON file, which is around 18 GB, to Databricks and am trying to perform a simple count operation on it. I am noticing that it takes 14 minutes in the Community Edition. Seeking answers on whether this is...

SM
by New Contributor III
  • 8458 Views
  • 8 replies
  • 3 kudos

Resolved! Delta Live Tables has duplicates created by multiple workers

Hello, I am working with Delta Live Tables. I am trying to create a DLT from a combination of DataFrames built in a 'for loop', which are unioned, and then the DLT is created over the unioned DataFrame. However, I noticed that the Delta table has duplicates. An...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

@Shikha Mathew​ - Does your last answer mean that your issue is resolved? Would you be happy to mark whichever answer helped as best? Or, if it wasn't a specific one, would you tell us what worked?

Direo
by Contributor II
  • 2543 Views
  • 3 replies
  • 4 kudos

Resolved! Which cluster mode should I choose for most efficient graph modelling?

Is there a difference between cluster modes in this case? Could GraphX work better on a single-node cluster than on a standard cluster or a high-concurrency cluster (for multiple users)? Would a less concurrent cluster be more efficient for graph mod...

Latest Reply
Anonymous
Not applicable
  • 4 kudos

@Direo Direo​ - What do you think of these answers? If either of them stands out as best, would you please mark it that way? If you have more questions, please, bring them on!

baatchus
by New Contributor III
  • 5195 Views
  • 4 replies
  • 0 kudos

Resolved! parameterize azure storage account name in spark cluster config databricks

Wondering if it is possible to parameterize the Azure storage account name part in the Spark cluster config in Databricks? I have a working example where the values are referencing secret scopes: spark.hadoop.fs.azure.account.oauth2.client.id.<azurestorageacc...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Fantastic! Thanks for letting us know!

tarente
by New Contributor III
  • 14130 Views
  • 8 replies
  • 4 kudos

Resolved! Pyspark - how to save the schema of a csv file in a delta table's column

How to save the schema of a CSV file in a Delta table's column? In a previous project implemented in Databricks using Scala notebooks, we stored the schema of CSV files as a "JSON string" in a SQL Server table. When we needed to read or write the csv a...

Latest Reply
Anonymous
Not applicable
  • 4 kudos

@tarente - Thanks for letting us know.

TufanRakshit
by New Contributor
  • 5070 Views
  • 1 replies
  • 1 kudos
Latest Reply
RKNutalapati
Valued Contributor
  • 1 kudos

@Tufan Rakshit: Could you please provide a sample or pseudocode for the requirement you are trying to implement?

  • 1 kudos
Jeff1
by Contributor II
  • 3248 Views
  • 3 replies
  • 5 kudos

Resolved! How to convert Data Chr Strings to Date Strings

Databricks Community, I am new to Databricks and work in R. I have a data frame with a date field that is a chr string and need to convert it to a date field. Tried the standard as.Date(x, format = "%Y-%m-%d"), then tried the dplyr::mutate function and th...

Latest Reply
Jeff1
Contributor II
  • 5 kudos

Based upon the initial response I went with: my_data.frame <- my_data.frame %>% mutate(date = to_date(data.frame_variable, "yyyy-MM-dd"))

shrikant_kulkar
by New Contributor III
  • 1409 Views
  • 0 replies
  • 0 kudos

autoloader schema inference date column

I have hire_date and term_date columns in the "MM/dd/YYYY" format in the underlying CSV files. The schema hint "cloudFiles.schemaHints" : "Hire_Date Date,Term_Date Date" pushes the data into the _rescued_data column due to conversion failure. I am looking for a solution to c...

Rex
by New Contributor III
  • 23357 Views
  • 7 replies
  • 3 kudos

Resolved! Cannot connect to Databricks SQL Endpoint using PHP and ODBC

I am trying to connect to our Databricks SQL endpoint using PHP in a Docker container. I set up my Docker container to download and configure the ODBC driver as specified here: https://docs.databricks.com/integrations/bi/jdbc-odbc-bi.html#install-and-c...

Latest Reply
Rex
New Contributor III
  • 3 kudos

The problem was that the Databricks SQL driver does not yet support ARM, which my laptop and Docker container were building for. See ('01000', "[01000] [unixODBC][Driver Manager]Can't open lib '/opt/simba/spark/lib/64/libsparkodbc_sb64.so' : file not ...

Suman
by New Contributor III
  • 4623 Views
  • 4 replies
  • 2 kudos

Change Data Feed functionality from SQL Endpoint

I am trying to run a command to retrieve change data from a SQL endpoint. It throws the error below: "The input query contains unsupported data source(s). Only csv, json, avro, delta, parquet, orc, text data sources are supported on Databricks SQL." But th...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 2 kudos

It is a separate runtime: https://docs.microsoft.com/en-us/azure/databricks/sql/release-notes/#channels

