Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Forum Posts

cosminsanda
by New Contributor III
  • 5508 Views
  • 5 replies
  • 3 kudos

Resolved! Unit Testing with the new Databricks Connect in Python

I would like to create a regular PySpark session in an isolated environment against which I can run my Spark-based tests. I don't see how that's possible with the new Databricks Connect. I'm going in circles here; is it even possible? I don't want to ...

Latest Reply
thibault
Contributor III
  • 3 kudos

Given this doesn't work on serverless compute, aren't those tests very slow to complete due to the compute startup time? I'm trying to steer away from databricks connect for unit testing for this reason. If they supported serverless, that would be a ...

4 More Replies
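One workaround that avoids Databricks Connect for most tests (a hedged sketch, not from the thread itself): keep row-level logic in plain Python functions so the bulk of the test suite needs neither a cluster nor a Connect session. `normalize_email` is an illustrative name, not something from the post.

```python
# Sketch: isolate row-level logic from Spark plumbing so unit tests can run
# without Databricks Connect or a cluster. All names here are illustrative.

def normalize_email(raw: str) -> str:
    """Pure function: testable with plain pytest/unittest, no Spark needed."""
    return raw.strip().lower()

# At the Spark edge this could be wired in as a UDF, e.g.:
#   from pyspark.sql import functions as F
#   df.withColumn("email", F.udf(normalize_email, "string")("email_raw"))
```

Only the thin Spark wiring then needs an integration test against real compute.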
opl12
by New Contributor II
  • 4034 Views
  • 1 reply
  • 1 kudos

SQL Sub Query Not Working

Hi everyone, I hope you're all doing well! Could you please help, or offer some guidance? A "CASE WHEN" statement is returning an error. The logic is as follows: if the `valor` field IS NULL, THEN I run a subquery using the filters: origem, desti...

Latest Reply
lucasrocha
Databricks Employee
  • 1 kudos

Hi @opl12, how are you? If possible, could you send the full error message? A comma-separated list of candidate columns should appear right after "Did you mean one of the following? [...]". Could you also test without the backticks (``) and let me know the result? Bes...

leobocci
by New Contributor
  • 3291 Views
  • 1 reply
  • 0 kudos

Parallel read of many delta tables

I need to read many Delta tables in Azure object storage (block blobs). There is no root-object Delta table, but rather many fragmented Delta tables that share a common schema but not common paths. Iterating over the paths with a for loop is performin...

Latest Reply
raphaelblg
Databricks Employee
  • 0 kudos

Hello @leobocci, in order to read multiple Delta tables, multiple read operations are required. You can trigger the read operations simultaneously through Job Workflows, DLT, the Databricks CLI, DBSQL, interactive clusters, and other resources. If the...

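The reply above suggests firing the reads concurrently. Since each Delta read is I/O-bound, one lightweight in-process option is a thread pool (a sketch; `load_table` is a stand-in for the real `spark.read.format("delta").load(path)` call):

```python
# Sketch: read many Delta table paths concurrently with a thread pool.
# `load_table` is a placeholder; on Databricks it would wrap
# spark.read.format("delta").load(path), and the resulting DataFrames
# could then be union-ed since the tables share a schema.
from concurrent.futures import ThreadPoolExecutor

def load_table(path):
    # Stand-in for the real Delta read; returns a marker for illustration.
    return f"loaded:{path}"

def load_all(paths, max_workers=8):
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map preserves input order, so results line up with paths.
        return list(pool.map(load_table, paths))
```

Spark job submission is thread-safe, so each thread can trigger its own read against the shared session.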
Akuhei05
by Databricks Partner
  • 4470 Views
  • 3 replies
  • 1 kudos

How to Programmatically Retrieve Cluster Memory Usage?

Hi! I need help with the following: programmatically retrieve the maximum memory configured for the cluster attached to the notebook/job. I think this is achievable through the system tables or the Clusters API, but I'm open to other suggestions. Execute a ...

Latest Reply
anardinelli
Databricks Employee
  • 1 kudos

Great use case! Have you ever heard about Prometheus with Spark 3.0? It's a tool that can export live metrics for your jobs and runs, writing to a sink that you can read as a stream. I've personally never used it for such a use case, but there you ca...

2 More Replies
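As a sketch of the Clusters API route mentioned in the question: configured memory can be derived by joining the cluster spec (`GET /api/2.0/clusters/get`) against the node-types listing (`GET /api/2.0/clusters/list-node-types`, which reports `memory_mb` per node type). The dicts below are trimmed, illustrative payload shapes; the authenticated HTTP calls are omitted.

```python
# Sketch: derive a cluster's configured memory from Clusters API payloads.
# `cluster` mirrors a trimmed /clusters/get response and `node_types` a
# trimmed /clusters/list-node-types response; fetching them (requests + a
# personal access token) is left out here.

def cluster_memory_mb(cluster, node_types):
    mem = {nt["node_type_id"]: nt["memory_mb"] for nt in node_types}
    # Workers all share one node type; the driver may use a different one.
    workers = cluster.get("num_workers", 0) * mem[cluster["node_type_id"]]
    driver = mem[cluster.get("driver_node_type_id", cluster["node_type_id"])]
    return workers + driver
```

This gives configured capacity, not live usage; live usage would need a metrics exporter such as the Prometheus route the reply mentions.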
thiagoawstest
by Contributor
  • 3522 Views
  • 1 reply
  • 0 kudos

Resolved! databricks cli create job

Hi, using the Databricks CLI, I exported the jobs in JSON format from the workspace in Azure, then used the same JSON to create a new job in a workspace on AWS, but the error below occurs. To create a job via the Databricks CLI on AWS, do you need to change ...

Data Engineering
AWS
jobs
migration
Latest Reply
thiagoawstest
Contributor
  • 0 kudos

Hi, I already found the error: you need to use @ in the path. Thanks.

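To make the fix concrete (a sketch; the job spec is a placeholder, and the `databricks jobs create` call needs an authenticated CLI profile for the target workspace, so it is left commented out):

```shell
# Sketch: recreate an exported job in the AWS workspace. The @ prefix tells
# the CLI to read the JSON payload from the file rather than treating the
# argument as a literal string.
cat > job.json <<'EOF'
{"name": "migrated-job", "tasks": []}
EOF
# With an authenticated profile for the target workspace:
# databricks jobs create --json @job.json
```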
KKo
by Contributor III
  • 16111 Views
  • 5 replies
  • 1 kudos

Move whole workflow from Dev to Prod

I have a workflow created in Dev; now I want to move the whole thing to Prod and schedule it. The workflow has multiple notebooks, dependent libraries, parameters, and such. How do I move the whole thing to Prod, instead of moving each notebook and rec...

Latest Reply
mkassa
New Contributor II
  • 1 kudos

I ended up creating a Python script to just do the export; here is the code below. It will match on job name: if it matches, it will update, otherwise it will import. import requests source_token = '' source_instance = 'adb-000000000000000.00.azuredata...

4 More Replies
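The match-on-name-then-update-or-create approach in the reply maps onto two Jobs API 2.1 endpoints: `jobs/reset` (full replacement of an existing job's settings) and `jobs/create`. A sketch of just the payload-shaping, with the actual `requests.post(...)` calls and authentication omitted:

```python
# Sketch: choose between Jobs API 2.1 create and reset payloads when
# promoting an exported job definition. `existing` maps job name -> job_id
# in the target workspace (e.g. built from /api/2.1/jobs/list).

def promote_payload(settings, existing):
    name = settings["name"]
    if name in existing:
        # POST /api/2.1/jobs/reset replaces the job's settings in full.
        return "reset", {"job_id": existing[name], "new_settings": settings}
    # POST /api/2.1/jobs/create takes the settings object directly.
    return "create", settings
```

Using `reset` rather than `update` keeps Prod in lockstep with Dev, since fields removed in Dev are also removed in Prod.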
WarmCat
by New Contributor II
  • 3775 Views
  • 2 replies
  • 0 kudos

Read data shared using Delta Sharing open sharing (for recipients) FileNotFoundError

I'm using the docs here: https://docs.databricks.com/en/data-sharing/read-data-open.html#store-creds However, I am unable to read the stored file, which is successfully created with the following code: %scala dbutils.fs.put("dbfs:/FileStore/extraction/con...

Data Engineering
DELTA SHARING
Latest Reply
WarmCat
New Contributor II
  • 0 kudos

Thanks Daniel, but that does not work either. The only thing that allowed progress was to use: client = delta_sharing.SharingClient(f"file:///tmp/config.share") I gave up and installed Apache Spark locally in a venv for now, and will be using AWS going ...

1 More Replies
bampo
by New Contributor II
  • 6124 Views
  • 4 replies
  • 0 kudos

Streaming Reads Full Table with Liquid Clustering

Each merge/update on a table with liquid clustering forces the stream to read the whole table. Databricks Runtime: 14.3 LTS. Below I prepared a simple script to reproduce the issue: Create the schema. %sql CREATE SCHEMA IF NOT EXISTS test; Create a table with si...

Latest Reply
radothede
Valued Contributor II
  • 0 kudos

It seems you are not using a checkpoint location; is that intended? https://docs.databricks.com/en/structured-streaming/query-recovery.html That might be the reason your streaming query is reading the whole table every time you trigger the process.

3 More Replies
skarpeck
by Databricks Partner
  • 5697 Views
  • 5 replies
  • 0 kudos

Spark structured streaming - not working with checkpoint location set

We have structured streaming that reads from external delta table defined in following way: try: df_silver = ( spark.readStream .format("delta") .option("skipChangeCommits", True) .table(src_location) ...

Latest Reply
radothede
Valued Contributor II
  • 0 kudos

What's the logic of the merge function merge_silver_to_gold? And what's the output of DESCRIBE HISTORY against that destination Delta table after running the streaming query?

4 More Replies
Anonymous
by Not applicable
  • 10319 Views
  • 7 replies
  • 24 kudos

Resolved! Couldn't create new catalog?

I used DBR version 11.0

Latest Reply
seanzy
New Contributor II
  • 24 kudos

In the 2.9 comprehensive lab of Getting Started with Data Engineering on Databricks, I try to run: %run ../Includes/Classroom-Setup-11 and get the following error: The execution of this command did not finish successfully. Resetting the learning envir...

6 More Replies
Erik_L
by Contributor II
  • 1299 Views
  • 0 replies
  • 0 kudos

How to force delta live tables legacy execution mode?

We've been running Delta Live Tables for some time with Unity Catalog, and it's as slow as a sloth on a Hawaiian vacation. Anyway, DLT had three consecutive failures (due to the data source being unreliable) and then the logs printed: "MaxRetryThreshol...

Chris_Konsur
by New Contributor III
  • 26116 Views
  • 4 replies
  • 7 kudos

Resolved! Error: The associated location ... is not empty but it's not a Delta table

I try to create a table but I get this error: AnalysisException: Cannot create table ('`spark_catalog`.`default`.`citation_all_tenants`'). The associated location ('dbfs:/user/hive/warehouse/citation_all_tenants') is not empty but it's not a Delta t...

Latest Reply
sachin_tirth
New Contributor II
  • 7 kudos

Hi team, I am facing the same issue. When we try to load data to the table in the production batch, we get an error saying the table is not in Delta format. There is no recent change to the table, and we are not trying any CREATE OR REPLACE TABLE; this is an existing table in pr...

3 More Replies
mh_db
by Databricks Partner
  • 4410 Views
  • 1 reply
  • 1 kudos

How to get different dynamic value for each task in workflow

I created a workflow with two tasks. It runs the first notebook and then waits for it to finish before starting the second notebook. I want to use the dynamic value {{job.start_time.iso_datetime}} as one of the parameters for both tasks. This should gi...

Latest Reply
lucasrocha
Databricks Employee
  • 1 kudos

Hello @mh_db, the dynamic value {{job.start_time.iso_datetime}} you are using in your workflow is designed to capture the start time of the job run, not the individual tasks within the job. This is why you are seeing the same date and time for both ...

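If what's actually wanted is a distinct timestamp per task, one workaround (a sketch, not a built-in Databricks value) is to capture the clock inside each task when it starts executing:

```python
# Sketch: {{job.start_time.iso_datetime}} resolves once per job run, so both
# tasks receive the same value. Recording the time inside each task's
# notebook yields a per-task start time instead.
from datetime import datetime, timezone

def task_start_iso():
    # Taken when the task itself begins running, not when the job started.
    return datetime.now(timezone.utc).isoformat(timespec="seconds")
```

The value could then be passed onward via task values if downstream tasks need it.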
WWoman
by Databricks Partner
  • 1401 Views
  • 1 reply
  • 1 kudos

Identifying invalid views

Is there a way to identify all invalid views in a schema or catalog without querying each view to see if it succeeds?

Latest Reply
raphaelblg
Databricks Employee
  • 1 kudos

Hello @WWoman, I don't think there's a feature for that. If you think this would be a cool feature you could submit an idea in Databricks' Ideas Portal.

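Absent a built-in feature, one brute-force sketch is to enumerate views (e.g. from `information_schema.views`) and probe each with a zero-row query, collecting the ones that fail. Below the probe is a stand-in callable so the pattern is visible without a warehouse connection; on Databricks it would be `spark.sql` or a SQL warehouse call.

```python
# Sketch: find invalid views by probing each with a cheap query.
# `run_query` is a stand-in for spark.sql / a warehouse client; the view
# list could come from information_schema.views in the target catalog.

def find_invalid_views(view_names, run_query):
    invalid = []
    for name in view_names:
        try:
            # LIMIT 0 forces the view to resolve without scanning data.
            run_query(f"SELECT * FROM {name} LIMIT 0")
        except Exception as exc:
            invalid.append((name, str(exc)))
    return invalid
```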