Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

KKo
by Contributor III
  • 15107 Views
  • 5 replies
  • 1 kudos

Move whole workflow from Dev to Prod

I have a workflow created in Dev, and now I want to move the whole thing to prod and schedule it. The workflow has multiple notebooks, dependent libraries, parameters, and such. How do I move the whole thing to prod, instead of moving each notebook and rec...

Latest Reply
mkassa
New Contributor II
  • 1 kudos

I ended up creating a Python script to just do the export; here is the code below. It will match on job name: if it matches it will update, otherwise it will import. import requests source_token = '' source_instance = 'adb-000000000000000.00.azuredata...

4 More Replies
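A minimal sketch of the approach mkassa describes, since the inline code is truncated. The workspace hostnames and tokens are placeholders, and the helper names are hypothetical; the endpoints are the standard Jobs API 2.1 ones (`jobs/get`, `jobs/list`, `jobs/reset`, `jobs/create`):

```python
import requests


def find_job_id_by_name(jobs: list, name: str):
    """Return the job_id of the job whose settings.name matches, or None."""
    for job in jobs:
        if job.get("settings", {}).get("name") == name:
            return job["job_id"]
    return None


def copy_job(source_host, source_token, target_host, target_token, job_id):
    """Export one job from the source workspace; upsert it by name in the target."""
    # Export the job's settings from the source workspace.
    src = requests.get(
        f"https://{source_host}/api/2.1/jobs/get",
        headers={"Authorization": f"Bearer {source_token}"},
        params={"job_id": job_id},
    ).json()
    settings = src["settings"]

    # List jobs in the target workspace and match on name.
    # (jobs/list is paginated; a real script should follow page tokens.)
    target_jobs = requests.get(
        f"https://{target_host}/api/2.1/jobs/list",
        headers={"Authorization": f"Bearer {target_token}"},
    ).json().get("jobs", [])
    existing_id = find_job_id_by_name(target_jobs, settings["name"])

    if existing_id is not None:
        # Matching name found: overwrite the existing job's settings in place.
        requests.post(
            f"https://{target_host}/api/2.1/jobs/reset",
            headers={"Authorization": f"Bearer {target_token}"},
            json={"job_id": existing_id, "new_settings": settings},
        )
    else:
        # No match: create the job fresh in the target workspace.
        requests.post(
            f"https://{target_host}/api/2.1/jobs/create",
            headers={"Authorization": f"Bearer {target_token}"},
            json=settings,
        )
```

For a supported, declarative alternative to hand-rolled scripts, Databricks Asset Bundles cover the same dev-to-prod promotion use case.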
WarmCat
by New Contributor II
  • 3233 Views
  • 2 replies
  • 0 kudos

Read data shared using Delta Sharing open sharing (for recipients) FileNotFoundError

I'm using the docs here: https://docs.databricks.com/en/data-sharing/read-data-open.html#store-creds However, I am unable to read the stored file, which is successfully created with the following code: %scala dbutils.fs.put("dbfs:/FileStore/extraction/con...

Data Engineering
DELTA SHARING
Latest Reply
WarmCat
New Contributor II
  • 0 kudos

Thanks Daniel, but that does not work either. The only thing that allowed progress was to use: client = delta_sharing.SharingClient(f"file:///tmp/config.share") I gave up and installed Apache Spark locally in a venv for now, and will be using AWS going ...

1 More Replies
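The `file:///` workaround WarmCat landed on can be sketched end to end. This is a hedged example, not the docs' exact flow: the profile path, share, schema, and table names are hypothetical, and it assumes the `delta-sharing` pip package:

```python
def table_url(profile: str, share: str, schema: str, table: str) -> str:
    """Build the <profile>#<share>.<schema>.<table> coordinate the client expects."""
    return f"{profile}#{share}.{schema}.{table}"


def read_shared_table(profile_path: str, share: str, schema: str, table: str):
    """Read one open-sharing table into pandas (no Spark needed)."""
    # Requires `pip install delta-sharing`; imported here so the pure helper
    # above stays usable without the package installed.
    import delta_sharing

    # The explicit file:/// scheme sidesteps DBFS path-translation issues
    # when the credential file lives on the local filesystem.
    client = delta_sharing.SharingClient(profile_path)
    print(client.list_all_tables())  # confirm the table is actually shared

    return delta_sharing.load_as_pandas(
        table_url(profile_path, share, schema, table)
    )

# Hypothetical usage:
# df = read_shared_table("file:///tmp/config.share", "my_share", "my_schema", "my_table")
```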
bampo
by New Contributor II
  • 5646 Views
  • 4 replies
  • 0 kudos

Streaming Reads Full Table with Liquid Clustering

Each merge/update on a table with liquid clustering forces the streaming query to read the whole table. Databricks Runtime: 14.3 LTS. Below I prepared a simple script to reproduce the issue: Create schema. %sql CREATE SCHEMA IF NOT EXISTS test; Create table with si...

Latest Reply
radothede
Valued Contributor II
  • 0 kudos

It seems you are not using a checkpoint location, is that intended? https://docs.databricks.com/en/structured-streaming/query-recovery.html That might be the reason your streaming query is reading the whole table every time you trigger the process.

3 More Replies
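radothede's suggestion can be sketched as follows: without a checkpoint, each `.start()` is a brand-new query and begins from the table's full initial snapshot. A minimal hedged example of wiring one in (table name, paths, and the helper are hypothetical):

```python
def checkpoint_path(base: str, table: str) -> str:
    """One checkpoint directory per query/target; never share it between queries."""
    return f"{base}/{table.replace('.', '_')}"


def start_stream(spark, source_table: str, target_path: str, checkpoint_base: str):
    """Incrementally stream from a Delta table, resuming from the checkpoint."""
    return (
        spark.readStream
        .table(source_table)
        .writeStream
        # The checkpoint records the last processed table version, so the next
        # trigger reads only new commits instead of the whole table.
        .option("checkpointLocation", checkpoint_path(checkpoint_base, source_table))
        .trigger(availableNow=True)
        .start(target_path)
    )

# Hypothetical usage on a cluster:
# start_stream(spark, "test.tbl", "/tmp/out/tbl", "/tmp/checkpoints")
```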
skarpeck
by New Contributor III
  • 4541 Views
  • 5 replies
  • 0 kudos

Spark structured streaming - not working with checkpoint location set

We have structured streaming that reads from an external Delta table defined in the following way: try: df_silver = ( spark.readStream .format("delta") .option("skipChangeCommits", True) .table(src_location) ...

Latest Reply
radothede
Valued Contributor II
  • 0 kudos

What's the logic of the merge function merge_silver_to_gold? What's the output of DESCRIBE HISTORY against that destination Delta table after running the streaming query?

4 More Replies
Anonymous
by Not applicable
  • 9602 Views
  • 7 replies
  • 24 kudos

Resolved! Couldn't create new catalog?

I used DBR version 11.0

Latest Reply
seanzy
New Contributor II
  • 24 kudos

In the 2.9 Comprehensive lab of Get Started with Data Engineering on Databricks, I try to run: %run ../Includes/Classroom-Setup-11 and get the following error: The execution of this command did not finish successfully. Resetting the learning envir...

6 More Replies
Erik_L
by Contributor II
  • 1181 Views
  • 0 replies
  • 0 kudos

How to force delta live tables legacy execution mode?

We've been running Delta Live Tables for some time with Unity Catalog and it's as slow as a sloth on a Hawaiian vacation. Anyway, DLT had three consecutive failures (due to the data source being unreliable) and then the logs printed: "MaxRetryThreshol...

Chris_Konsur
by New Contributor III
  • 25172 Views
  • 4 replies
  • 7 kudos

Resolved! Error: The associated location ... is not empty but it's not a Delta table

I try to create a table but I get this error: AnalysisException: Cannot create table ('`spark_catalog`.`default`.`citation_all_tenants`'). The associated location ('dbfs:/user/hive/warehouse/citation_all_tenants') is not empty but it's not a Delta t...

Latest Reply
sachin_tirth
New Contributor II
  • 7 kudos

Hi Team, I am facing the same issue. When we try to load data to the table in the production batch, we get an error that the table is not in Delta format. There is no recent change to the table, and we are not trying any create-or-replace-table. This is an existing table in pr...

3 More Replies
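For the original error in this thread, the usual cause is leftover non-Delta files at the table's managed location (e.g. from an earlier failed or non-Delta write). One hedged, *destructive* way to recover, assuming the files at that location are disposable, is to clear the directory before recreating the table. The helper names are hypothetical; `dbutils` and `spark` exist only on a Databricks cluster:

```python
def is_hive_warehouse_path(path: str) -> bool:
    """Safety check: only ever delete under the default Hive warehouse root."""
    return path.startswith("dbfs:/user/hive/warehouse/")


def recreate_table(spark, dbutils, table: str, location: str, ddl: str):
    """Drop the table, clear its leftover files, and re-run the CREATE TABLE DDL."""
    if not is_hive_warehouse_path(location):
        raise ValueError(f"refusing to delete unexpected path: {location}")
    spark.sql(f"DROP TABLE IF EXISTS {table}")
    dbutils.fs.rm(location, True)  # remove the non-Delta leftovers (irreversible!)
    spark.sql(ddl)                 # the original CREATE TABLE statement
```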
mh_db
by New Contributor III
  • 3698 Views
  • 1 reply
  • 1 kudos

How to get different dynamic value for each task in workflow

I created a workflow with two tasks. It runs the first notebook and then waits for that to finish before starting the second notebook. I want to use this dynamic value as one of the parameters {{job.start_time.iso_datetime}} for both tasks. This should gi...

Latest Reply
lucasrocha
Databricks Employee
  • 1 kudos

Hello @mh_db , The dynamic value {{job.start_time.iso_datetime}} you are using in your workflow is designed to capture the start time of the job run, not the individual tasks within the job. This is why you are seeing the same date and time for both ...

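As lucasrocha notes, `{{job.start_time.iso_datetime}}` is scoped to the job run, so it is identical across tasks. If what's wanted is a per-task timestamp, one hedged option is to capture it inside each task's notebook at execution time rather than via a job-level dynamic value:

```python
from datetime import datetime, timezone


def task_start_iso() -> str:
    """UTC timestamp captured when this task actually starts executing.

    Unlike {{job.start_time.iso_datetime}}, this differs per task because
    each task evaluates it at its own start.
    """
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
```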
WWoman
by Contributor
  • 1242 Views
  • 1 reply
  • 1 kudos

Identifying invalid views

Is there a way to identify all invalid views in a schema or catalog without querying each view to see if it succeeds?

Latest Reply
raphaelblg
Databricks Employee
  • 1 kudos

Hello @WWoman, I don't think there's a feature for that. If you think this would be a cool feature you could submit an idea in Databricks' Ideas Portal.

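Absent a built-in feature, the brute-force scan the question hoped to avoid can at least be made cheap: probe each view with a zero-row query and collect the failures. A hedged sketch; the query runner is injected so the logic is testable off-cluster (on Databricks it would be something like `lambda v: spark.sql(f"SELECT * FROM {v} LIMIT 0")`, with view names pulled from `information_schema.views` in Unity Catalog):

```python
def find_invalid_views(view_names, run_query):
    """Probe each view; return (name, error) pairs for the ones that fail.

    run_query(name) should execute a cheap query against the view and raise
    on failure (e.g. a dropped underlying table).
    """
    invalid = []
    for name in view_names:
        try:
            run_query(name)
        except Exception as exc:
            invalid.append((name, str(exc)))
    return invalid
```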
NhanNguyen
by Contributor III
  • 2600 Views
  • 3 replies
  • 0 kudos

Resolved! Disk cache for csv file in Databricks

Dear team, I'm investigating how to improve performance when reading a large CSV file as input, and found this: https://learn.microsoft.com/en-us/azure/databricks/optimizations/disk-cache. I just wonder: does the disk cache also apply to CSV files? Thanks!

Latest Reply
NhanNguyen
Contributor III
  • 0 kudos

Thanks @-werners-, that's right. I tried it and got a significant performance improvement.

2 More Replies
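Per the linked docs, the disk cache accelerates Parquet-family reads (including Delta), not raw CSV scans, so the common workaround is a one-off conversion of the CSV to Delta. A hedged sketch; paths and the table name are hypothetical, and `spark` exists only on a cluster:

```python
def enable_disk_cache(spark):
    """Turn on the Databricks disk (IO) cache for this session.

    The cache only applies to Parquet/Delta data, which is why a plain
    CSV read sees no benefit.
    """
    spark.conf.set("spark.databricks.io.cache.enabled", "true")


def csv_to_delta(spark, csv_path: str, table: str):
    """One-off conversion; subsequent reads of `table` can use the disk cache."""
    (spark.read
        .option("header", "true")
        .option("inferSchema", "true")
        .csv(csv_path)
        .write.format("delta")
        .saveAsTable(table))
```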
saichandu_25
by New Contributor III
  • 5289 Views
  • 9 replies
  • 0 kudos

Not able to read the file content completely using head

Hi, We want to read the content of a file and encode it into base64. For that we have used the below code: file_path = "/path/to/your/file.csv" file_content = dbutils.fs.head(file_path, 512000000) encode_content = base64.b64encode(file_conten...

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

I am curious what the use case is for wanting to load large files into GitHub, which is a code repo. Depending on the file format, different parsing is necessary. You could foresee logic for that in your program.

8 More Replies
DataEngineer
by New Contributor II
  • 2318 Views
  • 2 replies
  • 0 kudos

AWS Email sending challenge from Databricks with UNITY CATALOG and Multinode cluster

Hi, I have implemented Unity Catalog with a multinode cluster in Databricks. The workspace instance profile with EC2 access is also created in IAM, but we still have a challenge sending emails from Databricks using the SES service. The same is working ...

Latest Reply
Babu_Krishnan
Contributor
  • 0 kudos

Hi @DataEngineer, were you able to resolve the issue? We are having the same issue when we try to use a multinode cluster for Unity Catalog. Email functionality was working fine with a single-node cluster. We are getting "ConnectionRefusedError: [Errno 111]...

1 More Replies
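The `ConnectionRefusedError: [Errno 111]` in this thread is typical of blocked SMTP connections; one possibility on multinode clusters is that the code runs on a worker or through network rules that single-node setups didn't hit. A hedged workaround is to call the SES API over HTTPS via `boto3` instead of SMTP, letting the instance profile supply credentials. Sender, recipients, and region are placeholders:

```python
def build_message(subject: str, body: str) -> dict:
    """Assemble the Message structure ses.send_email expects."""
    return {
        "Subject": {"Data": subject, "Charset": "UTF-8"},
        "Body": {"Text": {"Data": body, "Charset": "UTF-8"}},
    }


def send_via_ses(sender, recipients, subject, body, region="us-east-1"):
    """Send mail through the SES API (HTTPS/443), bypassing SMTP ports entirely."""
    import boto3  # ships on Databricks runtimes; credentials via instance profile

    ses = boto3.client("ses", region_name=region)
    return ses.send_email(
        Source=sender,
        Destination={"ToAddresses": recipients},
        Message=build_message(subject, body),
    )
```

Also worth checking: the sender/recipient addresses must be verified in SES (or the account moved out of the SES sandbox), and the instance profile needs `ses:SendEmail` permission.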
Harispap
by New Contributor
  • 1354 Views
  • 0 replies
  • 0 kudos

Different result between manual and automated task run

I have a notebook where I bring info about a previous task run's metadata from the API ".... /jobs/runs/get". The response should be a dictionary that contains information such as task key, run id, run page URL, etc. When I run the notebook as part of ...

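A likely factor (hedged, since the thread has no replies): when a notebook runs manually, job-context values such as `{{job.run_id}}` are not populated, so the run id being queried may differ from the one seen inside the scheduled job. A minimal sketch of calling `jobs/runs/get` with the run id passed in explicitly, so manual and scheduled executions hit the same endpoint; host and token are placeholders:

```python
import requests


def runs_get_url(host: str, run_id: int) -> str:
    """Jobs API 2.1 endpoint for one run's metadata."""
    return f"https://{host}/api/2.1/jobs/runs/get?run_id={run_id}"


def get_run_metadata(host: str, token: str, run_id: int) -> dict:
    """Fetch run metadata; the response's tasks[] entries carry task_key and run_page_url."""
    resp = requests.get(
        runs_get_url(host, run_id),
        headers={"Authorization": f"Bearer {token}"},
    )
    resp.raise_for_status()
    return resp.json()
```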
