Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

KKo
by Contributor III
  • 7703 Views
  • 5 replies
  • 0 kudos

Move whole workflow from Dev to Prod

I have a workflow created in Dev; now I want to move the whole thing to Prod and schedule it. The workflow has multiple notebooks, dependent libraries, parameters, and such. How do I move the whole thing to Prod, instead of moving each notebook and rec...

Latest Reply
mkassa
New Contributor II
  • 0 kudos

I ended up creating a Python script to just do the export; here is the code below. It will match on job name: if it matches it will update, otherwise it will import. import requests source_token = '' source_instance = 'adb-000000000000000.00.azuredata...
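For reference, a minimal sketch of the approach mkassa describes (the snippet above is truncated): list jobs over the Jobs 2.1 REST API, match on job name, and call jobs/reset to overwrite an existing prod job or jobs/create otherwise. Hostnames and tokens are placeholders, the pagination fields are assumptions based on the 2.1 API, and this copies only the job definition, not the notebooks it references.

```
import requests

SOURCE_HOST = "https://adb-1111111111111111.11.azuredatabricks.net"  # dev (placeholder)
TARGET_HOST = "https://adb-2222222222222222.22.azuredatabricks.net"  # prod (placeholder)
SOURCE_TOKEN = "<dev-pat>"
TARGET_TOKEN = "<prod-pat>"

def list_jobs(host, token):
    """Return all jobs in a workspace, following page tokens."""
    jobs, params = [], {}
    while True:
        resp = requests.get(f"{host}/api/2.1/jobs/list",
                            headers={"Authorization": f"Bearer {token}"}, params=params)
        resp.raise_for_status()
        page = resp.json()
        jobs.extend(page.get("jobs", []))
        if not page.get("has_more"):
            return jobs
        params["page_token"] = page["next_page_token"]

def copy_job(job_name):
    """Create or update the prod job so it matches the dev job of the same name."""
    source = {j["settings"]["name"]: j for j in list_jobs(SOURCE_HOST, SOURCE_TOKEN)}
    target = {j["settings"]["name"]: j for j in list_jobs(TARGET_HOST, TARGET_TOKEN)}
    # jobs/list returns abbreviated settings, so fetch the full definition:
    full = requests.get(f"{SOURCE_HOST}/api/2.1/jobs/get",
                        headers={"Authorization": f"Bearer {SOURCE_TOKEN}"},
                        params={"job_id": source[job_name]["job_id"]})
    full.raise_for_status()
    settings = full.json()["settings"]
    headers = {"Authorization": f"Bearer {TARGET_TOKEN}"}
    if job_name in target:  # name matches -> overwrite the existing prod job
        body = {"job_id": target[job_name]["job_id"], "new_settings": settings}
        resp = requests.post(f"{TARGET_HOST}/api/2.1/jobs/reset", headers=headers, json=body)
    else:                   # otherwise create it fresh
        resp = requests.post(f"{TARGET_HOST}/api/2.1/jobs/create", headers=headers, json=settings)
    resp.raise_for_status()
```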

4 More Replies
WarmCat
by New Contributor II
  • 1096 Views
  • 2 replies
  • 0 kudos

Read data shared using Delta Sharing open sharing (for recipients) FileNotFoundError

I'm using the docs here: https://docs.databricks.com/en/data-sharing/read-data-open.html#store-creds. However, I am unable to read the stored file, which is successfully created with the following code: %scala dbutils.fs.put("dbfs:/FileStore/extraction/con...

Data Engineering
DELTA SHARING
Latest Reply
WarmCat
New Contributor II
  • 0 kudos

Thanks Daniel, but that does not work either. The only thing that allowed progress was to use: client = delta_sharing.SharingClient(f"file:///tmp/config.share"). I gave up and installed Apache Spark locally in a venv for now, and will be using AWS going ...
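For anyone landing here, a minimal open-sharing recipient sketch using the delta-sharing Python client (assumes `pip install delta-sharing` and a credential file from the provider; share/schema/table names are placeholders). As the reply notes, the client expects a local filesystem path for the profile, which is why a `dbfs:/` URI can surface as FileNotFoundError while `/tmp` (or the `/dbfs/...` FUSE path) works.

```
import delta_sharing

# Local path to the downloaded credential file; "file:///tmp/config.share"
# (as used above) resolves to the same location.
profile = "/tmp/config.share"

client = delta_sharing.SharingClient(profile)
print(client.list_all_tables())  # discover what the provider shared

# Load one shared table into pandas: "<profile-path>#<share>.<schema>.<table>"
df = delta_sharing.load_as_pandas(f"{profile}#my_share.my_schema.my_table")
```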

1 More Replies
bampo
by New Contributor II
  • 1905 Views
  • 4 replies
  • 0 kudos

Streaming Reads Full Table with Liquid Clustering

Each merge/update on a table with liquid clustering forces the streaming query to read the whole table. Databricks Runtime: 14.3 LTS. Below I prepared a simple script to reproduce the issue: Create schema. %sql CREATE SCHEMA IF NOT EXISTS test; Create table with si...

Latest Reply
radothede
Contributor II
  • 0 kudos

It seems you are not using a checkpoint location; is that intended? https://docs.databricks.com/en/structured-streaming/query-recovery.html That might be the reason your streaming query is reading the whole table every time you trigger the process.
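To make the suggestion concrete, a sketch of the same stream with a checkpoint location wired in (paths and table names are placeholders). Without the checkpoint, every run starts from scratch and scans the full table; with it, the stream resumes from the last committed offsets.

```
query = (
    spark.readStream
        .format("delta")
        .table("test.my_clustered_table")          # source table (placeholder)
        .writeStream
        .format("delta")
        .option("checkpointLocation", "/tmp/checkpoints/my_clustered_table")
        .trigger(availableNow=True)                # drain available data, then stop
        .toTable("test.my_sink_table")             # sink table (placeholder)
)
```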

3 More Replies
skarpeck
by New Contributor III
  • 1897 Views
  • 5 replies
  • 0 kudos

Spark structured streaming - not working with checkpoint location set

We have a structured streaming query that reads from an external Delta table, defined in the following way: try: df_silver = ( spark.readStream .format("delta") .option("skipChangeCommits", True) .table(src_location) ...

Latest Reply
radothede
Contributor II
  • 0 kudos

What's the logic of the merge function merge_silver_to_gold? What's the output of DESCRIBE HISTORY against that destination Delta table after running the streaming query?
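For readers without the thread context: the thread's merge function is not shown, but a typical foreachBatch upsert looks like the hypothetical sketch below (table and key names are made up), and DESCRIBE HISTORY on the target reveals which operations the stream actually ran.

```
from delta.tables import DeltaTable

def merge_silver_to_gold(batch_df, batch_id):
    """Hypothetical upsert of each micro-batch into the gold table."""
    gold = DeltaTable.forName(spark, "gold.my_table")   # placeholder target
    (
        gold.alias("t")
            .merge(batch_df.alias("s"), "t.id = s.id")  # placeholder join key
            .whenMatchedUpdateAll()
            .whenNotMatchedInsertAll()
            .execute()
    )

# Inspect what the streaming query did to the destination table:
spark.sql("DESCRIBE HISTORY gold.my_table").show(truncate=False)
```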

4 More Replies
Anonymous
by Not applicable
  • 6430 Views
  • 7 replies
  • 24 kudos

Resolved! Couldn't create new catalog?

I used DBR version 11.0
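For context, catalog creation is a Unity Catalog operation, so a sketch like the following assumes the workspace is attached to a UC metastore and the cluster uses a UC-capable access mode; on a cluster without Unity Catalog enabled (as can happen on older runtimes such as DBR 11.0), CREATE CATALOG fails.

```
# Assumes Unity Catalog is enabled for the workspace and cluster.
spark.sql("CREATE CATALOG IF NOT EXISTS dev_catalog")
spark.sql("USE CATALOG dev_catalog")
spark.sql("SHOW CATALOGS").show()
```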

Latest Reply
seanzy
New Contributor II
  • 24 kudos

In the 2.9 Comprehensive Lab of Get Started with Data Engineering on Databricks, I try to run: %run ../Includes/Classroom-Setup-11 and get the following error: The execution of this command did not finish successfully. Resetting the learning envir...

6 More Replies
Erik_L
by Contributor II
  • 656 Views
  • 0 replies
  • 0 kudos

How to force delta live tables legacy execution mode?

We've been running Delta Live Tables for some time with Unity Catalog, and it's as slow as a sloth on a Hawaiian vacation. Anyway, DLT had three consecutive failures (due to the data source being unreliable) and then the logs printed: "MaxRetryThreshol...

StephenDsouza
by New Contributor II
  • 1544 Views
  • 1 reply
  • 0 kudos

Error during build process for serving model caused by detectron2

Hi all, Introduction: I am trying to register my model on Databricks so that I can serve it as an endpoint. The packages that I need are "torch", "mlflow", "torchvision", "numpy", and "git+https://github.com/facebookresearch/detectron2.git". For this, ...

Latest Reply
StephenDsouza
New Contributor II
  • 0 kudos

Found an answer! Basically, pip somehow installed the dependencies from the git repo first and was not following the given order, so to solve this I added the libraries for conda to install. ``` conda_env = { "channels": [ "defa...
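Since the dict above is cut off, here is a hedged reconstruction of the idea: hoist detectron2's build-time dependencies to the conda level so they exist before pip installs the git package. Exact package names, channels, and versions below are illustrative, not taken from the thread.

```
# Illustrative environment (e.g. passed as conda_env= to
# mlflow.pyfunc.log_model); conda installs torch & friends first, then pip
# builds detectron2 against them.
conda_env = {
    "channels": ["pytorch", "defaults"],
    "dependencies": [
        "python=3.10",
        "pytorch",       # present before pip runs, so detectron2's build can import torch
        "torchvision",
        "numpy",
        "pip",
        {"pip": ["mlflow", "git+https://github.com/facebookresearch/detectron2.git"]},
    ],
    "name": "detectron2_env",
}
```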

Chris_Konsur
by New Contributor III
  • 17541 Views
  • 4 replies
  • 6 kudos

Resolved! Error: The associated location ... is not empty but it's not a Delta table

I try to create a table, but I get this error: AnalysisException: Cannot create table ('`spark_catalog`.`default`.`citation_all_tenants`'). The associated location ('dbfs:/user/hive/warehouse/citation_all_tenants') is not empty but it's not a Delta t...

Latest Reply
sachin_tirth
New Contributor II
  • 6 kudos

Hi team, I am facing the same issue. When we try to load data into the table in the production batch, we get an error that the table is not in Delta format. There is no recent change to the table, and we are not trying any CREATE OR REPLACE TABLE. This is an existing table in pr...
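As a general aid for this error (not the specific resolution of this thread), the usual triage is to look at what is sitting in the table location and either convert it or clear it. The path below is a placeholder taken from the original post; the rm step is destructive.

```
location = "dbfs:/user/hive/warehouse/citation_all_tenants"

# 1. Inspect what is actually at the table location.
display(dbutils.fs.ls(location))

# 2. If the files are Parquet data you still need, convert them in place:
# spark.sql(f"CONVERT TO DELTA parquet.`{location}`")

# 3. If they are stale leftovers of a dropped table, remove them and
#    re-create the table (destructive -- double-check the path first):
# dbutils.fs.rm(location, recurse=True)
```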

3 More Replies
mh_db
by New Contributor III
  • 2098 Views
  • 1 reply
  • 1 kudos

How to get different dynamic value for each task in workflow

I created a workflow with two tasks. It runs the first notebook and then waits for that to finish before starting the second notebook. I want to use this dynamic value as one of the parameters {{job.start_time.iso_datetime}} for both tasks. This should gi...

Latest Reply
lucasrocha
Databricks Employee
  • 1 kudos

Hello @mh_db , The dynamic value {{job.start_time.iso_datetime}} you are using in your workflow is designed to capture the start time of the job run, not the individual tasks within the job. This is why you are seeing the same date and time for both ...
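A sketch of that pattern: configure the same task parameter on both tasks, e.g. {"run_ts": "{{job.start_time.iso_datetime}}"} in the job definition (the key name run_ts is made up), and the reference resolves once per job run, so each notebook reads a consistent timestamp.

```
# Inside either task's notebook: read the parameter the job passed in.
run_ts = dbutils.widgets.get("run_ts")  # e.g. "2024-06-01T09:30:00Z"
print(f"This job run started at {run_ts}")
```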

WWoman
by New Contributor III
  • 590 Views
  • 1 reply
  • 1 kudos

Identifying invalid views

Is there a way to identify all invalid views in a schema or catalog without querying the view to see if it succeeds?

Latest Reply
raphaelblg
Databricks Employee
  • 1 kudos

Hello @WWoman, I don't think there's a feature for that. If you think this would be a cool feature, you could submit an idea in Databricks' Ideas Portal.
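In the meantime, a brute-force sketch is possible: enumerate views from information_schema and probe each one with a zero-row query, collecting the failures. It does query every view, which is exactly what the question hoped to avoid, but it automates the check; the catalog name is a placeholder.

```
views = spark.sql("""
    SELECT table_catalog, table_schema, table_name
    FROM my_catalog.information_schema.views
""").collect()

broken = []
for v in views:
    fqn = f"{v.table_catalog}.{v.table_schema}.{v.table_name}"
    try:
        spark.sql(f"SELECT * FROM {fqn} LIMIT 0")  # fails analysis if the view is invalid
    except Exception as e:
        broken.append((fqn, str(e).splitlines()[0]))

for fqn, err in broken:
    print(fqn, "->", err)
```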

NhanNguyen
by Contributor II
  • 1246 Views
  • 3 replies
  • 0 kudos

Resolved! Disk cache for csv file in Databricks

Dear team, I'm investigating how to improve performance when reading a large CSV file as input, and I found this: https://learn.microsoft.com/en-us/azure/databricks/optimizations/disk-cache. I just wonder: does the disk cache also apply to CSV files? Thanks!

Latest Reply
NhanNguyen
Contributor II
  • 0 kudos

Thanks @-werners-, that's right. I tried it and got a significant performance improvement.
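For readers of this thread: the disk cache accelerates reads of Parquet data files, so the usual pattern for a big CSV input is a one-time conversion to Delta (Parquet under the hood), after which repeated reads benefit from the cache. Paths and table names below are placeholders.

```
spark.conf.set("spark.databricks.io.cache.enabled", "true")  # on supported instance types

(
    spark.read
        .option("header", "true")
        .option("inferSchema", "true")
        .csv("dbfs:/data/large_input.csv")          # placeholder path
        .write.format("delta")
        .mode("overwrite")
        .saveAsTable("bronze.large_input")          # placeholder table
)

df = spark.table("bronze.large_input")  # cached per node on first read
```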

2 More Replies
saichandu_25
by New Contributor III
  • 2511 Views
  • 9 replies
  • 0 kudos

Not able to read the file content completely using head

Hi, we want to read the content of a file and encode it into base64. For that we have used the code below: file_path = "/path/to/your/file.csv" file_content = dbutils.fs.head(file_path, 512000000) encode_content = base64.b64encode(file_conten...

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

I am curious what the use case is for wanting to load large files into GitHub, which is a code repo. Depending on the file format, different parsing is necessary; you could foresee logic for that in your program.
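Worth noting for the original question: dbutils.fs.head is a preview utility and caps how much it returns, so a more reliable way to base64-encode a whole file is to read its bytes through the /dbfs FUSE mount. The path is a placeholder; truly huge files would need chunked encoding rather than one b64encode call.

```
import base64

file_path = "/dbfs/path/to/your/file.csv"  # FUSE view of dbfs:/path/to/your/file.csv

with open(file_path, "rb") as f:
    encoded = base64.b64encode(f.read()).decode("ascii")

print(len(encoded))
```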

8 More Replies
DataEngineer
by New Contributor II
  • 1057 Views
  • 2 replies
  • 0 kudos

AWS Email sending challenge from Databricks with UNITY CATALOG and Multinode cluster

Hi, I have implemented Unity Catalog with a multi-node cluster in Databricks. The workspace instance profile with EC2 access is also created in IAM, but I am still having a challenge sending emails from Databricks using the SES service. The same is working ...

Latest Reply
Babu_Krishnan
Contributor
  • 0 kudos

Hi @DataEngineer, were you able to resolve the issue? We are having the same issue when we try to use a multi-node cluster for Unity Catalog. Email functionality was working fine with a single-node cluster. We are getting "ConnectionRefusedError: [Errno 111]...
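For reference, the usual SES call from a cluster with an instance profile looks like the sketch below (region and addresses are placeholders). Errno 111 is a refused connection, so alongside the code it is worth verifying HTTPS egress from the multi-node cluster's subnets/security groups to the SES endpoint, since that networking can differ from a single-node setup.

```
import boto3

ses = boto3.client("ses", region_name="us-east-1")  # placeholder region

ses.send_email(
    Source="sender@example.com",
    Destination={"ToAddresses": ["recipient@example.com"]},
    Message={
        "Subject": {"Data": "Test from Databricks"},
        "Body": {"Text": {"Data": "Hello from a multi-node UC cluster."}},
    },
)
```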

1 More Replies
NaeemS
by New Contributor III
  • 837 Views
  • 1 reply
  • 0 kudos

Handling Aggregations in Feature Function

Hi, is it possible to cater for aggregation using Feature Functions somehow? As we know, the logic defined in a feature function is applied to a single row when a join is being performed. But do we have any mechanism to handle aggregations too, someho...

Data Engineering
Feature Functions
Feature Store
Latest Reply
NaeemS
New Contributor III
  • 0 kudos

Hi @Retired_mod, thanks for your reply. I'm familiar with both of these. But I was wondering if we can include that part while logging our pipeline using feature stores, to handle the grouping and filtering as well.
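Since feature functions evaluate per row at join time, the common workaround is to precompute the aggregates into a feature table and join against that. Below is a hedged sketch using the Feature Engineering client; the source table, key, and feature-table names are all made up.

```
from databricks.feature_engineering import FeatureEngineeringClient
import pyspark.sql.functions as F

fe = FeatureEngineeringClient()

agg_df = (
    spark.table("silver.transactions")              # placeholder source
        .groupBy("customer_id")
        .agg(
            F.count("*").alias("txn_count_total"),
            F.sum("amount").alias("txn_amount_total"),
        )
)

fe.create_table(
    name="ml.features.customer_txn_aggregates",     # placeholder UC table
    primary_keys=["customer_id"],
    df=agg_df,
    description="Precomputed per-customer transaction aggregates",
)
```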


Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group