Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

Abdul1
by New Contributor
  • 526 Views
  • 1 reply
  • 0 kudos

How to output data from Databricks?

Hello, I am just starting with Databricks in Azure and I need to output data to an Affinity CRM system. Affinity has an API, and I am wondering whether there is any sort of automated / data-pipeline way to tell Databricks to just pump the data into ...

Latest Reply
Edthehead
New Contributor III
  • 0 kudos

We need more info on what kind of data, what volume, and what the called API can handle. Calling an API for single records in parallel can be achieved using a UDF (see THIS). You need to be careful to batch the records so that the target API can handle the pa...
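As a sketch of that batching idea (the endpoint URL, column names, and batch size below are hypothetical, and authentication/retries are omitted): each partition accumulates rows into fixed-size batches and POSTs them, so no single call exceeds what the target API can handle.

```python
import requests  # assumes the cluster has network access to the Affinity API

AFFINITY_URL = "https://api.affinity.co/..."  # hypothetical endpoint; see Affinity's API docs
BATCH_SIZE = 100                              # tune to the target API's payload/rate limits

def push_partition(rows):
    batch = []
    for row in rows:
        batch.append(row.asDict())
        if len(batch) >= BATCH_SIZE:
            requests.post(AFFINITY_URL, json=batch, timeout=30).raise_for_status()
            batch = []
    if batch:  # flush the remaining records
        requests.post(AFFINITY_URL, json=batch, timeout=30).raise_for_status()

df.select("name", "email").foreachPartition(push_partition)
```

A scheduled job or pipeline can then run this regularly, which covers the "automated pipeline" part of the question.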

Edthehead
by New Contributor III
  • 912 Views
  • 2 replies
  • 0 kudos

Parameterized Delta Live Tables pipeline

I'm trying to create an ETL framework on Delta Live Tables and basically use the same pipeline for all the transformations from bronze to silver to gold. This works absolutely fine when I hard-code the tables and the SQL transformations as an array wi...

Data Engineering
Databricks
Delta Live Table
dlt
Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Edthehead, Configuring your ETL framework for Delta Live Tables (DLT) can be done in a flexible and maintainable way. Let’s explore some options: Pipeline Settings in DLT: DLT provides a user-friendly interface for configuring pipeline settin...
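One common pattern (a sketch only, with hypothetical configuration keys and table names): put the table names in the pipeline's configuration settings and read them inside the DLT notebook with spark.conf.get, so the same code serves every bronze-to-silver run.

```python
import dlt

# Hypothetical keys supplied via the DLT pipeline's "configuration" settings,
# e.g. {"source_table": "bronze.orders", "target_table": "silver_orders"}
source_table = spark.conf.get("source_table")
target_table = spark.conf.get("target_table")

@dlt.table(name=target_table, comment="Parameterized bronze-to-silver load")
def silver():
    # Apply whatever transformation the framework configures for this table.
    return spark.read.table(source_table)
```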

1 More Replies
george_ognyanov
by New Contributor III
  • 610 Views
  • 1 reply
  • 1 kudos

Orchestrate jobs using a parameter set in a notebook

I am trying to orchestrate my Databricks Workflows tasks using a parameter I would set in a notebook. Given the workflow below, I am trying to set a parameter in the Cinderella task, which is a Python notebook. Once set, I would like to use this paramete...

Latest Reply
Panda
New Contributor II
  • 1 kudos

Here's how we can proceed; follow the instructions below. In your previous task, depending on whether you're using Python or Scala, set the task value like this: dbutils.jobs.taskValues.set("check_value", "2"). In your if-else task, you must reference th...
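Putting both halves together (a sketch; the task key Cinderella comes from the post, the key and values from the reply above):

```python
# In the upstream "Cinderella" notebook task:
dbutils.jobs.taskValues.set(key="check_value", value="2")

# In a downstream notebook task, read the value back:
check_value = dbutils.jobs.taskValues.get(
    taskKey="Cinderella", key="check_value", default="0", debugValue="2"
)
```

If the branching is done with an If/else condition task instead of a notebook, the same value can be referenced there as a dynamic value, e.g. {{tasks.Cinderella.values.check_value}}.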

tyas
by New Contributor II
  • 1371 Views
  • 1 reply
  • 1 kudos

Defining Keys

Hello, I have a DataFrame in a Databricks notebook that I've already read and transformed using PySpark (Python). I want to create a table with defined keys (primary and foreign). What is the best method to do this: create a table and directly define key...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 1 kudos

Remember that keys are for information purposes only (they don't validate data integrity). They are used for information in a few places (feature tables, online tables, Power BI modelling). It is best to define them in CREATE TABLE syntax, for example: CRE...
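For instance, a minimal sketch of informational primary/foreign keys on Unity Catalog tables (catalog, schema, and column names are hypothetical):

```python
# Note: these constraints are informational only; Databricks does not enforce them.
spark.sql("""
  CREATE TABLE main.sales.customers (
    customer_id BIGINT NOT NULL,
    email       STRING,
    CONSTRAINT customers_pk PRIMARY KEY (customer_id)
  )
""")

spark.sql("""
  CREATE TABLE main.sales.orders (
    order_id    BIGINT NOT NULL,
    customer_id BIGINT,
    CONSTRAINT orders_pk PRIMARY KEY (order_id),
    CONSTRAINT orders_customers_fk FOREIGN KEY (customer_id)
      REFERENCES main.sales.customers (customer_id)
  )
""")
```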

kickbuttowski
by New Contributor II
  • 882 Views
  • 1 reply
  • 1 kudos

Resolved! Issue loading JSON files in the same container with different schemas

Could you tell me whether this scenario will work or not? Scenario: I have a container with two different JSON files with different schemas, which will arrive in a streaming manner. I am using Auto Loader here to load the files incrementall...

Latest Reply
MichTalebzadeh
Contributor III
  • 1 kudos

Short answer is no. A single Spark Auto Loader stream typically cannot handle JSON files in a container with two different schemas by default. Auto Loader relies on schema inference to determine the data structure. It analyses a sample of data from files ass...
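A common workaround (a sketch; the landing path, file-name patterns, and schema locations are hypothetical) is to run one Auto Loader stream per schema and keep each stream to its own files via a glob pattern in the load path:

```python
orders_stream = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/mnt/checkpoints/orders_schema")
    .load("/mnt/landing/container/orders_*.json")      # only order files, one schema
)

customers_stream = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/mnt/checkpoints/customers_schema")
    .load("/mnt/landing/container/customers_*.json")   # only customer files, one schema
)
```

Each stream then gets its own schema inference, checkpoint, and target table.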

Sans1
by New Contributor II
  • 963 Views
  • 2 replies
  • 1 kudos

Delta table vs dynamic views

Hi, my current design is to host the gold layer as dynamic views with masking. I will have a couple of use cases that need the views to be queried with filters. Does this provide performance equal to tables (which have data skipping based on transactio...

Latest Reply
Ajay-Pandey
Esteemed Contributor III
  • 1 kudos

Hi @Sans1, have you only used masking, or have you used any row- or column-level access control? If it's only masking, then you should go with a Delta table; if it's row- or column-level access control, then you should prefer dynamic views.
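For context, a dynamic view with masking typically looks something like this (a sketch; the view, table, group, and column names are hypothetical):

```python
spark.sql("""
  CREATE OR REPLACE VIEW gold.v_customers AS
  SELECT
    customer_id,
    -- column masking: only members of the group see the real value
    CASE WHEN is_account_group_member('pii_readers') THEN email ELSE 'REDACTED' END AS email,
    region
  FROM gold.customers
  -- row filtering: non-members only see non-EMEA rows
  WHERE is_account_group_member('emea_analysts') OR region <> 'EMEA'
""")
```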

1 More Replies
Sans
by New Contributor III
  • 1580 Views
  • 7 replies
  • 3 kudos

Unable to create new compute in Databricks Community Edition

Hi Team, I am unable to create compute in Databricks Community Edition due to the error below. Please advise. Bootstrap Timeout: Node daemon ping timeout in 780000 ms for instance i-0ab6798b2c762fb25 @ 10.172.246.217. Please check network connectivity between the ...

Latest Reply
Sans
New Contributor III
  • 3 kudos

This issue was resolved for some time but has been recurring again since yesterday. Please advise.

6 More Replies
colinsorensen
by New Contributor III
  • 967 Views
  • 1 reply
  • 0 kudos

Upgrading to UC. Parent external location for path `s3://____` does not exist"...but it does?

Topic: I am trying to upgrade some external tables in our Hive metastore to Unity Catalog. I used the upgrade functionality in the UI, as well as its provided SQL: CREATE TABLE `unity_catalog`.`default`.`table` LIKE `hive_metastore`.`schema`.`tab...

Latest Reply
colinsorensen
New Contributor III
  • 0 kudos

When I tried to edit the location to include the DBFS component (CREATE TABLE `unity_catalog`.`default`.`table` LIKE `hive_metastore`.`schema`.`table` LOCATION 'dbfs:/mnt/foobarbaz') I get a new error: "[UPGRADE_NOT_SUPPORTED.UNSUPPORTED_FILE_SCHEME]...

colinsorensen
by New Contributor III
  • 1873 Views
  • 3 replies
  • 1 kudos

"All trials either failed or did not return results to hyperopt." AutoML is not working on a fairly simple classification problem.

First the exploratory notebook fails, though when I run it manually it works just fine. After that, the AutoML notebook eventually fails without completing any trials. I get this: Tried to attach usage logger `pyspark.databricks.pandas.usage_logger`, ...

Latest Reply
colinsorensen
New Contributor III
  • 1 kudos

Ultimately this problem magically resolved itself. I think I updated the cluster or something.

2 More Replies
Avinash_Narala
by Contributor
  • 2137 Views
  • 3 replies
  • 0 kudos

Bootstrap Timeout: DURING CLUSTER START

Hi, when I start a cluster, I am getting the error below: Bootstrap Timeout: [id: InstanceId(i-05bbcfbb30027ce2c), status: INSTANCE_INITIALIZING, workerEnvId: WorkerEnvId(workerenv-2247916891060257-01b40fb4-3eb1-4a26-99b4-30d6aa0bfe83), lastStatusChangeTime:...

Latest Reply
dhtubong
New Contributor II
  • 0 kudos

Hello - if you're using Databricks Community Edition and having a Bootstrap Timeout issue, the resolution below may help. Error: Bootstrap Timeout: Node daemon ping timeout in 780000 ms for instance i-00f21ee2d3ca61424 @ 10.172.245.1. Please check network conne...

2 More Replies
Dick1960
by New Contributor II
  • 1768 Views
  • 3 replies
  • 2 kudos

How to know the domain of my Databricks workspace

Hi, I'm trying to open a support case and it asks me for my domain. In the browser I have: https://adb-27xxxx4341636xxx.5.azuredatabricks.net. Can you help me?

Latest Reply
Tharun-Kumar
Honored Contributor II
  • 2 kudos

@Dick1960 The numeric value you have in the workspace URL is the domain name. In your case, it would be 27xxxx4341636xxx.

2 More Replies
Brad
by Contributor
  • 629 Views
  • 2 replies
  • 0 kudos

WAL for structured streaming

Hi, I cannot find a deep dive on this in the latest links. So far my understanding is: previously, SS (Structured Streaming) copied and cached the data in the WAL. After a version, with retrieve less, SS doesn't copy the data to the WAL any more, and only stores ...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Your understanding is partially correct. Let's delve into the details of Structured Streaming in Apache Spark. Write-Ahead Log (WAL): In the past, Structured Streaming used to copy and cache data in the Write-Ahead Log (WAL). The WAL served as a r...

1 More Replies
lilo_z
by New Contributor III
  • 1754 Views
  • 3 replies
  • 0 kudos

Resolved! Databricks Asset Bundles - job-specific "run_as" user/service_principal

Was wondering if this was possible, since a use case came up in my team. Would it be possible to use a different service principal for a single job than what is specified for that target environment? For example: bundle: name: hello-bundle resource...

Latest Reply
lilo_z
New Contributor III
  • 0 kudos

Found a working solution, posting it here for anyone else hitting the same issue - the trick was to redefine "resources" under the target you want to make an exception for: bundle: name: hello_bundle include: - resources/*.yml targets: dev: w...

2 More Replies
dbx-user7354
by New Contributor III
  • 1498 Views
  • 3 replies
  • 4 kudos

Create a Job via SDK with JobSettings Object

Hey, I want to create a Job via the Python SDK with a JobSettings object. import os import time from databricks.sdk import WorkspaceClient from databricks.sdk.service import jobs from databricks.sdk.service.jobs import JobSettings w = WorkspaceClien...

Latest Reply
nenetto
New Contributor II
  • 4 kudos

I just faced the same problem. The issue is that when you do JobSettings.as_dict() the settings are parsed to a dict where all the values are also parsed recursively. When you pass the parameters as **params, the create method again tries to parse...
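A sketch consistent with that diagnosis (job name, notebook path, and cluster id are hypothetical): build the typed task objects and pass the fields to jobs.create() directly rather than round-tripping a JobSettings object through as_dict():

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()

created = w.jobs.create(
    name="my-sdk-job",  # hypothetical job name
    tasks=[
        jobs.Task(
            task_key="main",
            notebook_task=jobs.NotebookTask(notebook_path="/Workspace/Users/me/my_notebook"),
            existing_cluster_id="1234-567890-abcdefgh",  # hypothetical cluster id
        )
    ],
)
print(created.job_id)
```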

2 More Replies
noname123
by New Contributor III
  • 1499 Views
  • 2 replies
  • 0 kudos

Resolved! Delta table version protocol

I do: df.write.format("delta").mode("append").partitionBy("timestamp").option("mergeSchema", "true").save(destination). If the table doesn't exist, it creates a new table with "minReaderVersion":3,"minWriterVersion":7. Yesterday it was creating a table with "min...

Latest Reply
noname123
New Contributor III
  • 0 kudos

Thanks for the help. The issue was caused by the "Auto-Enable Deletion Vectors" setting.
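For anyone hitting the same protocol bump: deletion vectors require minReaderVersion 3 / minWriterVersion 7, so auto-enabling them on new tables explains the change. A sketch (table name is hypothetical) of how to create a table without them and how to inspect the protocol; note that turning the property off later does not by itself downgrade a table that already has the feature:

```python
# Create the table with deletion vectors disabled so it keeps the lower protocol.
spark.sql("""
  CREATE TABLE main.default.events (id BIGINT, ts TIMESTAMP)
  TBLPROPERTIES ('delta.enableDeletionVectors' = false)
""")

# Inspect the protocol versions of an existing table.
spark.sql("DESCRIBE DETAIL main.default.events") \
     .select("minReaderVersion", "minWriterVersion") \
     .show()
```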

1 More Replies