Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Abdul1
by New Contributor
  • 2048 Views
  • 1 reply
  • 0 kudos

How to output data from Databricks?

Hello, I am just starting with Databricks in Azure and I need to output data to an Affinity CRM system. Affinity has an API, and I am wondering whether there is any sort of automated / data-pipeline way to tell Databricks to just pump the data into ...

Latest Reply
Edthehead
Contributor III

We need more info on the kind of data, the volume, and what the called API can handle. Calling an API for single records in parallel can be achieved using a UDF (see THIS). You need to be careful to batch the records so that the target API can handle the pa...
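To make the batching idea concrete, here is a minimal PySpark sketch of that pattern using foreachPartition (a close cousin of the UDF approach the reply mentions); the endpoint URL, batch size, and the dataframe df are all placeholders, and it assumes the requests library is available on the cluster:

    import requests

    API_URL = "https://api.example.com/records"  # hypothetical endpoint
    BATCH_SIZE = 100  # tune to what the target API can accept per call

    def post_partition(rows):
        # Partitions are processed in parallel; rows are grouped into
        # batches so the API never receives one record at a time.
        batch = []
        for row in rows:
            batch.append(row.asDict())
            if len(batch) >= BATCH_SIZE:
                requests.post(API_URL, json=batch, timeout=30).raise_for_status()
                batch = []
        if batch:  # flush the final partial batch
            requests.post(API_URL, json=batch, timeout=30).raise_for_status()

    df.foreachPartition(post_partition)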

  • 0 kudos
george_ognyanov
by New Contributor III
  • 1889 Views
  • 1 reply
  • 1 kudos

Orchestrate jobs using a parameter set in a notebook

I am trying to orchestrate my Databricks Workflows tasks using a parameter I would set in a notebook. Given the workflow below, I am trying to set a parameter in the Cinderella task, which is a Python notebook. Once set, I would like to use this paramete...

[workflow screenshots attached]
Latest Reply
Panda
Valued Contributor

Here's how we can proceed; follow the instructions below. In your previous task, depending on whether you're using Python or Scala, set the task value like this: dbutils.jobs.taskValues.set("check_value", "2"). In your if/else task, you must reference th...
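A minimal sketch of both sides of that pattern (the task key Cinderella comes from the post; the default value is illustrative):

    # In the upstream notebook task (e.g. "Cinderella"): publish the value
    dbutils.jobs.taskValues.set(key="check_value", value="2")

    # In a downstream notebook task: read it back by naming the producing task
    check_value = dbutils.jobs.taskValues.get(
        taskKey="Cinderella", key="check_value", default="0"
    )

In an If/else condition task, the same value can be referenced with a dynamic value expression such as {{tasks.Cinderella.values.check_value}}.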

  • 1 kudos
tyas
by New Contributor II
  • 4023 Views
  • 1 reply
  • 1 kudos

Defining Keys

Hello, I have a DataFrame in a Databricks notebook that I've already read and transformed using PySpark (Python). I want to create a table with defined keys (primary and foreign). What is the best method to do this: create a table and directly define key...

Latest Reply
Hubert-Dudek
Databricks MVP

Remember that keys are for information purposes only (they don't enforce data integrity). They are used as metadata in a few places (feature tables, online tables, Power BI modelling). The best approach is to define them in CREATE TABLE syntax, for example: CRE...
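Since the statement above is cut off, here is a sketch of what such informational constraints can look like on a Unity Catalog table; the catalog, schema, and column names are illustrative, and the referenced customers table must already exist:

    # PRIMARY KEY / FOREIGN KEY here are informational only (not enforced)
    spark.sql("""
      CREATE TABLE main.sales.orders (
        order_id    BIGINT NOT NULL,
        customer_id BIGINT,
        amount      DECIMAL(10, 2),
        CONSTRAINT orders_pk PRIMARY KEY (order_id),
        CONSTRAINT orders_customers_fk FOREIGN KEY (customer_id)
          REFERENCES main.sales.customers (customer_id)
      )
    """)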

  • 1 kudos
kickbuttowski
by New Contributor II
  • 2482 Views
  • 1 reply
  • 1 kudos

Resolved! Issue loading JSON files in the same container with different schemas

Could you tell me whether this scenario will work or not? Scenario: I have a container holding two different JSON files with different schemas, arriving in a streaming manner. I am using Auto Loader here to load the files incrementall...

Latest Reply
MichTalebzadeh
Valued Contributor

Short answer is no. A single Spark Auto Loader stream typically cannot handle JSON files in a container with two different schemas by default. Auto Loader relies on schema inference to determine the data structure. It analyses a sample of data from files ass...
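One common workaround, sketched below on the assumption that the two file families can be told apart by filename, is to run one Auto Loader stream per schema, each with its own schema and checkpoint location (all paths, glob patterns, and table names are placeholders):

    # One stream per schema; each keeps its own inferred schema and checkpoint
    for pattern, name in [("orders_*.json", "orders"), ("events_*.json", "events")]:
        (spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")
            .option("pathGlobFilter", pattern)  # select one file family
            .option("cloudFiles.schemaLocation", f"/mnt/schemas/{name}")
            .load("abfss://container@account.dfs.core.windows.net/input/")
            .writeStream
            .option("checkpointLocation", f"/mnt/checkpoints/{name}")
            .toTable(f"bronze.{name}"))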

  • 1 kudos
Sans1
by New Contributor III
  • 3058 Views
  • 2 replies
  • 1 kudos

Delta table vs dynamic views

Hi, my current design is to host the gold layer as dynamic views with masking. I will have a couple of use cases that need the views to be queried with filters. Does this provide performance equal to tables (which have data skipping based on transactio...

Latest Reply
Ajay-Pandey
Databricks MVP

Hi @Sans1, have you only used masking, or have you used any row- or column-level access control? If it's only masking, then you should go with a Delta table; if it's row- or column-level access control, then you should prefer dynamic views.
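For reference, a minimal sketch of a dynamic view that combines a column mask with a row filter via is_account_group_member(); the view, table, group, and column names are all illustrative:

    spark.sql("""
      CREATE OR REPLACE VIEW gold.sales_secure AS
      SELECT
        order_id,
        -- column-level control: only the finance group sees the raw value
        CASE WHEN is_account_group_member('finance')
             THEN card_number ELSE '****' END AS card_number,
        region,
        amount
      FROM gold.sales
      -- row-level control: admins see all rows, everyone else only EU
      WHERE is_account_group_member('admins') OR region = 'EU'
    """)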

  • 1 kudos
1 More Reply
colinsorensen
by New Contributor III
  • 2520 Views
  • 1 reply
  • 0 kudos

Upgrading to UC. "Parent external location for path `s3://____` does not exist" ...but it does?

Topic. I am trying to upgrade some external tables in our Hive metastore to Unity Catalog. I used the upgrade functionality in the UI, as well as its provided SQL: CREATE TABLE `unity_catalog`.`default`.`table` LIKE `hive_metastore`.`schema`.`tab...

Latest Reply
colinsorensen
New Contributor III

When I have tried to edit the location to include the dbfs component (CREATE TABLE `unity_catalog`.`default`.`table` LIKE `hive_metastore`.`schema`.`table` LOCATION 'dbfs:/mnt/foobarbaz'), I get a new error: "[UPGRADE_NOT_SUPPORTED.UNSUPPORTED_FILE_SCHEME]...

  • 0 kudos
colinsorensen
by New Contributor III
  • 4081 Views
  • 3 replies
  • 1 kudos

"All trials either failed or did not return results to hyperopt." AutoML is not working on a fairly simple classification problem.

First the exploratory notebook fails, though when I run it manually it works just fine. After that, the AutoML notebook eventually fails without completing any trials. I get this: Tried to attach usage logger `pyspark.databricks.pandas.usage_logger`, ...

Latest Reply
colinsorensen
New Contributor III

Ultimately this problem magically resolved itself. I think I updated the cluster or something.

  • 1 kudos
2 More Replies
Avinash_Narala
by Databricks Partner
  • 7037 Views
  • 2 replies
  • 0 kudos

Bootstrap Timeout during cluster start

Hi, when I start a cluster, I am getting the error below: Bootstrap Timeout: [id: InstanceId(i-05bbcfbb30027ce2c), status: INSTANCE_INITIALIZING, workerEnvId: WorkerEnvId(workerenv-2247916891060257-01b40fb4-3eb1-4a26-99b4-30d6aa0bfe83), lastStatusChangeTime:...

Latest Reply
dhtubong
New Contributor II

Hello, if you're using DB Community Edition and hitting the Bootstrap Timeout issue, then the resolution below may help. Error: Bootstrap Timeout: Node daemon ping timeout in 780000 ms for instance i-00f21ee2d3ca61424 @ 10.172.245.1. Please check network conne...

  • 0 kudos
1 More Reply
Dick1960
by New Contributor II
  • 4368 Views
  • 3 replies
  • 2 kudos

How to find the domain of my Databricks workspace

Hi, I'm trying to open a support case and it asks me for my domain. In the browser I have: https://adb-27xxxx4341636xxx.5.azuredatabricks.net. Can you help me?

Latest Reply
Tharun-Kumar
Databricks Employee

@Dick1960 The numeric value you have in the workspace URL is the domain name. In your case, it would be 27xxxx4341636xxx.

  • 2 kudos
2 More Replies
Coders
by New Contributor II
  • 5359 Views
  • 0 replies
  • 0 kudos

New delta log folder is not getting created

I have the following code, which reads a stream of data, processes it in foreachBatch, and writes to the provided path as shown below: public static void writeToDatalake(SparkSession session, Configuration config, Dataset<Row> data, Entity enti...
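The post's code is Java and is cut off above; purely to illustrate the pattern being described, here is a condensed PySpark sketch of a foreachBatch write to a Delta path (the source table and paths are placeholders). The _delta_log folder is only created once the first batch commits successfully, so both the checkpoint and output locations must be reachable and writable:

    def write_batch(batch_df, batch_id):
        # Each micro-batch is appended as a Delta commit; the first
        # successful commit creates _delta_log under the target path.
        batch_df.write.format("delta").mode("append").save("/mnt/datalake/entity")

    (spark.readStream.table("bronze.events")
        .writeStream
        .foreachBatch(write_batch)
        .option("checkpointLocation", "/mnt/checkpoints/entity")
        .start())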

MikeGo
by Contributor II
  • 2134 Views
  • 1 reply
  • 0 kudos

WAL for structured streaming

Hi, I cannot find a deep dive on this in the latest links. So far the understanding is: previously, SS (Structured Streaming) copied and cached the data in the WAL. After a certain version, with "retrieve less", SS doesn't copy the data to the WAL any more, and only stores ...

lilo_z
by New Contributor III
  • 5265 Views
  • 2 replies
  • 0 kudos

Resolved! Databricks Asset Bundles - job-specific "run_as" user/service_principal

Was wondering if this was possible, since a use case came up in my team. Would it be possible to use a different service principal for a single job than what is specified for that target environment? For example: bundle: name: hello-bundle resource...

Latest Reply
lilo_z
New Contributor III

Found a working solution, posting it here for anyone else hitting the same issue - the trick was to redefine "resources" under the target you want to make an exception for: bundle: name: hello_bundle include: - resources/*.yml targets: dev: w...

  • 0 kudos
1 More Reply
dbx-user7354
by New Contributor III
  • 4849 Views
  • 3 replies
  • 4 kudos

Create a Job via SDK with JobSettings Object

Hey, I want to create a Job via the Python SDK with a JobSettings object. import os import time from databricks.sdk import WorkspaceClient from databricks.sdk.service import jobs from databricks.sdk.service.jobs import JobSettings w = WorkspaceClien...

Latest Reply
nenetto
New Contributor II

I just faced the same problem. The issue is that when you do JobSettings.as_dict(), the settings are parsed to a dict where all the values are also parsed recursively. When you pass the parameters as **params, the create method again tries to parse...
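A minimal sketch of one way around that double-parsing issue: build typed SDK objects and pass them straight to jobs.create() instead of round-tripping through as_dict() (the notebook path, cluster id, and job name are placeholders):

    from databricks.sdk import WorkspaceClient
    from databricks.sdk.service import jobs

    w = WorkspaceClient()

    # create() accepts typed objects directly, whereas **JobSettings.as_dict()
    # hands it plain nested dicts that it then tries to re-parse.
    task = jobs.Task(
        task_key="main",
        notebook_task=jobs.NotebookTask(notebook_path="/Shared/my_notebook"),
        existing_cluster_id="1234-567890-abcdefgh",  # placeholder cluster id
    )

    created = w.jobs.create(name="sdk-created-job", tasks=[task])
    print(created.job_id)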

  • 4 kudos
2 More Replies
nihar_ghude
by New Contributor II
  • 5699 Views
  • 1 reply
  • 0 kudos

OSError: [Errno 107] Transport endpoint is not connected

Hi, I am facing this error when performing a write operation in foreach() on a DataFrame. The piece of code was working fine for over 3 months but started failing last week. To give some context, I have a DataFrame extract_df which contains 2 colum...

[screenshot attached]
Labels: Data Engineering, ADLS, azure, python, spark
GOW
by New Contributor II
  • 2473 Views
  • 1 reply
  • 0 kudos

Databricks to S3

I am new to data engineering in Databricks. I need some guidance on getting data from Databricks to S3. Can I get an example job or approach to do this?

Latest Reply
GOW
New Contributor II

Thank you for the reply. Can I apply this to dbt, or use a dbt macro to unload the data? That is, dbt models running in Databricks?
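For anyone landing here, a minimal sketch of one common approach, assuming the cluster already has IAM access to the bucket (the bucket, prefix, and table names are placeholders); in a dbt-on-Databricks setup, an unload step like this would typically run as a separate job task rather than inside a dbt model:

    # Export a table to S3 as Parquet from a notebook or job task
    df = spark.table("main.analytics.daily_metrics")

    (df.write
       .mode("overwrite")
       .parquet("s3://my-bucket/exports/daily_metrics/"))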

  • 0 kudos