Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

mexcram
by New Contributor II
  • 717 Views
  • 1 reply
  • 1 kudos

Glue database and saveAsTable

Hello all, I am saving my data frame as a Delta table to S3 and AWS Glue using PySpark and `saveAsTable`. So far I can do this, but something curious happens when I try to change the `path` (as an option or as an argument of `saveAsTable`). The location...
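
A minimal sketch of the pattern in question (not from the thread): writing an external Delta table whose files live under an explicit S3 path while the table is registered in the Glue catalog. Bucket, database, and table names are illustrative.

df = spark.range(5)  # stand-in for the real DataFrame
(df.write
   .format("delta")
   .mode("overwrite")
   .option("path", "s3://my-bucket/warehouse/my_table")  # external table location
   .saveAsTable("my_glue_db.my_table"))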

narisuna
by New Contributor
  • 192 Views
  • 0 replies
  • 0 kudos

single node Cluster CPU not fully used

Hello community, I use a cluster (Single node: Standard_F64s_v2 · DBR: 14.3 LTS (includes Apache Spark 3.5.0, Scala 2.12)) for a job. In this job I didn't use Spark multiprocessing. Instead I use this single-node cluster as a VM and use Python multipr...
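
For context, a minimal sketch of the pattern the post describes: plain Python multiprocessing on the driver of a single-node cluster. The work function is a placeholder.

from multiprocessing import Pool
import os

def work(item):
    return item * item  # placeholder for the real CPU-bound task

if __name__ == "__main__":
    # Standard_F64s_v2 exposes 64 vCPUs; spawn one worker per core
    with Pool(processes=os.cpu_count()) as pool:
        results = pool.map(work, range(1000))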

hpant
by New Contributor III
  • 1891 Views
  • 5 replies
  • 1 kudos

Autoloader error "Failed to infer schema for format json from existing files in input"

I have two JSON files in one of the locations in Azure Gen2 storage, e.g. '/mnt/abc/Testing/'. When I try to read the files using Autoloader I get this error: "Failed to infer schema for format json from existing files in input path /mnt/abc...

Latest Reply
holly
Databricks Employee
  • 1 kudos

Hi @hpant, would you consider testing the new VARIANT type for your JSON data? I appreciate it will require rewriting the next step in your pipeline, but it should be more robust with respect to errors. Disclaimer: I haven't personally tested VARIANT with Autoloade...
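
A sketch of what that might look like, assuming DBR 15.3+ (where VARIANT and parse_json are available) and newline-delimited JSON; as the reply itself notes, this combination is untested here. Paths are illustrative.

df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "text")                            # one raw JSON string per row
      .option("cloudFiles.schemaLocation", "/mnt/abc/_schema/Testing")  # illustrative
      .load("/mnt/abc/Testing/")
      .selectExpr("parse_json(value) AS data"))                       # cast to VARIANT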

4 More Replies
Devsql
by New Contributor III
  • 288 Views
  • 1 reply
  • 1 kudos

For a given Notebook, how to find the calling Job

Hi Team, I came across a situation where I have a Notebook but am not able to find the Job/DLT that calls it. Is there any query or mechanism by which I can find (or list) the Jobs/scripts that have called a given Notebo...

Data Engineering
Azure Databricks
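
One possible approach, not suggested in the thread: enumerate all jobs with the Databricks SDK for Python and filter on the notebook path each task references. The notebook path below is illustrative.

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
target = "/Workspace/path/to/my_notebook"  # illustrative

# scan every job's tasks for a notebook task pointing at the target path
for job in w.jobs.list(expand_tasks=True):
    for task in (job.settings.tasks or []):
        if task.notebook_task and task.notebook_task.notebook_path == target:
            print(job.job_id, job.settings.name)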
Latest Reply
Devsql
New Contributor III
  • 1 kudos

Hi @Retired_mod, would you be able to help with the above question?

semsim
by Contributor
  • 540 Views
  • 1 reply
  • 0 kudos

List and iterate over files in Databricks workspace

Hi DE Community, I need to be able to list/iterate over a set of files in a specific directory within the Databricks workspace. For example: "/Workspace/SharedFiles/path/to/file_1" ... "/Workspace/SharedFiles/path/to/file_n". Thanks for your direction and ...

Latest Reply
szymon_dybczak
Contributor III
  • 0 kudos

Hi @semsim, you can use the file system utility (dbutils.fs). See Databricks Utilities (dbutils) reference | Databricks on AWS and Work with files on Databricks | Databricks on AWS. For example: dbutils.fs.ls("file:/Workspace/Users/<user-folder>/")
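
Building on the reply, a small sketch that iterates the listing; the folder path and filter are illustrative.

files = dbutils.fs.ls("file:/Workspace/SharedFiles/path/to/")
for f in files:
    if f.name.startswith("file_"):  # filter however you need
        print(f.path, f.size)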

Zeruno
by New Contributor
  • 455 Views
  • 1 reply
  • 0 kudos

DLT - Get pipeline_id and update_id

I need to insert pipeline_id and update_id in my Delta Live Table (DLT), the point being to know which pipeline created which row. How can I obtain this information? I know you can get job_id and run_id from widgets, but I don't know if these are the s...

Latest Reply
szymon_dybczak
Contributor III
  • 0 kudos

Hi @Zeruno, those values are rather static. Maybe you can design a process that, as a first step, extracts the information from the List Pipelines API and saves it in a Delta table (List pipelines | Pipelines API | REST API reference | Databricks on AWS). Then in...
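
As an alternative sketch, under the assumption (not confirmed in the thread) that the DLT runtime exposes the pipeline id as the Spark conf "pipelines.id": stamp it onto each row inside the DLT notebook. The source table name is illustrative.

import dlt
from pyspark.sql import functions as F

@dlt.table
def stamped():
    # assumed conf key; falls back to a marker if not set
    pipeline_id = spark.conf.get("pipelines.id", "unknown")
    return (spark.read.table("source_table")  # illustrative source
              .withColumn("pipeline_id", F.lit(pipeline_id)))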

Shazaamzaa
by New Contributor III
  • 715 Views
  • 1 reply
  • 0 kudos

Setup dbt-core with Azure Entra ID

Hey team, I'm trying to standardize the development environment setup in our team. I've written up a shell script that I want our devs to run in WSL2 after setup. The shell script does the following: 1. set up Azure CLI (install and authenticate); 2. ins...

Latest Reply
Shazaamzaa
New Contributor III
  • 0 kudos

Hey @Retired_mod thanks for the response. I persisted a little more with the logs and the issue appears to be related to WSL2 not having a backend credential manager to handle management of tokens supplied by the OAuth process. To be honest, this is ...

acj1459
by New Contributor
  • 225 Views
  • 0 replies
  • 0 kudos

Azure Databricks Data Load

Hi All, I have 10 tables on an on-prem MS SQL DB and want to load their data incrementally into Bronze Delta tables as append-only. From Bronze to Silver, I want to use a merge query to load the latest records into the Silver Delta tables. Whatever latest...
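
A sketch of the Bronze-to-Silver step described above: keep the latest row per key, then MERGE into Silver. Table and column names are illustrative.

from delta.tables import DeltaTable
from pyspark.sql import functions as F, Window

bronze = spark.read.table("bronze.orders")

# keep only the most recent record per business key
w = Window.partitionBy("order_id").orderBy(F.col("updated_at").desc())
latest = (bronze.withColumn("rn", F.row_number().over(w))
                .where("rn = 1")
                .drop("rn"))

# upsert into the Silver table
(DeltaTable.forName(spark, "silver.orders").alias("t")
   .merge(latest.alias("s"), "t.order_id = s.order_id")
   .whenMatchedUpdateAll()
   .whenNotMatchedInsertAll()
   .execute())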

MRTN
by New Contributor III
  • 4339 Views
  • 3 replies
  • 2 kudos

Resolved! Configure multiple source paths for auto loader

I am currently using two streams to monitor data in two different containers on an Azure storage account. Is there any way to configure an autoloader to read from two different locations? The schemas of the files are identical.

Latest Reply
Anonymous
Not applicable
  • 2 kudos

@Morten Stakkeland: Yes, it's possible to configure an autoloader to read from multiple locations. You can define multiple cloudFiles sources for the autoloader, each pointing to a different container in the same storage account. In your case, since ...
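
A sketch of the answer's suggestion: one cloudFiles stream per container, unioned into a single DataFrame since the schemas are identical. Paths and schema locations are illustrative.

def autoload(path, schema_loc):
    return (spark.readStream
            .format("cloudFiles")
            .option("cloudFiles.format", "json")
            .option("cloudFiles.schemaLocation", schema_loc)  # one per source
            .load(path))

df = (autoload("abfss://container1@myaccount.dfs.core.windows.net/data/", "/mnt/schemas/c1")
      .unionByName(
      autoload("abfss://container2@myaccount.dfs.core.windows.net/data/", "/mnt/schemas/c2")))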

2 More Replies
jfvizoso
by New Contributor II
  • 8383 Views
  • 4 replies
  • 0 kudos

Can I pass parameters to a Delta Live Table pipeline at running time?

I need to execute a DLT pipeline from a Job, and I would like to know if there is any way of passing a parameter. I know you can have settings in the pipeline that you use in the DLT notebook, but it seems you can only assign values to them when crea...

Latest Reply
lprevost
Contributor
  • 0 kudos

This seems to be the key to this question: parameterize for dlt. My understanding is that you can add the parameter either in the DLT settings UI via the Advanced Config / Add Configuration (key, value) dialog, or via the corresponding pipeline set...
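
A sketch of the reply's first option: a key added under Advanced Configuration in the pipeline settings is read back in the DLT notebook with spark.conf.get. The key name, default, and table names are illustrative.

import dlt

# "mypipeline.start_date" is an illustrative configuration key
start_date = spark.conf.get("mypipeline.start_date", "2024-01-01")

@dlt.table
def filtered_events():
    return spark.read.table("raw_events").where(f"event_date >= '{start_date}'")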

3 More Replies
N_M
by Contributor
  • 6060 Views
  • 7 replies
  • 3 kudos

Resolved! use job parameters in scripts

Hi Community, I did some research, but I wasn't lucky, and I'm a bit surprised I can't find anything about it. I would simply like to access the job parameters when using Python scripts (not notebooks). My flow doesn't use notebooks, but I still need to dri...

Latest Reply
N_M
Contributor
  • 3 kudos

The only working workaround I found was provided in another thread: Re: Retrieve job-level parameters in Python - Databricks Community - 44720. I will repost it here (thanks @julio_resende). You need to push your parameters down to the task level. E.g.: C...
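
A sketch of that workaround: reference the job parameter in the task's parameters (e.g. ["--my_param", "{{job.parameters.my_param}}"]) so the Python script receives it as an ordinary command-line argument. The parameter name is illustrative.

import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--my_param")  # filled by {{job.parameters.my_param}} at task level
args = parser.parse_args()
print("my_param =", args.my_param)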

6 More Replies
Shiva3
by New Contributor III
  • 415 Views
  • 1 reply
  • 0 kudos

How to know the actual size of Delta and non-Delta tables, and the number of files on S3

I have a set of Delta and non-Delta tables whose data is on AWS S3. I want to know the actual total size of my Delta and non-Delta tables, excluding files belonging to operations such as DELETE, VACUUM etc. I also need to know how many files each Delta versi...
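
For the Delta side, a sketch (not from the thread): DESCRIBE DETAIL reports the size and file count of the current table version, which already excludes files kept only for time travel or pending VACUUM. The table name is illustrative.

# size and file count of the current version of a Delta table
detail = spark.sql("DESCRIBE DETAIL my_schema.my_table").collect()[0]
print(detail["numFiles"], detail["sizeInBytes"])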

Prashanth24
by New Contributor III
  • 662 Views
  • 1 reply
  • 0 kudos

Databricks workflow creation using the Databricks SDK

I am trying to create a Databricks workflow using the SDK. I am successful in this but stuck on how to use library whl files in the task from the yaml file, i.e. which SDK package or code should be used to associate a library whl with the notebook/pytho...
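
A sketch of attaching a wheel to a task when creating a job with the Databricks SDK for Python; the cluster id, paths, and names are illustrative.

from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs
from databricks.sdk.service.compute import Library

w = WorkspaceClient()
created = w.jobs.create(
    name="my-workflow",  # illustrative
    tasks=[jobs.Task(
        task_key="main",
        existing_cluster_id="1234-567890-abcde123",
        notebook_task=jobs.NotebookTask(notebook_path="/Workspace/path/to/notebook"),
        libraries=[Library(whl="dbfs:/FileStore/wheels/my_pkg-0.1-py3-none-any.whl")],
    )],
)
print(created.job_id)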

qwerty1
by Contributor
  • 4732 Views
  • 6 replies
  • 15 kudos

Resolved! When will databricks runtime be released for Scala 2.13?

I see that Spark fully supports Scala 2.13. I wonder why there is no Databricks runtime with Scala 2.13 yet. Any plans on making this available? It would be super useful.

Latest Reply
777
New Contributor II
  • 15 kudos

Currently, I can use databricks-connect only with scala 2.12. When a runtime and databricks-connect with Scala 2.13 are introduced, that would open the possibility to use Scala 3 with databricks-connect, and that would be amazing.

5 More Replies
pjv
by New Contributor III
  • 557 Views
  • 1 reply
  • 1 kudos

Resolved! Connection error when accessing dbutils secrets

We have daily running pipelines that need to access dbutils secrets for API keys. However, when calling the dbutils.secrets.get function within our Python code we get the following error: org.apache.http.conn.HttpHostConnectException: Connect to us-central1.gcp.da...
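
The resolution wasn't captured in this listing; as a generic mitigation sketch for transient connection errors like this, retry the secret lookup with a short backoff. Scope and key names are illustrative.

import time

def get_secret_with_retry(scope, key, attempts=3, delay=5):
    for i in range(attempts):
        try:
            return dbutils.secrets.get(scope=scope, key=key)
        except Exception:
            if i == attempts - 1:
                raise               # give up after the last attempt
            time.sleep(delay)       # back off before retrying

api_key = get_secret_with_retry("my-scope", "api-key")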

