Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Avinash_Narala
by Contributor
  • 1785 Views
  • 1 reply
  • 0 kudos

Unable to use SQL UDF

Hello, I want to create a SQL UDF as follows: %sql CREATE OR REPLACE FUNCTION get_type(s STRING) RETURNS STRING LANGUAGE PYTHON AS $$ def get_type(table_name): from pyspark.sql.functions import col; from pyspark.sql import SparkSession ...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Avinash_Narala, The error message indicates that the execution of your user-defined function (UDF) get_type failed. This could be due to a variety of reasons. Here are a few things you could check: Data Type Mismatch: Ensure that the data ty...
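For context, Unity Catalog Python UDFs run in a sandboxed environment with no access to a SparkSession, so pyspark imports inside the $$ body will fail. A minimal sketch of a sandbox-safe version follows; the type-mapping rule is a hypothetical stand-in, since the original metastore lookup isn't shown in full:

    # Sketch: UC Python UDF bodies must be pure Python; no SparkSession here.
    spark.sql("""
    CREATE OR REPLACE FUNCTION get_type(s STRING)
    RETURNS STRING
    LANGUAGE PYTHON
    AS $$
    # Hypothetical rule standing in for the original table lookup
    return "external" if s.startswith("ext_") else "managed"
    $$
    """)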

kiko_roy
by Contributor
  • 1623 Views
  • 2 replies
  • 1 kudos

Resolved! DLT cluster: can it be manipulated?

Hi all, I am using a DLT pipeline to pull data from an ADLS Gen2 account which is mounted. The cluster that gets fired up has its access mode set to shared. I want to change it to single user, but since the cluster is attached to DLT, I am not able to update and...

Latest Reply
AlliaKhosla
Contributor
  • 1 kudos

Hi @kiko_roy, Greetings! You can't use a single-user cluster to query tables from a Unity Catalog-enabled Delta Live Tables pipeline, including streaming tables and materialized views in Databricks SQL. To access these tables, you need to use a share...

1 More Replies
ss6
by New Contributor II
  • 782 Views
  • 1 reply
  • 0 kudos

Resolved! Liquid Cluster - SHOW CREATE TABLE error

We've got this table with liquid clustering turned on at first, but then we switched it off with the command below: ALTER TABLE table_name CLUSTER BY NONE; Now, our downstream process that usually runs "SHOW CREATE TABLE" is hitting a snag. It's throwing this e...

Latest Reply
Ayushi_Suthar
Honored Contributor
  • 0 kudos

Hi @ss6, Hope you are doing well! We would like to inform you that currently SHOW CREATE TABLE is not supported after running ALTER TABLE CLUSTER BY NONE. This is a known issue and our Engineering team is prioritizing a fix to retain the clusterin...
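As an interim workaround, sketched below with a placeholder table name, downstream jobs that only need schema and clustering metadata can query DESCRIBE output instead of SHOW CREATE TABLE:

    # Fall back to DESCRIBE while SHOW CREATE TABLE is affected.
    detail = spark.sql("DESCRIBE DETAIL main.default.table_name")
    # clusteringColumns appears in DESCRIBE DETAIL on recent runtimes.
    detail.select("format", "location", "clusteringColumns").show(truncate=False)
    spark.sql("DESCRIBE TABLE EXTENDED main.default.table_name").show(truncate=False)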

isaac_gritz
by Valued Contributor II
  • 17697 Views
  • 3 replies
  • 4 kudos

Using Plotly Dash with Databricks

How to use Plotly Dash with Databricks. We recommend checking out this article for the latest on building Dash applications on top of the Databricks Lakehouse. Let us know in the comments if you use Plotly and if you're planning on adopting the latest i...

Latest Reply
dave-at-plotly
New Contributor III
  • 4 kudos

Hey all. Just wanted to make sure everyone had some up-to-date intel regarding leveraging Plotly Dash with Databricks. Most Dash app integrations with Databricks today leverage the Databricks Python SQL Connector. More technical details are available v...
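For reference, the connector pattern mentioned above typically looks like the sketch below; the hostname, HTTP path, and token are placeholders:

    from databricks import sql

    # Query a SQL warehouse from a Dash callback or any Python process.
    with sql.connect(
        server_hostname="adb-1234567890123456.7.azuredatabricks.net",
        http_path="/sql/1.0/warehouses/abc123",
        access_token="dapi...",
    ) as connection:
        with connection.cursor() as cursor:
            cursor.execute("SELECT current_date()")
            print(cursor.fetchall())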

2 More Replies
Dhruv_Sinha
by New Contributor II
  • 5860 Views
  • 2 replies
  • 1 kudos

Parallelizing processing of multiple spark dataframes

Hi all, I am trying to create a collection RDD that contains a list of Spark DataFrames. I want to parallelize the cleaning process for each of these dataframes. Later on, I am sending each of these dataframes to another method. However, when I parall...

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @Dhruv_Sinha, The issue you’re encountering—where the Spark context (sc) cannot be accessed from worker nodes—is a common challenge when working with Spark. Let’s explore why this happens and discuss potential workarounds. Spark Context and W...
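The usual workaround, sketched below, is to keep all DataFrame logic on the driver and parallelize job submission with threads; Spark's scheduler then runs the resulting jobs concurrently. clean_df and the table names are hypothetical stand-ins for the post's cleaning method:

    from concurrent.futures import ThreadPoolExecutor

    def clean_df(df):
        # Placeholder for the cleaning logic from the post.
        return df.dropna().dropDuplicates()

    dfs = [spark.table(t) for t in ["tbl_a", "tbl_b", "tbl_c"]]
    with ThreadPoolExecutor(max_workers=4) as pool:
        cleaned = list(pool.map(clean_df, dfs))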

1 More Replies
CaptainJack
by New Contributor III
  • 1749 Views
  • 2 replies
  • 0 kudos

File arrival trigger customization

Hi all. I have a workflow which I would like to trigger when a new file arrives. The problem is that in my storage account there are a few different types of files. Let's assume that I have a big CSV file and a small XLSX mapping file. I would like to trigger the job, ...

Latest Reply
feiyun0112
Honored Contributor
  • 0 kudos

Use the option pathGlobFilter or fileNamePattern: https://docs.databricks.com/en/ingestion/auto-loader/options.html
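A minimal sketch of that option in an Auto Loader stream (paths are placeholders), so only the big CSVs trigger processing and the XLSX mapping files are ignored:

    df = (spark.readStream
          .format("cloudFiles")
          .option("cloudFiles.format", "csv")
          .option("pathGlobFilter", "*.csv")  # ignore the .xlsx mapping files
          .option("cloudFiles.schemaLocation", "/tmp/chk/schema")
          .load("abfss://landing@account.dfs.core.windows.net/inbox/"))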

1 More Replies
JakeerDE
by New Contributor III
  • 3078 Views
  • 3 replies
  • 0 kudos

Resolved! Databricks SQL - Deduplication in DLT APPLY CHANGES INTO

Hi @Kaniz_Fatma, We have a Kafka source appending the data into a bronze table and a subsequent DLT APPLY CHANGES INTO to do the SCD handling. Finally, we have materialized views to create dims/facts. We are facing issues when we perform deduplication i...

Latest Reply
JakeerDE
New Contributor III
  • 0 kudos

Hi @Palash01 Thanks for the response. Below is what I am trying to do. However, it is throwing an error. APPLY CHANGES INTO LIVE.targettable FROM ( SELECT DISTINCT * FROM STREAM(sourcetable_1) tbl1 INNER JOIN STREAM(sourcetable_2) tbl2 ON tbl1.id = ...
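One common pattern, sketched here in the DLT Python API with hypothetical column names, is to stage the join and deduplication in a view and feed APPLY CHANGES from that view rather than inlining SELECT DISTINCT over two streams:

    import dlt
    from pyspark.sql import functions as F

    @dlt.view
    def deduped_source():
        s1 = dlt.read_stream("sourcetable_1").withWatermark("event_ts", "10 minutes")
        s2 = dlt.read_stream("sourcetable_2")
        # Dedup before the CDC step; event_ts is an assumed ordering column.
        return s1.join(s2, "id").dropDuplicates(["id", "event_ts"])

    dlt.create_streaming_table("targettable")
    dlt.apply_changes(
        target="targettable",
        source="deduped_source",
        keys=["id"],
        sequence_by=F.col("event_ts"),
    )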

2 More Replies
sumitdesai
by New Contributor II
  • 2303 Views
  • 1 reply
  • 0 kudos

Job not able to access notebook from github

I have created a job in Databricks and configured it to use a cluster with single-user access enabled, using GitHub as a source. When I try to run the job, I get the following error: run failed with error message Unable to access the notebook "d...

Latest Reply
ezhil
New Contributor III
  • 0 kudos

I think you need to link the Git account with Databricks by passing the access token generated in GitHub. Follow the document for reference: https://docs.databricks.com/en/repos/get-access-tokens-from-git-provider.html Note: While creating the...
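The linking can also be done programmatically via the Git Credentials REST API; a sketch follows, with the host and both tokens as placeholders:

    import requests

    host = "https://adb-1234567890123456.7.azuredatabricks.net"
    resp = requests.post(
        f"{host}/api/2.0/git-credentials",
        headers={"Authorization": "Bearer dapi..."},  # workspace PAT
        json={
            "git_provider": "gitHub",
            "git_username": "your-username",
            "personal_access_token": "ghp_...",  # token generated in GitHub
        },
    )
    resp.raise_for_status()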

Gilg
by Contributor II
  • 1724 Views
  • 5 replies
  • 2 kudos

Multiple Autoloader reading the same directory path

Hi, Originally I only had 1 pipeline looking at a directory. Now, as a test, I cloned the existing pipeline and edited the settings to point to a different catalog. Now both pipelines are basically reading the same directory path and running in continuous mode. Que...

Latest Reply
Kaniz_Fatma
Community Manager
  • 2 kudos

Hi @Gilg, When multiple pipelines are simultaneously accessing the same directory path and utilizing Autoloader in continuous mode, it is crucial to consider the management of file locks and data consistency carefully. Let's delve into the specifi...
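The key isolation point, sketched below with placeholder paths and table names, is that each cloned pipeline must keep its own checkpoint and schema location; two streams can then consume the same source directory independently:

    def bronze_stream(checkpoint_root, target_table):
        df = (spark.readStream
              .format("cloudFiles")
              .option("cloudFiles.format", "json")
              .option("cloudFiles.schemaLocation", f"{checkpoint_root}/schema")
              .load("abfss://landing@account.dfs.core.windows.net/events/"))
        return (df.writeStream
                .option("checkpointLocation", f"{checkpoint_root}/commits")
                .toTable(target_table))

    # Pipeline A: bronze_stream("/chk/a", "catalog_a.bronze.events")
    # Pipeline B: bronze_stream("/chk/b", "catalog_b.bronze.events")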

4 More Replies
cltj
by New Contributor III
  • 1417 Views
  • 1 reply
  • 0 kudos

Managed tables and ADLS - infrastructure

Hi all. I want to get this right and therefore I am reaching out to the community. We are using Azure, and currently use one Azure Data Lake Storage account for development and one for production. These are connected to dev and prod Databricks workspaces....

Latest Reply
ossinova
Contributor II
  • 0 kudos

I recommend you read this article (Managed vs External tables) and answer the following questions: Do I require direct access to the data outside of Azure Databricks clusters or Databricks SQL warehouses? If yes, then External is your only option. In rel...
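In DDL terms the distinction looks like the sketch below (catalog, schema, and path names are placeholders): a managed table lets Unity Catalog own the files, while an external table pins them to a location you control and can reach directly from outside Databricks:

    # Managed: Unity Catalog decides where the files live.
    spark.sql("CREATE TABLE dev_catalog.sales.orders (id INT, amount DOUBLE)")

    # External: files stay at a path you govern in your own ADLS account.
    spark.sql("""
        CREATE TABLE dev_catalog.sales.orders_ext (id INT, amount DOUBLE)
        LOCATION 'abfss://data@devlake.dfs.core.windows.net/orders'
    """)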

Marcin_U
by New Contributor II
  • 1314 Views
  • 2 replies
  • 0 kudos

AutoLoader - problem with adding new source location

Hello, I have some trouble with Auto Loader. Currently we use many different source locations on ADLS to read parquet files and write them to a Delta table using Auto Loader. Files in all locations have the same schema. Everything works fine until we have to ad...

Latest Reply
Marcin_U
New Contributor II
  • 0 kudos

Thanks for the reply @Kaniz_Fatma. I have some questions related to your answer. Checkpoint Location: Does deleting the checkpoint folder (or only the files?) mean that the next run of Auto Loader will load all files from the provided source locations? So it will dupl...

1 More Replies
essura
by New Contributor II
  • 1653 Views
  • 2 replies
  • 1 kudos

Create a docker image for dbt task

Hi there, We are trying to set up a Docker image for our dbt execution, primarily to improve execution speed, but also to simplify deployment (we are using private repos for both the dbt project and some of the dbt packages). It seems to work curre...

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @essura, Setting up a Docker image for your dbt execution is a great approach. Let’s dive into the details. Prebuilt Docker Images: dbt Core and all adapter plugins maintained by dbt Labs are available as Docker images. These images are distr...

1 More Replies
Innov
by New Contributor
  • 889 Views
  • 1 reply
  • 0 kudos

Parse nested json for building footprints

Looking for some help from anyone who has worked with nested JSON files in a Databricks notebook. I am trying to parse a nested JSON file to get coordinates and use them to create a polygon for a building footprint. Do I need to read it as txt? How can I use the Databricks...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Innov, Working with nested JSON files in Databricks Notebooks is a common task, and I can guide you through the process. Let’s break it down step by step: Reading the Nested JSON File: You don’t need to read the JSON file as plain text (.txt...
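For a GeoJSON-style footprint file, a minimal sketch reads the JSON directly rather than as text; the field names follow the common FeatureCollection layout and the path is a placeholder:

    from pyspark.sql import functions as F

    raw = (spark.read
           .option("multiLine", "true")
           .json("/Volumes/main/default/geo/buildings.json"))  # placeholder path

    # One row per feature, keeping the polygon coordinate arrays.
    coords = (raw.select(F.explode("features").alias("f"))
                 .select(F.col("f.geometry.coordinates").alias("polygon")))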

zero234
by New Contributor III
  • 1144 Views
  • 1 reply
  • 1 kudos

Data is not loaded when creating two different streaming tables from one Delta Live Tables pipeline

I am trying to create 2 streaming tables in one DLT pipeline; both read JSON data from different locations and both have different schemas. The pipeline executes but no data is inserted into either table, whereas when I try to run each table indiv...

Data Engineering
dlt
spark
STREAMINGTABLE
Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @zero234, It seems you’re encountering an issue with your Delta Live Tables (DLT) pipeline where you’re trying to create two streaming tables from different sources with distinct schemas. Let’s dive into this! DLT is a powerful feature in Data...
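For comparison, a minimal two-table pipeline in the DLT Python API looks like the sketch below (paths are placeholders); each table reads its own location, and schema inference is handled independently per table:

    import dlt

    @dlt.table(name="events_a")
    def events_a():
        return (spark.readStream.format("cloudFiles")
                .option("cloudFiles.format", "json")
                .load("abfss://landing@account.dfs.core.windows.net/source_a/"))

    @dlt.table(name="events_b")
    def events_b():
        return (spark.readStream.format("cloudFiles")
                .option("cloudFiles.format", "json")
                .load("abfss://landing@account.dfs.core.windows.net/source_b/"))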

vijaykumar99535
by New Contributor III
  • 1221 Views
  • 1 reply
  • 0 kudos

How to create a job cluster using the REST API

I am creating a cluster using a REST API call, but every time it creates an all-purpose cluster. Is there a way to create a job cluster and run a notebook using Python code?

Latest Reply
feiyun0112
Honored Contributor
  • 0 kudos

job_cluster_key: string, 1..100 characters, matching ^[\w\-\_]+$. If job_cluster_key is set, this task is executed reusing the cluster specified in job.settings.job_clusters. See: Create a new job | Jobs API | REST API reference | Databricks on AWS
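Put together, a jobs/create call that declares a job cluster and runs a notebook on it might look like this sketch; the host, token, and notebook path are placeholders:

    import requests

    host, token = "https://adb-1234567890123456.7.azuredatabricks.net", "dapi..."
    payload = {
        "name": "nightly-notebook",
        "job_clusters": [{
            "job_cluster_key": "etl_cluster",
            "new_cluster": {
                "spark_version": "14.3.x-scala2.12",
                "node_type_id": "Standard_DS3_v2",
                "num_workers": 2,
            },
        }],
        "tasks": [{
            "task_key": "run_notebook",
            "job_cluster_key": "etl_cluster",  # reuse the job cluster above
            "notebook_task": {"notebook_path": "/Workspace/etl/main"},
        }],
    }
    resp = requests.post(f"{host}/api/2.1/jobs/create",
                         headers={"Authorization": f"Bearer {token}"}, json=payload)
    resp.raise_for_status()
    print(resp.json()["job_id"])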

