Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

RabahO
by New Contributor III
  • 1839 Views
  • 2 replies
  • 0 kudos

Resolved! Unit tests in notebook not working

Hello, I'm trying to set up a notebook for tests or data quality checks; the name is not important. I basically read a table (the ETL output process - actual data). Then I read another table and do the calculation in the notebook (expected data). I'm stuc...

Latest Reply
RabahO
New Contributor III
  • 0 kudos

Thank you for the Nutter suggestion. I tried it and it seems to solve my problem.

1 More Replies
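The thread resolves with Nutter, but the core of such a check (compare the ETL output against an expected calculation) can also be sketched in plain Python. The table names and row structure below are hypothetical; on Databricks the two lists would come from `spark.table(...).collect()`:

```python
# Minimal sketch of a data-quality check comparing an ETL output
# ("actual") against a reference calculation ("expected").
# Row contents here are hypothetical illustration data.

def rows_match(actual, expected, key):
    """Return the actual rows that differ from expected, ignoring order."""
    index = {row[key]: row for row in expected}
    mismatches = []
    for row in actual:
        ref = index.get(row[key])
        if ref is None or ref != row:
            mismatches.append(row)
    return mismatches

actual = [{"id": 1, "total": 10}, {"id": 2, "total": 7}]
expected = [{"id": 2, "total": 7}, {"id": 1, "total": 10}]
print(rows_match(actual, expected, "id"))  # → []
```

An empty result means the actual table agrees with the expected one; in a notebook this would typically end with an `assert` so the job fails loudly on mismatches.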
Avinash_Narala
by Contributor
  • 2166 Views
  • 0 replies
  • 0 kudos

Unable to use SQL UDF

Hello, I want to create an SQL UDF as follows:

%sql
CREATE OR REPLACE FUNCTION get_type(s STRING)
  RETURNS STRING
  LANGUAGE PYTHON
  AS $$
    def get_type(table_name):
      from pyspark.sql.functions import col
      from pyspark.sql import SparkSession ...

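The post is truncated, but the visible body tries to import pyspark and build a SparkSession inside the UDF, which a SQL-defined Python UDF cannot do: the body runs in a sandbox as pure Python, with no Spark access. A minimal valid shape, with hypothetical classification logic, looks like this:

```sql
-- A SQL-defined Python UDF body must be self-contained Python:
-- it cannot import pyspark or create a SparkSession. Anything that
-- needs a table lookup belongs in a SQL UDF or a join instead.
CREATE OR REPLACE FUNCTION get_type(s STRING)
  RETURNS STRING
  LANGUAGE PYTHON
  AS $$
    -- hypothetical pure-Python logic
    return "view" if s.startswith("v_") else "table"
  $$
```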
kiko_roy
by Contributor
  • 1962 Views
  • 2 replies
  • 1 kudos

Resolved! DLT cluster : can be manipulated?

Hi all, I am using a DLT pipeline to pull data from an ADLS Gen2 location which is mounted. The cluster that gets fired up has its access mode set to shared. I want to change it to single user, but since the cluster is attached to DLT, I am not able to update and...

Latest Reply
AlliaKhosla
Databricks Employee
  • 1 kudos

Hi @kiko_roy  Greetings! You can't use a single-user cluster to query tables from a Unity Catalog-enabled Delta Live Tables pipeline, including streaming tables and materialized views in Databricks SQL. To access these tables, you need to use a share...

1 More Replies
ss6
by New Contributor II
  • 1032 Views
  • 1 replies
  • 0 kudos

Resolved! Liquid Cluster - SHOW CREATE TABLE error

We've got a table with liquid clustering turned on at first, but then we switched it off with the command below: ALTER TABLE table_name CLUSTER BY NONE; Now our downstream process, which usually runs "SHOW CREATE TABLE", is hitting a snag. It's throwing this e...

Latest Reply
Ayushi_Suthar
Databricks Employee
  • 0 kudos

Hi @ss6 , Hope you are doing well!  We would like to inform you that currently, SHOW CREATE TABLE is not supported after running ALTER TABLE CLUSTER BY NONE. This is a known issue and our Engineering team is prioritizing a fix to retain the clusterin...

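While the fix is pending, the table's metadata can usually still be inspected with other commands. A hedged sketch of possible stopgaps (table name hypothetical):

```sql
-- SHOW CREATE TABLE fails on this table after CLUSTER BY NONE;
-- these commands may still surface the schema and Delta metadata:
DESCRIBE TABLE EXTENDED my_catalog.my_schema.my_table;
DESCRIBE DETAIL my_catalog.my_schema.my_table;  -- Delta-level detail
```

A downstream process that only needs the column list could switch to `DESCRIBE TABLE` output until `SHOW CREATE TABLE` is fixed.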
isaac_gritz
by Databricks Employee
  • 19479 Views
  • 2 replies
  • 2 kudos

Using Plotly Dash with Databricks

How to use Plotly Dash with Databricks: we recommend checking out this article for the latest on building Dash applications on top of the Databricks Lakehouse. Let us know in the comments if you use Plotly and if you're planning on adopting the latest i...

Latest Reply
dave-at-plotly
New Contributor III
  • 2 kudos

Hey all. Just wanted to make sure everyone had some up-to-date intel regarding leveraging Plotly Dash with Databricks. Most Dash app integrations with Databricks today leverage the Databricks Python SQL Connector. More technical details are available v...

1 More Replies
exilon
by New Contributor
  • 1265 Views
  • 0 replies
  • 0 kudos

DLT streaming with sliding window missing last windows interval

Hello, I have a DLT pipeline where I want to calculate the rolling average of a column for the last 24 hours, updated every hour. I'm using the code below to achieve this:

@dlt.table()
def gold():
    df = dlt.read_stream("silver_table")...

Data Engineering
dlt
spark
streaming
window
CaptainJack
by New Contributor III
  • 2152 Views
  • 2 replies
  • 0 kudos

File arrival trigger customization

Hi all. I have a workflow which I would like to trigger when a new file arrives. The problem is that in my storage account there are a few different types of files. Let's assume that I have a big CSV file and a small XLSX mapping file. I would like to trigger the job, ...

Latest Reply
feiyun0112
Honored Contributor
  • 0 kudos

Use the option pathGlobFilter or fileNamePattern: https://docs.databricks.com/en/ingestion/auto-loader/options.html

1 More Replies
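The reply points at the Auto Loader glob options; which files a pattern would admit can be previewed with Python's `fnmatch`, which uses the same shell-style glob syntax. File names below are hypothetical:

```python
from fnmatch import fnmatch

# Auto Loader's pathGlobFilter (fileNamePattern in file-notification
# mode) takes a glob pattern; only matching files are ingested.
# Previewing which files in a mixed folder match "*.csv":

files = ["big_data_2024.csv", "mapping.xlsx", "notes.txt", "daily.csv"]
pattern = "*.csv"
matched = [f for f in files if fnmatch(f, pattern)]
print(matched)  # → ['big_data_2024.csv', 'daily.csv']
```

In the pipeline itself the pattern would be passed as an option on the read, e.g. `.option("pathGlobFilter", "*.csv")`, so the XLSX mapping file never triggers ingestion.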
alxsbn
by Contributor
  • 1793 Views
  • 0 replies
  • 1 kudos

Compute pool and AWS instance profiles

Hi everyone. We're looking at using the compute pool feature. Right now we mostly rely on all-purpose and job compute. On these two we're using instance profiles to let the clusters access our S3 buckets and more. We don't see anything related to insta...

JakeerDE
by New Contributor III
  • 3769 Views
  • 3 replies
  • 0 kudos

Resolved! Databricks SQL - Deduplication in DLT APPLY CHANGES INTO

Hi @Retired_mod, we have a Kafka source appending data into a bronze table and a subsequent DLT APPLY CHANGES INTO to do the SCD handling. Finally, we have materialized views to create dims/facts. We are facing issues when we perform deduplication i...

Latest Reply
JakeerDE
New Contributor III
  • 0 kudos

Hi @Palash01, thanks for the response. Below is what I am trying to do; however, it is throwing an error.

APPLY CHANGES INTO LIVE.targettable FROM (
  SELECT DISTINCT *
  FROM STREAM(sourcetable_1) tbl1
  INNER JOIN STREAM(sourcetable_2) tbl2 ON tbl1.id = ...

2 More Replies
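One common way around errors like this is to move the deduplicating join into its own streaming view and feed APPLY CHANGES INTO from that view rather than an inline subquery. The sketch below is a hedged guess at that shape: the table and column names are hypothetical, a `SEQUENCE BY` column is assumed, and DISTINCT over a stream-stream join is itself subject to streaming limitations (it may require watermarks):

```sql
-- Hedged sketch: dedupe in a streaming view first, then apply changes
-- from that view (names and SCD type are assumptions, not from the post).
CREATE OR REFRESH TEMPORARY STREAMING LIVE VIEW deduped_source AS
  SELECT DISTINCT tbl1.id, tbl1.payload, tbl1.event_ts
  FROM STREAM(LIVE.sourcetable_1) tbl1
  INNER JOIN STREAM(LIVE.sourcetable_2) tbl2 ON tbl1.id = tbl2.id;

APPLY CHANGES INTO LIVE.targettable
FROM STREAM(LIVE.deduped_source)
KEYS (id)
SEQUENCE BY event_ts
STORED AS SCD TYPE 2;
```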
sumitdesai
by New Contributor II
  • 2981 Views
  • 1 replies
  • 0 kudos

Job not able to access notebook from github

I have created a job in Databricks and configured it to use a cluster with single-user access enabled, using GitHub as a source. When I try to run the job, I get the following error: run failed with error message Unable to access the notebook "d...

Latest Reply
ezhil
New Contributor III
  • 0 kudos

I think you need to link the Git account with Databricks by passing the access token generated in GitHub. Follow this document for reference: https://docs.databricks.com/en/repos/get-access-tokens-from-git-provider.html Note: while creating the...

cltj
by New Contributor III
  • 1951 Views
  • 1 replies
  • 0 kudos

Managed tables and ADLS - infrastructure

Hi all. I want to get this right, and therefore I am reaching out to the community. We are using Azure, and currently use one Azure Data Lake Storage account for development and one for production. These are connected to the dev and prod Databricks workspaces....

Latest Reply
ossinova
Contributor II
  • 0 kudos

I recommend you read this article (Managed vs External tables) and answer the following question: do I require direct access to the data outside of Azure Databricks clusters or Databricks SQL warehouses? If yes, then External is your only option. In rel...

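The distinction in the reply comes down to who controls the storage location. A minimal DDL sketch of the two options (catalog, schema, and ADLS path are hypothetical):

```sql
-- Managed: Databricks controls the files under the catalog's storage root;
-- dropping the table eventually removes the data.
CREATE TABLE dev_catalog.sales.orders (id INT, amount DOUBLE);

-- External: the data stays at a location you manage in ADLS and can be
-- read by tools outside Databricks; dropping the table keeps the files.
CREATE TABLE dev_catalog.sales.orders_ext (id INT, amount DOUBLE)
LOCATION 'abfss://data@mydevlake.dfs.core.windows.net/sales/orders';
```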
Marcin_U
by New Contributor II
  • 1585 Views
  • 1 replies
  • 0 kudos

AutoLoader - problem with adding new source location

Hello, I have some trouble with Auto Loader. Currently we use many different source locations on ADLS to read parquet files and write them to a delta table using Auto Loader. The files in these locations have the same schema. Everything worked fine until we had to ad...

Latest Reply
Marcin_U
New Contributor II
  • 0 kudos

Thanks for the reply @Retired_mod. I have some questions related to your answer. Checkpoint location: does deleting the checkpoint folder (or only the files?) mean that the next run of Auto Loader will load all files from the provided source locations? So it will dupl...

oussValrho
by New Contributor
  • 5674 Views
  • 0 replies
  • 0 kudos

Cannot resolve due to data type mismatch: incompatible types ("STRING" and "ARRAY<STRING>")

Hey, I have had this error for a while: Cannot resolve "(needed_skill_id = needed_skill_id)" due to data type mismatch: the left and right operands of the binary operator have incompatible types ("STRING" and "ARRAY<STRING>"). SQLSTATE: 42K09; and these ...

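The error means one side of the `=` is a plain STRING while the other is an ARRAY&lt;STRING&gt;, and equality between those types cannot resolve. If the intent is "the string is one of the array's elements", Spark SQL's `array_contains` expresses that directly. A sketch (only `needed_skill_id` comes from the post; the table aliases are hypothetical):

```sql
-- Instead of:  ON s.needed_skill_id = c.needed_skill_id
-- when c.needed_skill_id is ARRAY<STRING>, test membership explicitly:
SELECT *
FROM skills s
JOIN candidates c
  ON array_contains(c.needed_skill_id, s.needed_skill_id);
```

Alternatively, `explode` on the array side turns it into one row per element, after which a plain string equality join works.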

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group