Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

ashraf1395
by Valued Contributor
  • 652 Views
  • 1 reply
  • 1 kudos

Resolved! Querying Mysql db from Azure databricks where public access is disabled

Hi there, we are trying to set up infra that ingests data from MySQL hosted on an AWS EC2 instance with PySpark and Azure Databricks, and dumps it to ADLS storage. Since the database has public accessibility disabled, how can I interact with MySQL from Azu...

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

You will need some kind of tunnel that opens the DB server to external access. Perhaps a VPN is an option? If not, it won't be possible. An alternative would be to have some local LAN tool extract the data and then move it to S3/..., and afterwards let ...
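A minimal sketch of the Spark side, assuming a tunnel (e.g. SSH port forwarding through a bastion) already exposes the MySQL server on localhost; all hostnames, table names, and credentials below are hypothetical placeholders:

```python
def mysql_jdbc_url(host: str, port: int, database: str) -> str:
    # Build the JDBC URL Spark's MySQL reader expects
    return f"jdbc:mysql://{host}:{port}/{database}"

def read_mysql_table(spark, table: str, user: str, password: str,
                     host: str = "localhost", port: int = 3306,
                     database: str = "mydb"):
    # With an SSH tunnel such as `ssh -L 3306:db-private-host:3306 user@bastion`,
    # the remote database appears on localhost:3306 to the Spark driver
    return (spark.read.format("jdbc")
            .option("url", mysql_jdbc_url(host, port, database))
            .option("dbtable", table)
            .option("user", user)
            .option("password", password)
            .load())
```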

vvzadvor
by New Contributor III
  • 1162 Views
  • 4 replies
  • 1 kudos

Resolved! Debugging python code outside of Notebooks

Hi experts, does anyone know if there's a way of properly debugging Python code outside of notebooks? We have a complicated Python-based framework for loading files, transforming them according to the business specification, and saving the results into ...

Latest Reply
vvzadvor
New Contributor III
  • 1 kudos

OK, I can now confirm that remote debugging with stepping into your own libraries installed on the cluster is possible, and is actually pretty convenient, using a combination of the databricks-connect Python library and the Databricks extension for VS Code. S...
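As a rough sketch of what that setup looks like in code (assuming databricks-connect is installed and a Databricks CLI profile is configured; the profile name is hypothetical), the remote session stands in for a local SparkSession, so breakpoints in your own library code work in the VS Code debugger while queries execute on the cluster:

```python
def get_remote_spark(profile: str = "DEFAULT"):
    # Lazy import so this sketch loads even without databricks-connect installed;
    # in practice: pip install databricks-connect (matched to your cluster's DBR)
    from databricks.connect import DatabricksSession
    # Builds a SparkSession backed by the remote cluster, so stepping through
    # local library code in the debugger still runs real Spark queries
    return DatabricksSession.builder.profile(profile).getOrCreate()
```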

3 More Replies
ashraf1395
by Valued Contributor
  • 758 Views
  • 1 replies
  • 2 kudos

Resolved! Reading a materialised view locally or using databricks api

Hi there, this was my previous approach: I had a Databricks notebook with a streaming table (bronze level) reading data from volumes, which created 2 downstream tables: first, a materialised view (gold level); second, a table for storing ingestion_meta...

Latest Reply
ashraf1395
Valued Contributor
  • 2 kudos

I used this approach: querying the materialised view through a Databricks serverless SQL endpoint, connecting to it with the SQL connector. It's working right now. If I face any issues, I will write it into a normal table and Delta Share it. Thanks for your repl...
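A sketch of that pattern using the databricks-sql-connector package against a serverless SQL warehouse; the hostname, HTTP path, and view name are placeholders:

```python
def read_materialized_view(server_hostname: str, http_path: str, access_token: str,
                           view: str = "main.gold.my_materialized_view",
                           limit: int = 100):
    # Lazy import so the sketch loads without the package installed;
    # in practice: pip install databricks-sql-connector
    from databricks import sql
    with sql.connect(server_hostname=server_hostname,
                     http_path=http_path,
                     access_token=access_token) as conn:
        with conn.cursor() as cur:
            # The warehouse resolves the materialized view like any other relation
            cur.execute(f"SELECT * FROM {view} LIMIT {limit}")
            return cur.fetchall()
```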

MyTrh
by New Contributor III
  • 2026 Views
  • 7 replies
  • 3 kudos

Resolved! Delta table with unique columns incremental refresh

Hi Team, we have one huge streaming table from which we want to create another streaming table, picking a few columns from the original. But in this new table the rows must be unique. Can someone please help me with the imple...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 3 kudos

Hi @MyTrh, OK, I think I created a use case similar to yours. I have a streaming table with a column structure based on your example: CREATE OR REFRESH STREAMING TABLE clicks_raw AS SELECT *, current_timestamp() as load_time FROM cloud_files('/Volumes/dev/d...
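In PySpark terms, the deduplicated downstream stream the reply describes might look like the following sketch (the table and column names follow the clicks example in the reply and are otherwise hypothetical):

```python
def deduped_clicks(spark):
    # Read the upstream streaming table and keep one row per (user_id, url).
    # The watermark bounds how long dropDuplicates must retain state,
    # so the state store does not grow without limit.
    return (spark.readStream.table("clicks_raw")
            .withWatermark("load_time", "1 hour")
            .select("user_id", "url", "load_time")
            .dropDuplicates(["user_id", "url"]))
```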

6 More Replies
Soma
by Valued Contributor
  • 1507 Views
  • 3 replies
  • 1 kudos

Resolved! Where does custom state store the data

There are a couple of custom state functions, like mapGroupsWithState and applyInPandasWithState, which maintain internal state. Is it maintained in the same state store (RocksDB) as the aggregation state store functions?

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @somanath Sankaran, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best ans...

2 More Replies
Box_clown
by New Contributor II
  • 1350 Views
  • 3 replies
  • 3 kudos

Set Not null changes Data type

Hello, just found this issue this week and thought I would ask. An ALTER TABLE ALTER COLUMN SET NOT NULL is changing a varchar(x) data type to string type. I believe this should happen in most environments, so I wouldn't need to supply code... Create a ...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 3 kudos

Hi @Box_clown, to be precise: the Delta Lake format is based on Parquet files, and for strings Parquet has only one data type, StringType. So basically the varchar(n) data type is, under the hood, represented as a string with a check constraint on the length of the st...
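A small repro sketch of the observation (table name hypothetical, to be run against a Unity Catalog schema): after the ALTER, DESCRIBE may surface the column as plain string, while the underlying length check constraint still enforces varchar semantics:

```python
def not_null_varchar_repro(spark):
    # varchar(10) is stored as a Parquet string plus a length check constraint,
    # so metadata commands may report the column type as string after ALTER
    spark.sql("CREATE OR REPLACE TABLE demo_t (c VARCHAR(10))")
    spark.sql("ALTER TABLE demo_t ALTER COLUMN c SET NOT NULL")
    return spark.sql("DESCRIBE TABLE demo_t").collect()
```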

2 More Replies
acegerace
by New Contributor II
  • 829 Views
  • 1 reply
  • 1 kudos

RLS

When applying a function to a table for RLS, do users require SELECT privileges on the table used for RLS? And do users also require EXECUTE privileges on the function? This isn't clear from the docs.

Latest Reply
mahfooz_iiitian
New Contributor III
  • 1 kudos

Yes, you require SELECT permission on the table. For functions: if it is a built-in function (such as is_account_group_member), then you do not require permission. However, if it is a custom function, you must have access to execute it. You can refer ...
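For illustration, a row-filter setup might look like the following sketch (the function, table, and group names are hypothetical; is_account_group_member is the built-in the reply mentions). Readers then need SELECT on the table plus EXECUTE on the custom filter function:

```python
def apply_row_filter(spark):
    # Custom SQL UDF used as a row filter: members of 'admins' see everything,
    # everyone else only sees rows where region = 'US'
    spark.sql("""
        CREATE OR REPLACE FUNCTION region_filter(region STRING)
        RETURN is_account_group_member('admins') OR region = 'US'
    """)
    # Attach the filter to the table's `region` column
    spark.sql("ALTER TABLE sales SET ROW FILTER region_filter ON (region)")
    # Because region_filter is a custom function, readers need EXECUTE on it
    # (in addition to SELECT on the table)
    spark.sql("GRANT EXECUTE ON FUNCTION region_filter TO `readers`")
```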

ayush25091995
by New Contributor III
  • 417 Views
  • 1 reply
  • 0 kudos

Get queries history run on UC enabled interactive cluster

Hi Team, I want to derive a couple of KPIs, like most frequent queries, top queries, and query type (select, insert, or update), on a UC-enabled interactive cluster. I know we can do this for a SQL warehouse, but what is the way we can do this for an interactive clust...

Latest Reply
ayush25091995
New Contributor III
  • 0 kudos

@Retired_mod, this table will only have the query history for SQL warehouse clusters; I need it for UC interactive/all-purpose clusters.

thiagoawstest
by Contributor
  • 3735 Views
  • 1 reply
  • 0 kudos

Save file to /tmp

Hello, I have Python code that collects data in JSON and sends it to an S3 bucket; everything works fine. But when there is a lot of data, it causes memory overflow. So I want to save locally, for example in /tmp or dbfs:/tmp, and after sending it to ...

Latest Reply
Rishabh_Tiwari
Databricks Employee
  • 0 kudos

Hi @thiagoawstest , Thank you for reaching out to our community! We're here to help you.  To ensure we provide you with the best support, could you please take a moment to review the response and choose the one that best answers your question? Your f...
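The memory-friendly pattern the question describes can be sketched as: stream records to a local file one at a time, then upload the file in a second step (the upload, e.g. via boto3, is left as a comment since it needs AWS credentials; the file name is hypothetical):

```python
import json
import os
import tempfile

def spool_json_to_tmp(records, path=None):
    # Write one JSON object per line (JSON Lines), so memory use stays flat
    # no matter how many records there are
    path = path or os.path.join(tempfile.gettempdir(), "payload.jsonl")
    with open(path, "w") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")
    return path
    # Then upload the file without re-reading it into memory, e.g.:
    # boto3.client("s3").upload_file(path, "my-bucket", "payload.jsonl")
```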

Mathias_Peters
by Contributor
  • 1107 Views
  • 2 replies
  • 3 kudos

Resolved! Service principal seemingly cannot access its own workspace folder

We have implemented an asset bundle (DAB) that creates a wheel. During DAB deployment, the wheel is built and stored in the folder of the service principal running the deployment via a GH workflow. The full path is /Workspace/Users/SERVICE-PRINCIPAL-ID/...

Latest Reply
Rishabh_Tiwari
Databricks Employee
  • 3 kudos

Thank you for sharing the solution that worked for you; I am sure it will help other community members. Thanks, Rishabh

1 More Replies
Littlesheep_
by New Contributor
  • 1871 Views
  • 3 replies
  • 0 kudos

How to run a notebook in a .py file in databricks

The situation is that my colleague was using PyCharm and now needs to adapt to Databricks. They are now doing their job by connecting VS Code to Databricks and running the .py file using Databricks clusters. The problem is they want to call a notebook in d...

Latest Reply
Rishabh_Tiwari
Databricks Employee
  • 0 kudos

Hi @Littlesheep_ , Thank you for reaching out to our community! We're here to help you.  To ensure we provide you with the best support, could you please take a moment to review the response and choose the one that best answers your question? Your fe...

2 More Replies
EdwardLui
by New Contributor
  • 577 Views
  • 1 reply
  • 0 kudos

How to extend the retention duration on a streaming table created by DLT

The streaming table created by DLT has a default retention duration of 7 days. We would like to extend it to 60 days. Since we cannot alter the table properties, how can I achieve this change?

Latest Reply
Rishabh_Tiwari
Databricks Employee
  • 0 kudos

Hi @EdwardLui , Thank you for reaching out to our community! We're here to help you.  To ensure we provide you with the best support, could you please take a moment to review the response and choose the one that best answers your question? Your feedb...
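One common route (a sketch, not verified against this exact case) is to set the Delta retention properties on the table definition inside the pipeline itself, since DLT owns the table's properties and ALTER TABLE is blocked. The table names are hypothetical, and the dlt module only exists inside a DLT pipeline:

```python
def define_long_retention_table():
    # Only importable inside a DLT pipeline, hence the lazy import
    import dlt

    @dlt.table(
        name="events_stream",
        table_properties={
            # Delta retention settings, extended from the 7-day defaults
            "delta.logRetentionDuration": "interval 60 days",
            "delta.deletedFileRetentionDuration": "interval 60 days",
        },
    )
    def events_stream():
        return dlt.read_stream("events_raw")
```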

georgecalvert
by New Contributor
  • 1064 Views
  • 2 replies
  • 0 kudos

ConcurrentAppendException Liquid Clustered Table Different Row Concurrent Writes

I have multiple Databricks jobs performing a MERGE command simultaneously into the same liquid clustered table, but for different rows of data, and I am receiving the following error message: [DELTA_CONCURRENT_APPEND] ConcurrentAppendException: Files w...

Latest Reply
Rishabh_Tiwari
Databricks Employee
  • 0 kudos

Hi @georgecalvert , Thank you for reaching out to our community! We're here to help you.  To ensure we provide you with the best support, could you please take a moment to review the response and choose the one that best answers your question? Your f...
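Until the conflict itself is eliminated (for example by aligning the MERGE predicates with the clustering keys so writers touch disjoint files), a common mitigation is to retry the MERGE with backoff. A generic sketch; `run_merge` is any zero-argument callable that performs the actual MERGE:

```python
import random
import time

def merge_with_retry(run_merge, max_attempts: int = 5, base_delay: float = 2.0):
    # Retry a MERGE that fails with a concurrency conflict, using exponential
    # backoff plus jitter so competing jobs drift apart over time
    for attempt in range(max_attempts):
        try:
            return run_merge()
        except Exception as exc:
            # Re-raise anything that is not a concurrency conflict,
            # and give up after the final attempt
            if "ConcurrentAppend" not in str(exc) or attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt) * (0.5 + random.random()))
```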

1 More Replies
ibrar_aslam
by New Contributor
  • 808 Views
  • 1 reply
  • 0 kudos

Delta live table not refreshing - window function

We have a list of streaming tables populated by Autoloader from files on S3, which serve as sources for our live tables. After the Autoloader Delta pipeline completes, we trigger a second Delta Live Tables (DLT) pipeline to perform a deduplication op...

Latest Reply
Rishabh_Tiwari
Databricks Employee
  • 0 kudos

Hi @ibrar_aslam , Thank you for reaching out to our community! We're here to help you.  To ensure we provide you with the best support, could you please take a moment to review the response and choose the one that best answers your question? Your fee...

