Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

spicysheep
by New Contributor
  • 1553 Views
  • 3 replies
  • 2 kudos

Where to find comprehensive docs on databricks.yaml / DAB settings options

Where can I find documentation on how to set cluster settings (e.g., AWS instance type, spot vs. on-demand, number of machines) in Databricks Asset Bundle databricks.yaml files? The only documentation I've come across mentions these things indirectly, ...
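For reference, cluster settings in a bundle generally follow the Jobs API `new_cluster` schema. A minimal sketch of a databricks.yaml fragment (the job name, instance type, and worker counts here are illustrative assumptions, not recommendations):

```yaml
# Sketch: a job with an inline job cluster in databricks.yaml.
# Instance type, worker count, and names below are placeholder assumptions.
resources:
  jobs:
    my_job:
      name: my_job
      job_clusters:
        - job_cluster_key: main
          new_cluster:
            spark_version: 15.4.x-scala2.12
            node_type_id: m5d.2xlarge           # AWS instance type
            num_workers: 4                       # number of machines
            aws_attributes:
              availability: SPOT_WITH_FALLBACK   # spot vs. on-demand
              first_on_demand: 1
      tasks:
        - task_key: main_task
          job_cluster_key: main
          notebook_task:
            notebook_path: ../src/my_notebook.ipynb
```

Since the bundle format reuses the Jobs API cluster schema, the Jobs API `new_cluster` reference is usually the fullest list of available keys.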

Latest Reply
Rishabh_Tiwari
Databricks Employee
  • 2 kudos

Hi @spicysheep , Thank you for reaching out to our community! We're here to help you.  To ensure we provide you with the best support, could you please take a moment to review the response and choose the one that best answers your question? Your feed...

2 More Replies
inagar
by New Contributor
  • 827 Views
  • 1 reply
  • 0 kudos

Copying file from DBFS to a table of Databricks, Is there a way to get the errors at record level ?

We have a file of data to be ingested into a Databricks table. We are following the approach below: upload the file to DBFS; create a temporary table and load the file into it (CREATE TABLE [USING]); use MERGE INTO to merge the temp_table created in...
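MERGE INTO itself doesn't report per-record errors; one common workaround is to split invalid rows into an errors table before merging. A sketch with assumed table and column names (`staging_tmp`, `target_tbl`, `id`, `amount`):

```sql
-- Sketch: all table/column names here are assumptions for illustration.
CREATE TABLE staging_tmp
USING CSV OPTIONS (header 'true') LOCATION '/FileStore/input/';

-- Capture bad records with a reason instead of failing the whole load.
CREATE TABLE load_errors AS
SELECT *, 'null id' AS error_reason
FROM staging_tmp WHERE id IS NULL
UNION ALL
SELECT *, 'bad amount'
FROM staging_tmp WHERE TRY_CAST(amount AS DECIMAL(18, 2)) IS NULL;

-- Merge only the rows that passed validation.
MERGE INTO target_tbl t
USING (
  SELECT * FROM staging_tmp
  WHERE id IS NOT NULL
    AND TRY_CAST(amount AS DECIMAL(18, 2)) IS NOT NULL
) s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;
```

The validation predicates are whatever defines a "bad record" for your data; `TRY_CAST` returns NULL on conversion failure, which makes it convenient for this split.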

Latest Reply
Rishabh_Tiwari
Databricks Employee
  • 0 kudos

Hi @inagar , Thank you for reaching out to our community! We're here to help you.  To ensure we provide you with the best support, could you please take a moment to review the response and choose the one that best answers your question? Your feedback...

Maatari
by New Contributor III
  • 1927 Views
  • 2 replies
  • 0 kudos

Resolved! How to monitor Kafka consumption / lag when working with spark structured streaming?

I have just found out that Spark Structured Streaming does not commit offsets to Kafka but uses its own internal checkpoint system, and that there is no way to visualize its consumption lag in a typical Kafka UI - https://community.databricks.com/t5/data-engineering/c...
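Although offsets aren't committed back to Kafka, each micro-batch's progress does expose processed and head offsets, from which lag can be derived. A sketch in plain Python (the field shapes follow the Kafka source's progress JSON, e.g. `query.lastProgress["sources"][0]`):

```python
# Sketch: compute per-partition Kafka lag from a Structured Streaming
# progress event. "endOffset" = last processed offsets, "latestOffset" =
# head of each topic-partition, both as {topic: {partition: offset}}.
def kafka_lag(progress_source):
    processed = progress_source["endOffset"]
    latest = progress_source["latestOffset"]
    lag = {}
    for topic, partitions in latest.items():
        for partition, head in partitions.items():
            done = processed.get(topic, {}).get(partition, 0)
            lag[f"{topic}-{partition}"] = head - done
    return lag

# Example progress snippet with assumed topic name and offsets.
source = {
    "endOffset":    {"clicks": {"0": 120, "1": 95}},
    "latestOffset": {"clicks": {"0": 130, "1": 95}},
}
print(kafka_lag(source))  # {'clicks-0': 10, 'clicks-1': 0}
```

A function like this could run inside a `StreamingQueryListener` or a small polling loop, and emit the lag to whatever metrics system you already use.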

Latest Reply
Rishabh_Tiwari
Databricks Employee
  • 0 kudos

Hi @Maatari , Thank you for reaching out to our community! We're here to help you.  To ensure we provide you with the best support, could you please take a moment to review the response and choose the one that best answers your question? Your feedbac...

1 More Replies
thackman
by New Contributor III
  • 10568 Views
  • 5 replies
  • 0 kudos

Databricks cluster random slow start times.

We have a job that runs on single-user job compute because we've had compatibility issues switching to shared compute. Normally the cluster (1 driver, 1 worker) takes five to six minutes to start. This is on Azure, and we only include two small Python l...

Latest Reply
Rishabh_Tiwari
Databricks Employee
  • 0 kudos

Hi @thackman , Thank you for reaching out to our community! We're here to help you.  To ensure we provide you with the best support, could you please take a moment to review the response and choose the one that best answers your question? Your feedba...

4 More Replies
ashraf1395
by Honored Contributor
  • 753 Views
  • 1 reply
  • 0 kudos

Spark code not running bcz of incorrect compute size

I have a dataset with 260 billion records. I need to group by 4 columns and compute the sum of four other columns. I increased the memory to E32 for the driver and worker nodes, and max workers is 40. The job is still stuck in this aggregate step where I'm wri...
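For a wide aggregation like this, shuffle parallelism usually matters more than node size. A hedged sketch with assumed table and column names (the partition count is only a starting point to tune, and adaptive query execution may coalesce it anyway):

```sql
-- Sketch: names (big_table, k*/v*) and the partition count are assumptions.
-- More shuffle partitions spread the 260B-row shuffle across smaller tasks.
SET spark.sql.shuffle.partitions = 4000;

SELECT k1, k2, k3, k4,
       SUM(v1) AS s1, SUM(v2) AS s2, SUM(v3) AS s3, SUM(v4) AS s4
FROM big_table
GROUP BY k1, k2, k3, k4;
```

If one grouping key is heavily skewed, the stuck tasks are usually the skewed partitions; the Spark UI's stage page (task duration distribution) is the quickest way to confirm that.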

Latest Reply
Rishabh_Tiwari
Databricks Employee
  • 0 kudos

Hi @ashraf1395 , Thank you for reaching out to our community! We're here to help you.  To ensure we provide you with the best support, could you please take a moment to review the response and choose the one that best answers your question? Your feed...

seefoods
by New Contributor III
  • 442 Views
  • 1 reply
  • 1 kudos

audit log for workspace users

Hello everyone, how can I retrieve the execution trace of a notebook for users of a Databricks workspace on GCP? Thanks

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @seefoods, I think you can use system tables to get such information: https://docs.databricks.com/en/admin/system-tables/audit-logs.html
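Building on that, a sketch of a query against the audit log system table (the filter values, including the email address, are placeholders):

```sql
-- Sketch: recent notebook-related audit events for one user.
SELECT event_time, user_identity.email, action_name, request_params
FROM system.access.audit
WHERE service_name = 'notebook'
  AND user_identity.email = 'user@example.com'  -- placeholder
ORDER BY event_time DESC
LIMIT 100;
```

Access to system tables must be granted by an account admin, and audit events typically appear with some delay rather than in real time.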

WAHID
by New Contributor II
  • 486 Views
  • 0 replies
  • 0 kudos

GDAL on Databricks serverless compute

I am wondering if it's possible to install and use GDAL on Databricks serverless compute. I couldn't manage to do that using pip install gdal, and I discovered that init scripts are not supported on serverless compute.

mr_robot
by New Contributor
  • 2473 Views
  • 3 replies
  • 3 kudos

Update datatype of a column in a table

I have a table in Databricks with fields name: string, id: string, orgId: bigint, metadata: struct. Now I want to rename one of the columns and change its type. In my case I want to update orgId to orgIds and change its type to map<string, string>. One...

Data Engineering
tables delta-tables
Latest Reply
jacovangelder
Honored Contributor
  • 3 kudos

You can use REPLACE COLUMNS:

ALTER TABLE your_table_name REPLACE COLUMNS (
  name STRING,
  id BIGINT,
  orgIds MAP<STRING, STRING>,
  metadata STRUCT<...>
);

2 More Replies
ashraf1395
by Honored Contributor
  • 755 Views
  • 1 reply
  • 1 kudos

Resolved! Querying Mysql db from Azure databricks where public access is disabled

Hi there, we are trying to set up infra that ingests data from MySQL hosted on an AWS EC2 instance with PySpark and Azure Databricks, and dumps it to ADLS storage. Since the database has public accessibility disabled, how can I interact with MySQL from Azu...

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

You will need some kind of tunnel that opens the db server to external access. Perhaps a VPN is an option? If not, it won't be possible. An alternative would be to have some local LAN tool extract the data and then move it to S3/... and afterwards let ...

vvzadvor
by New Contributor III
  • 1991 Views
  • 4 replies
  • 1 kudos

Resolved! Debugging python code outside of Notebooks

Hi experts, does anyone know if there's a way of properly debugging Python code outside of notebooks? We have a complicated Python-based framework for loading files, transforming them according to the business specification, and saving the results into ...

Latest Reply
vvzadvor
New Contributor III
  • 1 kudos

OK, I can now confirm that remote debugging, with stepping into your own libraries installed on the cluster, is possible and actually pretty convenient using a combination of the databricks-connect Python library and the Databricks extension for VSCode. S...

3 More Replies
ashraf1395
by Honored Contributor
  • 920 Views
  • 1 reply
  • 2 kudos

Resolved! Reading a materialised view locally or using databricks api

Hi there, this was my previous approach: I had a Databricks notebook with a bronze-level streaming table reading data from volumes, which created 2 downstream tables - 1st, a gold-level materialised view; another, a table for storing ingestion_meta...

Latest Reply
ashraf1395
Honored Contributor
  • 2 kudos

I used this approach: querying the materialised view using a Databricks serverless SQL endpoint by connecting it with SQL Connect. It's working right now. If I face any issues, I will write it into a normal table and Delta Share it. Thanks for your repl...

MyTrh
by New Contributor III
  • 2535 Views
  • 7 replies
  • 3 kudos

Resolved! Delta table with unique columns incremental refresh

Hi team, we have one huge streaming table from which we want to create another streaming table that picks a few columns from the original. But in this new table the rows must be unique. Can someone please help me with the imple...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 3 kudos

Hi @MyTrh, OK, I think I created a similar use case to yours. I have a streaming table with a column structure based on your example:

CREATE OR REFRESH STREAMING TABLE clicks_raw AS
SELECT *, current_timestamp() AS load_time
FROM cloud_files('/Volumes/dev/d...

6 More Replies
Soma
by Valued Contributor
  • 1779 Views
  • 3 replies
  • 1 kudos

Resolved! Where does custom state store the data

There are a couple of custom state functions, like mapGroupsWithState and applyInPandasWithState, which maintain an internal state. Is that state maintained in the same state store (RocksDB) as the aggregation state store functions?

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @somanath Sankaran, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best ans...

2 More Replies
Box_clown
by New Contributor II
  • 1784 Views
  • 3 replies
  • 3 kudos

Set Not null changes Data type

Hello, just found this issue this week and thought I would ask. An ALTER TABLE ... ALTER COLUMN ... SET NOT NULL is changing a varchar(x) data type to string type. I believe this should happen in most environments, so I wouldn't need to supply code... Create a ...
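The observation can be checked with a short repro (the table name is assumed; the second DESCRIBE is where the type change reportedly appears):

```sql
-- Sketch: reproduce the reported type change (t_demo is a placeholder name).
CREATE TABLE t_demo (c VARCHAR(10));
DESCRIBE t_demo;                                 -- c: varchar(10)
ALTER TABLE t_demo ALTER COLUMN c SET NOT NULL;
DESCRIBE t_demo;                                 -- reportedly now: string
```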

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 3 kudos

Hi @Box_clown, to be precise, the Delta Lake format is based on Parquet files, and for strings Parquet has only one data type: StringType. So, basically, the varchar(n) data type under the hood is represented as a string with a check constraint on the length of the st...

2 More Replies
acegerace
by New Contributor II
  • 996 Views
  • 1 reply
  • 1 kudos

RLS

When applying a function to a table for RLS, do users require SELECT privileges on the table used for RLS? And do users also require EXECUTE privileges on the function? This is not clear from the docs.

Latest Reply
mahfooz_iiitian
New Contributor III
  • 1 kudos

Yes, you require SELECT permission on the table. For functions, if it is a built-in function (such as is_account_group_member), then you do not require permission. However, if it is a custom function, you must have access to execute it. You can refer ...
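As an illustration, a minimal row-filter setup (the function, table, column, and group names are assumptions): users querying `sales` would need SELECT on it, plus EXECUTE on `region_filter` since it is a custom function:

```sql
-- Sketch: custom row filter; all names here are placeholders.
CREATE FUNCTION region_filter(region STRING)
RETURN IF(IS_ACCOUNT_GROUP_MEMBER('admins'), TRUE, region = 'US');

ALTER TABLE sales SET ROW FILTER region_filter ON (region);
```

Admins in the assumed group see every row; everyone else sees only rows where `region = 'US'`.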

