Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

spicysheep
by New Contributor
  • 705 Views
  • 3 replies
  • 2 kudos

Where to find comprehensive docs on databricks.yaml / DAB settings options

Where can I find documentation on how to set cluster settings (e.g., AWS instance type, spot vs on-demand, number of machines) in Databricks Asset Bundle databricks.yaml files? The only documentation I've come across mentions these things indirectly, ...

Latest Reply
Rishabh_Tiwari
Community Manager
  • 2 kudos

Hi @spicysheep , Thank you for reaching out to our community! We're here to help you.  To ensure we provide you with the best support, could you please take a moment to review the response and choose the one that best answers your question? Your feed...

2 More Replies
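The accepted answer isn't shown here, but as a sketch of what the question is after: in a bundle, cluster settings live under the Jobs API's `new_cluster` schema inside `resources.jobs`. The job name, instance type, and worker counts below are illustrative assumptions, not values from the thread:

```yaml
# Hypothetical databricks.yaml fragment: key names follow the
# Jobs API `new_cluster` schema; values are illustrative only.
resources:
  jobs:
    example_job:
      name: example_job
      job_clusters:
        - job_cluster_key: main
          new_cluster:
            spark_version: 15.4.x-scala2.12
            node_type_id: m5d.xlarge            # AWS instance type
            num_workers: 2                      # number of machines
            aws_attributes:
              availability: SPOT_WITH_FALLBACK  # spot vs on-demand
              first_on_demand: 1                # keep the driver on-demand
```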
Maatari
by New Contributor III
  • 818 Views
  • 2 replies
  • 0 kudos

Resolved! How to monitor Kafka consumption / lag when working with spark structured streaming?

I have just found out that Spark Structured Streaming does not commit offsets to Kafka but uses its internal checkpoint system, and that there is no way to visualize its consumption lag in a typical Kafka UI - https://community.databricks.com/t5/data-engineering/c...

Latest Reply
Rishabh_Tiwari
Community Manager
  • 0 kudos

Hi @Maatari , Thank you for reaching out to our community! We're here to help you.  To ensure we provide you with the best support, could you please take a moment to review the response and choose the one that best answers your question? Your feedbac...

1 More Replies
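Since the checkpoint, not Kafka, holds the committed offsets, lag has to be computed by comparing the offsets the query last processed (available from the query's progress reports) with the topic's end offsets from the broker. A minimal pure-Python sketch of that comparison; the function name and dict shapes are hypothetical, not a Spark or Kafka API:

```python
def consumer_lag(processed_offsets, end_offsets):
    """Per-partition lag: broker end offset minus last processed offset.

    Both arguments map partition id -> offset, e.g. as parsed from a
    streaming query's progress report and a Kafka admin client.
    Partitions never seen by the query count as fully lagging.
    """
    return {p: end_offsets[p] - processed_offsets.get(p, 0)
            for p in end_offsets}

# Example: partition 1 is 15 messages behind, partition 0 is caught up.
lag = consumer_lag({0: 100, 1: 85}, {0: 100, 1: 100})
# lag == {0: 0, 1: 15}
```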
thackman
by New Contributor II
  • 4443 Views
  • 5 replies
  • 0 kudos

Databricks cluster random slow start times.

We have a job that runs on single-user job compute because we've had compatibility issues switching to shared compute. Normally the cluster (1 driver, 1 worker) takes five to six minutes to start. This is on Azure, and we only include two small Python l...

Latest Reply
Rishabh_Tiwari
Community Manager
  • 0 kudos

Hi @thackman , Thank you for reaching out to our community! We're here to help you.  To ensure we provide you with the best support, could you please take a moment to review the response and choose the one that best answers your question? Your feedba...

4 More Replies
ashraf1395
by Contributor II
  • 444 Views
  • 1 replies
  • 0 kudos

Spark code not running because of incorrect compute size

I have a dataset of 260 billion records. I need to group by 4 columns and find the sum of four other columns. I increased the driver and worker nodes to E32, with max workers at 40. The job is still stuck in the aggregate step where I'm wri...

Latest Reply
Rishabh_Tiwari
Community Manager
  • 0 kudos

Hi @ashraf1395 , Thank you for reaching out to our community! We're here to help you.  To ensure we provide you with the best support, could you please take a moment to review the response and choose the one that best answers your question? Your feed...

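An aggregate that stalls on hundreds of billions of rows is often a key-skew problem rather than a memory-size problem. A standard mitigation is salting: aggregate on (key, random salt) first to spread a hot key across buckets, then combine the buckets per key. The helper below is a hypothetical pure-Python illustration of that two-phase idea, not Spark code:

```python
import random
from collections import defaultdict

def salted_sum(rows, num_salts=4):
    """Two-phase sum-by-key. Phase 1 aggregates on (key, salt) so a
    single hot key is split across up to num_salts partial buckets;
    phase 2 drops the salt and combines the partials per key."""
    phase1 = defaultdict(int)
    for key, value in rows:
        salt = random.randrange(num_salts)   # spreads one hot key
        phase1[(key, salt)] += value
    phase2 = defaultdict(int)
    for (key, _salt), subtotal in phase1.items():
        phase2[key] += subtotal
    return dict(phase2)

# "a" is the hot key; its 1000 rows are pre-aggregated in up to 4 buckets.
rows = [("a", 1)] * 1000 + [("b", 2)] * 10
totals = salted_sum(rows)
# totals == {"a": 1000, "b": 20}
```

In Spark the same effect comes from adding a salt column before the first groupBy and aggregating twice; adaptive query execution can also handle some skew automatically.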
seefoods
by New Contributor III
  • 249 Views
  • 1 replies
  • 1 kudos

audit log for workspace users

Hello everyone, how can I retrieve the execution trace of a notebook for users in a Databricks GCP workspace? Thanks

Latest Reply
szymon_dybczak
Contributor III
  • 1 kudos

Hi @seefoods, I think you can use system tables to get such information: https://docs.databricks.com/en/admin/system-tables/audit-logs.html

WAHID
by New Contributor II
  • 285 Views
  • 0 replies
  • 0 kudos

GDAL on Databricks serverless compute

I am wondering if it's possible to install and use GDAL on Databricks serverless compute. I couldn't manage to do that using pip install gdal, and I discovered that init scripts are not supported on serverless compute.

mr_robot
by New Contributor
  • 623 Views
  • 3 replies
  • 3 kudos

Update datatype of a column in a table

I have a table in Databricks with fields name: string, id: string, orgId: bigint, metadata: struct. Now I want to rename one of the columns and change its type. In my case I want to update orgId to orgIds and change its type to map<string, string>. One...

Data Engineering
tables delta-tables
Latest Reply
jacovangelder
Honored Contributor
  • 3 kudos

You can use REPLACE COLUMNS: ALTER TABLE your_table_name REPLACE COLUMNS (name STRING, id BIGINT, orgIds MAP<STRING, STRING>, metadata STRUCT<...>);

2 More Replies
ashraf1395
by Contributor II
  • 423 Views
  • 1 replies
  • 1 kudos

Resolved! Querying a MySQL db from Azure Databricks where public access is disabled

Hi there, we are trying to set up an infra that ingests data from MySQL hosted on an AWS EC2 instance with PySpark and Azure Databricks and dumps it to ADLS storage. Since the database has public accessibility disabled, how can I interact with MySQL from Azu...

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

You will need some kind of tunnel that opens the db server to external access. Perhaps a VPN is an option? If not, it won't be possible. An alternative would be to have some local LAN tool extract the data and then move it to S3/... and afterwards let ...

vvzadvor
by New Contributor III
  • 632 Views
  • 4 replies
  • 1 kudos

Resolved! Debugging python code outside of Notebooks

Hi experts, does anyone know if there's a way of properly debugging Python code outside of notebooks? We have a complicated Python-based framework for loading files, transforming them according to the business specification, and saving the results into ...

Latest Reply
vvzadvor
New Contributor III
  • 1 kudos

OK, I can now confirm that remote debugging with stepping into your own libraries installed on the cluster is possible and is actually pretty convenient using a combination of databricks-connect Python library and a Databricks extension for VSCode. S...

3 More Replies
ashraf1395
by Contributor II
  • 453 Views
  • 1 replies
  • 2 kudos

Resolved! Reading a materialised view locally or using databricks api

Hi there, this was my previous approach: I had a Databricks notebook with a bronze-level streaming table reading data from volumes, which created 2 downstream tables - first a gold-level materialised view, and second a table for storing ingestion_meta...

Latest Reply
ashraf1395
Contributor II
  • 2 kudos

I used this approach: querying the materialised view using a Databricks serverless SQL endpoint by connecting it with SQL Connect. It's working right now. If I face any issues, I will write it into a normal table and Delta Share it. Thanks for your repl...

MyTrh
by New Contributor III
  • 1231 Views
  • 7 replies
  • 3 kudos

Resolved! Delta table with unique columns incremental refresh

Hi Team,We have one huge streaming table from which we want to create another streaming table in which we will pick few columns from the original streaming table. But in this new table the rows must be unique.Can someone please help me with the imple...

Latest Reply
szymon_dybczak
Contributor III
  • 3 kudos

Hi @MyTrh, OK, I think I created a similar use case to yours. I have a streaming table with a column structure based on your example: CREATE OR REFRESH STREAMING TABLE clicks_raw AS SELECT *, current_timestamp() as load_time FROM cloud_files('/Volumes/dev/d...

6 More Replies
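The core requirement here, keep only the first row seen for each unique key, is what deduplication on a subset of columns does (e.g. dropDuplicates in Spark). A pure-Python sketch of those semantics; the function name and sample rows are hypothetical:

```python
def keep_first_per_key(rows, key_fields):
    """Emit only the first row seen for each unique combination of
    key_fields, mirroring dropDuplicates on a subset of columns."""
    seen = set()
    out = []
    for row in rows:
        key = tuple(row[f] for f in key_fields)
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out

clicks = [
    {"user": "u1", "page": "home", "ts": 1},
    {"user": "u1", "page": "home", "ts": 2},  # duplicate key, dropped
    {"user": "u2", "page": "home", "ts": 3},
]
unique = keep_first_per_key(clicks, ["user", "page"])
# len(unique) == 2; the ts=2 row is dropped
```

In a streaming pipeline the "seen" set corresponds to state the engine must keep, which is why watermarking usually accompanies streaming deduplication.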
Soma
by Valued Contributor
  • 1214 Views
  • 3 replies
  • 1 kudos

Resolved! Where does custom state store the data

There are a couple of custom state functions, like mapGroupsWithState and applyInPandasWithState, which maintain an internal state. Is it maintained in the same state store (RocksDB) as the aggregation state store functions?

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @somanath Sankaran, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best ans...

2 More Replies
Box_clown
by New Contributor II
  • 706 Views
  • 3 replies
  • 3 kudos

Set Not null changes Data type

Hello, just found this issue this week and thought I would ask. An ALTER TABLE ... ALTER COLUMN ... SET NOT NULL is changing a varchar(x) data type to string type. I believe this should happen in most environments, so I wouldn't need to supply code... Create a ...

Latest Reply
szymon_dybczak
Contributor III
  • 3 kudos

Hi @Box_clown, to be precise, the Delta Lake format is based on Parquet files. For strings, Parquet only has one data type: StringType. So basically the varchar(n) data type under the hood is represented as a string with a check constraint on the length of the st...

2 More Replies
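The behavior described in the reply can be illustrated in plain Python: the stored value is an ordinary, unbounded string, and only a separate check enforces the declared length, just as varchar(n) becomes string plus a length constraint. The function name is hypothetical:

```python
def check_varchar(value, max_len):
    """Mimic a VARCHAR(n) column stored as a plain string with a length
    check constraint: the string type itself is unbounded; only the
    constraint rejects over-long values on write."""
    if len(value) > max_len:
        raise ValueError(f"value exceeds VARCHAR({max_len}) constraint")
    return value  # stored as an ordinary string

check_varchar("ok", 10)                 # passes the constraint
# check_varchar("toolongvalue", 5) would raise ValueError
```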
acegerace
by New Contributor II
  • 569 Views
  • 1 replies
  • 1 kudos

RLS

When applying a function to a table for RLS, do users require SELECT privileges on the table used for RLS? And do users also require EXECUTE privileges on the function? This isn't clear from the docs.

Latest Reply
mahfooz_iiitian
New Contributor II
  • 1 kudos

Yes, you require SELECT permission on the table. For functions, if it is a built-in function (such as is_account_group_member), then you do not require permission. However, if it is a custom function, you must have access to execute it. You can refer ...

