Knowledge Sharing Hub
Dive into a collaborative space where members like YOU can exchange knowledge, tips, and best practices. Join the conversation today and unlock a wealth of collective wisdom to enhance your experience and drive success.

Forum Posts

SumitSingh
by Contributor
  • 3003 Views
  • 7 replies
  • 9 kudos

From Associate to Professional: My Learning Plan to ace all Databricks Data Engineer Certifications

In today’s data-driven world, the role of a data engineer is critical in designing and maintaining the infrastructure that allows for the efficient collection, storage, and analysis of large volumes of data. Databricks certifications hold significan...

Latest Reply
sandeepmankikar
New Contributor III
  • 9 kudos

As an additional tip for those working towards both the Associate and Professional certifications, I recommend avoiding a long gap between the two exams to maintain your momentum. If possible, try to schedule them back-to-back with just a few days in...

6 More Replies
DouglasMoore
by Databricks Employee
  • 3640 Views
  • 1 reply
  • 1 kudos

How to enable unity catalog system tables?

Unity Catalog system tables provide lots of metadata & log data related to the operations of Databricks. System tables are organized into separate schemas containing one to a few tables owned and updated by Databricks. The storage and the cost of the...
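As a rough illustration of the enabling step, the sketch below builds (but does not send) a request against the Unity Catalog `systemschemas` REST endpoint as I understand it. The host, metastore ID, schema name, and token are placeholders, and `build_enable_request` is a hypothetical helper, not something from the post.

```python
import urllib.request


def build_enable_request(host: str, metastore_id: str, schema_name: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a PUT request that enables one system schema."""
    # Assumed endpoint shape: /api/2.0/unity-catalog/metastores/{id}/systemschemas/{schema}
    url = (
        f"{host.rstrip('/')}/api/2.0/unity-catalog/metastores/"
        f"{metastore_id}/systemschemas/{schema_name}"
    )
    return urllib.request.Request(
        url,
        method="PUT",
        headers={"Authorization": f"Bearer {token}"},
    )


if __name__ == "__main__":
    # Placeholder values; replace with your workspace URL, metastore ID, and token.
    req = build_enable_request("https://example.cloud.databricks.com", "my-metastore-id", "access", "my-token")
    print(req.get_method(), req.full_url)
```

Sending the request would be a matter of passing it to `urllib.request.urlopen`; that step is left out so the sketch has no side effects.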

Latest Reply
NandiniN
Databricks Employee
  • 1 kudos

Hi @DouglasMoore, thank you for sharing this. Until now I have been checking the information schema, and I cannot find these details in any official doc. Do you have a reference?

NandiniN
by Databricks Employee
  • 1065 Views
  • 0 replies
  • 0 kudos

Monitoring a Streaming Job

If you have a streaming job, you need to check the batch metrics to understand the stream's progress. Here are some other suggestions for monitoring a streaming job and spotting one that is stuck in a "hung" state. Streaming Listeners sp...
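One simple check along these lines is to compare the timestamp of the most recent progress event with the current time. The sketch below assumes the dict form of PySpark's `StreamingQuery.lastProgress`, whose `timestamp` field is a UTC string such as `2024-07-19T15:20:02.230Z`. The `is_stalled` helper and the silence threshold are illustrative assumptions, not part of the post.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_stalled(last_progress: Optional[dict], now: datetime, max_silence: timedelta) -> bool:
    """Return True when no progress event has been reported within max_silence."""
    if last_progress is None:
        # No batch has reported progress yet; flag it so the caller can decide.
        return True
    # Progress timestamps are assumed to look like "2024-07-19T15:20:02.230Z" (UTC).
    reported = datetime.strptime(last_progress["timestamp"], "%Y-%m-%dT%H:%M:%S.%fZ")
    reported = reported.replace(tzinfo=timezone.utc)
    return now - reported > max_silence
```

With a running query this could be called as `is_stalled(query.lastProgress, datetime.now(timezone.utc), timedelta(minutes=10))`, where `query` is the `StreamingQuery` handle returned by `writeStream.start()`.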

NandiniN
by Databricks Employee
  • 1001 Views
  • 0 replies
  • 0 kudos

Why configure a job timeout?

If you use Databricks Jobs for your workloads, you may have run into jobs stuck in a "hung" state. Before cancelling the job, it is important to collect the thread dump, as I described here, to be able to f...
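As a minimal sketch of what configuring a timeout can look like, the helper below adds the Jobs API `timeout_seconds` field to a job settings payload. The job name and the two-hour value are made up for illustration, and `with_timeout` is a hypothetical helper.

```python
import json


def with_timeout(job_settings: dict, timeout_seconds: int) -> dict:
    """Return a copy of job_settings with a job-level timeout applied."""
    # This sketch always wants an explicit, positive limit.
    if timeout_seconds <= 0:
        raise ValueError("timeout_seconds must be positive")
    updated = dict(job_settings)
    updated["timeout_seconds"] = timeout_seconds
    return updated


if __name__ == "__main__":
    # Hypothetical job definition; only the timeout field matters here.
    settings = {"name": "nightly-etl", "tasks": []}
    print(json.dumps(with_timeout(settings, 2 * 60 * 60), indent=2))
```

The resulting payload would then be submitted through whichever client you already use to create or update jobs. With a timeout in place, a hung run is ended automatically instead of holding a cluster indefinitely.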

MichTalebzadeh
by Valued Contributor
  • 1273 Views
  • 1 reply
  • 0 kudos

A handy tool called spark-column-analyzer

I just wanted to share a tool I built called spark-column-analyzer. It's a Python package that helps you dig into your Spark DataFrames with ease. Ever spend ages figuring out what's going on in your columns? Like, how many null values are there, or h...

Knowledge Sharing Hub
Generative AI
python
spark
Latest Reply
MichTalebzadeh
Valued Contributor
  • 0 kudos

An example added to the README on GitHub. Doing analysis for column Postcode. JSON-formatted output: {"Postcode": {"exists": true, "num_rows": 93348, "data_type": "string", "null_count": 21921, "null_percentage": 23.48, "distinct_count": 38726, "distinct_percentage"...

Hubert-Dudek
by Esteemed Contributor III
  • 1163 Views
  • 0 replies
  • 2 kudos

VariantType + Parse_json()

In Spark 4.0, there are no more data type mismatches when converting dynamic JSONs, as the new data type VariantType comes with a new function to parse JSONs. Stay tuned for 4.0 release.

youssefmrini
by Databricks Employee
  • 1372 Views
  • 0 replies
  • 1 kudos

Type widening is in Public Preview

You can now enable type widening on tables backed by Delta Lake. Tables with type widening enabled allow changing the type of columns to a wider data type without rewriting underlying data files. For more information: https://docs.databricks.co...
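To make the two steps concrete, the sketch below generates the SQL involved: one statement to enable the `delta.enableTypeWidening` table property and one to widen a column. The table and column names are placeholders, and `type_widening_statements` is a hypothetical helper.

```python
from typing import List


def type_widening_statements(table: str, column: str, new_type: str) -> List[str]:
    """Return the SQL to enable type widening on a table and widen one column."""
    return [
        # Step 1: opt the Delta table in to type widening.
        f"ALTER TABLE {table} SET TBLPROPERTIES ('delta.enableTypeWidening' = 'true')",
        # Step 2: change the column to the wider type; data files are not rewritten.
        f"ALTER TABLE {table} ALTER COLUMN {column} TYPE {new_type}",
    ]


if __name__ == "__main__":
    for stmt in type_widening_statements("main.sales.orders", "quantity", "BIGINT"):
        print(stmt)
```

In a notebook, each statement could be passed to `spark.sql(stmt)`. Which type changes count as widening depends on the runtime version, so check the linked documentation before relying on a particular conversion.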

Yassine_bens
by New Contributor
  • 1202 Views
  • 1 reply
  • 0 kudos

How to convert txt files to delta tables

Hello members of the Databricks community, I am currently working on a project where we collect data from machines, and that data is in .txt format. The data is currently in an Azure container. I need to clean the files and convert them to Delta tables, how ...
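One possible approach, sketched under several assumptions: the .txt files are delimited text with a header row, the separator is `;`, and dropping fully empty rows is the only cleaning needed. None of that is stated in the question, and `txt_to_delta` is a hypothetical helper that expects an existing `SparkSession`.

```python
def txt_to_delta(spark, source_path: str, target_table: str, sep: str = ";") -> None:
    """Read delimited .txt files and save them as a Delta table."""
    df = (
        spark.read
        .option("sep", sep)          # assumed field separator
        .option("header", "true")    # assumed: first line holds column names
        .csv(source_path)            # the CSV reader handles any delimited text file
    )
    # Minimal cleaning step: drop rows where every column is null.
    cleaned = df.dropna(how="all")
    cleaned.write.format("delta").mode("overwrite").saveAsTable(target_table)
```

A call might look like `txt_to_delta(spark, "abfss://<container>@<account>.dfs.core.windows.net/raw/*.txt", "machines.readings")`, with the path and table name adjusted to your environment. If the files are free-form rather than delimited, `spark.read.text` plus explicit parsing would be the alternative.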

Latest Reply
feiyun0112
Honored Contributor
  • 0 kudos

https://docs.databricks.com/en/ingestion/add-data/upload-data.html 

Hubert-Dudek
by Esteemed Contributor III
  • 616 Views
  • 0 replies
  • 0 kudos

RocksDB for storing streaming state

Now, you can keep the state of stateful streaming in RocksDB. For example, retrieving keys from memory to check for duplicate records inside the watermark is now faster. #databricks
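As a small sketch of how this is switched on, the helper below sets the state store provider on a Spark session. The class name shown is the open-source Apache Spark provider; Databricks documents its own provider class, so treat the value as an assumption to verify for your runtime. `use_rocksdb_state_store` is a hypothetical helper.

```python
# Open-source Spark provider class; Databricks runtimes document their own class name.
ROCKSDB_PROVIDER = "org.apache.spark.sql.execution.streaming.state.RocksDBStateStoreProvider"


def use_rocksdb_state_store(spark, provider: str = ROCKSDB_PROVIDER) -> None:
    """Point stateful streaming queries at a RocksDB-backed state store."""
    # Must be set before the streaming query is started.
    spark.conf.set("spark.sql.streaming.stateStore.providerClass", provider)
```

Calling `use_rocksdb_state_store(spark)` before starting a query with `dropDuplicates` and a watermark would keep the deduplication keys in RocksDB.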

legobricks
by New Contributor II
  • 1646 Views
  • 4 replies
  • 0 kudos

Unable to mount GCS bucket with underscores in the name

I have two buckets with the same configurations and labels. One is named my-bucket and the other is my_bucket. I am able to mount my-bucket but get an opaque error message when trying to mount my_bucket. Is this known/expected behavior? Are underscore...

Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

Hi @legobricks, I am curious about the error you are getting. For GCS, https://cloud.google.com/storage/docs/buckets#naming does show that underscores are allowed, but there is also a note below: You can use a bucket name in a DNS record as part of ...
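Following the DNS note in the reply, one thing that can be checked is whether a bucket name is also a valid DNS hostname, since hostname labels allow only letters, digits, and hyphens. The sketch below is a plain regex check under that rule. It does not confirm that this is the cause of the mount error, and `is_dns_safe_bucket_name` is a hypothetical helper.

```python
import re

# One DNS label: 1-63 chars, lowercase letters/digits/hyphens, no leading or trailing hyphen.
_DNS_LABEL = re.compile(r"[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?")


def is_dns_safe_bucket_name(name: str) -> bool:
    """Return True when every dot-separated part of the name is a valid DNS label."""
    return all(_DNS_LABEL.fullmatch(label) is not None for label in name.split("."))


if __name__ == "__main__":
    for bucket in ("my-bucket", "my_bucket"):
        print(bucket, is_dns_safe_bucket_name(bucket))
```

A name that fails this check is still a legal GCS bucket name, but tooling that treats the bucket as a URI host may reject it.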

3 More Replies
MichTalebzadeh
by Valued Contributor
  • 1244 Views
  • 0 replies
  • 0 kudos

Financial Crime detection with the help of Apache Spark, Data Mesh and Data Lake

For those interested in Data Mesh and Data Lakes for FinCrime detection: Data mesh is a relatively new architectural concept for data management that emphasizes domain-driven data ownership and self-service data availability. It promotes the decentral...

Knowledge Sharing Hub
data lakes
Data Mesh
financial crime
spark
Hp3
by New Contributor
  • 1182 Views
  • 0 replies
  • 0 kudos

Hiring Databricks Data Architect roles

Hi, I am a recruiter looking for places to post some Databricks roles I have coming out. I have several fully remote, high-level Databricks architect roles. Of course I will post to LinkedIn, but I was just curious if there are any other pla...

