Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

PraveenReddy21
by New Contributor III
  • 580 Views
  • 7 replies
  • 2 kudos

Resolved! I created an external database but am unable to transfer a table to the storage account (Blob container - Gold)

Hi, I completed the Bronze and Silver activities, but when I try to save the table to the Gold container I am unable to store it. I created an external database. I want to store the data as PARQUET, but that is not supported, only DELTA. Only MANAGED LOCATION is supported, but unabl...

Latest Reply
PraveenReddy21
New Contributor III
  • 2 kudos

Thank You  Rishabh.

6 More Replies
Filippo
by New Contributor
  • 178 Views
  • 0 replies
  • 0 kudos

Issue with View Ownership Reassignment in Unity Catalog

Hello, It appears that the ownership rules for views and functions in Unity Catalog do not align with the guidelines provided in the “Manage Unity Catalog object ownership” documentation on Microsoft Learn. When attempting to reassign the ownership of ...

KosmaS
by New Contributor III
  • 340 Views
  • 2 replies
  • 0 kudos

Skewness / Salting with countDistinct

Hey everyone, I'm experiencing data skew on: df = (source_df .unionByName(source_df.withColumn("region", lit("Country"))) .groupBy("zip_code", "region", "device_type") .agg(countDistinct("device_id").alias("total_active_unique"), count("device_id").a...

Screenshot 2024-08-05 at 17.24.08.png
Latest Reply
KosmaS
New Contributor III
  • 0 kudos

Hey @Kaniz_Fatma, thanks for the reply. I spent some time on your response. You're suggesting 'double aggregation', and my guess is it should look more or less like this: df = (source_df .unionByName(source_df.withColumn("region", lit("Cou...

1 More Replies
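A minimal sketch of the two-phase ("double aggregation") approach discussed in this thread, assuming a source_df with the columns from the excerpt (zip_code, region, device_type, device_id); the salt bucket count is an illustrative value, not taken from the thread:

from pyspark.sql import functions as F

N_SALT = 16  # assumed bucket count; tune to the observed skew

df = (
    source_df
    # Salt on a hash of device_id so every device lands in exactly one bucket;
    # that keeps the distinct counts exact when the partial results are summed.
    .withColumn("salt", F.abs(F.hash("device_id")) % N_SALT)
    # Phase 1: aggregate per (group keys + salt) so a hot group is spread over many tasks.
    .groupBy("zip_code", "region", "device_type", "salt")
    .agg(
        F.countDistinct("device_id").alias("partial_unique"),
        F.count("device_id").alias("partial_count"),
    )
    # Phase 2: collapse the salt buckets back to the original grouping.
    .groupBy("zip_code", "region", "device_type")
    .agg(
        F.sum("partial_unique").alias("total_active_unique"),
        F.sum("partial_count").alias("total_count"),
    )
)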
rameshybr
by New Contributor II
  • 402 Views
  • 4 replies
  • 0 kudos

DQ quality check - what is the best method to validate two Parquet files?

DQ quality check: we have to validate the data between the landing data and the bronze data. Below are the data quality checks: 1. Compare the counts between the two files; if they match, go to point 2. 2. If the counts match, then va...

Latest Reply
Rishabh-Pandey
Esteemed Contributor
  • 0 kudos

Try this; it covers the second point, assuming the first point (the counts) already matches. # Define key columns key_columns = ["key_column1", "key_column2"] # Adjust according to your data schema # Perform an outer join to find mismatches joined_df = landing_df.ali...

3 More Replies
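A minimal sketch of the row-level comparison outlined above, assuming two Parquet locations and hypothetical key column names; it uses exceptAll on the key columns rather than the outer join from the reply:

# Hypothetical paths and key columns -- adjust to your schema.
landing_df = spark.read.parquet("/mnt/landing/my_table")
bronze_df = spark.read.parquet("/mnt/bronze/my_table")
key_columns = ["key_column1", "key_column2"]

# Check 1: row counts.
landing_count = landing_df.count()
bronze_count = bronze_df.count()
print(f"landing={landing_count}, bronze={bronze_count}, match={landing_count == bronze_count}")

# Check 2: keys present on one side but not the other.
only_in_landing = landing_df.select(key_columns).exceptAll(bronze_df.select(key_columns))
only_in_bronze = bronze_df.select(key_columns).exceptAll(landing_df.select(key_columns))
print("keys only in landing:", only_in_landing.count())
print("keys only in bronze:", only_in_bronze.count())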
CaptainJack
by New Contributor III
  • 266 Views
  • 1 replies
  • 0 kudos

Upload files from Databricks to Google Drive

Is it possible to upload files from Databricks to Google Drive? How?

Latest Reply
daniel_sahal
Esteemed Contributor
  • 0 kudos

@CaptainJack You can use Python + Google Drive API. Example: https://medium.com/the-team-of-future-learning/integrating-google-drive-api-with-python-a-step-by-step-guide-7811fcd16c44

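A minimal sketch of that approach, assuming google-api-python-client and google-auth are installed on the cluster and a service account JSON key with access to the target Drive folder is available; the paths, folder ID, and file names are placeholders:

from google.oauth2 import service_account
from googleapiclient.discovery import build
from googleapiclient.http import MediaFileUpload

SCOPES = ["https://www.googleapis.com/auth/drive.file"]
SERVICE_ACCOUNT_FILE = "/dbfs/FileStore/keys/drive-service-account.json"  # placeholder path
DRIVE_FOLDER_ID = "<google-drive-folder-id>"                              # placeholder ID
LOCAL_FILE = "/dbfs/FileStore/exports/report.csv"                         # placeholder file

# Authenticate with the service account and build the Drive v3 client.
creds = service_account.Credentials.from_service_account_file(SERVICE_ACCOUNT_FILE, scopes=SCOPES)
drive = build("drive", "v3", credentials=creds)

# Upload the local file into the shared Drive folder.
file_metadata = {"name": "report.csv", "parents": [DRIVE_FOLDER_ID]}
media = MediaFileUpload(LOCAL_FILE, mimetype="text/csv")
uploaded = drive.files().create(body=file_metadata, media_body=media, fields="id").execute()
print("Uploaded file id:", uploaded.get("id"))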
shahabm
by New Contributor II
  • 1275 Views
  • 2 replies
  • 1 kudos

Resolved! Databricks job keeps failing due to a GC issue

There is a job that used to run successfully, but for more than a month we have been experiencing long runs that end up failing. In the stdout log file (attached), there are numerous messages like: [GC (Allocation Failure) [PSYoungGen:...] and [Full GC ...

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @shahabm, to resolve this, try increasing executor memory, enabling off-heap memory, and experimenting with the G1GC garbage collector. Check for data skew and optimize partitioning to balance the load, and ensure adequate resources to avoid executor decom...

1 More Replies
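For reference, example Spark settings along the lines suggested above; the values are illustrative and need to be sized to the workload, and the JVM options have to go into the cluster's Spark config rather than being changed at runtime:

# Illustrative values only -- set these in the cluster's Spark config.
gc_tuning_confs = {
    # Switch executors and driver from the default collector to G1GC.
    "spark.executor.extraJavaOptions": "-XX:+UseG1GC",
    "spark.driver.extraJavaOptions": "-XX:+UseG1GC",
    # Move part of the working set off the JVM heap.
    "spark.memory.offHeap.enabled": "true",
    "spark.memory.offHeap.size": "4g",
    # More, smaller shuffle partitions can reduce per-task memory pressure.
    "spark.sql.shuffle.partitions": "400",
}
for key, value in gc_tuning_confs.items():
    print(f"{key} {value}")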
ArjunS310
by New Contributor III
  • 1262 Views
  • 4 replies
  • 3 kudos

Resolved! Did not receive a badge upon completing databricks fundamentals assessment

Team, I completed the training and assessment for Databricks Fundamentals and passed with 80%. I received a certificate of completion but did not receive a badge as mentioned in the description of the course. Could you please help?

Latest Reply
Danny_Lee
Contributor III
  • 3 kudos

I find the badge usually arrives the next day with a link to https://credentials.databricks.com/, where you can download a certificate and share the badge on social media.

3 More Replies
KuruDev
by New Contributor II
  • 236 Views
  • 1 replies
  • 0 kudos

Databricks Asset Bundle - Not fully deploying in Azure Pipeline

 Hello Community, I'm encountering a challenging issue with my Azure Pipeline and I'm hoping someone here might have some insights. I'm attempting to deploy a Databricks bundle that includes both notebooks and workflow YAML files. When deploying the ...

Latest Reply
KuruDev
New Contributor II
  • 0 kudos

It's not that the workflow YAML files are missing; I meant that it doesn't create my workflows.

tonypiazza
by New Contributor II
  • 179 Views
  • 0 replies
  • 0 kudos

Databricks Asset Bundle - Job Cluster - JDBC HTTP Path

I am currently working on deploying dbt jobs using a Databricks Asset Bundle. In my existing job configuration, I am using an all-purpose cluster and the JDBC HTTP Path was manually copied from the web UI. Now that I am trying to switch to using a jo...

shan_chandra
by Esteemed Contributor
  • 163 Views
  • 0 replies
  • 0 kudos

How to calculate the individual file count, file size and number of rows on a Delta table?

There are instances where we need to know the individual file sizes or the file count of a Delta table rather than the average size. We can use the query below to determine that. %sql select count(*) as rows, file_path, file_size from (select * ...

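A PySpark sketch of the same idea, assuming a runtime recent enough to expose the hidden _metadata column on Delta tables; the table name is a placeholder:

from pyspark.sql import functions as F

df = spark.table("catalog.schema.my_delta_table")  # placeholder table name

# One row per data file, with its size and the number of rows it holds.
per_file = (
    df.select(
        F.col("_metadata.file_path").alias("file_path"),
        F.col("_metadata.file_size").alias("file_size"),
    )
    .groupBy("file_path", "file_size")
    .agg(F.count(F.lit(1)).alias("rows"))
)

per_file.orderBy(F.desc("file_size")).show(truncate=False)
print("file count:", per_file.count())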
rameshybr
by New Contributor II
  • 263 Views
  • 2 replies
  • 0 kudos

How to get files one by one from blob storage using PySpark/Python

How do I write PySpark/Python code to get the files one by one from blob storage?

Latest Reply
Rishabh-Pandey
Esteemed Contributor
  • 0 kudos

@rameshybr # List files in a directory files = dbutils.fs.ls("/mnt/<mount-name>/path/to/directory") for file in files: file_path = file.path # Read each file into a DataFrame; if your file format is Parquet, for example, I am taking df = s...

1 More Replies
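Expanding the truncated snippet above into a runnable sketch, assuming the container is mounted under /mnt/<mount-name> and the files are Parquet; adjust the path and format to your layout:

files = dbutils.fs.ls("/mnt/<mount-name>/path/to/directory")

for f in files:
    # Skip subdirectories and anything that is not a Parquet file.
    if f.isDir() or not f.name.endswith(".parquet"):
        continue
    print(f"processing {f.path} ({f.size} bytes)")
    df = spark.read.parquet(f.path)
    # Per-file processing goes here, e.g. validation or an append into the bronze layer.
    print("rows:", df.count())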
alxsbn
by New Contributor III
  • 3195 Views
  • 5 replies
  • 7 kudos

How to change the SQL editor / schema browser default catalog / database

In the SQL editor / schema browser, is there a way to change the default catalog / database? Mine is always fixed on my Unity Catalog.

Latest Reply
Debayan
Esteemed Contributor III
  • 7 kudos

Hi, from the dropdown you can browse the data objects: https://docs.databricks.com/sql/user/queries/queries.html#browse-data-objects-in-sql-editor Please let us know if this helps. Also, please tag @Debayan with your next comment so that I will get notif...

4 More Replies
ashraf1395
by Contributor II
  • 160 Views
  • 0 replies
  • 0 kudos

Schema issue while fetching data from Oracle

I don't have the complete context of the issue, but here is what I know; a friend of mine is facing this: "I am fetching data from Oracle into Databricks using Python, but every time I do it the schema changes, so if the column is of type decimal f...

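One common way to stop the inferred types from drifting between runs is to pin them with the JDBC reader's customSchema option; a minimal sketch, with placeholder connection details and column names:

jdbc_url = "jdbc:oracle:thin:@//<host>:1521/<service_name>"  # placeholder connection string

df = (
    spark.read.format("jdbc")
    .option("url", jdbc_url)
    .option("dbtable", "SCHEMA_NAME.TABLE_NAME")        # placeholder table
    .option("user", "<user>")
    .option("password", "<password>")
    .option("driver", "oracle.jdbc.driver.OracleDriver")
    # Pin the column types instead of relying on what the driver reports,
    # so a decimal column keeps the same precision/scale on every run.
    .option("customSchema", "AMOUNT DECIMAL(38, 10), CUSTOMER_ID DECIMAL(38, 0)")
    .load()
)
df.printSchema()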

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group