Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

subhas_hati
by New Contributor
  • 216 Views
  • 1 replies
  • 0 kudos

JOIN Two Big Tables, each being some terabytes.

What is the strategy for joining two big tables, each being some terabytes?

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hi @subhas_hati, Enable AQE to dynamically optimize the join strategy at runtime based on the actual data distribution. This can help in choosing the best join strategy automatically. If you are using Delta tables, you can leverage the MERGE statement...
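
(For readers who want a concrete starting point, here is a minimal PySpark sketch of the AQE suggestion above. The table names, join key, and target table are hypothetical placeholders, not from the original thread.)

    # Minimal sketch, assuming two hypothetical multi-TB Delta tables joined on "customer_id"
    spark.conf.set("spark.sql.adaptive.enabled", "true")            # AQE (enabled by default on recent runtimes)
    spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")   # let AQE split skewed join partitions

    orders = spark.table("main.sales.orders")        # hypothetical large table
    customers = spark.table("main.sales.customers")  # hypothetical large table

    joined = orders.join(customers, on="customer_id", how="inner")
    joined.write.mode("overwrite").saveAsTable("main.sales.orders_enriched")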

vineet_chaure
by New Contributor
  • 386 Views
  • 1 replies
  • 0 kudos

Handling Large Integers and None Values in pandas UDFs on Databricks

Hi Everyone, I hope this message finds you well. I am encountering an issue with pandas UDFs on a Databricks shared cluster and would like to seek assistance from the community. Below is a summary of the problem: Description: I am working with pandas UDF...

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hello @vineet_chaure, By default, Spark converts LongType to float64 when transferring data to pandas. You can use Arrow-optimized pandas UDFs introduced in Apache Spark 3.5. Please try the code below:

    import pandas as pd
    import pyarrow as pa
    from pys...
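
(Since the code in the reply is cut off above, here is a minimal sketch of the Arrow-optimized UDF idea for Spark 3.5+. The column name and sample values are hypothetical; with useArrow=True the UDF receives plain Python ints and None, so large values are not forced into float64.)

    from pyspark.sql.functions import udf
    from pyspark.sql.types import LongType

    @udf(returnType=LongType(), useArrow=True)  # Arrow-optimized Python UDF (Spark 3.5+)
    def add_one(value):
        # Values arrive as Python int or None, so integer precision is preserved
        return None if value is None else value + 1

    df = spark.createDataFrame([(9007199254740993,), (None,)], ["id"])
    df.select("id", add_one("id").alias("id_plus_one")).show(truncate=False)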

TejeshS
by New Contributor III
  • 784 Views
  • 2 replies
  • 2 kudos

How to apply Column masking and RLS on Views in Databricks

Hello Databricks Community, We are working on a use case where we need to apply column masking and row-level filtering on top of existing views or while creating new views dynamically. Currently, we know that Delta tables support column masking and row...

Latest Reply
MadhuB
Valued Contributor
  • 2 kudos

@TejeshS You can alternatively use the mask function instead of hardcoding the value 'Masked':

    CREATE OR REPLACE VIEW masked_employees AS
    SELECT Name, Department,
      CASE WHEN current_user() IN ('ab***@gmail.com', 'xy***@gmail.com') THEN Salary ELSE mask(...
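
(A minimal sketch of that suggestion, run from a notebook via spark.sql; the table and column names are hypothetical, and Salary is cast to STRING because the built-in mask() function operates on strings.)

    spark.sql("""
        CREATE OR REPLACE VIEW masked_employees AS
        SELECT
            Name,
            Department,
            CASE
                WHEN current_user() IN ('ab***@gmail.com', 'xy***@gmail.com')
                    THEN CAST(Salary AS STRING)
                ELSE mask(CAST(Salary AS STRING))   -- digits become 'n' by default
            END AS Salary
        FROM employees
    """)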

1 More Replies
AndyM
by New Contributor II
  • 9263 Views
  • 2 replies
  • 2 kudos

DAB wheel installation job fails, user error Library from /Workspace not allowed

Hi Community! I am getting started with DABs and just recently ran into the following error after deployment, when trying to run my bundle that has a wheel installation job. Error: failed to reach TERMINATED or SKIPPED, got INTERNAL_ERROR: Task main_task fail...

Latest Reply
BillBishop
New Contributor III
  • 2 kudos

Did you try this in your databricks.yml?

    experimental:
      python_wheel_wrapper: true

1 More Replies
Direo
by Contributor II
  • 2296 Views
  • 2 replies
  • 7 kudos

Performance Issue with UC Read from Federated SQL Table vs JDBC Read from SQL Server

Hi everyone, I'm currently facing a significant performance issue when comparing the execution times of a query sent through JDBC versus a similar query executed through Databricks SQL (using Unity Catalog to access a federated SQL table). JDBC Query: j...

Data Engineering
Federated queries
JDBC
performance issue
Unity Catalog
Latest Reply
pdiamond
Contributor
  • 7 kudos

I've found the JDBC query to be faster than the federated query because in our testing, the federated query does not pass down the full query to the source database. Instead, it's running "select * from table", pulling all of the data into Databricks...

1 More Replies
juchom
by New Contributor II
  • 414 Views
  • 4 replies
  • 0 kudos

Error when creating a compute resource with runtime 16.1

Hello, is there any restriction on Community Edition runtimes? When I try to create a compute resource with runtime 16.1, it takes a long time and then always ends with a failure like the one in the following screenshot. Thanks for your help,

[Attached screenshot: juchom_0-1739270362738.png]
Latest Reply
juchom
New Contributor II
  • 0 kudos

Thanks for the details, are you using any type of container (Docker) image? Or any special setting on the cluster? Nothing special, I just click the create compute button, then select 16.1 from the dropdown and click the create compute button again. If you ...

3 More Replies
jv_v
by Contributor
  • 1689 Views
  • 7 replies
  • 2 kudos

Resolved! Issue with Installing Remorph Reconcile Tool and Compatibility Clarification

I am currently working on a table migration project from a source Hive Metastore workspace to a target Unity Catalog workspace. After migrating the tables, I intend to write table validation scripts using the Remorph Reconcile tool. However, I am enc...

Latest Reply
VINTER_S
New Contributor II
  • 2 kudos

Try checking the Python path during installation by running where python or a similar command. It should show the correct installation folder, which is generally the Scripts folder on Windows and the bin folder on macOS.

6 More Replies
Johannes_E
by New Contributor II
  • 1349 Views
  • 1 replies
  • 1 kudos

Resolved! How to develop with databricks connect smoothly?

We are working with Databricks Connect and Visual Studio Code in our project. We mainly want to program in the IDE (VS Code) so that we can use the advantages of the IDE compared to notebooks. Therefore, we write most of the code in .py files and actu...

Latest Reply
ChrisChieu
Databricks Employee
  • 1 kudos

You can set breakpoints and debug within notebook cells. There's an example in this DAIS talk at 15:27; I recommend the entire talk as a demo. To complete the point, here is additional documentation about debugging notebook cells with Databricks ...

dbuserng
by New Contributor II
  • 606 Views
  • 1 replies
  • 0 kudos

Trigger Databricks Workflow when other workflows succeeded

Hi, I have 3 separate workflows with 3 different triggers, and what I would like to achieve is this: after all 3 of these jobs have completed and succeeded, I would like to trigger another job. Is that possible? These 3 jobs have to stay separate (I cannot combi...

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hi @dbuserng, it is possible, but it requires a custom code setup based on your use case. You can use the Jobs REST API: https://docs.databricks.com/api/workspace/jobs. Create a Monitoring Job: set up a job that will monitor the completion status of the thre...
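
(A minimal sketch of that monitoring-job idea using the Jobs REST API with the requests library; the workspace URL, token, and job IDs are placeholders you would replace for your own setup.)

    import requests

    HOST = "https://<your-workspace>.cloud.databricks.com"         # placeholder workspace URL
    HEADERS = {"Authorization": "Bearer <personal-access-token>"}  # placeholder token

    UPSTREAM_JOB_IDS = [111, 222, 333]  # the three independent jobs (placeholder IDs)
    DOWNSTREAM_JOB_ID = 444             # job to trigger once all three have succeeded

    def latest_run_succeeded(job_id):
        # Fetch the most recent completed run of the job and check its result state
        resp = requests.get(
            f"{HOST}/api/2.1/jobs/runs/list",
            headers=HEADERS,
            params={"job_id": job_id, "completed_only": "true", "limit": 1},
        )
        resp.raise_for_status()
        runs = resp.json().get("runs", [])
        return bool(runs) and runs[0]["state"].get("result_state") == "SUCCESS"

    if all(latest_run_succeeded(job_id) for job_id in UPSTREAM_JOB_IDS):
        requests.post(
            f"{HOST}/api/2.1/jobs/run-now",
            headers=HEADERS,
            json={"job_id": DOWNSTREAM_JOB_ID},
        ).raise_for_status()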

priyansh
by New Contributor III
  • 4397 Views
  • 10 replies
  • 0 kudos

Error in migration with UCX tool

Hey folks! I am facing an issue while migrating the tables from Hive to UC using the UCX tool; after completely running the setup and getting the assessment overview, we ran the following command, i.e. "databricks labs ucx create-table-mapping", but after...

Latest Reply
Akash_Wadhankar
New Contributor III
  • 0 kudos

While running the UCX tool in the workspace, we are not able to access the tables that were created in the Hive metastore's default schema. When we run the migrate-table workflow, we get the error that the tables are not accessible. We get the following error...

9 More Replies
KristiLogos
by Contributor
  • 572 Views
  • 3 replies
  • 0 kudos

Simba JDBC Exception When Querying Tables via BigQuery Databricks Connection

Hello, I have a federated connection to BigQuery that has GA events tables for each of our projects. I'm trying to query each daily table, which contains about 400,000 rows each day, and load it into another table, but I keep seeing this Simba JDBC exception. ...

Latest Reply
KristiLogos
Contributor
  • 0 kudos

@Alberto_Umana In addition to my last comment: for adjusting spark.sql.shuffle.partitions and spark.executor.memory, I tried this but I was still seeing the same error:

    spark = (
        SparkSession.builder
        .appName("GA4 Bronze Table Ingestion")
        ...

2 More Replies
SteveC527
by New Contributor
  • 1576 Views
  • 5 replies
  • 0 kudos

Medallion Architecture and Databricks Assistant

I am in the process of rebuilding the data lake at my current company with Databricks, and I'm struggling to find comprehensive best practices for naming conventions and structuring medallion architecture to work optimally with the Databricks assistan...

Latest Reply
Rjdudley
Honored Contributor
  • 0 kudos

Our initial approach was to have catalogs for sources and uses, but that was confusing and the grade of data wasn't obvious. And, as we began to join data, it was unclear where to put the final datasets. We switched to three catalogs: one for bronz...

4 More Replies
zmsoft
by Contributor
  • 691 Views
  • 4 replies
  • 0 kudos

Resolved! How to load PowerBI Dataset into databricks

Hi there, I would like to know how to load a Power BI dataset into Databricks. Thanks & Regards, zmsoft

Latest Reply
jack533
New Contributor III
  • 0 kudos

I don't think it's possible. While loading a table from Databricks into a Power BI dataset is possible, the opposite is not true.

3 More Replies
