Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

LarsMewa
by New Contributor III
  • 1142 Views
  • 4 replies
  • 1 kudos

Resolved! Databricks Jobs & Pipelines: Serverless SparkOutOfMemoryError while reading a 500 MB JSON file

I'm getting the following SparkOutOfMemoryError message while reading a 500 MB JSON file, see below. I'm loading four CSV files (around 150 MB per file) and the JSON file in the same pipeline. When I load the JSON file alone it reads fine, same when I ...

Latest Reply
LarsMewa
New Contributor III
  • 1 kudos

This fixed it: As a quick workaround to address out-of-memory errors when processing large JSON files in Databricks serverless pipelines, we recommend disabling the Photon JSON scan. The Photon engine is optimized for performance, but scanning large J...
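The workaround above amounts to a configuration change on the pipeline. A minimal sketch follows; the Spark configuration key used here is an assumption (the reply is truncated before naming it), so verify the exact key against the release notes for your Databricks runtime:

```python
# Sketch: disable the Photon JSON scan for a serverless pipeline.
# The configuration key below is an ASSUMPTION -- check your Databricks
# runtime's documentation/release notes for the exact name.
pipeline_configuration = {
    "spark.databricks.photon.jsonScan.enabled": "false",
}

# In a notebook or job task you would apply it via the Spark session, e.g.:
# spark.conf.set("spark.databricks.photon.jsonScan.enabled", "false")
```

In a Lakeflow/DLT pipeline, the same key would go into the pipeline's `configuration` settings block rather than being set at runtime.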

3 More Replies
tana_sakakimiya
by Contributor
  • 1160 Views
  • 1 reply
  • 1 kudos

Resolved! Any Advice on Dynamic Masking while maintaining performance?

I plan to mask columns with a specific tag like "sensitive" or "PII", which indicates that the column values ought to be seen only by privileged user groups because they contain credentials or personal identity data. To implement that I plan to create a ...

Latest Reply
saurabh18cs
Honored Contributor III
  • 1 kudos

Hi @tana_sakakimiya Your approach—using Unity Catalog column tags (like "sensitive" or "PII") and applying masking policies based on those tags—is a recommended and scalable way to manage data access in Databricks, especially for compliance and priva...
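The tag-driven masking the reply endorses is typically implemented with a SQL UDF plus `ALTER TABLE ... ALTER COLUMN ... SET MASK`. A minimal sketch that generates the two statements; the catalog, schema, function, and group names are hypothetical placeholders:

```python
# Sketch: build the Databricks SQL statements for tag-based column masking.
# All object and group names here are hypothetical placeholders.
def masking_statements(table, column,
                       mask_fn="main.security.mask_pii",
                       privileged_group="pii_readers"):
    # The masking UDF returns the real value only to members of the
    # privileged group, and a redacted literal to everyone else.
    create_fn = (
        f"CREATE OR REPLACE FUNCTION {mask_fn}(val STRING) "
        f"RETURN CASE WHEN is_account_group_member('{privileged_group}') "
        f"THEN val ELSE '***' END"
    )
    # Attach the mask to the tagged column.
    set_mask = f"ALTER TABLE {table} ALTER COLUMN {column} SET MASK {mask_fn}"
    return create_fn, set_mask

create_fn, set_mask = masking_statements("main.sales.customers", "email")
```

On Databricks each statement would be run with `spark.sql(...)`; a fully tag-driven deployment would first discover the tagged columns (e.g. via Unity Catalog's information_schema tag views) and apply the mask to each.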

shweta_m
by New Contributor III
  • 676 Views
  • 3 replies
  • 2 kudos

Resolved! Assigning Databricks Account Admin role to User group

Hi, As per our company policy, individual users should not be given elevated privileges. Permissions should be assigned to user groups, so that group membership can be managed at the AD level. In that context, is there a way to assign the 'Databricks A...

Latest Reply
shweta_m
New Contributor III
  • 2 kudos

Hi @szymon_dybczak I tried this and it worked. Thanks!

2 More Replies
Travis84
by New Contributor II
  • 591 Views
  • 1 reply
  • 1 kudos

Which table should I use for a range join hint?

I am a bit confused about how to use range join hints. Consider the following query:

```
SELECT
  p.id,
  p.ts,
  p.value,
  rg.metric1,
  rg.metric2,
  rg.ts AS range_ts
FROM points p
LEFT JOIN LATERAL (
  SELECT r.metric1, r.metric2, r.ts
  FROM ranges r
  WHE...
```

Latest Reply
K_Anudeep
Databricks Employee
  • 1 kudos

Hello @Travis84 , Below are the answers to your questions: Where to put the hint? On either one of the two relations that participate in the range join for that specific join block. In simple two-table queries, it doesn’t matter. In multi-join querie...
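For a two-table join like the one in the question, the hint can go on either relation, as the reply notes. A sketch of a hinted query; the bin size (10) and the column names in the `BETWEEN` predicate are illustrative placeholders:

```python
# Sketch: a RANGE_JOIN hint in Databricks SQL. The bin size (10) and the
# columns in the BETWEEN predicate are illustrative placeholders -- pick a
# bin size close to the typical interval length in your ranges table.
query = """
SELECT /*+ RANGE_JOIN(r, 10) */
  p.id,
  p.ts,
  r.metric1,
  r.metric2
FROM points p
JOIN ranges r
  ON p.ts BETWEEN r.start_ts AND r.end_ts
"""
```

On Databricks this would be executed with `spark.sql(query)`; the hint only takes effect on a join block whose predicate actually forms a range condition.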

youssefmrini
by Databricks Employee
  • 6224 Views
  • 2 replies
  • 0 kudos
Latest Reply
idtaylor
New Contributor II
  • 0 kudos

No, you cannot. The demo video was from 2023, Alation no longer appears in the Databricks Marketplace, and Alation no longer allows free trials, instead having you fill out a request for a demo, which is not ideal for proving out practical functionality.

1 More Replies
TetianaDromova
by Databricks Partner
  • 1924 Views
  • 1 reply
  • 1 kudos

PGP decryption in python file

The same decryption code works in a notebook, but fails in a Python file:

```
import gnupg
from pyspark.dbutils import DBUtils

dbutils = DBUtils(spark)
gpg = gnupg.GPG()
decryption_key = dbutils.secrets.get(secret_scope, secret_name)
gpg.import_keys(decryption_ke...
```

Latest Reply
mmayorga
Databricks Employee
  • 1 kudos

Hi @TetianaDromova  Thank you for reaching out and waiting for a response. Having your code working on a notebook is a significant first step, so you are on the right path, but then moving into a Python file, we must consider specific details: How is...

Phani1
by Databricks MVP
  • 3183 Views
  • 3 replies
  • 1 kudos

Available Connectors for ServiceNow to Databricks

Hi Team, What are the available connectors to bring data and metadata from ServiceNow to Databricks, and what are the best options/best practices for integrating ServiceNow with Databricks? Regards, Phani

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @Phani1, There's an official Databricks ServiceNow connector. You can read about it at the link below: Configure ServiceNow for Databricks ingestion - Azure Databricks | Microsoft Learn. And here you have an example of how to create an ingestion pipeline using ...

2 More Replies
khasim76
by New Contributor II
  • 577 Views
  • 3 replies
  • 1 kudos
Data Engineering
community edition
json
sql
Latest Reply
Khaja_Zaffer
Esteemed Contributor
  • 1 kudos

Hello @khasim76 Good day!! Can I know what error message you are getting? Is it the legacy Community Edition or the free Community Edition? Also, there are some limitations, like: - Table results: ≤10,000 rows or 2 MB (whichever comes first). - Text outputs: Trunc...

2 More Replies
seefoods
by Valued Contributor
  • 3387 Views
  • 10 replies
  • 4 kudos

Resolved! append using foreach batch autoloader

Hello guys, when I append I get this error; does someone know how to fix it? raise converted from None pyspark.errors.exceptions.captured.AnalysisException: [TABLE_OR_VIEW_ALREADY_EXISTS] Cannot create table or view `s_test` because it already exists. Ch...

Latest Reply
seefoods
Valued Contributor
  • 4 kudos

Thanks a lot @szymon_dybczak 
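The error in the original post comes from creating the target table on every micro-batch. A common pattern (a sketch, not necessarily the exact fix applied in this thread) is to let `saveAsTable` in append mode create the table once and append thereafter:

```python
# Sketch: a foreachBatch sink that appends instead of re-creating the table.
# saveAsTable with mode("append") creates `s_test` on the first batch and
# appends on subsequent batches, avoiding TABLE_OR_VIEW_ALREADY_EXISTS
# (unlike issuing an explicit CREATE TABLE on every micro-batch).
def write_batch(batch_df, batch_id):
    batch_df.write.mode("append").saveAsTable("s_test")

# Wired into the Auto Loader stream (paths are placeholders), e.g.:
# (spark.readStream.format("cloudFiles")
#      .option("cloudFiles.format", "json")
#      .load(source_path)
#      .writeStream
#      .foreachBatch(write_batch)
#      .option("checkpointLocation", checkpoint_path)
#      .start())
```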

9 More Replies
Malthe
by Valued Contributor II
  • 2491 Views
  • 1 reply
  • 2 kudos

Resolved! Trigger table sync from job

When setting up a table sync using the UI, a pipeline is created, but this is not visible through the Pipelines overview, presumably because it's "managed" by the target table (at least this is where you manage the data ingest process). This means that ...

Latest Reply
Saritha_S
Databricks Employee
  • 2 kudos

Hi @Malthe  The recommended method is to manage and trigger the sync via table-level APIs or management interfaces, not the pipeline-level job triggers: For Unity Catalog synced tables (e.g., syncing to Postgres), triggering a sync or refresh is per...

Balram-snaplogi
by New Contributor II
  • 1573 Views
  • 1 reply
  • 0 kudos

Resolved! Clarification on Default Socket Timeout Behavior Introduced in v2.6.35

Hi Team, I’m currently using the Databricks JDBC driver version 2.6.40, and I’ve noticed intermittent socket timeout errors during pipeline execution. I came across the release note for version 2.6.35 mentioning the following change: [SPARKJ-688] The co...

Latest Reply
Saritha_S
Databricks Employee
  • 0 kudos

Hi @Balram-snaplogi  The default socket timeout value introduced in Databricks JDBC driver version 2.6.35 is 300 seconds (5 minutes). Please refer to the below kb.  https://kb.databricks.com/dbsql/job-timeout-when-connecting-to-a-sql-endpoint-over-jd...
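If the 300-second default is too aggressive for long-running statements, the timeout can usually be raised via a driver property in the connection string. A sketch; the host, HTTP path, and token are placeholders, and the `SocketTimeout` property name should be verified against your driver version's install guide:

```python
# Sketch: raising the socket timeout in a Databricks JDBC URL.
# Host and httpPath are placeholders; SocketTimeout is in seconds
# (verify the property name against your JDBC driver's documentation).
jdbc_url = (
    "jdbc:databricks://adb-1234567890123456.7.azuredatabricks.net:443/default;"
    "transportMode=http;ssl=1;"
    "httpPath=/sql/1.0/warehouses/abcdef1234567890;"
    "AuthMech=3;"
    "SocketTimeout=600"
)
```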

bhargavabasava
by New Contributor III
  • 587 Views
  • 1 reply
  • 0 kudos

Resolved! Establish connectivity from databricks serverless to Cloud SQL (GCP) Postgres database

Hi team, We have set up a workspace using a Databricks-managed network (Databricks is set up on GCP). There is a Cloud SQL DB in a customer-managed network. We want to establish connectivity from Databricks serverless to the database. Could someone please tell di...

Latest Reply
Saritha_S
Databricks Employee
  • 0 kudos

Hi @bhargavabasava  Can you please refer to the below doc for complete details.  https://docs.databricks.com/gcp/en/security/network/classic/private-service-connect

kkg33
by New Contributor II
  • 845 Views
  • 2 replies
  • 1 kudos

Resolved! Unable to overwrite a table using Add data in the Hive metastore

I used to see the dropdown of tables under hive_metastore when using Add data -> Create or modify table from file upload -> overwrite existing table, but have not been able to for a couple of days. No change in permissions. I can still see this for Unity Catalog t...

Latest Reply
mmayorga
Databricks Employee
  • 1 kudos

hi @kkg33  Thank you for reaching out and for your patience in awaiting a response. Have you tried to query the table directly from a notebook or using the Catalog Explorer? Or is this behavior only happening on the "Upload File UI"? If you are unabl...

1 More Replies
priyansh
by Databricks Partner
  • 4938 Views
  • 4 replies
  • 1 kudos

What can UCX not do?

Hey folks! I want to know the limitations of UCX: what are the things, especially during migration, that we have to do manually? UCX is currently in development, which means it may have some drawbacks too; I want to know what those are.

Latest Reply
monstercop
New Contributor II
  • 1 kudos

Guess you will find some differences before and after; for example, using a wildcard to point to folders in ADLS2 for external tables is supported in Hive but not in UC catalogs.

3 More Replies
mkwparth
by Databricks Partner
  • 8934 Views
  • 6 replies
  • 3 kudos

Job Compute

Hey Community, I’m new to this platform and need some guidance. I’ve set up a job on a basic compute configuration: 8GB RAM, 4 Core CPU, 1 Worker (Standard F4), with DLT runtime 16.4.8. However, my job is running slower than expected. When I checked t...

Latest Reply
mkwparth
Databricks Partner
  • 3 kudos

Hey @Khaja_Zaffer, I’m not using any caching in my code. The cache showing up in the chart is likely from the OS page cache. What do you say?

5 More Replies