Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Soh_m
by New Contributor
  • 879 Views
  • 1 reply
  • 0 kudos

Error accessing Managed Table with Row Level Security using Databricks Cluster

Hi Everyone, We are trying to implement Row-Level Security on a Delta table and have tested it (i.e. SQL Execution API, SQL editor, SQL notebook) using SQL Serverless in Unity Catalog. But when we tried to access the table having RLS in a notebook using PySpark w...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @Soh_m, When dealing with JSON data in your streaming source, you have a couple of options for extracting fields. Let’s explore the trade-offs between using the colon sign operator and the schema+from_json function: Colon Sign Operator: The c...
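
To make the reply's trade-off concrete, here is a minimal PySpark sketch of both approaches (illustrative only: it assumes a DataFrame df with a JSON string column named value, and a view named events for the SQL form):

from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, LongType

# Colon-sign operator: no schema required, but extracted fields come back as strings
spark.sql("SELECT value:user.id AS user_id FROM events")

# Explicit schema + from_json: typed columns; malformed records become null
schema = StructType([StructField("user", StructType([StructField("id", LongType())]))])
df.select(from_json(col("value"), schema).alias("j")).select("j.user.id")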

CBL
by New Contributor
  • 556 Views
  • 1 reply
  • 0 kudos

Schema Evolution in Azure databricks

Hi All - In my scenario, I am loading data from hundreds of JSON files. The problem is that fields/columns go missing when a JSON file contains new fields. Full load: while writing JSON to Delta, use the option ("mergeSchema", "true") so that we do not miss new columns. Inc...
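
For reference, a hedged sketch of the full-load pattern described above (the paths are placeholders):

# Write the JSON batch to Delta, letting new columns evolve the table schema
(spark.read.json("/mnt/raw/json/")
    .write.format("delta")
    .mode("append")
    .option("mergeSchema", "true")
    .save("/mnt/bronze/target"))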

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @CBL, Handling schema evolution during incremental data loads is crucial to ensure data consistency and prevent issues when new fields are introduced. Let’s explore some strategies for schema comparison in incremental loads: Checksum-based In...
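
One minimal way to sketch such a comparison before an incremental load (illustrative paths; assumes the target Delta table already exists):

incoming = spark.read.json("/mnt/raw/json/new_batch/")
existing = spark.read.format("delta").load("/mnt/bronze/target")

# Columns present in the new files but not yet in the table signal schema drift
new_cols = set(incoming.schema.fieldNames()) - set(existing.schema.fieldNames())
if new_cols:
    print(f"Schema drift detected; new columns: {new_cols}")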

jabori
by New Contributor
  • 900 Views
  • 1 reply
  • 0 kudos

How can I pass job parameters to a dbt task?

I have a dbt task that will use dynamic parameters from the job: {"start_time": "{{job.start_time.[timestamp_ms]}}"}. My SQL is edited like this: select 1 as id union all select null as id union all select {start_time} as id. This causes the task to fail. How...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @jabori, To correctly pass the start_time parameter in your dbt task, you can utilize dynamic value references provided by Databricks. These templated variables are replaced with appropriate values during task execution. Here’s how you can mod...
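
In outline (a sketch based on those dynamic value references, not a verified configuration): have the dbt command pass the templated value through dbt's --vars flag, e.g. dbt run --vars '{"start_time": {{job.start_time.[timestamp_ms]}}}', and reference it inside the model as select {{ var("start_time") }} as id rather than with bare braces like {start_time}.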

Phani1
by Valued Contributor
  • 786 Views
  • 1 reply
  • 0 kudos

What are optimized solutions for moving on-premises Hadoop data

Hi Team, What are optimized solutions for moving on-premises Hadoop/HDFS Parquet data to Databricks as Delta files? Regards, Phanindra

Data Engineering
delta
hadoop
Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @Phani1, Migrating data from on-premises Hadoop to Databricks as Delta files involves several key steps. Let’s break it down: Administration: In Hadoop, you’re dealing with a monolithic distributed storage and computing platform. It consists ...
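
As a rough illustration of the final step, once the Parquet files have been copied to cloud storage (e.g. with DistCp or a transfer service; the path and partition column below are placeholders):

# One-time, in-place conversion of a partitioned Parquet directory to Delta
spark.sql("""
    CONVERT TO DELTA parquet.`/mnt/landing/hdfs_export/events`
    PARTITIONED BY (event_date DATE)
""")

CONVERT TO DELTA only writes a transaction log over the existing files, which is usually far cheaper than reading the Parquet and rewriting it as Delta.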

chakradhar545
by New Contributor
  • 324 Views
  • 1 reply
  • 0 kudos

DatabricksThrottledException Error

Hi, Our scheduled job occasionally runs into the error below and fails. Any leads or thoughts on why we hit this once in a while, and how to fix it? shaded.databricks.org.apache.hadoop.fs.s3a.DatabricksThrottledException: Instantiate s...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @chakradhar545, The error message you’re encountering indicates a throttling issue when interacting with Amazon S3 using Databricks. Let’s break down the error and explore potential solutions: Error Details: The error message mentions two key...
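
If the root cause is S3 request-rate throttling, one common mitigation (hedged: these are standard Hadoop S3A connector settings, and suitable values depend on the workload) is to give the connector a larger retry budget in the cluster's Spark configuration:

spark.hadoop.fs.s3a.retry.limit 10
spark.hadoop.fs.s3a.retry.interval 1000ms
spark.hadoop.fs.s3a.attempts.maximum 20

Reducing the number of concurrent tasks that hit the same S3 prefix can also help.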

Surya0
by New Contributor III
  • 2254 Views
  • 4 replies
  • 0 kudos

Resolved! Unit hive-metastore.service not found

Hi Everyone, I've encountered an issue while trying to make use of the hive-metastore capability in Databricks to create a new database and table for our latest use case. The specific command I used was "create database if not exists newDB". However, ...

Latest Reply
rakeshprasad1
New Contributor III
  • 0 kudos

@Surya0: I am facing the same issue. The stack trace is: Could not connect to address=(host=consolidated-northeuropec2-prod-metastore-2.mysql.database.azure.com)(port=3306)(type=master) : Socket fail to connect to host:consolidated-northeuropec2-prod-metast...

3 More Replies
alexgv12
by New Contributor III
  • 744 Views
  • 1 reply
  • 0 kudos

how to deploy sql functions in pool

We have some function definitions that have to be available to our BI tools, e.g. CREATE FUNCTION CREATEDATE(year INT, month INT, day INT) RETURNS DATE RETURN make_date(year, month, day); How can we always have this function definition in our ...

Latest Reply
alexgv12
New Contributor III
  • 0 kudos

Looking at some alternatives with other Databricks components, I think that a CI/CD process should be created where the view can be created through the Databricks API. https://docs.databricks.com/api/workspace/functions/create https://community.databr...
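
A minimal sketch of what that CI/CD step could execute, e.g. from a notebook task (assuming Unity Catalog, with placeholder catalog and schema names):

# Run once per deployment; the function then persists in the metastore
spark.sql("""
    CREATE OR REPLACE FUNCTION main.shared.CREATEDATE(year INT, month INT, day INT)
    RETURNS DATE
    RETURN make_date(year, month, day)
""")

Persisting the function in Unity Catalog makes it visible to every warehouse and cluster, which sidesteps having to recreate it per pool.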

jim12321
by New Contributor II
  • 469 Views
  • 2 replies
  • 0 kudos

Databricks CLI how to start a job and pass the parameters?

I am trying to start job ID 85218616788189 and pass one parameter, 'demo', in Windows Shell. This works: databricks jobs run-now 85218616788189. If I try this one: databricks jobs run-now --json '{"job_id":85218616788189,"notebook_params": {"demo":"parameter...

Latest Reply
VVS29
New Contributor II
  • 0 kudos

Hi Jim, I think the right syntax would be something like this: databricks jobs run-now --job-id 85218616788189 --notebook-params '{"demo":"parameter successful"}'. Let me know if that worked!
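
One Windows-specific caveat (an assumption about the failure mode, since the original error text is cut off): cmd.exe does not treat single quotes as quoting, so the JSON typically needs double quotes with the inner quotes escaped, e.g. databricks jobs run-now --job-id 85218616788189 --notebook-params "{\"demo\":\"parameter successful\"}".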

1 More Reply
dbal
by New Contributor III
  • 1199 Views
  • 2 replies
  • 0 kudos

Resolved! Spark job task fails with "java.lang.NoClassDefFoundError: org/apache/spark/SparkContext$"

Hi. I am trying to run a Spark job in Databricks (Azure) using the JAR type. I can't figure out why the job fails by not finding the SparkContext. Databricks Runtime: 14.3 LTS (includes Apache Spark 3.5.0, Scala 2.12). Error message: java.lang.NoCl...

Latest Reply
dbal
New Contributor III
  • 0 kudos

Update 2: I found the reason in the documentation. This is documented under "Access Mode", and it is a limitation of the Shared access mode. Link: https://learn.microsoft.com/en-us/azure/databricks/compute/access-mode-limitations#spark-api-limitations...

1 More Reply
msgrac
by New Contributor II
  • 487 Views
  • 2 replies
  • 0 kudos

Can't remove file on ADLS using dbutils.fs.rm because URL contains illegal character

The URL contains a "[" within, and I've tried to encode the path from "[" to "%5B%27", but it didn't work:
from urllib.parse import quote
path = ""
encoded_path = quote(path)

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @msgrac, To encode it, you should use %5B instead of trying to encode it as “%5B%27”.
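
A small sketch of that fix (the path is hypothetical, and whether dbutils.fs.rm accepts the escaped form may depend on the runtime version):

path = "abfss://container@account.dfs.core.windows.net/data/file[1].json"  # hypothetical path
encoded_path = path.replace("[", "%5B").replace("]", "%5D")  # escape only the brackets
dbutils.fs.rm(encoded_path)

Note that quote(path) with its default arguments also percent-encodes the ":" in the URI scheme, which may be why the earlier attempt failed.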

1 More Reply
Tam
by New Contributor III
  • 809 Views
  • 2 replies
  • 0 kudos

TABLE_REDIRECTION_ERROR in AWS Athena After Databricks Upgrade to 14.3 LTS

I have a Databricks pipeline set up to create Delta tables on AWS S3, using Glue Catalog as the Metastore. I was able to query the Delta table via Athena successfully. However, after upgrading Databricks Cluster from 13.3 LTS to 14.3 LTS, I began enc...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @Tam, It appears that you’ve encountered a TABLE_REDIRECTION_ERROR while working with your Databricks pipeline, AWS S3, Glue Catalog, and Athena. Let’s break down the issue and explore potential solutions: AWS Glue as a Catalog for Databric...

1 More Reply
Coders
by New Contributor II
  • 704 Views
  • 2 replies
  • 0 kudos

How to perform a deep clone for data migration from one data lake to another?

 I'm attempting to migrate data from Azure Data Lake to S3 using deep clone. The data in the source Data Lake is stored in Parquet format and partitioned. I've tried to follow the documentation from Databricks, which suggests that I need to register ...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @Coders, It appears that you’re encountering an issue while attempting to migrate data from Azure data lake to S3 using deep clone. Let’s break down the problem and explore potential solutions. Error Explanation: The error message you receive...
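
For reference, a sketch of the register-then-clone flow the documentation suggests (paths are placeholders; cloning Parquet sources requires a sufficiently recent runtime):

# 1) Register the partitioned Parquet data as Delta, in place
spark.sql("""
    CONVERT TO DELTA parquet.`abfss://src@account.dfs.core.windows.net/events`
    PARTITIONED BY (dt DATE)
""")

# 2) Deep clone the registered table across clouds to S3
spark.sql("""
    CREATE OR REPLACE TABLE delta.`s3://my-bucket/events`
    DEEP CLONE delta.`abfss://src@account.dfs.core.windows.net/events`
""")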

1 More Reply
data-warriors
by New Contributor
  • 507 Views
  • 1 reply
  • 0 kudos

Recovery of a deleted Databricks workspace

Hi Team, I accidentally deleted our Databricks workspace, which had all our artefacts and control plane, and was the primary resource for our team's working environment. Could anyone please help, on priority, with the recovery/restoration mechanis...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @data-warriors, I understand the urgency of your situation. Unfortunately, once a Databricks subscription is cancelled, all associated workspaces are permanently deleted and cannot be recovered.

Poonam17
by New Contributor II
  • 670 Views
  • 1 reply
  • 2 kudos

Not able to deploy a cluster in Databricks Community Edition

Hello team, I am not able to launch a Databricks cluster in Community Edition; it gets terminated automatically. Can someone please help here? Regards, Poonam

Latest Reply
kakalouk
New Contributor II
  • 2 kudos

I face the exact same problem. The message I get is this: "Bootstrap Timeout: Node daemon ping timeout in 780000 ms for instance i-062042a9d4be8725e @ 10.172.197.194. Please check network connectivity between the data plane and the control plane."

TheDataEngineer
by New Contributor
  • 881 Views
  • 1 reply
  • 0 kudos

'replaceWhere' clause in spark.write for a partitioned table

Hi, I want to be clear about the 'replaceWhere' clause in spark.write. Here is the scenario: I would like to add a column to a few existing records. The table is already partitioned on the "PickupMonth" column. Here is an example without 'replaceWhere': spark.read \ .f...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @TheDataEngineer, Let’s dive into the details of the replaceWhere clause in Spark’s Delta Lake. The replaceWhere option is a powerful feature in Delta Lake that allows you to overwrite a subset of a table during write operations. Specifically, ...
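
A condensed sketch of that subset overwrite for the scenario above (the path and month value are illustrative):

# Overwrite only rows matching the predicate; other partitions are untouched
(df.write.format("delta")
    .mode("overwrite")
    .option("replaceWhere", "PickupMonth = '12'")
    .save("/mnt/delta/trips"))

Note that Delta rejects the write if the incoming DataFrame contains rows that fall outside the replaceWhere predicate.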
