Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

Prasad_Koneru
by New Contributor III
  • 103 Views
  • 1 reply
  • 0 kudos

How to export metadata of catalog objects

Hi All, I want to export the metadata of catalog objects (schemas, tables, volumes, functions, models) and import that metadata into another catalog. Is there any ready-made process/notebook/method/API available to do this? Please help with this. Thanks in ...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Prasad_Koneru, There is no direct import functionality in the Databricks Unity Catalog.
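
For anyone looking for a starting point: there is no single built-in export, but catalog metadata can be pulled with the Databricks SDK (or the REST API / system.information_schema) and replayed as DDL against the target catalog. A rough, hedged sketch using the Python SDK, assuming the databricks-sdk package is available and "my_catalog" is a placeholder:

    from databricks.sdk import WorkspaceClient

    w = WorkspaceClient()   # picks up credentials from the notebook / environment
    catalog = "my_catalog"  # placeholder source catalog

    metadata = {"schemas": [], "tables": [], "volumes": []}
    for schema in w.schemas.list(catalog_name=catalog):
        metadata["schemas"].append(schema.as_dict())
        metadata["tables"] += [t.as_dict() for t in w.tables.list(catalog_name=catalog, schema_name=schema.name)]
        metadata["volumes"] += [v.as_dict() for v in w.volumes.list(catalog_name=catalog, schema_name=schema.name)]

    # metadata can now be serialized (e.g. json.dumps) and used to generate
    # CREATE SCHEMA / CREATE TABLE statements against the target catalog.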

wallco26
by New Contributor II
  • 330 Views
  • 4 replies
  • 0 kudos

Databricks External Data SQL Server Connection Dirty Reads

I've connected a SQL Server database as an external connection in Unity Catalog. It looks like when I write SELECT queries against that connection, I end up locking my tables on the SQL Server. Is there a way to query these tables using a "with (nolock)" c...

Data Engineering
Database
SQL Server
Latest Reply
wallco26
New Contributor II
  • 0 kudos

Thanks Slash - where would the "with (nolock)" command fall within the SQL syntax... within the OPTIONS section? What would the specific command look like?
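
In case it helps: Lakehouse Federation itself does not expose a table hint, but one possible workaround (a hedged sketch, not an official feature) is to go through the plain Spark JDBC reader and push the hint down inside the query. The host, database, table, and secret names below are placeholders:

    # Hedged workaround sketch: push WITH (NOLOCK) down to SQL Server via the JDBC reader.
    jdbc_url = "jdbc:sqlserver://myserver.database.windows.net:1433;databaseName=mydb"

    df = (spark.read.format("jdbc")
          .option("url", jdbc_url)
          .option("query", "SELECT * FROM dbo.MyTable WITH (NOLOCK)")
          .option("user", "my_user")
          .option("password", dbutils.secrets.get("my_scope", "sqlserver_password"))
          .load())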

3 More Replies
Devsql
by New Contributor III
  • 247 Views
  • 5 replies
  • 2 kudos

Resolved! What is the difference between _RAW tables and _APPEND_RAW tables in the Bronze layer of Azure Databricks?

Hi Team, I would like to know the difference between _RAW tables and _APPEND_RAW tables in the Bronze layer. As both are STREAMING tables, why do we need 2 separate tables? Note: we are following the Medallion Architecture. Also, the above tables are created via Delta L...

Data Engineering
Azure Databricks
Delta Live Table
Delta Live Table Pipeline
Latest Reply
Devsql
New Contributor III
  • 2 kudos

Hi @Kaniz_Fatma, I saw your replies to other posts, so I thought to ask you... would you like to help me with this one?

4 More Replies
Nisharunnisa
by New Contributor II
  • 114 Views
  • 1 reply
  • 0 kudos

Error: cannot create job: 'SERVICE_PRINCIPAL_NAME' cannot be set as run_as_username

Hi Team, I am trying to deploy workflows to a Databricks workspace via Databricks Asset Bundles (DAB) using an Azure Service Principal. Below is my databricks.yml file which I am using for DAB. I am replacing the "SERVICE_PRINCIPAL_NAME" variable in my Jenk...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Nisharunnisa, This indicates a problem with the service principal configuration, possibly due to it not being recognized or configured correctly. To resolve this, ensure the service principal exists in your Azure AD with sufficient permissions li...

operryman
by New Contributor
  • 112 Views
  • 1 reply
  • 0 kudos

Performance drop from databricks 12.2 to 14.3 LTS - solved with checkpoint(), looking for root cause

On Databricks 12.2, a piece of code which has an action takes a minute to run. On Databricks 14.3, the same code unchanged, with the same inputs, now takes 10 minutes. Attempting to debug using explain() shows the plan is huge (150k plus rows of output). Replacin...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @operryman, The performance decrease observed from Databricks 12.2 to 14.3 LTS, despite using the same code and inputs, is likely attributable to updates in the underlying Spark engine and optimizations between these versions. Databricks 14.3 LTS ...
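
For reference, the workaround mentioned in the question works by materializing the intermediate result and cutting the logical plan, so later actions no longer re-analyze the huge lineage. A minimal sketch (the DataFrame and checkpoint location are placeholders):

    # Lineage-truncation sketch; df stands in for the expensive, deeply-nested DataFrame.
    spark.sparkContext.setCheckpointDir("dbfs:/tmp/checkpoints")  # required before checkpoint()

    df = spark.range(1_000_000)
    df = df.checkpoint()        # materializes df and cuts the logical plan / lineage
    # df = df.localCheckpoint() # alternative: keeps data on the executors, no directory needed

    df.count()                  # downstream actions now start from the checkpointed plan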

PP09
by New Contributor II
  • 535 Views
  • 2 replies
  • 1 kudos

Job failing with below error message

Caused by: HTTP Error -1; url='https://login.microsoftonline.com/271df5c2-953a-497b-93ad-7adf7a4b3cd7/oauth2/token' AzureADAuthenticator.getTokenCall threw java.net.UnknownHostException : login.microsoftonline.com
shaded.databricks.azurebfs.org.apache...

Latest Reply
jenshumrich
New Contributor III
  • 1 kudos

This was caused for me by the line (pyspark): children = [f for f in dbutils.fs.ls(node)] with node being "dbfs:/mnt/lifestrategy-blob/scada/" and this being a mounted directory. It seems like the implementation of dbutils.fs is done with the same qualit...

1 More Reply
Magesh2798
by New Contributor II
  • 94 Views
  • 1 reply
  • 1 kudos

Query execution after establishing Databricks to Information Design Tool JDBC Connection

Hello all, I have created a JDBC connection from Databricks to Information Design Tool using an access token generated with a Databricks Service Principal. But it's throwing the below error while running a query on top of Databricks data in Information Design Bu...

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @Magesh2798, First, check that your JDBC connection parameters (URL, username, and password) are correctly configured, and consider using token-based authentication. Ensure you're using a compatible JDBC driver version; mismatched versions can cau...

guangyi
by New Contributor III
  • 200 Views
  • 5 replies
  • 6 kudos

Resolved! Why is the workflow trigger status always paused?

I created a workflow job via Asset Bundle. However, after deploying the job to Databricks, the trigger status is always paused no matter how I update the cron expression. I can manually trigger it successfully. I cannot figure out why. Am I mi...

Latest Reply
jacovangelder
Contributor III
  • 6 kudos

Next to the cron expression, you also need the following property: pause_status. For example:
schedule:
  quartz_cron_expression: 0 0 6 * * ?
  timezone_id: Europe/Amsterdam
  pause_status: UNPAUSED
The property can be set to PAUSED and UNPAUSED. Hope th...

4 More Replies
ksenija
by Contributor
  • 141 Views
  • 2 replies
  • 1 kudos

Resolved! DLT pipeline - reading from external tables

Hello! I created a DLT pipeline where my sources are external tables. I have to apply changes (stored_as_scd_type = 1). However, when I run my pipeline, I don't see any incremental uploads. The data remains in the same state as when I first created th...

Latest Reply
lucasrocha
New Contributor III
  • 1 kudos

Hello @ksenija, I hope this message finds you well. Is your source table receiving new records? If so, are the fields (operation/sequenceNum) being filled? If possible, please provide a sample of the code you are using to create your target table wit...
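
For reference, a minimal apply_changes flow looks roughly like the sketch below (it has to run inside a DLT pipeline). The source table, key column, and sequence column are placeholders, and incremental behaviour still depends on the source actually delivering new or changed rows:

    # Hedged sketch of an SCD Type 1 apply_changes flow in DLT; names are placeholders.
    import dlt
    from pyspark.sql.functions import col

    @dlt.view
    def cdc_source():
        # source table that should be delivering new or changed rows
        return spark.readStream.table("my_catalog.my_schema.source_table")

    dlt.create_streaming_table("target_table")

    dlt.apply_changes(
        target="target_table",
        source="cdc_source",
        keys=["id"],                    # placeholder business key
        sequence_by=col("updated_at"),  # placeholder ordering column
        stored_as_scd_type=1,
    )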

1 More Reply
csmcpherson
by New Contributor II
  • 277 Views
  • 2 replies
  • 1 kudos

Resolved! AWS NAT (Network Address Translation) Automated On-demand Destruct / Create

Hi folks, our company typically uses Databricks during a 12-hour block, however the AWS NAT for elastic compute is up 24 hours, and I'd rather not pay for those hours. I gather AWS Lambda and CloudWatch can be used to schedule / trigger NAT destruction...

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @csmcpherson, Yes, you can indeed use AWS Lambda and CloudWatch to schedule and trigger NAT gateway destruction and creation. This approach allows you to save costs by only having the NAT gateway active during the hours you need it. Here are th...
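
As a rough illustration of the teardown half of such a Lambda, here is a hedged boto3 sketch; the NAT gateway ID is a placeholder, and the route-table updates plus the matching create_nat_gateway path on the schedule that brings it back are omitted:

    # Hedged sketch of a scheduled Lambda handler that deletes a NAT gateway after hours.
    import boto3

    ec2 = boto3.client("ec2")

    def lambda_handler(event, context):
        # placeholder NAT gateway ID; in practice look it up by tag or pass it in the event
        response = ec2.delete_nat_gateway(NatGatewayId="nat-0123456789abcdef0")
        return {"deleted": response["NatGatewayId"]}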

1 More Reply
theanhdo
by New Contributor II
  • 205 Views
  • 2 replies
  • 0 kudos

Resolved! Databricks Asset Bundles library dependencies - JAR file

Hi there, I have used Databricks Asset Bundles (DAB) to deploy workflows. For each job, I will create a job cluster and install external libraries by specifying libraries in each task, for example:
- task_key: my-task
  job_cluster_key: my-cluster
  note...

Latest Reply
theanhdo
New Contributor II
  • 0 kudos

Thanks very much @Kaniz_Fatma for your thorough answer.

1 More Reply
thackman
by New Contributor II
  • 167 Views
  • 2 replies
  • 0 kudos

Python UDFs, Spark Connect, included modules. Compatibility issues with shared compute

Our current system uses Databricks notebooks, and we have some shared notebooks that define some Python UDFs. This was working for us until we tried to switch from single-user clusters to shared clusters. Shared clusters and serverless now use Spark C...

Latest Reply
thackman
New Contributor II
  • 0 kudos

I'm not sure what you mean by "Ensure the Python binary's location is correctly set to resolve runtime issues". We aren't using any binaries; everything is just Databricks notebooks. In our case, if we define a Python UDF function in the root notebo...
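
One workaround that is commonly suggested for shared/serverless compute (a hedged sketch, not an official fix): define the UDF in the same notebook or module that uses it, so Spark Connect can serialize it, instead of pulling it in from a %run-included shared notebook. The function and column names below are hypothetical:

    # Self-contained Python UDF defined alongside its usage; names are placeholders.
    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType

    def normalize_code(value):
        # plain Python function; keep it self-contained so Spark Connect can pickle it
        return value.strip().upper() if value is not None else None

    normalize_code_udf = udf(normalize_code, StringType())

    df = spark.createDataFrame([(" abc ",), (None,)], ["code"])
    df.select(normalize_code_udf("code").alias("code_norm")).show()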

1 More Reply
YS1
by New Contributor III
  • 251 Views
  • 4 replies
  • 5 kudos

Resolved! SQL Server To Databricks Table Migration

Hello, is there equivalent SQL code for the following PySpark code? I'm trying to copy a table from SQL Server to Databricks and save it as a managed Delta table.
jdbcHostname = "your_sql_server_hostname"
jdbcPort = 1433
jdbcDatabase = "your_databas...

Latest Reply
jacovangelder
Contributor III
  • 5 kudos

The only option to have it in Databricks SQL is lakehouse federation with a SQL Server connection. 
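
To make that concrete: once a SQL Server connection and a foreign catalog have been set up via Lakehouse Federation, the copy itself can be a plain CTAS against the federated table. A hedged sketch (catalog, schema, and table names are placeholders), wrapped in PySpark here but the SQL inside is what you would run in Databricks SQL:

    # Hedged sketch: copy a federated SQL Server table into a managed Delta table.
    spark.sql("""
        CREATE OR REPLACE TABLE main.bronze.customers_copy AS
        SELECT * FROM sqlserver_foreign_catalog.dbo.customers
    """)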

3 More Replies
VenkateswarluAd
by New Contributor
  • 149 Views
  • 2 replies
  • 1 kudos

DLT - apply_changes() SCD2 - Rename columns: __START_AT & __END_AT

How can I rename the __START_AT and __END_AT columns created when using the dlt.apply_changes() method for performing SCD Type 2 updates?

Latest Reply
Ravivarma
New Contributor III
  • 1 kudos

Hello @VenkateswarluAd , Greetings of the day! The columns __START_AT and __END_AT are used to track the validity period of each record for SCD Type 2 updates. Please be aware that renaming these columns could disrupt the functionality of the SCD Typ...
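
A common workaround, rather than renaming the columns inside the apply_changes target itself, is to expose a downstream view that aliases them. A hedged sketch (the table and new column names are placeholders):

    # Hedged sketch: keep __START_AT/__END_AT intact and alias them in a downstream view.
    import dlt

    @dlt.view
    def customers_scd2_renamed():
        return (
            dlt.read("customers_scd2")  # the apply_changes target table (placeholder name)
            .withColumnRenamed("__START_AT", "valid_from")
            .withColumnRenamed("__END_AT", "valid_to")
        )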

1 More Reply
vkumar
by New Contributor
  • 99 Views
  • 1 reply
  • 1 kudos

Receiving null values from Event Hub streaming.

Hi, I am new to PySpark and facing an issue while consuming data from Azure Event Hubs. I am unable to deserialize the consumed data; I see only null values upon deserializing the data using the schema. Please find below the schema, Event Hub message, ...

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @vkumar, The issue you are facing is likely due to the way the data is being serialized and deserialized in the Azure Event Hub. The sample Event Hub message you provided is in a custom format, which is not the typical JSON format. The message app...
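
For reference, the nulls typically come from applying a JSON schema to the raw binary body when the payload does not actually match that schema. A hedged sketch of the usual cast-then-parse pattern; the schema fields and the stand-in DataFrame below are placeholders for the real Event Hubs stream, which exposes the payload as a binary "body" column:

    # Hedged decode sketch; from_json returns nulls whenever the body does not match the schema.
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import StructType, StructField, StringType, LongType

    payload_schema = StructType([
        StructField("deviceId", StringType()),
        StructField("timestamp", LongType()),
    ])

    # Stand-in for the streaming DataFrame returned by the Event Hubs connector.
    raw_df = spark.createDataFrame(
        [(bytearray(b'{"deviceId": "dev-1", "timestamp": 1721000000}'),)], ["body"]
    )

    decoded = (raw_df
        .withColumn("body_str", col("body").cast("string"))
        .withColumn("payload", from_json(col("body_str"), payload_schema))
        .select("payload.*"))

    decoded.show()  # all-null columns here mean the schema does not match the real payload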
