Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

DumbBeaver
by New Contributor II
  • 593 Views
  • 1 reply
  • 0 kudos

Issue while writing data to unity catalog using JDBC

While writing data to a pre-existing table in Unity Catalog using JDBC, it just writes the delta of the data. Driver used: com.databricks:databricks-jdbc:2.6.36. Let's say the table has rows: +-+-+ |a|b| +-+-+ |1|2| |3|4| and I am appendi...

Data Engineering
JDBC
spark
Unity Catalog
Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @DumbBeaver, When writing data to a pre-existing table in the Unity Catalog using JDBC, it’s essential to understand how the .union operation and the .overwrite mode work. Union Operation: When you use .union to append rows to an existing Data...

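A pure-Python sketch of the idea in the reply above: before appending over JDBC, keep only the rows not already present in the target, so a union-style write does not re-insert existing data. In PySpark this would roughly be `incoming.exceptAll(existing)`; all names below are hypothetical.

```python
def rows_to_append(existing, incoming):
    """Return only the incoming rows that are not already in the table.

    Pure-Python analogue of PySpark's incoming.exceptAll(existing);
    appending just this delta avoids duplicating rows that a blind
    .union() followed by mode("append") would re-insert.
    """
    existing_set = set(existing)
    return [row for row in incoming if row not in existing_set]
```

For example, `rows_to_append([(1, 2), (3, 4)], [(1, 2), (5, 6)])` keeps only `(5, 6)`, which is the only row that should be written with `mode("append")`.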
himanshu_k
by New Contributor
  • 698 Views
  • 1 reply
  • 0 kudos

Clarification Needed: Ensuring Correct Pagination with Offset and Limit in PySpark

Hi community, I hope you're all doing well. I'm currently engaged in a PySpark project where I'm implementing pagination-like functionality using the offset and limit functions. My aim is to retrieve data between a specified starting_index and ending_...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @himanshu_k, Let’s delve into your questions regarding pagination using the offset and limit functions in PySpark, especially when dealing with partitioned data frames. Consistency of offset and limit Functions: The offset and limit functions ...

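A minimal sketch of the pagination arithmetic discussed above (helper name hypothetical): translate a `[starting_index, ending_index)` window into arguments for the DataFrame `offset`/`limit` API (Spark 3.4+). The window is only deterministic across partitions if the frame is sorted first.

```python
def page_bounds(starting_index, ending_index):
    """Translate [starting_index, ending_index) into (offset, limit)."""
    if ending_index <= starting_index:
        raise ValueError("ending_index must be greater than starting_index")
    return starting_index, ending_index - starting_index

# Hypothetical PySpark usage (Spark 3.4+); orderBy first so the
# page contents are stable between runs:
# off, lim = page_bounds(100, 150)
# page = df.orderBy("id").offset(off).limit(lim)
```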
Leszek
by Contributor
  • 565 Views
  • 1 reply
  • 0 kudos

[Delta Sharing - open sharing protocol] Token rotation

Hi, do you have any experience with rotating tokens in Delta Sharing automatically? There is an option to do that using the CLI (Create and manage data recipients for Delta Sharing | Databricks on AWS). But what to do next? Sending a new link to the token via...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Leszek, Rotating tokens in Delta Sharing is a crucial security practice. Let’s break down the steps: Token Rotation: First, you’ve already taken the right step by using the CLI to create and manage data recipients for Delta Sharing. When you...

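After rotating via the CLI, the same rotation can be scripted against the REST API. A sketch of building that call — the endpoint path and field name follow the Unity Catalog recipients API, but treat them as assumptions to verify against the current docs:

```python
def rotate_token_request(workspace_url, recipient_name, expire_in_seconds=0):
    """Build the URL and JSON body for a recipient token rotation.

    expire_in_seconds=0 invalidates the existing token immediately;
    a positive value keeps it alive during a hand-over window.
    """
    url = (f"{workspace_url}/api/2.1/unity-catalog/recipients/"
           f"{recipient_name}/rotate-token")
    body = {"existing_token_expire_in_seconds": expire_in_seconds}
    return url, body

# POST `url` with `body` (e.g. via requests) using a bearer token,
# then deliver the new activation link to the recipient out of band.
```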
Check
by New Contributor
  • 916 Views
  • 1 reply
  • 0 kudos

How to call azure databricks api from azure api management

Hi, has anyone successfully configured Azure APIM to access the Databricks REST API? If yes, I'd appreciate a setup guide, as I am stuck at this point. Thanks.

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Check, Configuring Azure API Management (APIM) to access Databricks REST API can be a bit tricky, but I’ll guide you through some potential approaches: Using Environment Variables and cURL: To execute Databricks API via a curl request, you ne...

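The curl approach mentioned in the reply boils down to forwarding a bearer token with each call; that is what APIM ultimately has to pass through. A hedged sketch of the request shape (workspace URL and token are placeholders; `clusters/list` is a standard REST 2.0 endpoint):

```python
def databricks_api_request(workspace_url, endpoint, token):
    """Assemble the URL and headers for a Databricks REST API call."""
    return {
        "url": f"{workspace_url}/api/2.0/{endpoint.lstrip('/')}",
        "headers": {"Authorization": f"Bearer {token}"},
    }

# Equivalent curl, with placeholder values:
# curl -H "Authorization: Bearer $DATABRICKS_TOKEN" \
#      https://<workspace>.azuredatabricks.net/api/2.0/clusters/list
```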
397973
by New Contributor III
  • 1016 Views
  • 3 replies
  • 0 kudos

Having trouble installing my own Python wheel?

I want to install my own Python wheel package on a cluster but can't get it working. I tried two ways: I followed these steps: https://docs.databricks.com/en/workflows/jobs/how-to/use-python-wheels-in-workflows.html#:~:text=March%2025%2C%202024,code%...

Data Engineering
cluster
Notebook
Latest Reply
shan_chandra
Esteemed Contributor
  • 0 kudos

@397973 - Once you uploaded the .whl file, did you have a chance to list the file manually in the notebook? Also, did you have a chance to move the .whl file to /Volumes?

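One quick sanity check along the lines of the reply above: confirm the uploaded file is visible at its path and that its filename parses as a PEP 427 wheel name, since a misnamed file will fail to install. A sketch (paths and names hypothetical):

```python
import re

# name-version[-build]-pytag-abitag-platformtag.whl  (PEP 427)
WHEEL_NAME = re.compile(r"^[^-]+-[^-]+(-[^-]+)?-[^-]+-[^-]+-[^-]+\.whl$")

def looks_like_wheel(path):
    """True if the filename part of `path` parses as a wheel name."""
    return bool(WHEEL_NAME.match(path.rsplit("/", 1)[-1]))

# In a notebook you might then list and install it with:
# %pip install /Volumes/main/default/libs/my_pkg-0.1.0-py3-none-any.whl
```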
2 More Replies
SyedSaqib
by New Contributor II
  • 949 Views
  • 2 replies
  • 0 kudos

Delta Live Table : [TABLE_OR_VIEW_ALREADY_EXISTS] Cannot create table or view

Hi, I have a Delta Live Table workflow with storage enabled for cloud storage to a blob store. Syntax of the bronze table in the notebook: @dlt.table(spark_conf = {"spark.databricks.delta.schema.autoMerge.enabled": "true"}, table_properties = {"quality": "bron...

Latest Reply
SyedSaqib
New Contributor II
  • 0 kudos

Hi Kaniz, thanks for replying back. I am using Python for Delta Live Table creation, so how can I set these configurations? "When creating the table, add the IF NOT EXISTS clause to tolerate pre-existing objects; consider using the OR REFRESH clause." Answe...

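Worth noting for the question above: the Python `@dlt.table` decorator has no IF NOT EXISTS clause, and a common cause of TABLE_OR_VIEW_ALREADY_EXISTS is two definitions resolving to the same table name (the name defaults to the decorated function's name, or the `name=` argument). A pure-Python sketch of a pre-flight check for that collision (helper name hypothetical):

```python
def find_duplicate_tables(table_names):
    """Return names defined more than once in a DLT pipeline.

    A collision among these names is what surfaces as
    TABLE_OR_VIEW_ALREADY_EXISTS at pipeline update time.
    """
    seen, duplicates = set(), []
    for name in table_names:
        if name in seen and name not in duplicates:
            duplicates.append(name)
        seen.add(name)
    return duplicates
```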
1 More Replies
Henrique_Lino
by New Contributor II
  • 1671 Views
  • 6 replies
  • 0 kudos

value is null after loading a saved df when using specific type in schema

I am facing an issue when using Databricks: when I set a specific type in my schema and read a JSON, its values are fine, but after saving my df and loading it again, the value is gone. I have this sample code that shows the issue: from pyspark.sql.typ...

Latest Reply
Lakshay
Esteemed Contributor
  • 0 kudos

@Henrique_Lino , Where are you saving your df?

5 More Replies
Anandsingh
by New Contributor
  • 518 Views
  • 1 reply
  • 0 kudos

Writing to multiple files/tables from data held within a single file through autoloader

I have a requirement to read and parse JSON files using autoloader where incoming JSON file has multiple sub entities. Each sub entity needs to go into its own delta table. Alternatively we can write each entity data to individual files. We can use D...

Latest Reply
Lakshay
Esteemed Contributor
  • 0 kudos

I think using DLT's medallion architecture should be helpful in this scenario. You can write all the incoming data to one bronze table and one silver table. And you can have multiple gold tables based on the value of the sub-entities.

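The bronze-to-gold split suggested above amounts to routing each sub-entity of an ingested record to its own target. A pure-Python sketch of that routing step (entity keys hypothetical); in DLT, each returned entry would back its own `@dlt.table` reading from the bronze table:

```python
def split_entities(record, entity_keys):
    """Split one ingested JSON record into per-entity payloads.

    Each returned entry maps to its own downstream (gold) table;
    entities absent from the record are simply skipped.
    """
    return {key: record[key] for key in entity_keys if key in record}
```

For example, a record holding `orders` and `customers` sub-entities yields one payload per entity, and a missing `items` key is silently skipped.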
Kavi_007
by New Contributor III
  • 2224 Views
  • 7 replies
  • 1 kudos

Resolved! Seeing history even after vacuuming the Delta table

Hi, I'm trying to run VACUUM on a Delta table within a Unity Catalog. The default retention is 7 days. Though I vacuum the table, I'm still able to see history beyond 7 days. Tried restarting the cluster, but it's still not working. What would be the fix?...

Latest Reply
Kavi_007
New Contributor III
  • 1 kudos

No, that's wrong. VACUUM removes all files from the table directory that are not managed by Delta, as well as data files that are no longer in the latest state of the transaction log for the table and are older than a retention threshold. VACUUM - Azu...

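To make the retention distinction above concrete: VACUUM only deletes data files older than the retention threshold, while the history shown by DESCRIBE HISTORY lives in the transaction log and is trimmed separately under delta.logRetentionDuration — so seeing old history after a VACUUM is expected. A small sketch of the file-level cutoff (helper name hypothetical):

```python
from datetime import datetime, timedelta

def vacuum_cutoff(now, retention_hours=168):
    """Data files last required before this instant are VACUUM candidates.

    168 hours is the 7-day default; table *history* entries are
    governed by delta.logRetentionDuration and are NOT removed
    by VACUUM.
    """
    return now - timedelta(hours=retention_hours)
```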
6 More Replies
iarregui
by New Contributor
  • 1418 Views
  • 2 replies
  • 0 kudos

Getting a Databricks static IP

Hello. I want to connect from my Databricks workspace to an external API to extract some data. The owner of the API asks for an IP to provide the token necessary for the extraction of data. Therefore I would need to set a static IP in Databricks that...

Latest Reply
Wojciech_BUK
Valued Contributor III
  • 0 kudos

Hello, the easiest way (in Azure) is to deploy the Workspace in VNet injection mode and attach a NAT Gateway to your VNet. The NAT Gateway requires a Public IP, and this IP will be your static egress IP for all clusters in this Workspace. Note: both NAT GW and IP Address...

1 More Replies
kurokaj
by New Contributor
  • 630 Views
  • 1 reply
  • 0 kudos

DLT Autoloader stuck in reading Avro files from Azure blob storage

I have a DLT pipeline joining data from streaming tables to metadata of Avro files located in Azure blob storage. The avro files are loaded using autoloader. Up until 25.3. (about 20:00UTC) the pipeline worked fine, but then suddenly got stuck in ini...

Data Engineering
autoloader
AVRO
dlt
LTS
Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @kurokaj, If the schema of the input data changes while an update is running, the update may be logged as CANCELED and automatically retried. Ensure that there haven't been any unexpected schema changes in your Avro files during the problematic ...

Jon
by New Contributor II
  • 2767 Views
  • 4 replies
  • 5 kudos

IP address fix

How can I fix the IP address of my Azure Cluster so that I can whitelist the IP address to run my job daily on my python notebook? Or can I find out the IP address to perform whitelisting? Thanks

Latest Reply
-werners-
Esteemed Contributor III
  • 5 kudos

Depends on the scenario. You could expose a single IP address to the external internet, but Databricks itself will always use many addresses.

3 More Replies
cmilligan
by Contributor II
  • 2346 Views
  • 3 replies
  • 2 kudos

Dropdown for parameters in a job

I want to be able to denote the type of run from a predetermined list of values that a user can choose from when kicking off a run using different parameters. Our team does standardized job runs on a weekly cadence but can have timeframes that change...

Latest Reply
dev56
New Contributor II
  • 2 kudos

Hi @cmilligan , I have a similar requirement and would really be grateful if you could provide me with any information on how to fix this issue. Thanks a lot!

2 More Replies
BenDataBricks
by New Contributor
  • 1176 Views
  • 1 reply
  • 0 kudos

Register more redirect URIs for OAuth U2M

I am following this guide on allowing OAuth U2M for Azure Databricks. When I get to Step 2, I make a request to account.azuredatabricks.net and specify a redirect URI to receive a code. The redirect URI in the example is localhost:8020. If I change thi...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @BenDataBricks,    OAuth user-to-machine (U2M) authentication in Azure Databricks allows real-time human sign-in and consent to authenticate the target user account. After successful sign-in and consent, an OAuth token is granted to the particip...

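For context on the question above: the Step 2 authorization request is just a URL whose redirect_uri must byte-for-byte match a URI registered for the OAuth app, so changing the port requires registering the new URI first. A sketch of assembling that URL — the endpoint path and parameter set here are assumptions to check against the guide:

```python
from urllib.parse import urlencode

def authorize_url(host, client_id, redirect_uri, state):
    """Build an OAuth U2M authorization-code request URL.

    redirect_uri must exactly match a URI registered for the app;
    e.g. swapping localhost:8020 for another port needs a new
    registration first.
    """
    params = {
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "response_type": "code",
        "state": state,
    }
    return f"{host}/oidc/v1/authorize?{urlencode(params)}"
```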
MAR1
by New Contributor
  • 1069 Views
  • 1 replies
  • 0 kudos

[ Databricks - Delta sharing ] Issue with Delta Sharing in Databricks: Unable to Query Shared Views

Hi guys, I've encountered an issue while attempting to query shared views via Delta Sharing in Databricks. We are using Delta Sharing Databricks-to-Databricks protocol to share data from a databricks environment deployed on azure to another databrick...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @MAR1,  It seems you’ve encountered some challenges while working with Delta Sharing in Databricks. Let’s address each of the issues you’ve encountered: SQL Serverless Warehouse and IP Whitelisting: When using a SQL Serverless warehouse to que...
