Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Henrique_Lino
by New Contributor II
  • 5831 Views
  • 6 replies
  • 0 kudos

value is null after loading a saved df when using specific type in schema

 I am facing an issue when using Databricks: when I set a specific type in my schema and read a JSON file, its values are fine, but after saving my df and loading it again, the value is gone. I have this sample code that shows this issue: from pyspark.sql.typ...

Latest Reply
Lakshay
Databricks Employee
  • 0 kudos

@Henrique_Lino, where are you saving your df?
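
One common cause worth checking, as a minimal sketch: if the type declared in the schema cannot hold the JSON value (for example, IntegerType for a value that needs LongType), Spark's default PERMISSIVE mode yields null instead of failing, and the null then persists through the save/load round trip. The path and column name below are hypothetical.

from pyspark.sql.types import StructType, StructField, LongType

# 3000000000 overflows IntegerType; declared as LongType it survives intact
dbutils.fs.put("/tmp/sample.json", '{"value": 3000000000}', True)

schema = StructType([StructField("value", LongType(), True)])
df = spark.read.schema(schema).json("/tmp/sample.json")

df.write.mode("overwrite").parquet("/tmp/sample_out")  # save ...
reloaded = spark.read.parquet("/tmp/sample_out")       # ... and load again
reloaded.show()  # with a matching type, the value is still there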

5 More Replies
Anandsingh
by New Contributor
  • 2154 Views
  • 1 reply
  • 0 kudos

Writing to multiple files/tables from data held within a single file through autoloader

I have a requirement to read and parse JSON files using Auto Loader, where the incoming JSON file has multiple sub-entities. Each sub-entity needs to go into its own Delta table. Alternatively, we can write each entity's data to individual files. We can use D...

Latest Reply
Lakshay
Databricks Employee
  • 0 kudos

I think using DLT's medallion architecture should be helpful in this scenario. You can write all the incoming data to one bronze table and one silver table, and then have multiple gold tables based on the values of the sub-entities.
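
A minimal sketch of that shape (silver omitted for brevity), assuming the incoming JSON carries a top-level entity_type field and lands under /mnt/raw/events; both names are hypothetical.

import dlt
from pyspark.sql.functions import col

@dlt.table
def bronze_events():
    # Auto Loader ingests every incoming JSON file into a single bronze table
    return (spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")
            .load("/mnt/raw/events"))

# One gold table per sub-entity, generated in a loop; entity names are illustrative
for entity in ["orders", "customers", "payments"]:
    @dlt.table(name=f"gold_{entity}")
    def build(entity=entity):
        return dlt.read_stream("bronze_events").where(col("entity_type") == entity)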

Kavi_007
by New Contributor III
  • 8925 Views
  • 6 replies
  • 1 kudos

Resolved! Seeing history even after vacuuming the Delta table

Hi, I'm trying to run VACUUM on a Delta table within a Unity Catalog. The default retention is 7 days. Though I vacuum the table, I'm still able to see history beyond 7 days. I tried restarting the cluster, but it still doesn't work. What would be the fix?...

Latest Reply
Kavi_007
New Contributor III
  • 1 kudos

No, that's wrong. VACUUM removes all files from the table directory that are not managed by Delta, as well as data files that are no longer in the latest state of the transaction log for the table and are older than a retention threshold. VACUUM - Azu...
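
The distinction in play, as a short sketch: VACUUM deletes unreferenced data files, while DESCRIBE HISTORY reads the transaction log, which is trimmed separately (per delta.logRetentionDuration), so history entries can outlive the vacuumed files. The table name is hypothetical.

# Remove data files older than the 7-day default retention
spark.sql("VACUUM main.default.my_table")

# History still lists older versions: it comes from the transaction log,
# not from the data files that VACUUM deletes
spark.sql("DESCRIBE HISTORY main.default.my_table").show()

# Time travel to a vacuumed version will fail even though its history row is listed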

5 More Replies
Jon
by New Contributor II
  • 8734 Views
  • 4 replies
  • 5 kudos

IP address fix

How can I fix the IP address of my Azure cluster so that I can whitelist it to run my job daily from my Python notebook? Or can I find out the IP address to perform whitelisting? Thanks

Latest Reply
-werners-
Esteemed Contributor III
  • 5 kudos

It depends on the scenario. You could expose a single IP address to the external internet, but Databricks itself will always use many addresses.

3 More Replies
DE_K
by New Contributor II
  • 4413 Views
  • 4 replies
  • 0 kudos

@dlt.table throws error AttributeError: module 'dlt' has no attribute 'table'

Hi everyone, I am new to DLT and am trying to run the code below to create tables dynamically, but I get the error "AttributeError: module 'dlt' has no attribute 'table'". Code snippet: def generate_tables(model_name   try:    spark.sql("select * from dlt.{0}"....

Data Engineering
dataengineering
datapipeline
deltalivetables
dlt
Latest Reply
YuliyanBogdanov
New Contributor III
  • 0 kudos

Thank you, @DE_K. I see your point. I believe you are using @dlt.table instead of @dlt.create_table to begin with, since you want the table to be created rather than to define an existing one. (https://community.databricks.com/t5/data-engineering/differenc...
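
For reference, a minimal sketch of a decorator-based table definition; note that import dlt only resolves when the notebook executes as part of a Delta Live Tables pipeline, and running it interactively is another common source of this AttributeError. Table and source names are hypothetical.

import dlt  # resolves only inside a DLT pipeline run

@dlt.table(name="my_table")
def my_table():
    return spark.read.table("source_table")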

3 More Replies
duliu
by New Contributor II
  • 2194 Views
  • 1 reply
  • 0 kudos

Spark Driver failed due to DRIVER_UNAVAILABLE but not due to memory pressure

Hello, I have a job cluster running a streaming job, and it unexpectedly failed on 19th March due to DRIVER_UNAVAILABLE (Request timed out, Driver is temporarily unavailable) in the event log. This is the run: https://atlassian-discover.cloud.databricks.com/...

Latest Reply
Ayushi_Suthar
Databricks Employee
  • 0 kudos

Hi @duliu, hope you are doing well! Would you kindly see if the KB article below addresses your problem? https://kb.databricks.com/en_US/jobs/driver-unavailable Please let me know if this helps, and leave a like if this information is useful, followu...

DumbBeaver
by New Contributor II
  • 1727 Views
  • 0 replies
  • 0 kudos

Issue while writing data to unity catalog using JDBC

While writing data to a pre-existing table in Unity Catalog using JDBC, it just writes the delta of the data. Driver used: com.databricks:databricks-jdbc:2.6.36. Let's say the table has rows: +-+-+ |a|b| +-+-+ |1|2| |3|4| and I am appendi...

Data Engineering
JDBC
spark
Unity Catalog
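
For context, a sketch of the append path the post describes; the workspace URL, HTTP path, and table name are placeholders, and the driver class name is an assumption for this driver version.

# df is an existing DataFrame holding the rows to append
(df.write.format("jdbc")
   .option("url", "jdbc:databricks://<workspace-host>:443;httpPath=<http-path>")
   .option("driver", "com.databricks.client.jdbc.Driver")  # assumed class name
   .option("dbtable", "main.default.target_table")         # placeholder UC table
   .mode("append")
   .save())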
Leszek
by Contributor
  • 1329 Views
  • 0 replies
  • 0 kudos

[Delta Sharing - open sharing protocol] Token rotation

Hi, do you have any experience with rotating tokens in Delta Sharing automatically? There is an option to do that using the CLI (Create and manage data recipients for Delta Sharing | Databricks on AWS). But what to do next? Sending the new link to the token via...
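
One way to script the rotation, as a hedged sketch with the Databricks SDK for Python (databricks-sdk); the method and parameter names reflect my best understanding and are worth verifying against the SDK reference, and the recipient name is hypothetical.

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
rotated = w.recipients.rotate_token(
    name="my_recipient",
    existing_token_expire_in_seconds=3600,  # keep the old token valid for one hour
)
# rotated.tokens carries the new activation details to send to the recipient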

QPeiran
by New Contributor III
  • 1879 Views
  • 1 reply
  • 1 kudos

Materialized View to External Location

As I understand it, the Materialized View is actually a Delta table stored internally to Databricks (a managed table?). Is it possible to move the Materialized View, and the Delta table under the hood, to an external location like Blob storage?

Latest Reply
Ajay-Pandey
Esteemed Contributor III
  • 1 kudos

Hi @QPeiran, for now there is no way to move the materialized view, as it is maintained by Databricks only.

Sachin_
by New Contributor II
  • 4477 Views
  • 3 replies
  • 0 kudos

The spark context has stopped and the driver is restarting. Your notebook will be automatically

I am trying to execute a Scala JAR in a notebook. When I execute it explicitly, I am able to run the JAR like this: but when I try to run the notebook through a Databricks workflow, I get the error below: The spark context has stopped and the driver i...

Data Engineering
dataengineering
Latest Reply
jose_gonzalez
Databricks Employee
  • 0 kudos

Could you share the code you have in your JAR file? How are you creating your Spark context in your JAR file?

2 More Replies
aurora
by New Contributor
  • 3493 Views
  • 0 replies
  • 0 kudos

JDBC drivers for Microsoft Dataverse IO

I want to run Databricks ETLs on on-prem Unix, Azure, and AWS (in the future). I am trying to find suitable JDBC drivers but couldn't find anything except CDATA, which is very costly. Can someone please help me? Also, what could be other viable solutions...

Data Engineering
dataverse
JDBC
spark
Jamie_209389
by New Contributor III
  • 9846 Views
  • 7 replies
  • 3 kudos

Resolved! In Azure Databricks CLI, how to pass in the parameter notebook_params? Error: Got unexpected extra argument

I am trying to call run-now with notebook_params in the Azure Databricks CLI, following https://learn.microsoft.com/en-us/azure/databricks/dev-tools/cli/jobs-cli and escaping the quotes as stated in the documentation https://learn.microsoft.com/en-us/azure/d...

Latest Reply
Vaitheesh
New Contributor II
  • 3 kudos

I have the latest Databricks CLI set up and configured in my Ubuntu VM. When I tried to run a job using the JSON template I generated with databricks jobs get 'xxxjob_idxxx' > orig.json, it throws an unknown error. Databricks CLI v0.216.0 databricks job...
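
One way to sidestep the quote-escaping entirely, sketched with the Databricks SDK for Python, where notebook_params is a plain dict; the job ID and parameter names are hypothetical.

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
run = w.jobs.run_now(
    job_id=123,
    notebook_params={"env": "prod", "run_date": "2024-04-01"},
).result()  # wait for the triggered run to complete
print(run.run_id)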

6 More Replies
vijay_boopathy
by New Contributor
  • 11251 Views
  • 1 reply
  • 1 kudos

Hive vs Delta

I'm curious about your experiences with Hive and Delta Lake. What are the advantages of using Delta over Hive, and in what scenarios would you recommend choosing Delta for data processing tasks? I'd appreciate any insights or recommendations based on...

Latest Reply
Walter_C
Databricks Employee
  • 1 kudos

Delta Lake offers several advantages over Hive. One of the key benefits is its design for petabyte-scale data lakes with streaming and fast access at the forefront. This makes it more suitable for near-real-time streams, unlike Hive. Delta Lake also ...

William_Scardua
by Valued Contributor
  • 3049 Views
  • 2 replies
  • 0 kudos

Drop array in a struct field

Hi guys, look at my table definition. Well, I need to remove the 'med' array inside that 'equip' field. Any idea? Thank you

Latest Reply
Sampath_Kumar
New Contributor II
  • 0 kudos

Hi William, there is an array_remove method that can help remove elements from an array. Here the med array is an element in the equip array. If that's not helpful, please share some sample data so that we can solve it together. Reference: array_remove. Thanks
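
If equip is a struct column holding the med array (the exact nesting is an assumption, since the table definition was in a screenshot), Column.dropFields (Spark 3.1+) removes the nested field directly:

from pyspark.sql.functions import col, transform

# If equip is a struct, drop its med field in place
df2 = df.withColumn("equip", col("equip").dropFields("med"))

# If equip is an array of structs, drop med from each element instead
df3 = df.withColumn("equip", transform(col("equip"), lambda e: e.dropFields("med")))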

1 More Reply
sharma_kamal
by New Contributor III
  • 3367 Views
  • 2 replies
  • 1 kudos

Resolved! Getting errors while reading data from URL

I'm encountering some issues while trying to read a public dataset from a URL using Databricks. Here's the code snippet (along with errors) I'm working with: I'm confused about the Delta format error here. When I read data from a URL, how would it have a D...

Latest Reply
MuthuLakshmi
Databricks Employee
  • 1 kudos

@sharma_kamal, please disable the formatCheck in the notebook and check if you can read the data. The configuration command %sql SET spark.databricks.delta.formatCheck.enabled=false will disable the format check for Delta tables in Databricks. Databrick...
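
The same setting applied from Python rather than %sql; disabling the format check is a diagnostic step, not a recommended default, and the follow-up read is a sketch with a hypothetical path.

# Diagnostic only; re-enable the check once the issue is understood
spark.conf.set("spark.databricks.delta.formatCheck.enabled", "false")

# A file fetched from a URL is not a Delta table, so a plain reader is
# usually the more direct route
df = spark.read.option("header", "true").csv("/tmp/downloaded_dataset.csv")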

1 More Reply
