Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

nikhil_kumawat
by Visitor
  • 62 Views
  • 4 replies
  • 0 kudos

Not able to retain precision while reading data from source file

Hi, I am trying to read a csv file located in an S3 bucket folder. The csv file contains around 50 columns, one of which is "litre_val", which contains values like "60211.952", "59164.608". Up to 3 decimal points. Now to read this csv we ...

Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

But I agree with @Walter_C on specifying the schema and making sure String type does not cause any truncation.
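For illustration, a minimal PySpark sketch of that suggestion, assuming a DecimalType(18, 3) for the "litre_val" column mentioned in the post; the path, precision, and remaining columns are placeholders, not from the thread:

```python
# Hedged sketch: read the CSV with an explicit schema instead of relying on
# inference, so "litre_val" keeps its 3 decimal places.
from pyspark.sql.types import StructType, StructField, DecimalType

schema = StructType([
    StructField("litre_val", DecimalType(18, 3), True),  # 3 decimal places preserved
    # ... declare the remaining ~49 columns the same way
])

df = spark.read.csv("s3://<bucket>/<folder>/", header=True, schema=schema)
df.select("litre_val").show(truncate=False)
```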

3 More Replies
minhhung0507
by New Contributor III
  • 24 Views
  • 1 reply
  • 0 kudos

How to reduce cost of "Regional Standard Class A Operations"

Hi Databricks experts, We're experiencing unexpectedly high costs from Regional Standard Class A Operations in GCS while running a Databricks pipeline. The costs seem related to frequent metadata queries, possibly tied to Delta table operations. In las...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

There are some approaches you can test:
  • Delta Caching: Enable Delta caching to reduce the number of metadata queries. This can be done by setting the spark.databricks.io.cache.enabled configuration to true.
  • Optimize Command: Use the OPTIMIZE command t...
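A minimal sketch of those two suggestions; only the config key and the OPTIMIZE command come from the reply, the table name is a placeholder:

```python
# Hedged sketch: enable the Delta (disk) cache and compact a Delta table.
spark.conf.set("spark.databricks.io.cache.enabled", "true")  # Delta caching
spark.sql("OPTIMIZE my_catalog.my_schema.my_table")          # fewer small files, fewer list calls
```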

Edthehead
by Contributor II
  • 38 Views
  • 1 reply
  • 0 kudos

Restoring a table from a Delta live pipeline

I have a DLT pipeline running to ingest files from storage using Autoloader. We have a Bronze table and a Silver table. A question came up from the team on how to restore DLT tables to a previous version in case of some incorrect transformation. When ...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

The RESTORE command is not supported on streaming tables, which is why you encountered the error. Instead, you can use the TIME TRAVEL feature of Delta Lake to query previous versions of the table. You can use the VERSION AS OF or TIMESTAMP AS OF c...
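For illustration, a hedged sketch of both clauses; the table name, version number, and timestamp are placeholders:

```python
# Hedged sketch: query an earlier version of the underlying Delta table
# via time travel, by version or by timestamp.
v42 = spark.sql("SELECT * FROM my_catalog.my_schema.bronze VERSION AS OF 42")
old = spark.sql(
    "SELECT * FROM my_catalog.my_schema.bronze TIMESTAMP AS OF '2024-06-01T00:00:00'"
)
```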

vivek_cloudde
by Visitor
  • 18 Views
  • 1 reply
  • 0 kudos

Issue while creating an on-demand cluster in Azure Databricks using PySpark

Hello, I am trying to create an on-demand cluster in Azure Databricks using the code below, and I am getting the error message {"error_code":"INVALID_PARAMETER_VALUE","message":"Exactly 1 of virtual_cluster_size, num_workers or autoscale must be specified."...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

You cannot specify both num_workers and autoscale simultaneously. To resolve the issue, you should remove the autoscale parameter if you want to use a fixed number of workers.
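As a hedged sketch of what a valid request against the Clusters API create endpoint might look like (the original post's code isn't shown; workspace URL, token, and all values here are placeholders):

```python
# Hedged sketch: a Clusters API create payload that specifies num_workers only.
# Supplying both num_workers and autoscale triggers INVALID_PARAMETER_VALUE.
import requests

payload = {
    "cluster_name": "on-demand-cluster",      # placeholder
    "spark_version": "15.4.x-scala2.12",      # example runtime
    "node_type_id": "Standard_DS3_v2",        # example Azure node type
    "num_workers": 2,                         # fixed size...
    # "autoscale": {"min_workers": 2, "max_workers": 8},  # ...OR this, never both
}

resp = requests.post(
    "https://<workspace-url>/api/2.0/clusters/create",
    headers={"Authorization": "Bearer <personal-access-token>"},
    json=payload,
)
print(resp.json())
```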

Einsatz
by Visitor
  • 31 Views
  • 1 reply
  • 0 kudos

Photon-enabled UC cluster has less executor memory (1/4th) compared to normal cluster

I have a Unity Catalog-enabled cluster with node type Standard_DS4_v2 (28 GB Memory, 8 Cores). When the "Use Photon Acceleration" option is disabled, spark.executor.memory is 18409m. But if I enable Photon Acceleration, it shows spark.executor.memory as 46...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

Enabling Photon Acceleration on your Databricks cluster reduces the available executor memory because Photon uses a different memory management strategy compared to standard Spark. Photon is designed to optimize performance by leveraging the underlyi...

Abishrp
by New Contributor II
  • 114 Views
  • 7 replies
  • 3 kudos

Resolved! Issue in getting list of pricing details in JSON

I can view the pricing details using the Databricks pricing calculator. Can I get the pricing details in the form of JSON, or are there APIs available for this? I particularly need the per-hour DBU rate for each instance.

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 3 kudos

Hi @Abishrp, Unfortunately, as of now, Databricks does not provide a dedicated public API to directly retrieve pricing information in JSON format (or, to be precise, the Azure Pricing Calculator doesn't have such an option).

6 More Replies
NathanSundarara
by Contributor
  • 1735 Views
  • 1 reply
  • 0 kudos

Lakehouse federation bringing data from SQL Server

Has anyone tried bringing data in using the newly announced Lakehouse Federation and ingesting it using DELTA LIVE TABLES? I'm currently testing using Materialized Views. First I loaded the full data, and now I'm loading the last 3 days daily and recomputing using Mate...

Data Engineering
dlt
Lake house federation
Latest Reply
Nam_Nguyen
Databricks Employee
  • 0 kudos

Hi @NathanSundarara, regarding your current approach, here are the potential solutions and considerations. Deduplication: Implement deduplication strategies within your DLT pipeline. For example clicksDedupDf = ( spark.readStream.table("LIVE.rawCl...
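The code in the reply is cut off above; as a hedged sketch of the same idea, with all table and column names invented for illustration:

```python
# Hedged sketch: streaming deduplication inside a DLT pipeline.
# "LIVE.raw_clicks", "event_time", and "event_id" are hypothetical names.
import dlt

@dlt.table(name="clicks_deduped")
def clicks_deduped():
    return (
        spark.readStream.table("LIVE.raw_clicks")
        .withWatermark("event_time", "1 hour")   # bound the dedup state
        .dropDuplicates(["event_id"])            # keep one row per event id
    )
```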

QueryingQuagga
by New Contributor III
  • 444 Views
  • 7 replies
  • 4 kudos

Resolved! Working with semi-structured data (complex - variant)

Edit: the value of the inner key "value" was an array - I have added the square brackets to the example below. Hello all, I'm working with the Spark SQL API for querying semi-structured data in Databricks. Currently I'm having a hard time understanding how I can n...

Data Engineering
Complex datatypes
Databricks SQL Warehouse
spark sql
Variant datatype
Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 4 kudos

Hi @QueryingQuagga, Maybe something like that?: %sql WITH src AS ( SELECT parse_json('{ "extendedinformation":[ { "name": "CHANNEL", "value": [{\"id\":\"DUMMYID1\",\"name\":\"DUMMYCHANNEL1\",\"role\":\"DUMMYROLE1\"}]}, ...
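The snippet above is truncated; for illustration, a hedged PySpark sketch of navigating a VARIANT value with ":" path syntax and "::" casts (the JSON shape loosely mirrors the post, the field aliases are invented):

```python
# Hedged sketch: parse a JSON string into a VARIANT, then drill into nested
# objects and arrays with path syntax. Shape and names are illustrative.
df = spark.sql("""
    WITH src AS (
        SELECT parse_json(
            '{"extendedinformation": [{"name": "CHANNEL",
               "value": [{"id": "DUMMYID1", "name": "DUMMYCHANNEL1"}]}]}'
        ) AS v
    )
    SELECT
        v:extendedinformation[0].name::string        AS info_name,
        v:extendedinformation[0].value[0].id::string AS channel_id
    FROM src
""")
df.show(truncate=False)
```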

6 More Replies
berserkersap
by Contributor
  • 4587 Views
  • 3 replies
  • 1 kudos

Speed Up JDBC Write from Databricks Notebook to MS SQL Server

Hello everyone, I have a use case where I need to write a delta table from Databricks to a SQL Server table using PySpark/Python/Spark SQL. The delta table I am writing contains around 3 million records and the SQL Server table is neither partitione...

Data Engineering
JDBC
MS SQL Server
pyspark
Table Write
Latest Reply
NandiniN
Databricks Employee
  • 1 kudos

From https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html: numPartitions (default: none): the maximum number of partitions that can be used for parallelism in table reading and writing. This also determines the maximum number of concurrent JDBC connection...
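For illustration, a hedged sketch of a parallel JDBC write that uses numPartitions along with batchsize; connection details are placeholders, and the right partition count depends on what the SQL Server can absorb:

```python
# Hedged sketch: parallel JDBC append to SQL Server. URL, table, and
# credentials are placeholders; 8 partitions = up to 8 concurrent connections.
(
    df.repartition(8)
      .write.format("jdbc")
      .option("url", "jdbc:sqlserver://<host>:1433;databaseName=<db>")
      .option("dbtable", "dbo.target_table")   # hypothetical target
      .option("user", "<user>")
      .option("password", "<password>")
      .option("numPartitions", 8)              # cap on concurrent writers
      .option("batchsize", 10000)              # rows per JDBC batch insert
      .mode("append")
      .save()
)
```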

2 More Replies
Myousief
by New Contributor
  • 92 Views
  • 7 replies
  • 1 kudos

Can't log in with password, SSO Enabled OIDC, Secret Key Expired

I am currently unable to log in to the Databricks Account Console. OpenID SSO is enabled for our workspace using Microsoft Entra ID, but the client secret has expired. As a result, SSO login is no longer functional. I attempted to log in using a password, ...

Latest Reply
Walter_C
Databricks Employee
  • 1 kudos

Glad to hear you got unblocked.

6 More Replies
sangwan
by New Contributor
  • 46 Views
  • 1 reply
  • 0 kudos

Remorph: Getting error while running remorph-core-0.2.0-SNAPSHOT.jar after Maven build

We are encountering an issue while running the remorph-core-0.2.0-SNAPSHOT.jar file after successfully building it using Maven. The build completes without errors, but when we try to execute the generated .jar file, we get the following exception att...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

Here are a few steps you can take to debug and resolve this issue:
  • Check the Code for Either Usage: Look through your codebase for instances where Either is used. Ensure that you are handling both Left and Right cases properly. The error suggests th...

JamesY
by New Contributor III
  • 629 Views
  • 1 reply
  • 0 kudos

Databricks JDBC write to table with PK column, error, key not found.

Hello, I am trying to write data to a table. It worked fine before, but after I recreated the table with one column as PK, there is an error: Unable to write into the A_Table table....key not found: id. What is the correct way of doing this? PK column: [...

Data Engineering
Databricks
SqlMi
Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

Looks like the primary key column ID is not being found during the write operation. Kindly verify the schema. Use a command like the one below to create the table with id as the primary key. CREATE TABLE A_Table ( ID BIGINT IDENTITY(1,1) PRIMARY KEY NOT NUL...
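If ID is an IDENTITY column as in that CREATE TABLE, one hedged sketch is to exclude it from the DataFrame so SQL Server generates it on insert; connection details are placeholders:

```python
# Hedged sketch: drop the identity PK column before the JDBC write so the
# database assigns it. URL and credentials are placeholders.
(
    df.drop("ID")                              # let SQL Server generate the PK
      .write.format("jdbc")
      .option("url", "jdbc:sqlserver://<host>:1433;databaseName=<db>")
      .option("dbtable", "dbo.A_Table")
      .option("user", "<user>")
      .option("password", "<password>")
      .mode("append")
      .save()
)
```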

xamry
by New Contributor
  • 4483 Views
  • 1 reply
  • 0 kudos

java.lang.ClassCastException in JDBC driver's logger

Hi, our Java application is using the latest version of the Databricks JDBC driver (2.6.38). This application already uses Log4j 2.17.1 and SLF4J 2.0.13. When querying data from Databricks, there are java.lang.ClassCastException errors printing on the console. Data...

Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

Kindly use the latest jars from Maven, and if that does not help, shading common packages to avoid conflicts should be our next resort.

guiferviz
by New Contributor III
  • 133 Views
  • 6 replies
  • 3 kudos

Resolved! How to Determine if Materialized View is Performing Full or Incremental Refresh?

I'm currently testing materialized views and I need some help understanding the refresh behavior. Specifically, I want to know if my materialized view is querying the full table (performing a full refresh) or just doing an incremental refresh.From so...

Latest Reply
DelaneyClark
New Contributor II
  • 3 kudos

Thank you so much for the solution.

5 More Replies
soumiknow
by Contributor
  • 168 Views
  • 10 replies
  • 2 kudos

Resolved! How to resolve a 'connection refused' error while using a google-cloud lib in a Databricks Notebook?

I want to use the google-cloud-bigquery library in my PySpark code, though I know that the spark-bigquery-connector is available. The reason I want to use it is that the Databricks Cluster 15.4 LTS comes with the 0.22.2-SNAPSHOT version of spark-bigquery-connector, wh...

Latest Reply
VZLA
Databricks Employee
  • 2 kudos

@soumiknow Sounds good! Please let me know if you need some internal assistance with the communication process.

9 More Replies

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group