Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

nikhil_kumawat
by New Contributor II
  • 2883 Views
  • 8 replies
  • 2 kudos

Not able to retain precision while reading data from source file

Hi, I am trying to read a csv file located in an S3 bucket folder. The csv file contains around 50 columns, one of which is "litre_val", which contains values like "60211.952" and "59164.608", up to 3 decimal points. Now to read this csv we ...

precision.png
Latest Reply
VZLA
Databricks Employee
  • 2 kudos

@nikhil_kumawat can you provide more details to reproduce this so we can better help you? E.g.: sample data set, DBR version, reproducer code, etc. I'm using this sample data: csv_content = """column1,column2,litre_val,another_decimal_column 1,TypeA,60211...
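In Spark the usual fix is to pass an explicit schema with a `DecimalType` column instead of letting inference pick `DoubleType`. The underlying precision issue can be sketched in plain Python with the stdlib `decimal` module (values taken from the post; the extra column names are illustrative):

```python
import csv
import io
from decimal import Decimal

# Sample rows modeled on the post; only litre_val matters here.
csv_content = "column1,litre_val\n1,60211.952\n2,59164.608\n"
rows = list(csv.DictReader(io.StringIO(csv_content)))

# Binary floats cannot represent most decimal fractions exactly and
# drop the declared scale; Decimal preserves all three places.
total_float = sum(float(r["litre_val"]) for r in rows)
total_decimal = sum(Decimal(r["litre_val"]) for r in rows)

print(total_float)
print(total_decimal)  # 119376.560 -- the scale of 3 is retained
```

In Spark the equivalent would be declaring the column as, e.g., `DecimalType(10, 3)` in the schema passed to the CSV reader.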

7 More Replies
Algocrat
by New Contributor II
  • 6470 Views
  • 2 replies
  • 2 kudos

Resolved! Discover and redact PII

Hi! What is the best way to discover and redact PII? Does Databricks offer any frameworks, methods, or processes that we may follow?

Latest Reply
viswesh
New Contributor II
  • 2 kudos

Hey @Algocrat  @szymon_dybczak , just wanted to let you know that Databricks is currently working on a product to tackle PII / sensitive data classification. If you're a current customer, we recommend you reach out to your account representative to l...
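Until a built-in classifier ships, a common stopgap is rule-based redaction over string columns. A minimal pure-Python sketch (the regex patterns are illustrative, not production-grade PII detection):

```python
import re

# Illustrative patterns only -- real PII discovery needs broader
# coverage (names, addresses, free text) and usually an ML classifier.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace each matched span with a labeled placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[REDACTED:{label}]", text)
    return text

print(redact("Reach jane.doe@example.com or 555-010-1234."))
```

In Spark, a function like this would typically be wrapped as a UDF and applied to the string columns flagged as sensitive.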

1 More Replies
semsim
by Contributor
  • 5168 Views
  • 6 replies
  • 0 kudos

Resolved! Installing LibreOffice on Databricks

Hi, I need to install LibreOffice to do a document conversion from .docx to .pdf. The requirement is no use of containers. Any idea on how I should go about this? Environment: Databricks 13.3 LTS. Thanks, Sem

Latest Reply
furkan
New Contributor II
  • 0 kudos

Hi @semsim I'm attempting to install LibreOffice for converting DOCX files to PDF and tried running your shell commands from notebook. However, I encountered the 404 errors shown below. Do you have any suggestions on how to resolve this issue? I real...
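For reference, once LibreOffice is installed on the cluster nodes (typically via apt in an init script), the DOCX-to-PDF conversion itself is a single headless CLI call. A sketch that builds the command and only runs it when `soffice` is on the PATH (the file paths are illustrative):

```python
import shutil
import subprocess

def docx_to_pdf(src: str, out_dir: str) -> list[str]:
    # LibreOffice's headless converter writes <name>.pdf into out_dir.
    cmd = ["soffice", "--headless", "--convert-to", "pdf",
           "--outdir", out_dir, src]
    if shutil.which("soffice"):  # skip gracefully where not installed
        subprocess.run(cmd, check=True)
    return cmd

print(docx_to_pdf("/dbfs/tmp/report.docx", "/dbfs/tmp"))
```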

5 More Replies
soumiknow
by Contributor II
  • 5499 Views
  • 10 replies
  • 2 kudos

Resolved! How to resolve a 'connection refused' error while using a google-cloud lib in a Databricks Notebook?

I want to use google-cloud-bigquery library in my PySpark code though I know that spark-bigquery-connector is available. The reason I want to use is that the Databricks Cluster 15.4LTS comes with 0.22.2-SNAPSHOT version of spark-bigquery-connector wh...

Latest Reply
VZLA
Databricks Employee
  • 2 kudos

@soumiknow sounds good ! Please let me know if you need some internal assistance with the communication process.

9 More Replies
Edthehead
by Contributor III
  • 1695 Views
  • 1 reply
  • 0 kudos

Restoring a table from a Delta Live Tables pipeline

I have a DLT pipeline running to ingest files from storage using Autoloader. We have a Bronze table and a Silver table. A question came up from the team on how to restore DLT tables to a previous version in case of some incorrect transformation. When ...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

  The RESTORE command is not supported on streaming tables, which is why you encountered the error. Instead, you can use the TIME TRAVEL feature of Delta Lake to query previous versions of the table. You can use the VERSION AS OF or TIMESTAMP AS OF c...
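As a sketch of the suggested workaround (the table name and version number are illustrative): query an older snapshot with time travel and materialize it into a new table, rather than RESTORE-ing the streaming table in place. Whether older versions are still readable depends on the table's retention and vacuum settings.

```sql
-- Inspect history to find the version before the bad transformation
DESCRIBE HISTORY my_catalog.my_schema.silver_events;

-- Read the older snapshot and materialize it as a regular table
CREATE OR REPLACE TABLE my_catalog.my_schema.silver_events_recovered AS
SELECT * FROM my_catalog.my_schema.silver_events VERSION AS OF 42;
```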

Abishrp
by Contributor
  • 2251 Views
  • 7 replies
  • 3 kudos

Resolved! Issue in getting list of pricing details in json

I can view the pricing details using the Databricks pricing calculator. Can I get the pricing details as JSON, or are there APIs available for pricing details? I particularly need the per-hour DBU rate for each instance.

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 3 kudos

Hi @Abishrp, unfortunately, as of now, Databricks does not provide a dedicated public API to directly retrieve pricing information in JSON format (or, to be precise, the Azure Pricing Calculator doesn't have such an option).
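One adjacent option worth noting: Azure (not Databricks) publishes a general-purpose Retail Prices REST API whose results include Azure Databricks meters, though it returns infrastructure list prices rather than the calculator's bundled view. A sketch that only builds the request URL (the filter value is illustrative):

```python
from urllib.parse import urlencode

BASE = "https://prices.azure.com/api/retail/prices"

def pricing_url(service_name: str, region: str) -> str:
    # OData-style $filter; fetch with any HTTP client and page
    # through the JSON via the NextPageLink field.
    flt = f"serviceName eq '{service_name}' and armRegionName eq '{region}'"
    return f"{BASE}?{urlencode({'$filter': flt})}"

print(pricing_url("Azure Databricks", "eastus"))
```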

6 More Replies
QueryingQuagga
by New Contributor III
  • 4384 Views
  • 7 replies
  • 5 kudos

Resolved! Working with semi-structured data (complex - variant)

Edit: the value of the inner key "value" was an array - I have added the square brackets to the example below. Hello all, I'm working with the Spark SQL API for querying semi-structured data in Databricks. Currently I'm having a hard time understanding how I can n...

Data Engineering
Complex datatypes
Databricks SQL Warehouse
spark sql
Variant datatype
Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 5 kudos

Hi @QueryingQuagga, maybe something like this?: %sql WITH src AS ( SELECT parse_json('{ "extendedinformation":[ { "name": "CHANNEL", "value": [{\"id\":\"DUMMYID1\",\"name\":\"DUMMYCHANNEL1\",\"role\":\"DUMMYROLE1\"}]}, ...

6 More Replies
kazinahian
by New Contributor III
  • 4253 Views
  • 2 replies
  • 0 kudos

Low-code ETL in Databricks

Hello everyone, I work as a Business Intelligence practitioner, employing tools like Alteryx or various low-code solutions to construct ETL processes and develop data pipelines for my dashboards and reports. Currently, I'm delving into Azure Databrick...

Latest Reply
Nam_Nguyen
Databricks Employee
  • 0 kudos

Hello @kazinahian, Azure Databricks offers several options for building ETL (Extract, Transform, Load) data pipelines, ranging from low-code to more code-centric approaches. Delta Live Tables: Delta Live Tables (DLT) is a declarative framework for bu...

1 More Replies
NathanSundarara
by Valued Contributor
  • 2525 Views
  • 1 reply
  • 0 kudos

Lakehouse federation bringing data from SQL Server

Has anyone tried to bring data in using the newly announced Lakehouse Federation and ingest it using Delta Live Tables? I'm currently testing using Materialized Views. First loaded the full data, and now loading the last 3 days daily and recomputing using Mate...

Data Engineering
dlt
Lake house federation
Latest Reply
Nam_Nguyen
Databricks Employee
  • 0 kudos

Hi @NathanSundarara, regarding your current approach, here are potential solutions and considerations. Deduplication: implement deduplication strategies within your DLT pipeline. For example clicksDedupDf = ( spark.readStream.table("LIVE.rawCl...
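Stripped of the streaming machinery, the dedup step the reply points at (dropDuplicates over a key, bounded by a watermark) reduces to keep-first-by-key. A plain-Python sketch of that core idea (record and field names are illustrative):

```python
def dedupe_keep_first(rows, key):
    """Keep the first record seen for each key, drop later repeats."""
    seen = set()
    out = []
    for row in rows:
        k = row[key]
        if k not in seen:
            seen.add(k)
            out.append(row)
    return out

clicks = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}, {"id": 1, "v": "a2"}]
print(dedupe_keep_first(clicks, "id"))  # ids 1 and 2 survive once each
```

In a streaming pipeline the watermark serves to bound how long `seen` state is kept; without it, state grows indefinitely.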

jamson
by New Contributor
  • 22880 Views
  • 2 replies
  • 0 kudos

What are the best practices for optimizing Power BI reports and dashboards for performance in the PL

I’m studying for the PL-300 exam and would love some advice on how to optimize Power BI reports and dashboards for better performance. Specifically, I’m interested in: techniques for improving report load times and responsiveness; best practices for ma...

Latest Reply
emily2056
New Contributor II
  • 0 kudos

Here are the best practices for optimizing Power BI reports and dashboards for performance in the production lifecycle (PL): 1. Optimize data models: use a star schema design for efficient querying. Avoid unnecessary columns and reduce column cardinality b...

1 More Replies
tanjil
by New Contributor III
  • 4314 Views
  • 4 replies
  • 2 kudos

print(flush = True) not working

Hello, I have the following minimal working example using multiprocessing: from multiprocessing import Pool files_list = [('bla', 1, 3, 7), ('spam', 12, 4, 8), ('eggs', 17, 1, 3)] def f(t): print('Hello from child process', flush = Tr...

Latest Reply
tanjil
New Contributor III
  • 2 kudos

No errors are generated. The code executes successfully, but the print statement for "Hello from child process" does not produce any output.
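A common workaround when child-process prints never surface in a notebook: have the workers return their messages and print them from the driver process instead. A sketch using the post's data (assuming the fork start method, which is the Linux default):

```python
import multiprocessing as mp

files_list = [('bla', 1, 3, 7), ('spam', 12, 4, 8), ('eggs', 17, 1, 3)]

def f(t):
    # Return the message rather than printing it: child-process stdout
    # is often not wired to the notebook, but return values always are.
    return f"Hello from child process: {t[0]}"

ctx = mp.get_context("fork")  # fork keeps f visible to the workers
with ctx.Pool(2) as pool:
    results = pool.map(f, files_list)

for line in results:
    print(line)
```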

3 More Replies
Myousief
by New Contributor II
  • 3084 Views
  • 7 replies
  • 1 kudos

Can't login with password, SSO Enabled OIDC, Secret Key Expired

I am currently unable to log in to the Databricks Account Console. OpenID SSO is enabled for our workspace using Microsoft Entra ID, but the client secret has expired. As a result, SSO login is no longer functional. I attempted to log in using a password, ...

Latest Reply
Walter_C
Databricks Employee
  • 1 kudos

Glad to hear you got unblocked.

6 More Replies
sangwan
by New Contributor
  • 758 Views
  • 1 reply
  • 0 kudos

Remorph: Getting an error while running remorph-core-0.2.0-SNAPSHOT.jar after a Maven build

We are encountering an issue while running the remorph-core-0.2.0-SNAPSHOT.jar file after successfully building it using Maven. The build completes without errors, but when we try to execute the generated .jar file, we get the following exception att...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

Here are a few steps you can take to debug and resolve this issue: Check the Code for Either Usage: Look through your codebase for instances where Either is used. Ensure that you are handling both Left and Right cases properly. The error suggests th...

JamesY
by New Contributor III
  • 1339 Views
  • 1 reply
  • 0 kudos

Databricks JDBC write to table with PK column, error, key not found.

Hello, I am trying to write data to a table. It worked fine before, but after I recreated the table with one column as PK, there is an error: Unable to write into the A_Table table... key not found: id. What is the correct way of doing this? PK column: [...

Data Engineering
Databricks
SqlMi
Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

Looks like the primary key column ID is not being found during the write operation. Kindly verify the schema. Use a command like the one below to create the table with id as the primary key: CREATE TABLE A_Table ( ID BIGINT IDENTITY(1,1) PRIMARY KEY NOT NUL...

xamry
by New Contributor
  • 7246 Views
  • 1 reply
  • 0 kudos

java.lang.ClassCastException in JDBC driver's logger

Hi, our Java application is using the latest version of the Databricks JDBC driver (2.6.38). This application already uses Log4j 2.17.1 and SLF4J 2.0.13. When querying data from Databricks, there are java.lang.ClassCastException errors printing on the console. Data...

Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

Kindly use the latest jars from Maven, and if that does not help, shading common packages to avoid conflicts should be the next resort.
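For the shading route, the usual tool is the maven-shade-plugin with package relocations, so the bundled logging classes cannot collide with the application's SLF4J/Log4j. A sketch of the relevant pom fragment (the relocated pattern and shaded prefix are illustrative):

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <!-- Move conflicting logging packages under a private prefix -->
          <relocation>
            <pattern>org.slf4j</pattern>
            <shadedPattern>com.example.shaded.org.slf4j</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```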

