Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

364488
by New Contributor
  • 2719 Views
  • 2 replies
  • 0 kudos

java.io.IOException: Invalid PKCS8 data error when reading data from Google Storage

The Databricks workspace is hosted in AWS; I am trying to access data in Google Cloud Platform. I have followed the instructions here: https://docs.databricks.com/en/connect/storage/gcs.html but I get the error "java.io.IOException: Invalid PKCS8 data." when trying t...

Latest Reply
Debayan
Databricks Employee
  • 0 kudos

Hi, could you also please share the whole error stack?

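While waiting on the full stack trace, one cause worth ruling out: "Invalid PKCS8 data" usually means the private key string handed to the GCS connector is malformed, often because the literal `\n` sequences from the service-account JSON were never converted into real newlines. Below is a hedged stdlib sketch of that sanity check (the function name is made up for illustration; it only validates PEM framing and base64, not the key itself):

```python
import base64
import re

def looks_like_valid_pkcs8(pem: str) -> bool:
    """Rough sanity check for the private key passed to the GCS connector."""
    m = re.search(
        r"-----BEGIN PRIVATE KEY-----(.*?)-----END PRIVATE KEY-----",
        pem, re.S,
    )
    if not m:
        return False  # PEM header/footer missing entirely
    body = m.group(1).strip()
    if "\\n" in body:
        return False  # literal backslash-n survived; newlines were never unescaped
    try:
        # The body between the markers must be valid base64 once whitespace is removed.
        base64.b64decode("".join(body.split()), validate=True)
        return True
    except Exception:
        return False
```

If this returns False on the key value your cluster config actually receives, the fix is on the configuration side rather than the connector side.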
Faisal
by Contributor
  • 12644 Views
  • 1 reply
  • 0 kudos

DLT quarantine records

How can I capture bad records that violate expectations into quarantine tables? Can someone provide the DLT SQL syntax for this?

Latest Reply
jose_gonzalez
Databricks Employee
  • 0 kudos

I would like to share the following docs, which have examples: https://docs.databricks.com/en/delta-live-tables/expectations.html

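The expectations docs linked above describe a quarantine pattern: keep a shared rule set, flag rows that fail, and write failures to a separate table instead of dropping them. In DLT SQL this typically means a `CONSTRAINT ... EXPECT (...) ON VIOLATION DROP ROW` on the clean table plus a second table selecting the inverse condition. Here is a plain-Python sketch of the split itself (rule names and row shapes are illustrative, not from the docs):

```python
# Rule set mirroring DLT expectations; each rule is a predicate over a row.
rules = {
    "valid_id": lambda r: r.get("id") is not None,
    "valid_amount": lambda r: r.get("amount", 0) >= 0,
}

def split_quarantine(rows):
    """Route rows that violate any rule to a quarantine list, mirroring
    the two-table (clean + quarantine) pattern from the DLT docs."""
    clean, quarantine = [], []
    for row in rows:
        failed = [name for name, check in rules.items() if not check(row)]
        if failed:
            # Keep the reason alongside the bad row, as the quarantine
            # table pattern recommends.
            quarantine.append({**row, "failed_rules": failed})
        else:
            clean.append(row)
    return clean, quarantine
```

In a real pipeline the two return values correspond to two DLT tables built from the same source, one keeping rows where all expectations pass and one keeping rows where any fail.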
Alva
by New Contributor
  • 2066 Views
  • 1 reply
  • 0 kudos

Error while performing async I/O for file

We're running dbt Cloud on DBSQL, and a frequent error we keep getting in our dbt jobs is "Error while performing async I/O for file [S3 URI path]". Since we don't have access to the full logs, it's very difficult to know what's actually going on her...

Latest Reply
jose_gonzalez
Databricks Employee
  • 0 kudos

Do you have access to create a support ticket? If you do, we can retrieve the logs for you and provide the details. If you don't, you will need access to your driver's logs to identify the root cause of this issue.

rt-slowth
by Contributor
  • 2499 Views
  • 2 replies
  • 1 kudos

How to writeStream with redshift

I have already checked the documentation below; it does not describe how to write with streaming. Is there a way to write the gold table (a streaming table), which is the output of a Delta Live Tables streaming pipeline, in...

Latest Reply
jose_gonzalez
Databricks Employee
  • 1 kudos

Only batch processing is supported.

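Since only batch writes are supported for a sink like Redshift, the usual workaround is Structured Streaming's foreachBatch, which hands each micro-batch to ordinary batch code (roughly `df.writeStream.foreachBatch(write_fn).start()`, where `write_fn(batch_df, batch_id)` performs a normal batch write). A plain-Python sketch of that control flow, with `write_batch` standing in for the real Redshift batch writer:

```python
def run_with_foreach_batch(micro_batches, write_batch):
    """Mimics foreachBatch: each micro-batch is passed, with its id, to a
    user-supplied batch writer, which may target any batch-only sink."""
    for batch_id, batch in enumerate(micro_batches):
        # In Spark this callback receives (DataFrame, batch_id); any batch
        # write, including a JDBC one, can happen here.
        write_batch(batch, batch_id)

written = []
run_with_foreach_batch(
    [["a"], ["b", "c"]],
    lambda batch, bid: written.append((bid, batch)),
)
# written is now [(0, ["a"]), (1, ["b", "c"])]
```

Note this is a sketch of the pattern, not Redshift-specific code; the actual connector options for the batch write are documented separately.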
umarkhan
by New Contributor II
  • 1832 Views
  • 1 reply
  • 0 kudos

Module not found when using applyInPandasWithState in Repos

I should start by saying that everything works fine if I copy and paste it all into a notebook and run it. The problem starts if we try to have any structure in our application repository. Also, so far we have only run into this problem with applyInP...

Latest Reply
jose_gonzalez
Databricks Employee
  • 0 kudos

Which DBR version are you using? Does it work on non-DLT jobs?

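One thing worth ruling out here: the function passed to applyInPandasWithState executes in worker Python processes, which do not necessarily share the notebook's import path, so a module that imports fine on the driver can be missing on the workers. A hedged sketch of making the repo root importable explicitly (the path is illustrative, and depending on DBR version you may instead need to ship the package, e.g. via spark.sparkContext.addPyFile):

```python
import sys

def ensure_importable(repo_root: str) -> None:
    """Prepend the repo root so `import mypackage` resolves; run this
    before the function handed to applyInPandasWithState is defined."""
    if repo_root not in sys.path:
        sys.path.insert(0, repo_root)

# Hypothetical Repos path, purely for illustration.
ensure_importable("/Workspace/Repos/someone@example.com/my-repo")
```

If the copy-pasted-into-one-notebook version works but the structured-repo version does not, an import-path difference between driver and workers is a plausible first suspect.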
sher
by Valued Contributor II
  • 1573 Views
  • 1 reply
  • 0 kudos

Did anyone face this issue with a Delta table while generating a manifest file?

Error message: "Manifest generation is not supported for tables that leverage column mapping, as external readers cannot read these Delta tables." Why did I get this issue? Not sure whether we need to run any process?

Latest Reply
jose_gonzalez
Databricks Employee
  • 0 kudos

Could you please share the full stack trace and the repro steps?

VishalD
by New Contributor
  • 1390 Views
  • 1 reply
  • 0 kudos

Not able to load nested XML file with struct type

Hello experts, I am trying to load XML with a struct type that has an xsi:type attribute. Below is a sample XML format: <SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="htt...

Latest Reply
jose_gonzalez
Databricks Employee
  • 0 kudos

You can try the from_xml() function; here is the link to the docs: https://docs.databricks.com/en/sql/language-manual/functions/from_xml.html

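Alongside from_xml(), it can help to confirm how the xsi:type attribute is actually addressed, since namespaced attributes are the usual stumbling block with SOAP payloads. A small stdlib sketch (the element names are simplified stand-ins for the sample above, not the real document):

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

doc = """<Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Body><Amount xsi:type="xsd:decimal">12.50</Amount></Body>
</Envelope>"""

root = ET.fromstring(doc)
amount = root.find("./Body/Amount")

# Namespaced attributes are keyed by Clark notation: {namespace-uri}localname.
type_attr = amount.get(f"{{{XSI}}}type")
print(type_attr)    # -> xsd:decimal
print(amount.text)  # -> 12.50
```

The same idea carries over to Spark schemas: the attribute surfaces as a separate field (conventionally prefixed, e.g. `_xsi:type` style names depending on parser options), not as part of the element's text.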
SimDarmapuri
by New Contributor II
  • 2185 Views
  • 1 reply
  • 1 kudos

Databricks Deployment using Data Thirst

Hi, I am trying to deploy Databricks notebooks to different environments using Azure DevOps and the third-party extension Data Thirst (Databricks Script Deployment Task by Data Thirst). The pipeline is able to generate/download artifacts but not able to...

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

The extension is quite old and does not know about Unity Catalog, so that is probably why it fails. But why do you use the extension for notebook propagation from dev to prod? You can do this using Repos, feature branches, and pull requests...

Michael_Appiah
by Contributor II
  • 2506 Views
  • 1 reply
  • 1 kudos

Resolved! Display Limits Catalog Explorer

It seems as if the Catalog Explorer can only display a maximum of 1000 folders within a UC Volume. I just ran into this issue when I added new folders to a volume which were not displayed in the Catalog Explorer (only folders 1-1000). I was able to r...

Latest Reply
Lakshay
Databricks Employee
  • 1 kudos

Hi @Michael_Appiah , This is a known limitation: https://docs.databricks.com/en/connect/unity-catalog/volumes.html#limitations

jonathan-dufaul
by Valued Contributor
  • 4781 Views
  • 2 replies
  • 0 kudos

Is there a command in sql cell to ignore formatting for some lines like `# fmt: off` in Python cells

In Python cells I can add the comment `# fmt: off` before a block of code that I want Black/the autoformatter to ignore, and `# fmt: on` afterwards. Is there anything similar I can put in SQL cells to accomplish the same effect? Some of the recommendation...

Data Engineering
autoformatter
formatter
sql
bayerb
by New Contributor
  • 1778 Views
  • 1 reply
  • 0 kudos

Sink is not written into delta table in Spark structured streaming

I want to create a streaming job that reads messages from TXT files in a folder, parses them, does some processing, and appends the result into one of 3 possible Delta tables depending on the parse result. There is a parse_failed table, an unknw...

Latest Reply
Lakshay
Databricks Employee
  • 0 kudos

There doesn't seem to be any issue with the code, but the logs need to be analysed to get a clue about the issue. Could you please create a support ticket?

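For reference, the three-way split described in the question usually looks like one writer that classifies each record and appends it to the matching table; in Structured Streaming that is often a single foreachBatch that filters the micro-batch once per target, rather than three independent sinks. A plain-Python sketch of the classification step (table names follow the question; `parse` and `demo_parse` are placeholders, not a real API):

```python
def route(records, parse):
    """Classify each record into one of three buckets, mirroring the
    three Delta tables described in the question."""
    tables = {"parsed": [], "unknown": [], "parse_failed": []}
    for rec in records:
        try:
            kind, value = parse(rec)
        except ValueError:
            tables["parse_failed"].append(rec)
            continue
        tables["parsed" if kind == "known" else "unknown"].append(value)
    return tables

def demo_parse(rec):
    # Toy parser: empty input is a parse failure; a known prefix is "known".
    if not rec:
        raise ValueError("empty record")
    return ("known" if rec.startswith("msg:") else "other", rec)

buckets = route(["msg:a", "junk", ""], demo_parse)
# buckets["parsed"] == ["msg:a"]; buckets["unknown"] == ["junk"];
# buckets["parse_failed"] == [""]
```

When one of the three tables silently stays empty in a real job, checking that every streaming write is actually started and awaited (not just the last one) is a common first step, though only the logs can confirm the cause here.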
vishwanath_1
by New Contributor III
  • 1790 Views
  • 1 reply
  • 0 kudos

Resolved! Need Suggestion for better caching strategy

I have the below steps to perform: 1. Read a CSV file (considerably huge, ~100 GB). 2. Add an index using the zipWithIndex function. 3. Repartition the DataFrame. 4. Pass it on to another function. Can you suggest the best optimized caching strategy to execute these c...

Latest Reply
Lakshay
Databricks Employee
  • 0 kudos

Hi @vishwanath_1, caching only comes into the picture when there are multiple references to a data source in your code. Per the flow you mentioned, I don't see that being the case: you are only reading the data from the source once, and also there...

sudhakargen
by New Contributor II
  • 15986 Views
  • 2 replies
  • 0 kudos

Intermittently unavailable: Maven library com.crealytics:spark-excel_2.12:3.5.0_0.20.3

The issue is that the package com.crealytics:spark-excel_2.12:3.5.0_0.20.3 is intermittently unavailable, i.e. most of the time the Excel import works and a few times it fails with an exception (org.apache.spark.SparkClassNotFoundException). I have installed m...

Latest Reply
sudhakargen
New Contributor II
  • 0 kudos

"Looks like the issue is source is not able to reach" - can you please let me know what you mean by this? The libraries installed on the Databricks cluster are as below. I have a cluster with DBR 14.2 on which I have installed the Maven library (com.crealyt...

BartoszBiskupsk
by New Contributor II
  • 2979 Views
  • 2 replies
  • 0 kudos

"Last Access" information for external delta tables (no UC)

Hi, is there a way to audit all tables in hive_metastore (no UC), all of them external, to check when each was last used (queried/updated/etc.)?

Data Engineering
access logs
Latest Reply
CharlesReily
New Contributor III
  • 0 kudos

Apache Ranger or Apache Sentry can be used for auditing Hive activities. If you have set up auditing in one of these tools, you can review the audit logs to see when tables were accessed. Audit logs are typically stored in a separate location, and yo...

hbs59
by New Contributor III
  • 9611 Views
  • 5 replies
  • 2 kudos

Resolved! Rest API Error 404

I am trying to export a notebook or directory using /api/2.0/workspace/export. When I run /api/2.0/workspace/list with a particular URL and path, I get the results that I expect: a list of objects (notebooks and folders) at that location. But when I ru...

Latest Reply
Debayan
Databricks Employee
  • 2 kudos

Hi, could you please remove the parameters (format and direct_download) and confirm?

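For the 404 itself, it is often easiest to print exactly what is sent: /api/2.0/workspace/export is a GET whose `path` must resolve to a single exportable object, and directory exports may require a different `format` than notebook exports. A hedged stdlib sketch that only builds the request, so you can inspect the URL before sending it (host, token, and path are placeholders):

```python
from urllib.parse import urlencode
from urllib.request import Request

def export_request(host: str, token: str, path: str, fmt: str = "SOURCE") -> Request:
    """Build the GET request for /api/2.0/workspace/export without sending it.
    A 404 from this endpoint usually means `path` did not resolve to an
    object, so echoing the exact URL is a useful first diagnostic."""
    qs = urlencode({"path": path, "format": fmt})
    req = Request(f"https://{host}/api/2.0/workspace/export?{qs}")
    req.add_header("Authorization", f"Bearer {token}")
    return req

req = export_request("example.cloud.databricks.com", "TOKEN", "/Users/me/nb")
print(req.full_url)
```

Comparing the encoded `path` here against a path returned verbatim by /api/2.0/workspace/list quickly shows whether the two calls are really addressing the same object.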