Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

364488
by New Contributor
  • 1988 Views
  • 2 replies
  • 0 kudos

java.io.IOException: Invalid PKCS8 data error when reading data from Google Storage

The Databricks workspace is hosted in AWS, and I am trying to access data in Google Cloud Platform. I have followed the instructions here: https://docs.databricks.com/en/connect/storage/gcs.html. I get the error "java.io.IOException: Invalid PKCS8 data." when trying t...

Latest Reply
Debayan
Databricks Employee
  • 0 kudos

Hi, could you also please share the full error stack trace?

  • 0 kudos
1 More Replies
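
The post is truncated and no stack trace is shown, so the actual cause is not known. One thing that may be worth ruling out before sharing the trace is whether the service account private key string is a well-formed PEM block. The sketch below is plain Python and only checks the shape of the string; the function name `diagnose_private_key` and the sample key are hypothetical, and the assumption that a malformed key can lead to this message is mine, not something confirmed in the thread.

```python
import base64
import binascii

BEGIN = "-----BEGIN PRIVATE KEY-----"
END = "-----END PRIVATE KEY-----"


def diagnose_private_key(key: str) -> str:
    """Return a short, human-readable diagnosis of a PEM private key string."""
    text = key.strip()
    # A key pasted straight from a JSON file can keep the two-character
    # escape sequence (backslash followed by n) instead of real line breaks.
    if "\\n" in text:
        return "literal \\n sequences found where real newlines are expected"
    if not (text.startswith(BEGIN) and text.endswith(END)):
        return "missing BEGIN/END PRIVATE KEY marker lines"
    # Strip all whitespace from the section between the two marker lines.
    payload = "".join(text[len(BEGIN):-len(END)].split())
    if not payload:
        return "no key material between the marker lines"
    try:
        base64.b64decode(payload, validate=True)
    except (binascii.Error, ValueError):
        return "key material is not valid base64"
    return "looks like a well-formed PEM block"


# Fake key material for demonstration only; this is not a usable key.
fake_body = base64.b64encode(b"placeholder bytes, not a real key").decode("ascii")
well_formed = f"{BEGIN}\n{fake_body}\n{END}\n"
escaped = well_formed.replace("\n", "\\n")

print(diagnose_private_key(well_formed))
print(diagnose_private_key(escaped))
```

A passing check does not prove the key is valid, only that its outer structure is intact; the full stack trace requested above is still the more reliable source.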
Faisal
by Contributor
  • 10124 Views
  • 1 replies
  • 0 kudos

DLT quarantine records

How can I capture bad records that violate expectations into quarantine tables? Can someone provide the DLT SQL syntax for this?

Latest Reply
jose_gonzalez
Databricks Employee
  • 0 kudos

I would like to share the following docs, which have examples: https://docs.databricks.com/en/delta-live-tables/expectations.html

  • 0 kudos
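
For readers who want the idea before opening the docs, here is a minimal sketch of the quarantine logic in plain Python. It is not DLT SQL or the DLT Python API: the rule names, predicates, and the `split_records` helper are hypothetical and only illustrate routing rows that fail any expectation to a separate collection, tagged with the rules they failed.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]
Rule = Callable[[Record], bool]

# Hypothetical expectations; names and predicates are illustrative only.
RULES: Dict[str, Rule] = {
    "valid_id": lambda r: r.get("id") is not None,
    "positive_amount": lambda r: isinstance(r.get("amount"), (int, float))
    and r["amount"] > 0,
}


def split_records(
    records: List[Record], rules: Dict[str, Rule]
) -> Tuple[List[Record], List[Record]]:
    """Split records into (valid, quarantined) according to the rules."""
    valid: List[Record] = []
    quarantined: List[Record] = []
    for record in records:
        # Collect the name of every rule this record violates.
        failed = [name for name, rule in rules.items() if not rule(record)]
        if failed:
            quarantined.append({**record, "failed_rules": failed})
        else:
            valid.append(record)
    return valid, quarantined


rows = [
    {"id": 1, "amount": 10.0},
    {"id": None, "amount": 5},
    {"id": 3, "amount": -2},
]
valid_rows, quarantined_rows = split_records(rows, RULES)
print(valid_rows)
print(quarantined_rows)
```

In a pipeline, the two outputs would correspond to a clean table and a quarantine table; the linked expectations docs are the place to check the exact DLT syntax for that.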
Alva
by New Contributor
  • 1674 Views
  • 1 replies
  • 0 kudos

Error while performing async I/O for file

We're running dbt Cloud on DBSQL. A frequent error we keep getting in our dbt jobs is "Error while performing async I/O for file [S3 URI path]". Since we don't have access to the full logs it's very difficult to know what's actually going on her...

Latest Reply
jose_gonzalez
Databricks Employee
  • 0 kudos

Do you have access to create a support ticket? If you do, we can retrieve the logs for you and provide the details. If you don't, you will need access to your driver's logs to identify the root cause of this issue.

  • 0 kudos
rt-slowth
by Contributor
  • 1687 Views
  • 2 replies
  • 1 kudos

How to writeStream with Redshift

I have already checked the documentation below, but it does not describe how to write a stream. Is there a way to write the gold table (type is streaming table), which is the output of the streaming pipeline of Delta Live Tables in...

Latest Reply
jose_gonzalez
Databricks Employee
  • 1 kudos

Only batch processing is supported.

  • 1 kudos
1 More Replies
umarkhan
by New Contributor II
  • 1237 Views
  • 1 replies
  • 0 kudos

Module not found when using applyInPandasWithState in Repos

I should start by saying that everything works fine if I copy and paste it all into a notebook and run it. The problem starts if we try to have any structure in our application repository. Also, so far we have only run into this problem with applyInP...

Latest Reply
jose_gonzalez
Databricks Employee
  • 0 kudos

Which DBR version are you using? Does it work on non-DLT jobs?

  • 0 kudos
sher
by Valued Contributor II
  • 987 Views
  • 1 replies
  • 0 kudos

Did anyone face this issue in a Delta table while generating a manifest file?

Error message: Manifest generation is not supported for tables that leverage column mapping, as external readers cannot read these Delta tables. Why did I get this error? I am not sure whether we need to do any additional processing.

Latest Reply
jose_gonzalez
Databricks Employee
  • 0 kudos

Could you please share the full stack trace and the repro steps?

  • 0 kudos
VishalD
by New Contributor
  • 906 Views
  • 1 replies
  • 0 kudos

Not able to load nested XML file with struct type

Hello Experts, I am trying to load XML with a struct type that has an XSI type attribute. Below is a sample XML format: <SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="htt...

Latest Reply
jose_gonzalez
Databricks Employee
  • 0 kudos

You can try the from_xml() function. Here is the link to the docs: https://docs.databricks.com/en/sql/language-manual/functions/from_xml.html

  • 0 kudos
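
Before fitting a schema for from_xml(), it can help to inspect locally how the namespaced attribute appears in the parsed document. The sketch below uses only Python's standard library (`xml.etree.ElementTree`), not Spark or from_xml(). The sample document is hypothetical, since the XML in the post is truncated, and the `item` elements and `typed_items` helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"

# Hypothetical sample; the real payload in the post is not shown in full.
SAMPLE = """<SOAP-ENV:Envelope
    xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <SOAP-ENV:Body>
    <item xsi:type="xsd:string">hello</item>
    <item xsi:type="xsd:int">42</item>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>"""


def typed_items(xml_text: str) -> list:
    """Return (xsi:type value, element text) pairs for every <item>."""
    root = ET.fromstring(xml_text)
    # ElementTree addresses namespaced names as "{namespace-uri}local-name".
    body = root.find(f"{{{SOAP_NS}}}Body")
    return [(el.get(f"{{{XSI_NS}}}type"), el.text) for el in body.iter("item")]


for xsi_type, text in typed_items(SAMPLE):
    print(xsi_type, text)
```

The point of the sketch is that `xsi:type` is an ordinary attribute in a namespace, so any schema has to account for it as an attribute rather than as element content.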
dbx_687_3__1b3Q
by New Contributor III
  • 4464 Views
  • 1 replies
  • 1 kudos

Databricks Asset Bundle (DAB) from existing workspace?

Can anyone point us to some documentation that explains how to create a DAB from an EXISTING workspace? We've been building pipelines, notebooks, tables, etc in a single workspace and a DAB seems like a great way to deploy it all to our Test and Prod...

Latest Reply
RyHubb
New Contributor III
  • 1 kudos

The last link posted no longer works. Also, the commands seem to have changed in the Databricks CLI. I see no `bundles` command, only `bundle`, and that command has no `create` option. Is there no way to convert a dbc file to a bundle anymore?

  • 1 kudos
SimDarmapuri
by New Contributor II
  • 1422 Views
  • 1 replies
  • 1 kudos

Databricks Deployment using Data Thirst

Hi, I am trying to deploy Databricks notebooks to different environments using Azure DevOps and the third-party extension Data Thirst (Databricks Script Deployment Task by Data Thirst). The pipeline is able to generate/download artifacts but not able to...

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

The extension is quite old and does not know about Unity Catalog, so that is probably why it fails. But why do you use the extension for notebook propagation from dev to prd? You can do this using Repos, feature branches and pull requests...

  • 1 kudos
Michael_Appiah
by Contributor
  • 1534 Views
  • 1 replies
  • 1 kudos

Resolved! Display Limits Catalog Explorer

It seems as if the Catalog Explorer can only display a maximum of 1000 folders within a UC Volume. I just ran into this issue when I added new folders to a volume which were not displayed in the Catalog Explorer (only folders 1-1000). I was able to r...

Latest Reply
Lakshay
Databricks Employee
  • 1 kudos

Hi @Michael_Appiah, this is a known limitation: https://docs.databricks.com/en/connect/unity-catalog/volumes.html#limitations

  • 1 kudos
jonathan-dufaul
by Valued Contributor
  • 3630 Views
  • 2 replies
  • 0 kudos

Is there a command in sql cell to ignore formatting for some lines like `# fmt: off` in Python cells

In Python cells I can add the comment `# fmt: off` before a block of code that I want black/autoformatter to ignore and `# fmt: on` afterwards. Is there anything similar I can put in SQL cells to accomplish the same effect? Some of the recommendation...

Data Engineering
autoformatter
formatter
sql
bayerb
by New Contributor
  • 1120 Views
  • 1 replies
  • 0 kudos

Sink is not written into delta table in Spark structured streaming

I want to create a streaming job that reads messages from TXT files within a folder, does the parsing and some processing, and appends the result to one of 3 possible Delta tables depending on the parse result. There is a parse_failed table, an unknw...

Latest Reply
Lakshay
Databricks Employee
  • 0 kudos

There doesn't seem to be any issue with the code, but the logs need to be analysed to get a clue about the cause. Could you please create a support ticket?

  • 0 kudos
vishwanath_1
by New Contributor III
  • 1255 Views
  • 1 replies
  • 0 kudos

Resolved! Need Suggestion for better caching strategy

I have the steps below to perform:
1. Read a CSV file (a considerably huge file, ~100 GB)
2. Add an index using the zipWithIndex function
3. Repartition the DataFrame
4. Pass it on to another function
Can you suggest the best optimized caching strategy to execute these c...

Latest Reply
Lakshay
Databricks Employee
  • 0 kudos

Hi @vishwanath_1, caching only comes into the picture when there are multiple references to the data source in your code. From the flow you describe, I don't see that being the case. You are only reading the data from the source once and also there...

  • 0 kudos
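
The point about multiple references can be illustrated with a small analogy in plain Python. This is not Spark code: `CountingSource` is a hypothetical stand-in for an expensive read, and `index_rows` only mimics the (element, index) pairs that zipWithIndex produces. The sketch counts how often the source is read with and without holding on to an intermediate result.

```python
class CountingSource:
    """Stands in for an expensive read, such as scanning a large CSV file."""

    def __init__(self, rows):
        self._rows = list(rows)
        self.reads = 0

    def read(self):
        self.reads += 1
        return list(self._rows)


def index_rows(rows):
    # Pair each row with its position, as (element, index).
    return [(row, i) for i, row in enumerate(rows)]


def single_use(source):
    # The indexed data is referenced once, so there is nothing to reuse.
    return len(index_rows(source.read()))


def double_use_without_cache(source):
    # Two independent references recompute the whole chain twice.
    count = len(index_rows(source.read()))
    first = index_rows(source.read())[0]
    return count, first


def double_use_with_cache(source):
    # Keeping the intermediate result means the source is read only once.
    cached = index_rows(source.read())
    return len(cached), cached[0]


a, b, c = (CountingSource(["x", "y", "z"]) for _ in range(3))
single_use(a)
double_use_without_cache(b)
double_use_with_cache(c)
print(a.reads, b.reads, c.reads)
```

With a single reference, holding the intermediate result saves nothing, which matches the reply; it only pays off once the same data is used more than once.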
sudhakargen
by New Contributor II
  • 9453 Views
  • 2 replies
  • 0 kudos

Intermittently unavailable: Maven library com.crealytics:spark-excel_2.12:3.5.0_0.20.3

The issue is that the package com.crealytics:spark-excel_2.12:3.5.0_0.20.3 is intermittently unavailable, i.e. most of the time the Excel import works and a few times it fails with an exception (org.apache.spark.SparkClassNotFoundException). I have installed m...

Latest Reply
sudhakargen
New Contributor II
  • 0 kudos

"Looks like the issue is source is not able to reach" - Can you please let me know what you mean by this? The libraries installed on the Databricks cluster are as below. I have a cluster with version 14.2 on which I have installed the Maven library (com.crealyt...

  • 0 kudos
1 More Replies
BartoszBiskupsk
by New Contributor II
  • 2241 Views
  • 2 replies
  • 0 kudos

"Last Access" information for external delta tables (no UC)

Hi, is there a way to audit all tables in hive_metastore (no UC), all of which are external, to check when each was last used (queried, updated, etc.)?

Data Engineering
access logs
Latest Reply
CharlesReily
New Contributor III
  • 0 kudos

Apache Ranger or Apache Sentry can be used for auditing Hive activities. If you have set up auditing in one of these tools, you can review the audit logs to see when tables were accessed. Audit logs are typically stored in a separate location, and yo...

  • 0 kudos
1 More Replies
