Data Engineering

Forum Posts

by Phani1 • Valued Contributor

06-05-2023 10:38:32 PM

1629 Views
2 replies
1 kudos

Resolved! Convert EBCDIC (Binary) file format to ASCII

Hi Team, how can we convert an EBCDIC (binary) file to ASCII in Databricks? Are there any libraries available in Databricks for this?
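Python's standard library ships EBCDIC codecs (for example `cp037`, `cp500`, `cp1140`), so EBCDIC data that contains only text can be decoded without an extra library. The sketch below is a minimal example under stated assumptions: it assumes code page 037 and fixed-length records made up entirely of character fields. `ebcdic_records_to_ascii` and the 12-byte record length are hypothetical. Packed-decimal (COMP-3) or binary numeric fields cannot be decoded this way and need a copybook-aware parser.

```python
"""Minimal sketch: decode fixed-length EBCDIC text records to ASCII.

Assumptions (not from the original post): the file uses code page 037
(cp037) and holds only character fields in fixed-length records.
Packed-decimal (COMP-3) or binary fields need a copybook-aware parser.
"""
from __future__ import annotations


def ebcdic_records_to_ascii(data: bytes, record_length: int, codepage: str = "cp037") -> list[str]:
    """Split raw bytes into records and decode each one (hypothetical helper)."""
    if record_length <= 0:
        raise ValueError("record_length must be positive")
    records = []
    for start in range(0, len(data), record_length):
        chunk = data[start:start + record_length]
        # cp037 is single-byte, so each byte maps to exactly one character.
        text = chunk.decode(codepage)
        # errors="replace" turns characters outside 7-bit ASCII into "?".
        records.append(text.encode("ascii", errors="replace").decode("ascii"))
    return records


if __name__ == "__main__":
    sample = "HELLO WORLD ".encode("cp037") + "DATABRICKS  ".encode("cp037")
    print(ebcdic_records_to_ascii(sample, record_length=12))
```

In a notebook, the function could in principle be applied to the `content` column that `spark.read.format("binaryFile")` returns (for example inside a UDF); that wiring is not shown here.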

Latest Reply

Vartika
Moderator

06-09-2023 5:43:52 AM

Hi @Janga Reddy, hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Thanks...

by PK225 • New Contributor III

06-07-2023 10:34:46 AM

673 Views
2 replies
1 kudos

Resolved! When reading a JSON file into a DataFrame, I want to see the data row-wise. What is the solution?

Latest Reply

Vartika
Moderator

06-09-2023 4:28:34 AM

Hi @Pavan Kumar, hope you are well. Just wanted to see if you were able to find an answer to your question, and if so, would you like to mark an answer as best? It would be really helpful for the other members too. Cheers!

by James1100 • New Contributor II

06-01-2023 12:13:09 AM

725 Views
2 replies
2 kudos

Resolved! Databricks connect to GCS

Hi, I would like to ask if anyone knows how to connect to GCS, basically to read a CSV file from a GCS bucket. I have no issue connecting to Data Lake. Thank you so much in advance.

Latest Reply

Vartika
Moderator

06-09-2023 4:14:01 AM

Hi @James C, just checking in. If @Kaniz Fatma's answer helped, would you let us know and mark the answer as best? If not, would you be happy to give us more information? We'd love to hear from you. Cheers!

by valskyyy • New Contributor II

06-01-2023 2:38:08 AM

2232 Views
5 replies
5 kudos

Resolved! Command skipped but no error message

Hi all! This is my first post here! I have a problem when I launch "Run all" on my notebook: at some point (always at the same cell), all the following cells are skipped. As you can see, command 38 executes correctly, and in command 40 I ...

Latest Reply

Vartika
Moderator

06-09-2023 4:07:28 AM

Hi @valskyyy valentin.lewandowski.partner, hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd lo...

by sanjay • Valued Contributor II

06-05-2023 11:18:17 PM

1927 Views
3 replies
2 kudos

Resolved! Autoloader maxFilesPerTrigger not working correctly

Hi, I am trying to apply a batch size in Auto Loader; the code is below. But it picks up all the changes in one go, even though I have set maxFilesPerTrigger to 10. I'd appreciate any help. (spark.readStream.format("json").schema(streamSchema).option("cloudFiles.b...

Latest Reply

Lakshay
Esteemed Contributor

06-07-2023 12:11:04 PM

Hi @Sanjay Jain, since you have set the trigger to once, maxFilesPerTrigger will not take effect here. With trigger once, all the files are read together. You need to change the trigger for this option to take effect. Please refer ...
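The reply's point can be illustrated with a toy model. The Python below does not call Spark; `plan_batches` is a hypothetical helper that mimics how pending files would be grouped into micro-batches. The assumption, taken from the Spark Structured Streaming and Auto Loader documentation rather than from this thread, is that `trigger(once=True)` ignores rate limits such as `cloudFiles.maxFilesPerTrigger`, while `trigger(availableNow=True)` still processes all available files but splits them into rate-limited micro-batches.

```python
"""Toy model (not Spark code) of how the trigger choice affects file batching.

Assumption: this only mimics the documented behaviour; it does not call Spark.
  .trigger(once=True)         -> one batch, maxFilesPerTrigger not applied
  .trigger(availableNow=True) -> all available files, in rate-limited batches
"""
from __future__ import annotations


def plan_batches(files: list[str], max_files_per_trigger: int, trigger: str) -> list[list[str]]:
    """Group pending files into micro-batches (hypothetical helper)."""
    if trigger == "once":
        # A single micro-batch containing every pending file.
        return [list(files)] if files else []
    if trigger == "availableNow":
        if max_files_per_trigger <= 0:
            raise ValueError("max_files_per_trigger must be positive")
        # Consecutive slices of at most max_files_per_trigger files each.
        return [
            files[i:i + max_files_per_trigger]
            for i in range(0, len(files), max_files_per_trigger)
        ]
    raise ValueError(f"unsupported trigger: {trigger!r}")


if __name__ == "__main__":
    pending = [f"file_{i:02d}.json" for i in range(25)]
    print([len(b) for b in plan_batches(pending, 10, "once")])          # [25]
    print([len(b) for b in plan_batches(pending, 10, "availableNow")])  # [10, 10, 5]
```

In a real stream, the corresponding change would be replacing `.trigger(once=True)` with `.trigger(availableNow=True)` on the `writeStream`, assuming a runtime whose Spark version supports `availableNow` (Spark 3.3 or later in open-source Spark).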

by NathanSundarara • Contributor

05-12-2023 6:36:41 PM

1337 Views
4 replies
1 kudos

sample

Help parsing the JSON using Spark SQL or Python. Sample JSON attached.

Latest Reply

NathanSundarara
Contributor

05-22-2023 5:45:16 AM

@Suteja Kanuri can you please respond to my question above?

by _deepak_ • New Contributor II

05-09-2023 4:10:25 AM

975 Views
3 replies
0 kudos

Databricks regression test suite

Hi, I am new to Databricks and am setting up the non-prod environment. I wanted to know: is there any way to run a regression suite so that the existing setup does not break when a feature is added? Also, how can I make available ...

Latest Reply

Anonymous
Not applicable

05-13-2023 8:38:50 AM

@deepak prasad: Yes, you can run regression tests to ensure that your changes do not break existing functionality. Databricks supports a number of testing frameworks like PyTest, which can be used to automate regression testing. You can write test c...
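As a concrete starting point, a regression test can pin the current, known-good output of a transformation and fail if a later change alters it. The sketch below uses plain pytest-style `test_*` functions with bare `assert`s; `normalize_order` and the golden cases are hypothetical stand-ins for your own pipeline logic, not part of any Databricks API.

```python
"""Minimal pytest-style regression test sketch.

`normalize_order` is a hypothetical pipeline step used only for illustration;
replace it with the functions your notebooks or jobs actually call.
"""
from __future__ import annotations


def normalize_order(record: dict) -> dict:
    """Hypothetical transformation: clean the country code, compute a total."""
    return {
        "order_id": record["order_id"],
        "country": record["country"].strip().upper(),
        "total": round(record["quantity"] * record["unit_price"], 2),
    }


# Golden input/output pairs captured from the current, known-good behaviour.
GOLDEN_CASES = [
    ({"order_id": 1, "country": " us ", "quantity": 3, "unit_price": 2.5},
     {"order_id": 1, "country": "US", "total": 7.5}),
    ({"order_id": 2, "country": "de", "quantity": 1, "unit_price": 19.99},
     {"order_id": 2, "country": "DE", "total": 19.99}),
]


def test_normalize_order_matches_golden_output():
    # Fails if a new feature changes the output for any known input.
    for given, expected in GOLDEN_CASES:
        assert normalize_order(given) == expected


def test_normalize_order_keeps_schema():
    # Fails if a change adds, drops, or renames output fields.
    result = normalize_order(GOLDEN_CASES[0][0])
    assert set(result) == {"order_id", "country", "total"}
```

pytest discovers functions whose names start with `test_`, so the same file can run in a CI job before each deployment, or from a notebook through `pytest.main([...])` with the path to your tests. Keeping transformations as plain functions, separate from notebook cells, is what makes them importable and testable this way.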

by santhosh1 • New Contributor II

10-10-2022 11:57:11 PM

1078 Views
4 replies
3 kudos

Can we share an exam voucher with another Databricks account?

Hi, I received a free voucher for the Lakehouse webinar, and my friend also got a free voucher. Can I use my friend's voucher to schedule another exam for myself?

Latest Reply

SUMI1
New Contributor III

06-09-2023 2:34:09 AM

Hi guys, unfortunately it is not possible to share an exam voucher with another Databricks account. Exam vouchers are typically tied to specific accounts or individuals and cannot be transferred or shared.

by tototox • New Contributor III

05-11-2023 7:08:57 AM

4566 Views
3 replies
2 kudos

How to check table size by partition?

I want to check the size of a Delta table by partition. As you can see, only the size of the whole table can be checked, not the size per partition.

Latest Reply

Anonymous
Not applicable

05-13-2023 8:57:50 AM

@jin park: You can use the Databricks Delta Lake SHOW TABLE EXTENDED command to get the size of each partition of the table. Here's an example:

%sql
SHOW TABLE EXTENDED LIKE '<table_name>' PARTITION (<partition_column> = '<partition_value>') SELECT...
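If that command does not return per-partition sizes for your Delta table, another option is to aggregate file sizes from the table's transaction log, where every `add` action records a data file's `path`, `partitionValues`, and `size` in bytes, and every `remove` action retires a file. The sketch below replays the JSON commits in version order. It rests on loud assumptions: all JSON commits from version 0 are still present (log cleanup and Parquet checkpoints are ignored), and the log is readable through a local filesystem path. `partition_sizes` is a hypothetical helper, not a Databricks API.

```python
"""Sketch: sum active data-file sizes per partition by replaying a Delta log.

Assumptions (not from the thread): the table's _delta_log still contains every
JSON commit from version 0 (no log cleanup, checkpoints ignored), and the log
is reachable as a local path (e.g. through a mounted or /dbfs path).
"""
from __future__ import annotations

import json
from collections import defaultdict
from pathlib import Path


def partition_sizes(table_path: str) -> dict[tuple, int]:
    """Map each partition (as sorted key/value pairs) to its total bytes."""
    active: dict[str, tuple[tuple, int]] = {}
    log_dir = Path(table_path) / "_delta_log"
    # Commit files are zero-padded version numbers, so name order == version order.
    for commit in sorted(log_dir.glob("*.json")):
        for line in commit.read_text().splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            if "add" in action:
                add = action["add"]
                key = tuple(sorted(add.get("partitionValues", {}).items()))
                active[add["path"]] = (key, add["size"])
            elif "remove" in action:
                # A removed file no longer belongs to the current table version.
                active.pop(action["remove"]["path"], None)
    totals: dict[tuple, int] = defaultdict(int)
    for key, size in active.values():
        totals[key] += size
    return dict(totals)
```

For a table partitioned by a `date` column, the result maps each partition, for example `(("date", "2023-01-01"),)`, to its total bytes. Only files in the current table version are counted, so files removed by `OPTIMIZE` or deletes but not yet vacuumed are excluded.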

by Yash_542965 • New Contributor II

05-16-2023 9:11:19 AM

1950 Views
2 replies
3 kudos

Resolved! Access Excel file in Delta Live Tables pipeline

I'm having an issue accessing an Excel file through a DLT pipeline. The file is in ADLS, and I'm using pandas to read it. It seems pandas is not able to understand the abfss protocol. Is there any way to read Excel with pandas in a DLT pipeline? I'm getting thi...

Latest Reply

Yash_542965
New Contributor II

06-09-2023 12:16:13 AM

Thanks for the info. It works; I just needed to install an additional library using "%pip install openpyxl".

by Inna_M • New Contributor III

06-07-2023 11:04:26 AM

927 Views
1 replies
1 kudos

Resolved! Is there any maintenance (patches, upgrades for VMs created by Databricks on Azure) from Databricks?

We are using Databricks on Azure. Our infra team noticed that some VMs created in the past for Databricks clusters run Linux (Ubuntu 18.04). Is any maintenance, such as an upgrade, planned for them? Are there any patches for objects created in Azure by...

Latest Reply

Inna_M
New Contributor III

06-08-2023 7:49:55 AM

In the end, while I was posting this question, Azure Databricks upgraded the VMs to the supported version 20, not the latest, 22. This happened a week after the old version stopped being supported by Microsoft.

by CoopCoop • New Contributor III

05-15-2023 9:15:58 AM

2259 Views
6 replies
7 kudos

Resolved! PDF Attachment on an Alert

Currently my Alert is an HTML table using data from a SQL query. I was wondering if it is possible to attach the resulting table from this SQL query as a PDF to the alert email. If anyone has successfully implemented this, please let me know! T...

Latest Reply

Atanu
Esteemed Contributor

06-08-2023 7:03:57 AM

OK, I understand the concern: as far as I understand, the issue is basically with PDF rendering. Let me know if I am wrong. Let me check whether our engineering team has made any improvements on this front.

by Louis_Databrick • New Contributor II

05-31-2023 12:12:38 AM

588 Views
2 replies
0 kudos

Registering a dataframe coming from a CDC data stream removes the CDC columns from the resulting temporary view, even when explicitly adding a copy of the column to the dataframe.

df_source_records.filter(F.col("_change_type").isin("delete", "insert", "update_postimage")) .withColumn("ROW_NUMBER", F.row_number().over(window)) .filter("ROW_NUMBE...

Latest Reply

Louis_Databrick
New Contributor II

06-08-2023 4:15:24 AM

Seems to work now, actually. No idea what changed, as I tried multiple times exactly this way and it did. not. work.
from pyspark.sql.functions import expr
from pyspark.sql.utils import AnalysisException
import pyspark.sql.functions as f
data = [(...

by StuartKindness_ • New Contributor II

05-05-2023 10:20:58 AM

853 Views
4 replies
2 kudos

How to replace the SSO certificate on our workspace?

We have Azure AD SSO set up on our workspace, but the three-year certificate is due to expire on Monday. I have logged on to the Admin Console and opened the Single Sign-on tab. All the options are greyed out, and there are no update or edit buttons, as can be seen in ...

Latest Reply

StuartKindness_
New Contributor II

05-08-2023 4:14:28 AM

@Debayan, our version is branch-3.96-1682169174-f2e2f130, if that helps.

by harraz • New Contributor III

05-31-2023 3:50:32 PM

2187 Views
1 replies
0 kudos

Run result unavailable: run failed with error message Notebook not found:

I'm trying to create a workflow job that fetches the notebook from a remote Git repository (Bitbucket Cloud). I tried everything in the Path field and nothing works. Note that the Bitbucket repo is already connected to Databricks and no issues che...

Latest Reply

Debayan
Esteemed Contributor III

06-07-2023 11:39:42 PM

Hi @harraz (Customer), could you please confirm whether Files in Repos has been enabled? https://docs.databricks.com/files/workspace.html#configure-support-for-files-in-repos. You can use the command %sh pwd in a notebook inside a repo to check if Files ...

Forum Posts

DELTA_EXCEED_CHAR_VARCHAR_LIMIT

Not able to set run_as service_principal_name

PySpark operations slowness in Cluster 14.3 LTS as ...

[Databricks Assets Bundles] Workflow trigger on fi...

Addressing Pipeline Error Handling in Databricks b...