Data Engineering

Forum Posts

by quakenbush • Contributor

12-10-2023 11:54:41 AM

579 Views
2 replies
1 kudos

Resolved! Is Autoloader suitable to load full dumps?

Hi, I recently completed the fundamentals & advanced data engineer exam, yet I've got a question about Autoloader. Please don't go too hard on me, since I lack practical experience at this point in time. Docs say this is incremental ingestion, so it's ...

Latest Reply

Kaniz
Community Manager

12-12-2023 2:13:26 AM

1 kudos

Our End-of-Year Community Survey is here! Please take a few moments to complete the survey. Your feedback matters!

by rvo1994 • New Contributor

12-11-2023 8:23:23 AM

313 Views
0 replies
0 kudos

Performance issue with spatial reference system conversions

Hi, I am facing a performance issue with spatial reference system conversions. My Delta table has approximately 10 GB/46 files/160M records and gets +/- 5M records every week. After ingestion, I need to convert points (columns GE_XY_XCOR and GE_XY_YCO...

by BriGuy • New Contributor II

12-11-2023 8:11:35 AM

656 Views
0 replies
0 kudos

How can I efficiently write to easily queryable logs?

I've got a process running in parallel that loads multiple tables into the data lake. I'm writing my logs to a Delta table using DataFrameWriter in append mode. The problem is that every save is taking a bit of time with what appears to be the calculation o...

logging

by MrDataMan • New Contributor II

12-06-2023 10:25:25 PM

595 Views
2 replies
0 kudos

Expand and read Zip compressed files not working

I am trying to unzip compressed files following this doc (https://docs.databricks.com/en/files/unzip-files.html), but I am getting an error. When I run: dbutils.fs.mv("file:/LoanStats3a.csv", "dbfs:/tmp/LoanStats3a.csv") I get the following error: java...

Latest Reply

gabsylvain
New Contributor III

12-11-2023 7:38:11 AM

0 kudos

Hey @MrDataMan, I wasn't able to reproduce the exact error you got, but I still got a similar error while trying to run the example. To solve it, I tweaked the code a little bit: %sh curl https://resources.lendingclub.com/LoanStats3a.csv.z...

by BriGuy • New Contributor II

11-15-2023 7:25:48 AM

447 Views
2 replies
0 kudos

process logging optimisation

I have created a process that runs a notebook multiple times in parallel with different parameters. This was working quite quickly. However, I've added several logging steps that append log details to a dataframe and then use DataFrameWriter to...

logging

Latest Reply

Kaniz
Community Manager

11-23-2023 1:36:09 AM

0 kudos

Hi @BriGuy, Regarding your Databricks SQL Editor issue, you’re not alone! Several users have faced similar problems. Here are some steps you can take: Contact Databricks Support: I recommend contacting Databricks support. File a support ticket t...

by data_turtle • New Contributor

11-16-2023 9:11:26 PM

733 Views
1 reply
0 kudos

Are init scripts breaking clusters?

My jobs were running just fine, but all of a sudden they all started failing. When I looked into it, I saw the failure was due to an init script error (we do use an init script). run failed with error message Cluster 1117-045226-l...

Latest Reply

User16539034020
Contributor II

12-11-2023 7:22:09 AM

0 kudos

Thank you for reaching out to Databricks Support. Could you please specify the location of the initialization script you are referring to? Additionally, it would be helpful to know whether this is a global init script or one specific to a cluster. ...

by Priyam1 • New Contributor III

12-10-2023 9:14:10 PM

1006 Views
3 replies
0 kudos

Databricks PAT Logs

As an admin, how can I check which external applications people are connecting to Databricks through personal access tokens? I have used the Token API to get the token list, but I couldn't find any other REST API reference for obtaining the i...

Latest Reply

Kaniz
Community Manager

12-10-2023 10:41:48 PM

0 kudos

Hi @Priyam1, As an administrator, you can manage personal access tokens (PATs) in your Databricks workspace. These tokens allow users to authenticate to the Databricks REST API. Let’s explore how you can handle PATs and monitor external application...

by kavya08 • New Contributor

12-08-2023 5:25:51 AM

4239 Views
1 reply
0 kudos

curl: (26) Failed to open/read local data from file/application in DBFS

Hi all, I am trying to upload a Parquet file from S3 to DBFS with an Airflow BashOperator curl command using the Databricks Python REST APIs, as shown below: databricks_load_task = BashOperator( task_id="upload_to_databricks", bash_command ...

Latest Reply

Kaniz
Community Manager

12-10-2023 11:19:14 PM

0 kudos

Hi @kavya08, There might be an issue with how the file path is specified in your curl command. File Path Issue: The --form contents="@s3://bucket/test/file.parquet" part of your curl command specifies the file to be uploaded. Ensure that the path to...

by sunil_ksheersag • New Contributor

12-08-2023 7:15:51 AM

654 Views
1 reply
0 kudos

synapse pyspark delta lake merge scd type2 without primary key

Problem: I have a set of rows coming from a previous process that has no primary key, and the composite keys are subject to change, which makes them a poor choice of key. The only way the rows are unique is the whole row (including all keys and all value...

Latest Reply

Kaniz
Community Manager

12-10-2023 11:10:37 PM

0 kudos

Hi @sunil_ksheersag, Implementing Slowly Changing Dimension (SCD) Type 2 without a primary key can be challenging, but there are alternative approaches you can consider. Here are some strategies to handle this situation: Surrogate Key Approach: ...

by TinasheChinyati • New Contributor

12-08-2023 11:29:40 PM

1334 Views
1 reply
0 kudos

Is databricks capable of housing OLTP and OLAP?

Hi data experts. I currently have an OLTP database (Azure SQL DB) that keeps data only for the past 14 days. We use partition switching to achieve that and have an ETL process (Azure Data Factory) that feeds the data warehouse (Azure Synapse Analytics). My requ...

Latest Reply

Kaniz
Community Manager

12-10-2023 11:03:10 PM

0 kudos

Hi @TinasheChinyati, Migrating your OLTP and OLAP workloads into a lakehouse within Databricks is indeed possible. Load Data into the Lakehouse: Databricks provides tools and capabilities to make data migration to the lakehouse seamless. You can l...

by BasavaTest • New Contributor

12-10-2023 9:59:30 AM

319 Views
1 reply
0 kudos

I want to practice Apache Spark.

Latest Reply

Kaniz
Community Manager

12-10-2023 10:47:10 PM

0 kudos

Hi @BasavaTest, Apache Spark™ is a powerful framework for large-scale distributed data processing and machine learning. Here are some resources to help you get started: Spark By {Examples}: This tutorial provides basic, simple examples of Spark ...

by mriccardi • New Contributor II

07-26-2022 6:10:34 AM

1443 Views
4 replies
1 kudos

Spark streaming: Checkpoint not recognising new data

Hello everyone! We are currently facing an issue with a stream that has not picked up new data since the 20th of July. We've validated that the bronze table has data that silver doesn't have. Also, according to the logs, the silver stream is running but writing 0 files....

Latest Reply

mriccardi
New Contributor II

07-26-2022 6:15:11 AM

1 kudos

Also, the trigger is configured to run once, but when we start the job it never ends; it stays in an endless loop.

by thains • New Contributor III

12-08-2023 8:59:57 AM

979 Views
1 reply
0 kudos

Resolved! Error: cannot create mws storage configurations: default auth: cannot configure default credentials.

I’ve run into an error that I can't figure out how to debug. We're trying to use terraform through a service account. I don’t know if it’s a permissions issue on Databricks, in our account, or in AWS, but it seems that something is being blocked some...

Latest Reply

thains
New Contributor III

12-08-2023 10:59:31 AM

0 kudos

Ok. I found the issue here. We had a *second* place where we were setting up the databricks provider, which I had not updated with the proper client credentials.

by g96g • New Contributor III

06-19-2023 12:28:03 AM

5469 Views
4 replies
2 kudos

Resolved! If exists in Databricks SQL

What is the equivalent of "IF EXISTS" in Databricks? I would like to check something first and then use the INSERT INTO statement.

Latest Reply

WWoman
New Contributor II

11-16-2023 2:39:03 PM

2 kudos

Is there a way to check if a table exists, without trying to drop it? Something like: select table_name from system_catalogs where database_name = 'mydb' and schema_name = 'myschema' and object_name = 'mytab';
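
For anyone landing on this thread: the query in the reply above is illustrative (`system_catalogs` is not a real table). Below is a minimal sketch of one way such a check might look from Python, assuming a Unity Catalog workspace where each catalog exposes `information_schema.tables` with `table_schema` and `table_name` columns. The helper names `table_exists_sql` and `table_exists`, and the catalog name `main` in the usage note, are hypothetical and chosen only for illustration; `spark` is assumed to be an existing SparkSession.

```python
import re

# Conservative identifier check so names can be interpolated into SQL safely.
# Assumption: plain identifiers only (letters, digits, underscores).
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def table_exists_sql(catalog: str, schema: str, table: str) -> str:
    """Build a query that returns one row if the table is listed, else none."""
    for name in (catalog, schema, table):
        if not _IDENT.fullmatch(name):
            raise ValueError(f"unsupported identifier: {name!r}")
    # Compare case-insensitively on both sides of each predicate.
    return (
        f"SELECT 1 FROM {catalog}.information_schema.tables "
        f"WHERE lower(table_schema) = '{schema.lower()}' "
        f"AND lower(table_name) = '{table.lower()}' LIMIT 1"
    )


def table_exists(spark, catalog: str, schema: str, table: str) -> bool:
    """Run the check through a SparkSession-like object with .sql().collect()."""
    rows = spark.sql(table_exists_sql(catalog, schema, table)).collect()
    return len(rows) > 0
```

In a notebook this could be used as `if table_exists(spark, "main", "myschema", "mytab"): ...` before an INSERT INTO. PySpark 3.3 and later also provide `spark.catalog.tableExists(...)`, which may be simpler when it is available on the cluster.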

by DataGirl • New Contributor

09-08-2022 5:41:51 PM

3498 Views
5 replies
2 kudos

Multi value parameter on Power BI Paginated / SSRS connected to databricks using ODBC

Hi all, I'm wondering if anyone has had any luck setting up multi-valued parameters in SSRS using an ODBC connection to Databricks? I'm getting a "Cannot add multi value query parameter" error every time I change my parameter to multi-value. In the query s...

Latest Reply

TechMG
New Contributor II

12-08-2023 6:56:33 AM

2 kudos

Hello, I am facing a similar issue. I am working on a Power BI paginated report, and Databricks is my source for the report. I was trying to pass the parameter by passing the query in the expression builder as mentioned above. However, I have ended up w...
