Data Engineering

Forum Posts

quakenbush
by Contributor
  • 579 Views
  • 2 replies
  • 1 kudos

Resolved! Is Autoloader suitable to load full dumps?

Hi, I recently completed the fundamentals & advanced data engineer exam, yet I've got a question about Autoloader. Please don't go too hard on me, since I lack practical experience at this point in time. Docs say this is incremental ingestion, so it's ...

Latest Reply
Kaniz
Community Manager
  • 1 kudos

Our End-of-Year Community Survey is here! Please take a few moments to complete the survey. Your feedback matters!

1 More Replies
rvo1994
by New Contributor
  • 313 Views
  • 0 replies
  • 0 kudos

Performance issue with spatial reference system conversions

Hi, I am facing a performance issue with spatial reference system conversions. My Delta table has approximately 10 GB/46 files/160M records and gets +/- 5M records every week. After ingestion, I need to convert points (columns GE_XY_XCOR and GE_XY_YCO...

BriGuy
by New Contributor II
  • 656 Views
  • 0 replies
  • 0 kudos

How can I efficiently write to easily queryable logs?

I've got a process running in parallel that loads multiple tables into the data lake. I'm writing my logs to a Delta table using DataFrameWriter in append mode. The problem is that every save is taking a bit of time with what appears to be the calculation o...

MrDataMan
by New Contributor II
  • 595 Views
  • 2 replies
  • 0 kudos

Expand and read Zip compressed files not working

I am trying to unzip compressed files following this doc (https://docs.databricks.com/en/files/unzip-files.html), but I am getting an error. When I run: dbutils.fs.mv("file:/LoanStats3a.csv", "dbfs:/tmp/LoanStats3a.csv") I get the following error: java...

Latest Reply
gabsylvain
New Contributor III
  • 0 kudos

Hey @MrDataMan, I wasn't able to reproduce the exact error you got, but I still got a similar error while trying to run the example. To solve it, I tweaked the code a little bit: %sh curl https://resources.lendingclub.com/LoanStats3a.csv.z...

1 More Replies
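
For readers following the unzip thread above, here is a minimal sketch of expanding a zip archive with Python's standard `zipfile` module instead of a shell command. It is not the code from the truncated reply. The helper name `extract_zip` and every file name in the demo are hypothetical, and the demo builds its own tiny archive so it runs anywhere.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_zip(zip_path: str, target_dir: str) -> list[str]:
    """Extract every member of zip_path into target_dir and return the member names."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)
        return archive.namelist()


# Self-contained demo: build a tiny archive, then expand it.
# All names below are hypothetical stand-ins, not the thread's real files.
work_dir = Path(tempfile.mkdtemp())
demo_zip = work_dir / "demo.csv.zip"
with zipfile.ZipFile(demo_zip, "w") as archive:
    archive.writestr("demo.csv", "id,amount\n1,100\n")

extracted = extract_zip(str(demo_zip), str(work_dir / "out"))
first_line = (work_dir / "out" / "demo.csv").read_text().splitlines()[0]
```

On a cluster, the extracted file would sit on the driver's local disk, so it would presumably still need to be moved to storage that Spark can read, which is the step the original poster's `dbutils.fs.mv` call was attempting.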
BriGuy
by New Contributor II
  • 447 Views
  • 2 replies
  • 0 kudos

process logging optimisation

I have created a process that runs a notebook multiple times in parallel with different parameters. This was working quite quickly. However, I've added several logging steps that append log details to a DataFrame and then use DataFrameWriter to...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @BriGuy, Regarding your Databricks SQL Editor issue, you’re not alone! Several users have faced similar problems. Here are some steps you can take: Contact Databricks Support: I recommend contacting Databricks support. File a support ticket t...

1 More Replies
data_turtle
by New Contributor
  • 733 Views
  • 1 replies
  • 0 kudos

Are init scripts breaking clusters?

My jobs were running just fine, but all of a sudden they all started failing. When I looked into it, I saw the failure was caused by an init script error (we do use an init script). run failed with error message Cluster 1117-045226-l...

Latest Reply
User16539034020
Contributor II
  • 0 kudos

  Thank you for reaching out to Databricks Support. Could you please specify the location of the initialization script you are referring to? Additionally, it would be helpful to know whether this is a global init script or one specific to a cluster. ...

Priyam1
by New Contributor III
  • 1006 Views
  • 3 replies
  • 0 kudos

Databricks PAT Logs

As an admin, how can I check which external applications people are connecting to Databricks through personal access tokens? I have used the Token API to get the token list, but I couldn't find any other REST API reference for obtaining the i...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @Priyam1, As an administrator, you can manage personal access tokens (PATs) in your Databricks workspace. These tokens allow users to authenticate to the Databricks REST API. Let’s explore how you can handle PATs and monitor external application...

2 More Replies
kavya08
by New Contributor
  • 4239 Views
  • 1 replies
  • 0 kudos

curl: (26) Failed to open/read local data from file/application in DBFS

Hi all, I am trying to upload a Parquet file from S3 to DBFS with an Airflow BashOperator curl command using the Databricks REST API, as shown below: databricks_load_task = BashOperator( task_id="upload_to_databricks", bash_command ...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @kavya08, There might be an issue with how the file path is specified in your curl command. File Path Issue: The --form contents="@s3://bucket/test/file.parquet" part of your curl command specifies the file to be uploaded. Ensure that the path to...

sunil_ksheersag
by New Contributor
  • 654 Views
  • 1 replies
  • 0 kudos

Synapse PySpark Delta Lake merge SCD Type 2 without primary key

Problem: I have a set of rows coming from a previous process that has no primary key, and the composite keys are bound to change, which makes them a poor choice for a key. The only way the rows are unique is the whole row (including all keys and all value...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @sunil_ksheersag, Implementing Slowly Changing Dimension (SCD) Type 2 without a primary key can be challenging, but there are alternative approaches you can consider. Here are some strategies to handle this situation: Surrogate Key Approach: ...
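
The reply above is cut off, so the following is only one possible reading of a surrogate key approach, not necessarily the one it goes on to describe. The sketch assumes, as the question states, that only the whole row identifies a record, and derives a key by hashing all columns. The helper names `row_hash` and `new_versions` and the sample data are hypothetical; it uses plain Python so it runs without a Spark session.

```python
import hashlib


def row_hash(row: dict) -> str:
    """Return a SHA-256 hex digest over all columns, in sorted column order."""
    # Sorting column names makes the hash independent of dict ordering.
    # The unit separator keeps neighbouring values from running together.
    parts = [f"{name}={row[name]!r}" for name in sorted(row)]
    return hashlib.sha256("\x1f".join(parts).encode("utf-8")).hexdigest()


def new_versions(incoming: list[dict], current_hashes: set[str]) -> list[dict]:
    """Keep only incoming rows whose hash is not already among the current rows."""
    return [row for row in incoming if row_hash(row) not in current_hashes]


# Hypothetical sample data for illustration only.
current = [{"customer": "a", "city": "Oslo"}]
incoming = [
    {"city": "Oslo", "customer": "a"},    # unchanged row, different key order
    {"customer": "a", "city": "Bergen"},  # changed row, becomes a new version
]
current_hashes = {row_hash(r) for r in current}
to_insert = new_versions(incoming, current_hashes)
```

In Spark the same idea would more likely be expressed with built-in functions such as `sha2` over `concat_ws` of the columns, with the resulting hash column used as the match condition in the merge.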

TinasheChinyati
by New Contributor
  • 1334 Views
  • 1 replies
  • 0 kudos

Is Databricks capable of housing OLTP and OLAP?

Hi data experts. I currently have an OLTP database (Azure SQL DB) that keeps data only for the past 14 days. We use partition switching to achieve that and have an ETL process (Azure Data Factory) that feeds the data warehouse (Azure Synapse Analytics). My requ...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @TinasheChinyati, Migrating your OLTP and OLAP workloads into a lakehouse within Databricks is indeed possible. Load Data into the Lakehouse: Databricks provides tools and capabilities to make data migration to the lakehouse seamless. You can l...

BasavaTest
by New Contributor
  • 319 Views
  • 1 replies
  • 0 kudos

I want to practice Apache Spark.

I want to practice Apache Spark.

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @BasavaTest, Apache Spark™ is a powerful framework for large-scale distributed data processing and machine learning. Here are some resources to help you get started: Spark By {Examples}: This tutorial provides basic, simple examples of Spark ...

mriccardi
by New Contributor II
  • 1443 Views
  • 4 replies
  • 1 kudos

Spark streaming: Checkpoint not recognising new data

Hello everyone! We are currently facing an issue with a stream that has not picked up new data since the 20th of July. We've validated that the bronze table has data that silver doesn't have. Also, according to the logs, the silver stream is running but writing 0 files....

Latest Reply
mriccardi
New Contributor II
  • 1 kudos

Also, the trigger is configured to run once, but when we start the job it never ends; it stays in an endless loop.

3 More Replies
thains
by New Contributor III
  • 979 Views
  • 1 replies
  • 0 kudos

Resolved! Error: cannot create mws storage configurations: default auth: cannot configure default credentials.

I’ve run into an error that I can't figure out how to debug. We're trying to use terraform through a service account. I don’t know if it’s a permissions issue on Databricks, in our account, or in AWS, but it seems that something is being blocked some...

Latest Reply
thains
New Contributor III
  • 0 kudos

Ok. I found the issue here. We had a *second* place where we were setting up the databricks provider, which I had not updated with the proper client credentials.

g96g
by New Contributor III
  • 5469 Views
  • 4 replies
  • 2 kudos

Resolved! IF EXISTS in Databricks SQL

What is the equivalent of "IF EXISTS" in Databricks? I would like to check something first and then use the INSERT INTO statement.

Latest Reply
WWoman
New Contributor II
  • 2 kudos

Is there a way to check if a table exists, without trying to drop it? Something like: select table_name from system_catalogs where database_name = 'mydb' and schema_name = 'myschema' and object_name = 'mytab';

3 More Replies
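
One way to approximate the check-then-insert pattern asked about in this thread is from PySpark rather than pure SQL. The sketch below relies on `spark.catalog.tableExists`, which is part of the PySpark Catalog API from version 3.3 onward. The helper name `insert_if_table_exists`, the table name, and the INSERT statement are hypothetical, and the two `_Fake...` classes are stand-ins so the example runs without a cluster; this is not the accepted answer from the thread, which is not shown here.

```python
def insert_if_table_exists(spark, table_name: str, insert_sql: str) -> bool:
    """Run insert_sql only when table_name exists; return whether it ran."""
    # On a real cluster, `spark` is the SparkSession (PySpark 3.3+ assumed).
    if spark.catalog.tableExists(table_name):
        spark.sql(insert_sql)
        return True
    return False


class _FakeCatalog:
    """Stand-in for spark.catalog, for demonstration only."""

    def __init__(self, tables):
        self._tables = set(tables)

    def tableExists(self, name):
        return name in self._tables


class _FakeSpark:
    """Stand-in for a SparkSession that records the SQL it is asked to run."""

    def __init__(self, tables):
        self.catalog = _FakeCatalog(tables)
        self.executed = []

    def sql(self, statement):
        self.executed.append(statement)


# Hypothetical table name and statement.
session = _FakeSpark(tables=["myschema.mytab"])
ran = insert_if_table_exists(
    session, "myschema.mytab", "INSERT INTO myschema.mytab VALUES (1)"
)
skipped = insert_if_table_exists(
    session, "myschema.missing", "INSERT INTO myschema.missing VALUES (1)"
)
```

In a notebook, the built-in `spark` session would be passed in place of `_FakeSpark`.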
DataGirl
by New Contributor
  • 3498 Views
  • 5 replies
  • 2 kudos

Multi-value parameter on Power BI Paginated / SSRS connected to Databricks using ODBC

Hi all, I'm wondering if anyone has had any luck setting up multi-valued parameters on SSRS using an ODBC connection to Databricks? I'm getting a "Cannot add multi value query parameter" error every time I change my parameter to multi-value. In the query s...

Latest Reply
TechMG
New Contributor II
  • 2 kudos

Hello, I am facing a similar issue. I am working on a Power BI paginated report, and Databricks is the source for the report. I was trying to pass the parameter by passing the query in the expression builder as mentioned above. However, I have ended up w...

4 More Replies