Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

quakenbush
by Contributor
  • 939 Views
  • 2 replies
  • 1 kudos

Resolved! Is Autoloader suitable to load full dumps?

Hi, I recently completed the fundamentals & advanced data engineer exams, yet I've got a question about Autoloader. Please don't go too hard on me, since I lack practical experience at this point in time. The docs say this is incremental ingestion, so it's ...

Latest Reply
Kaniz
Community Manager
  • 1 kudos

Our End-of-Year Community Survey is here! Please take a few moments to complete the survey. Your feedback matters!

1 More Replies
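For readers landing on this thread: Auto Loader tracks which files it has already ingested, so a recurring full dump is simply treated as one more new file each time — every row of every dump lands again unless you deduplicate or merge downstream. A minimal sketch, assuming hypothetical paths and table names (Databricks only):

```python
# Sketch: ingesting recurring full dumps with Auto Loader.
# Paths, schema location, and table name are hypothetical examples.
df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "csv")
      .option("cloudFiles.schemaLocation", "/mnt/schemas/full_dumps")
      .load("/mnt/landing/full_dumps/"))

# Each new dump file is picked up exactly once; dedup/merge into the
# target table happens downstream of this raw append.
(df.writeStream
   .option("checkpointLocation", "/mnt/checkpoints/full_dumps")
   .trigger(availableNow=True)   # batch-style: drain what's there, then stop
   .toTable("bronze.full_dumps_raw"))
```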
rvo1994
by New Contributor
  • 489 Views
  • 0 replies
  • 0 kudos

Performance issue with spatial reference system conversions

Hi, I am facing a performance issue with spatial reference system conversions. My Delta table has approximately 10 GB / 46 files / 160M records and gets +/- 5M records every week. After ingestion, I need to convert points (columns GE_XY_XCOR and GE_XY_YCO...

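Since this one got no replies: per-row coordinate conversion in a plain Python UDF is a common bottleneck at this scale. A vectorized pandas UDF that transforms whole arrays at once usually helps. This is a sketch only — it assumes pyproj is installed on the cluster, and the EPSG codes and output shape are placeholders (the source SRS is not stated in the post):

```python
# Sketch: vectorized coordinate conversion with a pandas UDF.
# Assumes pyproj is available; EPSG codes here are example values.
import pandas as pd
from pyproj import Transformer
from pyspark.sql.functions import pandas_udf
from pyspark.sql.types import ArrayType, DoubleType

# Build the transformer once per executor process, not once per row.
transformer = Transformer.from_crs("EPSG:31370", "EPSG:4326", always_xy=True)

@pandas_udf(ArrayType(DoubleType()))
def to_wgs84(x: pd.Series, y: pd.Series) -> pd.Series:
    lon, lat = transformer.transform(x.to_numpy(), y.to_numpy())
    return pd.Series([[a, b] for a, b in zip(lon, lat)])

# df = df.withColumn("lonlat", to_wgs84("GE_XY_XCOR", "GE_XY_YCOR"))
```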
BriGuy
by New Contributor II
  • 896 Views
  • 0 replies
  • 0 kudos

How can I efficiently write to easily queryable logs?

I've got a parallel-running process loading multiple tables into the data lake. I'm writing my logs to a Delta table using dataframewriter in append mode. The problem is that every save is taking a bit of time with what appears to be the calculation o...

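One pattern worth considering for this unanswered question: every per-event append pays the full Delta transaction overhead, so buffering records in memory and flushing them in one append per batch cuts that to a single commit. A minimal, framework-agnostic sketch — the `writer` callback standing in for a Delta append is hypothetical:

```python
class LogBuffer:
    """Accumulate log records in memory and write them in one batch."""

    def __init__(self, writer):
        # writer: callable taking a list of dict rows, e.g. wrapping
        # spark.createDataFrame(rows).write.mode("append").saveAsTable(...)
        self.writer = writer
        self.rows = []

    def log(self, **record):
        self.rows.append(record)

    def flush(self):
        if self.rows:
            self.writer(self.rows)   # one append for the whole batch
            self.rows = []

# Demo with a plain list as the "sink": two events, one write call.
batches = []
buf = LogBuffer(writer=batches.append)
buf.log(table="customers", status="ok")
buf.log(table="orders", status="ok")
buf.flush()
```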
MrDataMan
by New Contributor II
  • 970 Views
  • 2 replies
  • 0 kudos

Expand and read Zip compressed files not working

I am trying to unzip compressed files following this doc (https://docs.databricks.com/en/files/unzip-files.html) but I am getting an error. When I run: dbutils.fs.mv("file:/LoanStats3a.csv", "dbfs:/tmp/LoanStats3a.csv") I get the following error: java...

Latest Reply
gabsylvain
New Contributor III
  • 0 kudos

Hey @MrDataMan, I wasn't able to reproduce the exact same error you got, but I still hit a similar error while trying to run the example. To solve it, I tweaked the code a little bit:   %sh curl https://resources.lendingclub.com/LoanStats3a.csv.z...

1 More Replies
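For the expand step itself, the doc's `%sh unzip` can also be done from the Python stdlib, which sidesteps shell quoting issues; the subsequent move to DBFS then only needs the correct `file:` prefix on the local path. A self-contained sketch (the CSV content here is a stand-in for the real LoanStats3a.csv):

```python
# Sketch: expand a zip with the Python stdlib, then move the result.
# On Databricks you'd follow this with, e.g.:
#   dbutils.fs.mv("file:/tmp/extracted/LoanStats3a.csv", "dbfs:/tmp/LoanStats3a.csv")
import os
import tempfile
import zipfile

workdir = tempfile.mkdtemp()
archive = os.path.join(workdir, "data.zip")

# Stand-in for the downloaded LoanStats3a.csv.zip.
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("LoanStats3a.csv", "id,amount\n1,1000\n")

with zipfile.ZipFile(archive) as zf:
    zf.extractall(os.path.join(workdir, "extracted"))

extracted = os.path.join(workdir, "extracted", "LoanStats3a.csv")
```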
BriGuy
by New Contributor II
  • 925 Views
  • 2 replies
  • 0 kudos

process logging optimisation

I have created a process that runs a notebook multiple times in parallel with different parameters. This was working quite quickly. However, I've added several logging steps that append log details to a dataframe and then use dataframewriter to...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @BriGuy, Regarding your Databricks SQL Editor issue, you’re not alone! Several users have faced similar problems.    Here are some steps you can take:   Contact Databricks Support: I recommend contacting Databricks support. File a support ticket t...

1 More Replies
data_turtle
by New Contributor
  • 1059 Views
  • 1 replies
  • 0 kudos

Are init scripts breaking clusters?

My jobs were running just fine, but for some reason they all suddenly started failing. When I looked into it, I saw it was an error due to an init script (we do use an init script).    run failed with error message Cluster 1117-045226-l...

Latest Reply
User16539034020
Contributor II
  • 0 kudos

  Thank you for reaching out to Databricks Support. Could you please specify the location of the initialization script you are referring to? Additionally, it would be helpful to know whether this is a global init script or one specific to a cluster. ...

Priyam1
by New Contributor III
  • 1604 Views
  • 3 replies
  • 0 kudos

Databricks PAT Logs

As an admin, how can I check which external applications people are connecting to Databricks with through personal access tokens? I have used the token API to get the token list, but I couldn't find any other REST API reference for obtaining the i...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @Priyam1, As an administrator, you can manage personal access tokens (PATs) in your Databricks workspace. These tokens allow users to authenticate to the Databricks REST API.   Let’s explore how you can handle PATs and monitor external application...

2 More Replies
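A concrete angle on the original question: the token APIs only list tokens; to see which applications actually authenticate with them you generally need the audit logs. On workspaces with Unity Catalog system tables enabled, a query along these lines can surface token-based calls — treat the exact columns and filter as assumptions against the `system.access.audit` schema in your workspace:

```python
# Sketch: inspect recent API activity via the audit system table.
# Requires Unity Catalog system tables (system.access.audit) to be enabled.
usage = spark.sql("""
    SELECT event_time,
           user_identity.email,
           service_name,
           action_name,
           user_agent          -- often identifies the external client/tool
    FROM system.access.audit
    WHERE event_date >= current_date() - INTERVAL 7 DAYS
    ORDER BY event_time DESC
""")
display(usage)
```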
kavya08
by New Contributor
  • 5703 Views
  • 1 replies
  • 0 kudos

curl: (26) Failed to open/read local data from file/application in DBFS

Hi all, I am trying to upload a parquet file from S3 to DBFS with an Airflow BashOperator curl command using the Databricks REST APIs, as shown below   databricks_load_task = BashOperator( task_id="upload_to_databricks", bash_command ...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @kavya08, There might be an issue with how the file path is specified in your curl command.  File Path Issue: The --form contents="@s3://bucket/test/file.parquet" part of your curl command specifies the file to be uploaded. Ensure that the path to...

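Building on that reply: curl's `--form contents=@...` reads a *local* file and cannot dereference an `s3://` URL, which is a common cause of curl error 26 — the file must be on local disk first. A hedged Python sketch of the same upload via the DBFS REST API's create/add-block/close handle flow (the host, token, and paths are placeholders; payloads are base64-chunked because add-block blocks are size-limited):

```python
# Sketch: upload a local file to DBFS via the REST API.
# HOST, TOKEN, and paths are placeholders, not real values.
import base64
import json
import urllib.request

HOST = "https://<workspace>.cloud.databricks.com"   # placeholder
TOKEN = "<personal-access-token>"                   # placeholder

def b64_chunks(data: bytes, chunk_size: int = 1024 * 1024):
    """Split bytes into base64-encoded chunks for dbfs/add-block."""
    for i in range(0, len(data), chunk_size):
        yield base64.b64encode(data[i:i + chunk_size]).decode("ascii")

def api(endpoint: str, payload: dict):
    req = urllib.request.Request(
        f"{HOST}/api/2.0/dbfs/{endpoint}",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {TOKEN}"},
    )
    return json.load(urllib.request.urlopen(req))

def upload(local_path: str, dbfs_path: str):
    handle = api("create", {"path": dbfs_path, "overwrite": True})["handle"]
    with open(local_path, "rb") as f:
        data = f.read()
    for chunk in b64_chunks(data):
        api("add-block", {"handle": handle, "data": chunk})
    api("close", {"handle": handle})
```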
sunil_ksheersag
by New Contributor
  • 1025 Views
  • 1 replies
  • 0 kudos

synapse pyspark delta lake merge scd type2 without primary key

Problem: I have a set of rows coming from a previous process which has no primary key, and the composite keys are bound to change, which is not a good case for a composite key; the only way the rows are unique is the whole row (including all keys and all value...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @sunil_ksheersag, Implementing Slowly Changing Dimension (SCD) Type 2 without a primary key can be challenging, but there are alternative approaches you can consider.    Here are some strategies to handle this situation:   Surrogate Key Approach: ...

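The surrogate-key approach the reply mentions is often implemented by hashing every column into one deterministic key, which then serves as the MERGE condition. In Spark you would typically use something like `F.sha2(F.concat_ws("||", *df.columns), 256)` (the delimiter and fixed column order are assumptions); the same idea in plain Python, runnable anywhere:

```python
# Sketch: derive a deterministic surrogate key from all columns of a row.
# Spark-side analogue (assumption): F.sha2(F.concat_ws("||", *cols), 256)
import hashlib

def row_key(row: dict, columns: list) -> str:
    """Hash every column value, in a fixed column order, into one key."""
    joined = "||".join("" if row.get(c) is None else str(row[c]) for c in columns)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()

cols = ["region", "product", "price"]
a = row_key({"region": "EU", "product": "X", "price": 10}, cols)
b = row_key({"region": "EU", "product": "X", "price": 10}, cols)
c = row_key({"region": "EU", "product": "X", "price": 11}, cols)
```

Identical rows hash to the same key, so the key is stable across loads without any natural primary key; any value change produces a new key, which is exactly the SCD Type 2 "new version" signal.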
BasavaTest
by New Contributor
  • 511 Views
  • 1 replies
  • 0 kudos

I want to practice apache spark.

I want to practice apache spark.

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @BasavaTest, Apache Spark™ is a powerful framework for large-scale distributed data processing and machine learning.    Here are some resources to help you get started:   Spark By {Examples}: This tutorial provides basic, simple examples of Spark ...

mriccardi
by New Contributor II
  • 1914 Views
  • 4 replies
  • 1 kudos

Spark streaming: Checkpoint not recognising new data

Hello everyone! We are currently facing an issue with a stream that has not picked up new data since the 20th of July. We've validated that the bronze table has data that silver doesn't have. The logs also show the silver stream is running but writing 0 files....

Latest Reply
mriccardi
New Contributor II
  • 1 kudos

Also, the trigger is configured to run once, but when we start the job it never ends; it's stuck in an endless loop.

3 More Replies
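For anyone hitting the "run once but never ends" symptom: a stream only stops on its own when a batch-style trigger is set; with the default trigger it polls forever by design, so it is worth double-checking the trigger actually made it into the `writeStream` chain. A sketch of the two stopping variants (paths and table names hypothetical):

```python
# Sketch: batch-style triggers that terminate on their own.
query = (df.writeStream
           .option("checkpointLocation", "/mnt/checkpoints/silver")
           .trigger(availableNow=True)  # drain all available data, then stop
           .toTable("silver.events"))

# Older alternative: .trigger(once=True) -- one single batch, then stop.
# With no trigger set at all, the stream runs continuously by design.
query.awaitTermination()
```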
thains
by New Contributor III
  • 1379 Views
  • 1 replies
  • 0 kudos

Resolved! Error: cannot create mws storage configurations: default auth: cannot configure default credentials.

I’ve run into an error that I can't figure out how to debug. We're trying to use Terraform through a service account. I don’t know if it’s a permissions issue on Databricks, in our account, or in AWS, but it seems that something is being blocked some...

Latest Reply
thains
New Contributor III
  • 0 kudos

Ok. I found the issue here. We had a *second* place where we were setting up the databricks provider, which I had not updated with the proper client credentials.

g96g
by New Contributor III
  • 7051 Views
  • 4 replies
  • 2 kudos

Resolved! IF EXISTS in Databricks SQL

What is the equivalent of "IF EXISTS" in Databricks? I would like to check something first and only after that use the INSERT INTO statement.

Latest Reply
WWoman
New Contributor III
  • 2 kudos

Is there a way to check if a table exists without trying to drop it? Something like: select table_name from system_catalogs where database_name = 'mydb' and schema_name = 'myschema' and object_name = 'mytab';

3 More Replies
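On the follow-up about checking a table without dropping it: besides SQL's `IF NOT EXISTS` clauses, PySpark exposes an existence check directly via the catalog API. A sketch, with hypothetical names:

```python
# Sketch: check for a table before inserting (PySpark catalog API).
if spark.catalog.tableExists("mydb.myschema.mytab"):
    spark.sql("INSERT INTO mydb.myschema.mytab SELECT * FROM staging_tab")

# SQL-side equivalent for the metadata lookup (information schema):
#   SELECT table_name FROM mydb.information_schema.tables
#   WHERE table_schema = 'myschema' AND table_name = 'mytab';
```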
DataGirl
by New Contributor
  • 4395 Views
  • 5 replies
  • 2 kudos

Multi value parameter on Power BI Paginated / SSRS connected to databricks using ODBC

Hi All, I'm wondering if anyone has had any luck setting up multi-valued parameters on SSRS using an ODBC connection to Databricks? I'm getting a "Cannot add multi value query parameter" error every time I change my parameter to multi value. In the query s...

Latest Reply
TechMG
New Contributor II
  • 2 kudos

Hello, I am facing a similar kind of issue. I am working on a Power BI paginated report and Databricks is my source for the report. I was trying to pass the parameter by putting the query in the expression builder as mentioned above. However, I have ended up w...

4 More Replies
deng_dev
by New Contributor III
  • 814 Views
  • 1 replies
  • 0 kudos

Run Query from another notebook in streaming job

Hi! We want to run a query located in another notebook on every streaming microbatch. We tried dbutils.notebook.run, but we always get the error: Context not valid. If you are calling this outside the main thread, you must set the Notebook context via dbutil...

Latest Reply
norbitek
New Contributor II
  • 0 kudos

Query parameters means you have to pass all parameters as part of the URL after the question mark, not in the body: "/api/1.2/commands/status?clusterId=$cid&contextId=$ec_id&commandId=$command_id"

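On the original question: dbutils.notebook.run isn't designed to be called from a stream's worker threads. The usual pattern is to factor the other notebook's query into a shared function (pulled in via `%run` or a module) and call it from `foreachBatch`, which runs on the driver once per microbatch. A sketch with hypothetical function and table names:

```python
# Sketch: run per-microbatch logic with foreachBatch instead of
# calling dbutils.notebook.run from inside the stream.
def process_batch(batch_df, epoch_id):
    batch_df.createOrReplaceTempView("microbatch")
    # The query previously kept in the other notebook can live in a
    # shared module or %run-included notebook and be invoked here:
    batch_df.sparkSession.sql(
        "MERGE INTO silver.events t USING microbatch s ON t.id = s.id "
        "WHEN NOT MATCHED THEN INSERT *"
    )

(df.writeStream
   .foreachBatch(process_batch)
   .option("checkpointLocation", "/mnt/checkpoints/events")
   .start())
```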