Data Engineering

Forum Posts

pankz-104
by New Contributor
  • 580 Views
  • 2 replies
  • 0 kudos

how to read deleted files in adls

We have soft delete enabled in ADLS with a 3-day retention, and we manually deleted some checkpoint files, roughly 3 TB in total. Each file is just a couple of bytes, like 30 B or 40 B. The deleted file size keeps increasing day by day, even after a couple of days. Suppose ...

Latest Reply
jose_gonzalez
Moderator
  • 0 kudos

Hi @pankz-104, just a friendly follow-up: did you have time to test Kaniz's recommendations? Do you still have issues? Please let us know.

1 More Replies
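A sketch of how the soft-deleted checkpoint blobs above could be found and restored. The azure-storage-blob calls named in the comments (`list_blobs(include=["deleted"])`, `undelete_blob()`) are the real SDK API; the dict shape and retention check below are a simplified stand-in so the logic is self-contained:

```python
from datetime import datetime, timedelta, timezone

def recoverable(blob, retention_days=3, now=None):
    """True if a soft-deleted blob is still inside the retention window.

    `blob` is a simplified dict standing in for the BlobProperties that
    azure-storage-blob returns from list_blobs(include=["deleted"])."""
    now = now or datetime.now(timezone.utc)
    if not blob.get("deleted"):
        return False
    return now < blob["deleted_on"] + timedelta(days=retention_days)

blobs = [
    {"name": "chk/00001", "deleted": True,
     "deleted_on": datetime.now(timezone.utc) - timedelta(days=1)},
    {"name": "chk/00002", "deleted": True,
     "deleted_on": datetime.now(timezone.utc) - timedelta(days=5)},
]
to_restore = [b["name"] for b in blobs if recoverable(b)]
# For each name in to_restore you would call
# container_client.get_blob_client(name).undelete_blob() to bring it back.
```

Blobs past the retention window (like chk/00002 here) are gone for good, which is why the restore should happen within the configured 3 days.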
Chris_sh
by New Contributor II
  • 315 Views
  • 1 reply
  • 0 kudos

DLT Missing Select tables button or Enhancement Request?

Currently, when a Delta Live Tables pipeline fails with an error, the option to select specific tables to run a full refresh on is removed. This seems like a bug: a full refresh can fix the underlying error, and you should always be able to select to d...

(attached screenshots: Chris_sh_0-1700066428625.png, Chris_sh_2-1700066503115.png)
Latest Reply
jose_gonzalez
Moderator
  • 0 kudos

Hi @Chris_sh, which DLT channel are you using? 

Rajaniesh
by New Contributor III
  • 1186 Views
  • 3 replies
  • 1 kudos

URGENT HELP NEEDED: Python functions deployed in the cluster throwing the error

Hi, I have created a Python wheel with the following code, and the package name is rule_engine:
"""The entry point of the Python Wheel"""
import sys
from pyspark.sql.functions import expr, col

def get_rules(tag):
    """  loads data quality rules from a table ...

Latest Reply
jose_gonzalez
Moderator
  • 1 kudos

You can find more details and examples here https://docs.databricks.com/en/workflows/jobs/how-to/use-python-wheels-in-workflows.html#use-a-python-wheel-in-a-databricks-job

2 More Replies
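A minimal sketch of the wheel entry-point pattern the linked docs describe. In a real Databricks Python-wheel task, parameters arrive through sys.argv and the entry point is a no-argument function declared in the package metadata; the static RULES dict below stands in for the Delta table the real wheel would read with Spark (all names are illustrative):

```python
import sys

# Static stand-in for the data-quality rules table the real wheel
# reads with Spark.
RULES = {
    "valid_id": ("bronze", "id IS NOT NULL"),
    "valid_ts": ("bronze", "ts IS NOT NULL"),
    "valid_amt": ("silver", "amount > 0"),
}

def get_rules(tag):
    """Return the data-quality rule expressions registered for one tag."""
    return {name: expr for name, (t, expr) in RULES.items() if t == tag}

def main():
    # Databricks wheel tasks pass task parameters through sys.argv.
    tag = sys.argv[1] if len(sys.argv) > 1 else "bronze"
    quarantine_filter = " AND ".join(get_rules(tag).values())
    print(quarantine_filter)
```

Keeping the Spark import inside the function that actually needs it (rather than at module import time) also avoids import-time failures when the wheel is inspected outside a Spark session.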
dcc
by New Contributor
  • 568 Views
  • 2 replies
  • 0 kudos

DBT Jobs || API call returns "Internal Error"

Hey there, I am currently using the Databricks API to trigger a specific dbt job. For this, I am calling the API from a Web Activity in Azure Data Factory, sending the token in the headers and, in the body, the job ID and the necessary vars I ...

Data Engineering
2108
API
jobs
Latest Reply
jose_gonzalez
Moderator
  • 0 kudos

Could you please share the driver logs? It will help us narrow down the issue.

1 More Replies
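For reference, a sketch of the request body the Jobs API expects for this. `POST /api/2.1/jobs/run-now` accepts a `dbt_commands` override for jobs with a dbt task; a malformed body assembled by hand in a Web Activity is a common cause of opaque 500 responses. The job ID and vars below are made up:

```python
import json

def run_now_payload(job_id, dbt_vars):
    """Body for POST {host}/api/2.1/jobs/run-now with a dbt task.

    dbt_commands replaces the commands configured on the job for this run.
    --vars takes a JSON/YAML string, which is easy to get wrong when the
    body is concatenated by hand, so build it with json.dumps."""
    return {
        "job_id": job_id,
        "dbt_commands": [
            "dbt deps",
            f"dbt run --vars '{json.dumps(dbt_vars)}'",
        ],
    }

payload = run_now_payload(123, {"run_date": "2023-11-01"})
# Send as JSON with an "Authorization: Bearer <PAT>" header.
```

Echoing the exact serialized body from Data Factory into a log before sending it makes mismatches like unquoted vars or a string job_id easy to spot.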
chari
by Contributor
  • 1717 Views
  • 4 replies
  • 2 kudos

Resolved! Connect to data in one drive to Azure Databricks

Hello, a colleague of mine previously built a data pipeline for connecting to data available on SharePoint (OneDrive), coded in Python in a Jupyter notebook. Now it's my job to transfer the code to Azure Databricks, and I am unable to connect/download thi...

Latest Reply
gabsylvain
New Contributor III
  • 2 kudos

@chari Also, you can ingest both SharePoint and OneDrive data directly into Databricks using Partner Connect. You can refer to the documentation below for more information: Connect to Fivetran using Partner Connect; Fivetran SharePoint Connector Documenta...

3 More Replies
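Besides Partner Connect, a direct pull from a notebook is also possible via Microsoft Graph. The endpoint shown (`GET /drives/{drive-id}/root:/{path}:/content`) is the documented Graph download route, but the drive ID, path, and token below are placeholders, and the Azure AD app registration that issues the token is not shown:

```python
from urllib.request import Request

def onedrive_download_request(drive_id, item_path, access_token):
    """Microsoft Graph request to download one SharePoint/OneDrive file:
    GET /drives/{drive-id}/root:/{path}:/content with an Azure AD bearer
    token. From a Databricks notebook you would urlopen() this and stream
    the response body to DBFS or a volume."""
    url = (f"https://graph.microsoft.com/v1.0/drives/{drive_id}"
           f"/root:/{item_path}:/content")
    return Request(url, headers={"Authorization": f"Bearer {access_token}"})

# Placeholder identifiers for illustration only.
req = onedrive_download_request("b!abc123", "reports/sales.xlsx", "<aad-token>")
```

The main migration difference from the Jupyter version is usually authentication: interactive user login does not work on a job cluster, so a client-credentials flow (service principal) is the typical replacement.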
quakenbush
by Contributor
  • 519 Views
  • 2 replies
  • 1 kudos

Resolved! Is Autoloader suitable to load full dumps?

Hi, I recently completed the fundamentals & advanced data engineer exams, yet I've got a question about Auto Loader. Please don't go too hard on me, since I lack practical experience at this point in time. The docs say this is incremental ingestion, so it's ...

Latest Reply
Kaniz
Community Manager
  • 1 kudos

Our End-of-Year Community Survey is here! Please take a few moments to complete the survey. Your feedback matters!

1 More Replies
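The core of the Auto Loader question above is its discovery semantics: each file path is ingested exactly once, tracked in a checkpoint. A toy model of that behavior (a set standing in for the real checkpoint, file names made up) shows why full dumps work only if each dump lands under a new path, and why a downstream MERGE/dedupe step is still needed to treat each dump as a snapshot:

```python
def discover_new_files(listing, checkpoint):
    """Auto Loader-style discovery: a file is ingested only if its path
    has never been seen before, and the checkpoint is updated."""
    new = sorted(f for f in listing if f not in checkpoint)
    checkpoint.update(new)
    return new

checkpoint = set()
day1 = discover_new_files({"dump/2023-11-01/full.csv"}, checkpoint)
day2 = discover_new_files({"dump/2023-11-01/full.csv",
                           "dump/2023-11-02/full.csv"}, checkpoint)
# day2 contains only the 2023-11-02 file: the old dump is never re-read,
# but the whole new dump is, including unchanged rows.
```

So Auto Loader happily loads full dumps, but it appends them; collapsing them back into current state is the consumer's job.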
rvo1994
by New Contributor
  • 301 Views
  • 0 replies
  • 0 kudos

Performance issue with spatial reference system conversions

Hi, I am facing a performance issue with spatial reference system conversions. My Delta table has approximately 10 GB / 46 files / 160M records and gets +/- 5M records every week. After ingestion, I need to convert points (columns GE_XY_XCOR and GE_XY_YCO...

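A frequent cause of slow coordinate conversions is rebuilding the transformer object per row. A sketch of the caching pattern, assuming something like pyproj is the underlying library: `pyproj.Transformer.from_crs` is the real (expensive) constructor being modeled, but the closure below is a dummy stand-in with placeholder math, not a real projection:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def get_transformer(src_crs, dst_crs):
    """Constructing a coordinate transformer (e.g.
    pyproj.Transformer.from_crs) is expensive; build one per
    (src, dst) pair, never per row. The returned closure is a dummy
    stand-in (placeholder math, not a real projection)."""
    return lambda x, y: (x / 1000.0, y / 1000.0)

def convert_points(points, src="EPSG:31370", dst="EPSG:4326"):
    transform = get_transformer(src, dst)  # cache hit after first call
    return [transform(x, y) for x, y in points]

out = convert_points([(152000.0, 168000.0), (153000.0, 169000.0)])
```

On Spark the same idea applies inside a vectorized (pandas) UDF: create the transformer once per batch, not once per record, so 160M rows pay the construction cost only a handful of times.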
BriGuy
by New Contributor II
  • 630 Views
  • 0 replies
  • 0 kudos

How can I efficiently write to easily queryable logs?

I've got a parallel-running process loading multiple tables into the data lake. I'm writing my logs to a Delta table using DataFrameWriter in append mode. The problem is that every save takes a while, with what appears to be the calculation o...

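Each Delta append pays a fixed transaction overhead, so one append per log event is the slow path. A sketch of the usual fix, buffering rows and flushing in batches; `sink` here is a plain callable standing in for the DataFrameWriter append, and the batch size is illustrative:

```python
class LogBuffer:
    """Buffer log rows in memory and flush them in batches.

    One append per N events amortizes the per-commit overhead that makes
    per-event appends slow. `sink` stands in for a Delta append."""
    def __init__(self, sink, batch_size=100):
        self.sink = sink
        self.batch_size = batch_size
        self._rows = []

    def log(self, row):
        self._rows.append(row)
        if len(self._rows) >= self.batch_size:
            self.flush()

    def flush(self):
        if self._rows:
            self.sink(list(self._rows))
            self._rows.clear()

batches = []
buf = LogBuffer(batches.append, batch_size=3)
for i in range(7):
    buf.log({"step": i})
buf.flush()  # always flush leftovers at the end of the run
```

For logs that must be queryable immediately, a cheaper sink (e.g. newline-delimited JSON files ingested into Delta on a schedule) keeps the hot path fast while the Delta table stays easy to query.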
MrDataMan
by New Contributor II
  • 553 Views
  • 2 replies
  • 0 kudos

Expand and read Zip compressed files not working

I am trying to unzip compressed files following this doc (https://docs.databricks.com/en/files/unzip-files.html), but I am getting an error. When I run dbutils.fs.mv("file:/LoanStats3a.csv", "dbfs:/tmp/LoanStats3a.csv") I get the following error: java...

Latest Reply
gabsylvain
New Contributor III
  • 0 kudos

Hey @MrDataMan, I wasn't able to reproduce the exact error you got, but I still hit a similar one while trying to run the example. To solve it, I tweaked the code a little bit: %sh curl https://resources.lendingclub.com/LoanStats3a.csv.z...

1 More Replies
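The expand step itself can also be done with the standard library rather than shell tools, which avoids one class of path errors. A self-contained sketch (the final dbutils.fs.mv from file:/ to dbfs:/ is Databricks-only and shown as a comment; the file name mirrors the doc's example):

```python
import io
import tempfile
import zipfile

def unzip_bytes(zip_bytes, dest_dir):
    """Expand a zip archive (e.g. fetched with curl/urllib) into a local
    directory and return the extracted file names. On Databricks the
    follow-up would be dbutils.fs.mv("file:/...", "dbfs:/tmp/...")."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        zf.extractall(dest_dir)
        return zf.namelist()

# Round-trip demo: build a tiny archive in memory, then expand it.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("LoanStats3a.csv", "id,amount\n1,100\n")
with tempfile.TemporaryDirectory() as d:
    names = unzip_bytes(buf.getvalue(), d)
```

The java.io error in the question usually means the source path for the mv does not exist on the driver's local filesystem, so it is worth listing file:/ before moving.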
BriGuy
by New Contributor II
  • 418 Views
  • 2 replies
  • 0 kudos

process logging optimisation

I have created a process that runs a notebook multiple times in parallel with different parameters. This was working quite quickly. However, I've added several logging steps that append log details to a DataFrame and then use DataFrameWriter to...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @BriGuy, regarding your Databricks SQL Editor issue, you're not alone! Several users have faced similar problems. Here are some steps you can take: Contact Databricks Support: I recommend contacting Databricks support. File a support ticket t...

1 More Replies
data_turtle
by New Contributor
  • 670 Views
  • 1 reply
  • 0 kudos

Are init scripts breaking clusters?

My jobs were running just fine, but all of a sudden they all started failing. When I looked into it, I saw the failures were caused by an init script error (we do use an init script): run failed with error message Cluster 1117-045226-l...

Latest Reply
User16539034020
Contributor II
  • 0 kudos

Thank you for reaching out to Databricks Support. Could you please specify the location of the initialization script you are referring to? Additionally, it would be helpful to know whether this is a global init script or one specific to a cluster. ...

Priyam1
by New Contributor III
  • 942 Views
  • 3 replies
  • 0 kudos

Databricks PAT Logs

As an admin, how can I check which external applications people are connecting to Databricks through personal access tokens (PATs)? I have used the token API to get the token list, but I couldn't find any other REST API reference for obtaining the i...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @Priyam1, as an administrator, you can manage personal access tokens (PATs) in your Databricks workspace. These tokens allow users to authenticate to the Databricks REST API. Let's explore how you can handle PATs and monitor external application...

2 More Replies
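For the audit question above, the admin-only endpoint `GET /api/2.0/token-management/tokens` lists all workspace PATs; the token's comment (the label set at creation) is often the only hint of which external application uses it. A sketch of flattening that response for review, with a made-up sample payload in place of a live API call:

```python
def summarize_tokens(response_json):
    """Flatten GET /api/2.0/token-management/tokens output into
    (user, comment, token_id) rows. Pair these rows with audit logs
    to see which token is actually used by which application."""
    return [
        (t.get("created_by_username"), t.get("comment"), t.get("token_id"))
        for t in response_json.get("token_infos", [])
    ]

# Made-up sample of the API's response shape.
sample = {"token_infos": [
    {"token_id": "a1", "created_by_username": "ana@corp.com",
     "comment": "PowerBI gateway"},
    {"token_id": "b2", "created_by_username": "bo@corp.com",
     "comment": "Airflow"},
]}
rows = summarize_tokens(sample)
```

The token list alone cannot name the calling application; for actual usage you need the workspace audit logs, keyed on the token's ID or creator.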
kavya08
by New Contributor
  • 4040 Views
  • 1 replies
  • 0 kudos

curl: (26) Failed to open/read local data from file/application in DBFS

Hi all, I am trying to upload a Parquet file from S3 to DBFS with an Airflow BashOperator curl command using the Databricks REST API, as shown below: databricks_load_task = BashOperator( task_id="upload_to_databricks", bash_command ...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @kavya08, there might be an issue with how the file path is specified in your curl command. File path issue: the --form contents="@s3://bucket/test/file.parquet" part of your curl command specifies the file to be uploaded. Ensure that the path to...

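The likely root cause of curl error 26 here: curl's @file form syntax only reads local files, so "@s3://bucket/key" cannot work; the object must be downloaded first. A sketch of the alternative, sending the bytes base64-encoded to the DBFS put endpoint (path and data below are made up; the put endpoint's inline contents are capped at 1 MB, above which the streaming create/add-block/close API is needed):

```python
import base64

def dbfs_put_payload(path, data, overwrite=True):
    """JSON body for POST /api/2.0/dbfs/put.

    `contents` must be base64-encoded bytes; the endpoint rejects inline
    uploads over 1 MB, so large Parquet files need the streaming API
    (dbfs/create + add-block + close) instead."""
    return {
        "path": path,
        "contents": base64.b64encode(data).decode("ascii"),
        "overwrite": overwrite,
    }

body = dbfs_put_payload("/tmp/file.parquet", b"PAR1...PAR1")
```

From Airflow, downloading the S3 object to the worker first (e.g. with S3Hook) and then posting it avoids the @s3:// problem entirely.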
sunil_ksheersag
by New Contributor
  • 609 Views
  • 1 reply
  • 0 kudos

synapse pyspark delta lake merge scd type2 without primary key

Problem: I have a set of rows coming from a previous process with no primary key. The composite keys are bound to change, which makes them a poor fit, and the only way the rows are unique is the whole row (including all keys and all value...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @sunil_ksheersag, implementing Slowly Changing Dimension (SCD) Type 2 without a primary key can be challenging, but there are alternative approaches you can consider. Here are some strategies to handle this situation: Surrogate Key Approach: ...

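The surrogate-key approach mentioned in the reply can be sketched with a full-row hash: when the whole row is the only unique thing, hash every column in a fixed order and merge on that. This is plain-Python pseudologic for the idea; the Spark analogue would be something like sha2(concat_ws(sep, *sorted_columns), 256) feeding a Delta MERGE:

```python
import hashlib

def row_hash(row):
    """Deterministic surrogate key for a row with no natural key:
    hash every column value in a fixed column order."""
    canon = "\x1f".join(f"{k}={row[k]}" for k in sorted(row))
    return hashlib.sha256(canon.encode("utf-8")).hexdigest()

def scd2_diff(current_rows, incoming_rows):
    """SCD2-style diff on the full-row hash: rows whose hash vanished
    get closed out (end-dated), rows with a new hash get inserted.
    Unchanged rows match on hash and are left alone."""
    current = {row_hash(r) for r in current_rows}
    incoming = {row_hash(r): r for r in incoming_rows}
    to_close = current - set(incoming)
    to_insert = [r for h, r in incoming.items() if h not in current]
    return to_close, to_insert

cur = [{"a": 1, "b": "x"}, {"a": 2, "b": "y"}]
new = [{"a": 1, "b": "x"}, {"a": 2, "b": "z"}]
to_close, to_insert = scd2_diff(cur, new)
```

The trade-off: with no stable business key, any change to any column reads as "old row retired, new row born", so history tracks row versions rather than entity versions.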
TinasheChinyati
by New Contributor
  • 1272 Views
  • 1 reply
  • 0 kudos

Is databricks capable of housing OLTP and OLAP?

Hi data experts, I currently have an OLTP database (Azure SQL DB) that keeps data only for the past 14 days. We use partition switching to achieve that and have an ETL process (Azure Data Factory) that feeds the data warehouse (Azure Synapse Analytics). My requ...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @TinasheChinyati, migrating your OLTP and OLAP workloads into a lakehouse within Databricks is indeed possible. Load data into the lakehouse: Databricks provides tools and capabilities to make data migration to the lakehouse seamless. You can l...
