Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Rajaniesh
by New Contributor III
  • 4160 Views
  • 2 replies
  • 1 kudos

URGENT HELP NEEDED: Python functions deployed in the cluster throwing the error

Hi, I have created a Python wheel with the following code. The package name is rule_engine.
"""The entry point of the Python Wheel"""
import sys
from pyspark.sql.functions import expr, col
def get_rules(tag):
    """  loads data quality rules from a table ...

Latest Reply
jose_gonzalez
Databricks Employee
  • 1 kudos

You can find more details and examples here https://docs.databricks.com/en/workflows/jobs/how-to/use-python-wheels-in-workflows.html#use-a-python-wheel-in-a-databricks-job

1 More Replies
dcc
by New Contributor
  • 9068 Views
  • 1 replies
  • 0 kudos

DBT Jobs || API call returns "Internal Error"

Hey there, I am currently using the Databricks API to trigger a specific DBT job. For this, I am calling the API in a Web Activity on Azure Data Factory, sending the token in the headers and, in the body, the Job ID and the necessary vars I ...

Latest Reply
jose_gonzalez
Databricks Employee
  • 0 kudos

Could you please share the driver logs? They will help us narrow down the issue.

chari
by Contributor
  • 19258 Views
  • 4 replies
  • 2 kudos

Resolved! Connect to data in one drive to Azure Databricks

Hello, A colleague of mine previously built a data pipeline for connecting to data available on SharePoint (OneDrive), coded in Python in a Jupyter notebook. Now it's my job to transfer the code to Azure Databricks, and I am unable to connect/download thi...

Latest Reply
gabsylvain
Databricks Employee
  • 2 kudos

@chari You can also ingest both SharePoint and OneDrive data directly into Databricks using Partner Connect. You can refer to the documentation below for more information: Connect to Fivetran using Partner Connect Fivetran Sharepoint Connector Documenta...

3 More Replies
rvo1994
by New Contributor
  • 8044 Views
  • 0 replies
  • 0 kudos

Performance issue with spatial reference system conversions

Hi, I am facing a performance issue with spatial reference system conversions. My Delta table has approximately 10 GB/46 files/160M records and gets +/- 5M records every week. After ingestion, I need to convert points (columns GE_XY_XCOR and GE_XY_YCO...

BriGuy
by New Contributor II
  • 2184 Views
  • 0 replies
  • 0 kudos

How can I efficiently write to easily queryable logs?

I've got a process running in parallel that loads multiple tables into the data lake. I'm writing my logs to a Delta table using DataFrameWriter in append mode. The problem is that every save is taking a bit of time with what appears to be the calculation o...

MrDataMan
by New Contributor II
  • 3355 Views
  • 2 replies
  • 0 kudos

Expand and read Zip compressed files not working

I am trying to unzip compressed files following this doc (https://docs.databricks.com/en/files/unzip-files.html) but I am getting an error. When I run: dbutils.fs.mv("file:/LoanStats3a.csv", "dbfs:/tmp/LoanStats3a.csv") I get the following error: java...

Latest Reply
gabsylvain
Databricks Employee
  • 0 kudos

Hey @MrDataMan, I wasn't able to reproduce the exact error you got, but I still got a similar error while trying to run the example. To solve it, I tweaked the code a little bit: %sh curl https://resources.lendingclub.com/LoanStats3a.csv.z...

1 More Replies
BriGuy
by New Contributor II
  • 2263 Views
  • 1 replies
  • 0 kudos

process logging optimisation

I have created a process that runs a notebook multiple times in parallel with different parameters. This was working quite quickly. However, I've added several logging steps that append log details to a DataFrame and then use DataFrameWriter to...
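
One pattern sometimes used for this kind of per-step logging cost is to collect log records in memory and write them once per notebook run instead of once per step. The sketch below only illustrates that idea in plain Python; the `LogBuffer` class, its field names, and the `flush` callback are hypothetical and do not come from the thread, and whether this fits the poster's setup is an assumption.

```python
from datetime import datetime, timezone
from typing import Callable, Dict, List

LogRow = Dict[str, str]


class LogBuffer:
    """Collects log rows in memory and hands them to a writer in one batch."""

    def __init__(self, flush: Callable[[List[LogRow]], None]) -> None:
        # `flush` is a hypothetical callback that performs the single append.
        self._flush = flush
        self._rows: List[LogRow] = []

    def log(self, table: str, message: str) -> None:
        # Appending to a Python list is cheap; nothing is written yet.
        self._rows.append(
            {
                "ts": datetime.now(timezone.utc).isoformat(),
                "table": table,
                "message": message,
            }
        )

    def close(self) -> int:
        """Write all buffered rows in one call and return how many were sent."""
        count = len(self._rows)
        if count:
            self._flush(self._rows)
            self._rows = []
        return count
```

In a notebook, the callback could be something like `lambda rows: spark.createDataFrame(rows).write.format("delta").mode("append").saveAsTable("ops.process_log")`, where `ops.process_log` is a made-up table name; calling `close()` once at the end of the run would then replace the per-step appends.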

data_turtle
by New Contributor
  • 3061 Views
  • 1 replies
  • 0 kudos

Are init scripts breaking clusters?

My jobs were running just fine, but for some reason they all suddenly started failing. When I looked into it, I saw that the failure was due to an init script error (we do use an init script).    run failed with error message Cluster 1117-045226-l...

Latest Reply
User16539034020
Databricks Employee
  • 0 kudos

  Thank you for reaching out to Databricks Support. Could you please specify the location of the initialization script you are referring to? Additionally, it would be helpful to know whether this is a global init script or one specific to a cluster. ...

Priyam1
by New Contributor III
  • 8237 Views
  • 1 replies
  • 0 kudos

Databricks PAT Logs

As an admin, how can I check which external applications people are connecting to Databricks through Personal Access Tokens? I have used the token API to get the token list, but I couldn't find any other REST API reference for obtaining the i...

cyong
by New Contributor II
  • 2043 Views
  • 1 replies
  • 0 kudos

Disable CDF on DLT tables

Hi, I noticed Change Data Feed (CDF) is enabled by default for the bronze and gold tables running in DLT. How can I check the size of the Delta log? Can CDF be turned off?

Latest Reply
brockb
Databricks Employee
  • 0 kudos

Hi, I don't believe CDF is enabled by default. Please see "Change data feed is not enabled by default..." in this doc: https://docs.databricks.com/en/delta/delta-change-data-feed.html. If it was mistakenly enabled at a table-level it could be disabled...
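
To illustrate the table-level case mentioned in the reply, here is a minimal Python sketch that builds the SQL for inspecting and switching off the `delta.enableChangeDataFeed` table property. The helper names and the example table `main.bronze.events` are hypothetical. The reply is truncated, so this may not be the approach it goes on to describe, and the thread does not confirm whether altering a DLT-managed table this way is appropriate.

```python
CDF_PROPERTY = "delta.enableChangeDataFeed"


def _quote(table_name: str) -> str:
    """Back-tick quote each part of a dotted table name (hypothetical helper)."""
    return ".".join(f"`{part.strip('`')}`" for part in table_name.split("."))


def cdf_check_statement(table_name: str) -> str:
    """Build a statement that shows the current value of the CDF property."""
    return f"SHOW TBLPROPERTIES {_quote(table_name)} ('{CDF_PROPERTY}')"


def cdf_disable_statement(table_name: str) -> str:
    """Build a statement that sets the CDF property to false on one table."""
    return (
        f"ALTER TABLE {_quote(table_name)} "
        f"SET TBLPROPERTIES ({CDF_PROPERTY} = false)"
    )


# Example with a made-up table name; in a notebook each string could be
# passed to spark.sql(...), where `spark` is the predefined session.
print(cdf_check_statement("main.bronze.events"))
print(cdf_disable_statement("main.bronze.events"))
```

Building the statements as strings keeps the sketch runnable outside a cluster; running the check first shows whether the property is actually set before anything is changed.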

quakenbush
by Contributor
  • 2367 Views
  • 0 replies
  • 0 kudos

Is Autoloader suitable to load full dumps?

Hi, I recently completed the fundamentals & advanced data engineer exam, yet I've got a question about Autoloader. Please don't go too hard on me, since I lack practical experience at this point in time. Docs say this is incremental ingestion, so it's ...

mriccardi
by New Contributor II
  • 5668 Views
  • 4 replies
  • 1 kudos

Spark streaming: Checkpoint not recognising new data

Hello everyone! We are currently facing an issue with a stream that has not picked up new data since the 20th of July. We've validated that the bronze table has data that silver doesn't have. Also, according to the logs, the silver stream is running but writing 0 files....

Latest Reply
mriccardi
New Contributor II
  • 1 kudos

Also, the trigger is configured to run once, but when we start the job it never ends; it stays in an endless loop.

3 More Replies
thains
by New Contributor III
  • 17293 Views
  • 1 replies
  • 0 kudos

Resolved! Error: cannot create mws storage configurations: default auth: cannot configure default credentials.

I’ve run into an error that I can't figure out how to debug. We're trying to use terraform through a service account. I don’t know if it’s a permissions issue on Databricks, in our account, or in AWS, but it seems that something is being blocked some...

Latest Reply
thains
New Contributor III
  • 0 kudos

Ok. I found the issue here. We had a *second* place where we were setting up the databricks provider, which I had not updated with the proper client credentials.

g96g
by New Contributor III
  • 18617 Views
  • 4 replies
  • 2 kudos

Resolved! If exists in Databricks SQL

What is the equivalent of "IF EXISTS" in Databricks? I would like to check something first and then run an INSERT INTO statement.

Latest Reply
WWoman
Databricks Partner
  • 2 kudos

Is there a way to check whether a table exists without trying to drop it? Something like: select table_name from system_catalogs where database_name = 'mydb' and schema_name = 'myschema' and object_name = 'mytab';
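
One way to approach the "check first, then insert" flow from Python is PySpark's `spark.catalog.tableExists`, which returns a boolean without touching the table. The sketch below assumes a recent PySpark version where that method is available; the function names and the table and statement in the usage comment are hypothetical, and this is not necessarily the answer the thread settled on.

```python
def table_exists(spark, table_name: str) -> bool:
    """Return True if the catalog reports the table, without modifying it.

    `spark` is expected to be a SparkSession (predefined in notebooks).
    """
    return bool(spark.catalog.tableExists(table_name))


def insert_if_table_exists(spark, table_name: str, insert_sql: str) -> bool:
    """Run `insert_sql` only when `table_name` exists; report whether it ran."""
    if not table_exists(spark, table_name):
        return False
    spark.sql(insert_sql)
    return True


# Usage in a notebook (made-up names):
# ran = insert_if_table_exists(
#     spark,
#     "mydb.mytab",
#     "INSERT INTO mydb.mytab VALUES (1, 'a')",
# )
```

Keeping the existence check in its own function makes it easy to reuse for other conditional statements besides INSERT INTO.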

3 More Replies