Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Valentin1
by New Contributor III
  • 8805 Views
  • 6 replies
  • 3 kudos

Delta Live Tables Incremental Batch Loads & Failure Recovery

Hello Databricks community, I'm working on a pipeline and would like to implement a common use case using Delta Live Tables. The pipeline should include the following steps: Incrementally load data from Table A as a batch. If the pipeline has previously...

Latest Reply
lprevost
Contributor II
  • 3 kudos

I totally agree that this is a gap in the Databricks solution. This gap exists between a static read and real-time streaming. My problem (and I suspect there are many similar use cases) is that I have slowly changing data coming into structured folders via ...

5 More Replies
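The incremental-batch gap described above is often covered by Auto Loader run in a triggered (non-continuous) pipeline: Auto Loader checkpoints which input files it has already seen, so each batch run picks up only new files. A minimal sketch, assuming hypothetical source format and path (the commented lines show how the options would be used inside a DLT pipeline):

```python
# Auto Loader options for an incremental batch ingest. The source format and
# path are hypothetical -- adjust to the actual structured folders.
autoloader_options = {
    "cloudFiles.format": "csv",
    "cloudFiles.inferColumnTypes": "true",
}
source_path = "/Volumes/main/raw/table_a/"  # hypothetical location

# Inside a Delta Live Tables pipeline this would be used roughly as:
#
# import dlt
#
# @dlt.table(name="table_a_bronze")
# def table_a_bronze():
#     return (spark.readStream.format("cloudFiles")
#             .options(**autoloader_options)
#             .load(source_path))
#
# Running the pipeline in triggered mode gives batch, incremental behavior:
# files already recorded in the checkpoint are skipped on the next run.
```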
Octavian1
by Contributor
  • 1343 Views
  • 2 replies
  • 1 kudos

Path of artifacts not found error in pyfunc.load_model using pyfunc wrapper

Hi, For a PySpark model that also involves a pipeline, and that I want to register with MLflow, I am using a pyfunc wrapper. Steps I followed: 1. Pipeline and model serialization and logging (using a Volume locally; the logging will be performed in dbfs...

Latest Reply
pikapika
New Contributor II
  • 1 kudos

Stuck with the same issue, however I managed to load it (I was looking to serve it using model serving as well). One thing I noticed is that we can use mlflow.create_experiment() in the beginning and specify the default artifact location parameter as D...

1 More Replies
KristiLogos
by Contributor
  • 2198 Views
  • 9 replies
  • 4 kudos

Resolved! Load parent columns and not unnest using pyspark? Found invalid character(s) ' ,;{}()\n' in schema

I'm not sure I'm approaching this correctly, but I'm having some issues with the column names when I try to load to a table in our Databricks catalog. I have multiple .json.gz files in our blob container that I want to load to a table: df = spark.read.opti...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 4 kudos

Hi @KristiLogos, check whether your JSON keys contain any of the characters listed in the error message.

8 More Replies
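The characters in that error message reflect Delta's restrictions on column names. Besides enabling column mapping on the table, one workaround is to sanitize the names before writing. A sketch in plain string handling (the character set is taken from the error message itself):

```python
import re

# Delta rejects these characters in column names: space , ; { } ( ) \n \t =
INVALID = re.compile(r"[ ,;{}()\n\t=]+")

def sanitize_columns(names):
    """Replace forbidden characters with underscores, e.g. for df.toDF(*new_names)."""
    return [INVALID.sub("_", n).strip("_") for n in names]

print(sanitize_columns(["order id", "price;usd", "meta{tag}"]))
# -> ['order_id', 'price_usd', 'meta_tag']
```

After renaming (`df = df.toDF(*sanitize_columns(df.columns))`), the write should no longer hit the invalid-character check; nested struct fields would need the same treatment recursively.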
wendyl
by New Contributor II
  • 1122 Views
  • 3 replies
  • 0 kudos

Connection Refused: [Databricks][JDBC](11640) Required Connection Key(s): PWD;

Hey, I'm trying to connect to Databricks using a client id and secret. I'm using JDBC 2.6.38. I'm using the following connection URL: jdbc:databricks://<server-hostname>:443;httpPath=<http-path>;AuthMech=11;Auth_Flow=1;OAuth2ClientId=<service-principal-...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @wendyl, could you answer the following questions? - Does your workspace have Private Link? - Do you use a Microsoft Entra ID managed service principal? - If you used an Entra ID managed SP, did you use a secret from Entra ID, or Azure Da...

2 More Replies
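For reference, the "Required Connection Key(s): PWD" message often indicates the driver fell back to password authentication because it did not see a complete set of OAuth keys. For OAuth M2M (AuthMech=11, Auth_Flow=1) the URL needs both the client id and the client secret. A sketch with placeholder values (property names per my reading of the Databricks JDBC driver docs; verify against the driver version in use):

```python
# Placeholder host/path/credentials -- substitute real values.
host = "<server-hostname>"
http_path = "<http-path>"
client_id = "<service-principal-client-id>"
client_secret = "<service-principal-secret>"

jdbc_url = (
    f"jdbc:databricks://{host}:443;"
    f"httpPath={http_path};"
    "AuthMech=11;"        # OAuth 2.0
    "Auth_Flow=1;"        # client-credentials (machine-to-machine) flow
    f"OAuth2ClientId={client_id};"
    f"OAuth2Secret={client_secret}"
)
```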
Himanshu4
by New Contributor II
  • 2357 Views
  • 5 replies
  • 2 kudos

Inquiry Regarding Enabling Unity Catalog in Databricks Cluster Configuration via API

Dear Databricks Community, I hope this message finds you well. I am currently working on automating cluster configuration updates in Databricks using the API. As part of this automation, I am looking to ensure that Unity Catalog is enabled within ...

Latest Reply
Himanshu4
New Contributor II
  • 2 kudos

Hi Raphael, can we fetch job details from one workspace and create a new job in a new workspace with the same "job id" and configuration?

4 More Replies
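On the job-cloning question in that last reply: as far as I know the job_id is assigned by the target workspace on creation and cannot be chosen by the caller, but the settings block returned by jobs/get can be replayed into jobs/create. A sketch of the payload transformation (Jobs API 2.1 endpoints assumed; HTTP calls and error handling omitted):

```python
def clone_job_payload(job_get_response: dict) -> dict:
    """Turn a /api/2.1/jobs/get response into a /api/2.1/jobs/create payload.

    The original job_id cannot be carried over: the target workspace
    assigns a new one when the job is created.
    """
    settings = dict(job_get_response["settings"])  # name, tasks, clusters, ...
    settings.pop("job_id", None)  # defensive; job_id lives outside settings anyway
    return settings

# Abbreviated example shape of a jobs/get response:
resp = {"job_id": 123, "settings": {"name": "nightly_etl", "max_concurrent_runs": 1}}
payload = clone_job_payload(resp)
# POST `payload` to /api/2.1/jobs/create in the target workspace.
```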
mayur_05
by New Contributor II
  • 971 Views
  • 3 replies
  • 0 kudos

access cluster executor logs

Hi Team, I want to get real-time logs for the cluster executor and driver stderr/stdout while performing data operations, and save those logs in a catalog volume.

Latest Reply
gchandra
Databricks Employee
  • 0 kudos

You can configure it for job cluster compute too. The specific cluster's log folder will be under /dbfs/cluster-logs (or whatever destination you change it to).

2 More Replies
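To make the reply above concrete: log delivery is configured per cluster via the cluster_log_conf block in the cluster spec. A sketch (the version, node type, and destination path are assumptions):

```python
# Sketch of a cluster spec with log delivery enabled; the destination is an
# assumption -- any DBFS path (or other supported destination) can be used.
new_cluster = {
    "spark_version": "15.4.x-scala2.12",  # hypothetical
    "node_type_id": "i3.xlarge",          # hypothetical
    "num_workers": 2,
    "cluster_log_conf": {
        "dbfs": {"destination": "dbfs:/cluster-logs"}
    },
}
# Driver and executor stdout/stderr then land under
# /dbfs/cluster-logs/<cluster-id>/driver and .../executor. Delivery is
# periodic (every few minutes), not real time, so a true real-time feed
# would need a different mechanism.
```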
TheManOfSteele
by New Contributor III
  • 1688 Views
  • 2 replies
  • 0 kudos

Resolved! Databricks-connect Configure a connection to serverless compute Not working

Following these instructions at https://docs.databricks.com/en/dev-tools/databricks-connect/python/install.html#configure-a-connection-to-serverless-compute, there seems to be an issue with the example code. from databricks.connect import DatabricksSe...

Latest Reply
TheManOfSteele
New Contributor III
  • 0 kudos

Worked! Thank you!

1 More Replies
Dave_Nithio
by Contributor II
  • 883 Views
  • 1 reply
  • 0 kudos

Delta Table Log History not Updating

I am running into an issue related to my Delta Log and an old version. I currently have default delta settings for delta.checkpointInterval (10 commits as this table was created prior to DBR 11.1), delta.deletedFileRetentionDuration (7 days), and del...

Latest Reply
jennie258fitz
New Contributor III
  • 0 kudos

@Dave_Nithio wrote:I am running into an issue related to my Delta Log and an old version. I currently have default delta settings for delta.checkpointInterval (10 commits as this table was created prior to DBR 11.1), delta.deletedFileRetentionDuratio...

hpant
by New Contributor III
  • 815 Views
  • 1 reply
  • 0 kudos

" ResourceNotFound" error is coming on connecting devops repo to databricks workflow(job).

I have a .py file in a repo in Azure DevOps, and I want to add it to a workflow in Databricks; these are the values I have provided. And the source is this: I have provided all the values correctly but am getting this error: "ResourceNotFound". Can someon...

Latest Reply
nicole_lu_PM
Databricks Employee
  • 0 kudos

Can you try cloning the DevOps repo as a Git folder? The git folder clone interface should ask you to set up a Git credential if it's not already there.

drii_cavalcanti
by New Contributor III
  • 3317 Views
  • 3 replies
  • 0 kudos

DBUtils commands do not work on shared access mode clusters

Hi there, I am trying to upload a file to an S3 bucket. However, none of the dbutils commands seem to work, and neither does the boto3 library. Clusters with the same configuration, except for the shared access mode, seem to work fine. Those are the error m...

Latest Reply
mvdilts1
New Contributor II
  • 0 kudos

I am encountering very similar behavior to drii_cavalcanti. When I use a Shared cluster with an IAM role specified, I can verify that the AWS CLI is installed, but when I run aws sts get-caller-identity I receive the error "Unable to locate credential...

2 More Replies
ziafazal
by New Contributor II
  • 1089 Views
  • 3 replies
  • 0 kudos

How to stop a continuous pipeline which is set to RETRY on FAILURE and failing for some reason

I have created a pipeline which is continuous and set to RETRY on FAILURE. For some reason it keeps failing and retrying. Is there any way I can stop it? Hitting the Stop button throws an error.

Latest Reply
ziafazal
New Contributor II
  • 0 kudos

Hi @szymon_dybczak, I already tried to remove it via the REST API but got the same error as in the pipeline logs. Eventually, I had to delete the workspace to get rid of it.

2 More Replies
jen-metaplane
by New Contributor II
  • 1076 Views
  • 4 replies
  • 1 kudos

How to get catalog and schema from system query table

Hi, We are querying the system.query table to parse query history. If the table in the query is not fully qualified with its catalog and schema, how can we derive the catalog and schema? Thanks, Jen

Latest Reply
filipniziol
Esteemed Contributor
  • 1 kudos

There is no straightforward method to get this data. Run this query to check the defaults: SELECT current_catalog() AS default_catalog, current_schema() AS default_schema; The catalog and schema may be changed within the query, so if you have the query text you...

3 More Replies
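Building on that reply: the session defaults only apply when the query text contains no USE statements, so a first pass can scan for them. A rough sketch, as a hedged approximation (a real parser would also need to handle comments, quoting, and multi-part names):

```python
import re

# Matches USE CATALOG x / USE SCHEMA x / USE DATABASE x / bare USE x.
USE_RE = re.compile(r"\bUSE\s+(CATALOG\s+|SCHEMA\s+|DATABASE\s+)?([\w`.]+)", re.IGNORECASE)

def effective_context(query_text, default_catalog, default_schema):
    """Derive the catalog/schema in effect at the end of a query's text."""
    catalog, schema = default_catalog, default_schema
    for kind, name in USE_RE.findall(query_text):
        name = name.strip("`")
        if kind.strip().upper() == "CATALOG":
            catalog = name
        else:  # USE SCHEMA / USE DATABASE / bare USE all change the schema
            schema = name
    return catalog, schema

print(effective_context("USE CATALOG dev; USE SCHEMA sales; SELECT * FROM t",
                        "main", "default"))
# -> ('dev', 'sales')
```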
sshukla
by New Contributor III
  • 1989 Views
  • 9 replies
  • 1 kudos

Java heap issue, GC allocation failure while writing data from mysql to adls

Hi Team, I am reading 60-80 million rows from a MySQL server and writing to ADLS in Parquet format, but I am getting a Java heap issue, GC allocation failure, and out-of-memory errors. Below is my cluster configuration: Driver - 56GB RAM, 16 cores; Wo...

Latest Reply
shaza606
New Contributor II
  • 1 kudos

Hello good man 

8 More Replies
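Heap and GC failures in this kind of job usually come from pulling the whole table through a single JDBC connection. Partitioning the read spreads it across executors and keeps any one JVM from buffering tens of millions of rows. A sketch of the options (the column name, bounds, and paths are assumptions; the commented lines show the Spark usage):

```python
# Hypothetical bounds -- in practice run SELECT MIN(id), MAX(id) on the source first.
jdbc_options = {
    "url": "jdbc:mysql://<host>:3306/<database>",
    "dbtable": "source_table",
    "user": "<user>",
    "password": "<password>",
    "partitionColumn": "id",   # numeric and ideally indexed
    "lowerBound": "1",
    "upperBound": "80000000",
    "numPartitions": "64",     # 64 parallel, bounded reads instead of one huge one
    "fetchsize": "10000",      # rows per round trip, keeps heaps small
}

# df = spark.read.format("jdbc").options(**jdbc_options).load()
# (df.write.mode("overwrite")
#    .parquet("abfss://<container>@<account>.dfs.core.windows.net/target/"))
```

One caveat: with MySQL's Connector/J, fetchsize may only stream rows when cursor fetch is enabled (e.g. useCursorFetch=true in the URL); worth verifying against the driver version in use.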
Ajay-Pandey
by Esteemed Contributor III
  • 3486 Views
  • 4 replies
  • 0 kudos

On-behalf-of token creation for service principals is not enabled for this workspace

Hi All, I just wanted to create a PAT for a Databricks service principal but am getting the below error when hitting the API or using the CLI. Please help me create a PAT for the same. #dataengineering #databricks

Latest Reply
MorpheusGoGo
New Contributor II
  • 0 kudos

This only works if you are on AWS or GCP; there is no support for Azure. Check the API documentation for AWS: https://docs.databricks.com/api/workspace/tokenmanagement/createobotoken. No such documentation exists for Azure.

3 More Replies
HansAdriaans
by New Contributor II
  • 1513 Views
  • 1 reply
  • 0 kudos

Can not open socket to local (127.0.0.1)

Hi, I'm running a Databricks pipeline hourly using Python notebooks checked out from Git with on-demand compute (r6gd.xlarge, 32GB + 4 CPUs, Graviton). Most of the time the pipeline runs without problems. However, sometimes the first notebook f...

Latest Reply
HansAdriaans
New Contributor II
  • 0 kudos

Short update: I changed the script a bit by adding a display function just before running the collect, and this seems to work for now.

