Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

lprevost
by Contributor II
  • 2251 Views
  • 5 replies
  • 0 kudos

Large/complex Incremental Autoloader Job -- Seeking Experience on approach

I'm experimenting with several approaches to implement an incremental Auto Loader query, either in DLT or in a pipeline job. The complexities: - Moving approximately 30B records from a nasty set of nested folders on S3 in several thousand CSV files. ...

Latest Reply
lprevost
Contributor II
  • 0 kudos

Crickets....

4 More Replies
lprevost
by Contributor II
  • 658 Views
  • 1 reply
  • 0 kudos

Using GraphFrames on DLT job

I am trying to run a DLT job that uses GraphFrames, which is in the standard ML image. I am using it successfully in my job compute instances. Here are my overrides for the standard job compute policy: {"spark_version": {"type": "unlimited","defau...

Latest Reply
lprevost
Contributor II
  • 0 kudos

Crickets ....

lprevost
by Contributor II
  • 1142 Views
  • 2 replies
  • 0 kudos

GraphFrames and DLT

I am trying to run a DLT job that uses GraphFrames, which is in the standard ML image. I am using it successfully in my job compute instances, but I'm running into problems trying to use it in a DLT job. Here are my overrides for the standard job c...

Latest Reply
lprevost
Contributor II
  • 0 kudos

Crickets .....

1 More Replies
Valentin1
by New Contributor III
  • 11952 Views
  • 6 replies
  • 3 kudos

Delta Live Tables Incremental Batch Loads & Failure Recovery

Hello Databricks community, I'm working on a pipeline and would like to implement a common use case using Delta Live Tables. The pipeline should include the following steps: Incrementally load data from Table A as a batch. If the pipeline has previously...

Latest Reply
lprevost
Contributor II
  • 3 kudos

I totally agree that this is a gap in the Databricks solution. This gap exists between a static read and real-time streaming. My problem (and I suspect there are many similar use cases) is that I have slowly changing data coming into structured folders via ...

5 More Replies
Octavian1
by Contributor
  • 2555 Views
  • 2 replies
  • 1 kudos

Path of artifacts not found error in pyfunc.load_model using pyfunc wrapper

Hi, For a PySpark model, which also involves a pipeline, and that I want to register with MLflow, I am using a pyfunc wrapper. Steps I followed: 1. Pipeline and model serialization and logging (using a Volume locally; the logging will be performed in dbfs...

Latest Reply
pikapika
New Contributor II
  • 1 kudos

Stuck with the same issue; however, I managed to load it (I was looking to serve it using Model Serving as well). One thing I noticed is that we can use mlflow.create_experiment() at the beginning and specify the default artifact location parameter as D...

1 More Replies
KristiLogos
by Contributor
  • 3811 Views
  • 9 replies
  • 4 kudos

Resolved! Load parent columns and not unnest using pyspark? Found invalid character(s) ' ,;{}()\n' in schema

I'm not sure I'm doing this correctly, but I'm having some issues with the column names when I try to load to a table in our Databricks catalog. I have multiple .json.gz files in our blob container that I want to load to a table: df = spark.read.opti...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 4 kudos

Hi @KristiLogos , Check whether your JSON keys contain any of the characters listed in the error message.
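If the keys themselves do contain those characters, one workaround is to rename the columns before writing to the table. A minimal sketch (the character set is taken from the error message; the underscore replacement is an assumption, adjust to taste):

```python
import re

# Characters rejected in column names, per the error message " ,;{}()\n"
INVALID_CHARS = re.compile(r"[ ,;{}()\n\t=]")

def sanitize(name: str) -> str:
    """Replace characters that are invalid in column names with underscores."""
    return INVALID_CHARS.sub("_", name.strip())

# Hypothetical usage on the DataFrame loaded from the .json.gz files:
# df = df.toDF(*[sanitize(c) for c in df.columns])
print(sanitize("bad col;name"))  # bad_col_name
```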

8 More Replies
wendyl
by New Contributor II
  • 2068 Views
  • 3 replies
  • 0 kudos

Connection Refused: [Databricks][JDBC](11640) Required Connection Key(s): PWD;

Hey, I'm trying to connect to Databricks using a client ID and secret. I'm using JDBC 2.6.38. I'm using the following connection URL: jdbc:databricks://<server-hostname>:443;httpPath=<http-path>;AuthMech=11;Auth_Flow=1;OAuth2ClientId=<service-principal-...
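For reference, the documented OAuth M2M shape of that URL is roughly as follows (placeholders are illustrative; the "Required Connection Key(s): PWD" error may indicate the driver did not recognize the OAuth properties, e.g. due to a typo or an older driver version):

```
jdbc:databricks://<server-hostname>:443;httpPath=<http-path>;AuthMech=11;Auth_Flow=1;OAuth2ClientId=<service-principal-client-id>;OAuth2Secret=<service-principal-secret>
```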

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @wendyl , Could you answer the following questions? - Does your workspace have Private Link? - Do you use a Microsoft Entra ID managed service principal? - If you used an Entra ID managed SP, did you use a secret from Entra ID, or Azure Da...

2 More Replies
Himanshu4
by New Contributor II
  • 3409 Views
  • 5 replies
  • 2 kudos

Inquiry Regarding Enabling Unity Catalog in Databricks Cluster Configuration via API

Dear Databricks Community, I hope this message finds you well. I am currently working on automating cluster configuration updates in Databricks using the API. As part of this automation, I am looking to ensure that Unity Catalog is enabled within ...

Latest Reply
Himanshu4
New Contributor II
  • 2 kudos

Hi Raphael, Can we fetch job details from one workspace and create a new job in a new workspace with the same "job id" and configuration?
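As far as I know, job IDs are assigned by each workspace, so the recreated job gets a new ID; what can be carried over is the settings payload. A rough sketch of the reshaping step between the Jobs 2.1 API endpoints (the HTTP calls themselves are omitted; the sample payload is hypothetical):

```python
def settings_for_create(job_details: dict) -> dict:
    """Turn a GET /api/2.1/jobs/get response into a POST /api/2.1/jobs/create payload.

    jobs/get returns {"job_id": ..., "settings": {...}, ...}; the target
    workspace assigns a fresh job_id on create, so the old ID is dropped.
    """
    return dict(job_details["settings"])

# Hypothetical response from the source workspace:
old = {"job_id": 123, "settings": {"name": "nightly-etl", "max_concurrent_runs": 1}}
payload = settings_for_create(old)  # POST this to jobs/create in the target workspace
```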

4 More Replies
mayur_05
by New Contributor II
  • 1882 Views
  • 3 replies
  • 0 kudos

access cluster executor logs

Hi Team, I want to get real-time stderr/stdout logs for the cluster executors and driver while performing data operations, and save those logs in a catalog volume.

Latest Reply
gchandra
Databricks Employee
  • 0 kudos

You can configure it for job cluster compute too. The specific cluster's log folder will be under /dbfs/cluster-logs (or whatever destination you change it to).
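For context, log delivery is set per cluster via the cluster_log_conf block of the cluster spec; a minimal sketch using the DBFS destination mentioned above:

```json
{
  "cluster_log_conf": {
    "dbfs": {
      "destination": "dbfs:/cluster-logs"
    }
  }
}
```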

2 More Replies
TheManOfSteele
by New Contributor III
  • 3233 Views
  • 2 replies
  • 0 kudos

Resolved! Databricks-connect Configure a connection to serverless compute Not working

Following these instructions at https://docs.databricks.com/en/dev-tools/databricks-connect/python/install.html#configure-a-connection-to-serverless-compute there seems to be an issue with the example code. from databricks.connect import DatabricksSe...

Latest Reply
TheManOfSteele
New Contributor III
  • 0 kudos

Worked! Thank you!

1 More Replies
Dave_Nithio
by Contributor II
  • 1538 Views
  • 1 reply
  • 0 kudos

Delta Table Log History not Updating

I am running into an issue related to my Delta Log and an old version. I currently have default delta settings for delta.checkpointInterval (10 commits as this table was created prior to DBR 11.1), delta.deletedFileRetentionDuration (7 days), and del...

Latest Reply
jennie258fitz
New Contributor III
  • 0 kudos

@Dave_Nithio wrote:I am running into an issue related to my Delta Log and an old version. I currently have default delta settings for delta.checkpointInterval (10 commits as this table was created prior to DBR 11.1), delta.deletedFileRetentionDuratio...

hpant
by New Contributor III
  • 1399 Views
  • 1 reply
  • 0 kudos

" ResourceNotFound" error is coming on connecting devops repo to databricks workflow(job).

I have a .py file in a repo in Azure DevOps, and I want to add it to a workflow in Databricks; these are the values I have provided. And the source is this: I have provided all the values correctly but am getting this error: "ResourceNotFound". Can someon...

Latest Reply
nicole_lu_PM
Databricks Employee
  • 0 kudos

Can you try cloning the DevOps repo as a Git folder? The git folder clone interface should ask you to set up a Git credential if it's not already there.

drii_cavalcanti
by New Contributor III
  • 4375 Views
  • 3 replies
  • 0 kudos

DBUtils commands do not work on shared access mode clusters

Hi there, I am trying to upload a file to an S3 bucket. However, none of the dbutils commands seem to work, and neither does the boto3 library. Clusters with the same configuration, except for the shared access mode, seem to work fine. Those are the error m...

Latest Reply
mvdilts1
New Contributor II
  • 0 kudos

I am encountering very similar behavior to drii_cavalcanti. When I use a Shared cluster with an IAM role specified, I can verify that the AWS CLI is installed, but when I run aws sts get-caller-identity I receive the error "Unable to locate credential...

2 More Replies
ziafazal
by New Contributor II
  • 1767 Views
  • 3 replies
  • 0 kudos

How to stop a continuous pipeline which is set to RETRY on FAILURE and failing for some reason

I have created a pipeline which is continuous and set to RETRY on FAILURE. For some reason it keeps failing and retrying. Is there any way I can stop it? Hitting the Stop button throws an error.

Latest Reply
ziafazal
New Contributor II
  • 0 kudos

Hi @szymon_dybczak , I already tried to remove it via the REST API but got the same error as in the pipeline logs. Eventually, I had to remove the workspace to get rid of it.

2 More Replies
jen-metaplane
by New Contributor II
  • 2012 Views
  • 4 replies
  • 1 kudos

How to get catalog and schema from system query table

Hi, We are querying the system.query table to parse query history. If the table in the query is not fully qualified with its catalog and schema, how can we derive the catalog and schema? Thanks, Jen

Latest Reply
filipniziol
Esteemed Contributor
  • 1 kudos

There is no straightforward method to get this data. Run this query to check the defaults: SELECT current_catalog() AS default_catalog, current_schema() AS default_schema; Catalog and schema may be changed in the query, so if you have the query text you...
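Building on that, one way to approximate the effective catalog and schema is to scan the statement text for USE commands before falling back to the session defaults. A rough pure-Python sketch (the regexes are a simplification: they ignore comments, quoting, and multi-statement edge cases):

```python
import re

USE_CATALOG = re.compile(r"\bUSE\s+CATALOG\s+([\w.`]+)", re.IGNORECASE)
USE_SCHEMA = re.compile(r"\bUSE\s+(?:SCHEMA\s+|DATABASE\s+)?(?!CATALOG\b)([\w.`]+)", re.IGNORECASE)

def effective_context(query_text, default_catalog, default_schema):
    """Return (catalog, schema) after applying any USE statements in the text."""
    catalog, schema = default_catalog, default_schema
    for m in USE_CATALOG.finditer(query_text):
        catalog = m.group(1).strip("`")
    # Strip USE CATALOG statements so the plain-USE regex doesn't re-match them
    remaining = USE_CATALOG.sub("", query_text)
    for m in USE_SCHEMA.finditer(remaining):
        schema = m.group(1).strip("`")
    return catalog, schema

print(effective_context("USE CATALOG dev; USE SCHEMA sales; SELECT * FROM t",
                        "main", "default"))  # ('dev', 'sales')
```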

3 More Replies
