Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Bilel
by New Contributor II
  • 1709 Views
  • 1 reply
  • 2 kudos

Python library not installed when compute is resized

 Hi,I have a python notebook workflow that uses a job cluster. The cluster lost at least a node (due to Spot Instance Termination) and did an upsize. After that I got an error in my job "Module not found", but the python module was being used before ...

Latest Reply
Brahmareddy
Esteemed Contributor
  • 2 kudos

Hi @Bilel, how are you doing today? As per my understanding, consider installing the library at the cluster level to ensure it's automatically applied across all nodes when a new one is added. You could also try using init scripts to guarantee the requ...

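The cluster-level install suggested above can be automated through the Libraries API, so the library is reinstalled on every node, including nodes added when the cluster resizes after a spot termination. A minimal sketch of building the request body for `POST /api/2.0/libraries/install` (the cluster ID and package names below are placeholders for illustration):

```python
def build_install_payload(cluster_id, packages):
    """Build the JSON body for POST /api/2.0/libraries/install.

    Cluster-scoped libraries are reinstalled automatically on each
    node, so an upsized cluster gets them without manual steps.
    """
    return {
        "cluster_id": cluster_id,
        "libraries": [{"pypi": {"package": p}} for p in packages],
    }

# Hypothetical cluster ID and packages, for illustration only:
payload = build_install_payload("0123-456789-abcde", ["requests", "pandas==2.2.0"])
```

Send the payload with any HTTP client authenticated against your workspace; the same body shape also works in a Terraform or CLI workflow.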
fperry
by New Contributor III
  • 993 Views
  • 1 reply
  • 0 kudos

Question about stateful processing

I'm experiencing an issue that I don't understand. I am using Python's arbitrary stateful processing with structured streaming to calculate metrics for each item/ID. A timeout is set, after which I clear the state for that item/ID and display each ID...

Latest Reply
Brahmareddy
Esteemed Contributor
  • 0 kudos

Hi @fperry, how are you doing today? As per my understanding, consider checking for any differences in how the stateful streaming function is writing and persisting data. It's possible that while the state is cleared after the timeout, some state might...

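The timeout behavior described in the question can be modeled outside Spark. The sketch below is plain Python, not the actual arbitrary-stateful-processing API: it keeps per-key state with a last-seen timestamp and evicts keys whose timeout has expired. Note that rows already written to the sink before eviction still exist downstream, which is one way "cleared" state can appear to persist:

```python
class KeyedStateStore:
    """Toy model of per-key state with timeout eviction
    (illustrative only; not the Structured Streaming API)."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.state = {}  # key -> (metric_total, last_seen)

    def update(self, key, value, now):
        total, _ = self.state.get(key, (0, now))
        self.state[key] = (total + value, now)

    def evict_expired(self, now):
        # Remove keys whose last update is older than the timeout,
        # returning the final metric for each evicted key.
        expired = {k: v[0] for k, v in self.state.items()
                   if now - v[1] >= self.timeout_s}
        for k in expired:
            del self.state[k]
        return expired

store = KeyedStateStore(timeout_s=60)
store.update("item-1", 5, now=0)
store.update("item-2", 3, now=50)
emitted = store.evict_expired(now=90)  # item-1 expired; item-2 still live
```

When debugging the real pipeline, comparing what `evict_expired` emits against what the sink already holds (as this toy separates the two) is the distinction Brahmareddy's reply is pointing at.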
gabrieleladd
by New Contributor II
  • 3656 Views
  • 3 replies
  • 1 kudos

Clearing data stored by pipelines

Hi everyone! I'm new to Databricks and taking my first steps with Delta Live Tables, so please forgive my inexperience. I'm building my first DLT pipeline and there's something I can't really grasp: how to clear all the objects generated or upda...

Data Engineering
Data Pipelines
Delta Live Tables
Latest Reply
ChKing
New Contributor II
  • 1 kudos

To clear all objects generated or updated by the DLT pipeline, you can drop the tables manually using the DROP command as you've mentioned. However, to get a completely clean slate, including metadata like the tracking of already processed files in t...

2 More Replies
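The "completely clean slate" the reply mentions, including metadata such as the tracking of already-processed files, is what a full-refresh update gives you. A hedged sketch of the request for the standard `POST /api/2.0/pipelines/{pipeline_id}/updates` endpoint (the pipeline ID below is a placeholder):

```python
def build_full_refresh_request(pipeline_id):
    """Return the URL path and JSON body to start a full-refresh update.

    A full refresh reprocesses all source data and resets the
    pipeline's internal state, including file-tracking metadata.
    """
    path = f"/api/2.0/pipelines/{pipeline_id}/updates"
    body = {"full_refresh": True}
    return path, body

# Placeholder pipeline ID for illustration:
path, body = build_full_refresh_request("my-pipeline-id")
```

The same action is available from the pipeline UI via the full-refresh option on the start button, which may be simpler while you are experimenting.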
aniruth1000
by New Contributor II
  • 4590 Views
  • 3 replies
  • 2 kudos

Resolved! Delta Live Tables - CDC - Batching - Delta Tables

Hey folks, I'm trying to implement CDC (apply changes) from one delta table to another. The source is a delta table named table_latest and the target is another delta table named table_old. Both are delta tables in Databricks. I'm trying to cascade the incre...

Latest Reply
filipniziol
Esteemed Contributor
  • 2 kudos

Hi @aniruth1000, when using Delta Live Table pipelines, only the source table can be a delta table. The target table must be fully managed by the DLT pipeline, including its creation and lifecycle. Let's say that you modified the code as suggested by...

2 More Replies
vishwanath_1
by New Contributor III
  • 4735 Views
  • 4 replies
  • 1 kudos

Reading a 130 GB CSV file with multiLine=true takes 4 hours just to read

Reading the 130 GB file without multiLine=true takes 6 minutes, but my file has multi-line data. How can I speed up the read time here? I am using the below command: InputDF=spark.read.option("delimiter","^").option("header",false).option("encoding","UTF-8"...

Latest Reply
Lakshay
Databricks Employee
  • 1 kudos

Hi @vishwanath_1, can you try setting the below config and see if it resolves the issue? set spark.databricks.sql.csv.edgeParserSplittable=true;

3 More Replies
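The slowdown comes from multiLine parsing being quote-aware: the reader cannot split the file at arbitrary newlines, because a newline may sit inside a quoted field, so the read becomes largely sequential (the config suggested in the reply enables a splittable edge parser). Python's own csv module shows the same quote-aware behavior on a tiny ^-delimited sample:

```python
import csv
import io

# A record whose "note" field contains an embedded newline inside quotes.
raw = 'id^note\n1^"line one\nline two"\n2^plain\n'

# A quote-aware parser must read across the newline to finish row 1,
# which is why multi-line CSV files are hard to split across workers.
rows = list(csv.reader(io.StringIO(raw), delimiter="^"))
```

If the data can be exported without embedded newlines (or with an escape scheme), dropping multiLine entirely restores splittable, parallel reads.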
vishu4rall
by New Contributor II
  • 1598 Views
  • 4 replies
  • 0 kudos

copy files from azure file share to s3 bucket

Kindly help us with code to upload a text/CSV file from an Azure file share to an S3 bucket.

Latest Reply
gchandra
Databricks Employee
  • 0 kudos

Did you try using azcopy?  https://learn.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-v10?tabs=dnf

3 More Replies
lprevost
by Contributor II
  • 2349 Views
  • 5 replies
  • 0 kudos

Large/complex Incremental Autoloader Job -- Seeking Experience on approach

I'm experimenting with several approaches to implement an incremental Auto Loader query, either in DLT or in a pipeline job. The complexities: moving approximately 30B records from a nasty set of nested folders on S3 in several thousand CSV files. ...

Latest Reply
lprevost
Contributor II
  • 0 kudos

Crickets....

4 More Replies
lprevost
by Contributor II
  • 692 Views
  • 1 reply
  • 0 kudos

Using GraphFrames on DLT job

I am trying to run a DLT job that uses GraphFrames, which is in the ML standard image.   I am using it successfully in my job compute instances.  Here are my overrides for the standard job compute policy: {"spark_version": {"type": "unlimited","defau...

Latest Reply
lprevost
Contributor II
  • 0 kudos

Crickets ....

lprevost
by Contributor II
  • 1196 Views
  • 2 replies
  • 0 kudos

GraphFrames and DLT

I am trying to run a DLT job that uses GraphFrames, which is in the ML standard image.   I am using it successfully in my job compute instances but I'm running into problems trying to use it in a DLT job.  Here are my overrides for the standard job c...

Latest Reply
lprevost
Contributor II
  • 0 kudos

Crickets .....

1 More Replies
Valentin1
by New Contributor III
  • 12306 Views
  • 6 replies
  • 3 kudos

Delta Live Tables Incremental Batch Loads & Failure Recovery

Hello Databricks community, I'm working on a pipeline and would like to implement a common use case using Delta Live Tables. The pipeline should include the following steps: incrementally load data from Table A as a batch; if the pipeline has previously...

Latest Reply
lprevost
Contributor II
  • 3 kudos

I totally agree that this is a gap in the Databricks solution. The gap exists between a static read and real-time streaming. My problem (and I suspect there are many use cases like it) is that I have slowly changing data coming into structured folders via ...

5 More Replies
Octavian1
by Contributor
  • 2656 Views
  • 2 replies
  • 1 kudos

Path of artifacts not found error in pyfunc.load_model using pyfunc wrapper

Hi, for a PySpark model that also involves a pipeline, and that I want to register with MLflow, I am using a pyfunc wrapper. Steps I followed: 1. Pipeline and model serialization and logging (using a Volume locally; the logging will be performed in dbfs...

Latest Reply
pikapika
New Contributor II
  • 1 kudos

Stuck with the same issue; however, I managed to load it (I was looking to serve it using model serving as well). One thing I noticed is that we can use mlflow.create_experiment() at the beginning and specify the default artifact location parameter as D...

1 More Replies
KristiLogos
by Contributor
  • 4064 Views
  • 9 replies
  • 4 kudos

Resolved! Load parent columns and not unnest using pyspark? Found invalid character(s) ' ,;{}()\n' in schema

I'm not sure I'm doing this correctly, but I'm having some issues with the column names when I try to load to a table in our Databricks catalog. I have multiple .json.gz files in our blob container that I want to load to a table: df = spark.read.opti...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 4 kudos

Hi @KristiLogos, check whether your JSON keys contain any of the characters listed in the error message.

8 More Replies
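When the source JSON keys do contain the characters the error message rejects (' ,;{}()\n' and similar), a common workaround is renaming the columns before writing to the table. A minimal sketch of the renaming rule in plain Python — the invalid-character set is taken from the error message in the post, and you would apply it in Spark with something like `df.toDF(*[sanitize(c) for c in df.columns])`:

```python
import re

# Characters rejected in column names, per the error message in the post.
INVALID_CHARS = re.compile(r"[ ,;{}()\n\t=]")

def sanitize(column_name):
    """Replace characters rejected in column names with '_'."""
    return INVALID_CHARS.sub("_", column_name)

cleaned = [sanitize(c) for c in ["order id", "price;(usd)", "ok_name"]]
```

Keeping a mapping of original-to-sanitized names is worth doing if downstream consumers expect the raw JSON keys.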
wendyl
by New Contributor II
  • 2182 Views
  • 3 replies
  • 0 kudos

Connection Refused: [Databricks][JDBC](11640) Required Connection Key(s): PWD;

Hey, I'm trying to connect to Databricks using a client ID and secret. I'm using JDBC 2.6.38 and the following connection URL: jdbc:databricks://<server-hostname>:443;httpPath=<http-path>;AuthMech=11;Auth_Flow=1;OAuth2ClientId=<service-principal-...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @wendyl, could you answer the following questions? - Does your workspace have Private Link? - Do you use a Microsoft Entra ID managed service principal? - If you used an Entra ID managed SP, did you use a secret from Entra ID or from Azure Da...

2 More Replies
Himanshu4
by New Contributor II
  • 3546 Views
  • 5 replies
  • 2 kudos

Inquiry Regarding Enabling Unity Catalog in Databricks Cluster Configuration via API

Dear Databricks Community, I hope this message finds you well. I am currently working on automating cluster configuration updates in Databricks using the API. As part of this automation, I am looking to ensure that Unity Catalog is enabled within ...

Latest Reply
Himanshu4
New Contributor II
  • 2 kudos

Hi Raphael, can we fetch job details from one workspace and create a new job in a new workspace with the same job ID and configuration?

4 More Replies
mayur_05
by New Contributor II
  • 2036 Views
  • 3 replies
  • 0 kudos

access cluster executor logs

Hi Team, I want to get real-time logs for the cluster executor and driver stderr/stdout while performing data operations, and save those logs in a catalog Volume.

Latest Reply
gchandra
Databricks Employee
  • 0 kudos

You can enable cluster log delivery for job cluster compute too. The specific cluster's log folder will be under /dbfs/cluster-logs (or whatever destination you change it to).

2 More Replies
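The log delivery the reply describes corresponds to the `cluster_log_conf` field in the cluster spec. A hedged fragment, with the destination path as an example only (note that logs are delivered periodically, not streamed instantly):

```json
{
  "cluster_log_conf": {
    "dbfs": {
      "destination": "dbfs:/cluster-logs"
    }
  }
}
```

Driver and executor stdout/stderr then land under that destination in a per-cluster subfolder, from which a scheduled job could copy them into a catalog Volume.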
