The classic solution is to copy the data from the FTP server to ADLS storage using Azure Data Factory, and once the copy activity in the ADF pipeline completes, trigger the Databricks notebook.
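Once ADF has landed the files in ADLS, the triggered notebook only needs to read from the lake path. A minimal PySpark sketch, assuming a hypothetical storage account, container, and CSV files (adjust the abfss:// path, format, and target to your setup):

# Read the files that the ADF copy activity landed in ADLS (path and format are assumptions)
# "spark" is already defined in a Databricks notebook
landed_path = "abfss://landing@mystorageaccount.dfs.core.windows.net/ftp_drop/"  # placeholder

df = (spark.read
      .format("csv")              # assumed file format
      .option("header", "true")
      .load(landed_path))

df.write.format("delta").mode("append").save("/mnt/delta/bronze/ftp_files")  # placeholder target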
Hi, I ran the following code, but the cluster runs for a long period of time and then stops without any results. I've attached my code below (I used the 'com.springml.spark.sftp' library and installed it as a Maven library). Also, I whitelisted my lo...
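When an SFTP read hangs like this, it is often worth confirming that the driver can actually reach the SFTP host before suspecting the library. A small, hypothetical check (host and port are placeholders) that can be run in a notebook cell:

import socket

sftp_host = "sftp.example.com"   # placeholder, replace with your SFTP host
sftp_port = 22                   # default SFTP port

# Try a plain TCP connection from the driver; a timeout usually means the
# host/port is not reachable (firewall or whitelist issue), which tends to
# show up as a long-running job rather than a fast failure.
try:
    with socket.create_connection((sftp_host, sftp_port), timeout=10):
        print(f"Driver can reach {sftp_host}:{sftp_port}")
except OSError as e:
    print(f"Cannot reach {sftp_host}:{sftp_port}: {e}")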
Hello, is it possible to use an SFTP location as the source for Structured Streaming? At the moment we are going from SFTP -> S3 -> Databricks via Structured Streaming. I would like to cut out the S3 part. Cheers, Chris
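For reference, the S3 leg that the question wants to remove typically looks something like the sketch below; the bucket, paths, format, and schema here are assumptions, not the poster's actual job. Structured Streaming's built-in file source expects a listable, Hadoop-compatible filesystem path, which is why an intermediate landing zone such as S3 (or ADLS) is usually kept in the pipeline.

# Hypothetical sketch of the existing S3 -> Databricks Structured Streaming read
from pyspark.sql.types import StructType, StructField, StringType

schema = StructType([StructField("id", StringType()), StructField("payload", StringType())])

stream_df = (spark.readStream
             .format("json")                          # assumed file format
             .schema(schema)                          # streaming file sources need an explicit schema
             .load("s3://my-bucket/sftp-landing/"))   # placeholder bucket/prefix

query = (stream_df.writeStream
         .format("delta")
         .option("checkpointLocation", "/mnt/checkpoints/sftp-landing")  # placeholder
         .start("/mnt/delta/sftp_events"))                               # placeholder output path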
Hi @Chris Lant, great to meet you, and thanks for your question! Let's see if your peers in the community have an answer first; otherwise, Bricksters will get back to you soon. Thanks.
In a Databricks cluster with Scala 2.11 I am trying to read a file into a Spark DataFrame using the following code:

val df = spark.read
  .format("com.springml.spark.sftp")
  .option("host", "*")
  .option("username", "*")
  .option("password", "*")...
Hi @Andreas P, hope all is well! Just wanted to check in to see if you were able to resolve your issue. If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Thanks!
When I try to convert a notebook into a job, I frequently run into an issue with writing to the local filesystem. For this particular example, I did all my notebook testing with a bytestream for small files. When I tried to run it as a job, I used the me...
I was able to fix it. It was an issue with the nested files on the SFTP server. I had to ensure that the parent folders were being created as well. Splitting the local path from the file name made it easier to ensure that the directory existed with os.path.exists() and os.m...
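A minimal sketch of that fix, assuming a hypothetical local target path on the driver (the original post is cut off, so everything beyond os.path.exists() is an assumption):

import os

# Hypothetical local target for a file pulled from a nested SFTP folder
local_file = "/tmp/sftp_downloads/nested/folder/data.csv"

# Split the directory from the file name, then create the parent folders
# if they do not exist yet, before writing the file locally.
local_dir, file_name = os.path.split(local_file)
if not os.path.exists(local_dir):
    os.makedirs(local_dir)

with open(local_file, "wb") as f:
    f.write(b"...file contents fetched from SFTP...")  # placeholder payload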