<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Access to Databricks Volumes via Databricks Connect not working anymore in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/access-to-databricks-volumes-via-databricks-connect-not-working/m-p/135920#M50464</link>
    <description>&lt;P&gt;Hi mmayorga, thanks for the answer, but as I described above, this was working before, and I am looking for the specific reason why it isn't working anymore.&lt;/P&gt;&lt;P&gt;Something must have changed, and since it isn't working for you either, it is probably something on Databricks' side.&lt;/P&gt;</description>
    <pubDate>Fri, 24 Oct 2025 06:06:46 GMT</pubDate>
    <dc:creator>JuliandaCruz</dc:creator>
    <dc:date>2025-10-24T06:06:46Z</dc:date>
    <item>
      <title>Access to Databricks Volumes via Databricks Connect not working anymore</title>
      <link>https://community.databricks.com/t5/data-engineering/access-to-databricks-volumes-via-databricks-connect-not-working/m-p/135806#M50435</link>
      <description>&lt;P&gt;Hi all,&amp;nbsp;&lt;/P&gt;&lt;DIV class=""&gt;I regularly use the Databricks extension for VS Code to debug my Python code, and since yesterday, accessing files in a Databricks Volume no longer works.&lt;/DIV&gt;&lt;DIV class=""&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV class=""&gt;In the Databricks UI, the situation is as follows:&lt;/DIV&gt;&lt;DIV class=""&gt;When I execute a glob statement to list all zip files in my Volume, it returns the full list of zip files:&lt;/DIV&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV class=""&gt;But when I execute the same code in my VS Code environment with "Debug current File with Databricks Connect", it returns an empty list.&lt;/DIV&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;The same code was still working the week before.&lt;/DIV&gt;&lt;DIV class=""&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV class=""&gt;Cluster Information:&lt;/DIV&gt;&lt;DIV class=""&gt;Runtime 14.3 with Python 3.10 (because I have Python 3.10 installed on my local machine)&lt;/DIV&gt;&lt;DIV class=""&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV class=""&gt;Databricks Connect: Version 2.10.3&lt;/DIV&gt;&lt;DIV class=""&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV class=""&gt;I upgraded VS Code this week, but I have already rolled back to the previous version, and that changed nothing.&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Thu, 23 Oct 2025 09:06:32 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/access-to-databricks-volumes-via-databricks-connect-not-working/m-p/135806#M50435</guid>
      <dc:creator>JuliandaCruz</dc:creator>
      <dc:date>2025-10-23T09:06:32Z</dc:date>
    </item>
    <item>
      <title>Re: Access to Databricks Volumes via Databricks Connect not working anymore</title>
      <link>https://community.databricks.com/t5/data-engineering/access-to-databricks-volumes-via-databricks-connect-not-working/m-p/135907#M50458</link>
      <description>&lt;P&gt;&lt;SPAN&gt;Hi&amp;nbsp;&lt;/SPAN&gt;&lt;A style="background-color: #ffffff;" target="_blank" rel="noopener"&gt;@JuliandaCruz&lt;/A&gt;&lt;SPAN&gt;&amp;nbsp;- Instead of using Python's glob, I suggest using Spark through Databricks Connect to achieve the same result.&lt;/SPAN&gt;&amp;nbsp;The code is given below.&lt;/P&gt;
&lt;P&gt;Once you have saved the Python file (I saved it as list_and_upload_files.py), put the variables in a .env file.&lt;/P&gt;
&lt;P&gt;.env file content:&lt;/P&gt;
&lt;DIV&gt;
&lt;DIV&gt;&lt;LI-CODE lang="markup"&gt;DATABRICKS_PROFILE=&amp;lt;&amp;lt;...&amp;gt;&amp;gt;
VOLUMES_FOLDER=/Volumes/catalog/schema/volume/folder/
UPLOAD_DESTINATION=/Volumes/catalog/schema/volume/uploads/&lt;/LI-CODE&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Finally, use the command below to run it -&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;python list_and_upload_files.py --action upload --files 1.pdf&lt;/LI-CODE&gt;&lt;LI-CODE lang="python"&gt;from databricks.connect import DatabricksSession
from databricks.sdk import WorkspaceClient
from dotenv import load_dotenv
import os
import argparse
import io


class DatabricksFileManager:
    """
    A class to manage file operations in Databricks Volumes.
    """
    
    def __init__(self, profile_name):
        """
        Initialize the DatabricksFileManager with a profile.
        
        Args:
            profile_name: Name of the Databricks profile to use
        """
        self.profile_name = profile_name
        self.spark = None
        self.workspace_client = None
        
    def connect(self):
        """
        Establish connection to Databricks using the configured profile.
        """
        print(f"Connecting to Databricks using profile: {self.profile_name}")
        # Use the configured profile rather than a hard-coded profile name
        self.spark = DatabricksSession.builder.profile(self.profile_name).serverless().getOrCreate()
        self.workspace_client = WorkspaceClient(profile=self.profile_name)
        print("Connection established successfully!")
        
    def disconnect(self):
        """
        Close the Databricks connection.
        """
        if self.spark:
            self.spark.stop()
            print("Databricks connection closed.")
            
    def list_files(self, folder_path, file_extension=None):
        """
        List files in a Databricks Volumes folder.
        
        Args:
            folder_path: Path to the Volumes folder
            file_extension: Optional file extension to filter (e.g., '.pdf', '.csv')
            
        Returns:
            List of file paths
        """
        if not self.spark:
            raise Exception("Not connected to Databricks. Call connect() first.")
            
        # Use Spark SQL to list files in Databricks Volumes
        files_df = self.spark.sql(f"LIST '{folder_path}'")
        
        # Filter by file extension if provided
        if file_extension:
            if not file_extension.startswith('.'):
                file_extension = '.' + file_extension
            files_df = files_df.filter(files_df.name.endswith(file_extension))
        
        # Show the results
        print(f"Files found in {folder_path}:")
        files_df.select("path", "name", "size").show(truncate=False)
        
        # Get the paths as a list
        file_paths = [row.path for row in files_df.collect()]
        print(f"\nTotal files: {len(file_paths)}")
        
        return file_paths
    
    def list_pdf_files(self, folder_path):
        """
        List all PDF files in a Databricks Volumes folder.
        
        Args:
            folder_path: Path to the Volumes folder
            
        Returns:
            List of PDF file paths
        """
        return self.list_files(folder_path, file_extension='.pdf')
    
    def upload_file(self, local_file_path, destination_path):
        """
        Upload a single file to Databricks Volumes.
        
        Args:
            local_file_path: Path to the local file to upload
            destination_path: Destination path in Databricks Volumes (including filename)
            
        Returns:
            bool: True if upload successful, False otherwise
        """
        if not self.workspace_client:
            raise Exception("Not connected to Databricks. Call connect() first.")
        
        if not os.path.exists(local_file_path):
            print(f"Error: Local file not found: {local_file_path}")
            return False
        
        try:
            # Get file size for progress tracking
            file_size = os.path.getsize(local_file_path)
            file_size_mb = file_size / (1024 * 1024)
            
            # Upload using Databricks SDK Files API
            with open(local_file_path, 'rb') as f:
                file_content = f.read()
                # Wrap bytes in BytesIO to provide file-like interface
                file_obj = io.BytesIO(file_content)
                self.workspace_client.files.upload(
                    destination_path,
                    file_obj,
                    overwrite=True
                )
            
            print(f"✓ Successfully uploaded: {os.path.basename(local_file_path)} ({file_size_mb:.2f} MB)")
            return True
            
        except Exception as e:
            print(f"✗ Error uploading file: {str(e)}")
            return False
    
    def upload_files(self, local_files, destination_folder):
        """
        Upload multiple files to Databricks Volumes.
        
        Args:
            local_files: List of local file paths or a directory path
            destination_folder: Destination folder path in Databricks Volumes
            
        Returns:
            dict: Dictionary with 'success' and 'failed' lists of file paths
        """
        if not self.workspace_client:
            raise Exception("Not connected to Databricks. Call connect() first.")
        
        # Ensure destination folder ends with /
        if not destination_folder.endswith('/'):
            destination_folder += '/'
        
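        # Handle if local_files is a directory
        if isinstance(local_files, str) and os.path.isdir(local_files):
            directory = local_files
            local_files = [
                os.path.join(directory, f)
                for f in os.listdir(directory)
                if os.path.isfile(os.path.join(directory, f))
            ]
        elif isinstance(local_files, str):
            # A single file path passed as a string: wrap it in a list so it
            # is not iterated character by character below
            local_files = [local_files]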
        
        results = {'success': [], 'failed': []}
        
        print(f"\n📤 Uploading {len(local_files)} file(s) to {destination_folder}")
        print("-" * 60)
        
        for idx, local_file in enumerate(local_files, 1):
            filename = os.path.basename(local_file)
            destination_path = f"{destination_folder}{filename}"
            
            print(f"[{idx}/{len(local_files)}] {filename}...", end=" ")
            
            if self.upload_file(local_file, destination_path):
                results['success'].append(local_file)
            else:
                results['failed'].append(local_file)
        
        print("-" * 60)
        print(f"✅ Upload complete: {len(results['success'])} succeeded, {len(results['failed'])} failed")
        return results
    
    def __enter__(self):
        """Context manager entry."""
        self.connect()
        return self
        
    def __exit__(self, exc_type, exc_val, exc_tb):
        """Context manager exit."""
        self.disconnect()


def main():
    # Parse command-line arguments
    parser = argparse.ArgumentParser(
        description="Databricks File Manager - List and upload files to Databricks Volumes",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  # List PDF files
  python list_and_upload_files.py --action list
  
  # List files with specific extension
  python list_and_upload_files.py --action list --extension .csv
  
  # Upload a single file
  python list_and_upload_files.py --action upload --files /path/to/file.pdf
  
  # Upload multiple files
  python list_and_upload_files.py --action upload --files file1.pdf file2.pdf file3.pdf
  
  # Upload entire directory
  python list_and_upload_files.py --action upload --files /path/to/directory
        """
    )
    
    parser.add_argument(
        '--action',
        choices=['list', 'upload'],
        default='list',
        help='Action to perform: list or upload files (default: list)'
    )
    
    parser.add_argument(
        '--files',
        nargs='+',
        help='File(s) or directory to upload (required for upload action)'
    )
    
    parser.add_argument(
        '--extension',
        default='.pdf',
        help='File extension to filter when listing (default: .pdf)'
    )
    
    parser.add_argument(
        '--destination',
        help='Override destination folder from .env (optional)'
    )
    
    args = parser.parse_args()
    
    # Load environment variables from .env file
    load_dotenv()
    
    # Get configuration from environment variables
    profile_name = os.getenv("DATABRICKS_PROFILE")
    folder = os.getenv("VOLUMES_FOLDER")
    upload_destination = args.destination or os.getenv("UPLOAD_DESTINATION")
    
    if not profile_name:
        print("Error: DATABRICKS_PROFILE not set in .env file")
        return
    
    print(f"Using profile: {profile_name}")
    print(f"Folder path: {folder}\n")
    
    # Use the class with context manager (automatically handles connect/disconnect)
    with DatabricksFileManager(profile_name=profile_name) as file_manager:
        
        if args.action == 'list':
            # ===== List files =====
            print("=" * 60)
            print(f"LISTING FILES (*{args.extension})")
            print("=" * 60)
            
            if args.extension == '.pdf':
                files = file_manager.list_pdf_files(folder)
            else:
                files = file_manager.list_files(folder, file_extension=args.extension)
            
            # Print the file paths
            print(f"\nFound {len(files)} file(s):")
            for path in files:
                print(f"  - {path}")
        
        elif args.action == 'upload':
            # ===== Upload files =====
            if not args.files:
                print("Error: --files argument is required for upload action")
                print("Usage: python list_and_upload_files.py --action upload --files &amp;lt;file1&amp;gt; [file2 ...]")
                return
            
            if not upload_destination:
                print("Error: Upload destination not set. Use --destination or set UPLOAD_DESTINATION in .env")
                return
            
            print("=" * 60)
            print("UPLOADING FILES")
            print("=" * 60)
            
            # Check if it's a single directory or multiple files
            if len(args.files) == 1 and os.path.isdir(args.files[0]):
                # Upload directory
                results = file_manager.upload_files(args.files[0], upload_destination)
            else:
                # Upload specific files
                results = file_manager.upload_files(args.files, upload_destination)
            
            # Summary
            print(f"\n📊 Summary:")
            print(f"   ✅ Succeeded: {len(results['success'])} file(s)")
            print(f"   ❌ Failed: {len(results['failed'])} file(s)")
            
            if results['failed']:
                print("\n❌ Failed files:")
                for failed_file in results['failed']:
                    print(f"   - {failed_file}")


if __name__ == "__main__":
    main()

&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 24 Oct 2025 00:26:50 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/access-to-databricks-volumes-via-databricks-connect-not-working/m-p/135907#M50458</guid>
      <dc:creator>dkushari</dc:creator>
      <dc:date>2025-10-24T00:26:50Z</dc:date>
    </item>
    <item>
      <title>Re: Access to Databricks Volumes via Databricks Connect not working anymore</title>
      <link>https://community.databricks.com/t5/data-engineering/access-to-databricks-volumes-via-databricks-connect-not-working/m-p/135908#M50459</link>
      <description>&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/193660"&gt;@JuliandaCruz&lt;/a&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Thank you for reaching out!&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;I was able to reproduce your case while using Databricks Connect. The "&lt;EM&gt;Upload and Run file&lt;/EM&gt;" option worked fine and returned results, which is essentially the same as running from the Databricks UI.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;However, when running "&lt;EM&gt;Run current file with Databricks Connect&lt;/EM&gt;", the same code returned no values. This option uses a mix of your local environment and the one in the cluster, hence the importance of matching Python versions. Even so, I was unable to obtain results using "glob", and I couldn't find a specific reason or limitation.&lt;/SPAN&gt;&lt;/P&gt;
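&lt;P&gt;&lt;SPAN&gt;One way to narrow this down is to check which filesystem the glob pattern is resolved against. The sketch below assumes, without confirmation in this thread, that plain Python calls such as glob may be evaluated by the local interpreter in this run mode. The helper name describe_glob_target is hypothetical, and a temporary directory stands in for a real folder.&lt;/SPAN&gt;&lt;/P&gt;

```python
import glob
import os
import tempfile


def describe_glob_target(pattern):
    """Report where a glob pattern points on the machine running this interpreter."""
    base_dir = os.path.dirname(pattern)
    return {
        "base_dir": base_dir,
        # False means the folder is not visible to this interpreter at all
        "base_exists_here": os.path.isdir(base_dir),
        "matches": sorted(glob.glob(pattern)),
    }


# Demonstration with a temporary directory standing in for a real folder.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.zip", "b.zip", "notes.txt"):
        with open(os.path.join(tmp, name), "w") as handle:
            handle.write("demo")
    found = describe_glob_target(os.path.join(tmp, "*.zip"))
    missing = describe_glob_target(os.path.join(tmp, "no_such_dir", "*.zip"))

print(found["base_exists_here"], [os.path.basename(p) for p in found["matches"]])
print(missing["base_exists_here"], missing["matches"])
```

&lt;P&gt;&lt;SPAN&gt;Calling the helper with the Volume pattern from the failing script in both run modes would show whether the folder is visible to the interpreter in each case. An empty result with a missing base folder would point to the path rather than to glob itself.&lt;/SPAN&gt;&lt;/P&gt;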
&lt;P&gt;&lt;SPAN&gt;You can follow the approach suggested by &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/38309"&gt;@dkushari&lt;/a&gt;&amp;nbsp;, or a faster workaround would be to leverage:&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;- &lt;A href="https://docs.databricks.com/aws/en/sql/language-manual/sql-ref-syntax-aux-list" target="_self"&gt;LIST command&lt;/A&gt; in SQL or &lt;/SPAN&gt;&lt;A href="https://docs.databricks.com/aws/en/dev-tools/databricks-utils#dbutils-fs-ls" target="_self"&gt;&lt;SPAN&gt;dbutils.fs.ls&lt;/SPAN&gt;&lt;/A&gt;&lt;SPAN&gt; in Python, which will return: path, name, size, and modification time&lt;/SPAN&gt;&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;# Define Volume Path
volume_path = "/Volumes/..."

# Get the Files from specified Volume
files = dbutils.fs.ls(volume_path)

# Parse and extract specific fields
# path, name, size, modification

# Print Results
print(files)&lt;/LI-CODE&gt;
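&lt;P&gt;&lt;SPAN&gt;The "parse and extract" step above could look roughly like the following. This is a minimal sketch, not a tested solution for this issue: FileEntry is a hypothetical stand-in for the entries returned by dbutils.fs.ls (assumed to expose path, name, size, and modificationTime), select_files is an illustrative helper name, and the sample listing is made up.&lt;/SPAN&gt;&lt;/P&gt;

```python
from collections import namedtuple

# Hypothetical stand-in for the entries returned by dbutils.fs.ls, assumed to
# expose path, name, size and modificationTime. With a live workspace, pass
# the real listing to select_files instead.
FileEntry = namedtuple("FileEntry", ["path", "name", "size", "modificationTime"])


def select_files(entries, extension):
    """Return (path, name, size) tuples for entries whose name ends with extension."""
    suffix = extension if extension.startswith(".") else "." + extension
    return [
        (entry.path, entry.name, entry.size)
        for entry in entries
        if entry.name.lower().endswith(suffix.lower())
    ]


# Made-up sample listing, for illustration only.
sample = [
    FileEntry("/Volumes/catalog/schema/volume/folder/a.zip", "a.zip", 1024, 0),
    FileEntry("/Volumes/catalog/schema/volume/folder/b.csv", "b.csv", 2048, 0),
    FileEntry("/Volumes/catalog/schema/volume/folder/c.ZIP", "c.ZIP", 4096, 0),
]

zip_files = select_files(sample, "zip")
for path, name, size in zip_files:
    print(f"{name}\t{size}\t{path}")
```

&lt;P&gt;&lt;SPAN&gt;The extension match is case-insensitive here, which is a design choice for the sketch; drop the lower() calls if exact matching is preferred.&lt;/SPAN&gt;&lt;/P&gt;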
&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;I hope this works!&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Thank you,&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 24 Oct 2025 00:44:08 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/access-to-databricks-volumes-via-databricks-connect-not-working/m-p/135908#M50459</guid>
      <dc:creator>mmayorga</dc:creator>
      <dc:date>2025-10-24T00:44:08Z</dc:date>
    </item>
    <item>
      <title>Re: Access to Databricks Volumes via Databricks Connect not working anymore</title>
      <link>https://community.databricks.com/t5/data-engineering/access-to-databricks-volumes-via-databricks-connect-not-working/m-p/135919#M50463</link>
      <description>&lt;P&gt;Hi dkushari,&lt;/P&gt;&lt;P&gt;thanks for the detailed answer, but I don't want to adapt my whole project to new code just because the connector no longer works as intended.&lt;/P&gt;&lt;P&gt;My glob example is just a short code snippet to illustrate the general problem.&lt;/P&gt;&lt;P&gt;Our project consists of over 10,000 lines of code, and when problems occur, I need to use the VS Code debugger, which worked just fine until the start of this week.&lt;/P&gt;&lt;P&gt;Do you have any idea why this isn't working anymore?&lt;/P&gt;</description>
      <pubDate>Fri, 24 Oct 2025 06:04:29 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/access-to-databricks-volumes-via-databricks-connect-not-working/m-p/135919#M50463</guid>
      <dc:creator>JuliandaCruz</dc:creator>
      <dc:date>2025-10-24T06:04:29Z</dc:date>
    </item>
    <item>
      <title>Re: Access to Databricks Volumes via Databricks Connect not working anymore</title>
      <link>https://community.databricks.com/t5/data-engineering/access-to-databricks-volumes-via-databricks-connect-not-working/m-p/135920#M50464</link>
      <description>&lt;P&gt;Hi mmayorga, thanks for the answer, but as I described above, this was working before, and I am looking for the specific reason why it isn't working anymore.&lt;/P&gt;&lt;P&gt;Something must have changed, and since it isn't working for you either, it is probably something on Databricks' side.&lt;/P&gt;</description>
      <pubDate>Fri, 24 Oct 2025 06:06:46 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/access-to-databricks-volumes-via-databricks-connect-not-working/m-p/135920#M50464</guid>
      <dc:creator>JuliandaCruz</dc:creator>
      <dc:date>2025-10-24T06:06:46Z</dc:date>
    </item>
  </channel>
</rss>

