Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

boskicl
by New Contributor III
  • 36639 Views
  • 8 replies
  • 11 kudos

Resolved! Table write command stuck "Filtering files for query."

Hello all, Background: I am having an issue today with Databricks using pyspark-sql and writing a Delta table. The dataframe is made by doing an inner join between two tables, and that is the table which I am trying to write to a Delta table. The table ...

Latest Reply
nvashisth
New Contributor III
  • 11 kudos

@timo199, @boskicl I had a similar issue where the job was getting stuck at "Filtering files for query" indefinitely. I checked the Spark logs and, based on those, figured out that we had enabled Photon acceleration on our cluster for the job, and the datatype of our columns...

7 More Replies
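The reply above points at Photon acceleration combined with certain column datatypes as the culprit. As a hypothetical triage step (the idea of treating complex and decimal types as suspects is my assumption, not from the thread), you could scan `df.dtypes` for columns worth checking against Photon's supported-type list:

```python
# Hypothetical triage helper: flag columns whose types are worth
# double-checking against Photon's supported-type list. Treating
# complex/decimal types as suspects is an assumption, not a fact
# confirmed in the thread.
from typing import List, Tuple

def flag_complex_columns(dtypes: List[Tuple[str, str]]) -> List[str]:
    """dtypes is the list returned by df.dtypes: [(name, type_string), ...]."""
    return [name for name, dt in dtypes
            if dt.startswith(("array", "map", "struct", "decimal"))]

# On Databricks you would call: flag_complex_columns(df.dtypes)
```

If flagged columns turn out to be involved, switching the job cluster's runtime engine from Photon to Standard is a quick way to confirm the diagnosis.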
Eyespoop
by New Contributor II
  • 27844 Views
  • 4 replies
  • 4 kudos

Resolved! PySpark: Writing Parquet Files to the Azure Blob Storage Container

Currently I am having some issues with writing the parquet file to the Storage Container. I do have the code running, but whenever the dataframe writer puts the parquet to blob storage, instead of the parquet file type it is created as a f...

Latest Reply
amarv
New Contributor II
  • 4 kudos

This is my approach: from databricks.sdk.runtime import dbutils; from pyspark.sql import DataFrame; output_base_url = "abfss://..."; def write_single_parquet_file(df: DataFrame, filename: str): print(f"Writing '(unknown).parquet' to ABFS") ...

3 More Replies
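The reply above is truncated, so here is a minimal sketch of the same idea: write with `coalesce(1)` to a temporary directory, then copy the single part file to the final name. The base URL and helper names are placeholders, and the `dbutils` calls only work on a Databricks runtime:

```python
# Sketch only: assumes a Databricks runtime for `dbutils` and an existing
# Spark DataFrame. The base URL below is a placeholder, not a real container.

def single_file_target(base_url: str, filename: str) -> str:
    """Build the final path for the renamed single parquet file."""
    return f"{base_url.rstrip('/')}/{filename}.parquet"

def write_single_parquet_file(df, base_url: str, filename: str) -> None:
    from databricks.sdk.runtime import dbutils  # only available on Databricks
    tmp_dir = f"{base_url.rstrip('/')}/_tmp_{filename}"
    # Spark always writes a directory of part files; force a single part.
    df.coalesce(1).write.mode("overwrite").parquet(tmp_dir)
    part = [f.path for f in dbutils.fs.ls(tmp_dir) if f.name.startswith("part-")][0]
    dbutils.fs.cp(part, single_file_target(base_url, filename))
    dbutils.fs.rm(tmp_dir, True)
```

The directory-of-part-files behavior is why the poster sees "a folder instead of a parquet file"; the copy-and-rename step is what produces a single `.parquet` object.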
Anotech
by New Contributor II
  • 10979 Views
  • 3 replies
  • 1 kudos

How can I fix this error? ExecutionError: An error occurred while calling o392.mount: java.lang.NullPointerException

Hello, I'm trying to mount my Databricks workspace to my Azure Gen2 data lake to read in data from the container, but I get an error when executing this line of code: dbutils.fs.mount( source = "abfss://resumes@choisysresume.dfs.core.windows.net/", mount_poin...

Latest Reply
Nikhill
New Contributor II
  • 1 kudos

I was using Databricks secret scopes to get the key used in the config. I received a similar mount error while mounting with the "wasbs" driver, "ExecutionError: An error occurred while calling o427.mount.", and this was because the scope ...

2 More Replies
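A NullPointerException from a mount call is often a sign of a missing or malformed config value, such as a secret lookup that returned nothing. As a hedged sketch (the scope name, key name, and every `<placeholder>` below are hypothetical), the standard OAuth mount shape for ADLS Gen2 looks like this:

```python
# Sketch of the standard ADLS Gen2 OAuth mount; the scope/key names and
# every <placeholder> are hypothetical - substitute your own values.
CONFIGS = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type":
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<application-id>",
    "fs.azure.account.oauth2.client.endpoint":
        "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

def mount_resumes():
    from databricks.sdk.runtime import dbutils  # Databricks runtime only
    # Pull the client secret from a secret scope; a bad scope or key name
    # here is a common way to end up passing a null into the mount call.
    secret = dbutils.secrets.get(scope="my-scope", key="sp-client-secret")
    configs = {**CONFIGS, "fs.azure.account.oauth2.client.secret": secret}
    dbutils.fs.mount(
        source="abfss://resumes@choisysresume.dfs.core.windows.net/",
        mount_point="/mnt/resumes",
        extra_configs=configs,
    )
```

Checking that each config key actually resolves to a non-null value before calling `mount` is a reasonable first debugging step for this class of error.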
venkad
by Contributor
  • 13116 Views
  • 5 replies
  • 7 kudos

Passing proxy configurations with databricks-sql-connector for Python?

Hi, I am trying to connect to a Databricks workspace which has IP access restriction enabled, using databricks-sql-connector. Only my proxy server IPs are added in the allow list. from databricks import sql connection = sql.connect( server_hostname ='...

Latest Reply
ss2025
New Contributor II
  • 7 kudos

Is there any resolution for the above issue of setting up a proxy with the Databricks SQL connector?

4 More Replies
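One approach worth trying, offered as an assumption to verify rather than a confirmed fix: recent versions of databricks-sql-connector can pick up system proxy settings from the standard environment variables. The proxy URL and workspace values below are placeholders:

```python
import os

# Hypothetical proxy URL; must be set before the connector opens its
# HTTP session. Whether the connector honors it depends on the version.
os.environ["HTTPS_PROXY"] = "http://proxy.example.com:8080"

def open_connection():
    from databricks import sql  # requires databricks-sql-connector
    return sql.connect(
        server_hostname="<workspace-host>",       # placeholder
        http_path="<sql-warehouse-http-path>",    # placeholder
        access_token="<personal-access-token>",   # placeholder
    )
```

If the connector version in use does not honor proxy environment variables, upgrading it and re-testing is the first thing to check before debugging the IP allow list itself.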
Arby
by New Contributor II
  • 16664 Views
  • 5 replies
  • 0 kudos

Help With OSError: [Errno 95] Operation not supported: '/Workspace/Repos/Connectors....

Hello, I am experiencing issues with importing the schema file I created from the utils repo. This is the logic we use for all ingestion, and all other schemas live in this repo: utils/schemas. I am unable to access the file I created for a new ingestion pipe...

Latest Reply
HarikaM
New Contributor II
  • 0 kudos

@Arby Make sure the below is a file with extension .py and not a notebook. That should resolve the issue: /Workspace/Repos/Connectors/Dev/utils/schemas/Comptroller.py

4 More Replies
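If the repo file really is a `.py` module rather than a notebook, as the reply suggests, imports usually come down to having the repo directory on `sys.path`. A minimal sketch (the path matches the one in the thread; the module name is the file named there):

```python
import sys

# Path from the thread; adjust to your own repo layout.
REPO_UTILS = "/Workspace/Repos/Connectors/Dev/utils"

def enable_repo_imports(path: str = REPO_UTILS) -> None:
    """Make modules under the repo importable,
    e.g. `from schemas import Comptroller`."""
    if path not in sys.path:
        sys.path.append(path)
```

Repos ordinarily add their root to `sys.path` automatically, so an explicit append is mainly useful when importing from a different repo or a nested folder.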
PunithRaj
by New Contributor
  • 6419 Views
  • 2 replies
  • 2 kudos

How to read a PDF file from Azure Datalake blob storage to Databricks

I have a scenario where I need to read a PDF file from "Azure Datalake blob storage to Databricks", where the connection is done through AD access. Generating the SAS token has been restricted in our environment due to security issues. The below script ca...

Latest Reply
Mykola_Melnyk
New Contributor III
  • 2 kudos

@PunithRaj You can try the PDF DataSource for Apache Spark to read PDF files directly into a DataFrame. You will get the extracted text and the rendered page as an image in the output. More details here: https://stabrise.com/spark-pdf/ df = spark.read.forma...

1 More Replies
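Following the reply's pointer to the spark-pdf data source, the read call would look roughly like this. The format name comes from the reply; the file path is a placeholder, and any further options vary by package version, so check the linked project page before relying on this:

```python
# Sketch only: requires the spark-pdf package installed on the cluster
# and a live Spark session. The file path is a placeholder.
PDF_PATH = "abfss://container@account.dfs.core.windows.net/docs/sample.pdf"

def read_pdf(spark, path: str = PDF_PATH):
    # Per the reply, each row carries the extracted text plus a
    # rendered page image.
    return spark.read.format("pdf").load(path)
```

Since the storage is reached over ABFS with AD credentials, this sidesteps the SAS-token restriction mentioned in the question.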
verargulla
by New Contributor III
  • 15036 Views
  • 5 replies
  • 4 kudos

Azure Databricks: Error Creating Cluster

We have provisioned a new workspace in Azure using our own VNet. Upon creating the first cluster, I encounter this error: Control Plane Request Failure: Failed to get instance bootstrap steps from the Databricks Control Plane. Please check that instan...

Latest Reply
Mohamednazeer
New Contributor III
  • 4 kudos

We are also facing the same issue.

4 More Replies
Rita
by New Contributor III
  • 10363 Views
  • 7 replies
  • 6 kudos

How to connect Cognos 11.1.7 to Azure Databricks

We are trying to connect Cognos 11.1.7 to Azure Databricks, but with no success. Can you please help or guide us on how to connect Cognos 11.1.7 to Azure Databricks? This is very critical to our user community. Can you please help or guide us how to connect Co...

Latest Reply
Hans2
New Contributor II
  • 6 kudos

Has anyone got the Simba JDBC driver going with CA 11.1.7? The ODBC driver works fine, but I can't get the JDBC driver running. Regards

6 More Replies
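For anyone attempting the JDBC route, the connection URL shape is the usual sticking point. As a sketch (host and HTTP path are placeholders; older Simba driver builds used the `jdbc:spark://` prefix, while newer Databricks JDBC drivers use `jdbc:databricks://`):

```
jdbc:databricks://<workspace-host>:443/default;transportMode=http;ssl=1;AuthMech=3;httpPath=<cluster-or-warehouse-http-path>;UID=token;PWD=<personal-access-token>
```

`AuthMech=3` means token authentication with the literal username `token`; the driver version bundled with a given Cognos release determines which URL prefix it accepts.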
krishnachaitany
by New Contributor II
  • 6023 Views
  • 3 replies
  • 4 kudos

Resolved! Spot instance in Azure Databricks

When I run a job using spot instances, I would like to know how many workers are using spot and how many are using on-demand instances for a given job run. In order to identify the spot instances we got for any...

Latest Reply
drumcircle
New Contributor II
  • 4 kudos

This remains a challenge using system tables.

2 More Replies
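On the cluster-definition side, the spot/on-demand split is controlled by `azure_attributes`. A hedged example of a job-cluster fragment (the field values are illustrative):

```json
{
  "azure_attributes": {
    "availability": "SPOT_WITH_FALLBACK_AZURE",
    "first_on_demand": 1,
    "spot_bid_max_price": -1
  }
}
```

`first_on_demand` keeps the driver (and the first N nodes) on demand; which workers actually landed on spot at runtime, after evictions and fallbacks, is what the question above is still chasing, and as the latest reply notes the system tables do not yet answer it cleanly.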
Ajay-Pandey
by Esteemed Contributor III
  • 8223 Views
  • 5 replies
  • 5 kudos

Support of running multiple cells at a time in Databricks notebooks

Hi all, Databricks notebooks now support parallel runs of commands in a single notebook, which will help run ad hoc queries simultaneously without creating a separate notebook. Once you run...

Latest Reply
SunilUIIT
New Contributor II
  • 5 kudos

Hi Team, I am observing that the functionality is not working as expected in the Databricks trial workspace. Is there a setting that needs to be enabled to allow independent SQL cells in a Databricks notebook to run in parallel, while dependent cel...

4 More Replies
Hubert-Dudek
by Esteemed Contributor III
  • 25967 Views
  • 4 replies
  • 26 kudos

How to connect your Azure Data Lake Storage to Azure Databricks (Standard Workspace, Private Link)

How to connect your Azure Data Lake Storage to Azure Databricks, Standard Workspace, via Private Link. In your storage account, go to “Networking” -> “Private endpoint connections” and click Add Private Endpoint. It is important to add private links in ...

Latest Reply
dollyb
Contributor II
  • 26 kudos

This should be updated for Unity Catalog workspaces. 

3 More Replies
LightUp
by New Contributor III
  • 10333 Views
  • 3 replies
  • 4 kudos

Converting SQL Code to SQL Databricks

I am new to Databricks. Please excuse my ignorance. My requirement is to convert the SQL query below into Databricks SQL. The query reads from the EventLog table and the output goes into EventSummary. These queries can be found here: CREATE TABL...

Latest Reply
thelogicplus
Contributor
  • 4 kudos

You may explore the tools and services from Travinto Technologies. They have very good tools. We explored their tool for our code conversion from Informatica, DataStage and Ab Initio to Databricks/PySpark. We also used it for SQL queries, stored ...

2 More Replies
sreedata
by New Contributor III
  • 4244 Views
  • 4 replies
  • 5 kudos

Resolved! Databricks -->Workflows-->Job Runs

In Databricks --> Workflows --> Job Runs we have a column "Run As". Where does this value come from? We are getting a user ID here but need to change it to a generic account. Any help would be appreciated. Thanks

Latest Reply
Leon_K
New Contributor II
  • 5 kudos

I'm surprised there are no options to select "Run as" as something like a "system user". Why all this complication with a Service Principal? Where can I report this? @DataBricks

3 More Replies
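For reference, the Jobs API exposes a `run_as` field in the job settings, which is the usual way to change the value shown in that column to a service principal rather than the user who created the job (the application ID below is a placeholder):

```json
{
  "run_as": {
    "service_principal_name": "<azure-ad-application-id>"
  }
}
```

The same setting is available in the job's UI under Job details, where anyone with CAN MANAGE permission can pick a different run-as identity.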
Jyo777
by Contributor
  • 8223 Views
  • 7 replies
  • 4 kudos

need help with Azure Databricks questions on CTE and SQL syntax within notebooks

Hi amazing community folks, Feel free to share your experience or knowledge regarding the questions below: 1.) Can we pass a CTE SQL statement into Spark JDBC? I tried to do it and couldn't, but I can pass normal SQL (Select * from ) and it works. I heard th...

Latest Reply
Rjdudley
Honored Contributor
  • 4 kudos

Not a comparison, but there is a DB-SQL cheatsheet at https://www.databricks.com/sites/default/files/2023-09/databricks-sql-cheatsheet.pdf/

6 More Replies
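On question 1, two workarounds are commonly suggested; treat both as sketches to verify against your specific JDBC source. Either wrap the statement as a subquery for the `dbtable` option, or use Spark's `query` option, which takes a bare SQL statement. The example CTE and URL below are hypothetical:

```python
def wrap_as_subquery(sql_text: str, alias: str = "q") -> str:
    """The JDBC `dbtable` option expects something that can sit in a
    FROM clause, so a full statement must be wrapped and aliased."""
    return f"({sql_text}) AS {alias}"

def read_with_query_option(spark, jdbc_url: str, sql_text: str):
    # Requires a Spark session and a JDBC driver on the classpath.
    return (
        spark.read.format("jdbc")
        .option("url", jdbc_url)
        .option("query", sql_text)  # Spark 2.4+; no wrapping needed
        .load()
    )

# Hypothetical CTE statement for illustration.
CTE_SQL = "WITH recent AS (SELECT id, ts FROM events) SELECT * FROM recent"
```

Whether a CTE parses inside the `dbtable` subquery depends on the source database's SQL dialect, which may be why the plain SELECT worked while the CTE did not.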
Data_Analytics1
by Contributor III
  • 35858 Views
  • 10 replies
  • 10 kudos

Failure starting repl. How do I resolve this error? I got this error in a running job.

Failure starting repl. Try detaching and re-attaching the notebook. java.lang.Exception: Python repl did not start in 30 seconds. at com.databricks.backend.daemon.driver.IpykernelUtils$.startIpyKernel(JupyterDriverLocal.scala:1442) at com.databricks.b...

Latest Reply
PabloCSD
Valued Contributor II
  • 10 kudos

I have had this problem many times. Today I made a copy of the cluster and it got "de-saturated"; this could help someone in the future.

9 More Replies