Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Anmol_Chauhan
by New Contributor II
  • 2329 Views
  • 4 replies
  • 1 kudos

How to use Widgets with SQL Endpoint in Databricks?

I'm trying to use widgets with SQL Endpoints but I'm encountering an error, whereas they work seamlessly with a Databricks Interactive Cluster. While query parameters can substitute for widgets in SQL endpoints, I specifically require dropdown and multi...

Latest Reply
Walter_C
Databricks Employee
  • 1 kudos

Got it, let me test. I think there is no specific way to do it, but if you add the option "All" as the first item in the list, it should select it.

3 More Replies
lukamu
by New Contributor II
  • 1978 Views
  • 3 replies
  • 1 kudos

Resolved! Issue with filter_by in Databricks SQL Query History API (/api/2.0/sql/history/queries)

Hi everyone, I'm trying to use the filter_by parameter in a GET request to /api/2.0/sql/history/queries, but I keep getting a 400 Bad Request error. When I use max_results, it works fine, but adding filter_by causes the request to fail. Example value f...

Latest Reply
Alberto_Umana
Databricks Employee
  • 1 kudos

Hi @lukamu, glad that it worked for you!

2 More Replies
GinoBarkley
by New Contributor II
  • 1538 Views
  • 1 replies
  • 1 kudos

Resolved! Extracting data from GCP BigQuery using Foreign Catalog

On Databricks, I have created a connection of type Google BigQuery and tested the connection successfully. I have then created a foreign catalog from the connection to a Google BigQuery project. I can see all the datasets and tables in the Foreign Catalog...

Data Engineering
bigquery
Databricks
location US
Latest Reply
Alberto_Umana
Databricks Employee
  • 1 kudos

Hi @GinoBarkley, could you please advise which DBR version you are using in your personal access mode? I see these requirements: Databricks clusters must use Databricks Runtime 16.1 or above and shared or single user access mode. SQL warehouses must ...

hari-prasad
by Valued Contributor II
  • 5203 Views
  • 6 replies
  • 2 kudos

'from_json' spark function not parsing value column from Confluent Kafka topic

To complete one of the badges, it was mandatory to complete a Spark Streaming Demo Practice. Due to the absence of a Kafka broker setup required for the demo practice, I configured a Confluent Kafka cluster and made several modifications to the Spark sc...

Latest Reply
saurabh18cs
Honored Contributor III
  • 2 kudos

I am not sure if I read the full explanation, but how about this: df.withColumn('value_str', F.decode(F.col('value'), 'utf-8')).withColumn('value_json', F.explode(F.from_json(F.col('value_str'), json_schema))).select('value_json.*'...

5 More Replies
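
The reply above decodes the Kafka `value` bytes as UTF-8 before handing them to `from_json`. A minimal plain-Python sketch of the same per-record idea follows; it does not use Spark. The function name `parse_kafka_value` and the `header_len` parameter are hypothetical. The 5-byte prefix in the example imitates the framing that Schema Registry serializers add; whether the poster's topic uses such framing is an assumption, not something the post states.

```python
import json
from typing import Any, Optional


def parse_kafka_value(raw: bytes, header_len: int = 0) -> Optional[Any]:
    """Decode a Kafka record value as UTF-8 and parse it as JSON.

    Returns None when the payload is not valid UTF-8 JSON, similar in spirit
    to a JSON parser yielding null for rows it cannot parse.
    header_len is a hypothetical knob for skipping a binary prefix.
    """
    payload = raw[header_len:]
    try:
        return json.loads(payload.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return None


# A plain JSON payload, as a producer without extra framing would send it.
plain = b'{"id": 1, "name": "sensor-a"}'

# Assumed framing for illustration: 1 magic byte + 4-byte schema id.
framed = b"\x00\x00\x00\x00\x07" + plain

print(parse_kafka_value(plain))
print(parse_kafka_value(framed))
print(parse_kafka_value(framed, header_len=5))
```

If the framed payload parses only after the prefix is skipped, the binary header is a likely reason for null results; in Spark the equivalent step would be to strip those bytes before decoding.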
Balram-snaplogi
by New Contributor II
  • 2273 Views
  • 2 replies
  • 0 kudos

Not able to Run jobs using M2M authentication from our code

Hi, I am using OAuth machine-to-machine (M2M) authentication with the JDBC approach. String url = "jdbc:databricks://<server-hostname>:443"; Properties p = new java.util.Properties(); p.put("httpPath", "<http-path>"); p.put("AuthMech", "11"); p.put("Auth_F...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @Balram-snaplogi, it looks like a permission problem. Could you check if the service principal has the necessary permissions to execute jobs? In Databricks, permissions for jobs can be managed to control access. The following permissions are availabl...

1 More Replies
hedbergjacob
by New Contributor II
  • 2995 Views
  • 2 replies
  • 0 kudos

Resolved! Delta Live Table "Default Schema" mandatory but not editable

Hi, we have an issue with a DLT pipeline. We want to add some source code to an existing pipeline. However, when we save, an error message shows that "Default schema" is a mandatory field, and we are not able to edit the field. The DLT pipeline does...

Data Engineering
deltalivetables
Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

Does the pipeline settings JSON include the "schema" field? If you have full admin rights, you can update the existing pipeline settings to include the "schema" field, for example: curl -X PATCH https://<databricks-instance>/api/2.0/pipelines/<pipeline-id>...

1 More Replies
simha08
by New Contributor II
  • 1369 Views
  • 2 replies
  • 0 kudos

Unable to Read Collection/Files from MongoDB using Azure Databricks

Hi there, can someone help me read data from MongoDB using Azure Databricks? Surprisingly, I am able to connect from a Jupyter Notebook and read data, but not from Azure Databricks. 1) I have installed the required spark-connector packages in the clust...

Latest Reply
simha08
New Contributor II
  • 0 kudos

I am using the following code to read the data from MongoDB using Databricks: from pyspark.sql import SparkSession; spark = SparkSession.builder.appName("myApp").config("spark.mongodb.connection.uri", "mongodb+srv://username:password@cluster.xxxx.mongo...

1 More Replies
RamanBajetha
by New Contributor II
  • 1262 Views
  • 2 replies
  • 1 kudos

Issue with Generic DLT Pipeline Handling Multiple BUs

We are implementing a data ingestion framework where data flows from a foreign catalog (source) to a raw layer (Delta tables) and then to a bronze layer (DLT streaming tables). Currently, each Business Unit (BU) has a separate workflow and DLT pipeli...

Latest Reply
NandiniN
Databricks Employee
  • 1 kudos

You can create separate schemas within the same catalog for each BU. For example, you can have schemas like BU1_schema, BU2_schema, etc., within the same catalog. By using Unity Catalog, you can segregate BU-specific tables within the same DLT pipeli...

1 More Replies
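
The schema-per-BU layout suggested in the reply can be sketched as a small naming helper. This only builds fully qualified table names in plain Python; the function `bu_targets`, the catalog name `main`, and the table name `orders_bronze` are hypothetical, and how a given pipeline consumes these names is outside the sketch.

```python
from typing import Dict, List


def bu_targets(catalog: str, business_units: List[str], table: str) -> Dict[str, str]:
    """Map each business unit to a three-level table name in its own schema.

    Follows the naming pattern from the reply (BU1_schema, BU2_schema, ...).
    """
    return {bu: f"{catalog}.{bu}_schema.{table}" for bu in business_units}


# Hypothetical catalog and table names, for illustration only.
targets = bu_targets("main", ["BU1", "BU2"], "orders_bronze")
for bu, name in targets.items():
    print(bu, "->", name)
```

Driving one generic pipeline from such a mapping keeps the BU list as configuration rather than as separate pipeline definitions.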
desertstorm
by New Contributor II
  • 8126 Views
  • 8 replies
  • 0 kudos

Driver Crash on processing large dataframe

I have a dataframe with about 2 million text rows (1 GB). I partition it into about 700 partitions, as that's the number of cores available on my cluster executors. I run the transformations extracting medical information and then write the results in parquet...

Latest Reply
Isi
Honored Contributor III
  • 0 kudos

Hey @Svish, your problem is probably caused by using Pandas. Pandas loads all the data into the driver memory, which is likely why you are experiencing issues. If you can modify your code to use Spark instead, you will probably avoid this problem. How...

7 More Replies
developer321
by New Contributor II
  • 1020 Views
  • 2 replies
  • 0 kudos

getting "NoSuchMethodError" while using tsl 15.4 and spark 3.5

Hi, I am using Databricks version 15.4 and Spark 3.5 and getting "NoSuchMethodError". In all the resources I found, the only solution is to downgrade the Spark and Databricks versions. Is there any solution apart from this, as I can't do that in my case? regar...

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hi @developer321, are you upgrading any of the default libraries of DBR 15.4 LTS? Please share more details on your use case and the commands / settings being used.

1 More Replies
cool_cool_cool
by New Contributor II
  • 1828 Views
  • 1 replies
  • 0 kudos

Job Stuck with single user access mode

Heya, so I'm working on a new workflow. I've started by writing a notebook and running it on an interactive cluster with "Single User" access mode, and everything worked fine. I created a workflow for this task with the same interactive cluster, and ev...

Latest Reply
Isi
Honored Contributor III
  • 0 kudos

Hey! You cannot access an Instance Profile (IAM Role) in “Shared” mode, so discard this option if your job relies on AWS credentials via an instance profile. If your workflow depends on accessing S3 or other AWS resources using an IAM Role, you must u...

TomBrick
by New Contributor II
  • 2950 Views
  • 4 replies
  • 1 kudos

Linux ODBC driver Unknown error

Hi, I'm trying to debug an issue connecting to Azure Databricks from a CentOS 7 machine. Testing on my own machine only required unixODBC, the databricks-odbc driver and the connection string, which all worked fine. When I test from the CentOS 7 machin...

Latest Reply
Allia
Databricks Employee
  • 1 kudos

@TomBrick Can you use the latest ODBC driver? Below is the link to download it: https://www.databricks.com/spark/odbc-drivers-download Also, can you add the parameters below in the simba.sparkodbc.ini file? This will give you more information about t...

3 More Replies
Subbu_G
by Databricks Partner
  • 891 Views
  • 1 replies
  • 0 kudos

Streamsets to Databricks Integration failure

Hi Team, while trying to ingest data from ADLS Gen 2 to Databricks through Streamsets, I am getting the error below: Configuration fs.azure.account.key.xxx.dfs.core.windows.net is not available. Able to make connection from Streamsets to databricks using sql...

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hello @Subbu_G, Ensure that the Spark configuration for the ADLS Gen 2 account key is correctly set. The configuration should follow the format: spark.hadoop.fs.azure.account.key.<storage-account-name>.dfs.core.windows.net <your-storage-account-key>...
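
The key format described above can be sketched as a small helper that builds the configuration name for a given storage account. This is a minimal illustration, not a fix confirmed by the thread: the function name `adls_account_key_conf`, its `for_cluster_conf` flag, and the account name `mystorage` are hypothetical.

```python
def adls_account_key_conf(storage_account: str, for_cluster_conf: bool = True) -> str:
    """Build the ADLS Gen 2 account-key configuration name.

    With for_cluster_conf=True the name carries the spark.hadoop. prefix used
    in the reply's cluster-level format; without it, the name matches the form
    that appears in the error message from the original post.
    """
    base = f"fs.azure.account.key.{storage_account}.dfs.core.windows.net"
    return f"spark.hadoop.{base}" if for_cluster_conf else base


# "mystorage" is a placeholder account name.
print(adls_account_key_conf("mystorage"))
print(adls_account_key_conf("mystorage", for_cluster_conf=False))

# In a notebook, the unprefixed name would typically be set on the session,
# with the key read from a secret scope rather than hard-coded, e.g.:
#   spark.conf.set(adls_account_key_conf("mystorage", False), account_key)
```

Comparing the generated name against the one in the error message is a quick way to spot a typo in the storage account name.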

noorbasha534
by Valued Contributor II
  • 1583 Views
  • 1 replies
  • 0 kudos

Error handling - SQL states

Dear all, a few questions please: 1. Has anyone successfully used the approach below for error handling in PySpark (for example, code that contains data frames) as well as SQL code based notebooks? from pyspark.errors import PySparkException try: spa...

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hi @noorbasha534,   The approach you mentioned for error handling in PySpark using PySparkException is a valid method. It allows you to catch specific exceptions related to PySpark operations and handle them accordingly. Logging errors into tables ...
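
The idea of logging caught errors into a table can be sketched as a function that flattens an exception into one row. The sketch below runs without Spark: it reads `getErrorClass()` and `getSqlState()` by name if the exception provides them, which `PySparkException` does in recent PySpark 3.x releases. The function `error_record`, the stand-in class `FakeSparkError`, and the column names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Any, Dict


def error_record(exc: Exception, job_name: str) -> Dict[str, Any]:
    """Flatten an exception into a dict suitable for appending to a log table."""
    # Exceptions without these accessors (plain Python errors) get None
    # for the error class and SQLSTATE fields.
    get_class = getattr(exc, "getErrorClass", None)
    get_state = getattr(exc, "getSqlState", None)
    return {
        "job_name": job_name,
        "logged_at": datetime.now(timezone.utc).isoformat(),
        "exception_type": type(exc).__name__,
        "error_class": get_class() if callable(get_class) else None,
        "sql_state": get_state() if callable(get_state) else None,
        "message": str(exc),
    }


class FakeSparkError(Exception):
    """Stand-in with the same accessor names, so the sketch runs without Spark."""

    def getErrorClass(self) -> str:
        return "TABLE_OR_VIEW_NOT_FOUND"

    def getSqlState(self) -> str:
        return "42P01"


try:
    raise FakeSparkError("table missing")
except Exception as exc:  # in a notebook this would be PySparkException
    row = error_record(exc, job_name="nightly_load")

print(row["error_class"], row["sql_state"])
```

In a real notebook the `except` clause would catch `PySparkException`, and the resulting dict could be turned into a one-row DataFrame and appended to a log table.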

subhas_hati
by New Contributor
  • 2585 Views
  • 1 replies
  • 0 kudos

Distinguishing stream workload from batch work load

Is it possible to use the same data source for batch data as well as stream data? Please find the following code that I got from the internet. The following code handles both stream and batch workloads. Please find attached the corresponding PDF file. I am f...

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hi @subhas_hati, Thanks for your question: Batch Workload: The availableNow trigger is used for batch processing. When you set the trigger to availableNow, it processes all available data as a single batch and then stops. This is useful for scenarios...
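
The distinction described in the reply can be sketched as a helper that picks the trigger keyword arguments for the same streaming query. It is plain Python so it runs without Spark; the function name `trigger_kwargs`, the mode names, and the default interval are hypothetical. The keys `availableNow` and `processingTime` are keyword arguments of `DataStreamWriter.trigger` in PySpark.

```python
from typing import Dict, Union


def trigger_kwargs(mode: str, interval: str = "1 minute") -> Dict[str, Union[bool, str]]:
    """Return keyword arguments for DataStreamWriter.trigger().

    "batch": process everything currently available, then stop.
    "stream": keep running and fire a micro-batch every `interval`.
    """
    if mode == "batch":
        return {"availableNow": True}
    if mode == "stream":
        return {"processingTime": interval}
    raise ValueError(f"unknown mode: {mode!r}")


print(trigger_kwargs("batch"))
print(trigger_kwargs("stream", interval="30 seconds"))

# With a streaming DataFrame `df`, the same pipeline code could then be run
# either way, for example:
#   df.writeStream.trigger(**trigger_kwargs("batch"))...
```

Keeping the trigger choice in one place lets the rest of the query stay identical for both workloads.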
