Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

SusuTheSeeker
by New Contributor III
  • 4835 Views
  • 7 replies
  • 3 kudos

Kernel switches to unknown using pyspark

I am working in JupyterHub in a notebook, using a PySpark dataframe for analyzing text; more precisely, I am doing sentiment analysis of newspaper articles. The code works until I get to some point where the kernel is busy, and after approximately...

Latest Reply
-werners-
Esteemed Contributor III
  • 3 kudos

Do you actually run the code on a distributed environment (meaning a driver and multiple workers)? If not, there is no point in using PySpark, as all code will be executed locally.
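As a quick check, the Spark master URL tells you whether execution is actually distributed. A minimal sketch (on a live session you would pass `spark.sparkContext.master`, assuming a running PySpark session named `spark`):

```python
# Minimal sketch: decide whether a Spark master URL means purely local execution.
def is_local_master(master_url: str) -> bool:
    """Return True when Spark will run everything inside a single local JVM."""
    return master_url.startswith("local")

# Example master URLs:
# is_local_master("local[4]")          -> True  (no cluster, no parallel speedup)
# is_local_master("spark://host:7077") -> False (standalone cluster)
# is_local_master("yarn")              -> False (YARN cluster)
```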

6 More Replies
auser85
by New Contributor III
  • 6648 Views
  • 1 reply
  • 3 kudos

Delta Table: Drop column failure

DBR 10.5, Spark 3.2.1

```
%sql
CREATE TABLE testing (
    name string,
    counter int
)
USING DELTA
OPTIONS (PATH "/mnt/general/testingtbl/")
```

```
%sql
insert into testing (name, counter) values ('a', 1)
```

```
%sql
ALTER TABLE testing SET TBLPROP...
```
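The truncated ALTER TABLE above is likely heading toward column mapping: Delta's DROP COLUMN only works once the table uses name-based column mapping, which in turn requires upgraded reader/writer protocol versions. A hedged sketch of the statements involved (table name taken from the post; property values per Delta Lake's column-mapping documentation):

```python
# Sketch: SQL that must run before Delta's DROP COLUMN succeeds.
# On a cluster these would be executed with spark.sql(...).
ENABLE_COLUMN_MAPPING = """
ALTER TABLE testing SET TBLPROPERTIES (
  'delta.minReaderVersion'   = '2',
  'delta.minWriterVersion'   = '5',
  'delta.columnMapping.mode' = 'name'
)
"""

DROP_COLUMN = "ALTER TABLE testing DROP COLUMN counter"

# On a cluster:
# spark.sql(ENABLE_COLUMN_MAPPING)
# spark.sql(DROP_COLUMN)
```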

Orianh
by Valued Contributor II
  • 6034 Views
  • 6 replies
  • 2 kudos

Resolved! Databricks job CLI

Hey guys, I'm trying to create a job via the Databricks CLI. This job is going to use a wheel file that I already uploaded to DBFS, and from this package I exported the entry point needed for the job. In the UI I can see that the job has been created, bu...

Latest Reply
Vivian_Wilfred
Databricks Employee
  • 2 kudos

Hi @orian hindi, adding the wheel package in the "libraries" section of the JSON file will always try to install the whl at the cluster level, which requires manage access, irrespective of a job cluster or an existing interactive cluster. You cannot achieve ...
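For reference, a job spec that attaches the wheel to the job's task typically looks like the sketch below. Field names follow the Jobs API 2.1 payload shape; the DBFS path, package name, entry point, and cluster sizing are all placeholders:

```python
# Hypothetical Jobs API 2.1 payload attaching a wheel as a task library.
# All paths and names below are placeholders, not values from the post.
job_spec = {
    "name": "wheel-entry-point-job",
    "tasks": [
        {
            "task_key": "main",
            "new_cluster": {
                "spark_version": "10.4.x-scala2.12",
                "node_type_id": "Standard_DS3_v2",
                "num_workers": 1,
            },
            # Library installed for this task's cluster:
            "libraries": [{"whl": "dbfs:/path/to/my_package-0.1-py3-none-any.whl"}],
            # Entry point exported by the package:
            "python_wheel_task": {"package_name": "my_package", "entry_point": "main"},
        }
    ],
}
# Submitted with e.g.: databricks jobs create --json-file job.json
```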

5 More Replies
vivek_sinha
by Contributor
  • 23919 Views
  • 3 replies
  • 4 kudos

Resolved! PySpark on Jupyterhub K8s || Unable to query data || Class org.apache.hadoop.fs.s3a.S3AFileSystem not found

PySpark version: 2.4.5, Hive version: 1.2, Hadoop version: 2.7, AWS SDK jar: 1.7.4, Hadoop-AWS: 2.7.3. When I am trying to show data I am getting "Class org.apache.hadoop.fs.s3a.S3AFileSystem not found", while I am passing all the information which all are re...
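For context, the S3AFileSystem class lives in the hadoop-aws jar, so that jar (plus a matching aws-java-sdk) must reach the executor classpath, not just the driver. A hedged sketch of the relevant configuration, reusing the versions stated in the post:

```python
# Sketch: Spark conf entries for s3a:// with Hadoop 2.7.3 / AWS SDK 1.7.4
# (versions taken from the post; the keys are standard Hadoop/Spark settings).
S3A_CONF = {
    # Ship matching jars to driver AND executors:
    "spark.jars.packages": (
        "org.apache.hadoop:hadoop-aws:2.7.3,"
        "com.amazonaws:aws-java-sdk:1.7.4"
    ),
    # Map the s3a:// scheme to its implementing class:
    "spark.hadoop.fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
}

# On a live session:
# builder = SparkSession.builder
# for key, value in S3A_CONF.items():
#     builder = builder.config(key, value)
# spark = builder.getOrCreate()
```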

Latest Reply
vivek_sinha
Contributor
  • 4 kudos

Hi @Arvind Ravish, thanks for the response; I have now fixed the issue. The image I was using to launch the Spark executors didn't have the AWS jars. After making the necessary changes, it started working. Many thanks for your response.

2 More Replies
vivek_sinha
by Contributor
  • 8933 Views
  • 3 replies
  • 4 kudos

Resolved! Getting Authentication Error while accessing Azure Blob table (wasb) URL using PySpark

I am trying to access the Azure Blob table using PySpark but getting an Authentication Error. Here I am passing a SAS token (HTTP and HTTPS enabled), but it's working only with the WASBS (HTTPS) URL, not with the WASB (HTTP) URL. I even tried with the Account key as...
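For reference, with the legacy WASB driver the SAS token is supplied per container and storage account through a Hadoop configuration key. A minimal sketch (container/account names below are placeholders):

```python
# Sketch: build the Hadoop conf key that carries a SAS token for a
# WASB(S) container. Container and account names are placeholders.
def wasb_sas_conf_key(container: str, account: str) -> str:
    return f"fs.azure.sas.{container}.{account}.blob.core.windows.net"

# On a live session (the token must also be visible to the executors):
# spark.conf.set(wasb_sas_conf_key("mycontainer", "myaccount"), sas_token)
```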

Latest Reply
vivek_sinha
Contributor
  • 4 kudos

Hi @Arvind Ravish, the issue got fixed after passing the HTTP- and HTTPS-enabled token to the Spark executors. Thanks again for your help.

2 More Replies
Prabakar
by Databricks Employee
  • 2928 Views
  • 1 reply
  • 1 kudos

Non-admin users unable to create jobs from the Jobs UI

Non-admin users may be experiencing difficulties interacting with the jobs UI. This is due to a recently discovered UI regression in the 3.73 shard release, deployed to the jobs service starting June 6...

Latest Reply
Prabakar
Databricks Employee
  • 1 kudos

This has been conveyed to all customers. If the email landed in your spam folder, this should help you.

mali_bigdata
by New Contributor
  • 1201 Views
  • 0 replies
  • 0 kudos

Databricks adds a NULL value to the URL when loading the Fairlearn dashboard, causing a CORS error; the dashboard keeps spinning.

We are trying to run the FairnessDashboard, and once we pass the data to the dashboard it keeps spinning. Please see the attached file. We also noticed that Databricks is adding NULL in the URL; eventually we get the CORS error and it is redir...

eager_to_learn
by New Contributor III
  • 5708 Views
  • 7 replies
  • 5 kudos

Resolved! Databricks pool - 2 instances are in running state without any job running in the system

We are using Azure Databricks pools, configured with 16 max instances. Out of 16, 2 instances are in a running state without any job running. How and where can I check the usage of these instances? P.S. the SQL pool is also not running, so no chances o...
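One plausible explanation for instances running with no jobs is the pool's own warm-instance setting. A hedged sketch of the relevant fields (names follow the Instance Pools API 2.0; the values are placeholders, not the poster's actual configuration):

```python
# Sketch: pool settings that keep instances running while idle.
# Field names per the Instance Pools API 2.0; values are placeholders.
pool_config = {
    "instance_pool_name": "my-pool",
    "node_type_id": "Standard_DS3_v2",
    "min_idle_instances": 2,   # pool keeps this many warm even with no jobs
    "max_capacity": 16,
    # Only idle instances ABOVE min_idle_instances are reclaimed:
    "idle_instance_autotermination_minutes": 15,
}
```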

Latest Reply
eager_to_learn
New Contributor III
  • 5 kudos

@Kaniz Fatma / @Prabakar Ammeappin: any idea how we can queue jobs in the resource pools? Is there some setting we need to switch on so that jobs are queued until instances are available, or can you point to some documentation for this?

6 More Replies
ABAGRI
by New Contributor II
  • 2266 Views
  • 2 replies
  • 2 kudos

Resolved! Having Issues with extracting records from complex JSON

Hi team, we are using Delta Live Tables to ingest data from Kafka. The JSON file we receive has a complex structure, and we are trying to explode the file into its necessary columns and transactions. Thank you. Please see the attached sample file: { "Table...

Latest Reply
User16753725469
Contributor II
  • 2 kudos

Hi @Lantis Pillay, could you please try to parse the JSON records in the below way
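In Spark this kind of parsing is usually `from_json` followed by `explode`. Since the attached sample file is truncated, here is a plain-Python illustration of the exploding step on a hypothetical record shape (the `Table`/`Rows` field names are assumptions, not the poster's schema):

```python
import json

# Plain-Python illustration of "exploding" one nested record into flat rows.
# The record shape below is hypothetical; the sample file in the post is
# truncated, so field names are placeholders.
def explode_record(raw: str) -> list:
    rec = json.loads(raw)
    # One output row per element of the nested array, with the parent
    # table name repeated onto each row:
    return [{"table": rec["Table"], **row} for row in rec["Rows"]]

sample = '{"Table": "transactions", "Rows": [{"id": 1, "amount": 9.5}, {"id": 2, "amount": 3.0}]}'
# explode_record(sample) -> [{'table': 'transactions', 'id': 1, 'amount': 9.5},
#                            {'table': 'transactions', 'id': 2, 'amount': 3.0}]
```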

1 More Replies
MattM
by New Contributor III
  • 2536 Views
  • 0 replies
  • 0 kudos

Unstructured data (PDF) and semi-structured data

I have a scenario where one source is unstructured PDF files and another source is semi-structured JSON files. I get files from these two sources on a daily basis into ADLS storage. What is the best way to load this into a medallion structure by s...

Antoine_De_A
by New Contributor III
  • 3168 Views
  • 1 reply
  • 3 kudos

Resolved! Streaming data to CosmosDB

Hello everyone, here is the problem I am facing. I'm currently working on streaming data to Databricks. My goal is to create a data stream in a first notebook, and then in a second notebook to read this data stream and add all the new rows to a DataFrame...

Latest Reply
Antoine_De_A
New Contributor III
  • 3 kudos

Problem solved! Instead of trying to do everything directly with the .writeStream options, I used the .foreachBatch() function, which allows me to call a function outside .writeStream(). In this function I get a DataFrame parameter, which is my str...
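That pattern looks roughly like the sketch below, assuming the Azure Cosmos DB Spark 3 connector; the endpoint, key, database, and container values are placeholders, and the option names should be checked against your connector version:

```python
# Sketch of the foreachBatch pattern for writing a stream to Cosmos DB.
# Assumes the Azure Cosmos DB Spark 3 connector; all values are placeholders.
COSMOS_WRITE_CONFIG = {
    "spark.cosmos.accountEndpoint": "https://<account>.documents.azure.com:443/",
    "spark.cosmos.accountKey": "<key>",
    "spark.cosmos.database": "mydb",
    "spark.cosmos.container": "mycontainer",
}

def write_batch_to_cosmos(batch_df, batch_id):
    """Called once per micro-batch; batch_df is a normal (non-streaming) DataFrame."""
    (batch_df.write
        .format("cosmos.oltp")
        .options(**COSMOS_WRITE_CONFIG)
        .mode("append")
        .save())

# Wiring it into the stream:
# stream_df.writeStream.foreachBatch(write_batch_to_cosmos).start()
```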

curious-case-of
by New Contributor II
  • 11230 Views
  • 1 reply
  • 4 kudos

Databricks notebook taking too long to run as a job compared to when triggered from within the notebook

I don't know if this question has been covered earlier, but here it goes: I have a notebook that I can run manually using the 'Run' button in the notebook, or as a job. The runtime when I run from within the notebook directly is roughly 2 hours, but w...

Latest Reply
wvl
New Contributor II
  • 4 kudos

We're seeing the same behavior: good performance using an interactive cluster, but with an identically sized job cluster, performance is bad. Any ideas?

data_engineer_0
by New Contributor II
  • 15115 Views
  • 3 replies
  • 2 kudos

How to run a .py file on a Databricks cluster

Hi team, I want to run the below command in Databricks and also need to capture the error and success messages. Please help me out here; thanks in advance. Ex: python3 /mnt/users/code/x.py --arguments
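One way to do this from a notebook is a small subprocess wrapper that captures the exit code and both output streams. A sketch (the /mnt path from the post is assumed to be a mount visible to the driver node):

```python
import subprocess

def run_script(path, *args, interpreter="python3"):
    """Run a Python script, capturing its exit code, stdout, and stderr."""
    result = subprocess.run(
        [interpreter, path, *args],
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout, result.stderr

# Usage (path from the post, argument is a placeholder):
# code, out, err = run_script("/mnt/users/code/x.py", "--arguments")
# if code != 0:
#     raise RuntimeError(f"script failed: {err}")
```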

Latest Reply
User16764241763
Honored Contributor
  • 2 kudos

Hello @Piper Wilson, would this task not help? https://docs.databricks.com/dev-tools/api/latest/examples.html#jobs-api-examples

2 More Replies
User15787040559
by Databricks Employee
  • 3123 Views
  • 1 reply
  • 0 kudos

MicrosoftTeams-image

ERROR: Max retries exceeded with url: /api/2.0/jobs/runs/get?run_id= Failed to establish a new connection. This error can happen when exceeding the rate limits for all REST API calls, as documented here. In the image shown, for example, we're using the Jobs...
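The usual client-side mitigation for rate limiting is exponential backoff and retry. A minimal generic sketch (the exception type to retry on depends on your HTTP client; `RateLimitError` and `jobs_runs_get` below are hypothetical names):

```python
import time

def call_with_backoff(fn, retry_on=Exception, max_retries=5,
                      base_delay=1.0, sleep=time.sleep):
    """Retry fn() with exponential backoff; re-raise after the last attempt."""
    for attempt in range(max_retries):
        try:
            return fn()
        except retry_on:
            if attempt == max_retries - 1:
                raise
            # Delay doubles on each failed attempt: 1s, 2s, 4s, ...
            sleep(base_delay * (2 ** attempt))

# Hypothetical usage against a rate-limited API call:
# call_with_backoff(lambda: jobs_runs_get(run_id), retry_on=RateLimitError)
```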

Latest Reply
User16764241763
Honored Contributor
  • 0 kudos

Hi @Carlos Morillo, are you facing this issue consistently, or when you run a lot of jobs? We are internally tracking a similar issue. Could you please file a support request with Microsoft Support? Databricks and MSFT will collaborate and provide upd...

