Data Engineering

Forum Posts

by cmditch, New Contributor II
  • 711 Views
  • 1 replies
  • 0 kudos

Spark UI in GCP is broken

This seems to only affect single-node clusters in GCP, not multi-node clusters. I'm seeing 403 responses for all the CSS/JS assets, among other things. I have not encountered this issue in an Azure workspace I have access to. My cluster is ru...

Latest Reply
cmditch
New Contributor II
  • 0 kudos

This is still a problem and makes single-node clusters very difficult to use at the moment. See the attached photo of what the UI looks like.

by Venkat_335, New Contributor II
  • 934 Views
  • 1 replies
  • 1 kudos

ISO-8859-1 encoding not giving expected result using PySpark

I used the ISO-8859-1 codepage to read some special characters like A.P. MØLLER - MÆRSK A/S using PySpark. But the output is not coming out as expected; I am getting output like this: A.P. M?LLER - M?RSK A/S. Can someone help me resolve it?

Latest Reply
saipujari_spark
Valued Contributor
  • 1 kudos

@Venkat_335 I am not able to reproduce the issue. Please let me know which DBR you are using. It works fine with DBR 12.2 without specifying ISO-8859-1.
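
For reference, a minimal sketch of forcing the encoding on the CSV reader, assuming a Databricks notebook where spark already exists and a hypothetical file path:

# Read a Latin-1 encoded file; the CSV reader also accepts "charset".
df = (spark.read
      .option("header", "true")
      .option("encoding", "ISO-8859-1")
      .csv("/mnt/raw/maersk_customers.csv"))  # hypothetical path
df.show(truncate=False)

If the characters still come back as ?, the file may actually be UTF-8, or the display font rather than Spark may be the culprit.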

by Luu, New Contributor III
  • 2334 Views
  • 7 replies
  • 5 kudos

OPTIMIZE ZORDER does not have an effect

Hi all, recently I have been seeing strange behaviour after an OPTIMIZE ZORDER command. For a large table (around 400 million rows) I executed the OPTIMIZE command with ZORDER BY on 3 columns. However, it seems that the command does not have any effect and the c...

Latest Reply
youssefmrini
Honored Contributor III
  • 5 kudos

There are several potential reasons why your OPTIMIZE ZORDER command may not have had any effect on your table: The existing data files may already be optimally sorted based on the ZORDER and/or column ordering. If the data is already optimized based o...
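
A quick way to confirm is to inspect the metrics OPTIMIZE returns; a sketch with hypothetical table and column names:

result = spark.sql("OPTIMIZE my_db.my_table ZORDER BY (col_a, col_b, col_c)")
# numFilesAdded/numFilesRemoved of 0 means no files were rewritten,
# i.e. the layout was already considered optimal.
result.select("metrics.numFilesAdded", "metrics.numFilesRemoved").show()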

6 More Replies
by NCat, New Contributor III
  • 3544 Views
  • 5 replies
  • 3 kudos

How can I start a SparkSession outside of a notebook?

Hi community, how can I start a SparkSession outside of a notebook? I want to split my notebook into small Python modules, and I want to let some of them call Spark functionality.

Latest Reply
-werners-
Esteemed Contributor III
  • 3 kudos

In general (as already stated), a notebook automatically gets a SparkSession. You don't have to do anything. If you specifically need separate sessions (isolation), you should run different notebooks (or plan different jobs), as these get a new s...
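
As an illustration, a minimal sketch of a helper module (file and function names hypothetical) that reuses the notebook's session instead of creating a new one:

# my_helpers.py
from pyspark.sql import SparkSession

def get_spark() -> SparkSession:
    # getOrCreate() returns the already-running session when the module
    # is imported from a notebook, so no new session is started.
    return SparkSession.builder.getOrCreate()

def row_count(table_name: str) -> int:
    return get_spark().table(table_name).count()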

4 More Replies
by Ank, New Contributor II
  • 4493 Views
  • 5 replies
  • 6 kudos

Why am I getting NameError name ' ' is not defined in another cell?

I defined a dictionary variable Dict, populated it, and called print(Dict) in the first cell of my notebook. In the next cell, I executed print(Dict) again. However, this time it gave me an error: NameError: name 'Dict' is not defined. How can that ...

Latest Reply
erigaud
Honored Contributor
  • 6 kudos

Running pip install restarts the interpreter, meaning that any variable defined prior to the pip install is lost, so indeed the solution is to run the pip install first, or better, to add the library you want to install directly to the cluster con...
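
A sketch of the safe cell ordering (the package name is just an example):

# Cell 1 - run every install first; %pip restarts the Python interpreter
# and wipes all variables defined so far.
%pip install unidecode

# Cell 2 - define state only after the installs are done.
Dict = {"a": 1, "b": 2}
print(Dict)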

4 More Replies
by AChang, New Contributor III
  • 597 Views
  • 1 replies
  • 0 kudos

Best Cluster Setup for intensive transformation workload

I have a PySpark dataframe: 61k rows, 3 columns, one of which is a string column with a max length of 4k. I'm applying about 100 different regexp_replace functions to this dataframe, so it's very resource-intensive. I'm trying to write this to a delta ...

Data Engineering
cluster
ETL
regex
Latest Reply
Leonardo
New Contributor III
  • 0 kudos

It seems that you're trying to apply a lot of transformations, but it's basic stuff, so I'd go for the best-practices documentation and find a way to create a compute-optimized cluster. Ref.: https://docs.databricks.com/en/clusters/cluster-config-best...
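
One pattern worth trying before resizing the cluster: fold all the replacements into a single column expression so they run in one projection. A sketch with hypothetical patterns and table names:

from functools import reduce
from pyspark.sql import functions as F

# In the original scenario this list would hold ~100 (pattern, replacement) pairs.
patterns = [(r"\bN/A\b", ""), (r"\s+", " ")]

df = spark.table("my_db.raw_text")
cleaned = reduce(lambda col, p: F.regexp_replace(col, p[0], p[1]),
                 patterns, F.col("text"))
(df.withColumn("text", cleaned)
   .write.format("delta").mode("overwrite").saveAsTable("my_db.clean_text"))

Building one expression lets Catalyst collapse the replaces into a single pass over the data, so compute rather than cluster size tends to dominate.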

by AryaMa, New Contributor III
  • 18348 Views
  • 13 replies
  • 8 kudos

Resolved! Reading data from URL using Spark

Reading data from a URL using Spark (Community Edition); got a path-related error. Any suggestions, please? url = "https://raw.githubusercontent.com/thomaspernet/data_csv_r/master/data/adult.csv" from pyspark import SparkFiles spark.sparkContext.addFil...

Latest Reply
padang
New Contributor II
  • 8 kudos

Sorry, bringing this back up...

from pyspark import SparkFiles
url = "http://raw.githubusercontent.com/ltregan/ds-data/main/authors.csv"
spark.sparkContext.addFile(url)
df = spark.read.csv("file://" + SparkFiles.get("authors.csv"), header=True, inferSc...
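
For anyone landing here, the complete pattern looks roughly like this (the inferSchema argument is assumed from the cut-off preview):

from pyspark import SparkFiles

url = "http://raw.githubusercontent.com/ltregan/ds-data/main/authors.csv"
spark.sparkContext.addFile(url)

# addFile downloads to a Spark-managed local directory; the file:// prefix
# tells the reader to use the local filesystem rather than DBFS.
df = spark.read.csv("file://" + SparkFiles.get("authors.csv"),
                    header=True, inferSchema=True)
df.show(5)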

12 More Replies
by Abhishek7781, New Contributor II
  • 1373 Views
  • 1 replies
  • 0 kudos

Unable to run a dbt project through a Databricks Workflows

I'm trying to run a dbt project which reads data from ADLS and writes back to ADLS using a Databricks Workflow. When I run the same project from my local machine (using a Python virtual environment in Visual Studio Code), it runs perfectly fine ...

(attached screenshot: Abhishek7781_0-1691482504468.png)
Latest Reply
Abhishek7781
New Contributor II
  • 0 kudos

I tried installing an older version (2.1.0) of databricks-sql-connector (instead of 2.7.0) and surprisingly a new error message appeared. I don't know how to fix this now.

  • 0 kudos
by AnaLippross, New Contributor
  • 2852 Views
  • 1 replies
  • 1 kudos

Schema issues with External Tables

Hi everyone! We have started using Unity Catalog in our project and I am seeing weird behavior with the schemas of external tables imported to Databricks. In Data Explorer, when I expand some tables, I see that the schema of those specific tables is w...

Data Engineering
External Tables
Unity Catalog
Latest Reply
youssefmrini
Honored Contributor III
  • 1 kudos

It seems like you are encountering an issue with the schema mapping when importing external tables to Unity Catalog in Databricks. Based on the information you've provided, it sounds like the issue you're experiencing could be rel...

by xavier20, New Contributor
  • 2214 Views
  • 2 replies
  • 1 kudos

SQL Execution API Code 400

I am trying to execute the following command to test the API but am getting a 400 response:

import json
import os
from urllib.parse import urljoin, urlencode
import pyarrow
import requests
# NOTE set debuglevel = 1 (or higher) for http debug logging
from http.client...

Latest Reply
youssefmrini
Honored Contributor III
  • 1 kudos

A 400 status code indicates that the server was unable to process the request due to a client error, e.g., incorrect syntax or invalid parameters. Based on the code you provided, it appears that you are trying to execute a SQL query against your...
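
For comparison, a minimal request that the SQL Statement Execution API accepts; host, token, and warehouse id are hypothetical placeholders pulled from environment variables:

import os
import requests

host = os.environ["DATABRICKS_HOST"]          # e.g. https://adb-123.azuredatabricks.net
token = os.environ["DATABRICKS_TOKEN"]
warehouse_id = os.environ["DATABRICKS_WAREHOUSE_ID"]

resp = requests.post(
    f"{host}/api/2.0/sql/statements",
    headers={"Authorization": f"Bearer {token}"},
    json={"statement": "SELECT 1", "warehouse_id": warehouse_id},
)
# A 400 here usually means a malformed body, e.g. a missing warehouse_id.
resp.raise_for_status()
print(resp.json())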

1 More Replies
by PrebenOlsen, New Contributor III
  • 1183 Views
  • 4 replies
  • 1 kudos

Can't start, delete, unpin or edit cluster: User is not part of org

Hi! Getting error message: DatabricksError: User XXX is not part of org: YYY. Config: host=https://adb-ZZZ.azuredatabricks.net, auth_type=runtime. I am in the admins group, but I cannot alter this in any way. I've tried using the databricks-sdk using: fr...

Latest Reply
youssefmrini
Honored Contributor III
  • 1 kudos

To resolve this issue, I would recommend taking the following steps: Verify that you have the correct access and permissions. Check with your Databricks organization admin to ensure that your user account has the appropriate access level and permission...
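
If the SDK's runtime auth is what resolves to the wrong workspace, constructing the client with an explicit host and token is also worth trying; a sketch with hypothetical values:

import os
from databricks.sdk import WorkspaceClient

w = WorkspaceClient(
    host="https://adb-ZZZ.azuredatabricks.net",      # your workspace URL
    token=os.environ["DATABRICKS_TOKEN"],            # a personal access token
)
w.clusters.start(cluster_id="0123-456789-abcdefgh")  # hypothetical cluster id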

3 More Replies
by Christine, Contributor
  • 8676 Views
  • 5 replies
  • 1 kudos

pyspark.pandas.read_excel(engine = xlrd) reading xls file with #REF error

Not sure if this is the right place to ask this question, so let me know if it is not. I am trying to read an xls file which contains #REF values in Databricks with pyspark.pandas. When I try to read the file with "pyspark.pandas.read_excel(file_pat...

Latest Reply
youssefmrini
Honored Contributor III
  • 1 kudos

It sounds like you're trying to open an Excel file that has some invalid references, which is causing an error when you try to read it with pyspark.pandas.read_excel(). One way to handle invalid references is to use the openpyxl engine instead of xlr...
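
A sketch of switching engines (the path is hypothetical); note that openpyxl reads .xlsx only, so an .xls file would first need to be re-saved in the newer format:

import pyspark.pandas as ps

pdf = ps.read_excel("/dbfs/FileStore/report.xlsx", engine="openpyxl")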

4 More Replies
by Thor, New Contributor III
  • 8628 Views
  • 3 replies
  • 5 kudos

How to remove duplicates in a Delta table?

I made multiple inserts (by mistake) into a Delta table and now have exact duplicates. I feel like it's impossible to delete them if you don't have an IDENTITY column to distinguish rows (the primary key is RLOC+LOAD_DATE). It sounds odd to me not to...

snap
delete
identity
Latest Reply
Ken_H
New Contributor II
  • 5 kudos

There are several great ways to handle this: https://stackoverflow.com/questions/61674476/how-to-drop-duplicates-in-delta-table. This was my preference: with cte as (select col1, col2, col3, etc, row_number() over (partition by col1, col2, col3, etc order by co...
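
An equivalent PySpark sketch (table name hypothetical): keep one row per primary-key pair, then overwrite. Delta reads a pinned snapshot, so overwriting the table just read from is allowed:

# Keep one row per (RLOC, LOAD_DATE) and replace the table contents.
deduped = spark.table("my_db.bookings").dropDuplicates(["RLOC", "LOAD_DATE"])
deduped.write.format("delta").mode("overwrite").saveAsTable("my_db.bookings")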

2 More Replies
by raghunathr, New Contributor
  • 4278 Views
  • 3 replies
  • 4 kudos

Benefits of Databricks Views vs Tables

Do we get any explicit benefits from Databricks views when the view is going to be a simple select of a table? Does using views over tables improve performance? What about granting access to views vs. tables?

Latest Reply
youssefmrini
Honored Contributor III
  • 4 kudos

There can be several benefits to using Databricks views, even when the view is a simple select of a table: Improved query readability and maintainability: by encapsulating queries in views, you can simplify complex queries, making them more readable an...
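
For illustration, a simple view over a table (all names hypothetical); the view adds no storage and access can be granted on it without exposing the base table:

spark.sql("""
    CREATE OR REPLACE VIEW analytics.active_users AS
    SELECT user_id, last_login
    FROM analytics.users
    WHERE status = 'active'
""")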

2 More Replies
by ashish577, New Contributor III
  • 1317 Views
  • 3 replies
  • 1 kudos

Any way to access a Unity Catalog location through Python/dbutils?

I have a table created in Unity Catalog that was dropped; the files are not deleted due to the 30-day soft delete. Is there any way to copy the files to a different location? When I try to use dbutils.fs.cp I get a location overlap error with Unity Cata...

Latest Reply
youssefmrini
Honored Contributor III
  • 1 kudos

You can use the dbutils.fs.mv command to move the files from the deleted table to a new location. Here's an example of how to do it:

# Define the paths
source_path = "dbfs:/mnt/<unity-catalog-location>/<database-name>/<table-name>"
target_path =...
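
A sketch of the full pattern, reusing the placeholder paths above; recurse=True moves the whole directory tree of data files:

source_path = "dbfs:/mnt/<unity-catalog-location>/<database-name>/<table-name>"
target_path = "dbfs:/mnt/backup/<table-name>"  # hypothetical destination
dbutils.fs.mv(source_path, target_path, recurse=True)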

2 More Replies