Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

by deng_dev, New Contributor III
  • 7060 Views
  • 1 reply
  • 0 kudos

py4j.protocol.Py4JJavaError: An error occurred while calling o359.sql. : java.util.NoSuchElementException

Hi! We are creating a table in a streaming job every micro-batch, using the spark.sql('create or replace table ... using delta as ...') command. The query combines data from multiple tables. Sometimes our job fails with the error: py4j.Py4JException: An e...
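A minimal sketch of the pattern described in the question, assuming a foreachBatch sink; the table and column names are hypothetical, since the original job is not shown:

```python
# Hypothetical reconstruction of the reported pattern: each micro-batch
# recreates a Delta table from a multi-table join via spark.sql.
def recreate_table(batch_df, batch_id):
    batch_df.createOrReplaceTempView("incoming")
    batch_df.sparkSession.sql("""
        CREATE OR REPLACE TABLE events_joined USING DELTA AS
        SELECT i.uuid, i.actor_uuid, a.actor_name
        FROM incoming i
        JOIN actors a ON i.actor_uuid = a.uuid
        WHERE i.uuid IS NOT NULL AND i.actor_uuid IS NOT NULL
    """)

(spark.readStream.table("events_raw")
    .writeStream
    .foreachBatch(recreate_table)
    .option("checkpointLocation", "/tmp/checkpoints/events_joined")
    .start())
```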

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @deng_dev, The error message you’re encountering, java.util.NoSuchElementException: key not found: Filter (isnotnull(uuid#42326735) AND isnotnull(actor_uuid#42326740)), indicates that there’s an issue with the query execution. Let’s address thi...

by oosterhuisf, New Contributor II
  • 1059 Views
  • 2 replies
  • 0 kudos

Break production using a shallow clone

Hi, If you create a shallow clone using the latest LTS and drop the clone using a SQL warehouse (either current or preview), the source table is broken beyond repair. Data reads and writes still work, but vacuum will remain forever broken. I've attac...
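A minimal sketch of the reported reproduction, with hypothetical table names (the poster's attachment is not included in this digest):

```python
# Delta SQL for a shallow clone of a production table.
spark.sql("CREATE TABLE main.default.source_clone SHALLOW CLONE main.default.source")

# Per the report: dropping the clone from a SQL warehouse afterwards leaves
# VACUUM on the source permanently failing, while reads and writes still work.
spark.sql("DROP TABLE main.default.source_clone")
spark.sql("VACUUM main.default.source")  # reported to fail from this point on
```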

Latest Reply
oosterhuisf
New Contributor II
  • 0 kudos

To add to that: the manual does not state that this might happen

1 More Reply
by Michael_Galli, Contributor II
  • 745 Views
  • 1 reply
  • 1 kudos

Resolved! Many dbutils.notebook.run iterations in a workflow -> Failed to checkout GitHub repository error

Hi all, I have a workflow that runs one single notebook with dbutils.notebook.run() and different parameters in one long loop. At some point, I get random git errors in the notebook run: com.databricks.WorkflowException: com.databricks.NotebookExecut...
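A hedged sketch of the looped pattern described above, with a simple retry around the transient checkout failures; the notebook path, parameters, and retry policy are all hypothetical:

```python
import time

# Hypothetical parameter list and notebook path.
runs = [{"run_date": "2024-01-01"}, {"run_date": "2024-01-02"}]

for params in runs:
    for attempt in range(3):
        try:
            dbutils.notebook.run("/Repos/project/etl_notebook", 3600, params)
            break
        except Exception:
            if attempt == 2:
                raise  # give up after three attempts
            time.sleep(30 * (attempt + 1))  # back off before retrying
```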

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @Michael_Galli, It appears that you’re encountering GitHub-related issues during your notebook runs in Databricks. Let’s address this step by step: GitHub API Limit: Databricks enforces rate limits for all REST API calls, including those rela...

by Anotech, New Contributor II
  • 4707 Views
  • 2 replies
  • 1 kudos

How can I fix this error? ExecutionError: An error occurred while calling o392.mount: java.lang.NullPointerException

Hello, I'm trying to mount my Databricks to my Azure Gen2 data lake to read in data from the container, but I get an error when executing this line of code: dbutils.fs.mount( source = "abfss://resumes@choisysresume.dfs.core.windows.net/", mount_poin...

Latest Reply
WernerS
New Contributor III
  • 1 kudos

I checked it with my mount script and it is exactly the same, except that I do not put a '/' after dfs.core.windows.net. You might want to try that. Also, is Unity Catalog enabled? Because Unity Catalog does not allow mounts.
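For comparison, a minimal sketch of a mount call without the trailing '/'. The OAuth settings and secret scope are assumptions; only the container and account names come from the question:

```python
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type":
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": dbutils.secrets.get("my-scope", "client-id"),
    "fs.azure.account.oauth2.client.secret": dbutils.secrets.get("my-scope", "client-secret"),
    "fs.azure.account.oauth2.client.endpoint":
        "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

dbutils.fs.mount(
    source="abfss://resumes@choisysresume.dfs.core.windows.net",  # no trailing '/'
    mount_point="/mnt/resumes",
    extra_configs=configs,
)
```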

1 More Reply
by FabriceDeseyn, Contributor
  • 4907 Views
  • 5 replies
  • 6 kudos

Resolved! What does Auto Loader's cloudFiles.backfillInterval do?

I'm using Auto Loader directory listing mode (without incremental file listing) and sometimes new files are not picked up and found in the cloud_files-listing. I have found that using the 'cloudFiles.backfillInterval' option can resolve the detection ...

Latest Reply
Kiranrathod
New Contributor III
  • 6 kudos

Hi @Lakshay Goel, where can I set the cloudFiles.backfillInterval property in the code? Do you have any sample code for this use case?
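The option is set on the Auto Loader reader. A minimal sketch, with hypothetical paths and file format:

```python
df = (spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    # Trigger a periodic backfill listing so files missed between regular
    # listings are eventually picked up.
    .option("cloudFiles.backfillInterval", "1 day")
    .option("cloudFiles.schemaLocation", "/mnt/schemas/events")
    .load("/mnt/landing/events"))
```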

4 More Replies
by grazie, Contributor
  • 1948 Views
  • 3 replies
  • 1 kudos

Azure Databricks: migrating Delta table data with CDF on

We are on Azure Databricks over ADLS Gen2 and have a set of tables and workflows that process data from and between those tables, using change data feeds. (We are not yet using Unity Catalog, and also not the Hive metastore, just accessing Delta tables f...

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @grazie, Moving data between Azure storage accounts while preserving timestamps and ensuring efficient processes can indeed be a challenge. Let’s explore some options to achieve this without resorting to manual, error-prone steps: Azure Databri...
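The rest of the reply is truncated. One option commonly suggested for this scenario (an assumption here, not confirmed by the reply) is Delta DEEP CLONE, which copies data and metadata across storage accounts in a single operation; the paths below are hypothetical:

```python
spark.sql("""
    CREATE OR REPLACE TABLE delta.`abfss://data@newaccount.dfs.core.windows.net/tables/events`
    DEEP CLONE delta.`abfss://data@oldaccount.dfs.core.windows.net/tables/events`
""")
```

Note that a clone starts its own transaction history, which matters when downstream consumers track change data feed versions on the source table.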

2 More Replies
by hafeez, New Contributor III
  • 1546 Views
  • 2 replies
  • 1 kudos

Resolved! Hive metastore table access control End of Support

Hello, We are using Databricks with the Hive metastore and not Unity Catalog. We would like to know if there is any end of support for table access control with Hive, as this link states that it is legacy: https://docs.databricks.com/en/data-governance/tab...

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @hafeez, Hive metastore table access control is a legacy data governance model within Databricks. While it is still available, Databricks strongly recommends using Unity Catalog instead. Unity Catalog offers a more straightforward and acco...

1 More Reply
by Remit, New Contributor III
  • 2057 Views
  • 2 replies
  • 0 kudos

Resolved! Merge error in streaming case

I have a streaming case where I stream from two sources: source1 and source2. I write two separate streams to pick the data up from the landing area (step 1). Then I write two extra streams to apply some transformations in order to give them the same schem...
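Since MERGE is not available as a direct streaming sink, the usual pattern is to run it inside foreachBatch; a hedged sketch, with hypothetical stream, table, and key names:

```python
from delta.tables import DeltaTable

def merge_batch(batch_df, batch_id):
    target = DeltaTable.forName(batch_df.sparkSession, "silver.events")
    (target.alias("t")
        .merge(batch_df.alias("s"), "t.id = s.id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute())

# unified_stream stands in for the stream produced by the two
# transformation steps that align source1 and source2 on one schema.
(unified_stream.writeStream
    .foreachBatch(merge_batch)
    .option("checkpointLocation", "/mnt/checkpoints/silver_events")
    .start())
```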

Labels: Data Engineering, MERGE, streaming
Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Remit, I want to express my gratitude for your effort in selecting the most suitable solution. It's great to hear that your query has been successfully resolved. Thank you for your contribution.

1 More Reply
by geertvanhove, New Contributor III
  • 3174 Views
  • 7 replies
  • 0 kudos

Transform a dataframe column into a concatenated string

Hello, I have a single-column dataframe and I want to transform the content into one string. E.g. a df with the rows abc, def, xyz should become "abc, def, xyz". Thanks

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @geertvanhove, I gave you the code with a screenshot.
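The screenshot is not preserved in this digest; a minimal sketch of one common approach, assuming a single column named value:

```python
from pyspark.sql import functions as F

df = spark.createDataFrame([("abc",), ("def",), ("xyz",)], ["value"])

# Collect the single column into an array and join it with ", ".
# Note: row order is not guaranteed without an explicit ordering.
result = df.agg(F.concat_ws(", ", F.collect_list("value")).alias("joined"))
result.show(truncate=False)  # abc, def, xyz
```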

6 More Replies
by Sangram, New Contributor III
  • 1359 Views
  • 1 reply
  • 0 kudos

Unable to mount ADLS Gen2 to the Databricks file system

I am unable to mount an ADLS Gen2 storage path into the Databricks file system. It throws the error: unsupported azure scheme: abfss. May I know the reason? Below are the steps that I followed: 1. Create a service principal. 2. Store the service principal's s...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Sangram, Certainly! Let’s troubleshoot the issue with mounting Azure Data Lake Storage Gen2 (ADLS Gen2) into Databricks. Azure Key Vault Permissions: Ensure that the Azure Databricks application has the necessary permissions on the Azure Key Vau...

by Erik, Valued Contributor II
  • 1141 Views
  • 1 reply
  • 0 kudos

Why not enable "decommissioning" in Spark?

You can enable "decommissioning" in Spark, which causes it to remove work from a worker when it gets a notification from the cloud that the instance is going away (e.g. spot instances). This is disabled by default, but it seems like such a no-brainer to...
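For reference, the relevant settings go in the cluster's Spark config; a sketch of the Spark 3.1+ decommissioning options (whether to enable each depends on workload and cloud):

```
spark.decommission.enabled true
spark.storage.decommission.enabled true
spark.storage.decommission.rddBlocks.enabled true
spark.storage.decommission.shuffleBlocks.enabled true
```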

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Erik, Enabling decommissioning in Spark is valuable, especially when dealing with cloud environments and transient instances like spot. Let’s delve into the reasons behind its default state and potential downsides: Why Not Enabled by Defaul...

by Erik, Valued Contributor II
  • 1776 Views
  • 3 replies
  • 0 kudos

Run driver on spot instance

The traditional advice seems to be to run the driver on "on demand", and optionally the workers on spot. And this is indeed what happens if one chooses to run with spot instances in Databricks. But I am interested in what happens if we run with a dr...

Latest Reply
Erik
Valued Contributor II
  • 0 kudos

Thanks for your answer @Kaniz_Fatma! Good overview, and I understand that "driver on-demand and the rest on spot" is good general advice. But I am still considering using spot instances for both, and I am left with two concrete questions: 1: Can w...

2 More Replies
by hold_my_samosa, New Contributor II
  • 5962 Views
  • 3 replies
  • 0 kudos

Delta Partition File on Azure ADLS Gen2 Migration

Hello, I am working on a migration project and I am facing an issue while migrating Delta tables from Azure ADLS Gen1 to Gen2. So, as per the Microsoft migration prerequisites: file or directory names with only spaces or tabs, ending with a ., containing ...

Labels: Data Engineering, azure, datalake, delta, databricks
Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @hold_my_samosa, Could you please explain what exactly the issue is now? What works and what doesn't?

2 More Replies
by The_Demigorgan, New Contributor
  • 1015 Views
  • 1 reply
  • 0 kudos

Auto Loader issue

I'm trying to ingest data from Parquet files using Auto Loader. Now, I have my custom schema and I don't want to infer the schema from the Parquet files. During readStream everything is fine, but during writeStream it is somehow inferring the schema from...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @The_Demigorgan, Certainly! When using Auto Loader in Databricks for ingesting data from Parquet files, you can enforce your custom schema and avoid schema inference. Let’s address this issue: Schema Enforcement: Auto Loader allows you to expli...
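A hedged sketch of enforcing a custom schema with Auto Loader over Parquet; the schema, paths, and target table are hypothetical. The schema is passed to the reader so no inference takes place:

```python
from pyspark.sql.types import StructType, StructField, StringType, LongType

custom_schema = StructType([
    StructField("id", LongType(), True),
    StructField("name", StringType(), True),
])

df = (spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "parquet")
    .schema(custom_schema)  # enforce the declared schema; no inference
    .load("/mnt/landing/parquet"))

(df.writeStream
    .option("checkpointLocation", "/mnt/checkpoints/parquet_ingest")
    .toTable("bronze.events"))
```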

by Balazs, New Contributor III
  • 5445 Views
  • 1 reply
  • 0 kudos

Unity Catalog Volume as Spark checkpoint location

Hi, I tried to set the Spark checkpoint location in a notebook to a folder in a Unity Catalog Volume, with the following command: sc.setCheckpointDir("/Volumes/catalog_name/schema_name/volume_name/folder_name"). Unfortunately I receive the following err...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Balazs, Databricks volumes are Unity Catalog objects that represent logical volumes of storage in a cloud object storage location. They provide capabilities for accessing, storing, governing, and organizing files. While tables govern tabular dat...
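If the runtime rejects a Volume path for sc.setCheckpointDir, a common workaround (an assumption, with hypothetical paths) is to point it at cloud storage or DBFS instead:

```python
# Direct cloud storage path (requires storage access configured on the cluster):
sc.setCheckpointDir("abfss://checkpoints@myaccount.dfs.core.windows.net/rdd-checkpoints")

# Or a DBFS path, where DBFS is allowed:
sc.setCheckpointDir("dbfs:/tmp/rdd-checkpoints")
```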

