Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Venkat_335
by New Contributor II
  • 1781 Views
  • 1 replies
  • 1 kudos

ISO-8859-1 encoding not giving expected result using PySpark

I used the ISO-8859-1 codepage to read some special characters, like A.P. MØLLER - MÆRSK A/S, using PySpark. But the output is not coming out as expected; I am getting output like A.P. M?LLER - M?RSK A/S. Can someone help resolve this?

Latest Reply
saipujari_spark
Databricks Employee
  • 1 kudos

@Venkat_335 I am not able to reproduce the issue. Please let me know which DBR you are using. It works fine with DBR 12.2 without specifying ISO-8859-1.
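For reference, a minimal sketch of setting the encoding explicitly when reading a CSV with PySpark; the file path is a hypothetical placeholder:

    # Hypothetical path; Spark's CSV reader accepts an explicit "encoding" option.
    df = (
        spark.read
        .option("header", "true")
        .option("encoding", "ISO-8859-1")
        .csv("/tmp/vendors.csv")
    )
    df.show(truncate=False)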

Luu
by New Contributor III
  • 6346 Views
  • 5 replies
  • 3 kudos

OPTIMIZE ZORDER does not have an effect

Hi all, recently I have been facing strange behaviour after an OPTIMIZE ZORDER command. For a large table (around 400 million rows) I executed the OPTIMIZE command with ZORDER BY on 3 columns. However, it seems that the command does not have any effect and the c...

Latest Reply
youssefmrini
Databricks Employee
  • 3 kudos

There are several potential reasons why your OPTIMIZE ZORDER command may not have had any effect on your table: the existing data files may already be optimally sorted based on the Z-order and/or column ordering. If the data is already optimized based o...
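For context, a minimal sketch of the command in question and a way to verify whether a run actually rewrote files; the table and column names are hypothetical placeholders:

    # Hypothetical table and columns; ZORDER co-locates related values in the same data files.
    spark.sql("OPTIMIZE events ZORDER BY (customer_id, event_date, region)")

    # The newest history entry reports metrics such as numFilesAdded / numFilesRemoved,
    # which show whether the command did any work.
    spark.sql("DESCRIBE HISTORY events LIMIT 1").show(truncate=False)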

4 More Replies
Ank
by New Contributor II
  • 10649 Views
  • 5 replies
  • 6 kudos

Why am I getting NameError: name ' ' is not defined in another cell?

I defined a dictionary variable Dict, populated it, and ran print(Dict) in the first cell of my notebook. In the next cell, I executed the command print(Dict) again. However, this time it gave me an error: NameError: name 'Dict' is not defined. How can that ...

Latest Reply
erigaud
Honored Contributor
  • 6 kudos

Running pip install restarts the Python interpreter, meaning that any variable defined prior to the pip install is lost, so indeed the solution is to run the pip install first, or better, to add the library you want to install directly to the cluster con...
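For reference, a minimal sketch of the cell ordering this implies in a Databricks notebook; the package name is a hypothetical placeholder:

    # Cell 1: run installs first, because %pip restarts the Python interpreter
    # and wipes any variables defined so far.
    %pip install some-package

    # Cell 2: define state only after all installs have completed.
    Dict = {"a": 1, "b": 2}
    print(Dict)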

4 More Replies
AChang
by New Contributor III
  • 1964 Views
  • 1 replies
  • 0 kudos

Best Cluster Setup for intensive transformation workload

I have a PySpark dataframe: 61k rows, 3 columns, one of which is a string column with a max length of 4k characters. I'm doing about 100 different regexp_replace operations on this dataframe, so it is very resource intensive. I'm trying to write this to a delta ...

Data Engineering
cluster
ETL
regex
Latest Reply
Leonardo
New Contributor III
  • 0 kudos

It seems that you're trying to apply a lot of transformations, but it's basic stuff, so I'd go for the best practices documentation and find a way to create a compute-optimized cluster. Ref.: https://docs.databricks.com/en/clusters/cluster-config-best...
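As a complement, a minimal sketch of folding many regexp_replace calls into a single projection so Spark evaluates them in one pass; the column name and the (pattern, replacement) pairs are hypothetical placeholders:

    from functools import reduce
    from pyspark.sql import functions as F

    # Hypothetical rules; the real workload would list ~100 of these pairs.
    rules = [(r"\s+", " "), (r"[^\x20-\x7E]", ""), (r"\bN/A\b", "")]

    # Fold every replacement into one column expression instead of ~100 withColumn calls.
    # df is assumed to be the asker's dataframe with a string column named "text".
    cleaned = reduce(lambda col, rule: F.regexp_replace(col, rule[0], rule[1]), rules, F.col("text"))
    df = df.withColumn("text", cleaned)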

AryaMa
by New Contributor III
  • 33006 Views
  • 13 replies
  • 8 kudos

Resolved! Reading data from URL using Spark

Reading data from URL using Spark (Community Edition); got a path-related error, any suggestions please?

url = "https://raw.githubusercontent.com/thomaspernet/data_csv_r/master/data/adult.csv"
from pyspark import SparkFiles
spark.sparkContext.addFil...

Latest Reply
padang
New Contributor II
  • 8 kudos

Sorry, bringing this back up...

from pyspark import SparkFiles
url = "http://raw.githubusercontent.com/ltregan/ds-data/main/authors.csv"
spark.sparkContext.addFile(url)
df = spark.read.csv("file://" + SparkFiles.get("authors.csv"), header=True, inferSc...
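For anyone landing here later, a minimal runnable sketch of the same approach using the URL from the original question; completing the truncated reader options with inferSchema is an assumption:

    from pyspark import SparkFiles

    url = "https://raw.githubusercontent.com/thomaspernet/data_csv_r/master/data/adult.csv"
    spark.sparkContext.addFile(url)  # downloads the file into Spark's temp file store

    # Assumption: the truncated option was inferSchema. On multi-node clusters the
    # file:// path must be visible to the executors, which is the usual source of
    # the "path related" error mentioned above.
    df = spark.read.csv("file://" + SparkFiles.get("adult.csv"), header=True, inferSchema=True)
    df.printSchema()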

12 More Replies
Abhishek7781
by New Contributor II
  • 3324 Views
  • 1 replies
  • 0 kudos

Unable to run a dbt project through a Databricks Workflow

I'm trying to run a dbt project which reads data from ADLS and writes back to ADLS using a Databricks Workflow. When I run the same project from my local machine (using a Python virtual environment in Visual Studio Code), it runs perfectly fine ...

Latest Reply
Abhishek7781
New Contributor II
  • 0 kudos

Tried installing an older version (2.1.0) of databricks-sql-connector (instead of 2.7.0) and surprisingly a new error message appeared. Don't know how to fix this now. 

AnaLippross
by New Contributor
  • 8673 Views
  • 1 replies
  • 1 kudos

Schema issues with External Tables

Hi everyone! We have started using Unity Catalog in our project and I am seeing weird behavior with the schemas of external tables imported to Databricks. In Data Explorer, when I expand some tables I see that the schema of those specific tables is w...

Data Engineering
External Tables
Unity Catalog
Latest Reply
youssefmrini
Databricks Employee
  • 1 kudos

It seems like you are encountering an issue with the schema mapping when importing external tables to Unity Catalog in Databricks. To troubleshoot this: based on the information you've provided, it sounds like the issue you're experiencing could be rel...

xavier20
by New Contributor
  • 15562 Views
  • 2 replies
  • 1 kudos

SQL Execution API Code 400

I am trying to execute the following command to test the API but am getting response 400:

import json
import os
from urllib.parse import urljoin, urlencode
import pyarrow
import requests
# NOTE set debuglevel = 1 (or higher) for http debug logging
from http.client...

Latest Reply
youssefmrini
Databricks Employee
  • 1 kudos

A 400 status code indicates that the server was unable to process the request due to a client error, e.g., incorrect syntax or invalid parameters. Based on the code you provided, it appears that you are trying to execute a SQL query against your...
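For reference, a minimal sketch of exercising the SQL Statement Execution API with requests; the host, token, and warehouse ID are hypothetical placeholders:

    import requests

    # Hypothetical workspace host, token, and warehouse ID.
    host = "https://adb-1234567890123456.7.azuredatabricks.net"
    headers = {"Authorization": "Bearer <personal-access-token>"}
    payload = {"statement": "SELECT 1", "warehouse_id": "abcdef1234567890"}

    resp = requests.post(f"{host}/api/2.0/sql/statements", headers=headers, json=payload)
    # On a 400, the JSON body normally names the offending field, which is the
    # fastest way to spot a malformed parameter.
    print(resp.status_code, resp.json())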

1 More Replies
PrebenOlsen
by New Contributor III
  • 3031 Views
  • 3 replies
  • 1 kudos

Can't start, delete, unpin or edit cluster: User is not part of org

Hi! Getting error message: DatabricksError: User XXX is not part of org: YYY. Config: host=https://adb-ZZZ.azuredatabricks.net, auth_type=runtime. I am in the admin group, but I cannot alter this in any way. I've tried using the databricks-sdk using: fr...

Latest Reply
youssefmrini
Databricks Employee
  • 1 kudos

To resolve this issue, I would recommend taking the following steps. Verify that you have the correct access and permissions: check with your Databricks organization admin to ensure that your user account has the appropriate access level and permission...
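As a quick diagnostic, a minimal sketch of authenticating the Databricks SDK with explicit credentials instead of the notebook's runtime auth; the host and token are hypothetical placeholders:

    from databricks.sdk import WorkspaceClient

    # Hypothetical host/token; explicit credentials sidestep auth_type=runtime.
    w = WorkspaceClient(
        host="https://adb-1234567890123456.7.azuredatabricks.net",
        token="<personal-access-token>",
    )
    print(w.current_user.me().user_name)  # confirms which identity the SDK resolves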

2 More Replies
Christine
by Contributor II
  • 24929 Views
  • 4 replies
  • 1 kudos

pyspark.pandas.read_excel(engine = xlrd) reading xls file with #REF error

Not sure if this is the right place to ask this question, so let me know if it is not. I am trying to read an xls file which contains #REF values in Databricks with pyspark.pandas. When I try to read the file with "pyspark.pandas.read_excel(file_pat...

Latest Reply
youssefmrini
Databricks Employee
  • 1 kudos

It sounds like you're trying to open an Excel file that has some invalid references, which is causing an error when you try to read it with pyspark.pandas.read_excel(). One way to handle invalid references is to use the openpyxl engine instead of xlr...
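For reference, a minimal sketch of switching engines; the file path is a hypothetical placeholder, and note that openpyxl reads .xlsx workbooks, so a legacy .xls file may need converting first:

    import pyspark.pandas as ps

    # Hypothetical path; the engine argument is passed through to the underlying pandas reader.
    df = ps.read_excel("/dbfs/FileStore/report.xlsx", engine="openpyxl")
    print(df.head())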

3 More Replies
Thor
by New Contributor III
  • 24111 Views
  • 3 replies
  • 6 kudos

How to remove duplicates in a Delta table?

I made multiple inserts (by mistake) into a Delta table and I now have exact duplicates. I feel like it's impossible to delete them if you don't have an IDENTITY column to distinguish rows (the primary key is RLOC+LOAD_DATE): it sounds odd to me not to...

snap delete snap identity
Latest Reply
Ken_H
New Contributor II
  • 6 kudos

There are several great ways to handle this: https://stackoverflow.com/questions/61674476/how-to-drop-duplicates-in-delta-table

This was my preference:

with cte as (
    select col1, col2, col3, etc,
           row_number() over (partition by col1, col2, col3, etc order by co...
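As an alternative to the ROW_NUMBER CTE, a minimal sketch of rewriting the table with duplicates dropped; the table name is a hypothetical stand-in, and the keys follow the question above (RLOC, LOAD_DATE):

    # Hypothetical table name; keep one row per business key and overwrite in place.
    # Delta's snapshot isolation allows overwriting a table the plan also reads from.
    df = spark.read.table("bookings")
    deduped = df.dropDuplicates(["RLOC", "LOAD_DATE"])
    deduped.write.format("delta").mode("overwrite").saveAsTable("bookings")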

2 More Replies
raghunathr
by New Contributor III
  • 12975 Views
  • 2 replies
  • 4 kudos

Resolved! Benefits of Databricks Views vs Tables

Do we have any explicit benefits with Databricks views when the view is going to be a simple select of a table? Does using views over tables improve performance? What about granting access to views vs tables?

Latest Reply
youssefmrini
Databricks Employee
  • 4 kudos

There can be several benefits to using Databricks views, even when the view is a simple select of a table. Improved query readability and maintainability: by encapsulating queries in views, you can simplify complex queries, making them more readable an...
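For reference, a minimal sketch of the pattern under discussion; the names are hypothetical placeholders, and the grant syntax assumes Unity Catalog, where views are granted with the TABLE keyword:

    # Hypothetical names; the view wraps a simple select of the base table.
    spark.sql("CREATE OR REPLACE VIEW sales_v AS SELECT order_id, amount FROM sales")

    # Consumers get SELECT on the view without direct access to the base table.
    spark.sql("GRANT SELECT ON TABLE sales_v TO `analysts`")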

1 More Replies
ashish577
by New Contributor III
  • 3010 Views
  • 3 replies
  • 1 kudos

Any way to access unity catalog location through python/dbutils

I have a table created in Unity Catalog that was dropped; the files are not deleted due to the 30-day soft delete. Is there any way to copy the files to a different location? When I try to use dbutils.fs.cp I get a location overlap error with Unity Cata...

Latest Reply
youssefmrini
Databricks Employee
  • 1 kudos

You can use the dbutils.fs.mv command to move the files from the deleted table to a new location. Here's an example of how to do it:

# Define the paths
source_path = "dbfs:/mnt/<unity-catalog-location>/<database-name>/<table-name>"
target_path =...
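A minimal completed sketch of that idea; both paths are hypothetical placeholders, and cp is used rather than mv so the soft-deleted originals stay untouched:

    # Hypothetical paths; recurse=True copies the whole table directory.
    source_path = "dbfs:/mnt/<unity-catalog-location>/<database-name>/<table-name>"
    target_path = "dbfs:/mnt/recovery/<table-name>"
    dbutils.fs.cp(source_path, target_path, recurse=True)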

2 More Replies
sarnendude
by New Contributor II
  • 3911 Views
  • 3 replies
  • 2 kudos

Unable to enable Databricks Assistant

Databricks Assistant is currently in Public Preview. As per the documentation below, I clicked the 'Account Console' link to log in and enable Databricks Assistant, but I am not getting the "Settings" option on the left side of the admin console. Once I log in using Azu...

Data Engineering
databricksassistant
Latest Reply
youssefmrini
Databricks Employee
  • 2 kudos

To enable Databricks Assistant, you need to navigate to the Admin Console in your Databricks workspace and follow these steps: log in to your Databricks workspace using an account with workspace admin privileges, then click on the "Admin Console" icon in th...

2 More Replies
User16783853501
by Databricks Employee
  • 3272 Views
  • 2 replies
  • 2 kudos

Using Delta Time Travel, what is the scalability limit for the feature, and at what point does time travel become infeasible?

Using Delta Time Travel, what is the scalability limit for the feature, and at what point does time travel become infeasible?

Latest Reply
youssefmrini
Databricks Employee
  • 2 kudos

The scalability limit for using Delta Time Travel depends on several factors, including the size of your Delta tables, the frequency of changes to the tables, and the retention periods for the Delta versions. In general, Delta Time Travel can become i...
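For reference, a minimal sketch of a time-travel read and of the table properties that bound how far back it can reach; the table name and durations are hypothetical placeholders:

    # Hypothetical table; read an earlier snapshot by version number.
    old = spark.sql("SELECT * FROM events VERSION AS OF 10")

    # These retention properties bound time travel: the transaction log and the
    # deleted data files must both still be retained for a version to be readable.
    spark.sql("""
        ALTER TABLE events SET TBLPROPERTIES (
            'delta.logRetentionDuration' = 'interval 30 days',
            'delta.deletedFileRetentionDuration' = 'interval 7 days'
        )
    """)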

1 More Replies
