Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Venkat_335
by New Contributor II
  • 1781 Views
  • 1 replies
  • 1 kudos

ISO-8859-1 encoding not giving expected result using PySpark

I used the ISO-8859-1 codepage to read some special characters, like A.P. MØLLER - MÆRSK A/S, using PySpark. But the output is not coming out as expected; I am getting output like A.P. M?LLER - M?RSK A/S. Can someone help resolve this?

Latest Reply
saipujari_spark
Databricks Employee
  • 1 kudos

@Venkat_335 I am not able to reproduce the issue. Please let me know which DBR you are using. It works fine with DBR 12.2 without specifying ISO-8859-1.
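For reference, a minimal sketch of setting the encoding explicitly when reading a CSV with PySpark; the file path is a hypothetical placeholder:

    # Hypothetical path; Spark's CSV reader accepts an explicit "encoding" option.
    df = (
        spark.read
        .option("header", "true")
        .option("encoding", "ISO-8859-1")
        .csv("/tmp/vendors.csv")
    )
    df.show(truncate=False)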

Luu
by New Contributor III
  • 6346 Views
  • 5 replies
  • 3 kudos

OPTIMIZE ZORDER does not have an effect

Hi all, recently I have been facing strange behaviour after an OPTIMIZE ZORDER command. For a large table (around 400 million rows) I executed the OPTIMIZE command with ZORDER BY on 3 columns. However, it seems that the command does not have any effect and the c...

Latest Reply
youssefmrini
Databricks Employee
  • 3 kudos

There are several potential reasons why your OPTIMIZE ZORDER command may not have had any effect on your table: the existing data files may already be optimally sorted based on the Z-order and/or column ordering. If the data is already optimized based o...
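For context, a minimal sketch of the command in question and a way to verify whether a run actually rewrote files; the table and column names are hypothetical placeholders:

    # Hypothetical table and columns; ZORDER co-locates related values in the same data files.
    spark.sql("OPTIMIZE events ZORDER BY (customer_id, event_date, region)")

    # The newest history entry reports metrics such as numFilesAdded / numFilesRemoved,
    # which show whether the command did any work.
    spark.sql("DESCRIBE HISTORY events LIMIT 1").show(truncate=False)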

4 More Replies
Ank
by New Contributor II
  • 10649 Views
  • 5 replies
  • 6 kudos

Why am I getting NameError: name ' ' is not defined in another cell?

I defined a dictionary variable Dict, populated it, and ran print(Dict) in the first cell of my notebook. In the next cell, I executed the command print(Dict) again. However, this time it gave me an error: NameError: name 'Dict' is not defined. How can that ...

Latest Reply
erigaud
Honored Contributor
  • 6 kudos

Running pip install restarts the Python interpreter, meaning that any variable defined prior to the pip install is lost, so indeed the solution is to run the pip install first, or better, to add the library you want to install directly to the cluster con...
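For reference, a minimal sketch of the cell ordering this implies in a Databricks notebook; the package name is a hypothetical placeholder:

    # Cell 1: run installs first, because %pip restarts the Python interpreter
    # and wipes any variables defined so far.
    %pip install some-package

    # Cell 2: define state only after all installs have completed.
    Dict = {"a": 1, "b": 2}
    print(Dict)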

4 More Replies
AChang
by New Contributor III
  • 1964 Views
  • 1 replies
  • 0 kudos

Best Cluster Setup for intensive transformation workload

I have a PySpark dataframe: 61k rows, 3 columns, one of which is a string column with a max length of 4k characters. I'm doing about 100 different regexp_replace operations on this dataframe, so it is very resource intensive. I'm trying to write this to a delta ...

Data Engineering
cluster
ETL
regex
Latest Reply
Leonardo
New Contributor III
  • 0 kudos

It seems that you're trying to apply a lot of transformations, but it's basic stuff, so I'd go for the best practices documentation and find a way to create a compute-optimized cluster. Ref.: https://docs.databricks.com/en/clusters/cluster-config-best...
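As a complement, a minimal sketch of folding many regexp_replace calls into a single projection so Spark evaluates them in one pass; the column name and the (pattern, replacement) pairs are hypothetical placeholders:

    from functools import reduce
    from pyspark.sql import functions as F

    # Hypothetical rules; the real workload would list ~100 of these pairs.
    rules = [(r"\s+", " "), (r"[^\x20-\x7E]", ""), (r"\bN/A\b", "")]

    # Fold every replacement into one column expression instead of ~100 withColumn calls.
    # df is assumed to be the asker's dataframe with a string column named "text".
    cleaned = reduce(lambda col, rule: F.regexp_replace(col, rule[0], rule[1]), rules, F.col("text"))
    df = df.withColumn("text", cleaned)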

AryaMa
by New Contributor III
  • 33006 Views
  • 13 replies
  • 8 kudos

Resolved! Reading data from URL using Spark

Reading data from URL using Spark (Community Edition); got a path-related error, any suggestions please?

url = "https://raw.githubusercontent.com/thomaspernet/data_csv_r/master/data/adult.csv"
from pyspark import SparkFiles
spark.sparkContext.addFil...

Latest Reply
padang
New Contributor II
  • 8 kudos

Sorry, bringing this back up...

from pyspark import SparkFiles
url = "http://raw.githubusercontent.com/ltregan/ds-data/main/authors.csv"
spark.sparkContext.addFile(url)
df = spark.read.csv("file://" + SparkFiles.get("authors.csv"), header=True, inferSc...
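For anyone landing here later, a minimal runnable sketch of the same approach using the URL from the original question; completing the truncated reader options with inferSchema is an assumption:

    from pyspark import SparkFiles

    url = "https://raw.githubusercontent.com/thomaspernet/data_csv_r/master/data/adult.csv"
    spark.sparkContext.addFile(url)  # downloads the file into Spark's temp file store

    # Assumption: the truncated option was inferSchema. On multi-node clusters the
    # file:// path must be visible to the executors, which is the usual source of
    # the "path related" error mentioned above.
    df = spark.read.csv("file://" + SparkFiles.get("adult.csv"), header=True, inferSchema=True)
    df.printSchema()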

12 More Replies
Abhishek7781
by New Contributor II
  • 3324 Views
  • 1 replies
  • 0 kudos

Unable to run a dbt project through a Databricks Workflow

I'm trying to run a dbt project which reads data from ADLS and writes back to ADLS using a Databricks Workflow. When I run the same project from my local machine (using a Python virtual environment in Visual Studio Code), it runs perfectly fine ...

Latest Reply
Abhishek7781
New Contributor II
  • 0 kudos

Tried installing an older version (2.1.0) of databricks-sql-connector (instead of 2.7.0) and surprisingly a new error message appeared. Don't know how to fix this now. 

AnaLippross
by New Contributor
  • 8673 Views
  • 1 replies
  • 1 kudos

Schema issues with External Tables

Hi everyone! We have started using Unity Catalog in our project and I am seeing weird behavior with the schemas of external tables imported to Databricks. In Data Explorer, when I expand some tables I see that the schema of those specific tables is w...

Data Engineering
External Tables
Unity Catalog
Latest Reply
youssefmrini
Databricks Employee
  • 1 kudos

It seems like you are encountering an issue with the schema mapping when importing external tables to Unity Catalog in Databricks. To troubleshoot this: based on the information you've provided, it sounds like the issue you're experiencing could be rel...

xavier20
by New Contributor
  • 15562 Views
  • 2 replies
  • 1 kudos

SQL Execution API Code 400

I am trying to execute the following command to test the API but am getting response 400:

import json
import os
from urllib.parse import urljoin, urlencode
import pyarrow
import requests
# NOTE set debuglevel = 1 (or higher) for http debug logging
from http.client...

Latest Reply
youssefmrini
Databricks Employee
  • 1 kudos

A 400 status code indicates that the server was unable to process the request due to a client error, e.g., incorrect syntax or invalid parameters. Based on the code you provided, it appears that you are trying to execute a SQL query against your...
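For reference, a minimal sketch of exercising the SQL Statement Execution API with requests; the host, token, and warehouse ID are hypothetical placeholders:

    import requests

    # Hypothetical workspace host, token, and warehouse ID.
    host = "https://adb-1234567890123456.7.azuredatabricks.net"
    headers = {"Authorization": "Bearer <personal-access-token>"}
    payload = {"statement": "SELECT 1", "warehouse_id": "abcdef1234567890"}

    resp = requests.post(f"{host}/api/2.0/sql/statements", headers=headers, json=payload)
    # On a 400, the JSON body normally names the offending field, which is the
    # fastest way to spot a malformed parameter.
    print(resp.status_code, resp.json())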

1 More Replies
PrebenOlsen
by New Contributor III
  • 3031 Views
  • 3 replies
  • 1 kudos

Can't start, delete, unpin or edit cluster: User is not part of org

Hi! Getting error message: DatabricksError: User XXX is not part of org: YYY. Config: host=https://adb-ZZZ.azuredatabricks.net, auth_type=runtime. I am in the admin group, but I cannot alter this in any way. I've tried using the databricks-sdk using: fr...

Latest Reply
youssefmrini
Databricks Employee
  • 1 kudos

To resolve this issue, I would recommend taking the following steps. Verify that you have the correct access and permissions: check with your Databricks organization admin to ensure that your user account has the appropriate access level and permission...
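As a quick diagnostic, a minimal sketch of authenticating the Databricks SDK with explicit credentials instead of the notebook's runtime auth; the host and token are hypothetical placeholders:

    from databricks.sdk import WorkspaceClient

    # Hypothetical host/token; explicit credentials sidestep auth_type=runtime.
    w = WorkspaceClient(
        host="https://adb-1234567890123456.7.azuredatabricks.net",
        token="<personal-access-token>",
    )
    print(w.current_user.me().user_name)  # confirms which identity the SDK resolves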

2 More Replies
Christine
by Contributor II
  • 24929 Views
  • 4 replies
  • 1 kudos

pyspark.pandas.read_excel(engine = xlrd) reading xls file with #REF error

Not sure if this is the right place to ask this question, so let me know if it is not. I am trying to read an xls file which contains #REF values in Databricks with pyspark.pandas. When I try to read the file with "pyspark.pandas.read_excel(file_pat...

Latest Reply
youssefmrini
Databricks Employee
  • 1 kudos

It sounds like you're trying to open an Excel file that has some invalid references, which is causing an error when you try to read it with pyspark.pandas.read_excel(). One way to handle invalid references is to use the openpyxl engine instead of xlr...
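For reference, a minimal sketch of switching engines; the file path is a hypothetical placeholder, and note that openpyxl reads .xlsx workbooks, so a legacy .xls file may need converting first:

    import pyspark.pandas as ps

    # Hypothetical path; the engine argument is passed through to the underlying pandas reader.
    df = ps.read_excel("/dbfs/FileStore/report.xlsx", engine="openpyxl")
    print(df.head())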

3 More Replies
Thor
by New Contributor III
  • 24111 Views
  • 3 replies
  • 6 kudos

How to remove duplicates in a Delta table?

I made multiple inserts (by mistake) into a Delta table and I now have exact duplicates. I feel like it's impossible to delete them if you don't have an IDENTITY column to distinguish rows (the primary key is RLOC+LOAD_DATE): it sounds odd to me not to...

snap delete snap identity
Latest Reply
Ken_H
New Contributor II
  • 6 kudos

There are several great ways to handle this: https://stackoverflow.com/questions/61674476/how-to-drop-duplicates-in-delta-table

This was my preference:

with cte as (
    select col1, col2, col3, etc,
           row_number() over (partition by col1, col2, col3, etc order by co...
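As an alternative to the ROW_NUMBER CTE, a minimal sketch of rewriting the table with duplicates dropped; the table name is a hypothetical stand-in, and the keys follow the question above (RLOC, LOAD_DATE):

    # Hypothetical table name; keep one row per business key and overwrite in place.
    # Delta's snapshot isolation allows overwriting a table the plan also reads from.
    df = spark.read.table("bookings")
    deduped = df.dropDuplicates(["RLOC", "LOAD_DATE"])
    deduped.write.format("delta").mode("overwrite").saveAsTable("bookings")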

2 More Replies
raghunathr
by New Contributor III
  • 12975 Views
  • 2 replies
  • 4 kudos

Resolved! Benefits of Databricks Views vs Tables

Do we have any explicit benefits with Databricks views when the view is going to be a simple select of a table? Does using views over tables improve performance? What about granting access to views vs tables?

Latest Reply
youssefmrini
Databricks Employee
  • 4 kudos

There can be several benefits to using Databricks views, even when the view is a simple select of a table. Improved query readability and maintainability: by encapsulating queries in views, you can simplify complex queries, making them more readable an...
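For reference, a minimal sketch of the pattern under discussion; the names are hypothetical placeholders, and the grant syntax assumes Unity Catalog, where views are granted with the TABLE keyword:

    # Hypothetical names; the view wraps a simple select of the base table.
    spark.sql("CREATE OR REPLACE VIEW sales_v AS SELECT order_id, amount FROM sales")

    # Consumers get SELECT on the view without direct access to the base table.
    spark.sql("GRANT SELECT ON TABLE sales_v TO `analysts`")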

1 More Replies
ashish577
by New Contributor III
  • 3010 Views
  • 3 replies
  • 1 kudos

Any way to access unity catalog location through python/dbutils

I have a table created in Unity Catalog that was dropped; the files are not deleted due to the 30-day soft delete. Is there any way to copy the files to a different location? When I try to use dbutils.fs.cp I get a location overlap error with Unity Cata...

Latest Reply
youssefmrini
Databricks Employee
  • 1 kudos

You can use the dbutils.fs.mv command to move the files from the deleted table to a new location. Here's an example of how to do it:

# Define the paths
source_path = "dbfs:/mnt/<unity-catalog-location>/<database-name>/<table-name>"
target_path =...
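A minimal completed sketch of that idea; both paths are hypothetical placeholders, and cp is used rather than mv so the soft-deleted originals stay untouched:

    # Hypothetical paths; recurse=True copies the whole table directory.
    source_path = "dbfs:/mnt/<unity-catalog-location>/<database-name>/<table-name>"
    target_path = "dbfs:/mnt/recovery/<table-name>"
    dbutils.fs.cp(source_path, target_path, recurse=True)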

2 More Replies
sarnendude
by New Contributor II
  • 3911 Views
  • 3 replies
  • 2 kudos

Unable to enable Databricks Assistant

Databricks Assistant is currently in Public Preview. As per the documentation below, I clicked the 'Account Console' link to log in and enable Databricks Assistant, but I am not getting the "Settings" option on the left side of the admin console. Once I log in using Azu...

Data Engineering
databricksassistant
Latest Reply
youssefmrini
Databricks Employee
  • 2 kudos

To enable Databricks Assistant, you need to navigate to the Admin Console in your Databricks workspace and follow these steps: log in to your Databricks workspace using an account with workspace admin privileges, then click on the "Admin Console" icon in th...

2 More Replies
User16783853501
by Databricks Employee
  • 3272 Views
  • 2 replies
  • 2 kudos

Using Delta Time Travel, what is the scalability limit for the feature, and at what point does time travel become infeasible?

Using Delta Time Travel, what is the scalability limit for the feature, and at what point does time travel become infeasible?

Latest Reply
youssefmrini
Databricks Employee
  • 2 kudos

The scalability limit for using Delta Time Travel depends on several factors, including the size of your Delta tables, the frequency of changes to the tables, and the retention periods for the Delta versions. In general, Delta Time Travel can become i...
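For reference, a minimal sketch of a time-travel read and of the table properties that bound how far back it can reach; the table name and durations are hypothetical placeholders:

    # Hypothetical table; read an earlier snapshot by version number.
    old = spark.sql("SELECT * FROM events VERSION AS OF 10")

    # These retention properties bound time travel: the transaction log and the
    # deleted data files must both still be retained for a version to be readable.
    spark.sql("""
        ALTER TABLE events SET TBLPROPERTIES (
            'delta.logRetentionDuration' = 'interval 30 days',
            'delta.deletedFileRetentionDuration' = 'interval 7 days'
        )
    """)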

1 More Replies
