Data Engineering

Forum Posts

srchella
by New Contributor
  • 2185 Views
  • 1 reply
  • 0 kudos

How to take distinct over multiple columns (more than 2 columns) in a PySpark DataFrame?

I have 10+ columns and want to take distinct rows, taking multiple columns into consideration. How can I achieve this using PySpark DataFrame functions?

Latest Reply
Sandeep
Contributor III
  • 0 kudos

You can use dropDuplicates https://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=distinct#pyspark.sql.DataFrame.dropDuplicates
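
A minimal sketch of how dropDuplicates can be applied to several columns at once. The helper name `distinct_by_columns` and the column names in the usage comment are hypothetical; `dropDuplicates(subset)` and `select(...).distinct()` are the PySpark DataFrame methods being called, and `df` is assumed to be an existing DataFrame.

```python
def distinct_by_columns(df, columns, keep_all_columns=True):
    """Deduplicate a PySpark DataFrame on a subset of its columns.

    keep_all_columns=True  -> one full row per distinct combination of `columns`
                              (which of the duplicate rows survives is not guaranteed).
    keep_all_columns=False -> only the distinct combinations of `columns`.
    """
    columns = list(columns)
    if keep_all_columns:
        return df.dropDuplicates(columns)
    return df.select(*columns).distinct()


# Hypothetical usage, assuming `df` already has these columns:
# deduped = distinct_by_columns(df, ["col_a", "col_b", "col_c"])
```

With 10+ columns, passing the subset as a list keeps the call readable and makes it easy to build the column list programmatically.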

cfregly
by Contributor
  • 10651 Views
  • 15 replies
  • 0 kudos
Latest Reply
wildhogg
New Contributor II
  • 0 kudos

Well, after a little bit of research, I found the post below. Hopefully this will help: "registerTempTable(): registerTempTable() creates an in-memory table that is scoped to the cluster in which it was created. The data is stored using Hive's high...

14 More Replies
DavidWrench
by New Contributor II
  • 12929 Views
  • 4 replies
  • 0 kudos

Displaying HTML Output

I am trying to display the HTML output, or read in an HTML file to display, in a Databricks notebook from pandas-profiling.
import pandas as pd
import pandas_profiling
df = pd.read_csv("/dbfs/FileStore/tables/my_data.csv", header='infer', parse_dates=Tru...

Latest Reply
Bendu_Preez
New Contributor II
  • 0 kudos

What eventually worked for me was displayHTML(profile.to_html()) for the pandas_profiling and displayHTML(profile.html) for the spark_profiling.

3 More Replies
AdamArold
by New Contributor
  • 4369 Views
  • 4 replies
  • 0 kudos

How can I integrate Databricks into PyCharm?

Editing notebooks on Databricks is rather cumbersome because it lacks a lot of the features that IDEs like PyCharm have. Another problem is that a Databricks notebook comes with some local state which is not present on my computer. How can I edit notebooks ...

Latest Reply
SimonD_Morias
New Contributor II
  • 0 kudos

The documents are out for databricks-connect: https://docs.azuredatabricks.net/user-guide/dev-tools/db-connect.html I've also written up about a few limitations I have found - some with workarounds: https://datathirst.net/blog/2019/3/7/databricks-co...

3 More Replies
microamp
by New Contributor II
  • 8898 Views
  • 12 replies
  • 0 kudos

Azure Data Lake Config Issue: No value for dfs.adls.oauth2.access.token.provider found in conf file.

Hi, I have files hosted on an Azure Data Lake Store which I can connect to from Azure Databricks, configured as per the instructions here. I can read JSON files fine; however, I'm getting the following error when I try to read an Avro file. spark.read.format("c...

Latest Reply
User16301467523
New Contributor II
  • 0 kudos

Taras's answer is correct. Because spark-avro is based on the RDD APIs, the properties must be set in the hadoopConfiguration options. Please note these docs for configuration using the RDD API: https://docs.azuredatabricks.net/spark/latest/data-sou...

11 More Replies
PranjalThapar
by New Contributor
  • 5599 Views
  • 4 replies
  • 0 kudos

Splitting Date into Year, Month and Day, with inconsistent delimiters

I am trying to split my Date column, which is currently a String type, into 3 columns: Year, Month and Day. I use (PySpark):
split_date=pyspark.sql.functions.split(df['Date'], '-')
df= df.withColumn('Year', split_date.getItem(0))
df= df.wit...
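
A minimal sketch of one way to cope with inconsistent delimiters, shown in plain Python so the pattern is easy to try out. The delimiter set (`-`, `/`, `.`), the year-month-day order, and the helper name `split_date` are assumptions; the post does not say which delimiters actually occur.

```python
import re

# Assumed delimiter set; adjust to whatever actually appears in the Date column.
DELIMITERS = r"[-/.]"


def split_date(date_string):
    """Split a 'year<delim>month<delim>day' string into its three parts."""
    parts = re.split(DELIMITERS, date_string.strip())
    if len(parts) != 3:
        raise ValueError(f"expected 3 date parts, got {len(parts)}: {date_string!r}")
    year, month, day = parts
    return year, month, day


for raw in ["2019-03-07", "2019/03/07", "2019.03-07"]:
    print(split_date(raw))  # each prints ('2019', '03', '07')
```

Because pyspark.sql.functions.split treats its second argument as a regular expression, the same character class can be passed there in place of '-', for example split(df['Date'], '[-/.]').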

Latest Reply
youssefassouli
New Contributor II
  • 0 kudos

Thank you so much, that was helpful.

3 More Replies
dan11
by New Contributor II
  • 2462 Views
  • 4 replies
  • 1 kudos

sql delete?

Hello Databricks people, I started working with Databricks today. I have a SQL script which I developed with sqlite3 on a laptop. I want to port the script to Databricks. I started with two SQL statements: select count(prop_id) from prop0; del...

Latest Reply
Bill_Chambers
Contributor II
  • 1 kudos

Hey Dan, good to hear you're getting started with Databricks. This is not a limitation of Databricks; it's a restriction built into Spark itself. Spark is not a data store, it's a distributed computation framework. Therefore deleting data would be un...

3 More Replies
shampa
by New Contributor
  • 4059 Views
  • 1 reply
  • 0 kudos

How can we compare two DataFrames in Spark Scala to find the differences between the 2 files: which column and which value?

I have two files and I created two DataFrames, prod1 and prod2, out of them. I need to find the records, with column names and values, that do not match between the two DataFrames. id_sk is the primary key. All the columns are of string datatype. dataframe 1 (prod1) id_...

Latest Reply
manojlukhi
New Contributor II
  • 0 kudos

Use a full outer join in Spark SQL.
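
To make the suggestion concrete, here is a minimal sketch of the comparison logic behind a full outer join on id_sk, written in plain Python rather than Spark. The helper name `diff_by_key`, the sample rows, and every column name other than id_sk are made up for illustration.

```python
def diff_by_key(left_rows, right_rows, key="id_sk"):
    """Full-outer-join two lists of dict rows on `key` and report mismatches.

    Returns a list of (key_value, column, left_value, right_value) tuples.
    A row missing on one side is reported with None for every value on that side.
    """
    left = {row[key]: row for row in left_rows}
    right = {row[key]: row for row in right_rows}
    differences = []
    # Taking the union of keys is what makes this a *full outer* join.
    for key_value in sorted(set(left) | set(right)):
        l_row = left.get(key_value, {})
        r_row = right.get(key_value, {})
        for column in sorted((set(l_row) | set(r_row)) - {key}):
            l_val, r_val = l_row.get(column), r_row.get(column)
            if l_val != r_val:
                differences.append((key_value, column, l_val, r_val))
    return differences


# Made-up sample rows; only id_sk comes from the original question.
prod1 = [{"id_sk": "1", "name": "a", "price": "10"},
         {"id_sk": "2", "name": "b", "price": "20"}]
prod2 = [{"id_sk": "1", "name": "a", "price": "11"},
         {"id_sk": "3", "name": "c", "price": "30"}]
for difference in diff_by_key(prod1, prod2):
    print(difference)
```

In Spark the same idea would be a join of prod1 and prod2 on id_sk with the full outer join type, followed by a column-by-column comparison of the two sides.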

ArielHerrera
by New Contributor II
  • 13359 Views
  • 2 replies
  • 0 kudos

Resolved! How to create blank target links in markdown to open url link in new tabs?

I am using markdown to include link URLs. I am using the below markdown syntax: [link text](http://example.com) The issue is each time I click the linked text it opens the url in the same tab as the notebook. I want the url to open in a new ta...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Ariel Herrera, You can just put an HTML anchor tag in a Databricks notebook cell. It will open a new tab when you click it. Please try the example below. It works for me in a Databricks notebook. %md <a href="https://google.com" target="_blank">google ...
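
If many such links are needed, the anchor tags can be generated rather than typed by hand. This is a small sketch using only the Python standard library; the helper name `new_tab_link` is hypothetical, and the output is just the same kind of anchor tag the reply describes.

```python
from html import escape


def new_tab_link(url, text):
    """Build an HTML anchor that asks the browser to open `url` in a new tab."""
    return f'<a href="{escape(url, quote=True)}" target="_blank">{escape(text)}</a>'


print(new_tab_link("https://google.com", "google"))
# prints: <a href="https://google.com" target="_blank">google</a>
```

Escaping the URL and the link text keeps characters such as & and < from breaking the generated markup.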

1 More Replies
cfregly
by Contributor
  • 5765 Views
  • 5 replies
  • 0 kudos
Latest Reply
MatthewValenti
New Contributor II
  • 0 kudos

This is an old post; however, is this still accurate for the latest version of Databricks in 2019? If so, how to approach the following? 1. Connect to many MongoDBs. 2. Connect to MongoDB when connection string information is dynamic (i.e. stored in s...

4 More Replies
senthilkumar
by New Contributor
  • 14273 Views
  • 1 reply
  • 0 kudos

How does a filter condition work in a Spark DataFrame?

I have a table in HBase with 1 billion records. I want to filter the records based on a certain condition (by date). For example: Dataframe.filter(col(date) === todayDate) Filter will be applied after all records from the table are loaded into me...

Latest Reply
muk1
New Contributor II
  • 0 kudos

Hello @senthil kumar, To pass external values to the filter (or where) transformations you can use the "lit" function in the following way: Dataframe.filter(col(date) == lit(todayDate)) Don't know if that helps. Be careful with the schema inferred by th...

DominicRobinson
by New Contributor II
  • 8826 Views
  • 4 replies
  • 0 kudos

Issues with UTF-16 files and unicode characters

Can someone please offer some insight? I've spent days trying to solve this issue. We have the task of loading in hundreds of tab-separated text files encoded in UTF-16 little endian. Our organisation is an international one and...

Latest Reply
User16817872376
New Contributor III
  • 0 kudos

You can also always read in the file as a textFile, and then run a UTF-16 decoder/encoder library as a UDF on the text.
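
A minimal sketch of just the decoding step, in plain Python with the standard library. It assumes the raw bytes of one file are already in hand; how those bytes are read inside Spark, and how the function would be wrapped as a UDF, is not shown. The helper name `decode_utf16le_tsv` and the sample content are made up.

```python
import csv
import io


def decode_utf16le_tsv(raw_bytes):
    """Decode UTF-16 little-endian bytes and parse them as tab-separated rows."""
    text = raw_bytes.decode("utf-16-le")
    # Drop a leading byte-order mark if the file was written with one.
    if text.startswith("\ufeff"):
        text = text[1:]
    return list(csv.reader(io.StringIO(text, newline=""), delimiter="\t"))


# Made-up sample with non-ASCII characters, encoded the way the post describes.
sample = "\ufeffname\tcity\nZoë\tKøbenhavn\n".encode("utf-16-le")
print(decode_utf16le_tsv(sample))
# prints: [['name', 'city'], ['Zoë', 'København']]
```

Decoding the whole byte string before splitting it into lines matters here, because in UTF-16 a newline is two bytes and a byte-oriented line splitter can cut a character in half.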

3 More Replies
Tamara
by New Contributor III
  • 8967 Views
  • 8 replies
  • 1 kudos

Resolved! Can I connect to a MS SQL server table in Databricks account?

I'd like to access a table on a MS SQL Server (Microsoft). Is it possible from Databricks? To my understanding, the syntax is something like this (in a SQL Notebook): CREATE TEMPORARY TABLE jdbcTable USING org.apache.spark.sql.jdbc OPTIONS ( url...

Latest Reply
JohnSmith091
New Contributor II
  • 1 kudos

Thanks for the trick that you have shared with us. I am really amazed to use this informational post. If you are facing MacBook error like MacBook Pro won't turn on black screen then click the link.

7 More Replies
juan_perez
by New Contributor
  • 10864 Views
  • 2 replies
  • 0 kudos

Write data Frame into Azure Data Lake Storage

It happens that I am manipulating some data using Azure Databricks. Such data is in an Azure Data Lake Storage Gen1. I mounted the data into DBFS, but now, after transforming the data I would like to write it back into my data lake. To mount the dat...

Latest Reply
PawanShukla
New Contributor III
  • 0 kudos

I am new to Azure Databricks, and I am trying to write the DataFrame to a mounted ADLS file. But in the command below: dfGPS.write.mode("overwrite").format("com.databricks.spark.csv").option("header","true").csv("/mnt/<mount-name>")

1 More Replies
SatheesshChinnu
by New Contributor III
  • 8251 Views
  • 4 replies
  • 0 kudos

Resolved! Error: TransportResponseHandler: Still have 1 requests outstanding when connection, occurring only on large dataset.

I am getting the error below only with a large dataset (i.e. 15 TB compressed). If my dataset is small (1 TB), I do not get this error. It looks like it fails at the shuffle stage. The approximate number of mappers is 150,000. Spark config: spark.sql.warehouse.dir hdfs:...

Latest Reply
parikshitbhoyar
New Contributor II
  • 0 kudos

@Satheessh Chinnusamy how did you solve the above issue?

3 More Replies