- 12063 Views
- 1 replies
- 0 kudos
Hi,
I am trying to split a record in a table into 2 records based on a column value. Please refer to the sample below. The input table displays the 3 types of Product and their prices. Notice that for a specific Product (row), only its corresponding col...
Latest Reply
Hi @rishigc
You can use something like the below (note that split takes a regular expression, so the '+' delimiter must be escaped):
SELECT explode(arrays_zip(split(Product, '\\+'), split(Price, '\\+'))) AS product_and_price FROM df
or
df.withColumn("product_and_price", explode(arrays_zip(split(df["Product"], r"\+"), split(df["Price"], r"\+")))).select(
...
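A fuller runnable sketch of the same approach (a minimal sketch, assuming Product and Price hold '+'-delimited strings; the sample values are invented):
from pyspark.sql.functions import explode, arrays_zip, split, col

df = spark.createDataFrame([("A+B+C", "10+20+30")], ["Product", "Price"])

# Alias the split arrays first so the zipped struct's fields get predictable names.
zipped = df.select(split(col("Product"), r"\+").alias("p"),
                   split(col("Price"), r"\+").alias("q"))
result = (zipped.select(explode(arrays_zip("p", "q")).alias("pp"))
                .select(col("pp.p").alias("Product"), col("pp.q").alias("Price")))
result.show()  # one output row per '+'-separated element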
- 4785 Views
- 2 replies
- 0 kudos
I have a dataframe of 18,000,000 rows and 1,322 columns with '0' and '1' values.
I want to find how many '1's are in every column.
Below is a sample of the dataset's columns:
se_00001 se_00007 se_00036 se_00100 se_0010p se_00250
Latest Reply
Hi Siddhu,
You can use
df.select(sum("col1"), sum("col2"), sum("col3"))
where col1, col2, col3 are the column names for which you would like to find the sum (note that sum here is pyspark.sql.functions.sum, not Python's built-in sum).
Please let us know if this answers your question.
Thanks
1 More Replies
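With 1,322 columns, listing each sum by hand is impractical; a minimal sketch that builds the aggregate list programmatically (assuming every column is a 0/1 indicator, cast in case the values are stored as strings):
from pyspark.sql.functions import sum as spark_sum, col

# One sum() aggregate per column; each result is the count of '1's in that column.
ones_per_column = df.select([spark_sum(col(c).cast("int")).alias(c) for c in df.columns])
ones_per_column.show()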
- 6315 Views
- 1 replies
- 0 kudos
Hi Community
I would like to know if there is an option to create an integer sequence which persists even if the cluster is shut down. My goal is to use this integer value as a surrogate key to join different tables or do Slowly changing dimensio...
Latest Reply
Hi @pascalvanbellen, there is no concept of foreign, primary, or surrogate keys (FK, PK, SK) in Spark. But Databricks Delta automatically takes care of SCD-type scenarios: https://docs.databricks.com/spark/latest/spark-sql/language-manual/merge-into.html#slowly-changing-data-scd-type-2
...
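For reference, a minimal sketch of the MERGE INTO pattern the linked page describes, simplified to one action per branch (all table and column names are invented):
# Hypothetical SCD type 2 step: close out the current row for changed keys and
# insert rows for brand-new keys. A full SCD2 merge also re-inserts the new
# version of each changed row; see the linked doc for the complete pattern.
spark.sql("""
  MERGE INTO dim_customer AS t
  USING staged_updates AS s
  ON t.customer_id = s.customer_id AND t.is_current = true
  WHEN MATCHED AND t.address <> s.address THEN
    UPDATE SET is_current = false, end_date = s.effective_date
  WHEN NOT MATCHED THEN
    INSERT (customer_id, address, is_current, effective_date, end_date)
    VALUES (s.customer_id, s.address, true, s.effective_date, null)
""")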
- 2130 Views
- 1 replies
- 0 kudos
I have 10+ columns and want to select distinct rows taking multiple columns into consideration. How can I achieve this using PySpark dataframe functions?
Latest Reply
You can use dropDuplicates
https://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=distinct#pyspark.sql.DataFrame.dropDuplicates
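For example, a minimal sketch (the subset column names are invented):
# Keep one row per unique (col_a, col_b, col_c) combination; the remaining
# columns take their values from an arbitrary surviving row.
deduped = df.dropDuplicates(["col_a", "col_b", "col_c"])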
- 12679 Views
- 4 replies
- 0 kudos
I am trying to display the HTML output, or read in an HTML file to display, in a Databricks notebook from pandas-profiling.
import pandas as pd
import pandas_profiling
df = pd.read_csv("/dbfs/FileStore/tables/my_data.csv", header='infer', parse_dates=Tru...
Latest Reply
What eventually worked for me was displayHTML(profile.to_html()) for the pandas_profiling and displayHTML(profile.html) for the spark_profiling.
3 More Replies
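Put together, a minimal sketch for the pandas-profiling case (assuming the ProfileReport API):
import pandas as pd
from pandas_profiling import ProfileReport

df = pd.read_csv("/dbfs/FileStore/tables/my_data.csv")
profile = ProfileReport(df)
displayHTML(profile.to_html())  # renders the report inline in the notebook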
- 4290 Views
- 4 replies
- 0 kudos
Editing notebooks on Databricks is rather cumbersome because it lacks a lot of features that IDEs like PyCharm have.
Another problem is that a Databricks notebook comes with some local state which is not present on my computer.
How can I edit notebooks ...
Latest Reply
The documentation is out for databricks-connect: https://docs.azuredatabricks.net/user-guide/dev-tools/db-connect.html
I've also written up a few limitations I have found - some with workarounds: https://datathirst.net/blog/2019/3/7/databricks-co...
3 More Replies
- 8560 Views
- 12 replies
- 0 kudos
Hi, I have files hosted on an Azure Data Lake Store which I can connect to from Azure Databricks, configured as per the instructions here. I can read JSON files fine; however, I'm getting the following error when I try to read an Avro file. spark.read.format("c...
Latest Reply
Taras's answer is correct. Because spark-avro is based on the RDD APIs, the properties must be set in the hadoopConfiguration options.
Please note these docs for configuration using the RDD API: https://docs.azuredatabricks.net/spark/latest/data-sou...
11 More Replies
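For concreteness, a minimal sketch of setting such properties from PySpark (these are the ADLS Gen1 OAuth keys; all values are placeholders):
# Set the ADLS OAuth properties on the Hadoop configuration so that RDD-based
# sources such as spark-avro can authenticate (placeholder values throughout).
hc = spark.sparkContext._jsc.hadoopConfiguration()
hc.set("dfs.adls.oauth2.access.token.provider.type", "ClientCredential")
hc.set("dfs.adls.oauth2.client.id", "<application-id>")
hc.set("dfs.adls.oauth2.credential", "<service-credential>")
hc.set("dfs.adls.oauth2.refresh.url", "https://login.microsoftonline.com/<tenant-id>/oauth2/token")

df = spark.read.format("com.databricks.spark.avro") \
    .load("adl://<store-name>.azuredatalakestore.net/path/to/file.avro")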
- 5458 Views
- 4 replies
- 0 kudos
I am trying to split my Date column, which is a String type right now, into 3 columns: Year, Month and Date. I use (PySpark):
split_date = pyspark.sql.functions.split(df['Date'], '-')
df = df.withColumn('Year', split_date.getItem(0))
df = df.wit...
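A minimal sketch of the complete pattern (assuming Date strings like '2021-03-07'):
from pyspark.sql.functions import split

split_date = split(df['Date'], '-')
df = (df.withColumn('Year', split_date.getItem(0))
        .withColumn('Month', split_date.getItem(1))
        .withColumn('Day', split_date.getItem(2)))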
by dan11 • New Contributor II
- 2337 Views
- 4 replies
- 1 kudos
Hello Databricks people, I started working with Databricks today. I have a SQL script which I developed with sqlite3 on a laptop, and I want to port it to Databricks. I started with two SQL statements: select count(prop_id) from prop0; del...
Latest Reply
Hey Dan, good to hear you're getting started with Databricks. This is not a limitation of Databricks; it's a restriction built into Spark itself. Spark is not a data store, it's a distributed computation framework. Therefore, deleting data would be un...
3 More Replies
- 3982 Views
- 1 replies
- 0 kudos
I have two files and I created two dataframes, prod1 and prod2, out of them. I need to find the records with column names and values that do not match in both dfs.
id_sk is the primary key. All the cols are string datatype.
dataframe 1 (prod1)
id_...
Latest Reply
Use a full outer join in Spark SQL.
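A minimal sketch of that comparison (the non-key column name is invented; a real version would repeat the check for every shared column):
from pyspark.sql.functions import col

# Full outer join on the primary key; eqNullSafe also flags rows that exist
# on only one side, where every column from the other side is null.
joined = prod1.alias("a").join(prod2.alias("b"), on="id_sk", how="full_outer")
mismatches = joined.filter(~col("a.name").eqNullSafe(col("b.name")))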
- 13126 Views
- 2 replies
- 0 kudos
I am using Markdown to include link URLs. I am using the below Markdown syntax:
[link text](http://example.com)
The issue is each time I click the linked text it opens the URL in the same tab as the notebook. I want the URL to open in a new ta...
Latest Reply
Hi @Ariel Herrera,
You can just put an HTML anchor tag in a Databricks notebook cell. It will open a new tab when you click it.
Please try the example below. It works for me in a Databricks notebook.
%md <a href="https://google.com" target="_blank">google ...
1 More Replies
- 13942 Views
- 1 replies
- 0 kudos
I have a table in HBase with 1 billion records. I want to filter the records based on a certain condition (by date).
For example:
Dataframe.filter(col(date) === todayDate)
Will the filter be applied only after all the records from the table are loaded into me...
Latest Reply
Hello @senthil kumar, to pass external values to the filter (or where) transformations, you can use the "lit" function in the following way: Dataframe.filter(col(date) == lit(todayDate)). Don't know if that helps. Be careful with the schema inferred by th...
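As a runnable PySpark fragment (the column name "date" and the use of today's ISO date are assumptions):
import datetime
from pyspark.sql.functions import col, lit

today = datetime.date.today().isoformat()  # e.g. '2021-05-17'
filtered = df.filter(col("date") == lit(today))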
- 8537 Views
- 4 replies
- 0 kudos
Can someone please offer some insight - I've spent days trying to solve this issue.
We have the task of loading in hundreds of tab-separated text files encoded in UTF-16 little endian with a tab delimiter. Our organisation is an international one and...
Latest Reply
You can also always read in the file as a textFile, and then run a UTF-16 decoder/encoder library as a UDF on the text.
3 More Replies
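As an alternative to the textFile-plus-UDF route, Spark's CSV reader can often decode such files directly; a minimal sketch (the path is a placeholder, and it assumes the files parse cleanly as UTF-16LE tab-delimited text):
# Direct read: tab-delimited, UTF-16 little-endian, first row as header.
df = (spark.read
          .option("sep", "\t")
          .option("encoding", "UTF-16LE")
          .option("header", "true")
          .csv("/mnt/data/exports/*.txt"))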