- 707 Views
- 0 replies
- 1 kudos
Auto Loader provides Python and Scala methods to ingest new data from a folder location into a Delta Lake table by using directory listing or file notifications. Here's a quick video (7:00) on how to use Auto Loader for Databricks on AWS with Databri...
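For orientation, here is a minimal Scala sketch of the directory-listing flavour of that pattern (the source folder, checkpoint path, and target table name are hypothetical):
// Incrementally pick up new files from a folder with Auto Loader (cloudFiles source).
val stream = spark.readStream
  .format("cloudFiles")
  .option("cloudFiles.format", "json")              // format of the incoming files
  .load("s3://my-bucket/landing/")                  // hypothetical source folder

// Write the stream into a Delta table, tracking progress with a checkpoint.
stream.writeStream
  .option("checkpointLocation", "/tmp/_checkpoints/autoloader")  // hypothetical path
  .toTable("bronze_events")                         // hypothetical target Delta table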
by as999 • New Contributor III
- 1398 Views
- 3 replies
- 1 kudos
I have a million rows that I need to update: the update looks for the highest count of the predecessor from the same source data and replaces the same value on a different row. For example, original DF: sno Object Name shape rating 1 Fruit apple round ...
Latest Reply
Basically you have to create a dataframe (or use a window function, that will also work) which gives you the group combination with the most occurrences. So a window/groupBy on object, name, shape with a count(). Then you have to determine which shape...
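As a rough Scala sketch of that approach (column names follow the example above; df is the original dataframe, and the join-back step is an assumption about the intended update):
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

// Count occurrences of each (Object, Name, shape) combination.
val counts = df.groupBy("Object", "Name", "shape").agg(count("*").as("cnt"))

// Rank the shapes within each (Object, Name) group by frequency and keep the most common one.
val w = Window.partitionBy("Object", "Name").orderBy(col("cnt").desc)
val topShape = counts
  .withColumn("rn", row_number().over(w))
  .filter(col("rn") === 1)
  .select(col("Object"), col("Name"), col("shape").as("top_shape"))

// Join back and overwrite shape with the most frequent value per group.
val updated = df.join(topShape, Seq("Object", "Name"))
  .withColumn("shape", col("top_shape"))
  .drop("top_shape")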
2 More Replies
- 5622 Views
- 9 replies
- 8 kudos
I was reading an excel file with one column, country: india, India, india, India, india. The dataframe I got from this data, df.show():
+-------+
|country|
+-------+
| india |
| India |
| india |
| India |
| india |
+-------+
In the next step I removed the last value ...
Latest Reply
@sarvesh singh​ - Thank you for letting us know. Would you be happy to mark the best answer so others can find the solution easily?
8 More Replies
- 3506 Views
- 6 replies
- 4 kudos
I have created custom UDFs that generate logs. These logs can be flushed by calling another API which is exposed by an internal layer. However, I want to call this API just after the execution of the UDF comes to an end. Is there any way of d...
Latest Reply
@Krishna Kashiv Maybe ExecutorPlugin.java can help. It has all the methods you might require. Let me know whether it works. You need to implement the interface org.apache.spark.api.plugin.SparkPlugin and expose it as spark.plugins = com.abc.Imp...
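A bare-bones Scala sketch of that plugin shape (the LogApi.flush() call stands in for the internal flush API, which is an assumption, and the class names are hypothetical):
import java.util.{Map => JMap}
import org.apache.spark.api.plugin.{DriverPlugin, ExecutorPlugin, PluginContext, SparkPlugin}

// Hypothetical stand-in for the internal layer that flushes the UDF logs.
object LogApi { def flush(): Unit = () }

class FlushLogsExecutorPlugin extends ExecutorPlugin {
  override def init(ctx: PluginContext, extraConf: JMap[String, String]): Unit = ()
  // Called when the executor shuts down; flush any buffered UDF logs here.
  override def shutdown(): Unit = LogApi.flush()
}

class FlushLogsPlugin extends SparkPlugin {
  override def driverPlugin(): DriverPlugin = null        // no driver-side component needed
  override def executorPlugin(): ExecutorPlugin = new FlushLogsExecutorPlugin
}
The plugin is then exposed to the cluster by setting spark.plugins to the fully qualified class name of FlushLogsPlugin.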
5 More Replies
- 3328 Views
- 2 replies
- 1 kudos
How to properly configure the jar containing the class and the Spark plugin in Databricks? During DBR 7.3 cluster creation, I tried setting the spark.plugins, spark.driver.extraClassPath and spark.executor.extraClassPath Spark configs by copying the ja...
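For context, the cluster Spark config for such a plugin usually looks something like the lines below (class name and jar path are hypothetical; the jar also has to be present on the driver and the executors, for example copied there by an init script):
spark.plugins com.abc.FlushLogsPlugin
spark.driver.extraClassPath /databricks/jars/my-plugin.jar
spark.executor.extraClassPath /databricks/jars/my-plugin.jar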
Latest Reply
Hello @Krishna Kashiv​ - I don't know if we've met yet. My name is Piper and I'm a community moderator here. Thank you for your new question. It looks thorough! Let's give it a while to see what our members have to say. Otherwise, we will circle back...
1 More Replies
- 10594 Views
- 2 replies
- 0 kudos
I have a fixed-length file (a sample is shown below) and I want to read this file using the DataFrames API in Spark using Scala (not Python or Java). Using the DataFrames API there are ways to read a text file, a JSON file and so on, but I'm not sure if there is a wa...
Latest Reply
Find the below solution which can be used. Let us consider this as the data in the file (fixed-width columns):
EMP ID  First Name  Last Name
1       Chris       M
2       John        ...
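A sketch of carving the fixed-width columns out with substring in Scala (the file path and the column start positions/widths are assumptions based on the sample layout):
import org.apache.spark.sql.functions._

// Read the file as raw text, one line per row, in a single "value" column.
val raw = spark.read.text("/path/to/fixed_width.txt")   // hypothetical path

// Slice each line into columns by position (1-based start, length).
val parsed = raw.select(
  trim(substring(col("value"), 1, 7)).as("emp_id"),       // assumed width 7
  trim(substring(col("value"), 8, 12)).as("first_name"),  // assumed width 12
  trim(substring(col("value"), 20, 10)).as("last_name")   // assumed width 10
)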
1 More Replies
- 2945 Views
- 6 replies
- 5 kudos
I have a dataframe with the following columns: Key1, Key2, Y_N_Col, Col1, Col2. For the key tuple (Key1, Key2), I have rows with Y_N_Col = "Y" and Y_N_Col = "N". I need a new dataframe with all rows with Y_N_Col = "Y" (regardless of the key tuple), plus all Y_N_...
Latest Reply
I'd use a left-anti join. So create a df with all the Y, then create a df with all the N and do a left_anti join (on Key1 and Key2) against the df with the Y, then a union of those two.
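In Scala that could look roughly like this (assuming df is the original dataframe with the columns listed in the question):
import org.apache.spark.sql.functions.col

// Rows to keep unconditionally.
val dfY = df.filter(col("Y_N_Col") === "Y")

// "N" rows whose (Key1, Key2) does not appear among the "Y" rows.
val dfN = df.filter(col("Y_N_Col") === "N")
val dfNOnly = dfN.join(dfY.select("Key1", "Key2"), Seq("Key1", "Key2"), "left_anti")

// Final result: all "Y" rows plus the "N" rows with no matching key tuple.
val result = dfY.union(dfNOnly)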
5 More Replies
- 2101 Views
- 2 replies
- 1 kudos
Note: I've tested with the same connection variables:
locally with Scala - works (via the same prod schema registry)
in the cluster with Python - works
in the cluster with Scala - fails with a 401 auth error
def setupSchemaRegistry(schemaRegistryUrl: String...
Latest Reply
Found the issue: it's the uber package mangling some dependency resolution, which I fixed. Another issue is that currently you can't use the 6.* branch of the Confluent schema registry client in Databricks, because the Avro version is different from the one su...
1 More Replies
- 2002 Views
- 2 replies
- 1 kudos
(Since Spark 3.0) Dataset.queryExecution.debug.toFile will dump the full plan to a file, without concatenating the output as a fully materialized Java string in memory.
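Usage is a one-liner; for example (the DBFS path is hypothetical):
// Write the full query plan to a file instead of materializing it in the notebook.
df.queryExecution.debug.toFile("/dbfs/tmp/query_plan.txt")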
Latest Reply
Notebooks really aren't the best method of viewing large files. Two methods you could employ are:
Save the file to DBFS and then use the Databricks CLI to download the file
Use the web terminal
With the web terminal option you can do something like "cat my_lar...
1 More Replies
- 2325 Views
- 2 replies
- 1 kudos
Latest Reply
A class is the definition, and you can create many instances of it, just like classes in any other language. An object is the single instance of the class, a singleton, and can be used to create features you might recognise as static methods. Often when wri...
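A small Scala illustration of the distinction (the names are made up for the example):
// A class: a blueprint you can instantiate many times.
class Counter(start: Int) {
  private var value = start
  def increment(): Int = { value += 1; value }
}

// An object: a singleton, a natural home for "static"-style helpers.
object Counter {
  def zero: Counter = new Counter(0)   // companion-object factory method
}

val c = Counter.zero
c.increment()   // returns 1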
1 More Replies
- 1205 Views
- 2 replies
- 0 kudos
Latest Reply
An object is the fundamental building block of an object-oriented language. Integers, strings, floats, arrays, lists, and dictionaries are all objects, e.g. the number 15 is an object, the string "hello" is an object. Every object has a type, which defines what can be d...
1 More Replies
- 1014 Views
- 1 replies
- 1 kudos
Latest Reply
We can do something like the below from Python 3.2 onwards:
import os
path = '/home/parent/child1/child2'
os.makedirs(path, exist_ok=True)
- 2961 Views
- 1 replies
- 2 kudos
Latest Reply
We can do something like the below from Python 3.2 onwards:
import os
path = '/home/parent/child1/child2'
os.makedirs(path, exist_ok=True)
- 1260 Views
- 2 replies
- 3 kudos
In a project we use Azure Databricks to create csv files to be loaded in ThoughtSpot. Below is a sample of the code I use to write the file:
val fileRepartition = 1
val fileFormat = "csv"
val fileSaveMode = "overwrite"
var fileOptions = Map (
...
Latest Reply
Hi Shan, thanks for the link. I now know more options for creating different csv files. I have not yet solved the problem, but that is related to the destination application (ThoughtSpot) not being able to load the data in the csv file correctly. Rega...
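For reference, a single-file csv write along those lines tends to look like the sketch below (the options and the destination path are assumptions, not the thread's exact settings):
val fileOptions = Map(
  "header"    -> "true",   // write a header row
  "delimiter" -> ",",      // field separator (assumption about what ThoughtSpot expects)
  "quoteAll"  -> "true"    // quote every field to avoid parsing surprises
)

df.repartition(1)                  // produce a single output file
  .write
  .format("csv")
  .mode("overwrite")
  .options(fileOptions)
  .save("/mnt/output/thoughtspot") // hypothetical destination path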
1 More Replies