Hello guys, I'm having an issue when trying to get a row's values from a Spark DataFrame. I have a DF with an index column, and I need to be able to return a row based on the index in the fastest way possible. I tried to partitionBy the index column, optimize with zor...
Using DBR 10 or later and I'm getting an error when running the following query: SELECT * FROM delta.`s3://some_path` getting org.apache.spark.SparkException: Unable to fetch tables of db delta. For 3.2.0+ they recommend reading like this: CREATE TEMPORAR...
Hi All, we are facing an unusual issue while loading data into a Delta table using Spark SQL. We have one Delta table which has around 135 columns and also has a PARTITIONED BY clause. For this we are trying to load a data volume of 15 million, but it's not loading ...
@Kaniz Fatma @Parker Temple I found the root cause: it's because of serialization. We are using a UDF to derive a column on the dataframe, and when we try to load data into the Delta table or write data into a Parquet file, we face a serialization issue ...
Hi, I'm trying to save a dataframe to CSV with the code below: output.coalesce(1).write.mode('overwrite').option('header', 'true').csv(tmp_file_path) But I get a "Py4JJavaError: An error occurred while calling o5082.csv." error. Any idea how to solve...
Job aborted due to stage failure: Task 0 in stage 3084.0 failed 4 times, most recent failure: Lost task 0.3 in stage 3084.0 (TID...., ip..., executor 0): org.apache.spark.SparkExecution: Task failed while writing rows. Job aborted due to stage failure:...
What we have: Databricks Workspace Premium on Azure; ADLS Gen2 storage for raw data, processed data (tables) and files like CSV, models, etc. What we want to do: We have users that want to work on Databricks to create and work with Python algorithms. We d...
Hey @Vartika Nain, we are still in the same situation as described above. The Hive Metastore is a weak point. I would love to have the functionality that a mount can be dedicated to a given cluster. Regards, Gerhard
I am getting the error below while creating buckets on a Delta table. Error in SQL statement: AnalysisException: Delta bucketed tables are not supported. I have had to fall back to a Parquet table because of this for some use cases. Is there any alternative for this? I have...
Hi @Rahul Samant, we checked internally on this. Due to certain limitations, bucketing is not supported on Delta tables. The only alternative to bucketing is to leverage Z-ordering; below is the link for reference: https://docs.databricks.com/de...
We have been trying to update some library versions by uninstalling the old versions and installing new ones. However, the old libraries continue to get installed on cluster startup despite not showing up in the "libraries" tab of the cluster page. W...
The issue seemed to go away on its own. At some point the libraries page started showing what was getting installed to the cluster, and removing libraries from the page caused them to stop getting installed on cluster startup. I'm guessing there was ...
I am trying to check whether a certain datapoint exists in multiple locations. This is what my table looks like: I am checking whether the same datapoint is in two locations. The idea is that this datapoint should exist in BOTH locations, and be counte...
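For anyone reading along, the "must exist in BOTH locations" check can be sketched in plain Python as below. The table in the post is not visible here, so the (datapoint, location) row shape, the sample values and the function name `datapoints_in_all_locations` are all hypothetical, not the poster's schema.

```python
from collections import defaultdict

def datapoints_in_all_locations(rows, required_locations):
    """Return the datapoints that appear in every one of required_locations."""
    # Collect the distinct locations in which each datapoint was seen.
    locations_seen = defaultdict(set)
    for datapoint, location in rows:
        locations_seen[datapoint].add(location)
    required = set(required_locations)
    # Keep a datapoint only when its locations cover all required ones.
    return sorted(dp for dp, locs in locations_seen.items() if required <= locs)

# Hypothetical sample rows; duplicates are collapsed by the set.
rows = [
    ("A", "loc1"), ("A", "loc2"),
    ("B", "loc1"),
    ("C", "loc2"), ("C", "loc1"),
    ("A", "loc1"),
]
both = datapoints_in_all_locations(rows, ["loc1", "loc2"])
print(both, len(both))
```

Here each datapoint is counted once no matter how many times it repeats within a location, which may or may not match the counting rule the original post had in mind.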
In our streaming jobs, we currently run streaming (cloudFiles format) on a directory with sales transactions coming every 5 minutes. In this directory, the transactions are ordered in the following format: <streaming-checkpoint-root>/<transaction_date>...
Update: It seems that maxFileAge was not a good idea. The following, with the option "includeExistingFiles" = False, solved my problem: streaming_df = ( spark.readStream.format("cloudFiles") .option("cloudFiles.format", extension) .option("...
I am new to Databricks. Please excuse my ignorance. My requirement is to convert the SQL query below into Databricks SQL. The query reads from the EventLog table and its output goes into EventSummary. These queries can be found here. CREATE TABL...
Thank you @Joseph Kambourakis. What is not clear to me is how to rework the part circled in the image above. Even this part of the code does not work in Databricks: DATEADD(month, DATEDIFF(month, 0, DATEADD(month , 1 , EventStartDateTi...
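A note on what that expression is probably doing: in T-SQL, `DATEADD(month, DATEDIFF(month, 0, x), 0)` is the usual idiom for truncating `x` to the first day of its month. The snippet above is cut off, so this is only a sketch under the assumption that it closes in that usual way, which would make it "the first day of the month after EventStartDateTime". The function name and sample value below are hypothetical.

```python
from datetime import datetime

def first_day_of_next_month(ts: datetime) -> datetime:
    """Return midnight on the first day of the month after ts."""
    # Work with a running month count so the December -> January
    # rollover is handled by plain integer arithmetic.
    months = ts.year * 12 + (ts.month - 1) + 1
    year, month_index = divmod(months, 12)
    return datetime(year, month_index + 1, 1)

event_start = datetime(2021, 12, 15, 10, 30)  # hypothetical sample value
print(first_day_of_next_month(event_start))
```

If that reading is right, something along the lines of `date_trunc('MONTH', add_months(EventStartDateTime, 1))` may be the Spark SQL equivalent, but that is untested against the actual query.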
Databricks customers - nominate your data team and leaders for one (or more) of the six Data Team Award categories: Data Team Transformation Award, Data Team for Good Award, Data Team Disruptor Award, Data Team Democratization Award, Data Team Visionary Awar...