- 2188 Views
- 0 replies
- 0 kudos
I'm getting this error: Exception in thread "main" org.apache.spark.sql.catalyst.parser.ParseException: [PARSE_SYNTAX_ERROR] Syntax error at or near ','.(line 1, pos 18) == SQL == sum(mp4) AS Videos, sum(csv+xlsx) AS Sheets, sum(docx+txt+pdf) AS Docu...
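The post is cut off, so the cause can only be guessed at. Position 18 is the first comma after `sum(mp4) AS Videos`, which is what you would see if the whole comma-separated list were passed as one string to an API that parses a single expression (for example `expr()` or a single `selectExpr` argument). A minimal sketch of the usual workaround, passing each expression as its own argument. The function name is hypothetical, `df` is assumed to be a Spark DataFrame with `mp4`, `csv` and `xlsx` columns, and the third expression is left out because it is truncated in the post:

```python
def aggregate_file_types(df):
    """Aggregate file-type counts with one SQL expression per argument.

    `df` is assumed to be a Spark DataFrame (hypothetical input).
    """
    # Each list item is parsed on its own, so no comma ever reaches the parser.
    expressions = [
        "sum(mp4) AS Videos",
        "sum(csv+xlsx) AS Sheets",
    ]
    return df.selectExpr(*expressions)
```

The error in the post comes from a JVM application; `selectExpr` has the same name in the Scala API and accepts the expressions as varargs there as well.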
by
alm
• Databricks Partner
- 11208 Views
- 6 replies
- 2 kudos
I have a medallion architecture:
Bronze layer: Raw data in tables
Silver layer: Refined data in views created from the bronze layer
Gold layer: Data products as views created from the silver layer
Currently I have a data scientist that needs access to d...
Latest Reply
Single-user clusters use a different security mode which is the reason for this difference.
On single-user/assigned clusters, you'll need the Fine Grained Access Control service (which is a Serverless service) - that is the solution to this problem (...
5 More Replies
- 5809 Views
- 3 replies
- 0 kudos
I'm trying to add a monotonicallyIncreasingId() column to a streaming table and I see the following error: Failed to start stream [table_name] in either append mode or complete mode.
Append mode error: Expression(s): monotonically_increasing_id() is not s...
Latest Reply
Are aggregations with row_number(), combined with a SQL window function and a watermark, still supported in Databricks 14.3?
2 More Replies
by
MikeGo
• Valued Contributor
- 7662 Views
- 5 replies
- 0 kudos
Hi team, when I create a DLT job, is there a way to control the cluster runtime version somewhere? E.g. I want to use 14.3 LTS. I tried adding `"spark_version": "14.3.x-scala2.12",` inside the cluster default label, but it did not work. Thanks
Latest Reply
Thanks. Got it. And the cluster has to be in shared mode. Can different DLT jobs share clusters, or, when a DLT job is running, can other people use the cluster? It seems each DLT job run starts a new cluster. If it cannot be shared, why it has t...
4 More Replies
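On the runtime-version question above: as far as I know, DLT pipelines do not accept `spark_version` in their cluster settings because the runtime is managed by the service, and the pipeline-level `channel` setting (`CURRENT` or `PREVIEW`) is the only knob, so a specific version such as 14.3 LTS cannot be pinned this way. A minimal sketch of what such settings might look like, built as a Python dict purely for illustration. The pipeline name and worker count are made-up values:

```python
import json

# Illustrative DLT pipeline settings (name and num_workers are hypothetical).
pipeline_settings = {
    "name": "example_pipeline",
    # The channel selects the managed runtime track instead of a fixed version.
    "channel": "CURRENT",  # or "PREVIEW"
    "clusters": [
        # Note: no "spark_version" key in the cluster definition.
        {"label": "default", "num_workers": 2}
    ],
}

print(json.dumps(pipeline_settings, indent=2))
```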
- 2414 Views
- 1 replies
- 0 kudos
Can someone explain why the code below is throwing an error? My intuition is telling me it's my Spark version (3.2.1), but I would like confirmation:
d = {'key':['a','a','c','d','e','f','g','h'],
'data':[1,2,3,4,5,6,7,8]}
x = ps.DataFrame(d)
x[x['...
Latest Reply
@pjp94 - The error indicates that the pandas API on Spark (pyspark.pandas) implementation does not have the method below implemented.
pd.Series.duplicated()
The next step is to use DataFrame methods such as distinct, groupBy, or dropDuplicates to resolve this.
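One way the groupBy suggestion from this reply could look, as a rough sketch rather than a confirmed fix for the original code (which is truncated in the post). The function name is hypothetical, and `df` is assumed to be a regular Spark DataFrame:

```python
def keys_with_duplicates(df, key_col):
    """Return the values of `key_col` that occur more than once.

    `df` is assumed to be a Spark DataFrame (hypothetical input).
    """
    # groupBy(...).count() adds a column named "count" with one row per key.
    counts = df.groupBy(key_col).count()
    # Keep only keys seen more than once, then drop the helper column.
    return counts.filter("count > 1").select(key_col)
```

For a pandas-on-Spark DataFrame such as `x` in the question, `x.to_spark()` returns a Spark DataFrame that can be passed in, e.g. `keys_with_duplicates(x.to_spark(), "key")`. If the goal is simply to drop repeated keys, `dropDuplicates(["key"])` on the Spark DataFrame is the shorter route.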
- 3683 Views
- 1 replies
- 0 kudos
TimeoutException: Stream Execution thread for stream [id = xxx runId = xxxx] failed to stop within 15000 milliseconds (specified by spark.sql.streaming.stopTimeout). See the cause on what was being executed in the streaming query thread. I have a data...
Latest Reply
@User_1611 - could you please try the following?
Reduce the number of streaming queries running on the same cluster
Make sure your code does not try to re-trigger/start an active streaming query
Make sure to collect the thread dumps if this error hap...
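For the second suggestion (not re-starting an active streaming query), a guard along these lines may help. This is only a sketch: `start_once`, the query name and the `start_fn` callback are hypothetical, while `spark.streams.active`, `query.name` and `query.isActive` are the standard PySpark streaming attributes:

```python
def start_once(spark, query_name, start_fn):
    """Start a streaming query only if none with this name is running.

    `spark` is assumed to be a SparkSession; `start_fn` is a hypothetical
    zero-argument callable that starts the query and returns it.
    """
    for query in spark.streams.active:
        if query.name == query_name and query.isActive:
            # Already running: reuse it instead of starting a second copy.
            return query
    return start_fn()

# Possible usage (table name and checkpoint path are made up):
# query = start_once(
#     spark,
#     "events_to_silver",
#     lambda: df.writeStream.queryName("events_to_silver")
#     .option("checkpointLocation", "/tmp/checkpoints/events_to_silver")
#     .toTable("silver_events"),
# )
```

The guard only works if every query is given a stable name via `queryName`, since unnamed queries report `None` as their name.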
by
Shan1
• New Contributor II
- 7785 Views
- 5 replies
- 0 kudos
I have 50k+ parquet files in Azure Data Lake, and I have a mount point as well. I need to read all the files and load them into a dataframe. I have around 2 billion records in total, and not all the files have all the columns; column order may di...
Latest Reply
@Shan1 - This could be because the files have columns that differ by data type, e.g. integer vs. long, or boolean vs. integer. It can be resolved with mergeSchema=False. Please refer to this code: https://github.com/apache/spark/blob/418bba5ad6053449a141f3c9c31e...
4 More Replies
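To check whether differing column types are really the problem described in the reply above, one could compare per-file schemas before attempting the full load. A small pure-Python sketch, assuming the schemas have already been collected as column-to-type dicts (in PySpark, something like `dict(spark.read.parquet(path).dtypes)` per file would produce them). The file names and columns below are invented sample data:

```python
from collections import defaultdict

def find_type_conflicts(file_schemas):
    """Return {column: set_of_types} for columns whose type differs across files."""
    seen = defaultdict(set)
    for schema in file_schemas.values():
        for column, dtype in schema.items():
            seen[column].add(dtype)
    # A column missing from some files is fine; only mixed types are reported.
    return {col: types for col, types in seen.items() if len(types) > 1}

# Invented sample data: two files that disagree on the type of "id".
schemas = {
    "part-0001.parquet": {"id": "int", "active": "boolean"},
    "part-0002.parquet": {"id": "bigint", "active": "boolean", "note": "string"},
}
conflicts = find_type_conflicts(schemas)
print(conflicts)
```

With 50k+ files, sampling a subset first is probably more practical than reading every footer.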
- 4307 Views
- 2 replies
- 0 kudos
Hi everyone, I am using DBR version 13 and managed tables in a custom catalog; the location of the tables is AWS S3. I am running the notebook on a single-user cluster. I am facing a MalformedInputException while saving data to tables or reading it. When I am running my noteboo...
Latest Reply
@Retired_mod The issue was resolved as soon as I deployed it to a multi-node dev cluster. The issue only occurs on single-user clusters. It looks like a limitation of running all updates on one node as a distributed system.
1 More Replies
- 4763 Views
- 2 replies
- 1 kudos
There is no resource to create an all-purpose cluster, but I need one. Does that mean I should create it via Terraform or DBX and reference it, which I would prefer not to do?
Latest Reply
Hello @Ayushi_Suthar, thanks for the quick reply! Where can I see these requests? https://ideas.databricks.com/ideas/DB-I-9451 ?
1 More Replies