Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

rbauer
by New Contributor
  • 1389 Views
  • 0 replies
  • 0 kudos

Dask-Databricks init script not working

Hello everybody! I am trying to use the Dask-Databricks distribution (https://github.com/dask-contrib/dask-databricks?tab=readme-ov-file). I set up the required init script according to the instructions on the GitHub page and had no problems there, h...

MoJaMa
by Databricks Employee
  • 12235 Views
  • 7 replies
  • 2 kudos
Latest Reply
User15848365773
New Contributor II
  • 2 kudos

Hi @amitca71 @atanu, yes, you can associate as many VPCs (a workspace deployment fundamental) across regions and AWS accounts with one single Databricks AWS account. In fact, it's one of the superpowers of the Databricks platform, and you can even track all thei...

6 More Replies
alxsbn
by Contributor
  • 1463 Views
  • 0 replies
  • 0 kudos

SELECT issue after an OPTIMIZE operation

I have a strange issue after an OPTIMIZE: no results are returned anymore. I can time travel to earlier versions easily, but past that point nothing is returned when I run a simple SELECT *. But I still get a result when I run a SELECT count(*). How is this po...

Bas1
by New Contributor III
  • 17225 Views
  • 16 replies
  • 20 kudos

Resolved! network security for DBFS storage account

In Azure Databricks the DBFS storage account is open to all networks. Changing that to use a private endpoint or restricting access to selected networks is not allowed. Is there any way to add network security to this storage account? Alternatively, is...

Latest Reply
Odee79
New Contributor II
  • 20 kudos

How can we secure the storage account in the managed resource group which holds the DBFS with restricted network access, since access from all networks is blocked by our Azure storage account policy?

15 More Replies
deltax_07
by New Contributor
  • 1901 Views
  • 0 replies
  • 0 kudos

Parse_Syntax_Error Help

I'm getting this error: Exception in thread "main" org.apache.spark.sql.catalyst.parser.ParseException: [PARSE_SYNTAX_ERROR] Syntax error at or near ','. (line 1, pos 18) == SQL == sum(mp4) AS Videos, sum(csv+xlsx) AS Sheets, sum(docx+txt+pdf) AS Docu...
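The reported position is suggestive: the first comma in the quoted SQL sits at offset 18, right after `sum(mp4) AS Videos`. One plausible cause (an assumption, since the post is truncated and the calling code is not shown) is that the whole comma-separated list was handed to a parser that expects a single expression. A minimal Python sketch of splitting such a list into separate expressions follows; `split_select_list` is a hypothetical helper, and it does not handle commas inside string literals.

```python
from __future__ import annotations


def split_select_list(select_list: str) -> list[str]:
    """Split a SQL select list on top-level commas (commas outside parentheses)."""
    parts: list[str] = []
    current: list[str] = []
    depth = 0
    for ch in select_list:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            # A comma outside any parentheses ends the current expression.
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail:
        parts.append(tail)
    return parts


# Only the two complete expressions from the post are used here.
exprs = split_select_list("sum(mp4) AS Videos, sum(csv+xlsx) AS Sheets")
# In PySpark, the pieces could then be passed as separate arguments:
#   df.selectExpr(*exprs)
```

If the original code builds the query in Scala or Java, the same idea applies: pass each aggregate as its own expression rather than one combined string.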

alm
by New Contributor III
  • 9806 Views
  • 6 replies
  • 2 kudos

Resolved! How to grant access to views without granting access to underlying tables

I have a medallion architecture:
  • Bronze layer: Raw data in tables
  • Silver layer: Refined data in views created from the bronze layer
  • Gold layer: Data products as views created from the silver layer
Currently I have a data scientist that needs access to d...

Latest Reply
MoJaMa
Databricks Employee
  • 2 kudos

Single-user clusters use a different security mode which is the reason for this difference. On single-user/assigned clusters, you'll need the Fine Grained Access Control service (which is a Serverless service) - that is the solution to this problem (...

5 More Replies
Rishitha
by New Contributor III
  • 5012 Views
  • 3 replies
  • 0 kudos

Delta Live Tables streaming

I'm trying to add a monotonicallyIncreasingId() column to a streaming table and I see the following error: Failed to start stream [table_name] in either append mode or complete mode. Append mode error: Expression(s): monotonically_increasing_id() is not s...

Latest Reply
Niro
New Contributor II
  • 0 kudos

Are aggregations with row_number() combined with a SQL window function and a watermark still supported in Databricks 14.3?

2 More Replies
Brad
by Contributor II
  • 5962 Views
  • 5 replies
  • 0 kudos

Is there a way to control the cluster runtime version for DLT

Hi team, when I create a DLT job, is there a way to control the cluster runtime version somewhere? E.g. I want to use 14.3 LTS. I tried adding `"spark_version": "14.3.x-scala2.12",` inside the cluster default label, but it did not work. Thanks

Latest Reply
Brad
Contributor II
  • 0 kudos

Thanks. Got it. And the cluster has to be in shared mode. Can different DLT jobs share clusters, or when a DLT job is running, can other people use the cluster? It seems each DLT job run starts a new cluster. If it cannot be shared, why does it have t...

4 More Replies
pjp94
by Contributor
  • 2037 Views
  • 1 replies
  • 0 kudos

pyspark.pandas PandasNotImplementedError

Can someone explain why the code below is throwing an error? My intuition tells me it's my Spark version (3.2.1), but I would like confirmation: d = {'key':['a','a','c','d','e','f','g','h'], 'data':[1,2,3,4,5,6,7,8]} x = ps.DataFrame(d) x[x['...

Latest Reply
shan_chandra
Databricks Employee
  • 0 kudos

@pjp94 - The error indicates that the pandas API on Spark does not implement the method below: pd.Series.duplicated(). The next step is to use DataFrame methods such as distinct, groupBy, or dropDuplicates to resolve this.

User_1611
by New Contributor
  • 2396 Views
  • 1 replies
  • 0 kudos

TimeoutException: Stream Execution thread for stream [xxxxxx] failed to stop within 15000 millisecond

TimeoutException: Stream Execution thread for stream [id = xxx runId = xxxx] failed to stop within 15000 milliseconds (specified by spark.sql.streaming.stopTimeout). See the cause on what was being executed in the streaming query thread. I have a data...

Latest Reply
shan_chandra
Databricks Employee
  • 0 kudos

@User_1611 - Could you please try the following?
  • Reduce the number of streaming queries running on the same cluster
  • Make sure your code does not try to re-trigger/start an active streaming query
  • Make sure to collect the thread dumps if this error hap...
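The second suggestion, not re-starting a query that is already active, can be sketched as a small guard. This is an illustrative sketch, not the poster's code: `find_active_query` and `start_once` are hypothetical helper names, and the only Spark facts assumed are that `spark.streams.active` lists the running queries and that each query exposes `name` and `isActive`.

```python
from __future__ import annotations

from typing import Any, Callable, Iterable, Optional


def find_active_query(active_queries: Iterable[Any], name: str) -> Optional[Any]:
    """Return the running query with the given name, or None if there is none."""
    for query in active_queries:
        if query.name == name and query.isActive:
            return query
    return None


def start_once(active_queries: Iterable[Any], name: str, start: Callable[[], Any]) -> Any:
    """Start a stream only if no active query with this name already exists."""
    existing = find_active_query(active_queries, name)
    if existing is not None:
        # Reuse the running query instead of triggering a second start.
        return existing
    return start()


# On a cluster this might be called as follows (query name and sink are placeholders):
#   start_once(spark.streams.active, "orders",
#              lambda: df.writeStream.queryName("orders").start())
```

The guard relies on giving every stream a stable name via `queryName`; unnamed queries have `name` set to `None` and would never match.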

Shan1
by New Contributor II
  • 6372 Views
  • 5 replies
  • 0 kudos

Read large volume of parquet files

I have 50k+ parquet files in Azure Data Lake, and I have a mount point as well. I need to read all the files and load them into a DataFrame. I have around 2 billion records in total. Not all files have all the columns, and the column order may di...

Latest Reply
shan_chandra
Databricks Employee
  • 0 kudos

@Shan1 - This could be because the files have columns that differ by data type, e.g. integer vs. long, or boolean vs. integer. It can be resolved with mergeSchema=False. Please refer to this code: https://github.com/apache/spark/blob/418bba5ad6053449a141f3c9c31e...
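To check whether type mismatches are really the cause, it can help to list the columns that appear with more than one type across files. The sketch below assumes the per-file schemas have already been collected as plain dictionaries (for example from `dict(spark.read.parquet(path).dtypes)` on a sample of files); `find_type_conflicts` and the file names are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def find_type_conflicts(schemas: dict[str, dict[str, str]]) -> dict[str, dict[str, list[str]]]:
    """Return {column: {type: [files]}} for columns seen with more than one type."""
    seen: dict[str, dict[str, list[str]]] = defaultdict(lambda: defaultdict(list))
    for path, columns in schemas.items():
        for column, dtype in columns.items():
            seen[column][dtype].append(path)
    return {
        column: {dtype: sorted(files) for dtype, files in types.items()}
        for column, types in seen.items()
        if len(types) > 1  # keep only columns whose type differs between files
    }


# Illustrative input: column name -> Spark type string, per file.
schemas = {
    "a.parquet": {"id": "int", "flag": "boolean"},
    "b.parquet": {"id": "bigint", "name": "string"},
}
conflicts = find_type_conflicts(schemas)
```

Columns reported here are candidates for an explicit cast to a common type before the files are combined; columns that are merely missing from some files are not flagged.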

4 More Replies
Chandraw
by New Contributor III
  • 3716 Views
  • 2 replies
  • 0 kudos

Resolved! Malformed Input Exception while saving or retrieving Table

Hi everyone, I am using DBR version 13 and managed tables in a custom catalog; the table location is AWS S3. I am running the notebook on a single-user cluster. I am facing a MalformedInputException while saving data to tables or reading it. When I am running my noteboo...

Latest Reply
Chandraw
New Contributor III
  • 0 kudos

@Retired_mod The issue was resolved as soon as I deployed it to a multi-node dev cluster. The issue only occurs on single-user clusters. It looks like a limitation of running all updates on one node rather than as a distributed system.

1 More Replies
BerkerKozan
by New Contributor III
  • 2973 Views
  • 2 replies
  • 1 kudos

Creating All Purpose Cluster in Data Asset Bundles

There is no resource to create an All Purpose Cluster, but I need one. Does that mean I should create it via Terraform or DBX and reference it, which I don't prefer?

Latest Reply
BerkerKozan
New Contributor III
  • 1 kudos

Hello @Ayushi_Suthar, thanks for the quick reply! Where can I see these requests? https://ideas.databricks.com/ideas/DB-I-9451 ?

1 More Replies
Andriy
by New Contributor II
  • 8330 Views
  • 2 replies
  • 1 kudos

Get Job Run Status

Is there a way to get a child Job Run status and show the result within the parent notebook execution? Here is the case: I have a master notebook and several child notebooks. As a result, I want to see which notebook failed. For example: Notebook job s...

Latest Reply
BR_DatabricksAI
Contributor III
  • 1 kudos

Hello, are you also managing any return status while calling the notebook? Have a look at the following reference: Run a Databricks notebook from another notebook | Databricks on AWS
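One way to surface per-child status in the parent, sketched under assumptions: the parent calls each child through a runner such as `dbutils.notebook.run(path, timeout_seconds)`, which returns the child's exit value and raises if the child fails. `run_children` is a hypothetical helper, and the runner is passed in as an argument so the logic can be exercised outside Databricks.

```python
from __future__ import annotations

from typing import Callable


def run_children(
    run_notebook: Callable[[str, int], str],
    paths: list[str],
    timeout_seconds: int = 600,
) -> dict[str, str]:
    """Run each child notebook and record a status string per notebook path."""
    statuses: dict[str, str] = {}
    for path in paths:
        try:
            # The runner returns whatever the child passed to its exit call.
            result = run_notebook(path, timeout_seconds)
            statuses[path] = f"SUCCESS: {result}"
        except Exception as exc:
            # A failed or timed-out child surfaces as an exception in the parent.
            statuses[path] = f"FAILED: {exc}"
    return statuses


# In a Databricks parent notebook this might be called as (paths are placeholders):
#   statuses = run_children(dbutils.notebook.run, ["./child_a", "./child_b"])
#   for path, status in statuses.items():
#       print(path, status)
```

Catching the exception per child means one failing notebook does not stop the remaining ones, and the parent can decide afterwards whether to fail the overall run.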

1 More Replies
anupam676
by New Contributor II
  • 4187 Views
  • 2 replies
  • 1 kudos

Resolved! How can I enable disk cache in this scenario?

I have a notebook where I read multiple tables from Delta Lake (let's say the schema is db) and then apply transformations such as join and filter across all these tables (image enclosed). After transformation and writin...

Latest Reply
anupam676
New Contributor II
  • 1 kudos

Thank you @shan_chandra 

1 More Replies
