Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Mumrel
by Contributor
  • 1365 Views
  • 2 replies
  • 2 kudos

Resolved! Error 95 when importing one Notebook into another

When I follow the instructions in Modularize your code using files, I get the following error: I am on Azure, use DBR 12.2 LTS, and use ADLS as storage. I am happy to provide more details if needed. My research suggests that the reason is that the DBFS FUSE...

Latest Reply
-werners-
Esteemed Contributor III
  • 2 kudos

import works for .py files; %run is for notebooks. Is lib a .py file or a notebook?

1 More Replies
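As a concrete illustration of the distinction -werners- draws, a minimal sketch (the module name lib and the function it exports are hypothetical):

# lib.py checked in as a plain Python file: it can be imported like any module.
from lib import some_function   # hypothetical name; works only for .py files
some_function()

# lib created as a notebook: it cannot be imported; include it instead with
# %run ./lib
# which runs the notebook inline and brings its definitions into scope.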
Thijs
by New Contributor III
  • 1477 Views
  • 3 replies
  • 4 kudos

How do I define & run jobs that execute scripts that are copied inside a custom Databricks container?

Hi all, we are building custom Databricks containers (https://docs.databricks.com/clusters/custom-containers.html). During the container build process we install dependencies and also Python source scripts. We now want to run some of these scrip...

Latest Reply
Anonymous
Not applicable
  • 4 kudos

Hi @Thijs van den Berg, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best an...

2 More Replies
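One common pattern for this, sketched below: run the script baked into the image from a small notebook task via subprocess, so it can be scheduled like any other notebook job. The script path and arguments are hypothetical, assuming the Dockerfile copied the script into the image.

import subprocess

# /app/scripts/etl.py is a hypothetical path used in the container build.
result = subprocess.run(
    ["python", "/app/scripts/etl.py", "--date", "2023-05-01"],
    capture_output=True, text=True, check=True,
)
print(result.stdout)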
frank7
by New Contributor II
  • 1865 Views
  • 2 replies
  • 1 kudos

Resolved! Is it possible to write a pyspark dataframe to a custom log table in Log Analytics workspace?

I have a pyspark dataframe that contains information about the tables that I have on a SQL database (creation date, number of rows, etc.). Sample data: { "Day":"2023-04-28", "Environment":"dev", "DatabaseName":"default", "TableName":"discount"...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

@Bruno Simoes: Yes, it is possible to write a PySpark DataFrame to a custom log table in a Log Analytics workspace using the Azure Log Analytics Workspace API. Here's a high-level overview of the steps you can follow: Create an Azure Log Analytics Works...

1 More Replies
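For reference, a minimal sketch of the HTTP Data Collector flow the reply outlines (the secret scope/key names and the Log-Type are placeholders; df is the DataFrame from the question):

import base64, hashlib, hmac, json, requests
from datetime import datetime, timezone

workspace_id = dbutils.secrets.get("kv-scope", "la-workspace-id")  # placeholder scope/keys
shared_key = dbutils.secrets.get("kv-scope", "la-shared-key")

# Collect a small DataFrame to JSON; for large data, batch the payloads.
body = json.dumps([row.asDict() for row in df.collect()], default=str)

# Sign the request per the Data Collector API's SharedKey scheme.
rfc1123 = datetime.now(timezone.utc).strftime("%a, %d %b %Y %H:%M:%S GMT")
string_to_sign = f"POST\n{len(body)}\napplication/json\nx-ms-date:{rfc1123}\n/api/logs"
signature = base64.b64encode(
    hmac.new(base64.b64decode(shared_key),
             string_to_sign.encode("utf-8"), hashlib.sha256).digest()
).decode()

requests.post(
    f"https://{workspace_id}.ods.opinsights.azure.com/api/logs?api-version=2016-04-01",
    headers={
        "Content-Type": "application/json",
        "Authorization": f"SharedKey {workspace_id}:{signature}",
        "Log-Type": "TableStats",  # lands as the custom table TableStats_CL
        "x-ms-date": rfc1123,
    },
    data=body,
)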
annagriv
by New Contributor II
  • 1453 Views
  • 3 replies
  • 4 kudos

Resolved! How to get git commit ID of the repository the script runs on?

I have a script in a repository on Databricks. The script should log the current git commit ID of the repository. How can that be implemented? I tried various commands, for example: result = subprocess.run(['git', 'rev-parse', 'HEAD'], stdout=subproce...

Latest Reply
-werners-
Esteemed Contributor III
  • 4 kudos

I think this is because the code and the execution environment (the clusters) are separated.

2 More Replies
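Because the cluster does not see the repo's .git directory, one workaround is to ask the Repos API for the checked-out commit instead of shelling out to git. A sketch (the repo path is an example; the context getters are internal, undocumented helpers):

import requests

ctx = dbutils.notebook.entry_point.getDbutils().notebook().getContext()
host = ctx.apiUrl().get()     # workspace URL, from the notebook context
token = ctx.apiToken().get()  # short-lived API token for the current user

resp = requests.get(
    f"{host}/api/2.0/repos",
    headers={"Authorization": f"Bearer {token}"},
    params={"path_prefix": "/Repos/me@example.com/my-repo"},  # example path
)
print(resp.json()["repos"][0]["head_commit_id"])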
Long
by New Contributor
  • 707 Views
  • 1 reply
  • 0 kudos

Connecting to an Azure SQL database using R in Databricks with Azure Key Vault

I'm trying to connect to an Azure SQL database using R in Databricks. I want to read the credentials stored in Azure Key Vault secrets rather than hard-coding them in the R code. I've seen some examples of it being done in Scala; however, I'm after an R solutio...

Latest Reply
ArturoNuor
New Contributor III
  • 0 kudos

Did you find a solution for this, @Long Pham? I am having the same issue.

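No R answer surfaced in the thread. For comparison, the usual pattern in Python, which translates fairly directly because dbutils.secrets is also exposed in Databricks R notebooks (the secret scope, key names, and JDBC URL below are placeholders for a Key Vault-backed scope):

# Pull credentials from a Key Vault-backed secret scope instead of hard-coding.
user = dbutils.secrets.get(scope="kv-scope", key="sql-username")
password = dbutils.secrets.get(scope="kv-scope", key="sql-password")

df = (spark.read.format("jdbc")
      .option("url", "jdbc:sqlserver://myserver.database.windows.net:1433;database=mydb")
      .option("dbtable", "dbo.my_table")
      .option("user", user)
      .option("password", password)
      .load())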
Chalki
by New Contributor III
  • 2126 Views
  • 2 replies
  • 4 kudos

Resolved! Delta Table Merge statement is not accepting broadcast hint

I have a statement like this with pyspark:
target_tbl.alias("target") \
    .merge(stage_df.hint("broadcast").alias("source"), merge_join_expr) \
    .whenMatchedUpdateAll() \
    .whenNotMatchedInsertAll() \
    .w...

Latest Reply
Anonymous
Not applicable
  • 4 kudos

Hi @Nikolay Chalkanov, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best ans...

1 More Replies
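One variation worth trying, sketched below: apply the broadcast function to the source DataFrame rather than the string hint. Whether the hint survives into the MERGE plan still depends on the Delta Lake/DBR version, so treat this as an experiment, not a guarantee.

from pyspark.sql.functions import broadcast

(target_tbl.alias("target")
    .merge(broadcast(stage_df).alias("source"), merge_join_expr)
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute())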
720677
by New Contributor III
  • 6799 Views
  • 2 replies
  • 0 kudos

S3 write to bucket - best performance tips

I'm writing big dataframes into Delta tables in S3 buckets.
df.write \
    .format("delta") \
    .mode("append") \
    .partitionBy(partitionColumns) \
    .option("mergeSchema", "true") \
    .save(target...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Pablo (Ariel): There are several ways to improve the performance of writing data to S3 using Spark. Here are some tips and recommendations: Increase the size of the output files: writing many small files hurts S3 throughput. You can increase the si...

1 More Replies
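Alongside those tips, two Databricks-specific Delta settings are commonly used to reduce small files on S3; a sketch (they can also be set as table properties):

# Optimized writes coalesce data into fewer, larger files before they land in S3;
# auto-compaction merges small files after commits.
spark.conf.set("spark.databricks.delta.optimizeWrite.enabled", "true")
spark.conf.set("spark.databricks.delta.autoCompact.enabled", "true")

(df.write
    .format("delta")
    .mode("append")
    .partitionBy(partitionColumns)   # keep partition columns low-cardinality
    .option("mergeSchema", "true")
    .save(target_path))              # target_path as in the question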
BenLambert
by Contributor
  • 1664 Views
  • 1 reply
  • 0 kudos

How to deal with deleted files in source directory in DLT?

We have a DLT pipeline that uses Auto Loader to detect files added to a source storage bucket. It reads these updated files and adds new records to a bronze streaming table. However, we would also like to automatically delete records from the bronz...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Bennett Lambert: Yes, it is possible to automatically delete records from the bronze table when a source file is deleted, without doing a full refresh. One way to achieve this is by using the Change Data Capture (CDC) feature in Databricks Delta. CD...

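For reference, a sketch of the APPLY CHANGES pattern the reply points at. Note the assumptions: the source feed must carry an explicit delete marker (the operation column and table names below are hypothetical), and this propagates deletes downstream; it does not by itself detect files removed from the bucket.

import dlt
from pyspark.sql.functions import col, expr

dlt.create_streaming_table("silver")

dlt.apply_changes(
    target="silver",
    source="bronze",
    keys=["id"],                                   # hypothetical primary key
    sequence_by=col("event_ts"),                   # ordering column
    apply_as_deletes=expr("operation = 'DELETE'"), # rows flagged as deletes
)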
WillHeyer
by New Contributor II
  • 2274 Views
  • 1 reply
  • 2 kudos

Resolved! Best Practices for Power BI Connectivity w/ Partner Connect: Access Token w/ Service Principal, Databricks Username w/ Service Account, or OAuth?

I'm aware all are possible methods but are all equal? Or is the matter trivial? Thank you so much!

Latest Reply
Anonymous
Not applicable
  • 2 kudos

@Will Heyer: The best method for Power BI connectivity with Partner Connect depends on your specific use case and requirements. Here are some factors to consider for each method: Access Token with Service Principal: This method uses a client ID and s...

DavideCagnoni
by Contributor
  • 4639 Views
  • 1 reply
  • 4 kudos

Resolved! How to use multi-cursor and rectangular selection for notebooks and the query editor in Linux?

The documentation explains how to use multi-cursor selection in notebooks, but only for Windows and macOS. The Windows shortcut worked in Linux (Ubuntu) until a few days ago, but it no longer does.

Latest Reply
Anonymous
Not applicable
  • 4 kudos

@Davide Cagnoni: Multicursor support in Databricks notebooks is implemented using the Ace editor, which is a web-based code editor. Therefore, the behavior of multicursor support may depend on the specific browser and operating system you are using....

vnc001
by New Contributor
  • 1033 Views
  • 1 reply
  • 1 kudos

Resolved! Clusters API 2.0 - Unable to execute cluster events API

Details: I keep getting "Missing required field: cluster_id" even though you can see it is supplied. Is this a bug, or am I missing something? I am testing this in Postman. Error: {"error_code":"INVALID_PARAMETER_VALUE","message":"Missing required fi...

Latest Reply
SUMI1
New Contributor III
  • 1 kudos

Hi guys, I'm sorry to hear that the Clusters API 2.0 cluster events call is giving you trouble. I advise getting in touch with the support staff for guidance on quickly fixing the problem.

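A frequent cause of this error in Postman is sending cluster_id as a query parameter: the Clusters Events endpoint expects a POST with cluster_id in the JSON body. A sketch (host, token, and cluster ID are placeholders):

import requests

resp = requests.post(
    "https://<workspace-host>/api/2.0/clusters/events",
    headers={"Authorization": "Bearer <personal-access-token>"},
    json={"cluster_id": "0423-123456-abcd123", "limit": 25},  # body, not query string
)
print(resp.json())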
Phani1
by Valued Contributor
  • 903 Views
  • 1 reply
  • 1 kudos

Resolved! DLT best practices

Hi Team, could you please recommend best practices for implementing Delta Live Tables? Regards, Phanindra

Latest Reply
Ryan_Chynoweth
Honored Contributor III
  • 1 kudos

Hi Phani, what exactly are you looking for with best practices? At a high level:
  • Always provide an external storage location (S3, ADLS, GCS) for your pipeline
  • Use autoscaling!
  • Python imports can be leveraged to reuse code (see the sketch below)
With regards to providing a st...

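To make the "Python imports" point concrete, a sketch of a DLT table that reuses a shared transform and reads from external storage (the module, bucket path, and function are hypothetical):

import dlt
from my_project.transforms import standardize_columns  # hypothetical shared module

@dlt.table(comment="Bronze events via Auto Loader, reusing an imported transform")
def bronze_events():
    raw = (spark.readStream.format("cloudFiles")
           .option("cloudFiles.format", "json")
           .load("s3://my-bucket/raw/events/"))  # external storage location
    return standardize_columns(raw)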
NOOR_BASHASHAIK
by Contributor
  • 1657 Views
  • 1 reply
  • 2 kudos

Resolved! Azure Databricks PATs expire before the end of their validity period

Hi all, we have this issue in our environment: even though we give 365 days of validity when generating Databricks PATs, the PATs expire every now and then. Is there any problem with the command we use: curl --location --request POST 'https://<<HOST_NA...

Latest Reply
karthik_p
Esteemed Contributor
  • 2 kudos

@NOOR BASHA SHAIK It looks like you are providing 365 days; can you please post the response you get? If you don't provide any lifetime, the token should be valid indefinitely. Can you please set a 90-day validity and test?

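For completeness, a sketch of the Token API call with an explicit 90-day lifetime, as karthik_p suggests testing (host and the authorizing token are placeholders):

import requests

resp = requests.post(
    "https://<workspace-host>/api/2.0/token/create",
    headers={"Authorization": "Bearer <existing-token>"},
    json={"lifetime_seconds": 90 * 24 * 3600, "comment": "ci-cd token"},
)
# expiry_time is epoch milliseconds; -1 means the token never expires.
print(resp.json()["token_info"]["expiry_time"])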
Chinu
by New Contributor III
  • 577 Views
  • 1 reply
  • 1 kudos

API to get Databricks status on AWS

Hi, do you have an API endpoint to call to get the Databricks status for AWS? Thanks,

Latest Reply
karthik_p
Esteemed Contributor
  • 1 kudos

@Chinu Lee There is a webhook/Slack integration that can be used to fetch status: https://docs.databricks.com/resources/status.html#webhook. Are you specifically looking for your account's workspace, or the one above?

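If polling is preferred over webhooks, a rough sketch, assuming the public status page exposes a Statuspage-style JSON summary (the URL is an assumption; verify it against the status docs linked above):

import requests

# Assumption: status.databricks.com serves a Statuspage-style JSON feed.
resp = requests.get("https://status.databricks.com/api/v2/status.json")
print(resp.json()["status"]["description"])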
marcin-sg
by New Contributor III
  • 884 Views
  • 1 reply
  • 2 kudos

Create (account-wide) groups without account admin permissions

The use case is quite simple: each environment's Databricks workspace (prod, test, dev) will be created with Terraform by a separate service principal (which, for isolation purposes, should not have account-wide admin permissions), but will belong to the...

Latest Reply
marcin-sg
New Contributor III
  • 2 kudos

Another thing would be to assign a workspace to a metastore without account admin permissions, for a similar reason.
