Data Engineering

Forum Posts

Bency
by New Contributor III
  • 866 Views
  • 2 replies
  • 1 kudos

How to get the list of parameters passed from widget

Hi, could someone help me understand how to get all the parameters in the task (from the widget)? That is, I want to get the input parameter 'Start_Date', but it will not always be passed. It could be 'Run_Date' as well ...

Latest Reply
Vidula
Honored Contributor
  • 1 kudos

Hi @Bency Mathew, hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Thank...

  • 1 kudos
1 More Replies
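
For the widget question above, where either 'Start_Date' or 'Run_Date' may be passed, one possible approach is to try each candidate name in turn and keep the first one that resolves. The sketch below is only an illustration: `first_available_param` and `fake_get` are hypothetical names, and `fake_get` stands in for `dbutils.widgets.get` so the code runs outside a notebook. It assumes that asking for an undefined widget raises an error.

```python
from typing import Callable, Iterable, Optional, Tuple


def first_available_param(
    get_param: Callable[[str], str],
    candidates: Iterable[str],
) -> Optional[Tuple[str, str]]:
    """Return (name, value) for the first candidate that get_param can resolve."""
    for name in candidates:
        try:
            value = get_param(name)
        except Exception:
            # Assumption: an undefined widget raises an error.
            # Treat that as "not passed" and try the next name.
            continue
        return name, value
    return None


# Hypothetical stand-in for dbutils.widgets.get, used only for this demo.
_fake_widgets = {"Run_Date": "2022-09-01"}


def fake_get(name: str) -> str:
    return _fake_widgets[name]  # raises KeyError when the widget is missing


print(first_available_param(fake_get, ["Start_Date", "Run_Date"]))
```

In a notebook, the same helper could be called as `first_available_param(dbutils.widgets.get, ["Start_Date", "Run_Date"])`.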
ejloh
by New Contributor II
  • 1487 Views
  • 3 replies
  • 1 kudos

SQL query with leads and lags

I'm trying to create a new column that fills in the nulls below. I tried using leads and lags, but it isn't turning out right. Basically, I'm trying to figure out who is in "possession" of the record, given the TransferFrom and TransferTo columns and sequence...

Latest Reply
Vidula
Honored Contributor
  • 1 kudos

Hi there @Eric Lohbeck, does @Hubert Dudek's response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? We'd love to hear from you. Thanks!

  • 1 kudos
2 More Replies
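
The "fill in the nulls" part of the question above is a forward fill: each null takes the most recent non-null value in sequence order. The plain-Python sketch below shows only that logic. The `owners` values are made up, and the poster's actual TransferFrom/TransferTo rules are not visible in the excerpt. In Spark, a similar effect is usually obtained with `last(col, ignorenulls=True)` over an ordered window rather than with lead/lag.

```python
from typing import List, Optional, Sequence


def forward_fill(values: Sequence[Optional[str]]) -> List[Optional[str]]:
    """Replace each None with the most recent non-None value seen so far."""
    filled: List[Optional[str]] = []
    last_seen: Optional[str] = None
    for value in values:
        if value is not None:
            last_seen = value
        filled.append(last_seen)
    return filled


# Hypothetical "possession" column, already ordered by sequence number.
owners = ["Alice", None, None, "Bob", None]
print(forward_fill(owners))
```

Leading nulls stay null because there is no earlier value to carry forward.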
dmayi
by New Contributor
  • 1930 Views
  • 2 replies
  • 1 kudos

Resolved! Setting up custom tags (JobName, JobID, UserId) on an all-purpose cluster

Hi, I want to set up custom tags on an all-purpose cluster for cost breakdown and chargebacks. Specifically, I want to capture JobName, JobID, and the UserId of whoever ran the job. I can set other custom tags such as Business Unit, Owner... However,...

Latest Reply
Vidula
Honored Contributor
  • 1 kudos

Hey there @DIEUDONNE MAYI, does @Kaniz Fatma's response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? We'd love to hear from you. Thanks!

  • 1 kudos
1 More Replies
ao1
by New Contributor III
  • 852 Views
  • 2 replies
  • 1 kudos

Privileges required to clone a Git repository on Databricks

Hi all, do I need admin privileges to clone a Git repository on Databricks? Cloning was not possible with an account that did not have administrator privileges. Regards.

Latest Reply
AmanSehgal
Honored Contributor III
  • 1 kudos

Navigate to Settings > admin console > Workspace settings > Repos and check the value for "Repos Git URL Allow List permissions". When set to 'Disabled (no restrictions)', users can clone or commit and push to any Git repository. When set to 'Restrict ...

  • 1 kudos
1 More Replies
tanin
by Contributor
  • 2128 Views
  • 8 replies
  • 8 kudos

Using .repartition(100000) causes the unit test to be extremely slow (>20 mins). Is there a way to speed it up?

Here's the code:

val result = spark
  .createDataset(List("test"))
  .rdd
  .repartition(100000)
  .map { _ => "test" }
  .collect()
  .toList

println(result)

I write tests to test for correctness, so I wonde...

Latest Reply
Vidula
Honored Contributor
  • 8 kudos

Hey there @tanin, hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Thank...

  • 8 kudos
7 More Replies
thushar
by Contributor
  • 12213 Views
  • 3 replies
  • 2 kudos

Resolved! Connect to an on-prem SQL server database

We need to connect to an on-prem SQL Server database to extract data, and we are using the Apache Spark SQL connector. The problem is that we are unable to connect; the connection fails with SQLServerException: The TCP/IP connection to the host ***.***.X.XX, port 1433 has fail...

Latest Reply
Kaniz
Community Manager
  • 2 kudos

Hi @Rob S (Customer), we haven't heard from you since the last response from @Mohit Miglani, and I was checking back to see if his suggestions helped you. Otherwise, if you have any solution, please share it with the community, as it can be helpful to...

  • 2 kudos
2 More Replies
572509
by New Contributor
  • 954 Views
  • 3 replies
  • 1 kudos

Resolved! Notebook-scoped env variables?

Is it possible to set environment variables at the notebook level instead of the cluster level? Will they be available in the workers in addition to the driver? Can they override the env variables set at the cluster level?

Latest Reply
Kaniz
Community Manager
  • 1 kudos

Hi @Martim Lobao, we haven't heard from you since the last response from @Prabakar, and I was checking back to see if his suggestions helped you. Otherwise, if you have any solution, please share it with the community, as it can be helpful to others. Als...

  • 1 kudos
2 More Replies
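
As background for the notebook-scoped environment variable question above: in Python, assigning to `os.environ` changes the environment of the current process and of any child process started afterwards. The sketch below demonstrates only that general behavior. `MY_NOTEBOOK_VAR` is a made-up name. Whether workers see such a value is a separate matter; the assumption here is that executor processes on other machines, or ones started earlier, would not inherit it.

```python
import os
import subprocess
import sys

# Set a variable for this process only (on Databricks, the driver's Python process).
os.environ["MY_NOTEBOOK_VAR"] = "notebook-value"

# A child process started afterwards inherits the updated environment.
child_code = "import os; print(os.environ.get('MY_NOTEBOOK_VAR'))"
result = subprocess.run(
    [sys.executable, "-c", child_code],
    capture_output=True,
    text=True,
    check=True,
)
child_value = result.stdout.strip()
print(child_value)
```

A value set this way overrides any value of the same name that the process inherited, but only within that process and its children.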
data_testing1
by New Contributor III
  • 1758 Views
  • 6 replies
  • 6 kudos

Resolved! How much of this tutorial or blog post can I run before starting a cloud instance of databricks?

I'm new to Python and Databricks, so I'm still running tests on features, and I'm not sure how much of this can be run without Databricks, which I guess requires an AWS or Google Cloud account. Can I do all three stages without the AWS Databricks or how fa...

Latest Reply
Kaniz
Community Manager
  • 6 kudos

Hi @Andrew Schell, please don't forget to click on the "Select As Best" button whenever the information provided helps resolve your question. @Hubert Dudek, thank you for your response.

  • 6 kudos
5 More Replies
RicksDB
by Contributor II
  • 1432 Views
  • 3 replies
  • 3 kudos

Resolved! Maximum job execution per hour

Hi, what is the maximum number of jobs we can execute in an hour for a given workspace? This page mentions 5000: https://docs.microsoft.com/en-us/azure/databricks/data-engineering/jobs/jobs The number of jobs a workspace can create in an hour is limited ...

Latest Reply
Kaniz
Community Manager
  • 3 kudos

Hi @E H, we haven’t heard from you since the last response from @Sivaprasad C S, and I was checking back to see if his suggestions helped you. Otherwise, if you have any solution, please share it with the community, as it can be helpful to others.

  • 3 kudos
2 More Replies
Tahseen0354
by Contributor III
  • 2112 Views
  • 4 replies
  • 3 kudos

Resolved! Why am I not receiving any mail sent to the Azure AD Group mailbox when a Databricks job fails?

I have created an Azure AD Group of type "Microsoft 365" with its own email address, which has been added to the notifications of a Databricks job (on failure). But no mail is sent to the Azure Group mailbox when the job fails. I am able to send a d...

Latest Reply
Vidula
Honored Contributor
  • 3 kudos

Hi @Md Tahseen Anam, just wanted to check in: were you able to resolve your issue? If yes, would you be happy to mark an answer as best? If not, please tell us so we can help you. Thanks!

  • 3 kudos
3 More Replies
BradSheridan
by Valued Contributor
  • 5310 Views
  • 20 replies
  • 3 kudos

Resolved! Cloudformation error when launching Databricks in AWS

I've seen many posts here in the Community as potential solutions to this error, but none seem to be a solution for us. We are trying to launch the 14 day free trial of Databricks from the AWS Marketplace and are getting the error below. Moreover, ...

Latest Reply
BradSheridan
Valued Contributor
  • 3 kudos

Here are some answers:
  • copyObject error - we were using a Databricks-provided CloudFormation template, but this error goes away when we use the AWS-provided template.
  • createWorkspace error - we had subscribed > unsubscribed > resubscribed to Databricks via t...

  • 3 kudos
19 More Replies
noimeta
by Contributor II
  • 1791 Views
  • 4 replies
  • 1 kudos

Apply change data with delete and schema evolution

Hi, currently I'm using Structured Streaming to insert/update/delete rows in a table. A row will be deleted if the value in the 'Operation' column is 'deleted'. Everything seems to work fine until there's a new column. Since I don't need the 'Operation' column in the t...

Latest Reply
User16753725469
Contributor II
  • 1 kudos

Please go through this documentation: https://docs.delta.io/latest/api/python/index.html

  • 1 kudos
3 More Replies
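
To make the change-data question above concrete, here is a plain-Python sketch of the intended semantics only: rows marked 'deleted' are removed, other rows are upserted without the 'Operation' column, and columns that were not seen before are simply kept. It is not Delta Lake code. The `id`, `name`, and `city` columns and the 'updated' and 'inserted' values are hypothetical; only 'Operation' and 'deleted' come from the post.

```python
from typing import Any, Dict, Iterable

Row = Dict[str, Any]


def apply_changes(
    target: Dict[Any, Row],
    changes: Iterable[Row],
    key: str = "id",
    op_col: str = "Operation",
) -> Dict[Any, Row]:
    """Apply change rows to target (keyed by `key`), in order."""
    for change in changes:
        row_key = change[key]
        if change.get(op_col) == "deleted":
            # Delete the row if present; ignore deletes for unknown keys.
            target.pop(row_key, None)
            continue
        # Drop the operation marker; keep every other column, including new ones.
        payload = {k: v for k, v in change.items() if k != op_col}
        target[row_key] = {**target.get(row_key, {}), **payload}
    return target


table = {1: {"id": 1, "name": "a"}, 2: {"id": 2, "name": "b"}}
changes = [
    {"id": 1, "name": "a2", "Operation": "updated"},
    {"id": 2, "Operation": "deleted"},
    {"id": 3, "name": "c", "city": "x", "Operation": "inserted"},  # new column "city"
]
result = apply_changes(table, changes)
print(result)
```

The point of the sketch is that the operation marker drives the delete but is never written to the target, while new columns pass through untouched.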
170017
by New Contributor II
  • 970 Views
  • 2 replies
  • 1 kudos

Spark Error when running python script on databricks

I have the following basic script that works fine using PyCharm on my machine.

from pyspark.sql import SparkSession

print("START")
spark = SparkSession \
    .Builder() \
    .appName("myapp") \
    .master('local[*, 4]') \
    .getOrCreate()
print(spark)
dat...

Latest Reply
Vidula
Honored Contributor
  • 1 kudos

Hi @Patricia Mayer, just wanted to check in: were you able to resolve your issue, or do you need more help? We'd love to hear from you. Thanks!

  • 1 kudos
1 More Replies
563641
by New Contributor II
  • 344 Views
  • 2 replies
  • 2 kudos

Advanced ML Virtual Training Video from 2022 Summit (not currently accessible)

There does not seem to be a way to log into and view the recent "paid" training sessions from the 2022 Data/AI Summit. I was able to log in and view the videos yesterday, but the website currently posted has no option for logging in/access. Is the...

Latest Reply
Vidula
Honored Contributor
  • 2 kudos

Hey there @Christopher Warner, just wanted to check in: were you able to resolve your issue, or do you need more help? We'd love to hear from you. Thanks!

  • 2 kudos
1 More Replies
pawelmitrus
by New Contributor III
  • 1802 Views
  • 4 replies
  • 1 kudos

Why Databricks spawns multiple jobs

I have a Delta table spark101.airlines (sourced from `/databricks-datasets/airlines/`) partitioned by `Year`. My `spark.sql.shuffle.partitions` is set to the default of 200. I run a simple query: select Origin, count(*) from spark101.airlines group by Origi...

Latest Reply
User16753725469
Contributor II
  • 1 kudos

Could you please paste the query plan here so we can analyse the issue?

  • 1 kudos
3 More Replies