Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

SailajaB
by Valued Contributor III
  • 22375 Views
  • 4 replies
  • 4 kudos

Unable to mount the blob storage account as soft delete got enabled

Hi Team, when we try to mount or access the blob storage account where soft delete is enabled, it fails with the error below: org.apache.hadoop.fs.FileAlreadyExistsException: Operation failed: "This endpoint does not support BlobStorageEvents or So...

Latest Reply
-werners-
Esteemed Contributor III
  • 4 kudos

Jeez, I was planning on enabling soft delete on our ADLS Gen2, but I think I will wait a while after reading this.

3 More Replies
JoeWMP
by New Contributor III
  • 7127 Views
  • 5 replies
  • 1 kudos

Resolved! Databricks Job IDs increasing in massive sequence gaps

Has anyone seen something like this before? Today around midnight, our Job IDs started increasing in increments of quadrillions. Was this a new change to how Job IDs are generated?

Latest Reply
JoeWMP
New Contributor III
  • 1 kudos

Thank you Ravi! Glad that this confirms my understanding

4 More Replies
Edmondo
by New Contributor III
  • 9717 Views
  • 7 replies
  • 3 kudos

Resolved! Limiting parallelism when external APIs are invoked (i.e. mlflow)

We apply a groupby operation to a pyspark.sql.DataFrame and then train a single model for MLflow on each group. We see intermittent failures because the MLflow server replies with a 429 due to too many requests per second. What are the best pract...

Latest Reply
Edmondo
New Contributor III
  • 3 kudos

For me, it's already resolved through professional services. The question I do have is how useful this community is if people with the right background aren't here and it takes a month to get a non-answer.

6 More Replies
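
For the 429 question above, one general mitigation is to retry rate-limited calls with exponential backoff and jitter. This is a minimal sketch and not the resolution reached in the thread: `RateLimited` and `call_with_backoff` are hypothetical names, and the caller is assumed to translate an HTTP 429 from their client into `RateLimited`. It does not limit Spark parallelism itself; it only makes each call tolerant of throttling.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical error the caller raises when the server answers 429."""


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on RateLimited with capped exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Full jitter: wait a random time up to the capped exponential delay,
            # so that many workers do not retry at the same instant.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
    raise AssertionError("unreachable")
```

The `sleep` parameter exists only so the waiting can be replaced in tests; in a per-group training function it would be left at its default.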
thushar
by Contributor
  • 6851 Views
  • 5 replies
  • 3 kudos

Resolved! dataframe.rdd.isEmpty() is throwing error in 9.1 LTS

I loaded a CSV file with five columns into a DataFrame and then added 15+ columns using the dataframe.withColumn method. After adding these columns, running df.rdd.isEmpty() throws the error below: org.apache.spark.SparkExc...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

@Thushar R - Thank you for your patience. We are looking for the best person to help you.

4 More Replies
hari
by Contributor
  • 3859 Views
  • 3 replies
  • 3 kudos

Resolved! Multi-cluster write for delta tables with s3 as the datastore

Does Delta currently support multi-cluster writes to a Delta table in S3? I see in the Databricks documentation that Databricks doesn't support writing to the same table from multiple Spark drivers, and thus from multiple clusters. But S3Guard was also added...

Latest Reply
nastasiya09
New Contributor II
  • 3 kudos

that's really good post for memobdroverizon wifi

2 More Replies
tonykun
by New Contributor
  • 4849 Views
  • 0 replies
  • 0 kudos

A dumb general question - why doesn't Databricks support a Java REPL?

I'm a new student in the programming world with a strong interest in data engineering and Databricks technology. I've tried the product; the UI, notebooks, and DBFS are very user-friendly and powerful. Recently, a question came to mind: why Databricks doesn't s...

GMO
by New Contributor III
  • 3890 Views
  • 4 replies
  • 1 kudos

Resolved! Trigger.AvailableOnce in Pyspark?

There’s a new Trigger.AvailableOnce option in runtime 10.1 that we need in order to process a large folder bit by bit using Autoloader. But I don’t see how to use it from PySpark. Is it accessible from Scala only, or is it also available in PySpark? Thanks...

Latest Reply
pottsork
New Contributor II
  • 1 kudos

Any update on this issue? I can see that one can use .trigger(availableNow=True) in DBR 10.3 (on Azure Databricks). Unfortunately, I can't get it to work with Autoloader. Is this supported? Additionally, I can't find any answers when skimming through ...

3 More Replies
enichante
by New Contributor
  • 4829 Views
  • 4 replies
  • 5 kudos

Resolved! Databricks: Report on SQL queries that are being executed

We have a SQL workspace with a running cluster that serves a number of self-service reports against a range of datasets. We want to be able to analyse and report on the queries our self-service users are executing so we can get better visibility of...

Latest Reply
Anonymous
Not applicable
  • 5 kudos

Looks like the people have spoken: API is your best option! (Thanks @Werner Stinckens, @Chris Grabiel and @Bilal Aslam!) @eni chante, let us know if you have questions about the API! If not, please mark one of the replies above as the "best answ...

3 More Replies
cristianc
by Contributor
  • 6472 Views
  • 2 replies
  • 2 kudos

Resolved! Is VACUUM operation recorded in the history of the delta table?

Greetings, I used Spark on DBR 9.1 LTS to run VACUUM on my Delta table and then DESCRIBE HISTORY to see the operation, but the VACUUM operation did not appear in the history, despite what the documentation states: https://do...

Latest Reply
cristianc
Contributor
  • 2 kudos

That makes sense, thanks for the reply!

1 More Replies
adnanzak
by New Contributor II
  • 4216 Views
  • 3 replies
  • 0 kudos

Resolved! Deploy Databricks Machine Learning Models On Power BI

Hi Guys. I've implemented a Machine Learning model on Databricks and have registered it with a Model URL. I wanted to enquire if I could use this model on Power BI. Basically the model predicts industries based on client demographics. Ideally I would...

Latest Reply
adnanzak
New Contributor II
  • 0 kudos

Thank you @Werner Stinckens and @Joseph Kambourakis for your replies.

2 More Replies
DarshilDesai
by New Contributor II
  • 15692 Views
  • 1 reply
  • 3 kudos

Resolved! How to Efficiently Read Nested JSON in PySpark?

I am having trouble efficiently reading and parsing a large number of stream files in PySpark! Context: Here is the schema of the stream file that I am reading in as JSON. Blank spaces are edits for confidentiality purposes. root |-- location_info: ar...

Latest Reply
Chris_Shehu
Valued Contributor III
  • 3 kudos

I'm interested in seeing what others have come up with. Currently I'm using json_normalize(), then taking any additional nested statements and using a loop to pull them out and re-combine them.
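
The normalize-then-loop idea in the reply above can be sketched in plain Python. This is an illustration under stated assumptions, not the poster's code: `flatten` and `explode` are hypothetical helpers, only the `location_info` array name comes from the question's schema, and the inner field names in the usage example are made up.

```python
from typing import Any, Dict, Iterator


def flatten(record: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into dotted column names; lists are left as-is."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested objects, extending the dotted prefix.
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def explode(record: Dict[str, Any], list_key: str) -> Iterator[Dict[str, Any]]:
    """Yield one flat row per element of record[list_key]."""
    parent = {k: v for k, v in record.items() if k != list_key}
    for item in record.get(list_key, []):
        row = dict(parent)      # copy the parent fields onto every row
        row[list_key] = item    # attach a single array element
        yield flatten(row)


# Usage with made-up inner fields ("city", "geo", "lat" are assumptions).
record = {
    "id": 1,
    "location_info": [
        {"city": "A", "geo": {"lat": 1.0}},
        {"city": "B", "geo": {"lat": 2.0}},
    ],
}
rows = list(explode(record, "location_info"))
```

Each array element becomes its own row with dotted column names such as `location_info.geo.lat`. On large inputs the same shape is usually expressed with Spark's own column functions rather than a Python loop.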

umair
by New Contributor
  • 3390 Views
  • 1 reply
  • 1 kudos

Resolved! Cannot Reproduce Result scikit-learn random forest

I'm running some machine learning experiments in Databricks. With the random forest algorithm, the training output changes each time I restart the cluster, even though random state is set. Does anyone have a clue about this issue? Note: I tried the sam...

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

RF is non-deterministic by its nature. However, as you mentioned, you can control this by using random_state. This will guarantee a deterministic result ON A CERTAIN SYSTEM, but not necessarily across systems. SO has a topic about this; check it out, very ...

Anonymous
by Not applicable
  • 3573 Views
  • 1 reply
  • 2 kudos

Issue in creating workspace - Custom AWS Configuration

We tried to create a new workspace using "Custom AWS Configuration" with our own VPC (customer-managed VPC), but the workspace failed to launch. We are getting the error below and can't tell where the issue is. Workspace...

Latest Reply
Mitesh_Patel
New Contributor III
  • 2 kudos

I'm also getting the same issue. I'm trying to create an E2 workspace using Terraform with a customer-managed VPC in us-east-1 (using private subnets for 1a and 1b). We have 1 network rule attached to our subnets that looks like this:  Similar question ...

BasavarajAngadi
by Contributor
  • 5582 Views
  • 7 replies
  • 9 kudos

Resolved! Hi Experts, I am new to Databricks. I want to know how to copy PySpark data into Databricks SQL Analytics?

If we use two different clusters, one for PySpark transformation code and one for SQL Analytics, how do we make permanent tables derived from the PySpark code available for running queries in Databricks SQL Analytics?

Latest Reply
BasavarajAngadi
Contributor
  • 9 kudos

@Aman Sehgal Can we write data from the data engineering workspace to a SQL endpoint in Databricks?

6 More Replies