Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

JoeWMP
by New Contributor III
  • 3594 Views
  • 5 replies
  • 1 kudos

Resolved! Databricks Job IDs increasing in massive sequence gaps

Has anyone seen something like this before? Today around midnight, our Job IDs started increasing in increments of quadrillions - was this a new change to how Job IDs are generated?

Latest Reply
JoeWMP
New Contributor III
  • 1 kudos

Thank you Ravi! Glad that this confirms my understanding

4 More Replies
Edmondo
by New Contributor III
  • 6169 Views
  • 7 replies
  • 3 kudos

Resolved! Limiting parallelism when external APIs are invoked (e.g. MLflow)

We are applying a groupby operation to a pyspark.sql.DataFrame and then training a single model per group with MLflow. We see intermittent failures because the MLflow server replies with a 429 (too many requests/s). What are the best practice...

Latest Reply
Edmondo
New Contributor III
  • 3 kudos

To me it's already resolved through professional services. The question I do have is how useful is this community if people with the right background aren't here, and if it takes a month to get a no-answer.

6 More Replies
thushar
by Contributor
  • 4514 Views
  • 5 replies
  • 3 kudos

Resolved! dataframe.rdd.isEmpty() is throwing error in 9.1 LTS

Loaded a csv file with five columns into a dataframe, then added around 15+ columns using dataframe.withColumn. After adding these many columns, running df.rdd.isEmpty() throws the below error: org.apache.spark.SparkExc...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

@Thushar R​ - Thank you for your patience. We are looking for the best person to help you.

4 More Replies
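A possible workaround sketch (assumes an active SparkSession named `spark`; the file path is a placeholder): the emptiness check can stay in the DataFrame API instead of dropping to the RDD, which re-evaluates the whole lineage and is where this error tends to surface.

```python
# Sketch only - runs on a cluster, not standalone.
df = spark.read.csv("/path/to/file.csv", header=True)

# Cheaper emptiness checks that avoid df.rdd:
is_empty = df.limit(1).count() == 0   # works on any Spark version
# is_empty = df.isEmpty()            # DataFrame.isEmpty() exists in Spark 3.3+
```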
hari
by Contributor
  • 2468 Views
  • 3 replies
  • 3 kudos

Resolved! Multi-cluster write for delta tables with s3 as the datastore

Does Delta currently support multi-cluster writes to a Delta table in S3? I see in the Databricks documentation that Databricks doesn't support writing to the same table from multiple Spark drivers and thus multiple clusters. But S3Guard was also added...

Latest Reply
nastasiya09
New Contributor II
  • 3 kudos

that's really good post for memobdroverizon wifi

2 More Replies
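For context, a hedged config sketch: outside Databricks, OSS Delta Lake 1.2+ supports multi-cluster S3 writes through a DynamoDB-backed LogStore, while on Databricks itself the platform's own S3 commit service handles this. The DynamoDB table name and region below are placeholders, and these keys are normally set in the Spark config at cluster start, not at runtime.

```python
# OSS Delta Lake 1.2+ only - not needed on Databricks clusters.
spark_conf = {
    "spark.delta.logStore.s3.impl": "io.delta.storage.S3DynamoDBLogStore",
    "spark.io.delta.storage.S3DynamoDBLogStore.ddb.tableName": "delta_log_lock",
    "spark.io.delta.storage.S3DynamoDBLogStore.ddb.region": "us-east-1",
}
```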
tonykun
by New Contributor
  • 3698 Views
  • 0 replies
  • 0 kudos

A dumb general question - why doesn't Databricks support a Java REPL?

I'm a new student to the programming world, with a strong interest in data engineering and Databricks technology. I've tried the product; the UI, notebooks, and DBFS are very user-friendly and powerful. Recently, a doubt came to my mind: why Databricks doesn't s...

GMO
by New Contributor III
  • 2687 Views
  • 4 replies
  • 1 kudos

Resolved! Trigger.AvailableOnce in Pyspark?

There’s a new Trigger.AvailableOnce option in runtime 10.1 that we need to process a large folder bit by bit using Autoloader. But I don’t see how to engage this from pyspark.  Is this accessible from scala only or is it available in pyspark? Thanks...

Latest Reply
pottsork
New Contributor II
  • 1 kudos

Any update on this issue? I can see that one can use .trigger(availableNow=True) in DBR 10.3 (on Azure Databricks)... Unfortunately I can't get it to work with Auto Loader. Is this supported? Additionally, I can't find any answers when skimming through ...

3 More Replies
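For reference, a sketch of how the Scala `Trigger.AvailableNow` surfaces in PySpark (assumes an active SparkSession `spark`; all paths are placeholders). The keyword form `availableNow=True` appeared around DBR 10.3 / Spark 3.3; on earlier runtimes `.trigger(once=True)` is the usual fallback, and Auto Loader support for it arrived in later DBR releases than the core trigger itself.

```python
# Sketch only - runs on a cluster, not standalone.
stream = (
    spark.readStream.format("cloudFiles")      # Auto Loader source
    .option("cloudFiles.format", "json")
    .load("/mnt/landing/big-folder")
)

(
    stream.writeStream
    .option("checkpointLocation", "/mnt/checkpoints/big-folder")
    .trigger(availableNow=True)   # process all available data in batches, then stop
    .start("/mnt/bronze/big-folder")
)
```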
enichante
by New Contributor
  • 2957 Views
  • 4 replies
  • 5 kudos

Resolved! Databricks: Report on SQL queries that are being executed

We have a SQL workspace with a cluster running that services a number of self service reports against a range of datasets. We want to be able to analyse and report on the queries our self service users are executing so we can get better visibility of...

Latest Reply
Anonymous
Not applicable
  • 5 kudos

Looks like the people have spoken: API is your best option! (thanks @Werner Stinckens​  @Chris Grabiel​  and @Bilal Aslam​ !) @eni chante​ Let us know if you have questions about the API! If not, please mark one of the replies above as the "best answ...

3 More Replies
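A sketch of the API route suggested above: the SQL Query History REST endpoint (`GET /api/2.0/sql/history/queries`). The host and token are placeholders, and the exact filter parameters should be checked against the current REST reference; this helper only builds the authenticated request.

```python
import urllib.parse
import urllib.request

def build_query_history_request(host: str, token: str, max_results: int = 100):
    """Build an authenticated request listing recent SQL warehouse queries."""
    params = urllib.parse.urlencode({"max_results": max_results})
    return urllib.request.Request(
        f"https://{host}/api/2.0/sql/history/queries?{params}",
        headers={"Authorization": f"Bearer {token}"},
    )

req = build_query_history_request("adb-123.azuredatabricks.net", "dapi-XXX")
# On a real workspace: urllib.request.urlopen(req) returns JSON with a `res` list.
```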
cristianc
by Contributor
  • 4338 Views
  • 2 replies
  • 2 kudos

Resolved! Is VACUUM operation recorded in the history of the delta table?

Greetings, I have tried using Spark with DBR 9.1 LTS to run VACUUM on my delta table, then DESCRIBE HISTORY to see the operation, but apparently the VACUUM operation was not in the history despite what is stated in the documentation from: https://do...

Latest Reply
cristianc
Contributor
  • 2 kudos

That makes sense, thanks for the reply!

1 More Replies
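As a quick way to check this on a given runtime (sketch; assumes a SparkSession `spark`, and the table path is a placeholder): newer Delta releases record "VACUUM START"/"VACUUM END" entries in the table history, while older ones may not.

```python
# Sketch only - runs on a cluster, not standalone.
history = spark.sql("DESCRIBE HISTORY delta.`/mnt/tables/events`")
history.filter("operation LIKE 'VACUUM%'").show(truncate=False)
```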
adnanzak
by New Contributor II
  • 2935 Views
  • 3 replies
  • 0 kudos

Resolved! Deploy Databricks Machine Learning Models On Power BI

Hi Guys. I've implemented a Machine Learning model on Databricks and have registered it with a Model URL. I wanted to enquire if I could use this model on Power BI. Basically the model predicts industries based on client demographics. Ideally I would...

Latest Reply
adnanzak
New Contributor II
  • 0 kudos

Thank you @Werner Stinckens​  and @Joseph Kambourakis​  for your replies.

2 More Replies
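One route for this (a sketch, not the thread's confirmed answer): serve the registered MLflow model behind a REST endpoint and score it from Power BI, e.g. via a "Run Python script" step in Power Query. The host, token, endpoint name, and feature fields below are all placeholders; the `dataframe_records` payload follows MLflow serving conventions.

```python
import json
import urllib.request

def build_scoring_request(host, token, endpoint, records):
    """Build a POST to a served model's invocations URL with a records payload."""
    body = json.dumps({"dataframe_records": records}).encode()
    return urllib.request.Request(
        f"https://{host}/serving-endpoints/{endpoint}/invocations",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_scoring_request(
    "adb-123.azuredatabricks.net", "dapi-XXX", "industry-model",
    [{"employees": 120, "region": "EMEA"}],
)
# On a real workspace: urllib.request.urlopen(req) returns the predictions.
```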
DarshilDesai
by New Contributor II
  • 13332 Views
  • 1 reply
  • 3 kudos

Resolved! How to Efficiently Read Nested JSON in PySpark?

I am having trouble efficiently reading & parsing a large number of stream files in PySpark. Context: here is the schema of the stream file that I am reading in JSON (blank spaces are edits for confidentiality purposes): root |-- location_info: ar...

Latest Reply
Chris_Shehu
Valued Contributor III
  • 3 kudos

I'm interested in seeing what others have come up with. Currently I'm using json_normalize(), then taking any additional nested statements and using a loop to pull them out -> re-combine them.

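The usual PySpark-native approach for a nested array column is `explode` plus dotted field selection, which avoids collecting to pandas. A sketch (assumes a SparkSession `spark`; the `location_info`/`lat`/`lon`/`event_id` field names are invented to mirror the redacted schema):

```python
# Sketch only - runs on a cluster, not standalone.
from pyspark.sql import functions as F

df = spark.read.json("/mnt/streams/*.json")

flat = (
    df.withColumn("loc", F.explode("location_info"))   # one row per array element
      .select("event_id", F.col("loc.lat"), F.col("loc.lon"))
)
```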
umair
by New Contributor
  • 2333 Views
  • 1 reply
  • 1 kudos

Resolved! Cannot Reproduce Result scikit-learn random forest

I'm running some machine learning experiments in Databricks. For the random forest algorithm, when I restart the cluster the training output changes each time, even though the random state is set. Anyone have any clue about this issue? Note: I tried the sam...

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

RF is non-deterministic by its nature. However, as you mentioned, you can control this by using random_state. This will guarantee a deterministic result on a certain system, but not necessarily across systems. SO has a topic about this, check it out, very ...

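The "same seed, same stream on one system" half of that answer can be illustrated with the stdlib alone (a minimal sketch; a full scikit-learn RandomForest fit can still differ across machines, library versions, or BLAS builds even with `random_state` fixed):

```python
import random

def sample_with_seed(seed, n=5):
    """Draw n integers from a generator seeded deterministically."""
    rng = random.Random(seed)
    return [rng.randint(0, 100) for _ in range(n)]

run_a = sample_with_seed(42)
run_b = sample_with_seed(42)
assert run_a == run_b  # identical for a fixed seed on the same interpreter/platform
```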
Anonymous
by Not applicable
  • 2341 Views
  • 1 reply
  • 2 kudos

Issue in creating workspace - Custom AWS Configuration

We have tried to create a new workspace using "Custom AWS Configuration" with our own VPC (customer-managed VPC), but the workspace failed to launch. We are getting the below error, which we couldn't understand. Workspace...

Latest Reply
Mitesh_Patel
New Contributor III
  • 2 kudos

I'm also getting the same issue. I'm trying to create a E2 workspace using Terraform with Customer-managed VPC in us-east-1 (using private subnets for 1a and 1b). We have 1 network rule attached to our subnets that looks like this:  Similar question ...

BasavarajAngadi
by Contributor
  • 3311 Views
  • 7 replies
  • 9 kudos

Resolved! Hi Experts, I am new to Databricks. I want to know how to copy PySpark data into Databricks SQL analytics?

If we use two different clusters, one for PySpark transformation code and one for SQL analytics, how do we make permanent tables derived from the PySpark code available for running queries in Databricks SQL analytics?

Latest Reply
BasavarajAngadi
Contributor
  • 9 kudos

@Aman Sehgal​  Can we write data from data engineering workspace to SQL end point in databricks?

6 More Replies
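The usual answer here is to write a managed table into a metastore that both the engineering cluster and the SQL warehouse share, so SQL queries can see it. A sketch (assumes a SparkSession `spark`; `transformed_df` and the table name are placeholders for the output of the PySpark job):

```python
# Sketch only - runs on a cluster, not standalone.
(
    transformed_df.write
    .format("delta")
    .mode("overwrite")
    .saveAsTable("analytics.customer_features")   # visible to Databricks SQL
)
```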
Users-all
by New Contributor
  • 2535 Views
  • 0 replies
  • 0 kudos

xml module not found error

ModuleNotFoundError: No module named 'com.databricks.spark.xml'. I'm using Azure Databricks, and I've added what I think is the correct library. Status: Installed. Coordinate: com.databricks:spark-xml_2.12:0.13.0

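A likely cause (sketched, not confirmed in-thread): spark-xml is a JVM data source, not an importable Python module, so `import`-style references to `com.databricks.spark.xml` fail even with the library installed. With the Maven coordinate on the cluster, it is used through the reader API instead (assumes a SparkSession `spark`; path and `rowTag` are placeholders):

```python
# Sketch only - runs on a cluster, not standalone.
df = (
    spark.read.format("xml")
    .option("rowTag", "record")   # XML element treated as a row
    .load("/mnt/raw/data.xml")
)
```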

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group