Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

bzh
by New Contributor
  • 5831 Views
  • 3 replies
  • 3 kudos

Large Data ingestion issue using auto loader

 The goal of this project is to ingest 1000+ files (100MB per file) from S3 into Databricks. Since these will be incremental changes, we are using Auto Loader for continued ingestion and transformation using a cluster (i3.xlarge). The current process i...

Latest Reply
youssefmrini
Databricks Employee

 There are several possible ways to improve the performance of your Spark streaming job for ingesting a large volume of S3 files. Here are a few suggestions: Tune the spark.sql.shuffle.partitions config parameter: By default, the number of shuffle part...
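A minimal sketch of what that tuning might look like alongside an Auto Loader stream; the S3 paths, source format, target table, and shuffle-partition value below are hypothetical and workload dependent:

```python
# Hedged sketch only: paths, source format, and target table are assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Lower the shuffle partition count from the default of 200 to something
# closer to the core count of a small cluster (exact value is workload dependent).
spark.conf.set("spark.sql.shuffle.partitions", "8")

# Incremental ingestion from S3 with Auto Loader (cloudFiles).
stream = (
    spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "parquet")        # assumed source format
    .option("cloudFiles.maxFilesPerTrigger", 100)  # cap files per micro-batch
    .load("s3://example-bucket/landing/")          # hypothetical path
)

(
    stream.writeStream
    .option("checkpointLocation", "s3://example-bucket/_checkpoints/landing/")  # hypothetical
    .trigger(availableNow=True)  # drain the backlog, then stop
    .toTable("bronze.landing")   # hypothetical target table
)
```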

2 More Replies
elifa
by New Contributor II
  • 3915 Views
  • 3 replies
  • 1 kudos

DLT cloudfiles trigger interval not working

I have the following streaming table definition using cloudfiles format and the pipelines.trigger.interval setting to reduce file discovery costs, but the query is triggering every 12 seconds instead of every 5 minutes. Is there another configuration I am ...

Data Engineering
autloader
cloudFiles
dlt
trigger
Latest Reply
Tharun-Kumar
Databricks Employee

@elifa Could you check for this message in the log file? INFO EnzymePlanner: Planning for flow: s3_data. According to the config pipelines.trigger.interval, the planning should happen once every 5 minutes.
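For reference, a minimal sketch of how pipelines.trigger.interval can be scoped to a single DLT streaming table; the table name matches the flow mentioned above, while the source path and format are hypothetical:

```python
# Hedged sketch only: source path and format are assumptions.
import dlt

@dlt.table(
    name="s3_data",
    spark_conf={"pipelines.trigger.interval": "5 minutes"},  # per-flow trigger interval
)
def s3_data():
    return (
        spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "json")   # assumed source format
        .load("s3://example-bucket/raw/")      # hypothetical path
    )
```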

2 More Replies
pinaki1
by New Contributor III
  • 3591 Views
  • 3 replies
  • 2 kudos

Databricks dashboard

How to download a chart directly from a Databricks dashboard (not a SQL dashboard)? The download option is not available there; a chart can only be downloaded from a notebook.

Latest Reply
Priyag1
Honored Contributor II

What exactly is your requirement?

2 More Replies
Zoumana
by New Contributor II
  • 21413 Views
  • 5 replies
  • 6 kudos

Resolved! How to get probability score for each prediction from mlflow

I trained my model and was able to get the batch prediction from that model as specified below. But I want to also get the probability scores for each prediction. Do you have any idea? Thank you! logged_model = path_to_model # Load model as a PyFuncMod...

Latest Reply
OndrejHavlicek
New Contributor III

Now you can log the model using this parameter: mlflow.sklearn.log_model(..., # the usual params pyfunc_predict_fn="predict_proba"), which will apparently return probabilities for the first class when using the model for inference (e.g. when...
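A minimal sketch of that pattern end to end, using a toy scikit-learn model; the model, data, and artifact path below are illustrative, not the thread's actual code:

```python
# Hedged sketch only: model, data, and artifact path are illustrative.
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=5, random_state=42)
model = LogisticRegression().fit(X, y)

with mlflow.start_run() as run:
    mlflow.sklearn.log_model(
        model,
        artifact_path="model",
        pyfunc_predict_fn="predict_proba",  # pyfunc predict() will call predict_proba
    )

# Load the model back as a generic PyFunc; predict() now returns class probabilities.
loaded = mlflow.pyfunc.load_model(f"runs:/{run.info.run_id}/model")
print(loaded.predict(X[:5]))
```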

4 More Replies
Chaitanya_Raju
by Honored Contributor
  • 6311 Views
  • 7 replies
  • 0 kudos
Latest Reply
Vartika
Databricks Employee

Hi @Ratna Chaitanya Raju Bandaru, just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best? If not, please tell us so we can help you. Thanks!

6 More Replies
CrisCampos
by New Contributor II
  • 5524 Views
  • 1 reply
  • 1 kudos

How to load a "pickle/joblib" file on Databricks

Hi Community, I am trying to load a joblib file on Databricks, but it doesn't seem to be working. I'm getting an error message: "Incompatible format detected". Any idea how to load this type of file on Databricks? Thanks!

Latest Reply
tapash-db
Databricks Employee

You can import the joblib/joblibspark packages to load joblib files.
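A minimal sketch of both approaches, assuming the file sits on a driver-accessible path (the paths below are hypothetical) and that the "Incompatible format detected" message came from reading the file with a Spark/Delta reader rather than with joblib itself:

```python
# Hedged sketch only: file paths are assumptions.
import joblib

# Load the artifact directly on the driver from a DBFS or UC volume path.
model = joblib.load("/dbfs/tmp/my_model.joblib")  # hypothetical path

# Optionally, use joblibspark to run joblib-parallel work (e.g. scikit-learn
# cross-validation) on the cluster instead of only the driver.
from joblib import parallel_backend
from joblibspark import register_spark

register_spark()
with parallel_backend("spark", n_jobs=4):
    pass  # e.g. cross_val_score(model, X, y) here
```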

FerArribas
by Contributor
  • 5539 Views
  • 3 replies
  • 0 kudos

Resolved! Azure Databricks - Difference between protecting the WEB UI with IP Access list or disabling public access?

Hi, I am thoroughly investigating the best security practices for accessing the Databricks web UI. I have doubts about the difference between protecting the web UI with (1) an IP Access list (https://learn.microsoft.com/en-us/azure/databricks/security/networ...

Latest Reply
Rik
New Contributor III

"In short, would it be the same to configure only the IP of the private endpoint in the IP access list vs disable public access?"The access list doesn't apply to private IPs, only to public IP (internet). Relevant part from the docs:"If you use Priva...

2 More Replies
mbhakta
by New Contributor II
  • 1677 Views
  • 1 reply
  • 0 kudos

Dashboard - get value from table on user click

I'm building a dashboard via Python notebook and trying to allow the end user to click a value on a table, and use the selected value in another query / panel. This somewhat works using widget dropdowns for a user to select which value, but I'd reall...

Latest Reply
Henrymartin
New Contributor II

@mbhakta wrote: I'm building a dashboard via Python notebook and trying to allow the end user to click a value on a table, and use the selected value in another query / panel. This somewhat works using widget dropdowns for a user to select which value...

dream
by Contributor
  • 10287 Views
  • 1 reply
  • 2 kudos

Comparing schemas of two dataframes

So I was comparing the schemas of two different dataframes using this code: >>> df1.schema == df2.schema Out: False But the thing is, both schemas are completely equal. When digging deeper I realized that some of the StructFields() that should have bee...

Latest Reply
Ajay-Pandey
Databricks MVP

Hi @dream, in this case you can go with dataframe.dtypes for comparing the schemas or datatypes of the two dataframes. Metadata stores information about column properties.
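A minimal sketch of why the two comparisons differ: df.schema equality also compares per-field metadata (and nullability), while df.dtypes only compares column names and type strings. The example dataframes are hypothetical:

```python
# Hedged sketch only: the example dataframes and metadata are illustrative.
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType

spark = SparkSession.builder.getOrCreate()

schema_with_meta = StructType(
    [StructField("name", StringType(), True, metadata={"comment": "customer name"})]
)
schema_plain = StructType([StructField("name", StringType(), True)])

df1 = spark.createDataFrame([("alice",)], schema_with_meta)
df2 = spark.createDataFrame([("bob",)], schema_plain)

print(df1.schema == df2.schema)  # False: field metadata differs
print(df1.dtypes == df2.dtypes)  # True: [('name', 'string')] on both sides
```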

PaulStuart
by New Contributor
  • 5244 Views
  • 1 reply
  • 1 kudos

Resolved! "Can't login to databricks socket is closed" when using vsCode Extension

Hello there. I am experiencing a problem using the Databricks Extension with Visual Studio Code, and I wonder if anyone else has experienced this. First, I have installed the Databricks CLI and configured some profiles using tokens. Those profiles ...

Latest Reply
nkls
New Contributor III

I finally solved it! I had the same error code as you. Running Databricks Extension v1.1.1, VS Code 1.79 on Windows 10. I'm behind a company proxy, and the main issue was that VS Code didn't have proxy support enabled by default. Adding this to my settings....

piterpan
by New Contributor III
  • 8614 Views
  • 8 replies
  • 11 kudos

Resolved! _sqldf not defined on Azure job cluster v12.2

Since yesterday we have errors in notebooks that were previously working: NameError: name '_sqldf' is not defined. We are on Azure Databricks, using a job pool, Driver: Standard_D4s_v5 · Workers: Standard_D4s_v5 · 1-6 workers ·...

Data Engineering
azure
Notebook
pyspark
Latest Reply
Tharun-Kumar
Databricks Employee

@piterpan This was a regression issue which impacted jobs where _sqldf was referenced and the notebooks weren't run interactively. Our Engineering team fixed this issue yesterday. Could you check whether you are still facing the issue?

7 More Replies
marianopenn
by New Contributor III
  • 19938 Views
  • 6 replies
  • 4 kudos

Resolved! [UDF_MAX_COUNT_EXCEEDED] Exceeded query-wide UDF limit of 5 UDFs

We are using DLT to ingest data into our Unity Catalog and then, in a separate job, we are reading and manipulating this data and then writing it to a table like: df.write.saveAsTable(name=target_table_path). We are getting an error which I cannot find ...

Data Engineering
data engineering
dlt
python
udf
Unity Catalog
Latest Reply
Tharun-Kumar
Databricks Employee

@AlexPrev You can navigate to the Advanced options in the cluster configuration and include this config in the Spark section.
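The exact config isn't shown in this snippet; as an assumption inferred from the [UDF_MAX_COUNT_EXCEEDED] error name, the cluster-level Spark config usually takes a shape like the sketch below (verify the key against the full error message in your job output before relying on it):

```python
# Hedged sketch only: the config key below is an assumption inferred from the
# error name, not confirmed by this thread. Cluster-level Spark configs go in
# the cluster's Advanced options -> Spark section, one "key value" pair per line:
#
#   spark.databricks.safespark.externalUDF.plan.limit 10
#
# From a notebook, check whether the setting is visible to the session:
print(spark.conf.get("spark.databricks.safespark.externalUDF.plan.limit", "not set"))
```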

5 More Replies
Atifdatabricks
by New Contributor II
  • 2128 Views
  • 2 replies
  • 1 kudos

Suspended - Databricks Certified Associate Developer for Apache Spark

During the middle of the exam I got suspended. It said it was due to my eye movement. I had the test on the left part of my monitor and the PDF (which was provided as a testing aid for this exam) on the right side. I was just moving my eyes left and right as I was using the PD...

Latest Reply
Atifdatabricks
New Contributor II

My request number is 00353935

1 More Replies
