Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

Sandesh87
by New Contributor III
  • 1972 Views
  • 3 replies
  • 2 kudos

Resolved! log error to cosmos db

Objective: Retrieve objects from an S3 bucket using a 'get' API call, write the retrieved object to Azure Data Lake, and in case of errors such as 404s (object not found) write the error message to Cosmos DB. "my_dataframe" consists of a column (s3Obje...

Latest Reply
User16763506477
Contributor III
  • 2 kudos

Hi @Sandesh Puligundla​  issue is that you are using spark context inside foreachpartition. You can create a dataframe only on the spark driver. Few stack overflow references https://stackoverflow.com/questions/46964250/nullpointerexception-creatin...

  • 2 kudos
2 More Replies
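The reply above points out that the SparkSession and DataFrame APIs are driver-only, so they cannot be used inside foreachPartition. A minimal sketch of the suggested pattern, using a plain per-partition client instead; `CosmosWriter` here is a hypothetical stand-in for a real Cosmos DB SDK client:

```python
# Inside foreachPartition, avoid SparkSession/DataFrame calls (driver-only) and
# write to the sink with a plain client. CosmosWriter is a stand-in for the SDK.
class CosmosWriter:
    def __init__(self):
        self.written = []

    def upsert(self, doc):
        # stand-in for an SDK upsert call against an error-log container
        self.written.append(doc)

def process_partition(keys, writer):
    """Fetch each object; on a simulated 404, log the error document instead."""
    for key in keys:
        if key.startswith("missing/"):  # simulated 404 from the 'get' call
            writer.upsert({"key": key, "error": "404 object not found"})
        # else: fetch from S3 and write to the data lake (omitted here)

# On a real cluster this would be driven by something like:
#   df.rdd.foreachPartition(lambda it: process_partition(it, CosmosWriter()))
writer = CosmosWriter()
process_partition(["data/a.json", "missing/b.json"], writer)
```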
UM
by New Contributor II
  • 2484 Views
  • 2 replies
  • 4 kudos

Resolved! Identifying the right tools for the job

Hi all, thank you for taking the time to read my post. By way of background, my team and I have been prototyping an ML model that we would like to push into the production and deployment phase. We have been prototyping on Jupyter Notebooks bu...

Latest Reply
Dan_Z
Databricks Employee
  • 4 kudos

For production model serving, why not just use MLflow Model Serving? You just code it up/import it with the notebooks, then Log it using MLflow, then Register it with the MLflow Registry, then there is a nice UI to serve it using Model Serving. It wi...

  • 4 kudos
1 More Replies
Mohit_m
by Valued Contributor II
  • 1211 Views
  • 1 reply
  • 4 kudos

Enabling of Task Orchestration feature in Jobs via API as well

Databricks supports the ability to orchestrate multiple tasks within a job. You must enable this feature in the admin console. Once enabled, this feature cannot be disabled. To enable orch...

Latest Reply
Prabakar
Databricks Employee
  • 4 kudos

@Mohit Miglani​ this will be really helpful for those who prefer the CLI/API over the UI.

  • 4 kudos
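For readers who, as the reply suggests, prefer the API over the admin console: a hypothetical sketch of what the request could look like against the workspace-conf endpoint. The setting key name below is an assumption for illustration only; check the workspace-conf API documentation for the exact key before using it.

```python
import json

# Hypothetical sketch: toggling a workspace setting over the REST API instead
# of the admin console. The host placeholder and the setting key
# "enableTasksInJobs" are assumptions, not confirmed names.
host = "https://<workspace-url>"  # placeholder, not a real workspace
endpoint = f"{host}/api/2.0/workspace-conf"
payload = json.dumps({"enableTasksInJobs": "true"})  # hypothetical key

# With the `requests` library this would be sent as:
#   requests.patch(endpoint,
#                  headers={"Authorization": f"Bearer {token}"},
#                  data=payload)
```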
SEOCO
by New Contributor II
  • 2900 Views
  • 3 replies
  • 3 kudos

Passing parameters from DevOps Pipeline/release to DataBricks Notebook

Hi, this is all a bit new to me. Does anybody have any idea how to pass a parameter to a Databricks notebook? I have a DevOps pipeline/release that moves my Databricks notebooks towards the QA and Production environments. The only problem I am facing is th...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

@Mario Walle​ - If @Hubert Dudek​'s answer solved the issue, would you be happy to mark his answer as best so that it will be more visible to other members?

  • 3 kudos
2 More Replies
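One common pattern for the question in this thread: the notebook declares a widget, and the pipeline passes a value through the Jobs API `notebook_params` field when triggering a run. The job id and parameter name below are illustrative, not taken from the thread:

```python
import json

# In the notebook, the parameter is read via a widget:
#   dbutils.widgets.text("environment", "dev")
#   env = dbutils.widgets.get("environment")

# In the DevOps release, the run-now request body could then look like this
# (job_id and the "environment" parameter are illustrative):
run_request = {
    "job_id": 123,
    "notebook_params": {"environment": "qa"},
}
body = json.dumps(run_request)
```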
Anonymous
by Not applicable
  • 1911 Views
  • 0 replies
  • 0 kudos

On-prem DNS - entries to be added

Hello, I have a question about the DNS addresses Databricks uses. In our network configuration, we are using custom VNet injection with no public IPs, and are required to use on-premises corporate DNS. Therefore we would like to add the necessary entries to on-...

Erik
by Valued Contributor III
  • 3439 Views
  • 4 replies
  • 2 kudos

Resolved! Does Z-ordering speed up reading of a single file?

Situation: we have one partition per date, and it just so happens that each partition ends up (after optimize) as *a single* 128 MB file. We partition on date, and Z-order on userid, and our query is something like "find max value of column A where useri...

Latest Reply
-werners-
Esteemed Contributor III
  • 2 kudos

Z-Order will make sure that in case you need to read multiple files, these files are co-located. For a single file this does not matter, as a single file is always local to itself. If you are certain that your Spark program will only read a single file,...

  • 2 kudos
3 More Replies
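The point in the accepted reply can be made concrete with a toy model of file-level data skipping: Delta keeps per-file min/max statistics, so a Z-ordered layout lets the reader skip whole files, but with a single file there is nothing to skip. File names and ranges below are illustrative:

```python
# Toy model of min/max file pruning. With several files, the userid predicate
# can skip files whose range cannot match; with one file, nothing is skipped.
files = [
    {"path": "part-0", "min_userid": 0,   "max_userid": 499},
    {"path": "part-1", "min_userid": 500, "max_userid": 999},
]

def files_to_read(files, userid):
    """Keep only files whose min/max range could contain the userid."""
    return [f["path"] for f in files
            if f["min_userid"] <= userid <= f["max_userid"]]

multi = files_to_read(files, 700)       # two files: part-0 is skipped
single = files_to_read(files[:1], 300)  # one file: it must be read anyway
```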
Alexander1
by New Contributor III
  • 2854 Views
  • 4 replies
  • 0 kudos

Databricks JDBC 2.6.19 documentation

I am searching for the Databricks JDBC 2.6.19 documentation page. I can find release notes from the Databricks download page (https://databricks-bi-artifacts.s3.us-east-2.amazonaws.com/simbaspark-drivers/jdbc/2.6.19/docs/release-notes.txt) but on Mag...

Latest Reply
Alexander1
New Contributor III
  • 0 kudos

By the way, what is still wild is that the Simba docs say 2.6.16 only supports up to Spark 2.4, while the release notes on the Databricks download page say 2.6.16 already supports Spark 3.0. Strange that we get contradictory info from the actual driv...

  • 0 kudos
3 More Replies
Jeff_Luecht
by New Contributor II
  • 3223 Views
  • 1 reply
  • 2 kudos

Restarting existing Community Edition clusters

I am new to Databricks Community Edition. I was following the quickstart guide and running through basic cluster management - create, start, etc. For whatever reason, I cannot restart an existing cluster. There is nothing in the cluster event logs or...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 2 kudos

The free Community Edition is quite limited, so that may be the reason.

  • 2 kudos
Daniel
by New Contributor III
  • 8442 Views
  • 11 replies
  • 6 kudos

Resolved! Autocomplete of parentheses, quotation marks, brackets and square brackets stopped working

Hello guys, can someone help me? Autocomplete of parentheses, quotation marks, brackets and square brackets stopped working in Python notebooks. How can I fix this? Daniel

Latest Reply
Daniel
New Contributor III
  • 6 kudos

@Piper Wilson​, @Werner Stinckens​ Thank you so much for your help. I followed @Jose Gonzalez​'s suggestion and now it works.

  • 6 kudos
10 More Replies
Constantine
by Contributor III
  • 1875 Views
  • 2 replies
  • 4 kudos

Resolved! Generating Spark SQL query using Python

I have a Spark SQL notebook on DB where I have a SQL query like SELECT * FROM table_name WHERE condition_1 = 'fname' OR condition_1 = 'lname' OR condition_1 = 'mname' AND condition_2 = 'apple' AND condition_3 = 'orange'. There are a lot ...

Latest Reply
jose_gonzalez
Databricks Employee
  • 4 kudos

Hi @John Constantine​, I think you can also use arrays_overlap() for your OR statements; docs here.

  • 4 kudos
1 More Replies
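Since the question is about generating the repeated OR conditions from Python, a minimal sketch of building the query string from a list. Note one subtlety worth checking in the original query: in SQL, AND binds tighter than OR, so the OR group usually needs parentheses to mean "any of these names, AND both other conditions":

```python
# Generate the OR block from a list of values, wrapped in parentheses so the
# AND conditions apply to the whole group (SQL: AND binds tighter than OR).
names = ["fname", "lname", "mname"]
or_clause = " OR ".join(f"condition_1 = '{n}'" for n in names)
query = (
    "SELECT * FROM table_name "
    f"WHERE ({or_clause}) "
    "AND condition_2 = 'apple' AND condition_3 = 'orange'"
)
```

For untrusted values, parameterized queries are preferable to string formatting; the sketch above assumes the values are trusted literals as in the thread.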
Braxx
by Contributor II
  • 2434 Views
  • 4 replies
  • 5 kudos

Resolved! Conditionally create a dataframe

I would like to implement a simple logic: if Df1 is empty, return Df2; else newDf = Df1.union(Df2). It may happen that Df1 is empty and the output is simply: []. In that case I do not need the union. I have it like this but am getting an error when creating the datafra...

Latest Reply
cconnell
Contributor II
  • 5 kudos

Also try df.head(1).isEmpty

  • 5 kudos
3 More Replies
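The control flow asked about can be sketched as below, with plain lists standing in for DataFrames so the example runs anywhere; in PySpark the emptiness check would be something like `len(df1.head(1)) == 0` (as the reply suggests) and the combine step `df1.union(df2)`:

```python
# Lists stand in for DataFrames: return df2 when df1 is empty, else the union.
def combine(df1, df2):
    if len(df1) == 0:    # stand-in for the DataFrame emptiness check
        return df2
    return df1 + df2     # stand-in for df1.union(df2)

a = combine([], [1, 2])   # empty df1 -> just df2
b = combine([0], [1, 2])  # non-empty df1 -> the union
```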
Vaibhav1000
by New Contributor II
  • 3687 Views
  • 2 replies
  • 1 kudos
Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

@Vaibhav Gour​, it kind of depends on the case: if there are no workers available when your job starts, you get an error, as the cluster is unable to start and so code cannot be executed. But this is not an autoscale issue. If you need to scale up, but for ...

  • 1 kudos
1 More Replies
Braxx
by Contributor II
  • 7347 Views
  • 6 replies
  • 4 kudos

Resolved! Object of type bool_ is not JSON serializable

I am doing a conversion of a data frame to a nested dict/JSON. One of the columns, called "Problematic__c", is boolean type. For some reason json does not accept this data type, returning the error: "Object of type bool_ is not JSON serializable". I need this as...

Latest Reply
Braxx
Contributor II
  • 4 kudos

Thanks Dan, that makes sense!

  • 4 kudos
5 More Replies
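The error in this thread typically comes from numpy scalar types such as `numpy.bool_`, which the stdlib `json` encoder does not recognize. numpy scalars expose `.item()`, which returns the native Python value, so a `default` hook can convert them. The `NpBoolStandIn` class below is a stand-in so this sketch runs without numpy installed:

```python
import json

# Stand-in for numpy.bool_ (hypothetical, for illustration only): like numpy
# scalars, it is not JSON-serializable directly but provides .item().
class NpBoolStandIn:
    def __init__(self, value):
        self._value = bool(value)

    def item(self):
        return self._value

def to_native(obj):
    """json.dumps `default` hook: convert numpy-like scalars via .item()."""
    if hasattr(obj, "item"):
        return obj.item()
    raise TypeError(f"Not serializable: {type(obj)!r}")

doc = {"Problematic__c": NpBoolStandIn(True)}
encoded = json.dumps(doc, default=to_native)
```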
Manoj
by Contributor II
  • 9498 Views
  • 4 replies
  • 8 kudos

Resolved! Is there a way to submit multiple queries to a Databricks SQL endpoint using the REST API?

Is there a way to submit multiple queries to a Databricks SQL endpoint using the REST API?

Latest Reply
BilalAslamDbrx
Databricks Employee
  • 8 kudos

@Manoj Kumar Rayalla​  DBSQL currently limits execution to 10 concurrent queries per cluster so there could be some queuing with 30 concurrent queries. You may want to turn on multi-cluster load balancing to horizontally scale with 1 more cluster for...

  • 8 kudos
3 More Replies
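A back-of-the-envelope sketch of the queuing described in the reply: with the quoted limit of 10 concurrent queries per cluster, 30 simultaneous queries on one cluster run in roughly ceil(30/10) = 3 waves, with 20 queued at first; adding a second cluster via multi-cluster load balancing halves that. The numbers restate the thread; actual scheduling behavior will differ:

```python
import math

# Rough capacity math using the concurrency limit quoted in the thread.
limit_per_cluster = 10
submitted = 30

waves_one_cluster = math.ceil(submitted / limit_per_cluster)
queued_initially = max(0, submitted - limit_per_cluster)

# With multi-cluster load balancing adding one more cluster:
waves_two_clusters = math.ceil(submitted / (2 * limit_per_cluster))
```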
