Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Alexander1
by New Contributor III
  • 4139 Views
  • 4 replies
  • 0 kudos

Databricks JDBC 2.6.19 documentation

I am searching for the Databricks JDBC 2.6.19 documentation page. I can find release notes from the Databricks download page (https://databricks-bi-artifacts.s3.us-east-2.amazonaws.com/simbaspark-drivers/jdbc/2.6.19/docs/release-notes.txt) but on Mag...

Latest Reply
Alexander1
New Contributor III
  • 0 kudos

By the way what is still wild, is that the Simba docs say 2.6.16 does only support until Spark 2.4 while the release notes on Databricks download page say 2.6.16 already supports Spark 3.0. Strange that we get contradicting info from the actual driv...

3 More Replies
Jeff_Luecht
by New Contributor II
  • 4203 Views
  • 1 replies
  • 2 kudos

Restarting existing Community Edition clusters

I am new to Databricks Community Edition. I was following the quickstart guide and running through basic cluster management - create, start, etc. For whatever reason, I cannot restart an existing cluster. There is nothing in the cluster event logs or...

Latest Reply
Hubert-Dudek
Databricks MVP
  • 2 kudos

The free Community Edition is quite limited, so that may be the reason.

Daniel
by New Contributor III
  • 15268 Views
  • 11 replies
  • 6 kudos

Resolved! Autocomplete of parentheses, quotation marks, brackets and square brackets stopped working

Hello guys, can someone help me? Autocomplete of parentheses, quotation marks, brackets and square brackets stopped working in Python notebooks. How can I fix this? Daniel

Latest Reply
Daniel
New Contributor III
  • 6 kudos

@Piper Wilson​, @Werner Stinckens​, thank you so much for your help. I followed the suggestion from @Jose Gonzalez​ and now it works.

10 More Replies
Constantine
by Contributor III
  • 3807 Views
  • 2 replies
  • 4 kudos

Resolved! Generating Spark SQL query using Python

I have a Spark SQL notebook on DB where I have a SQL query like SELECT * FROM table_name WHERE condition_1 = 'fname' OR condition_1 = 'lname' OR condition_1 = 'mname' AND condition_2 = 'apple' AND condition_3 = 'orange'. There are a lot ...

Latest Reply
jose_gonzalez
Databricks Employee
  • 4 kudos

Hi @John Constantine​, I think you can also use arrays_overlap() for your OR statements; docs here

1 More Replies
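For readers hitting the same problem of long, hand-written OR chains, here is a minimal sketch of generating the SQL string in Python, as the question asks. The table name, column names and values are hypothetical stand-ins, not from the original thread; repeated OR equality tests on one column collapse into an IN list.

```python
# Sketch: build the WHERE clause programmatically instead of writing every
# OR by hand. All identifiers below are hypothetical examples.
def build_query(table, or_values, and_conditions):
    # Repeated "col = 'x' OR col = 'y'" tests collapse into an IN list.
    or_clause = "condition_1 IN ({})".format(
        ", ".join("'{}'".format(v) for v in or_values))
    and_clause = " AND ".join(
        "{} = '{}'".format(col, val) for col, val in and_conditions.items())
    return "SELECT * FROM {} WHERE ({}) AND {}".format(table, or_clause, and_clause)

query = build_query(
    "table_name",
    ["fname", "lname", "mname"],
    {"condition_2": "apple", "condition_3": "orange"},
)
```

The generated string can then be passed to spark.sql(query). Note this simple sketch does no quoting/escaping of values, so it is only safe for trusted inputs.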
Braxx
by Contributor II
  • 3631 Views
  • 4 replies
  • 5 kudos

Resolved! Conditionally create a dataframe

I would like to implement a simple logic: if Df1 is empty, return Df2, else newDf = Df1.union(Df2). It may happen that Df1 is empty and the output is simply: []. In that case I do not need the union. I have it like this but get an error when creating the datafra...

Latest Reply
cconnell
Contributor II
  • 5 kudos

Also try df.head(1).isEmpty

3 More Replies
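A small sketch of the conditional-union pattern discussed in this thread. It assumes Spark-like DataFrame objects exposing head(n) and union(); len(df.head(1)) == 0 is the cheap emptiness test suggested in the replies (cconnell's df.head(1).isEmpty is the Scala spelling).

```python
# Sketch: union Df2 onto Df1 only when Df1 actually has rows.
# Works with any object exposing Spark-style head(n) and union().
def union_non_empty(df1, df2):
    # head(1) returns at most one row; an empty list means df1 has no rows.
    if len(df1.head(1)) == 0:
        return df2
    return df1.union(df2)
```

On Databricks you would call it as newDf = union_non_empty(Df1, Df2); head(1) avoids a full count() just to check emptiness.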
Vaibhav1000
by New Contributor II
  • 5553 Views
  • 2 replies
  • 1 kudos
Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

@Vaibhav Gour​, it kinda depends on the case: if there are no workers available when your job starts, you get an error, as the cluster is unable to start so code cannot be executed. But this is not an autoscale issue. If you need to scale up, but for ...

1 More Replies
Braxx
by Contributor II
  • 11281 Views
  • 6 replies
  • 4 kudos

Resolved! Object of type bool_ is not JSON serializable

I am doing a conversion of a data frame to a nested dict/JSON. One of the columns, called "Problematic__c", is boolean type. For some reason json does not accept this data type, returning the error: "Object of type bool_ is not JSON serializable". I need this as...

Latest Reply
Braxx
Contributor II
  • 4 kudos

Thanks Dan, that makes sense!

5 More Replies
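For readers with the same error: the bool_ in the message is numpy.bool_, which Spark/pandas interop often hands back and which the standard json module refuses. A minimal sketch of one common fix, a default hook that converts numpy scalars to native Python types (the payload below is a made-up example, not the poster's data):

```python
import json
import numpy as np

# Sketch: numpy.bool_ is not JSON serializable; a "default" hook converts
# numpy scalar types to their native Python equivalents before encoding.
def np_default(obj):
    if isinstance(obj, np.generic):   # covers np.bool_, np.int64, np.float64, ...
        return obj.item()             # .item() returns the plain Python value
    raise TypeError(f"Not serializable: {type(obj)}")

payload = {"Problematic__c": np.bool_(True)}
as_json = json.dumps(payload, default=np_default)
```

An alternative is casting the column with bool(...) while building the dict, but the hook handles every numpy scalar in one place.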
Manoj
by Contributor II
  • 11699 Views
  • 4 replies
  • 8 kudos

Resolved! Is there a way to submit multiple queries to a Databricks SQL endpoint using the REST API?

Is there a way to submit multiple queries to a Databricks SQL endpoint using the REST API?

Latest Reply
BilalAslamDbrx
Databricks Employee
  • 8 kudos

@Manoj Kumar Rayalla​, DBSQL currently limits execution to 10 concurrent queries per cluster, so there could be some queuing with 30 concurrent queries. You may want to turn on multi-cluster load balancing to horizontally scale with 1 more cluster for...

3 More Replies
Nick_Hughes
by New Contributor III
  • 3069 Views
  • 3 replies
  • 3 kudos

Is there an alerting API please?

Is there an alerting API so that alerts can be source-controlled and automated, please? https://docs.databricks.com/sql/user/alerts/index.html

Latest Reply
Dan_Z
Databricks Employee
  • 3 kudos

Hello @Nick Hughes​, as of today we do not expose or document the API for these features. I think it will be a useful feature, so I created an internal feature request for it (DB-I-4289). If you (or any future readers) want more information on this f...

2 More Replies
William_Scardua
by Valued Contributor
  • 4700 Views
  • 6 replies
  • 2 kudos

How to avoid reprocessing old files without Delta?

Hi guys, consider this case: Company ACME (a hypothetical company). This company does not use Delta, but uses open source Spark to process raw data to .parquet. We have a 'sales' process which consists of receiving every hour a new dataset (.csv) within th...

Latest Reply
William_Scardua
Valued Contributor
  • 2 kudos

Hi @Jose Gonzalez​, I agree the best option is to use Auto Loader, but in some cases you don't have the Databricks platform and don't use Delta; in those cases you need to build a way to process the new raw files.

5 More Replies
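One way to build such a mechanism without Delta or Auto Loader is a simple ledger of already-processed file names that each hourly run consults and appends to. This is a minimal sketch under that assumption; the function name, paths and .csv names are all hypothetical.

```python
import os

# Sketch: keep a plain-text ledger of processed files so each hourly run
# only picks up files it has not seen before. Paths here are hypothetical.
def new_files(incoming, ledger_path):
    done = set()
    if os.path.exists(ledger_path):
        with open(ledger_path) as f:
            done = {line.strip() for line in f}
    fresh = [p for p in incoming if p not in done]
    # Record the freshly seen files so the next run skips them.
    with open(ledger_path, "a") as f:
        for p in fresh:
            f.write(p + "\n")
    return fresh
```

In a real pipeline you would list the landing directory (e.g. via boto3 or dbutils.fs.ls), pass the listing to new_files(), and only read the returned paths with Spark. The ledger should live on durable storage, and appending only after successful processing makes reruns safe.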
kaslan
by New Contributor II
  • 10498 Views
  • 5 replies
  • 0 kudos

How to filter files in Databricks Autoloader stream

I want to set up an S3 stream using Databricks Auto Loader. I have managed to set up the stream, but my S3 bucket contains different types of JSON files. I want to filter them out, preferably in the stream itself rather than using a filter operation. A...

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

According to the docs you linked, the glob filter on input-path only works on directories, not on the files themselves. So if you want to filter on certain files in the concerning dirs, you can include an additional filter through the pathGlobFilter o...

4 More Replies
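A minimal sketch of the pathGlobFilter approach from the reply above. The bucket path, glob pattern and schema location are hypothetical; the readStream call itself only runs on Databricks, so it is shown commented out.

```python
# Sketch: Auto Loader options with a file-level glob filter, per the reply
# above. All paths and the pattern are hypothetical examples.
autoloader_options = {
    "cloudFiles.format": "json",
    # Only files whose names match this glob enter the stream:
    "pathGlobFilter": "*_orders.json",
    "cloudFiles.schemaLocation": "s3://my-bucket/_schema/",
}

# On Databricks you would apply the options like this (not runnable locally):
# df = (spark.readStream
#         .format("cloudFiles")
#         .options(**autoloader_options)
#         .load("s3://my-bucket/landing/"))
```

Filtering at the source like this keeps the unwanted JSON files out of schema inference as well, which a post-hoc .filter() on the loaded stream would not.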
HamzaJosh
by New Contributor II
  • 17575 Views
  • 6 replies
  • 3 kudos

I want to use databricks workers to run a function in parallel on the worker nodes

I have a function making API calls. I want to run this function in parallel so I can use the workers in Databricks clusters to run it in parallel. I have tried with ThreadPoolExecutor() as executor: results = executor.map(getspeeddata, alist) to run m...

Latest Reply
HamzaJosh
New Contributor II
  • 3 kudos

You guys are not getting the point, I am making API calls in a function and want to store the results in a dataframe. I want multiple processes to run this task in parallel. How do I create a UDF and use it in a dataframe when the task is calling an ...

5 More Replies
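For readers after the same pattern: since the calls are I/O-bound, driver-side threads already give real parallelism, and the collected dicts convert directly into a DataFrame. A minimal sketch, where getspeeddata is a hypothetical stand-in for the poster's real API call:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical placeholder for the real API call from the thread.
def getspeeddata(item):
    return {"input": item, "speed": item * 2}

# Sketch: run the calls on a thread pool (fine for I/O-bound work) and
# collect plain dicts; executor.map preserves input order.
def fetch_all(items, max_workers=8):
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(getspeeddata, items))

rows = fetch_all([1, 2, 3])
# On Databricks: df = spark.createDataFrame(rows) turns the results into a DataFrame.
```

A UDF is the alternative when the inputs already live in a DataFrame, but threads avoid shipping credentials and sessions to executors for a modest number of calls.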
sarosh
by New Contributor
  • 20264 Views
  • 2 replies
  • 1 kudos

ModuleNotFoundError / SerializationError when executing over databricks-connect

I am running into the following error when I run a model fitting process over databricks-connect.It looks like worker nodes are unable to access modules from the project's parent directory. Note that the program runs successfully up to this point; n...

Latest Reply
Manjunath
Databricks Employee
  • 1 kudos

@Sarosh Ahmad​, could you try adding a zip of the module to addPyFile, like below: spark.sparkContext.addPyFile("src.zip")

1 More Replies
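A sketch of the suggestion above: zip the project's module directory, then ship the archive to the workers with addPyFile so they can import it. The "src" directory name and paths are hypothetical; the Spark call itself is shown commented out since it needs a live session.

```python
import os
import shutil

# Sketch: package a local module directory as a zip that addPyFile can ship
# to the workers. Directory names here are hypothetical.
def zip_module(module_dir, out_dir):
    # shutil.make_archive appends ".zip" and returns the archive's path.
    base = os.path.join(out_dir, os.path.basename(module_dir))
    return shutil.make_archive(base, "zip", module_dir)

# zip_path = zip_module("src", "/tmp")
# spark.sparkContext.addPyFile(zip_path)  # workers can now import the module
```

This mirrors why databricks-connect fails here: the driver sees the project directory on your machine, but the workers do not until the code is distributed to them.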