Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

FemiAnthony
by New Contributor III
  • 3787 Views
  • 6 replies
  • 5 kudos

Resolved! /dbfs is empty

Why does /dbfs seem to be empty in my Databricks cluster? If I run %sh ls /dbfs I get no output. I am looking for the databricks-datasets subdirectory, but I can't find it under /dbfs.

Latest Reply
FemiAnthony
New Contributor III
  • 5 kudos

Thanks @Prabakar Ammeappin​ 

5 More Replies
Sandesh87
by New Contributor III
  • 1351 Views
  • 3 replies
  • 2 kudos

Resolved! log error to cosmos db

Objective: Retrieve objects from an S3 bucket using a 'get' API call, write each retrieved object to Azure Data Lake, and in case of errors such as 404s (object not found), write the error message to Cosmos DB. "my_dataframe" consists of a column (s3Obje...

Latest Reply
User16763506477
Contributor III
  • 2 kudos

Hi @Sandesh Puligundla, the issue is that you are using the Spark context inside foreachPartition. You can create a DataFrame only on the Spark driver. A few Stack Overflow references: https://stackoverflow.com/questions/46964250/nullpointerexception-creatin...

2 More Replies
SEOCO
by New Contributor II
  • 2021 Views
  • 3 replies
  • 3 kudos

Passing parameters from DevOps Pipeline/release to Databricks Notebook

Hi, this is all a bit new to me. Does anybody have any idea how to pass a parameter to a Databricks notebook? I have a DevOps pipeline/release that moves my Databricks notebooks to the QA and Production environments. The only problem I am facing is th...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

@Mario Walle​ - If @Hubert Dudek​'s answer solved the issue, would you be happy to mark his answer as best so that it will be more visible to other members?

2 More Replies
Jeff_Luecht
by New Contributor II
  • 2526 Views
  • 2 replies
  • 4 kudos

Resolved! Restarting existing community edition clusters

I am new to Databricks Community Edition. I was following the quickstart guide and running through basic cluster management: create, start, etc. For whatever reason, I cannot restart an existing cluster. There is nothing in the cluster event logs or...

Latest Reply
Kaniz_Fatma
Community Manager
  • 4 kudos

Hi @Jeff Luecht, please refresh the event logs. You can clone your cluster. As a Community Edition user, your cluster will automatically terminate after an idle period of two hours. For more configuration options, please upgrade your Databricks subscri...

1 More Replies
Erik
by Valued Contributor II
  • 2132 Views
  • 6 replies
  • 2 kudos

Resolved! Does Z-ordering speed up reading of a single file?

Situation: we have one partition per date, and it just so happens that each partition ends up (after optimize) as *a single* 128 MB file. We partition on date and Z-order on userid, and our query is something like "find max value of column A where useri...

Latest Reply
-werners-
Esteemed Contributor III
  • 2 kudos

Z-Order will make sure that in case you need to read multiple files, these files are co-located. For a single file this does not matter, as a single file is always local to itself. If you are certain that your Spark program will only read a single file,...

5 More Replies
Alexander1
by New Contributor III
  • 1989 Views
  • 5 replies
  • 0 kudos

Databricks JDBC 2.6.19 documentation

I am searching for the Databricks JDBC 2.6.19 documentation page. I can find release notes from the Databricks download page (https://databricks-bi-artifacts.s3.us-east-2.amazonaws.com/simbaspark-drivers/jdbc/2.6.19/docs/release-notes.txt) but on Mag...

Latest Reply
Alexander1
New Contributor III
  • 0 kudos

By the way, what is still wild is that the Simba docs say 2.6.16 only supports up to Spark 2.4, while the release notes on the Databricks download page say 2.6.16 already supports Spark 3.0. Strange that we get contradicting info from the actual driv...

4 More Replies
Daniel
by New Contributor III
  • 5167 Views
  • 11 replies
  • 6 kudos

Resolved! Autocomplete parentheses, quotation marks, brackets and square stopped working

Hello guys, can someone help me? Autocomplete parentheses, quotation marks, brackets and square stopped working in Python notebooks. How can I fix this? Daniel

Latest Reply
Daniel
New Contributor III
  • 6 kudos

@Piper Wilson, @Werner Stinckens Thank you so much for your help. I followed @Jose Gonzalez's suggestion and now it works.

10 More Replies
Hubert-Dudek
by Esteemed Contributor III
  • 7596 Views
  • 5 replies
  • 17 kudos

Resolved! Optimize and Vacuum - which is the best order of operations?

Optimize -> Vacuum or Vacuum -> Optimize

Latest Reply
-werners-
Esteemed Contributor III
  • 17 kudos

I optimize first, as Delta Lake knows which files are relevant for the optimize. That way my optimized data is available faster. Then a vacuum. It seemed logical to me, but I might be wrong. I never actually thought about it.

4 More Replies
Constantine
by Contributor III
  • 1319 Views
  • 2 replies
  • 4 kudos

Resolved! Generating Spark SQL query using Python

I have a Spark SQL notebook on DB where I have a SQL query like SELECT * FROM table_name WHERE condition_1 = 'fname' OR condition_1 = 'lname' OR condition_1 = 'mname' AND condition_2 = 'apple' AND condition_3 = 'orange'. There are a lot ...

Latest Reply
jose_gonzalez
Moderator
  • 4 kudos

Hi @John Constantine, I think you can also use arrays_overlap() for your OR statements (docs here).
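
For readers who want to generate the query text itself in Python, here is a minimal sketch. It is not the accepted answer from the thread, and it uses IN lists rather than arrays_overlap(). The helper names (sql_literal, build_query) and the any_of / all_of parameters are made up for illustration. It also assumes the intent is "condition_1 is one of the listed values AND the other conditions hold"; note that in SQL, AND binds tighter than OR, so the query as written in the question groups differently.

```python
def sql_literal(value):
    """Wrap a string in single quotes; refuse characters that would need escaping."""
    if "'" in value or "\\" in value:
        raise ValueError(f"unsupported character in value: {value!r}")
    return f"'{value}'"


def build_query(table, any_of, all_of):
    """Build a SELECT whose WHERE clause ANDs together IN lists and equality filters.

    any_of: dict mapping a column to the list of values it may take (the OR group).
    all_of: dict mapping a column to the single value it must equal.
    """
    clauses = []
    for column, values in any_of.items():
        in_list = ", ".join(sql_literal(v) for v in values)
        clauses.append(f"{column} IN ({in_list})")
    for column, value in all_of.items():
        clauses.append(f"{column} = {sql_literal(value)}")
    return f"SELECT * FROM {table} WHERE " + " AND ".join(clauses)


query = build_query(
    "table_name",
    any_of={"condition_1": ["fname", "lname", "mname"]},
    all_of={"condition_2": "apple", "condition_3": "orange"},
)
print(query)
```

In a notebook the resulting string could then be passed to spark.sql(query). Table and column names are inserted as-is, so they should come from trusted code, not from user input.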

1 More Replies
Braxx
by Contributor II
  • 1554 Views
  • 5 replies
  • 5 kudos

Resolved! Conditionally create a dataframe

I would like to implement a simple logic: if Df1 is empty, return Df2, else newDf = Df1.union(Df2). It may happen that Df1 is empty and the output is simply: []. In that case I do not need the union. I have it like this but am getting an error when creating datafra...

Latest Reply
cconnell
Contributor II
  • 5 kudos

Also try df.head(1).isEmpty
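
Note that .isEmpty on the result of head(1) is Scala syntax. In PySpark, head(1) returns a plain Python list of rows, so the equivalent check is on its length. A minimal sketch, assuming both inputs are already DataFrames with compatible schemas (the function name union_unless_empty is made up for illustration):

```python
def union_unless_empty(df1, df2):
    """Return df2 alone when df1 has no rows, otherwise df1.union(df2).

    head(1) fetches at most one row, so the emptiness check avoids a full count().
    """
    if len(df1.head(1)) == 0:
        return df2
    return df1.union(df2)
```

Usage in a notebook would be new_df = union_unless_empty(df1, df2). If the source data is a bare empty list, as the question suggests, the check would have to happen on that list before the DataFrame is created.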

4 More Replies
Braxx
by Contributor II
  • 5098 Views
  • 7 replies
  • 5 kudos

Resolved! Object of type bool_ is not JSON serializable

I am converting a data frame to a nested dict/JSON. One of the columns, called "Problematic__c", is of boolean type. For some reason json does not accept this data type and returns the error: "Object of type bool_ is not JSON serializable". I need this as...

Latest Reply
Braxx
Contributor II
  • 5 kudos

Thanks Dan, that makes sense!
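
Dan's answer is not shown in this listing, so the following is only a sketch of one common workaround, not necessarily what was suggested in the thread. The error usually means the value is a NumPy scalar (numpy.bool_) rather than a Python bool. NumPy scalars expose an .item() method that returns the matching Python builtin, and json.dumps accepts a default callback for objects it cannot serialize. The function name to_builtin is made up for illustration.

```python
import json


def to_builtin(obj):
    """json.dumps fallback: unwrap objects exposing .item(), such as NumPy scalars."""
    if hasattr(obj, "item"):
        return obj.item()
    # Mirror the standard error for anything this fallback cannot handle.
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")
```

Usage would be json.dumps(record, default=to_builtin), where record is the nested dict built from the data frame. The callback is only invoked for values the encoder does not already understand, so regular Python types are unaffected.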

6 More Replies
Manoj
by Contributor II
  • 7389 Views
  • 4 replies
  • 8 kudos

Resolved! Is there a way to submit multiple queries to a Databricks SQL endpoint using the REST API?

Is there a way to submit multiple queries to a Databricks SQL endpoint using the REST API?

Latest Reply
BilalAslamDbrx
Honored Contributor III
  • 8 kudos

@Manoj Kumar Rayalla​  DBSQL currently limits execution to 10 concurrent queries per cluster so there could be some queuing with 30 concurrent queries. You may want to turn on multi-cluster load balancing to horizontally scale with 1 more cluster for...

3 More Replies
Nick_Hughes
by New Contributor III
  • 1393 Views
  • 3 replies
  • 3 kudos

Is there an alerting API please?

Is there an alerting API so that alerts can be source controlled and automated, please? https://docs.databricks.com/sql/user/alerts/index.html

Latest Reply
Dan_Z
Honored Contributor
  • 3 kudos

Hello @Nick Hughes​ , as of today we do not expose or document the API for these features. I think it will be a useful feature so I created an internal feature request for it (DB-I-4289). If you (or any future readers) want more information on this f...

2 More Replies
William_Scardua
by Valued Contributor
  • 1921 Views
  • 7 replies
  • 2 kudos

How to avoid reprocessing old files without Delta?

Hi guys, consider this case: Company ACME (a hypothetical company). This company does not use Delta, but uses open source Spark to process raw data into .parquet. We have a 'sales' process which consists of receiving a new dataset (.csv) every hour within th...

Latest Reply
William_Scardua
Valued Contributor
  • 2 kudos

Hi @Jose Gonzalez, I agree the best option is to use Auto Loader, but in some cases you don't have the Databricks platform and don't use Delta. In these cases you need to build a way to process the new raw files.
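
One way to build this by hand is to keep a small manifest of file names that were already processed and skip them on the next run. The sketch below is only an illustration under stated assumptions: it uses the local filesystem as a stand-in for the landing zone, the manifest is a JSON file, and all names (load_processed, find_new_files, mark_processed, the manifest path) are hypothetical. On object storage the listing step would use that storage's own listing API instead of pathlib.

```python
import json
from pathlib import Path


def load_processed(manifest):
    """Read the set of already-processed file names, or an empty set on the first run."""
    if manifest.exists():
        return set(json.loads(manifest.read_text()))
    return set()


def find_new_files(landing_dir, manifest):
    """List .csv files in landing_dir that are not yet recorded in the manifest."""
    done = load_processed(manifest)
    return sorted(p for p in landing_dir.glob("*.csv") if p.name not in done)


def mark_processed(manifest, files):
    """Record the given files in the manifest once they were processed successfully."""
    done = load_processed(manifest) | {p.name for p in files}
    manifest.write_text(json.dumps(sorted(done)))
```

A run would call find_new_files, hand the returned paths to the Spark job, and call mark_processed only after the write succeeds, so a failed run is retried on the same files.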

6 More Replies