Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Meaz10
by New Contributor III
  • 2192 Views
  • 3 replies
  • 2 kudos

Resolved! Current DBR is not yet available to this notebook

Anyone have an idea why I am getting this error: "The current DBR is not yet available to this notebook. Give it a second and try again!"

Latest Reply
Anonymous
Not applicable

@Meysam az - Thank you for letting us know that the issue has been resolved and for the extra information.

  • 2 kudos
2 More Replies
SreedharVengala
by New Contributor III
  • 12198 Views
  • 2 replies
  • 1 kudos

Parsing deeply nested XML in Databricks

Hi guys, can someone point me to libraries to parse XML files in Databricks using Python / Scala? Any link to a blog / documentation will be helpful. Looked into https://docs.databricks.com/data/data-sources/xml.html. Want to parse XSDs, seems this is exp...

Latest Reply
Anonymous
Not applicable

@Sreedhar Vengala - I heard back from the team. As you noted, the feature is still experimental and not supported at this time. I would like to assure you that the team is aware of this. I have no information about a time frame to make this a support...

  • 1 kudos
1 More Replies
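
A minimal sketch of what this can look like with the spark-xml library (assuming the com.databricks:spark-xml package is attached to the cluster; the rowTag value, path, and column names are illustrative only):

    # Read each <book> element as one row; nested elements arrive as structs/arrays.
    df = (spark.read.format("xml")
          .option("rowTag", "book")
          .load("/mnt/raw/books.xml"))

    # Flatten a nested array with explode (hypothetical column names).
    from pyspark.sql.functions import col, explode
    flat = df.select(col("title"), explode(col("chapters.chapter")).alias("chapter"))

Note that, per the reply above, XSD support in particular was still experimental at the time of this thread.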
PraveenKumar188
by New Contributor
  • 3637 Views
  • 2 replies
  • 2 kudos

Resolved! Is it possible to mount multiple ADLS Gen2 storage paths in a single workspace?

Hello experts, we are looking into the feasibility of mounting more than one ADLS Gen2 storage account on a single Databricks workspace. Best regards, Praveen

Latest Reply
Erik
Valued Contributor III

Yes, it's possible; we are doing it. Just mount them to different folders, as @Werner Stinckens says.

  • 2 kudos
1 More Replies
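
A minimal sketch of mounting two containers at separate mount points (assuming OAuth with a service principal; the account, container, scope, and secret names are placeholders):

    configs = {
        "fs.azure.account.auth.type": "OAuth",
        "fs.azure.account.oauth.provider.type":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        "fs.azure.account.oauth2.client.id": "<application-id>",
        "fs.azure.account.oauth2.client.secret":
            dbutils.secrets.get(scope="my-scope", key="sp-secret"),
        "fs.azure.account.oauth2.client.endpoint":
            "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
    }

    # Each storage path gets its own mount point under /mnt.
    for container, mount_point in [("raw", "/mnt/raw"), ("curated", "/mnt/curated")]:
        dbutils.fs.mount(
            source=f"abfss://{container}@<storage-account>.dfs.core.windows.net/",
            mount_point=mount_point,
            extra_configs=configs,
        )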
User15787040559
by Databricks Employee
  • 1951 Views
  • 2 replies
  • 0 kudos

What subset of MySQL SQL syntax does Spark SQL support?

https://spark.apache.org/docs/latest/sql-ref-syntax.html

Latest Reply
brickster_2018
Databricks Employee

Spark 3 has experimental support for ANSI SQL compliance. Read more here: https://spark.apache.org/docs/3.0.0/sql-ref-ansi-compliance.html

  • 0 kudos
1 More Replies
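
A quick sketch of toggling that experimental mode (the config key is spark.sql.ansi.enabled; behavior shown is for Spark 3.0):

    # Enable ANSI compliance mode for the current session.
    spark.conf.set("spark.sql.ansi.enabled", "true")

    # Under ANSI mode, invalid casts and numeric overflow raise errors
    # instead of silently producing NULL:
    spark.sql("SELECT CAST('abc' AS INT)").show()  # raises under ANSI mode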
HafidzZulkifli
by New Contributor II
  • 15828 Views
  • 8 replies
  • 0 kudos

How to import data and apply multiLine and UTF-8 charset options at the same time?

I'm running Spark 2.2.0 at the moment. Currently I'm facing an issue when importing data of Mexican origin, where fields can contain special characters and multiline values in certain columns. Ideally, this is the command I'd like to run: T_new_...

Latest Reply
DianGermishuize
New Contributor II

You could also potentially use the .withColumns() function on the DataFrame, and use the pyspark.sql.functions.encode function to convert the character set to the one you need. Convert the Character Set/Encoding of a String field in a PySpark DataFr...

  • 0 kudos
7 More Replies
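
A minimal sketch of combining the two reader options (file path, header setting, and column name are assumptions; both options exist on the CSV DataFrameReader):

    df = (spark.read
          .option("header", "true")
          .option("multiLine", "true")   # allow quoted fields that span lines
          .option("charset", "UTF-8")    # alias of "encoding"
          .csv("/mnt/raw/mexican_data.csv"))

    # If your Spark version ignores the charset in multiLine mode, the
    # encode/decode route from the reply can repair mojibake afterwards
    # (hypothetical column name):
    from pyspark.sql import functions as F
    fixed = df.withColumn("comments",
                          F.decode(F.encode("comments", "ISO-8859-1"), "UTF-8"))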
AndreStarker
by New Contributor III
  • 2318 Views
  • 3 replies
  • 2 kudos

Certification status

I passed the "Databricks Certified Associate Developer for Apache Spark 3.0 - Scala" certification exam on 7/17/2021. The Webassessor record says I should receive certification status from Databricks within a week. I have not received any communi...

Latest Reply
Anonymous
Not applicable

@Andre Starker - Congratulations!!!

  • 2 kudos
2 More Replies
subhaawsazure
by New Contributor II
  • 4425 Views
  • 2 replies
  • 1 kudos

Resolved! Instance was not reachable.

Instance was not reachable. This can be a transient networking issue. If the problem persists, this usually indicates a network environment misconfiguration. Please check your cloud provider configuration, and make sure that Databricks control plane...

Latest Reply
jose_gonzalez
Databricks Employee

Hi, maybe there was a connectivity issue or an outage happening right at that time. We have a status page that you can check in case you see this error in the future. Please go to https://status.databricks.com/

  • 1 kudos
1 More Replies
prasadvaze
by Valued Contributor II
  • 2794 Views
  • 2 replies
  • 1 kudos

Delta Rust API (not REST)

@dennylee The Delta Rust API seems like a good option to query a Delta table without spinning up a Spark cluster, so I am trying it out - https://databricks.com/blog/2020/12/22/natively-query-your-delta-lake-with-scala-java-and-python.html - using a Python app. "Read...

Latest Reply
prasadvaze
Valued Contributor II

This issue is being actively worked on: https://github.com/delta-io/delta-rs/issues/392

  • 1 kudos
1 More Replies
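
For reference, a minimal sketch of reading a Delta table through the delta-rs Python binding (pip install deltalake; the table path is a placeholder):

    from deltalake import DeltaTable

    dt = DeltaTable("/data/tables/events")     # no Spark cluster required
    print(dt.version())                        # current snapshot version
    print(dt.files())                          # underlying Parquet data files
    pdf = dt.to_pyarrow_table().to_pandas()    # query via Arrow / pandas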
fymaterials_199
by New Contributor II
  • 1439 Views
  • 1 reply
  • 0 kudos

PySpark intermediate DataFrame consumes too much memory

I have PySpark code running on my local Mac, which has 6 cores and 16 GB. I run it in PyCharm as a first test. spark = ( SparkSession.builder.appName("loc") .master("local[2]") .config("spark.driver.bindAddress", "localhost") .config("...

Latest Reply
fymaterials_199
New Contributor II

Here is my input file:
EID,EffectiveTime,OrderHistory,dummy_col,Period_Start_Date
11,2019-04-19T02:50:42.6918667Z,"[{'Codes': [{'CodeSystem': 'sys_1', 'Code': '1-2'}], 'EffectiveDateTime': '2019-04-18T23:48:00Z', 'ComponentResults': [{'Codes': [{'CodeSy...

  • 0 kudos
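
When reproducing issues like this locally, one common approach (a sketch under assumed settings, not the poster's full code) is to cap driver memory explicitly and persist only the intermediate DataFrames you actually reuse:

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder.appName("loc")
             .master("local[2]")
             .config("spark.driver.bindAddress", "localhost")
             .config("spark.driver.memory", "8g")  # assumption: leaves headroom on a 16 GB machine
             .getOrCreate())

    intermediate = spark.read.csv("/path/to/input.csv", header=True)  # placeholder path
    intermediate.persist()
    # ... downstream transformations reusing `intermediate` ...
    intermediate.unpersist()  # release memory when done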
William_Scardua
by Valued Contributor
  • 3768 Views
  • 2 replies
  • 2 kudos

Resolved! Error/exception when reading a websocket with readStream

Hi guys, how are you? Can you help me? That's my situation: when I try to read a websocket with readStream I receive an unknown host exception, java.net.UnknownHostException. That's my code: wssocket = spark\ .readStream\ .forma...

Latest Reply
Deepak_Bhutada
Contributor III

It will definitely create a streaming object, so don't go by the wssocket.isStreaming = True piece. It will create the streaming object without any issue because of lazy evaluation. Now, coming to the issue: please put the IP directly; sometimes the sla...

  • 2 kudos
1 More Replies
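
Worth noting: Spark's built-in streaming source is a plain TCP socket (format "socket"), not a websocket. A minimal sketch with the IP given directly, as the reply suggests (host and port are placeholders):

    wssocket = (spark.readStream
                .format("socket")
                .option("host", "192.168.1.10")  # an IP instead of a hostname avoids DNS lookups
                .option("port", 9999)
                .load())

    query = wssocket.writeStream.format("console").start()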
jay_kum
by New Contributor III
  • 2234 Views
  • 2 replies
  • 0 kudos

Resolved! Unable to execute Self Learning Path codes

I am unable to execute the code examples given in the learning path. I understand it could be due to an access issue. How do I change the working directory to the User folder for creating/uploading/reading/writing etc.? By default everything is on the driver node. Even...

Latest Reply
Anonymous
Not applicable

@jay.kum - Fantastic! Thanks for letting us know.

  • 0 kudos
1 More Replies
saipujari_spark
by Databricks Employee
  • 8448 Views
  • 1 reply
  • 3 kudos

Resolved! How to restrict the number of tasks per executor?

In general, one task per core is how Spark executes tasks. If we want to restrict the number of tasks submitted to the executor to get a higher memory-to-task ratio, how can we achieve that?

Latest Reply
saipujari_spark
Databricks Employee

We can use a config called "spark.task.cpus". This specifies the number of cores to allocate for each task. The default value is 1. If we specify, say, 2, fewer tasks will be assigned to the executor.

  • 3 kudos
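
A short sketch of the effect (the executor core count is an assumption): with 8 cores per executor, spark.task.cpus=2 caps each executor at 4 concurrent tasks, so each task gets a larger share of executor memory.

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .config("spark.executor.cores", "8")  # assumed executor size
             .config("spark.task.cpus", "2")       # cores reserved per task
             .getOrCreate())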
Geeya
by New Contributor II
  • 1921 Views
  • 1 reply
  • 0 kudos

After several iterations of filter and union, the data is bigger than spark.driver.maxResultSize

My process to build the model is: (1) filter the dataset and split it into two datasets, (2) fit the model based on the two datasets, (3) union the two datasets, then repeat steps 1-3. The problem is that after several iterations, the model fitting time becomes dramatically longer, and the...

Latest Reply
Ryan_Chynoweth
Esteemed Contributor

I assume that you are using PySpark to train a model? It sounds like you are collecting data on the driver and likely need to increase the size. Can you share any code?

  • 0 kudos
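
One common remedy for this pattern (an illustrative sketch, not the poster's code) is to checkpoint inside the loop: each filter/union round grows the logical plan, and checkpointing materializes the data and truncates that lineage. All names below are hypothetical:

    spark.sparkContext.setCheckpointDir("/tmp/checkpoints")  # placeholder dir

    df = initial_df  # hypothetical starting DataFrame
    for i in range(n_iterations):
        left = df.filter(df.score > threshold)    # hypothetical split condition
        right = df.filter(df.score <= threshold)
        # ... fit models on left/right ...
        df = left.union(right).checkpoint()       # materialize and cut lineage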
jacek
by New Contributor II
  • 5096 Views
  • 4 replies
  • 1 kudos

Is there an option to have cell titles in the notebook 'Table of contents' view? If not, could you add one?

I like a cell title more than a separate %md cell. Having cell titles in the table of contents seems like quite a simple feature. Is it possible? If not, could you add one?

Latest Reply
Anonymous
Not applicable

@jacek - I wanted to pop in and give you a status update. The team is aware of your request. I can't make any promises on when something may change, but we appreciate your idea and bringing this to our attention.

  • 1 kudos
3 More Replies
gbrueckl
by Contributor II
  • 11868 Views
  • 2 replies
  • 4 kudos

Resolved! dbutils.notebook.run with multiselect parameter

I have a notebook which has a parameter defined as dbutils.widgets.multiselect("my_param", "ALL", ["ALL", "A", "B", "C"]) and I would like to pass this parameter when calling the notebook via dbutils.notebook.run(). However, I tried passing it as a Pyth...

Latest Reply
gbrueckl
Contributor II

You are right, this actually works fine. I just realized I had two multiselect parameters in my tests, and only changing one of them still resulted in the same error message for the second one. I ended up writing a function that parses whatever comes in...

  • 4 kudos
1 More Replies
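
A minimal sketch of the working pattern (notebook path and timeout are placeholders): a multiselect widget's value travels as one comma-separated string, so pass it that way and split it in the called notebook.

    result = dbutils.notebook.run(
        "/Shared/my_notebook", 600,
        {"my_param": "A,B"},  # selections joined with commas
    )

    # Inside the called notebook:
    selected = dbutils.widgets.get("my_param").split(",")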
