Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

SreedharVengala
by New Contributor III
  • 11296 Views
  • 2 replies
  • 1 kudos

Parsing deeply nested XML in Databricks

Hi guys, can someone point me to libraries for parsing XML files in Databricks using Python / Scala? Any link to a blog / documentation would be helpful. I looked into https://docs.databricks.com/data/data-sources/xml.html. I want to parse XSDs; it seems this is exp...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

@Sreedhar Vengala - I heard back from the team. As you noted, the feature is still experimental and not supported at this time. I would like to assure you that the team is aware of this. I have no information about a time frame to make this a support...

1 More Replies
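For reference, a minimal PySpark sketch of the usual approach with the spark-xml library (com.databricks:spark-xml, which must be attached to the cluster); the path, rowTag, and column names are placeholders, not from the thread:

```python
from pyspark.sql.functions import col, explode

# Hedged sketch: read XML with spark-xml (library must be attached to the
# cluster, e.g. Maven coordinate com.databricks:spark-xml_2.12:0.14.0).
df = (
    spark.read.format("xml")
    .option("rowTag", "record")     # XML element that maps to one row
    .load("/mnt/raw/input.xml")     # placeholder path
)

# Deeply nested elements arrive as structs and arrays; explode to flatten.
flat = df.select(col("record_id"), explode(col("items.item")).alias("item"))
```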
PraveenKumar188
by New Contributor
  • 3081 Views
  • 2 replies
  • 2 kudos

Resolved! Is it possible to mount multiple ADLS Gen2 storage paths in a single workspace?

Hello experts, we are looking into the feasibility of mounting more than one ADLS Gen2 storage account on a single Databricks workspace. Best regards, Praveen

Latest Reply
Erik
Valued Contributor III
  • 2 kudos

Yes, it's possible; we are doing it. Just mount them to different folders, as @Werner Stinckens is saying.

1 More Replies
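A hedged sketch of mounting two containers at different mount points, per the accepted answer; all angle-bracket values, the secret scope, and container names are placeholders:

```python
# Hedged sketch: mount two ADLS Gen2 containers at different folders.
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type":
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<application-id>",
    "fs.azure.account.oauth2.client.secret": dbutils.secrets.get("<scope>", "<key>"),
    "fs.azure.account.oauth2.client.endpoint":
        "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

# Each storage path gets its own mount point under /mnt.
for container, mount_point in [("raw", "/mnt/raw"), ("curated", "/mnt/curated")]:
    dbutils.fs.mount(
        source=f"abfss://{container}@<storage-account>.dfs.core.windows.net/",
        mount_point=mount_point,
        extra_configs=configs,
    )
```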
User15787040559
by Databricks Employee
  • 1534 Views
  • 2 replies
  • 0 kudos

What subset of MySQL SQL syntax does Spark SQL support?

https://spark.apache.org/docs/latest/sql-ref-syntax.html

Latest Reply
brickster_2018
Databricks Employee
  • 0 kudos

Spark 3 has experimental support for ANSI compliance. Read more here: https://spark.apache.org/docs/3.0.0/sql-ref-ansi-compliance.html

1 More Replies
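A short sketch of enabling the ANSI mode the reply refers to, assuming a Spark 3 session:

```python
# Hedged sketch: enable Spark 3's experimental ANSI mode for this session.
spark.conf.set("spark.sql.ansi.enabled", "true")

# With ANSI mode on, invalid operations raise errors instead of silently
# returning NULL; for example, this cast now fails at runtime:
spark.sql("SELECT CAST('not a number' AS INT)").show()
```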
HafidzZulkifli
by New Contributor II
  • 14366 Views
  • 8 replies
  • 0 kudos

How to import data and apply multiline and charset UTF8 at the same time?

I'm running Spark 2.2.0 at the moment. Currently I'm facing an issue when importing data of Mexican origin, where the data can contain special characters and multi-line values in certain columns. Ideally, this is the command I'd like to run: T_new_...

Latest Reply
DianGermishuize
New Contributor II
  • 0 kudos

You could also potentially use the .withColumns() function on the DataFrame, and use the pyspark.sql.functions.encode function to convert the character set to the one you need. Convert the Character Set/Encoding of a String field in a PySpark DataFr...

7 More Replies
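A hedged sketch of combining multiLine with an explicit charset in a single CSV read, which is the combination the question asks about; the path and quoting options are placeholders:

```python
# Hedged sketch: read a CSV with multi-line fields and an explicit encoding.
df = (
    spark.read.format("csv")
    .option("header", "true")
    .option("multiLine", "true")   # tolerate newlines inside quoted fields
    .option("charset", "UTF-8")    # synonym of "encoding" for the CSV reader
    .option("quote", '"')
    .option("escape", '"')
    .load("/mnt/raw/source_data.csv")  # placeholder path
)
```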
AndreStarker
by New Contributor III
  • 2014 Views
  • 3 replies
  • 2 kudos

Certification status

I passed the "Databricks Certified Associate Developer for Apache Spark 3.0 - Scala" certification exam on 7/17/2021. The Webassessor record says I should receive certification status from Databricks within a week. I have not received any communi...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

@Andre Starker - Congratulations!!!

2 More Replies
subhaawsazure
by New Contributor II
  • 3739 Views
  • 2 replies
  • 1 kudos

Resolved! Instance was not reachable.

Instance was not reachable. This can be a transient networking issue. If the problem persists, this usually indicates a network environment misconfiguration. Please check your cloud provider configuration, and make sure that Databricks control plane...

Latest Reply
jose_gonzalez
Databricks Employee
  • 1 kudos

Hi, maybe there was a connectivity issue or an outage happening right at that time. We have a status page that you can check in case you see this error in the future. Please go to https://status.databricks.com/

1 More Replies
prasadvaze
by Valued Contributor II
  • 2415 Views
  • 2 replies
  • 1 kudos

Delta Rust API (not REST)

@dennylee The Delta Rust API seems a good option for querying a Delta table without spinning up a Spark cluster, so I am trying it out - https://databricks.com/blog/2020/12/22/natively-query-your-delta-lake-with-scala-java-and-python.html - using a Python app. "Read...

Latest Reply
prasadvaze
Valued Contributor II
  • 1 kudos

This issue is being actively worked on: https://github.com/delta-io/delta-rs/issues/392

1 More Replies
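A hedged sketch using the delta-rs Python bindings (pip install deltalake), which query a Delta table without a Spark cluster; the table path is a placeholder:

```python
from deltalake import DeltaTable

# Hedged sketch: open a Delta table directly; local, s3://, or abfss:// paths
# all work depending on the storage backend configured.
dt = DeltaTable("/data/events")          # placeholder path
print(dt.version())                      # current table version
pdf = dt.to_pyarrow_table().to_pandas()  # materialize for local analysis
```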
fymaterials_199
by New Contributor II
  • 1131 Views
  • 1 reply
  • 0 kudos

PySpark intermediate DataFrame consumes too much memory

I have PySpark code running on my local Mac, which has 6 cores and 16 GB of RAM. I run it in PyCharm as a first test: spark = ( SparkSession.builder.appName("loc") .master("local[2]") .config("spark.driver.bindAddress", "localhost") .config("...

Latest Reply
fymaterials_199
New Contributor II
  • 0 kudos

Here is my input file:
EID,EffectiveTime,OrderHistory,dummy_col,Period_Start_Date
11,2019-04-19T02:50:42.6918667Z,"[{'Codes': [{'CodeSystem': 'sys_1', 'Code': '1-2'}], 'EffectiveDateTime': '2019-04-18T23:48:00Z', 'ComponentResults': [{'Codes': [{'CodeSy...

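A hedged reconstruction of that kind of local session with explicit memory caps; the values are illustrative for a 16 GB machine, since in local mode the driver JVM holds everything and intermediate DataFrames compete for that one heap:

```python
from pyspark.sql import SparkSession

# Hedged sketch: size the local driver explicitly instead of relying on the
# 1g default. Values below are illustrative, not from the thread.
spark = (
    SparkSession.builder.appName("loc")
    .master("local[2]")
    .config("spark.driver.bindAddress", "localhost")
    .config("spark.driver.memory", "8g")
    .config("spark.driver.maxResultSize", "2g")
    .getOrCreate()
)
```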
William_Scardua
by Valued Contributor
  • 3219 Views
  • 2 replies
  • 2 kudos

Resolved! Error/exception when reading a websocket with readStream

Hi guys, how are you? Can you help me? This is my situation: when I try to read a websocket with readStream I receive an unknown host exception, java.net.UnknownHostException. That's my code: wssocket = spark\ .readStream\ .forma...

Latest Reply
Deepak_Bhutada
Contributor III
  • 2 kudos

It will definitely create a streaming object, so don't go by the wssocket.isStreaming = True piece; it will create the streaming object without any issue, since evaluation is lazy. Now, coming to the issue: please put the IP directly, sometimes the sla...

1 More Replies
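For context, a hedged sketch of Spark's built-in TCP "socket" source (Spark has no native websocket source); using an IP literal, as the reply suggests, sidesteps the DNS lookup behind java.net.UnknownHostException. Host and port are placeholders:

```python
# Hedged sketch: Structured Streaming's TCP socket source.
wssocket = (
    spark.readStream.format("socket")
    .option("host", "192.168.1.10")  # placeholder IP; avoids DNS resolution
    .option("port", 9999)
    .load()
)
print(wssocket.isStreaming)  # True even before any query actually runs
```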
jay_kum
by New Contributor III
  • 1875 Views
  • 2 replies
  • 0 kudos

Resolved! Unable to execute Self Learning Path codes

I am unable to execute the code examples given in the learning path. I understand it could be due to an access issue. How do I change the working directory to the User folder for creating/uploading/reading/writing, etc.? By default everything is on the driver node. Even...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@jay.kum - Fantastic! Thanks for letting us know.

1 More Replies
saipujari_spark
by Databricks Employee
  • 7358 Views
  • 1 reply
  • 3 kudos

Resolved! How to restrict the number of tasks per executor?

In general, one task per core is how Spark executes tasks. If we want to restrict the number of tasks submitted to the executor, to get a higher memory-to-task ratio, how can we achieve that?

Latest Reply
saipujari_spark
Databricks Employee
  • 3 kudos

We can use a config called "spark.task.cpus". This specifies the number of cores to allocate for each task. The default value is 1. If we specify, say, 2, fewer tasks will be assigned to each executor.

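A brief sketch of the setting; note it must be in place before the context starts (on Databricks it is typically set in the cluster's Spark config), so the fresh-session form below is illustrative:

```python
from pyspark.sql import SparkSession

# Hedged sketch: with 8-core executors, spark.task.cpus=2 caps concurrency at
# 4 tasks per executor instead of 8, doubling the memory available per task.
spark = (
    SparkSession.builder
    .config("spark.task.cpus", "2")
    .getOrCreate()
)
```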
Geeya
by New Contributor II
  • 1600 Views
  • 1 reply
  • 0 kudos

After several iterations of filter and union, the data is bigger than spark.driver.maxResultSize

The process I use to build the model is: (1) filter the dataset and split it into two datasets, (2) fit the model based on the two datasets, (3) union the two datasets, (4) repeat steps 1-3. The problem is that after several iterations, the model fitting time becomes dramatically longer, and the...

Latest Reply
Ryan_Chynoweth
Esteemed Contributor
  • 0 kudos

I assume that you are using PySpark to train a model? It sounds like you are collecting data on the driver and likely need to increase spark.driver.maxResultSize. Can you share any code?

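One common pattern worth noting here, as a hedged sketch rather than a confirmed fix for this thread: each filter/union pass grows the logical plan, which can inflate both fit time and driver memory, and periodic checkpointing truncates that lineage. The data and loop below are synthetic:

```python
from pyspark.sql import functions as F

spark.sparkContext.setCheckpointDir("/tmp/checkpoints")

df = spark.range(1_000_000).withColumn("x", F.rand())
for i in range(20):
    part_a = df.filter(F.col("x") > 0.5)
    part_b = df.filter(F.col("x") <= 0.5)
    # ... fit the model on part_a / part_b here ...
    df = part_a.union(part_b)
    if i % 5 == 4:
        df = df.checkpoint()  # materialize and cut the accumulated plan
```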
jacek
by New Contributor II
  • 4476 Views
  • 4 replies
  • 1 kudos

Is there an option to show cell titles in the notebook's 'Table of contents' view? If not, could you add one?

I like cell titles more than a separate %md cell. Having cell titles in the table of contents seems like quite a simple feature. Is it possible? If not, could you add one?

Latest Reply
Anonymous
Not applicable
  • 1 kudos

@jacek - I wanted to pop in and give you a status update. The team is aware of your request. I can't make any promises on when something may change, but we appreciate your idea and bringing this to our attention.

3 More Replies
gbrueckl
by Contributor II
  • 10357 Views
  • 2 replies
  • 4 kudos

Resolved! dbutils.notebook.run with multiselect parameter

I have a notebook which has a parameter defined as dbutils.widgets.multiselect("my_param", "ALL", ["ALL", "A", "B", "C"]) and I would like to pass this parameter when calling the notebook via dbutils.notebook.run(). However, I tried passing it as a Pyth...

Latest Reply
gbrueckl
Contributor II
  • 4 kudos

You are right, this actually works fine. I just realized I had two multiselect parameters in my tests, and only changing one of them still resulted in the same error message for the second one. I ended up writing a function that parses whatever comes in...

1 More Replies
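A hedged sketch of the working pattern: multiselect widget values travel as comma-separated strings, so pass one string rather than a Python list. The child notebook path is a placeholder:

```python
# Hedged sketch: pass a multiselect value to dbutils.notebook.run.
result = dbutils.notebook.run(
    "/path/to/child_notebook",   # placeholder path
    600,                         # timeout in seconds
    {"my_param": "A,B"},         # comma-separated string, not ["A", "B"]
)

# Inside the child notebook, split the value back into a list:
selected = dbutils.widgets.get("my_param").split(",")
```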
tarente
by New Contributor III
  • 1452 Views
  • 2 replies
  • 3 kudos

Resolved! How to create a CSV, using a Scala notebook, that has " in some columns?

In a project we use Azure Databricks to create CSV files to be loaded into ThoughtSpot. Below is a sample of the code I use to write the file: val fileRepartition = 1 val fileFormat = "csv" val fileSaveMode = "overwrite" var fileOptions = Map ( ...

Latest Reply
tarente
New Contributor III
  • 3 kudos

Hi Shan, thanks for the link. I now know more options for creating different CSV files. I have not yet solved the problem, but that is related to the destination application (ThoughtSpot) not being able to load the data in the CSV file correctly. Rega...

1 More Replies
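For reference, a hedged PySpark sketch of the quoting options involved (the thread's code is Scala, but the option names are identical in both APIs); df is an existing DataFrame and the output path is a placeholder:

```python
# Hedged sketch: write a CSV whose fields may contain " characters.
(
    df.repartition(1)
    .write.format("csv")
    .mode("overwrite")
    .option("header", "true")
    .option("quote", '"')
    .option("escape", '"')       # emit embedded " as "" per the CSV convention
    .option("quoteAll", "true")  # quote every field; some loaders expect this
    .save("/mnt/out/thoughtspot")  # placeholder path
)
```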
