Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

William_Scardua
by Valued Contributor
  • 11244 Views
  • 5 replies
  • 3 kudos

Resolved! Read just the new file?

Hi guys, how can I read just the new file in a batch process? Can you help me, please? Thank you

Latest Reply
Ryan_Chynoweth
Databricks Employee
  • 3 kudos

What type of file? Is the file stored in a storage account? Typically, you would read and write data with something like the following code:
# read a parquet file
df = spark.read.format("parquet").load("/path/to/file")
# write the data as a file
df...
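If the goal is to pick up only the files that arrived since the last batch run, one hand-rolled approach is to keep a small record of files already processed and load only the rest. This is a minimal sketch, assuming the files land in a local or mounted directory and that a JSON file is an acceptable place to keep state; `find_new_files`, `mark_processed`, and the state-file layout are hypothetical names for illustration, not a Databricks API.

```python
import json
from pathlib import Path


def _load_state(state_file):
    """Read the set of already-processed paths (empty on the first run)."""
    state_path = Path(state_file)
    if not state_path.exists():
        return set()
    return set(json.loads(state_path.read_text()))


def find_new_files(input_dir, state_file, pattern="*.parquet"):
    """List files in input_dir matching pattern that are not yet in the state file."""
    seen = _load_state(state_file)
    current = sorted(str(p) for p in Path(input_dir).glob(pattern))
    return [p for p in current if p not in seen]


def mark_processed(state_file, paths):
    """Add paths to the state file; call this only after the batch has succeeded."""
    seen = _load_state(state_file) | set(paths)
    Path(state_file).write_text(json.dumps(sorted(seen)))
```

In a batch job, the returned list could be read with something like `spark.read.format("parquet").load(new_files)` (skipping the read when the list is empty), followed by `mark_processed(state, new_files)` once the write succeeds. Recording after the write rather than before means a failed run simply picks up the same files again.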

Meaz10
by Databricks Partner
  • 3094 Views
  • 3 replies
  • 2 kudos

Resolved! Current DBR is not yet available to this notebook

Does anyone have an idea why I am getting this error: "The current DBR is not yet available to this notebook. Give it a second and try again!"

Latest Reply
Anonymous
Not applicable
  • 2 kudos

@Meysam az - Thank you for letting us know that the issue has been resolved and for the extra information.

SreedharVengala
by New Contributor III
  • 13638 Views
  • 2 replies
  • 1 kudos

Parsing deeply nested XML in Databricks

Hi guys, can someone point me to libraries to parse XML files in Databricks using Python / Scala? Any link to a blog or documentation will be helpful. I looked into https://docs.databricks.com/data/data-sources/xml.html. I want to parse XSDs; it seems this is exp...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

@Sreedhar Vengala - I heard back from the team. As you noted, the feature is still experimental and not supported at this time. I would like to assure you that the team is aware of this. I have no information about a time frame to make this a support...
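For small files that fit in driver memory, Python's standard-library `xml.etree.ElementTree` can flatten a deeply nested document without an extra library. This is a swapped-in technique, not the XML data source from the linked documentation, and it does no XSD validation; `flatten_xml` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def flatten_xml(element, path=None):
    """Flatten an ElementTree element into a {path: value} dict.

    Child tags are joined with '/', attributes appear as 'path/@name',
    and repeated sibling tags get a 1-based [n] suffix so they don't collide.
    Namespaced tags keep ElementTree's '{uri}tag' form.
    """
    path = path or element.tag
    flat = {f"{path}/@{name}": value for name, value in element.attrib.items()}
    text = (element.text or "").strip()
    if text:
        flat[path] = text
    # Count siblings per tag so only repeated tags get an index suffix.
    counts = Counter(child.tag for child in element)
    position = Counter()
    for child in element:
        if counts[child.tag] > 1:
            position[child.tag] += 1
            child_path = f"{path}/{child.tag}[{position[child.tag]}]"
        else:
            child_path = f"{path}/{child.tag}"
        flat.update(flatten_xml(child, child_path))
    return flat


sample = """<order id="7">
  <customer><name>Ana</name><address><city>Lima</city></address></customer>
  <item sku="A1"><qty>2</qty></item>
  <item sku="B2"><qty>1</qty></item>
</order>"""

for key, value in flatten_xml(ET.fromstring(sample)).items():
    print(key, "=", value)
```

The flat dict could then become one row per document; for large volumes, a distributed reader is the better fit.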

PraveenKumar188
by New Contributor
  • 4598 Views
  • 2 replies
  • 2 kudos

Resolved! Is it possible to mount multiple ADLS Gen2 storage paths in a single workspace?

Hello experts, we are looking into the feasibility of mounting more than one ADLS Gen2 storage account on a single Databricks workspace. Best regards, Praveen

Latest Reply
Erik
Valued Contributor III
  • 2 kudos

Yes, it's possible; we are doing it. Just mount them to different folders, as @Werner Stinckens is saying.
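A sketch of what "mount them to different folders" can look like: one mount point per storage account/container pair, checked for collisions. `plan_mounts` and the account/container names are hypothetical. The actual mount would happen in a Databricks notebook with `dbutils.fs.mount(source=..., mount_point=..., extra_configs=...)`, where `extra_configs` carries your workspace's authentication settings, which this sketch does not cover.

```python
def plan_mounts(targets, root="/mnt"):
    """Build one (source, mount_point) pair per (storage_account, container).

    Each pair gets its own folder under root, so several ADLS Gen2
    accounts can be mounted side by side in one workspace.
    """
    plans = []
    seen_points = set()
    for account, container in targets:
        source = f"abfss://{container}@{account}.dfs.core.windows.net/"
        mount_point = f"{root}/{account}/{container}"
        if mount_point in seen_points:
            raise ValueError(f"duplicate mount point: {mount_point}")
        seen_points.add(mount_point)
        plans.append((source, mount_point))
    return plans


# Hypothetical accounts; both have a "raw" container, but the folders differ.
for source, mount_point in plan_mounts([("salesdatalake", "raw"), ("hrdatalake", "raw")]):
    # In a Databricks notebook (auth configs assumed to be prepared separately):
    # dbutils.fs.mount(source=source, mount_point=mount_point, extra_configs=configs)
    print(source, "->", mount_point)
```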

User15787040559
by Databricks Employee
  • 2947 Views
  • 2 replies
  • 0 kudos

What subset of MySQL SQL syntax does Spark SQL support?

https://spark.apache.org/docs/latest/sql-ref-syntax.html

Latest Reply
brickster_2018
Databricks Employee
  • 0 kudos

Spark 3 has experimental support for ANSI SQL compliance. Read more here: https://spark.apache.org/docs/3.0.0/sql-ref-ansi-compliance.html

HafidzZulkifli
by New Contributor II
  • 19003 Views
  • 8 replies
  • 0 kudos

How to import data and apply multiline and charset UTF8 at the same time?

I'm running Spark 2.2.0 at the moment. Currently I'm facing an issue when importing data of Mexican origin, where the text can contain special characters and some columns span multiple lines. Ideally, this is the command I'd like to run: T_new_...
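The two requirements are independent: the bytes must be decoded with the right charset, and quoted fields containing line breaks must stay together. The driver-side sketch below shows both with Python's standard `csv` module on a small hypothetical sample, which can help confirm what the file actually contains; it is not the Spark reader. In Spark, the corresponding CSV reader options are `multiLine` and `encoding`, e.g. `spark.read.option("multiLine", True).option("encoding", "UTF-8").csv(path)`. Whether they behave well together on Spark 2.2.0 is worth testing on your data.

```python
import csv
import io


def read_multiline_csv(raw_bytes, encoding="utf-8"):
    """Decode raw CSV bytes and parse quoted fields that span lines."""
    text = raw_bytes.decode(encoding)
    # newline="" leaves line breaks untouched so the csv module can keep
    # a quoted field containing "\n" together as one value.
    reader = csv.reader(io.StringIO(text, newline=""))
    return list(reader)


# Hypothetical sample: accented text plus a quoted field spanning two lines.
sample = 'id,city,notes\n1,Ciudad de México,"línea uno\nlínea dos"\n2,Oaxaca,ñandú\n'.encode("utf-8")

for row in read_multiline_csv(sample):
    print(row)
```

Decoding the same bytes with the wrong charset (for example `"latin-1"`) still parses, but turns "México" into "MÃ©xico", which is a quick way to spot an encoding mismatch.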

Latest Reply
DianGermishuize
New Contributor II
  • 0 kudos

You could also potentially use the .withColumns() function on the DataFrame, and use the pyspark.sql.functions.encode function to convert the character set to the one you need. Convert the Character Set/Encoding of a String field in a PySpark DataFr...
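The encode/decode idea can be illustrated in plain Python for one common case: UTF-8 bytes that were read as ISO-8859-1 (Latin-1), which turns "México" into "MÃ©xico". Re-encoding the string with the wrong charset recovers the original bytes, which can then be decoded correctly. This sketch assumes that particular mix-up; `fix_mojibake` is a hypothetical helper, and the PySpark counterpart in the docstring nests `decode` around `encode` from `pyspark.sql.functions`, with a hypothetical column name.

```python
def fix_mojibake(text, wrong="latin-1", right="utf-8"):
    """Undo text that was decoded with the wrong charset.

    Re-encoding with the wrong charset recovers the original bytes,
    which are then decoded with the right one. Raises UnicodeEncodeError
    if text holds characters the wrong charset cannot represent.
    Roughly the PySpark equivalent (column name hypothetical):
        F.decode(F.encode(F.col("city"), "ISO-8859-1"), "UTF-8")
    """
    return text.encode(wrong).decode(right)


garbled = "México".encode("utf-8").decode("latin-1")
print(garbled, "->", fix_mojibake(garbled))
```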

AndreStarker
by New Contributor III
  • 3370 Views
  • 3 replies
  • 2 kudos

Certification status

I've passed the "Databricks Certified Associate Developer for Apache Spark 3.0 - Scala" certification exam on 7/17/2021. The Webassessor record says I should receive certification status from Databricks within a week. I have not received any communi...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

@Andre Starker - Congratulations!!!

subhaawsazure
by New Contributor II
  • 5960 Views
  • 2 replies
  • 1 kudos

Resolved! Instance was not reachable.

Instance was not reachable. This can be a transient networking issue. If the problem persists, this usually indicates a network environment misconfiguration. Please check your cloud provider configuration, and make sure that Databricks control plane...

Latest Reply
jose_gonzalez
Databricks Employee
  • 1 kudos

Hi, maybe there was a connectivity issue or an outage happening right at that time. We have a status page that you can check in case you see this error in the future: https://status.databricks.com/

prasadvaze
by Valued Contributor II
  • 4251 Views
  • 2 replies
  • 1 kudos

Delta Rust API (not REST)

@dennylee The Delta Rust API seems a good option to query a Delta table without spinning up a Spark cluster, so I am trying out this: https://databricks.com/blog/2020/12/22/natively-query-your-delta-lake-with-scala-java-and-python.html using a Python app "Read...

Latest Reply
prasadvaze
Valued Contributor II
  • 1 kudos

https://github.com/delta-io/delta-rs/issues/392: this issue is being actively worked on.

fymaterials_199
by New Contributor II
  • 2054 Views
  • 1 replies
  • 0 kudos

PySpark intermediate DataFrame consumes a lot of memory

I have PySpark code running on my local Mac, which has 6 cores and 16 GB. I run it in PyCharm for a first test. spark = ( SparkSession.builder.appName("loc") .master("local[2]") .config("spark.driver.bindAddress","localhost") .config("...

Latest Reply
fymaterials_199
New Contributor II
  • 0 kudos

Here is my input file: EID,EffectiveTime,OrderHistory,dummy_col,Period_Start_Date11,2019-04-19T02:50:42.6918667Z,"[{'Codes': [{'CodeSystem': 'sys_1', 'Code': '1-2'}], 'EffectiveDateTime': '2019-04-18T23:48:00Z', 'ComponentResults': [{'Codes': [{'CodeSy...

William_Scardua
by Valued Contributor
  • 4903 Views
  • 2 replies
  • 2 kudos

Resolved! Error/exception when reading a websocket with readStream

Hi guys, how are you? Can you help me? Here is my situation: when I try to read a websocket with readStream, I receive an unknown error exception, java.net.UnknownHostException. Here is my code: wssocket = spark\ .readStream\ .forma...

Latest Reply
Deepak_Bhutada
Databricks Employee
  • 2 kudos

It will definitely create a streaming object, so don't go by the wssocket.isStreaming = True piece: because of lazy evaluation, the streaming object is created without any issue. Now, coming to the issue, please put the IP directly; sometimes the sla...
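`java.net.UnknownHostException` means the hostname could not be resolved, so one way to follow the "put the IP directly" advice is to resolve the host once and substitute the address into the URL. This is a minimal sketch using Python's `socket.gethostbyname` (IPv4 only) and `urllib.parse`; `url_with_ip` is a hypothetical helper, and it assumes the machine running it can resolve the name. If it cannot, the same problem shows up here as `socket.gaierror`, which points at DNS or network configuration rather than the streaming code.

```python
import socket
from urllib.parse import urlsplit, urlunsplit


def url_with_ip(url):
    """Return url with its hostname swapped for a resolved IPv4 address.

    Raises socket.gaierror if the name cannot be resolved from this machine.
    Note: any user:password@ part of the URL is not carried over.
    """
    parts = urlsplit(url)
    ip = socket.gethostbyname(parts.hostname)
    netloc = f"{ip}:{parts.port}" if parts.port is not None else ip
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


# Hypothetical endpoint; replace with the host from your readStream options.
print(url_with_ip("ws://localhost:9000/feed"))
```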

jay_kum
by New Contributor III
  • 3065 Views
  • 2 replies
  • 0 kudos

Resolved! Unable to execute Self Learning Path codes

I am unable to execute the code examples given in the learning path. I understand it could be due to an access issue. How do I change the working directory to the user folder for creating/uploading/reading/writing, etc.? By default everything is on the driver node. Even...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@jay.kum - Fantastic! Thanks for letting us know.

saipujari_spark
by Databricks Employee
  • 10237 Views
  • 1 replies
  • 3 kudos

Resolved! How to restrict the number of tasks per executor?

In general, Spark executes one task per core. If we want to restrict the number of tasks submitted to an executor to get more memory per task, how can we achieve that?

Latest Reply
saipujari_spark
Databricks Employee
  • 3 kudos

We can use a config called "spark.task.cpus". This specifies the number of cores to allocate for each task. The default value is 1. If we specify, say, 2, fewer tasks will be assigned to each executor, because an executor runs at most (executor cores / spark.task.cpus) tasks at a time.
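The effect of `spark.task.cpus` is simple arithmetic: each executor runs at most `executor_cores // task_cpus` tasks concurrently, so the executor's memory is shared among fewer running tasks as the setting goes up. The helpers below only do that arithmetic (they do not query Spark), and the 8-core / 32 GB executor is a made-up example; the per-task memory figure is a rough even split, not Spark's exact memory-manager behavior. In a real job, the setting would go in the cluster's Spark config or be set before the SparkContext starts, e.g. `SparkSession.builder.config("spark.task.cpus", "2")`.

```python
def concurrent_tasks_per_executor(executor_cores, task_cpus=1):
    """Max tasks an executor runs at once: executor cores // spark.task.cpus."""
    if task_cpus < 1 or executor_cores < task_cpus:
        raise ValueError("need 1 <= task_cpus <= executor_cores")
    return executor_cores // task_cpus


def memory_per_task_gb(executor_memory_gb, executor_cores, task_cpus=1):
    """Rough even share of executor memory per concurrently running task."""
    return executor_memory_gb / concurrent_tasks_per_executor(executor_cores, task_cpus)


# Hypothetical executor: 8 cores, 32 GB.
for cpus in (1, 2, 4):
    slots = concurrent_tasks_per_executor(8, cpus)
    print(f"spark.task.cpus={cpus}: {slots} tasks, ~{memory_per_task_gb(32, 8, cpus):.0f} GB each")
```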

Geeya
by New Contributor II
  • 2672 Views
  • 1 replies
  • 0 kudos

After several iteration of filter and union, the data is bigger than spark.driver.maxResultSize

The process I use to build the model is: (1) filter the dataset and split it into two datasets, (2) fit a model based on the two datasets, (3) union the two datasets, (4) repeat steps 1-3. The problem is that after several iterations, the model fitting time becomes dramatically longer, and the...

Latest Reply
Ryan_Chynoweth
Databricks Employee
  • 0 kudos

I assume that you are using PySpark to train a model? It sounds like you are collecting data on the driver and likely need to increase the size limit (spark.driver.maxResultSize). Can you share any code?

jacek
by New Contributor II
  • 6154 Views
  • 4 replies
  • 1 kudos

Is there an option to have cell titles in the notebook's 'Table of contents'? If not, could you add one?

I prefer cell titles to separate %md cells. Having cell titles in the table of contents seems like quite a simple feature. Is it possible? If not, could you add one?

Latest Reply
Anonymous
Not applicable
  • 1 kudos

@jacek - I wanted to pop in and give you a status update. The team is aware of your request. I can't make any promises on when something may change, but we appreciate your idea and bringing this to our attention.
