Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

RajaLakshmanan
by New Contributor
  • 3415 Views
  • 2 replies
  • 1 kudos

Resolved! Spark StreamingQuery not processing all data from source directory

Hi, I have set up a streaming process that consumes files from an HDFS staging directory and writes them to a target location. The input directory continuously receives files from another process. Let's say the file producer produces 5 million records and sends them to the HDFS sta...

Latest Reply
User16763506586
Contributor
  • 1 kudos

If it helps, you can try running a left-anti join between the source and the sink to identify missing records, and then check whether each missing record matches the schema provided.
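For illustration, a hedged PySpark sketch of that check, assuming both source and sink can be read as DataFrames (the paths and the join key below are placeholders, not from the original thread):

```
# Hedged sketch: find records present in the source but missing from the sink.
# Paths and the join key are illustrative placeholders.
source_df = spark.read.format("parquet").load("/staging/input")   # hypothetical source
sink_df   = spark.read.format("parquet").load("/target/output")   # hypothetical sink

# left_anti keeps only the rows of source_df that have no match in sink_df
missing_df = source_df.join(sink_df, on=["record_id"], how="left_anti")
missing_df.show(truncate=False)
```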

1 More Replies
User15787040559
by Databricks Employee
  • 2222 Views
  • 1 replies
  • 1 kudos

How can I get Databricks notebooks to stop cutting off the explain plans?

(Since Spark 3.0) Dataset.queryExecution.debug.toFile will dump the full plan to a file, without concatenating the output as a fully materialized Java string in memory.
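If you are working from Python rather than Scala, a hedged sketch (an assumption using internal handles, not the documented API) is to pull the plan text through the DataFrame's JVM object and write it to DBFS yourself; unlike the Scala queryExecution.debug.toFile helper above, this does materialize the plan string on the driver:

```
# Hedged sketch: write a query plan to a file from PySpark.
# df is an existing DataFrame; the output path is a placeholder.
plan_text = df._jdf.queryExecution().toString()        # internal handle, subject to change
dbutils.fs.put("/tmp/full_plan.txt", plan_text, True)  # overwrite the file if it exists
```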

Latest Reply
dazfuller
Contributor III
  • 1 kudos

Notebooks really aren't the best method of viewing large files. Two methods you could employ are: save the file to DBFS and then use the Databricks CLI to download it, or use the web terminal. With the web terminal option you can do something like "cat my_lar...

YuvSaha
by New Contributor
  • 1086 Views
  • 1 replies
  • 0 kudos

Auto Loader for Shape Files?

Hello, as you can see from the link below, Auto Loader supports 7 file formats. I am dealing with geospatial shapefiles and I want to know whether Auto Loader can support shapefiles? Any help on this is greatly appreciated. avro: Avro file, binaryFile: Binary f...

Latest Reply
dbkent
Databricks Employee
  • 0 kudos

Hi @Yuv Saha, currently shapefiles are not a supported file type when using Auto Loader. Would you be willing to share more about your use case? I am the Product Manager responsible for Geospatial in Databricks, and I need help from customers like ...

delta_lake
by New Contributor
  • 5227 Views
  • 1 replies
  • 0 kudos

Delta Lake Python

I have set up a virtual environment inside my existing Hadoop cluster. Since the current cluster does not have Spark 3+, I installed delta-spark in the virtual environment. While trying to access HDFS, which is Kerberized, I am getting the below error...
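For reference, a hedged sketch of the standard way to bootstrap a Delta-enabled SparkSession with the delta-spark pip package; it does not address the Kerberos error itself, and the app name is a placeholder:

```
# Hedged sketch: create a SparkSession with Delta Lake support from the delta-spark pip package.
import pyspark
from delta import configure_spark_with_delta_pip

builder = (
    pyspark.sql.SparkSession.builder.appName("delta-venv-test")  # hypothetical app name
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
)
spark = configure_spark_with_delta_pip(builder).getOrCreate()
```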

Latest Reply
User16753724663
Valued Contributor
  • 0 kudos

Hi @Vasanth P, could you please confirm the DBR version that you are using in the cluster? There are multiple DBR versions available with different Spark and Scala versions, which can be utilised directly in a Databricks cluster.

EricOX
by New Contributor
  • 4644 Views
  • 1 replies
  • 3 kudos

Resolved! How to handle configuration for different environment (e.g. DEV, PROD)?

Is there a suggested way to handle different environment variables for the same code base? For example, the Data Lake mount point for DEV, UAT, and PROD. Any recommendations or best practices? Also, how should this be handled in Azure DevOps?

Latest Reply
-werners-
Esteemed Contributor III
  • 3 kudos

@Eric Yeung, you can put all your configuration parameters in a file (JSON, CONF, YAML, whatever you like) and read that file at the beginning of each program. I like to use ConfigFactory in Scala, for example. You only have to make sure the file c...
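A hedged Python equivalent of that idea, assuming one JSON file per environment (the file path, environment name, and keys are illustrative placeholders):

```
# Hedged sketch: load environment-specific settings from a JSON config file.
import json

env = "dev"  # could come from a job parameter or a notebook widget
with open(f"/dbfs/config/{env}.json") as f:   # hypothetical config location
    config = json.load(f)

mount_point = config["datalake_mount"]        # e.g. "/mnt/datalake-dev"
```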

DouglasLinder
by New Contributor III
  • 10175 Views
  • 4 replies
  • 1 kudos

Is it possible to pass configuration to a job on a high concurrency cluster?

On a regular cluster, you can use: ```spark.sparkContext._jsc.hadoopConfiguration().set(key, value)``` These values are then available on the executors through the Hadoop configuration. However, on a high concurrency cluster, attempting to do so results ...

Latest Reply
Ryan_Chynoweth
Esteemed Contributor
  • 1 kudos

I am not sure why you are getting that error on a high concurrency cluster, as I am able to set the configuration as you show above. Can you try the following code instead? ```sc._jsc.hadoopConfiguration().set(key, value)```

3 More Replies
Nazar
by New Contributor II
  • 5790 Views
  • 3 replies
  • 4 kudos

Resolved! Incremental write

Hi all, I have a daily Spark job that reads and joins 3-4 source tables and writes the DataFrame in parquet format. This DataFrame consists of 100+ columns. As this job runs daily, our deduplication logic identifies the latest record from each of the source t...
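As a hedged illustration of that kind of deduplication (the column names and output path below are placeholders, not from the original job), a window function can keep only the latest record per key before the write:

```
# Hedged sketch: keep the latest record per business key, then write as parquet.
from pyspark.sql import functions as F, Window

w = Window.partitionBy("business_key").orderBy(F.col("updated_at").desc())

latest_df = (joined_df                          # joined_df = result of joining the source tables
             .withColumn("rn", F.row_number().over(w))
             .filter(F.col("rn") == 1)          # keep only the most recent row per key
             .drop("rn"))

latest_df.write.mode("overwrite").parquet("/mnt/output/daily")  # hypothetical target path
```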

Latest Reply
Nazar
New Contributor II
  • 4 kudos

Thanks werners

2 More Replies
William_Scardua
by Valued Contributor
  • 8120 Views
  • 5 replies
  • 3 kudos

Resolved! Read just the new file?

Hi guys, how can I read just the new file in a batch process? Can you help me, please? Thank you.

Latest Reply
Ryan_Chynoweth
Esteemed Contributor
  • 3 kudos

What type of file? Is the file stored in a storage account? Typically, you would read and write data with something like the following code: # read a parquet file df = spark.read.format("parquet").load("/path/to/file")   # write the data as a file df...
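Beyond a plain batch read, one common Databricks pattern for picking up only files that arrived since the last run (a hedged sketch, not taken from this reply; all paths are placeholders) is Auto Loader with a one-time trigger and a checkpoint:

```
# Hedged sketch: Auto Loader processes only new files on each run, tracked via the checkpoint.
df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "parquet")
      .option("cloudFiles.schemaLocation", "/mnt/meta/schema")   # hypothetical schema location
      .load("/mnt/landing/"))                                    # hypothetical source folder

(df.writeStream
   .option("checkpointLocation", "/mnt/meta/checkpoint")         # hypothetical checkpoint path
   .trigger(once=True)                                           # run like a batch job, then stop
   .start("/mnt/bronze/"))                                       # hypothetical target folder
```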

4 More Replies
Meaz10
by New Contributor III
  • 1830 Views
  • 3 replies
  • 2 kudos

Resolved! Current DBR is not yet available to this notebook

Any one has an idea why i am getting this error:"The current DBR is not yet available to this notebook. Give it a second and try again!"

Latest Reply
Anonymous
Not applicable
  • 2 kudos

@Meysam az​ - Thank you for letting us know that the issue has been resolved and for the extra information.

2 More Replies
SreedharVengala
by New Contributor III
  • 11482 Views
  • 2 replies
  • 1 kudos

Parsing deeply nested XML in Databricks

Hi guys, can someone point me to libraries to parse XML files in Databricks using Python / Scala? Any link to blogs / documentation will be helpful. I looked into https://docs.databricks.com/data/data-sources/xml.html. I want to parse XSDs; it seems this is exp...
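For basic reading (as opposed to XSD validation), a hedged sketch of the spark-xml usage that the linked page describes, assuming the library is installed on the cluster; the row tag and path are placeholders:

```
# Hedged sketch: read an XML file with the spark-xml library.
df = (spark.read
      .format("xml")                  # provided by the spark-xml package
      .option("rowTag", "record")     # hypothetical element that marks one row
      .load("/mnt/raw/data.xml"))     # hypothetical path

df.printSchema()                      # nested elements become struct/array columns
```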

Latest Reply
Anonymous
Not applicable
  • 1 kudos

@Sreedhar Vengala - I heard back from the team. As you noted, the feature is still experimental and not supported at this time. I would like to assure you that the team is aware of this. I have no information about a time frame to make this a support...

1 More Replies
PraveenKumar188
by New Contributor
  • 3189 Views
  • 2 replies
  • 2 kudos

Resolved! Is it possible to mount multiple ADLS Gen2 storage paths in a single workspace?

Hello experts, we are looking at the feasibility of mounting more than one ADLS Gen2 storage account on a single Databricks workspace. Best regards, Praveen

Latest Reply
Erik
Valued Contributor III
  • 2 kudos

Yes, it's possible; we are doing it. Just mount them to different folders like @Werner Stinckens is saying.
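A hedged sketch of what mounting two containers to different folders can look like; the account, container, scope, and secret names below are illustrative placeholders:

```
# Hedged sketch: mount two ADLS Gen2 containers at separate mount points using OAuth.
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type":
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<application-id>",
    "fs.azure.account.oauth2.client.secret":
        dbutils.secrets.get(scope="my-scope", key="sp-secret"),   # hypothetical secret
    "fs.azure.account.oauth2.client.endpoint":
        "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

for container, mount_point in [("raw", "/mnt/raw"), ("curated", "/mnt/curated")]:
    dbutils.fs.mount(
        source=f"abfss://{container}@<storage-account>.dfs.core.windows.net/",
        mount_point=mount_point,
        extra_configs=configs,
    )
```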

1 More Replies
User15787040559
by Databricks Employee
  • 1624 Views
  • 2 replies
  • 0 kudos

What subset of MySQL SQL syntax do we support in Spark SQL?

https://spark.apache.org/docs/latest/sql-ref-syntax.html

Latest Reply
brickster_2018
Databricks Employee
  • 0 kudos

Spark 3 has experimental support for ANSI compliance. Read more here: https://spark.apache.org/docs/3.0.0/sql-ref-ansi-compliance.html
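A hedged one-liner for trying that mode out in a notebook session:

```
# Hedged sketch: enable Spark 3's experimental ANSI compliance mode for this session.
spark.conf.set("spark.sql.ansi.enabled", "true")

# Under ANSI mode, invalid casts raise errors instead of silently returning NULL.
spark.sql("SELECT CAST('not a number' AS INT)").show()  # now fails rather than returning NULL
```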

1 More Replies
HafidzZulkifli
by New Contributor II
  • 14603 Views
  • 8 replies
  • 0 kudos

How to import data and apply multiline and charset UTF8 at the same time?

I'm running Spark 2.2.0 at the moment. Currently I'm facing an issue when importing data of Mexican origin, where the fields can contain special characters and multiline values in certain columns. Ideally, this is the command I'd like to run: T_new_...
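A hedged sketch of the read options usually involved in this situation; the path, header setting, and encoding choice below are assumptions, not from the original post:

```
# Hedged sketch: read a CSV with multiline values and an explicit character encoding.
df = (spark.read
      .option("header", "true")
      .option("multiLine", "true")      # allow quoted fields to span multiple lines
      .option("charset", "UTF-8")       # "encoding" is an equivalent option name
      .csv("/mnt/raw/mexico_data.csv")) # hypothetical path
```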

Latest Reply
DianGermishuize
New Contributor II
  • 0 kudos

You could also potentially use the .withColumns() function on the DataFrame, and use the pyspark.sql.functions.encode function to convert the character set to the one you need. Convert the Character Set/Encoding of a String field in a PySpark DataFr...

7 More Replies
AndreStarker
by New Contributor III
  • 2091 Views
  • 3 replies
  • 2 kudos

Certification status

I passed the "Databricks Certified Associate Developer for Apache Spark 3.0 - Scala" certification exam on 7/17/2021. The Webassessor record says I should receive certification status from Databricks within a week. I have not received any communi...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

@Andre Starker​ - Congratulations!!!

2 More Replies
subhaawsazure
by New Contributor II
  • 3872 Views
  • 2 replies
  • 1 kudos

Resolved! Instance was not reachable.

Instance was not reachable. This can be a transient networking issue. If the problem persists, this usually indicates a network environment misconfiguration. Please check your cloud provider configuration, and make sure that Databricks control plane...

Latest Reply
jose_gonzalez
Databricks Employee
  • 1 kudos

Hi, maybe there was a connectivity issue or an outage happening right at that time. We have a status page that you can check in case you see this error in the future. Please go to https://status.databricks.com/

1 More Replies
