Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

by DouglasLinder, New Contributor III
  • 9608 Views
  • 5 replies
  • 1 kudos

Is it possible to pass configuration to a job on high concurrency cluster?

On a regular cluster, you can use ```spark.sparkContext._jsc.hadoopConfiguration().set(key, value)```. These values are then available on the executors via the Hadoop configuration. However, on a high concurrency cluster, attempting to do so results ...

Latest Reply
Ryan_Chynoweth
Esteemed Contributor
  • 1 kudos

I am not sure why you are getting that error on a high concurrency cluster, as I am able to set the configuration as you show above. Can you try the following code instead? sc._jsc.hadoopConfiguration().set(key, value)
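For reference, a minimal runnable sketch of the call suggested in this reply; the key name below is hypothetical and only there for illustration.

```python
# Minimal sketch of setting and reading back a Hadoop configuration value
# from the driver; "fs.my.custom.key" is a hypothetical key.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

# Set a value on the underlying Hadoop configuration.
sc._jsc.hadoopConfiguration().set("fs.my.custom.key", "some-value")

# Read it back to confirm it was applied.
print(sc._jsc.hadoopConfiguration().get("fs.my.custom.key"))
```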

4 More Replies
by Nazar, New Contributor II
  • 5092 Views
  • 5 replies
  • 5 kudos

Resolved! Incremental write

Hi All, I have a daily Spark job that reads and joins 3-4 source tables and writes the df in parquet format. This data frame consists of 100+ columns. As this job runs daily, our deduplication logic identifies the latest record from each of source t...
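A common way to implement this kind of latest-record deduplication is a window over the business key; below is a rough sketch, assuming column names (`id`, `updated_at`) and paths that are not from the original post.

```python
# Rough sketch of keeping only the latest record per key before writing.
# "id", "updated_at" and the paths are assumed names used for illustration.
from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.read.parquet("/path/to/joined_sources")   # placeholder input

w = Window.partitionBy("id").orderBy(F.col("updated_at").desc())
latest = (
    df.withColumn("rn", F.row_number().over(w))
      .filter(F.col("rn") == 1)
      .drop("rn")
)

latest.write.mode("overwrite").parquet("/path/to/output")  # placeholder output
```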

Latest Reply
Nazar
New Contributor II
  • 5 kudos

Thanks werners

4 More Replies
by William_Scardua, Valued Contributor
  • 6802 Views
  • 5 replies
  • 3 kudos

Resolved! Read just the new file ???

Hi guys, how can I read just the new file in a batch process? Can you help me, please? Thank you

Latest Reply
Ryan_Chynoweth
Esteemed Contributor
  • 3 kudos

What type of file? Is the file stored in a storage account? Typically, you would read and write data with something like the following code:
# read a parquet file
df = spark.read.format("parquet").load("/path/to/file")
# write the data as a file
df...
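For completeness, a slightly fuller runnable version of the truncated snippet above; the paths are placeholders.

```python
# Read a parquet file and write it back out; paths are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# read a parquet file
df = spark.read.format("parquet").load("/path/to/file")

# write the data back out as parquet
df.write.format("parquet").mode("overwrite").save("/path/to/output")
```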

4 More Replies
by Meaz10, New Contributor III
  • 1480 Views
  • 4 replies
  • 2 kudos

Resolved! Current DBR is not yet available to this notebook

Anyone have an idea why I am getting this error: "The current DBR is not yet available to this notebook. Give it a second and try again!"

Latest Reply
Anonymous
Not applicable
  • 2 kudos

@Meysam az - Thank you for letting us know that the issue has been resolved and for the extra information.

3 More Replies
by Kaniz_Fatma, Community Manager
  • 2241 Views
  • 2 replies
  • 1 kudos
Latest Reply
dazfuller
Contributor III
  • 1 kudos

A class is the definition, and you can create many instances of it, just like classes in any other language. An object is an instance of the class, a singleton, and can be used to create features you might recognise as static methods. Often when wri...
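As a loose Python analogy of that distinction (illustrative names only, not from the original thread): a class can be instantiated many times, while a single shared instance plays the role a Scala `object` does.

```python
# Illustrative only: a class with many instances vs one shared instance.
class Counter:
    def __init__(self):
        self.count = 0

    def increment(self):
        self.count += 1
        return self.count

# Many independent instances of the class...
a, b = Counter(), Counter()
a.increment()
print(a.count, b.count)   # 1 0

# ...versus one module-level instance shared by all callers,
# similar in spirit to a Scala singleton object.
shared_counter = Counter()
```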

1 More Replies
by Kaniz_Fatma, Community Manager
  • 3882 Views
  • 1 reply
  • 1 kudos
Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

We can check using this method:
import boto3
from botocore.errorfactory import ClientError

s3 = boto3.client('s3')
try:
    s3.head_object(Bucket='bucket_name', Key='file_path')
except ClientError:
    # Not found
    pass

by User15787040559, Esteemed Contributor III
  • 1299 Views
  • 2 replies
  • 0 kudos

What subset of MySQL SQL syntax do we support in Spark SQL?

https://spark.apache.org/docs/latest/sql-ref-syntax.html

Latest Reply
brickster_2018
Esteemed Contributor III
  • 0 kudos

Spark 3 has experimental support for ANSI SQL compliance. Read more here: https://spark.apache.org/docs/3.0.0/sql-ref-ansi-compliance.html
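A small sketch of turning that experimental ANSI mode on from PySpark; the flag exists in Spark 3.x, though the exact behaviour of the example cast varies across versions.

```python
# Enable ANSI mode and demonstrate its stricter cast behaviour.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
spark.conf.set("spark.sql.ansi.enabled", "true")

try:
    # Under ANSI mode an invalid cast raises an error instead of returning NULL.
    spark.sql("SELECT CAST('abc' AS INT)").show()
except Exception as e:
    print("ANSI mode rejected the invalid cast:", e)
```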

1 More Replies
by HafidzZulkifli, New Contributor II
  • 13470 Views
  • 8 replies
  • 0 kudos

How to import data and apply multiline and charset UTF8 at the same time?

I'm running Spark 2.2.0 at the moment. Currently I'm facing an issue when importing data of Mexican origin, where the data can contain special characters and multiline values in certain columns. Ideally, this is the command I'd like to run: T_new_...

Latest Reply
DianGermishuize
New Contributor II
  • 0 kudos

You could also potentially use the .withColumns() function on the data frame, and use the pyspark.sql.functions.encode function to convert the character set to the one you need. Convert the Character Set/Encoding of a String field in a PySpark DataFr...
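For the multiline + charset combination the original question asks about, a hedged sketch (the path, header setting, and encoding value below are placeholders, not taken from the thread):

```python
# Read a CSV with multiline fields and an explicit character set.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = (
    spark.read.format("csv")
        .option("header", "true")
        .option("multiLine", "true")   # quoted fields may span multiple lines
        .option("charset", "UTF-8")    # also accepted as "encoding"
        .load("/path/to/source.csv")
)
```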

7 More Replies
