Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

rgb
by New Contributor
  • 785 Views
  • 0 replies
  • 0 kudos

Migration_pipeline.py failing to get default credentials

My `cat ~/.databrickscfg` output looks like this (with the correct token/host values in place of xxxxxx): [DEFAULT], host = xxxxxx, token = xxxxxx, jobs-api-version = 2.0. The command I run to start the pipeline with the default configured credentials is: sudo python3 migrati...

databrickserror
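For reference, the profile file above uses INI syntax; a minimal sketch with placeholder values might look like the following. Note also that running the pipeline with `sudo` resolves `~` to root's home directory, so `~/.databrickscfg` may not be the file the process actually reads.

```ini
; ~/.databrickscfg -- placeholder values, not real credentials
[DEFAULT]
host = https://example.cloud.databricks.com
token = dapi0000000000000000
jobs-api-version = 2.0
```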
693872
by New Contributor II
  • 2707 Views
  • 5 replies
  • 2 kudos

I am getting this error when I execute a left join on two data frames: PythonException: 'pyspark.serializers.SerializationError: Caused by Traceback (most recent call last)'. Going to post the full traceback:

I simply do a left join on two data frames, and I was able to print the content of both. Here is what the code looks like: df_silver = spark.sql("select ds.PropertyID, ds.* from dfsilver as ds LEFT JOIN dfaddmaster as dm ...

Latest Reply
Dooley
Valued Contributor II
  • 2 kudos

Did that answer your question? Did it work?

4 More Replies
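The query in the post selects `ds.PropertyID` and then `ds.*`, which yields a duplicate column after the join; listing columns explicitly is one way to rule that out. A minimal stand-in using sqlite3 (table and column names are hypothetical, mirroring dfsilver/dfaddmaster) shows the left-join shape:

```python
import sqlite3

# Hypothetical stand-ins for the dfsilver / dfaddmaster tables.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE dfsilver (PropertyID INTEGER, City TEXT);
    CREATE TABLE dfaddmaster (PropertyID INTEGER, Address TEXT);
    INSERT INTO dfsilver VALUES (1, 'Austin'), (2, 'Boston');
    INSERT INTO dfaddmaster VALUES (1, '1 Main St');
""")

# List the columns explicitly instead of mixing ds.PropertyID with ds.*;
# the duplicated column name is a common source of downstream errors.
rows = con.execute("""
    SELECT ds.PropertyID, ds.City, dm.Address
    FROM dfsilver AS ds
    LEFT JOIN dfaddmaster AS dm ON ds.PropertyID = dm.PropertyID
    ORDER BY ds.PropertyID
""").fetchall()
# Unmatched left-side rows keep NULL (None) on the right side.
```

The same explicit-column rewrite applies verbatim inside the `spark.sql` call.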
marcus1
by New Contributor III
  • 413 Views
  • 0 replies
  • 0 kudos

Why does the Databricks SCIM Get Users API (https://docs.databricks.com/dev-tools/api/latest/scim/scim-users.html#get-users) take so long?

I've observed, as we added more workspaces and users to those workspaces, that fetching users per workspace now takes 11 minutes or more. Our automation to provision group access is now unacceptably long. I've noted that the UI doesn't suffer...

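SCIM 2.0 supports `startIndex`/`count` paging and an `attributes` filter, which can make large user fetches more tractable than one giant request. A sketch (workspace URL and page sizes are hypothetical):

```python
from urllib.parse import urlencode

def scim_user_pages(base_url, page_size=100, max_users=1000):
    """Yield paged SCIM /Users URLs; startIndex is 1-based per SCIM 2.0."""
    for start in range(1, max_users + 1, page_size):
        params = urlencode({
            "startIndex": start,
            "count": page_size,
            # Request only the fields needed; full user payloads are heavier.
            "attributes": "id,userName",
        })
        yield f"{base_url}/api/2.0/preview/scim/v2/Users?{params}"

urls = list(scim_user_pages("https://example.cloud.databricks.com", 100, 300))
```

Each URL can then be fetched with your usual HTTP client and bearer token.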
J_M_W
by Contributor
  • 2916 Views
  • 2 replies
  • 5 kudos

Resolved! Databricks automatically creates an _apply_changes_storage table in the database when using apply_changes for Delta Live Tables

Hi there, I am using apply_changes (aka Delta Live Tables Change Data Capture) and it works fine. However, it seems to automatically create a secondary table in the database metastore called _apply_storage_changes_{tableName}. So for every table I use ...

Latest Reply
J_M_W
Contributor
  • 5 kudos

Hi - thanks @Hubert Dudek, I will look into disabling access for the users!

1 More Replies
berserkersap
by Contributor
  • 9658 Views
  • 1 replies
  • 0 kudos

How to deal with Decimal data type arithmetic operations ?

I am dealing with values ranging from 10^9 to 10^-9, the sum of which can go up to 10^20, and I need accuracy. So I wanted to use the Decimal data type [using SQL in the Data Science & Engineering workspace]. However, I got to know the peculiar behavior of D...

Latest Reply
berserkersap
Contributor
  • 0 kudos

Hello everyone, I understand that there is no single best answer for this question, so I could only do the same thing I found when I searched the net. The method I found works when you know the range of values you deal with (not just the input data but also ...

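The reply above amounts to sizing DECIMAL(p, s) to the known value range. Python's decimal module can illustrate why a fixed-point type keeps the 10^-9 term that float silently drops (the precision here is chosen for sums up to roughly 10^20 and is an assumption, not a universal setting):

```python
from decimal import Decimal, getcontext

# Emulate something like DECIMAL(38, 9): 38 significant digits is enough
# to hold a sum near 10^20 while retaining 9 fractional digits.
getcontext().prec = 38

small = Decimal("1e-9")
large = Decimal("1e20")
total = large + small  # exact under prec=38

# The float equivalent silently loses the small term.
float_total = float(large) + 1e-9
assert float_total == float(large)
```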
190809
by Contributor
  • 1120 Views
  • 2 replies
  • 0 kudos

Invalid port error when trying to read from a PlanetScale MySQL database

Using the code below, I am attempting to connect to a PlanetScale MySQL database. I get the following error: java.sql.SQLException: error parsing url: Incorrect port value. However, the port is the default 3306, and I have used the correct url based o...

Latest Reply
Pat
Honored Contributor III
  • 0 kudos

Hi @Rachel Cunningham, maybe you can share your `driver` and `url` values (masked)?

1 More Replies
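This parse error usually means something other than digits ended up in the port slot of `jdbc:mysql://host:port/db` - extra options such as SSL settings belong after `?`. A hedged sketch (host, database, and option names are hypothetical) that builds the URL and sanity-checks the port:

```python
from urllib.parse import urlsplit

def build_jdbc_url(host, port, database, **options):
    """Build a jdbc:mysql URL, keeping options after '?' so the port stays numeric."""
    base = f"jdbc:mysql://{host}:{port}/{database}"
    query = "&".join(f"{k}={v}" for k, v in options.items())
    return f"{base}?{query}" if query else base

url = build_jdbc_url("example.psdb.cloud", 3306, "mydb", sslMode="VERIFY_IDENTITY")

# Sanity check: drop the 'jdbc:' prefix and let urlsplit validate the port.
assert urlsplit(url[len("jdbc:"):]).port == 3306
```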
eques_99
by New Contributor II
  • 1389 Views
  • 2 replies
  • 0 kudos

Remove a category (slice) from a Pie Chart

I added a grand total row to a "Count" in SQL, which I needed for some counter visualisations; I used the ROLLUP command to get the grand total. However, I have a pie chart which references the same count, and so the grand total row has been added...

Latest Reply
eques_99
New Contributor II
  • 0 kudos

Hi, as per the picture above, the slice disappears but the name ("null" in this case) remains in the legend.

1 More Replies
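One workaround is to give the pie chart its own query without the rollup row, keeping the grand total only for the counter visualisations. A sqlite3 stand-in (table and column names are hypothetical):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE events (category TEXT);
    INSERT INTO events VALUES ('a'), ('a'), ('b');
""")

# Counter query: per-category counts plus a grand-total row with a NULL
# category, mimicking what ROLLUP produces.
counter = con.execute("""
    SELECT category, COUNT(*) AS n FROM events GROUP BY category
    UNION ALL
    SELECT NULL, COUNT(*) FROM events
""").fetchall()

# Pie-chart query: same aggregation without the rollup row, so no
# "null" slice or legend entry appears.
pie = con.execute("""
    SELECT category, COUNT(*) AS n FROM events
    GROUP BY category ORDER BY category
""").fetchall()
```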
Jayanth746
by New Contributor III
  • 4533 Views
  • 2 replies
  • 2 kudos

Databricks <-> Kafka - SSL handshake failed

I am receiving an SSL handshake error even though the trust store I have created is based on the server certificate, and the fingerprint in the certificate matches the trust-store fingerprint. kafkashaded.org.apache.kafka.common.errors.SslAuthenticationExcept...

Latest Reply
Debayan
Databricks Employee
  • 2 kudos

Hi @Jayanth Goulla, this is worth a try: https://stackoverflow.com/questions/54903381/kafka-failed-authentication-due-to-ssl-handshake-failed. Did you follow https://docs.microsoft.com/en-us/azure/databricks/spark/latest/structured-streaming/kafka?

1 More Replies
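When the truststore itself checks out but the handshake still fails, a certificate/hostname mismatch is a frequent culprit. A hedged sketch of the reader options (broker address, topic, and truststore path are hypothetical); setting the endpoint identification algorithm to an empty string disables hostname verification, which is useful purely for diagnosis:

```python
# Hypothetical broker and truststore path; adjust to your environment.
kafka_options = {
    "kafka.bootstrap.servers": "broker.example.com:9093",
    "kafka.security.protocol": "SSL",
    "kafka.ssl.truststore.location": "/dbfs/FileStore/certs/kafka.truststore.jks",
    "kafka.ssl.truststore.password": "changeit",
    # Empty string disables hostname verification; use only to diagnose
    # a mismatch between the broker certificate and the address dialled.
    "kafka.ssl.endpoint.identification.algorithm": "",
    "subscribe": "my-topic",
}
# Usage sketch: spark.readStream.format("kafka").options(**kafka_options).load()
```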
elgeo
by Valued Contributor II
  • 1639 Views
  • 1 replies
  • 2 kudos

Disable auto-complete (tab button)

Hello. How could we disable the autocomplete that appears with the tab button? Thank you.

Latest Reply
elgeo
Valued Contributor II
  • 2 kudos

Thank you @Kaniz Fatma

vs_29
by New Contributor II
  • 2474 Views
  • 1 replies
  • 3 kudos

Custom Log4j logs are not being written to the DBFS storage.

I used a custom Log4j appender to write custom logs through the init script, and I can see the custom log file in the driver logs, but Databricks is not writing those custom logs to DBFS. I have configured a logging destination in the Advanced sec...

Latest Reply
Debayan
Databricks Employee
  • 3 kudos

Hi @VIjeet Sharma, do you receive any error? This can be an issue with using the DBFS mount point /dbfs in an init script: the DBFS mount point is installed asynchronously, so at the very beginning of init script execution, that mount point might not be ava...

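Since the reply notes that /dbfs is mounted asynchronously, an init script that ships logs there can poll for the mount before writing. A sketch in bash (paths and timeout are placeholders):

```shell
#!/bin/bash
# Wait for a directory (e.g. /dbfs) to appear, up to a timeout in seconds.
wait_for_mount() {
  local dir="$1" timeout="${2:-60}" waited=0
  while [ ! -d "$dir" ]; do
    [ "$waited" -ge "$timeout" ] && return 1
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}

# In a real init script, something like:
#   wait_for_mount /dbfs 120 && cp /tmp/custom.log /dbfs/cluster-logs/
```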
sharonbjehome
by New Contributor
  • 1376 Views
  • 1 replies
  • 1 kudos

Structured Streaming from MongoDB Atlas not parsing JSON correctly

Hi all, I have a table in MongoDB Atlas that I am trying to read continuously into memory, and I will eventually write that file out. However, when I look at the in-memory table it doesn't have the correct schema. Code here: from pyspark.sql.types impo...

Latest Reply
Debayan
Databricks Employee
  • 1 kudos

Hi @sharonbjehome, this has to be checked thoroughly via a support ticket. Did you follow https://docs.databricks.com/external-data/mongodb.html? Also, could you please check with MongoDB support. Was this working before?

dara
by New Contributor
  • 932 Views
  • 1 replies
  • 1 kudos

How to count DelayCategories?

I would like to know the count of each category in each year. When I run count, it doesn't work.

Latest Reply
Debayan
Databricks Employee
  • 1 kudos

Hi @Dara Tourt, when you say it does not work, what is the error? You can run the count aggregate function: https://docs.databricks.com/sql/language-manual/functions/count.html. Please let us know if this helps.

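Counting each category within each year is a two-column GROUP BY; if a plain count "doesn't work", the grouping keys are the usual suspect. A sqlite3 stand-in (table and column names are hypothetical, mirroring a DelayCategory column):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE flights (year INTEGER, DelayCategory TEXT);
    INSERT INTO flights VALUES
        (2021, 'short'), (2021, 'short'), (2021, 'long'), (2022, 'short');
""")

# Group by both year and category to get a count per (year, category) pair.
counts = con.execute("""
    SELECT year, DelayCategory, COUNT(*) AS n
    FROM flights
    GROUP BY year, DelayCategory
    ORDER BY year, DelayCategory
""").fetchall()
```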
547284
by New Contributor II
  • 810 Views
  • 1 replies
  • 1 kudos

How to read CSVs from an S3 directory with different columns

I can read all CSVs under an S3 URI by doing: files = dbutils.fs.ls('s3://example-path') and df = spark.read.options(header='true', encoding='iso-8859-1', dateFormat='yyyyMMdd', ignoreLeadingWhiteSpace='true', i...

Latest Reply
Debayan
Databricks Employee
  • 1 kudos

Hi @Anthony Wang, as of now, I think that's the only way. Please refer to https://docs.databricks.com/external-data/csv.html#pitfalls-of-reading-a-subset-of-columns. Please let us know if this helps.

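The behaviour being asked about - a "union by name" across files whose headers differ - can be sketched outside Spark with the stdlib csv module: collect the union of headers, then pad missing fields with None. (The inline file contents below are stand-ins for the S3 objects.)

```python
import csv
import io

def read_csvs_union(files):
    """Union rows from CSV streams that may have different columns."""
    rows, headers = [], []
    for f in files:
        for row in csv.DictReader(f):
            for k in row:
                if k not in headers:
                    headers.append(k)   # accumulate the union of headers
            rows.append(row)
    # Pad columns missing from a given file with None.
    return headers, [[r.get(h) for h in headers] for r in rows]

a = io.StringIO("id,name\n1,ann\n")   # first file: id, name
b = io.StringIO("id,age\n2,30\n")     # second file: id, age
headers, data = read_csvs_union([a, b])
```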
sage5616
by Valued Contributor
  • 6544 Views
  • 3 replies
  • 6 kudos

Saving PySpark standard out and standard error logs to cloud object storage

I am running my PySpark data pipeline code on a standard Databricks cluster. I need to save all Python/PySpark standard output and standard error messages into a file in an Azure Blob account. When I run my Python code locally I can see all messages i...

Latest Reply
sage5616
Valued Contributor
  • 6 kudos

This is the approach I am currently taking. It is documented here: https://stackoverflow.com/questions/62774448/how-to-capture-cells-output-in-databricks-notebook. from IPython.utils.capture import CapturedIO; capture = CapturedIO(sys.stdout, sys.st...

2 More Replies
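The IPython CapturedIO approach in the reply above has a stdlib analogue: redirect stdout into a buffer with contextlib, then write the buffer wherever you like (the Azure Blob upload step is omitted here and would use the Blob SDK of your choice):

```python
import contextlib
import io
import sys

buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    # Anything printed inside the block lands in the buffer, not the console.
    print("pipeline step 1 done")
    print("rows written: 42")

captured = buffer.getvalue()
# `captured` can now be uploaded, e.g. via the Azure Blob SDK.
sys.stdout.write(captured)  # re-emit so the output is still visible
```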

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group