Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

ashkd7310
by New Contributor II
  • 681 Views
  • 2 replies
  • 4 kudos

date type conversion error

Hello, I am trying to convert the date to MM/dd/yyyy format. So I am first using the date_format function and converting the date into MM/dd/yyyy, which makes it a string. However, my use case is to have the data as a date, so I am again converting the str...

Latest Reply
Rishabh-Pandey
Esteemed Contributor
  • 4 kudos

Check with this method if it works.

# Convert date to MM/dd/yyyy format (string)
df = df.withColumn("formatted_date", date_format("date", "MM/dd/yyyy"))

# Convert string back to date
df = df.withColumn("converted_date", to_date("formatted_date", "MM/...

  • 4 kudos
1 More Replies
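For reference, a self-contained version of the date_format/to_date approach from the reply above; the input column name `date` comes from the thread, and the sample row is made up:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import date_format, to_date

    spark = SparkSession.builder.getOrCreate()

    # Sample data: a date-typed column named "date", as in the question
    df = spark.createDataFrame([("2024-07-19",)], ["raw"]).withColumn("date", to_date("raw"))

    # Convert the date to an MM/dd/yyyy string
    df = df.withColumn("formatted_date", date_format("date", "MM/dd/yyyy"))

    # Parse the string back to a date type; note that a date column always
    # displays as yyyy-MM-dd, regardless of the pattern it was parsed from
    df = df.withColumn("converted_date", to_date("formatted_date", "MM/dd/yyyy"))
    df.show()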
DataEnginerrOO
by New Contributor III
  • 1848 Views
  • 4 replies
  • 2 kudos

Error while trying to install jdbc8.jar

Hi, I am attempting to connect to an Oracle server. I tried to install the ojdbc8.jar library, but I encountered an error: "Library installation attempted on the driver node of cluster 0718-101257-h5k9c5ud failed. Please refer to the following error m...

Latest Reply
This reply could not be displayed.
  • 2 kudos
3 More Replies
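For context, once ojdbc8.jar is successfully installed on the cluster, a JDBC read usually looks roughly like the sketch below; the host, service name, table, and credentials are placeholders, not values from the thread:

    # Placeholder connection details - substitute your own
    jdbc_url = "jdbc:oracle:thin:@//oracle-host.example.com:1521/ORCLPDB1"

    df = (spark.read.format("jdbc")
          .option("url", jdbc_url)
          .option("driver", "oracle.jdbc.driver.OracleDriver")  # class shipped in ojdbc8.jar
          .option("dbtable", "MY_SCHEMA.MY_TABLE")
          .option("user", "my_user")
          .option("password", "my_password")
          .load())
    df.show()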
prith
by New Contributor III
  • 2952 Views
  • 7 replies
  • 1 kudos

Resolved! Databricks JDK 17 upgrade error

We tried upgrading to JDK 17. Using Spark version 3.0.5 and runtime 14.3 LTS. Getting this exception using parallelStream(). With Java 17 I am not able to parallel process different partitions at the same time. This means when there is more than 1 partiti...

Latest Reply
prith
New Contributor III
  • 1 kudos

Anyway, thanks for your response - we found a workaround for this error, and JDK 17 is actually working - it appears faster than JDK 8.

  • 1 kudos
6 More Replies
mb1234
by New Contributor
  • 450 Views
  • 1 replies
  • 1 kudos

Error using curl within a job

I have a notebook that, as a first step, needs to download and install some drivers. The actual code is this:

%sh
# Install gdebi command line tool
apt-get -y install gdebi-core
# Install Posit professional drivers
curl -LO https://cdn.rstudio.com/drivers...

Latest Reply
szymon_dybczak
Contributor III
  • 1 kudos

Hi @mb1234, what error did you get? Edit: I've checked and it worked in a job.

  • 1 kudos
pernilak
by New Contributor III
  • 2396 Views
  • 1 replies
  • 2 kudos

Working with Unity Catalog from VSCode using the Databricks Extension

Hi! As suggested by Databricks, we are working with Databricks from VSCode using Databricks bundles for our deployment, and using the VSCode Databricks Extension and Databricks Connect during development. However, there are some limitations that we are ...

Latest Reply
rustam
New Contributor II
  • 2 kudos

Thank you for the detailed reply, @Retired_mod, and the great question @pernilak! I would also like to code and debug in VS Code while all the code in my Jupyter notebooks can be executed on a Databricks cluster cell by cell with access to the data in ...

  • 2 kudos
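As a rough illustration of that workflow (not taken from the thread): with databricks-connect for DBR 13+ installed locally, code in VS Code can run cell by cell against a remote cluster through a DatabricksSession; the table name below is a placeholder:

    from databricks.connect import DatabricksSession

    # Builds a SparkSession whose queries execute on a remote Databricks
    # cluster; connection details are read from your Databricks config
    # profile or environment variables
    spark = DatabricksSession.builder.getOrCreate()

    # Runs remotely, with access to Unity Catalog tables
    df = spark.read.table("my_catalog.my_schema.my_table")
    df.show()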
leungi
by Contributor
  • 1633 Views
  • 1 replies
  • 0 kudos

Spark Out of Memory Error

Background: Using the R {sparklyr} package to fetch data from tables in Unity Catalog, and faced the error below. Tried the following, to no avail:
  • Using memory optimized cluster - e.g., E4d.
  • Using bigger (RAM) cluster - e.g., E8d.
  • Enable auto-scali...

Latest Reply
This reply could not be displayed.
  • 0 kudos
venkatgmf
by New Contributor II
  • 436 Views
  • 1 replies
  • 0 kudos

Resolved! DLT pipeline failing (due to > 500 tables) - any graph/table limitation?

DLT pipeline failing due to INTERNAL_ERROR: Communication lost with driver. Cluster 0719-162209-rx37csry was not reachable for 120 seconds.

DLT communication error.png
Latest Reply
szymon_dybczak
Contributor III
  • 0 kudos

Hi @venkatgmf, yeah, you are right that a high number of tables could be a problem. If you're experiencing issues with the driver node becoming unresponsive due to garbage collection (GC), it might be a sign that the resources allocated to the driver ar...

  • 0 kudos
Jackson1111
by New Contributor III
  • 471 Views
  • 1 replies
  • 0 kudos

Multiple sources found for csv

When I run a job using a Spark 2 jar and then run a Python job, it reports: Multiple sources found for csv (org.apache.spark.sql.execution.datasources.v2.csv.CSVDataSourceV2, org.apache.spark.sql.execution.datasources.csv.CSVFileFormat), please specify the...

Latest Reply
daniel_sahal
Esteemed Contributor
  • 0 kudos

@Jackson1111 It looks like you've installed two different libraries to handle CSV data. You need to specify which one you want to use, e.g.:

df = self.spark.read.format("org.apache.spark.sql.execution.datasources.v2.csv.CSVDataSourceV2").option("header",...

  • 0 kudos
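A self-contained version of that suggestion (the path is a placeholder); alternatively, removing the extra CSV library from the cluster lets the short name "csv" resolve unambiguously again:

    # Spell out the fully qualified class name so Spark knows which
    # of the two registered CSV sources to use
    df = (spark.read
          .format("org.apache.spark.sql.execution.datasources.v2.csv.CSVDataSourceV2")
          .option("header", "true")
          .load("/path/to/data.csv"))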
Sreyasi_Thakur
by New Contributor II
  • 534 Views
  • 1 replies
  • 0 kudos

DLT Pipeline on Hive Metastore

I am creating a DLT pipeline on Hive Metastore (the destination is Hive Metastore) and using a notebook within the pipeline which reads a Unity Catalog table. But I am getting an error: [UC_NOT_ENABLED] Unity Catalog is not enabled on this cluster. Is it...

Latest Reply
This reply could not be displayed.
  • 0 kudos
Sheeraj9191
by New Contributor
  • 441 Views
  • 1 replies
  • 0 kudos
Latest Reply
brockb
Databricks Employee
  • 0 kudos

Hi @Sheeraj9191, I believe the table you are looking for is `system.billing.usage` (docs: https://docs.databricks.com/en/admin/system-tables/billing.html#billable-usage-table-schema). This table contains information at the job level in field `usage...

  • 0 kudos
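For example, a query along these lines aggregates usage per job (column names taken from the billable usage schema linked above):

    # Sum DBU consumption per job and SKU from the billable usage table
    usage_by_job = spark.sql("""
        SELECT usage_metadata.job_id,
               sku_name,
               SUM(usage_quantity) AS total_dbus
        FROM system.billing.usage
        WHERE usage_metadata.job_id IS NOT NULL
        GROUP BY usage_metadata.job_id, sku_name
    """)
    usage_by_job.show()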
128941
by New Contributor III
  • 1119 Views
  • 2 replies
  • 1 kudos

What are best practices for Databricks workflow jobs?

  • Recommendations on how many tables per workflow?
  • Interdependency between the workflows?
  • Custom schedule?
  • Monitoring?
  • Reports?

Latest Reply
128941
New Contributor III
  • 1 kudos

Product max limits and best practices.

  • 1 kudos
1 More Replies
KosmaS
by New Contributor III
  • 742 Views
  • 0 replies
  • 0 kudos

Lost Databricks' dependency in a job.

Hey, I had a stable notebook within the whole job. It contains one action, defined as dumping data to S3. Recently, it started generating some issues. Maybe someone can suggest either how to investigate it further or what to try to do with such kinds ...

Screenshot 2024-07-19 at 19.55.48.png
8b1tz
by Contributor
  • 474 Views
  • 1 replies
  • 0 kudos

Data factory logs into databricks delta table

Hi Databricks Community, I am looking for a solution to efficiently integrate Azure Data Factory pipeline logs with Databricks at minimal cost. Currently, I have a dashboard that consumes data from a Delta table, and I would like to augment this table...

Latest Reply
This reply could not be displayed.
  • 0 kudos
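One low-cost pattern (a sketch, not from the thread - the table name and record schema are hypothetical) is to have the pipeline, or a small Databricks job it triggers, append run metadata straight into the Delta table the dashboard already reads:

    from datetime import datetime, timezone
    from pyspark.sql import Row

    # Hypothetical log record - shape it to match your dashboard's table
    log_rows = [Row(pipeline_name="my_adf_pipeline",
                    run_id="run-0001",
                    status="Succeeded",
                    logged_at=datetime.now(timezone.utc))]

    (spark.createDataFrame(log_rows)
         .write.format("delta")
         .mode("append")
         .saveAsTable("logs.adf_pipeline_runs"))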
pankaj30
by New Contributor II
  • 1809 Views
  • 3 replies
  • 2 kudos

Resolved! Databricks Pyspark Dataframe error while displaying data read from mongodb

Hi, we are trying to read data from MongoDB using a Databricks notebook with PySpark connectivity. When we try to display DataFrame data using the show or display method, it gives the error "org.bson.BsonInvalidOperationException: Document does not contain key...

Latest Reply
an313x
New Contributor III
  • 2 kudos

UPDATE: Installing mongo-spark-connector_2.12-10.3.0-all.jar from Maven does NOT require the JAR files below to be installed on the cluster to display the dataframe:
  • bson
  • mongodb-driver-core
  • mongodb-driver-sync
Also, I noticed that both DBR 13.3 LTS and 14...

  • 2 kudos
2 More Replies
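For context, with mongo-spark-connector 10.x the read path looks roughly like the sketch below; the URI, database, and collection are placeholders:

    # Connector 10.x registers the short format name "mongodb"
    df = (spark.read.format("mongodb")
          .option("connection.uri", "mongodb://user:password@mongo-host.example.com:27017")
          .option("database", "my_database")
          .option("collection", "my_collection")
          .load())
    df.show()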

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group