Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

User15725630784
by Databricks Employee
  • 1681 Views
  • 1 replies
  • 0 kudos

Spark JDBC query isn't working for Oracle Databases

I am trying to read with the following syntax: val df = spark.read .format("jdbc") .option("url", "<url>") .option("query", "SELECT * FROM oracle_test_table") .option("user", "<user>") .option("password", "<password>") .option("driver", "oracle...

Latest Reply
User15725630784
Databricks Employee
  • 0 kudos

https://kb.databricks.com/data-sources/query-option-not-work-oracle.html#problem-apache-spark-jdbc-datasource-query-option-doesnt-work-for-oracle-database

User16790091296
by Contributor II
  • 2483 Views
  • 1 replies
  • 0 kudos
Latest Reply
User16790091296
Contributor II
  • 0 kudos

Databricks starts charging for DBUs once the virtual machine is up and the Spark context is initialized, which may include a portion of the startup costs, but not all. Init scripts are loaded before the Spark context is initialized, which therefore wou...

User16790091296
by Contributor II
  • 1136 Views
  • 1 replies
  • 0 kudos

Does the price increase as I attach more notebooks to the same cluster?

Databricks pricing-related question: do I consume more DBUs when I attach more notebooks to the same cluster?

Latest Reply
User16790091296
Contributor II
  • 0 kudos

Hey PJ, the short answer is no: attaching more notebooks does not increase the price of the cluster, which is based solely on compute power. Attaching more notebooks to the cluster is a value-add of the platform. If you're interested, you can find some ...

saninanda
by New Contributor II
  • 12491 Views
  • 7 replies
  • 0 kudos

How to read a schema from a text file stored in cloud storage

I have a file, a.csv or a.parquet. When creating a DataFrame while reading it, we can explicitly define the schema with a struct type. Instead of writing the schema in the notebook, I want to create the schema separately; let's say that for all my CSV files I have one schema, like csv_schema, and stored ...

Latest Reply
Nakeman
New Contributor II
  • 0 kudos

@shyampsr big thanks, I was searching for the solution for almost 3 hours.

6 More Replies
Anonymous
by Not applicable
  • 1242 Views
  • 0 replies
  • 0 kudos

foreachBatch in pyspark throwing OSError: [WinError 10022] An invalid argument was supplied

Hello Team, for the last 3 weeks I have been trying to move my project from batch to Structured Streaming. But every time I run my code I get the error below: Traceback (most recent call last): File "C:\Users\avisriva1\git_4May2021\comint-ml-scores\src\...

User15813097110
by New Contributor III
  • 5540 Views
  • 1 replies
  • 0 kudos
Latest Reply
User15813097110
New Contributor III
  • 0 kudos

We can use the steps below to push cluster logs to Elasticsearch: 1. Download the log4j-elasticsearch-java-api repo and build the jar file: git clone https://github.com/Downfy/log4j-elasticsearch-java-api.git cd log4j-elasticsearch-java-api/ mvn clean...

User16871418122
by Contributor III
  • 9581 Views
  • 1 replies
  • 0 kudos

Resolved! How do I download maven libraries with dependencies?

I want to import a Maven library with its dependencies. How do I do it?

Latest Reply
User16871418122
Contributor III
  • 0 kudos

I recommend creating an uber JAR, or downloading the JARs offline to use in clusters when Maven becomes healthy again: 1. Install the mvn CLI tool on your local Mac: brew install mvnvm 2. Download the artifact with all dependencies: mvn dependency:get -Dr...

User15813097110
by New Contributor III
  • 1740 Views
  • 1 replies
  • 0 kudos
Latest Reply
User15813097110
New Contributor III
  • 0 kudos

Since the SparkContext is already up and running, it requires a restart. Technically, it might be possible to kill the JVM process and restart it but we do not recommend that approach. In this case, we recommend restarting the cluster so that the Sp...

User16873043212
by New Contributor III
  • 696 Views
  • 0 replies
  • 0 kudos

We can now launch pools on Databricks with different instance types. Hybrid Pools allows customers to create clusters and select different Databricks ...

We can now launch pools on Databricks with different instance types. Hybrid Pools allows customers to create clusters and select different Databricks pools for the driver and workers. It provides a way to support driver vs. worker heterogeneity, and ther...

FernandoBenedet
by New Contributor
  • 5505 Views
  • 2 replies
  • 0 kudos

Loop through Dataframe in Python

Hello, imagine you have a dataframe with columns A, B, C. I want to add a column D based on some calculations on columns B and C of the previous record of the df. What is the best way of doing this? I am trying to avoid looping through the df. I am u...

Latest Reply
quincybatten
New Contributor II
  • 0 kudos

Iterating through pandas DataFrame objects is generally slow. Iterating row by row defeats the whole purpose of using a DataFrame. It is an anti-pattern and is something you should only do when you have exhausted every other option. It is better to look for a...
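
For what it's worth, here is a minimal sketch of one way to avoid the loop in pandas, assuming the data fits in a pandas DataFrame and the rows are already in the order you care about. The sample values and the sum used for D are hypothetical placeholders for "some calculations"; only the column names A, B, C, D come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "A": [1, 2, 3, 4],
    "B": [10, 20, 30, 40],
    "C": [1, 2, 3, 4],
})

# shift(1) moves each value down one row, so row i sees the B and C
# values of row i-1. The first row has no previous record and gets NaN.
prev_b = df["B"].shift(1)
prev_c = df["C"].shift(1)

# Placeholder calculation (an assumption): D = previous B + previous C.
df["D"] = prev_b + prev_c

print(df)
```

Because `shift` works on whole columns, the calculation stays vectorized. If the first row needs a default instead of NaN, `fillna` on the result is one option.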

1 More Replies
winston12
by New Contributor
  • 14935 Views
  • 5 replies
  • 0 kudos

Connect to Blob storage "no credentials found for them in the configuration"

I'm working with a Databricks notebook backed by a Spark cluster and am having trouble connecting to Azure Blob storage. I used this link and tried the section "Access Azure Blob Storage Directly - Set up an account access key". I get no errors here:s...

Latest Reply
Feder
New Contributor II
  • 0 kudos

I have been facing the same problem over and over. Now trying to follow what's written here (https://docs.databricks.com/data/data-sources/azure/azure-storage.html#access-azure-blob-storage-directly), but always getting "shaded.databricks.org.apache...

4 More Replies
