Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

sanjay
by Valued Contributor II
  • 13696 Views
  • 13 replies
  • 10 kudos

Spark tasks too slow and not doing parallel processing

Hi, I have a Spark job processing a large data set, and it's taking too long. In the Spark UI, I can see it's running 1 task out of 9 tasks. Not sure how to run this in parallel. I have already mentioned auto scaling and providing up to...

Latest Reply
plondon
New Contributor II
  • 10 kudos

Will it be any different if using Spark within Azure, i.e., faster?

12 More Replies
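One task running out of nine usually means the data has collapsed into one partition (or one heavily skewed partition), since Spark runs exactly one task per partition per stage. A common rule of thumb is to repartition to roughly 2-3x the total executor cores before the heavy transformation. A minimal sketch of that heuristic, with the Spark call shown only in comments (the cluster sizes here are assumptions, not figures from the thread):

```python
def target_partitions(executors: int, cores_per_executor: int, factor: int = 3) -> int:
    """Rule-of-thumb partition count: factor x total cores, at least 1."""
    return max(1, executors * cores_per_executor * factor)

# e.g. an autoscaled cluster that settled on 4 executors with 4 cores each
n = target_partitions(executors=4, cores_per_executor=4)

# In the job itself this would be applied before the expensive stage, e.g.:
#   df = df.repartition(n)          # redistribute rows across n partitions
#   df = df.repartition(n, "key")   # or hash-partition on a join/group key
print(n)  # 48
```

Autoscaling alone does not help here: extra executors sit idle if there is only one partition to work on, so the partition count has to be fixed first.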
Yyyyy
by New Contributor III
  • 1266 Views
  • 3 replies
  • 2 kudos

Showing only a limited number of lines from the CSV file

Expected no. of lines: 16400. Showing only 20. Script: spark.conf.set("REDACTED", "REDACTED") # File location file_location = "REDACTED" # Read the data into dataframe df df = spark.read.format("CSV").option("inferSchema",...

Latest Reply
Yyyyy
New Contributor III
  • 2 kudos

Hi, please help me. spark.conf.set("REDACTED", "REDACTED") # File location file_location = "REDACTED" # Read the data into dataframe df df = spark.read.format("CSV").option("inferSchema", "true").option("header", "true").option("delimiter", ",")...

2 More Replies
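A likely explanation for "only 20 rows": `display(df)` and `df.show()` render just the first 20 rows by default, while the DataFrame itself still holds every row, so `df.count()` is the real check. The same distinction in plain Python (the sample data below is made up, not the poster's file):

```python
import csv
import io

# Stand-in for the 16,400-line file: a header plus 100 data rows.
raw = "id,value\n" + "\n".join(f"{i},{i * 2}" for i in range(100))

rows = list(csv.DictReader(io.StringIO(raw)))

preview = rows[:20]  # what a default show()/display() renders
total = len(rows)    # what df.count() would report

print(len(preview), total)  # 20 100
```

In the notebook, `df.count()` confirms whether the read actually dropped rows, and `df.show(50, truncate=False)` widens the preview.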
stefano0929
by New Contributor II
  • 405 Views
  • 0 replies
  • 0 kudos

Error 301 Moved Permanently in cells of plotting

Hi, I created a workbook for academic purposes and had completed it... from one moment to the next, all the plot cells of charts (and only those) started returning the following error, and I really don't know how to solve it: Failed to store th...

Bhabs
by New Contributor
  • 480 Views
  • 1 reply
  • 0 kudos

Replace one tag in a JSON file in a Databricks table

There is a column (src_json) in emp_table. I need to replace (ages to age) in each JSON in the src_json column of emp_table. Can you please suggest the best way to do it.

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @Bhabs, you can do it in the following way (assuming that src_json contains a JSON string): from pyspark.sql import SparkSession from pyspark.sql.functions import col, expr spark = SparkSession.builder.appName("Replace JSON Keys").getOrCreate() data = ...

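The key rename itself ("ages" to "age") is easy to express on the JSON string; here is a local sketch using Python's json module (the sample record and its nesting are assumptions, not the actual emp_table schema):

```python
import json

def rename_key(obj, old: str, new: str):
    """Recursively rename a dict key in nested dicts/lists."""
    if isinstance(obj, dict):
        return {(new if k == old else k): rename_key(v, old, new)
                for k, v in obj.items()}
    if isinstance(obj, list):
        return [rename_key(v, old, new) for v in obj]
    return obj

src = '{"name": "Ana", "ages": 31, "kids": [{"ages": 4}]}'
fixed = json.dumps(rename_key(json.loads(src), "ages", "age"))
print(fixed)  # {"name": "Ana", "age": 31, "kids": [{"age": 4}]}
```

In Spark this logic could be wrapped in a UDF over src_json; alternatively, if "ages" can only ever appear as a key and never inside a value, a `regexp_replace` on the pattern `"ages"\s*:` keeps the whole operation on the JVM.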
Olaoye_Somide
by New Contributor III
  • 1070 Views
  • 1 reply
  • 0 kudos

AutoLoader File Notification Setup on AWS

I’m encountering issues setting up Databricks AutoLoader in file notification mode. The error seems to be related to UC access to the S3 bucket. I have tried running it on a single-node dedicated cluster, but no luck. Any guidance or assistance on reso...

Latest Reply
Olaoye_Somide
New Contributor III
  • 0 kudos

Thanks @Retired_mod. I have reviewed all the steps mentioned, including the IAM policy, as per the setup guide. I believe the permissions granted are sufficient for the setup. To validate the permissions, I used IAM credentials with Admin privileges i...

Sudharsan24
by New Contributor II
  • 1539 Views
  • 2 replies
  • 2 kudos

Job aborted stage failure java.sql.SQLRecoverableException: IO Error: Connection reset by peer

While ingesting data from Oracle to Databricks (writing into ADLS) using JDBC, I am getting a "connection reset by peer" error when ingesting a large table which has millions of rows. I am using Oracle SQL Developer and Azure Databricks. I tried every way li...

Latest Reply
Rishabh-Pandey
Esteemed Contributor
  • 2 kudos

Try using this code: import pyspark from pyspark.sql import SparkSession # Initialize Spark session spark = SparkSession.builder.appName("OracleToDatabricks").getOrCreate() # Oracle connection properties conn = "jdbc:oracle:thin:@//<host>:<port>/<s...

1 More Replies
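Connection resets on multi-million-row Oracle reads often come down to one long-lived session pulling everything through a single connection. Spark's JDBC source can instead split the read into parallel range queries and batch each fetch. A sketch of the option set (the host, table name, and bounds are placeholders; the right values depend on the table):

```python
def oracle_read_options(url, table, user, password,
                        partition_column, lower, upper, num_partitions=8):
    """Build spark.read.format('jdbc') options that split a large read
    into num_partitions parallel range queries with a larger fetch batch."""
    return {
        "url": url,
        "dbtable": table,
        "user": user,
        "password": password,
        "partitionColumn": partition_column,   # numeric/date column to range-split on
        "lowerBound": str(lower),
        "upperBound": str(upper),
        "numPartitions": str(num_partitions),  # parallel connections, each one slice
        "fetchsize": "10000",                  # rows per round trip; Oracle's default is small
    }

opts = oracle_read_options("jdbc:oracle:thin:@//host:1521/svc", "BIG_TABLE",
                           "user", "pw", "ID", 1, 50_000_000)
# In the notebook:
#   spark.read.format("jdbc").options(**opts).load() \
#        .write.format("delta").save("abfss://...")
print(opts["numPartitions"], opts["fetchsize"])  # 8 10000
```

Each partition then holds its own short-lived connection for a bounded slice, which is far less likely to hit an idle-timeout reset than one connection streaming millions of rows.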
Mehdi-LAMRANI
by New Contributor II
  • 5062 Views
  • 2 replies
  • 2 kudos

Resolved! Upload file from local file system to DBFS (2024)

Recent changes to the workspace UI (and the introduction of Unity Catalog) seem to have discreetly sunset the ability to upload data directly to DBFS from the local file system using the UI (NOT the CLI). I want to be able to load a raw file (no matter the ...

Latest Reply
pavithra
New Contributor III
  • 2 kudos

Not working in Community Edition.

1 More Replies
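When the UI upload path is unavailable, the DBFS REST API (`/api/2.0/dbfs/put`) and the CLI still accept files; the API expects base64-encoded contents. A sketch of building that request payload (the target path is illustrative, and token/host handling is left out; on Unity Catalog workspaces, uploading to a `/Volumes/...` path via Catalog Explorer is the newer route):

```python
import base64

def dbfs_put_payload(dbfs_path: str, data: bytes, overwrite: bool = True) -> dict:
    """Payload for POST /api/2.0/dbfs/put. The single-shot 'contents' form
    is limited to small files (~1 MB); larger uploads need the
    create / add-block / close streaming calls instead."""
    return {
        "path": dbfs_path,
        "contents": base64.b64encode(data).decode("ascii"),
        "overwrite": overwrite,
    }

payload = dbfs_put_payload("/FileStore/raw/sample.csv", b"id,value\n1,a\n")
# requests.post(f"{host}/api/2.0/dbfs/put", headers=auth_headers, json=payload)
print(payload["path"])
```

The same payload shape is what `databricks fs cp` produces under the hood, so the CLI remains the simplest scriptable replacement for the removed UI flow.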
DatabricksHero
by New Contributor II
  • 2501 Views
  • 2 replies
  • 0 kudos

Unity Catalog 2.1 API Not Returning SQL Function/View Dependencies

Hi all, I have a problem with reading responses generated by the Unity Catalog API 2.1, as they are missing fields that are otherwise described in the specification: List functions - the fields routine_dependencies, return_params, and input_params are missi...

Data Engineering
API
sql
Unity Catalog
Latest Reply
vyas
New Contributor II
  • 0 kudos

Hi @Retired_mod, I have the same issue as @DatabricksHero. View dependencies are not returned. Could you clarify the usage of this API call?

1 More Replies
ashkd7310
by New Contributor II
  • 1132 Views
  • 2 replies
  • 4 kudos

Date type conversion error

Hello, I am trying to convert the date to MM/dd/yyyy format. So I am first using the date_format function and converting the date into MM/dd/yyyy, which makes it a string. However, my use case is to have the data as a date, so I am again converting the str...

Latest Reply
Rishabh-Pandey
Esteemed Contributor
  • 4 kudos

Check whether this method works: # Convert date to MM/dd/yyyy format (string) df = df.withColumn("formatted_date", date_format("date", "MM/dd/yyyy")) # Convert string back to date df = df.withColumn("converted_date", to_date("formatted_date", "MM/...

1 More Replies
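The round trip in the reply (date, then MM/dd/yyyy string, then date again) works because to_date is told the exact pattern the string is in. The subtlety is that a true date column has no display format at all; how it renders is up to the client. The same distinction in plain Python (the sample value is assumed):

```python
from datetime import date, datetime

d = date(2024, 7, 18)

# date_format equivalent: render the date as an MM/dd/yyyy string
formatted = d.strftime("%m/%d/%Y")  # now a plain string

# to_date equivalent: parse it back with the matching pattern
converted = datetime.strptime(formatted, "%m/%d/%Y").date()

print(formatted, converted, converted == d)  # 07/18/2024 2024-07-18 True
```

So if downstream tools must literally see MM/dd/yyyy, the column has to stay a string; if it must behave as a date (sorting, arithmetic, comparisons), keep it a date and format only at display time.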
DataEnginerrOO
by New Contributor III
  • 2737 Views
  • 4 replies
  • 2 kudos

Error while trying to install ojdbc8.jar

Hi, I am attempting to connect to an Oracle server. I tried to install the ojdbc8.jar library, but I encountered an error: "Library installation attempted on the driver node of cluster 0718-101257-h5k9c5ud failed. Please refer to the following error m...

prith
by New Contributor III
  • 4062 Views
  • 7 replies
  • 1 kudos

Resolved! Databricks JDK 17 upgrade error

We tried upgrading to JDK 17, using Spark version 3.0.5 and runtime 14.3 LTS. Getting this exception using parallelStream(). With Java 17, I am not able to parallel process different partitions at the same time. This means when there is more than 1 partiti...

Latest Reply
prith
New Contributor III
  • 1 kudos

Anyway, thanks for your response. We found a workaround for this error, and JDK 17 is actually working; it appears faster than JDK 8.

6 More Replies
mb1234
by New Contributor
  • 629 Views
  • 1 reply
  • 1 kudos

Error using curl within a job

I have a notebook that, as a first step, needs to download and install some drivers. The actual code is this: %sh # Install gdebi command line tool apt-get -y install gdebi-core # Install Posit professional drivers curl -LO https://cdn.rstudio.com/drivers...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @mb1234, what error did you get? Edit: I've checked, and it worked in a job.

pernilak
by New Contributor III
  • 3368 Views
  • 1 reply
  • 2 kudos

Working with Unity Catalog from VSCode using the Databricks Extension

Hi! As suggested by Databricks, we are working with Databricks from VSCode, using Databricks bundles for our deployment and using the VSCode Databricks Extension and Databricks Connect during development. However, there are some limitations that we are ...

Latest Reply
rustam
New Contributor II
  • 2 kudos

Thank you for the detailed reply, @Retired_mod, and the great question, @pernilak! I would also like to code and debug in VS Code while all the code in my Jupyter notebooks can be executed on a Databricks cluster cell by cell, with access to the data in ...

leungi
by Contributor
  • 2340 Views
  • 1 reply
  • 0 kudos

Spark Out of Memory Error

Background: using the R language's {sparklyr} package to fetch data from tables in Unity Catalog, and faced the error below. Tried the following, to no avail: using a memory-optimized cluster (e.g., E4d); using a bigger (RAM) cluster (e.g., E8d); enabling auto-scali...

