Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

AnandNair
by New Contributor
  • 1053 Views
  • 0 replies
  • 0 kudos

Load an explicit schema from an external metadata.csv or JSON file for reading CSVs into a DataFrame

Hi, I have a metadata CSV file that contains column names and datatypes, such as Colm1: INT, Colm2: String. I can also get the same in JSON format, as shown. I can store this on ADLS. How can I convert this into a schema like "Myschema" that I can ...

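A minimal PySpark sketch of one way to do this (the metadata layout, column names, and ADLS paths below are assumptions, not from the post): read the metadata file, map its type labels to Spark types, and build a StructType to pass to the CSV reader.

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, IntegerType, StringType, DoubleType

spark = SparkSession.builder.getOrCreate()

# Map the metadata file's type labels to Spark types (extend as needed).
TYPE_MAP = {"INT": IntegerType(), "STRING": StringType(), "DOUBLE": DoubleType()}

# Assumed metadata layout: a header row with "name" and "type" columns.
meta = spark.read.option("header", True) \
    .csv("abfss://container@account.dfs.core.windows.net/metadata.csv").collect()
schema = StructType([StructField(row["name"], TYPE_MAP[row["type"].upper()], True)
                     for row in meta])

# Read the data files with the explicit schema instead of inferring it.
df = spark.read.option("header", True).schema(schema) \
    .csv("abfss://container@account.dfs.core.windows.net/data/")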
Devaraj
by New Contributor
  • 3723 Views
  • 0 replies
  • 0 kudos

Not able to fetch data with the Simba Spark JDBC driver

We are getting the below error when we try to set a date on a PreparedStatement using the Simba Spark JDBC driver. Exception: Query execution failed: [Simba][SparkJDBCDriver](500051) ERROR processing query/statement. Error Code: 0, SQL state: org.apache.h...

twotwoiscute
by New Contributor
  • 1854 Views
  • 0 replies
  • 0 kudos

PySpark pandas_udf slower than single thread

I used @pandas_udf to write a function to speed up the process (parsing XML files) and then compared its speed with a single thread. Surprisingly, using @pandas_udf is two times slower than the single-threaded code. And the number of XML files I need to p...

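For context, a minimal sketch of the series-to-series pandas_udf shape the poster describes (the XML-parsing body is hypothetical). A pandas_udf runs once per Arrow batch rather than once per row, so the Arrow serialization overhead can outweigh the gain when the per-row work or the data volume is small.

import pandas as pd
import xml.etree.ElementTree as ET
from pyspark.sql import SparkSession
from pyspark.sql.functions import pandas_udf
from pyspark.sql.types import StringType

spark = SparkSession.builder.getOrCreate()

@pandas_udf(StringType())
def extract_root_tag(xml_strings: pd.Series) -> pd.Series:
    # Runs once per Arrow batch, not once per row.
    return xml_strings.apply(lambda s: ET.fromstring(s).tag)

df = spark.createDataFrame([("<a><b/></a>",), ("<c/>",)], ["xml"])
df.select(extract_root_tag("xml")).show()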
User16776430979
by New Contributor III
  • 1599 Views
  • 0 replies
  • 1 kudos

Repos file size limit - Is it possible to clone a specific branch into Repos?

We refactored our codebase into another branch of our existing repo and consolidated the files so that they should be usable within the Databricks Repos size/file limitations. However, even though the new branch is smaller, I am still getting an err...

User16752239289
by Databricks Employee
  • 2025 Views
  • 1 replies
  • 1 kudos

Resolved! Failed to add S3 init script in job cluster

I use the below payload to submit my job, which includes an init script saved on S3. The instance profile and init script worked on an interactive cluster, but when I move to a job cluster the init script cannot be configured. { "new_cluster": { "spar...

Latest Reply
User16752239289
Databricks Employee
  • 1 kudos

It is because the region field is missing. For an init script saved in S3, the region field is required. The init script section should look like the below: "init_scripts": [ { "s3": { "destination": "s3://<my bucket>...
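A hedged sketch of the corrected new_cluster section as a Python dict (the bucket, script path, region, runtime version, node type, and ARN are placeholders, not from the thread):

new_cluster = {
    "spark_version": "11.3.x-scala2.12",  # placeholder runtime version
    "node_type_id": "i3.xlarge",          # placeholder node type
    "num_workers": 2,
    "aws_attributes": {
        # placeholder instance profile that can read the bucket
        "instance_profile_arn": "arn:aws:iam::123456789012:instance-profile/my-profile"
    },
    "init_scripts": [
        {
            "s3": {
                "destination": "s3://my-bucket/scripts/init.sh",  # placeholder path
                "region": "us-west-2"  # the field the reply says was missing
            }
        }
    ]
}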

User16790091296
by Contributor II
  • 3191 Views
  • 1 replies
  • 0 kudos

Notebook path can't be in DBFS?

Some of us are working with IDEs and trying to deploy notebook (.py) files to DBFS. The problem I have noticed is that when configuring jobs, those paths are not recognized. notebook_path: if I use this: dbfs:/artifacts/client-state-vector/0.0.0/bootstrap...

Latest Reply
User16752239289
Databricks Employee
  • 0 kudos

The issue is that the Python file is saved under DBFS, not as a workspace notebook. When you give /artifacts/client-state vector/0.0.0/bootstrap.py, the workspace will search for the notebook (a Python file in this case) under the folder under Workspace t...

User16826994223
by Honored Contributor III
  • 1223 Views
  • 1 replies
  • 0 kudos

Is it possible for only a particular cluster to have access to an S3 bucket or folder in S3?

Hi, I want to set up a cluster and give access to that cluster to some users only; those users on that particular cluster should have access to read from and write to the bucket. That particular bucket is not mounted on the workspace. Is th...

Latest Reply
User16752239289
Databricks Employee
  • 0 kudos

Yes, you can set up an instance profile that can access the S3 bucket and then give only certain users the privilege to use the instance profile. For more details, you can check here

StephanieAlba
by Databricks Employee
  • 1475 Views
  • 1 replies
  • 0 kudos

Is Delta schema enforcement flexible?

In the sense that: is it possible to check only column names or only column data types, or will it always be both?

Latest Reply
StephanieAlba
Databricks Employee
  • 0 kudos

No, I do not believe that is possible. However, I would be interested in understanding a use case where that is the ideal behavior. How does schema enforcement work? Delta Lake uses schema validation on write, which means that all new writes to a table ar...
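A minimal sketch of schema-on-write enforcement (assumes a Delta-enabled runtime; the table path is a placeholder): an appended DataFrame whose columns do not match the table's schema is rejected, for name and type mismatches alike.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Create a small Delta table with a single "amount" column.
spark.range(5).withColumnRenamed("id", "amount") \
    .write.format("delta").mode("overwrite").save("/tmp/demo_delta")

# Appending a DataFrame with a different column name fails with an
# AnalysisException unless mergeSchema/overwriteSchema is set.
bad = spark.range(5).withColumnRenamed("id", "amount_typo")
try:
    bad.write.format("delta").mode("append").save("/tmp/demo_delta")
except Exception as e:
    print("Write rejected:", type(e).__name__)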

tthorpe
by New Contributor
  • 65187 Views
  • 3 replies
  • 4 kudos

How do I delete files from the DBFS?

I can't see where in the Databricks UI I can delete files that have been either uploaded or saved to the DBFS. How do I do this?

Latest Reply
SophieGou
New Contributor III
  • 4 kudos

Open a notebook and run the command dbutils.fs.rm("/FileStore/tables/your_table_name.csv"), per this link: https://docs.databricks.com/data/databricks-file-system.html
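For reference, dbutils.fs.rm also takes a recurse flag for directories (the paths below are placeholders):

dbutils.fs.rm("/FileStore/tables/your_table_name.csv")          # delete a single file
dbutils.fs.rm("/FileStore/tables/old_exports/", recurse=True)   # delete a folder and its contents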

2 More Replies
User16752239289
by Databricks Employee
  • 3904 Views
  • 1 replies
  • 1 kudos

Resolved! SparkR session failed to initialize

When running sparkR.session() I faced the below error: Spark package found in SPARK_HOME: /databricks/spark   Launching java with spark-submit command /databricks/spark/bin/spark-submit sparkr-shell /tmp/Rtmp5hnW8G/backend_porte9141208532d   Error: Could not f...

Latest Reply
User16752239289
Databricks Employee
  • 1 kudos

This is because when users run their R scripts in RStudio, the R session is not shut down gracefully. Databricks is working on handling the R session better and removing the limit. As a workaround, you can create and run the below init script to increase...

MikeBrewer
by New Contributor II
  • 21007 Views
  • 3 replies
  • 0 kudos

I am trying to use SQL, but createOrReplaceTempView("myDataView") fails

I am trying to use SQL, but createOrReplaceTempView("myDataView") fails. I can create and display a DataFrame fine: import pandas as pd; df = pd.DataFrame(['$3,000,000.00', '$3,000.00', '$200.5', '$5.5'], columns=['Amount']); df. I add another cell, ...

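The excerpt builds a pandas DataFrame, but createOrReplaceTempView exists only on Spark DataFrames, so a likely fix (a hedged sketch; the accepted answer is not shown above) is to convert first:

import pandas as pd
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

pdf = pd.DataFrame(['$3,000,000.00', '$3,000.00', '$200.5', '$5.5'],
                   columns=['Amount'])
sdf = spark.createDataFrame(pdf)           # convert pandas -> Spark
sdf.createOrReplaceTempView("myDataView")  # now visible to SQL
spark.sql("SELECT * FROM myDataView").show()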
Latest Reply
sachinthana
New Contributor II
  • 0 kudos

This worked for me. Thank you @acorson

2 More Replies
brickster_2018
by Databricks Employee
  • 1487 Views
  • 1 replies
  • 0 kudos

What are the best practices for Adaptive Query Execution?

What are the common configurations used, and which workloads will benefit?

Latest Reply
amr
Databricks Employee
  • 0 kudos

Leave it turned on. The bet is that with each Spark version released, AQE will get better and better, and will eventually produce a more optimised plan than manually trying to tune it.
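For reference, the main AQE switches (real Spark 3.x config keys; the values shown are the defaults in recent releases):

# `spark` is the session Databricks provides in a notebook.
spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "true")  # merge small shuffle partitions
spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")            # split skewed join partitions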

