Data Engineering

Forum Posts

by MohitAnchlia • New Contributor II

07-15-2021 11:22:31 AM

1533 Views
0 replies
1 kudos

Change AWS storage setting and account

I am seeing some very odd behaviour in Databricks. We initially configured the following:
1. Account X in Account Console -> AWS Account arn:aws:iam::X:role/databricks-s3
2. We set up databricks-s3 as the S3 bucket in Account Console -> AWS Storage
3. W...

by Abdus • New Contributor

07-15-2021 10:55:02 AM

1194 Views
0 replies
0 kudos

Apache Spark Streaming

When was the last commit made to Spark Streaming?

by TrinaDe • New Contributor II

07-15-2021 8:11:23 AM

5835 Views
1 replies
1 kudos

How can we join two PySpark DataFrames side by side (without using join, equivalent to pd.concat() in pandas)? I am trying to join two extremely large DataFrames where each is on the order of 50 million.

My two DataFrames look like new_df2_record1 and new_df2_record2, and the expected output DataFrame I want is like new_df2: The code I have tried is the following: If I print the top 5 rows of new_df2, it gives the output as expected but I cannot pri...

Latest Reply

TrinaDe
New Contributor II

07-15-2021 8:21:19 AM

1 kudos

The code in a more legible format:

by AnandNair • New Contributor

07-15-2021 4:45:44 AM

1232 Views
0 replies
0 kudos

Load an explicit schema from an external metadata.csv file or a JSON file for reading CSVs into a DataFrame

Hi, I have a metadata CSV file which contains column names and data types, such as Colm1: INT, Colm2: String. I can also get the same in JSON format as shown: I can store this on ADLS. How can I convert this into a schema like: "Myschema" that I can ...
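
One possible approach, sketched under assumptions: Spark's `DataFrameReader.schema` accepts a DDL-formatted string as well as a `StructType`, so the metadata can be turned into a string such as `Colm1 INT, Colm2 STRING`. The metadata layouts used here (a two-column CSV of name and type with no header row, and a flat JSON object mapping name to type), the `TYPE_ALIASES` table, and the helper names are all hypothetical, since the post does not show the actual files.

```python
import csv
import io
import json

# Hypothetical mapping from type names in the metadata to Spark SQL type names.
TYPE_ALIASES = {
    "INT": "INT",
    "INTEGER": "INT",
    "LONG": "BIGINT",
    "BIGINT": "BIGINT",
    "STRING": "STRING",
    "DOUBLE": "DOUBLE",
    "BOOLEAN": "BOOLEAN",
    "DATE": "DATE",
    "TIMESTAMP": "TIMESTAMP",
}


def ddl_from_pairs(pairs):
    """Build a DDL schema string from (column_name, type_name) pairs."""
    parts = []
    for name, dtype in pairs:
        spark_type = TYPE_ALIASES.get(dtype.strip().upper())
        if spark_type is None:
            raise ValueError(f"Unsupported type {dtype!r} for column {name!r}")
        parts.append(f"{name.strip()} {spark_type}")
    return ", ".join(parts)


def ddl_from_csv(text):
    """Assumes each non-empty CSV row is: column_name,type_name (no header)."""
    reader = csv.reader(io.StringIO(text))
    return ddl_from_pairs((row[0], row[1]) for row in reader if row)


def ddl_from_json(text):
    """Assumes a flat JSON object such as {"Colm1": "INT", "Colm2": "String"}."""
    return ddl_from_pairs(json.loads(text).items())


ddl = ddl_from_csv("Colm1,INT\nColm2,String\n")
print(ddl)
```

In a notebook, the resulting string could then be passed along the lines of `spark.read.schema(ddl).csv(path)`, where `path` stands for the ADLS location of the data files. Column names containing spaces or special characters would need backtick quoting, which this sketch does not handle.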

by Devaraj • New Contributor

07-14-2021 11:22:56 PM

4106 Views
0 replies
0 kudos

Not able to fetch data from Simba Spark JDBC Driver

We get the error below when we try to set a date in a PreparedStatement using the Simba Spark JDBC driver. Exception: Query execution failed: [Simba][SparkJDBCDriver](500051) ERROR processing query/statement. Error Code: 0, SQL state: org.apache.h...

by twotwoiscute • New Contributor

07-14-2021 8:00:55 PM

2261 Views
0 replies
0 kudos

PySpark pandas_udf slower than single thread

I used @pandas_udf to write a function for speeding up the process (parsing XML files) and then compared its speed with a single thread. Surprisingly, using @pandas_udf is two times slower than the single-threaded code. And the number of XML files I need to p...

by User16776430979 • Databricks Employee

07-13-2021 3:59:12 PM

1905 Views
0 replies
1 kudos

Repos file size limit - Is it possible to clone a specific branch into Repos?

We refactored our codebase into another branch of our existing repo and consolidated the files so that they should be useable within the Databricks Repos size/file limitations. However, even though the new branch is smaller, I am still getting an err...

by User16752239289 • Databricks Employee

07-13-2021 10:00:54 AM

2382 Views
1 replies
1 kudos

Resolved! Failed to add S3 init script in job cluster

I use the payload below to submit my job, which includes an init script saved on S3. The instance profile and init script worked on an interactive cluster, but when I move to a job cluster the init script cannot be configured. { "new_cluster": { "spar...

Latest Reply

User16752239289
Databricks Employee

07-13-2021 11:50:42 AM

1 kudos

This happens because the region is missing. For an init script saved in S3, the region field is required. The init script section should look like the following: "init_scripts": [ { "s3": { "destination": "s3://<my bucket>...
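
A minimal sketch of building that fragment in Python so that a missing region fails early. The field names `init_scripts`, `s3`, `destination`, and `region` come from the reply above; the helper name, the bucket path, and the region value are hypothetical placeholders, not the poster's actual settings.

```python
import json


def s3_init_script(destination, region):
    """Return one init-script entry for a script stored in S3.

    The reply above states that the region field is required for
    S3-hosted init scripts, so an empty region is rejected here.
    """
    if not region:
        raise ValueError("region is required for init scripts stored in S3")
    return {"s3": {"destination": destination, "region": region}}


# Hypothetical values for illustration only.
payload_fragment = {
    "init_scripts": [
        s3_init_script("s3://example-bucket/example-path/init.sh", "us-west-2")
    ]
}

print(json.dumps(payload_fragment, indent=2))
```

The fragment would be merged into the `new_cluster` section of the job payload alongside the other cluster settings.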

by StephanieAlba • Databricks Employee

07-12-2021 8:25:02 AM

1516 Views
1 replies
0 kudos

Can we access cool storage with Databricks?

Latest Reply

StephanieAlba
Databricks Employee

07-12-2021 8:28:35 AM

0 kudos

You can use any storage tier except Glacier/Archive with Delta tables.

by User16790091296 • Databricks Employee

06-24-2021 8:15:30 AM

3592 Views
1 replies
0 kudos

Notebook path can't be in DBFS?

Some of us are working with IDEs and trying to deploy notebook (.py) files to DBFS. The problem I have noticed is that when configuring jobs, those paths are not recognized. notebook_path: If I use this: dbfs:/artifacts/client-state-vector/0.0.0/bootstrap...

Latest Reply

User16752239289
Databricks Employee

07-08-2021 2:49:49 PM

0 kudos

The issue is that the Python file is saved under DBFS, not as a workspace notebook. When you give /artifacts/client-state-vector/0.0.0/bootstrap.py, the workspace searches for the notebook (a Python file in this case) under the folder that is under Workspace t...

by User16826994223 • Databricks Employee

06-27-2021 6:29:25 AM

1462 Views
1 replies
0 kudos

Is it possible for only a particular cluster to have access to an S3 bucket or a folder in S3?

Hi, I want to set up a cluster and give access to that cluster to some users. Only those users, on that particular cluster, should be able to read from and write to the bucket. That particular bucket is not mounted on the workspace. Is th...

Latest Reply

User16752239289
Databricks Employee

07-08-2021 2:40:30 PM

0 kudos

Yes, you can set up an instance profile that can access the S3 bucket and then give only certain users the privilege to use that instance profile. For more details, you can check here

by StephanieAlba • Databricks Employee

07-06-2021 11:56:40 AM

1863 Views
1 replies
0 kudos

Is Delta schema enforcement flexible?

In other words, is it possible to check only column names or only column data types, or will it always be both?

Latest Reply

StephanieAlba
Databricks Employee

07-06-2021 12:19:41 PM

0 kudos

No, I do not believe that is possible. However, I would be interested in understanding a use case where that is the ideal behavior. How Does Schema Enforcement Work? Delta Lake uses schema validation on write, which means that all new writes to a table ar...

by brickster_2018 • Databricks Employee

06-25-2021 3:49:18 PM

9573 Views
3 replies
1 kudos

Do we have a way to assign a static IP to a cluster on Databricks?

Latest Reply

StephanieAlba
Databricks Employee

07-06-2021 12:00:13 PM

1 kudos

You can get a static IP at the workspace level https://docs.microsoft.com/en-us/azure/databricks/kb/cloud/azure-vnet-single-ip

by tthorpe • New Contributor

05-18-2017 4:48:03 AM

68753 Views
3 replies
4 kudos

How do I delete files from DBFS?

I can't see where in the Databricks UI I can delete files that have been either uploaded or saved to DBFS. How do I do this?

Latest Reply

SophieGou
New Contributor III

11-18-2019 4:04:14 PM

4 kudos

Open a notebook and run the command dbutils.fs.rm("/FileStore/tables/your_table_name.csv"). For reference, see https://docs.databricks.com/data/databricks-file-system.html
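
For directories, `dbutils.fs.rm` also takes a second `recurse` argument. A small sketch of a wrapper around that call follows. The helper name `remove_dbfs_path` and its guard against deleting the root are hypothetical additions; the `fs` parameter is expected to be `dbutils.fs`, which is only available inside a Databricks notebook.

```python
def remove_dbfs_path(fs, path, recurse=False):
    """Delete `path` through a dbutils.fs-like object and return its result.

    `fs` is expected to be `dbutils.fs`. Set `recurse=True` to delete a
    directory together with its contents.
    """
    # Hypothetical safety check: never delete the DBFS root by accident.
    if path.rstrip("/") in ("", "dbfs:"):
        raise ValueError("refusing to delete the DBFS root")
    return fs.rm(path, recurse)
```

In a notebook this could be called as `remove_dbfs_path(dbutils.fs, "/FileStore/tables/your_table_name.csv")`, or with `recurse=True` for a folder.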

by User16752239289 • Databricks Employee

07-02-2021 9:11:29 AM

4581 Views
1 replies
1 kudos

Resolved! SparkR session failed to initialize

When I run sparkR.session(), I get the error below: Spark package found in SPARK_HOME: /databricks/spark Launching java with spark-submit command /databricks/spark/bin/spark-submit sparkr-shell /tmp/Rtmp5hnW8G/backend_porte9141208532d Error: Could not f...

Latest Reply

User16752239289
Databricks Employee

07-02-2021 9:18:23 AM

1 kudos

This happens because when users run their R scripts in RStudio, the R session is not shut down gracefully. Databricks is working on handling the R session better and removing the limit. As a workaround, you can create and run the init script below to increase...
