Data Engineering

Forum Posts

ShriS1221
by New Contributor II
  • 3213 Views
  • 5 replies
  • 1 kudos

Resolved! Removing new line character from spark dataframe column

I have to remove the newline character from an entire column of a dataframe. I tried regex_replace but it's not working. Help me with this.

Latest Reply
Kaniz
Community Manager
  • 1 kudos

Hi @Shriram S, please try the following code:

    import pandas as pd

    df = pd.read_csv("xyz.csv")
    col = ["columns_of_the_dataframes_that_needs_modification"]
    new_df = df[col].replace('\\n', ' ', regex=True)
    new_df.to_csv("newFile.csv", index=False)

4 More Replies
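The accepted answer above round-trips through pandas and CSV files; since the question asks about a Spark dataframe, here is a minimal PySpark sketch using the built-in regexp_replace function (the column name text_col and the sample row are assumptions for illustration):

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import regexp_replace

    spark = SparkSession.builder.getOrCreate()

    # Hypothetical sample data; "text_col" is an assumed column name.
    df = spark.createDataFrame([("line one\nline two",)], ["text_col"])

    # Replace newline and carriage-return characters with a space.
    cleaned = df.withColumn("text_col", regexp_replace("text_col", "[\\n\\r]", " "))
    cleaned.show(truncate=False)

Note that the question's "regex_replace" is not a Spark function; the built-in is spelled regexp_replace, which may be the original problem.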
GabrieleMuciacc
by New Contributor III
  • 2431 Views
  • 7 replies
  • 2 kudos

Resolved! Support for kwargs parameter in `/2.1/jobs/create` endpoint for `python_wheel_task`

If I create a job from the web UI and select Python wheel, I can add kwargs parameters. Judging from the generated JSON job description, they appear under a section named `namedParameters`. However, if I use the REST APIs to create a job, it appears...

Latest Reply
Kaniz
Community Manager
  • 2 kudos

Hi @Gabriele Muciaccia, does @Rajeev Kumar's solution answer your question? If yes, would you like to mark his answer as the best?

6 More Replies
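For reference, a hedged sketch of creating such a job through the REST API: the Jobs 2.1 `python_wheel_task` block accepts a `named_parameters` map for kwargs-style arguments. The host, token, package, and cluster settings below are placeholders:

    import requests

    # Placeholder workspace URL and token.
    HOST = "https://<databricks-instance>"
    TOKEN = "<personal-access-token>"

    job_spec = {
        "name": "wheel-job-with-kwargs",
        "tasks": [{
            "task_key": "main",
            "python_wheel_task": {
                "package_name": "my_package",
                "entry_point": "main",
                # kwargs, mirroring what the UI shows as namedParameters
                "named_parameters": {"arg1": "value1", "arg2": "value2"},
            },
            "new_cluster": {
                "spark_version": "10.4.x-scala2.12",
                "node_type_id": "i3.xlarge",
                "num_workers": 1,
            },
        }],
    }

    resp = requests.post(
        f"{HOST}/api/2.1/jobs/create",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json=job_spec,
    )
    print(resp.json())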
gaurav_khanna
by New Contributor II
  • 2390 Views
  • 3 replies
  • 3 kudos
Latest Reply
User16753725182
Contributor III
  • 3 kudos

Hi @gaurav khanna, this may happen if you do not have the appropriate permission level on the cluster, or if you have only 'Can Read' on the notebook, which doesn't allow detach and reattach. https://docs.databricks.com/security/access-control/cluster-acl....

2 More Replies
Anonymous
by Not applicable
  • 3070 Views
  • 9 replies
  • 2 kudos

Resolved! Issue in creating workspace - Custom AWS Configuration

We tried to create a new workspace using "Custom AWS Configuration", supplying our own customer-managed VPC, but the workspace failed to launch. We are getting the below error and can't tell where the issue lies. Workspace...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

@Mitesh Patel​ - As Atanu thinks the issue may be resolved, I wanted to check in with you, also. How goes it?

8 More Replies
lecardozo
by New Contributor II
  • 2886 Views
  • 6 replies
  • 1 kudos

Resolved! Problems with HiveMetastoreClient and internal Databricks Metastore.

I've been trying to use the HiveMetastoreClient class in Scala to extract some metadata from the Databricks internal Metastore, without success. I'm currently using the 7.3 LTS runtime. The error seems to be related to some kind of inconsistency between...

Latest Reply
lecardozo
New Contributor II
  • 1 kudos

Thanks for the reference, @Atanu Sarkar. It seems a little odd that I'd need to alter an internal Databricks Metastore table to add a column expected by the default Scala client. I'm afraid this could cause issues with other users/jobs ...

5 More Replies
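Not the Scala HiveMetastoreClient route discussed above, but for pulling similar metadata without touching the metastore client at all, a PySpark catalog-API sketch works on Databricks runtimes (the databases and tables listed are whatever the workspace contains):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Walk databases, tables, and columns via the Spark catalog API
    # rather than a direct Hive metastore client connection.
    for db in spark.catalog.listDatabases():
        print("database:", db.name)
        for table in spark.catalog.listTables(db.name):
            print("  table:", table.name, "type:", table.tableType)
            for column in spark.catalog.listColumns(table.name, db.name):
                print("    column:", column.name, column.dataType)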
irfanaziz
by Contributor II
  • 3601 Views
  • 4 replies
  • 0 kudos

Resolved! If two Data Factory pipelines run at the same time, or share a window of execution, do they share the Databricks Spark cluster (if both have the same linked service)? (Job clusters are those created on the fly, as defined in the linked service.)

Continuing the above case: does that mean that if I have several (say 5) ADF pipelines scheduled regularly at the same time, it's better to use an existing cluster, since all of the ADF pipelines would share the same cluster and hence the cost would be lower?

Latest Reply
Atanu
Esteemed Contributor
  • 0 kudos

For ADF or job runs we always prefer a job cluster, but for streaming you may consider using an interactive cluster. Either way, you need to monitor the cluster load; if the load is high there is a chance of job slowness as well as failure. Also, data siz...

3 More Replies
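To make the two options in this thread concrete, here is a hedged sketch of how the cluster choice surfaces in an ADF Databricks linked service; the typeProperties names below are assumptions based on ADF's linked-service JSON and should be checked against your ADF version:

    # Job cluster: ADF asks Databricks for a fresh cluster per activity run.
    job_cluster_linked_service = {
        "type": "AzureDatabricks",
        "typeProperties": {
            "domain": "https://<region>.azuredatabricks.net",
            "newClusterVersion": "10.4.x-scala2.12",
            "newClusterNodeType": "Standard_DS3_v2",
            "newClusterNumOfWorker": "2",
        },
    }

    # Existing interactive cluster: concurrent pipelines share this one
    # cluster, which is what makes the shared-cost question above apply.
    existing_cluster_linked_service = {
        "type": "AzureDatabricks",
        "typeProperties": {
            "domain": "https://<region>.azuredatabricks.net",
            "existingClusterId": "<cluster-id>",
        },
    }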
gibbona1
by New Contributor II
  • 2381 Views
  • 5 replies
  • 1 kudos

Resolved! Correct setup and format for calling REST API for image classification

I trained a basic image classification model on MNIST using TensorFlow, logging the experiment run with MLflow. [Keras model summary for "my_sequential" truncated: Layer (type) / Output Shape / ...]

Latest Reply
Atanu
Esteemed Contributor
  • 1 kudos

@Anthony Gibbons, maybe this GitHub issue could work with your use case - https://github.com/mlflow/mlflow/issues/1661

4 More Replies
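For the image-scoring question above, a minimal sketch of calling a served MLflow model over REST; it assumes a tensor-input model and the "inputs" JSON format MLflow's scoring server accepts, and the endpoint URL and token are placeholders:

    import json

    import numpy as np
    import requests

    # Placeholder serving endpoint and token.
    URL = "https://<databricks-instance>/model/my_mnist_model/1/invocations"
    TOKEN = "<personal-access-token>"

    # One 28x28 grayscale image, shaped the way the model expects.
    image = np.zeros((1, 28, 28), dtype=np.float32)

    payload = json.dumps({"inputs": image.tolist()})
    resp = requests.post(
        URL,
        headers={
            "Authorization": f"Bearer {TOKEN}",
            "Content-Type": "application/json",
        },
        data=payload,
    )
    print(resp.json())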
matt_t
by New Contributor
  • 1804 Views
  • 3 replies
  • 1 kudos

Resolved! S3 sync from bucket to a mounted bucket causing a "[Errno 95] Operation not supported" error for some but not all files

Trying to sync one folder from an external S3 bucket to a folder on a mounted S3 bucket, running some simple code on Databricks to accomplish this. The data is a bunch of CSVs and PSVs. The only problem is that some of the files are giving this error that t...

Latest Reply
Atanu
Esteemed Contributor
  • 1 kudos

@Matthew Tribby, does the above suggestion work? Please let us know if you need further help on this. Thanks.

2 More Replies
bonjih
by New Contributor
  • 4647 Views
  • 3 replies
  • 3 kudos

Resolved! AttributeError: module 'dbutils' has no attribute 'fs'

Hi, using db in SageMaker to connect EC2 to S3. Following other examples I get 'AttributeError: module 'dbutils' has no attribute 'fs''... I guess I'm missing an import?

Latest Reply
Atanu
Esteemed Contributor
  • 3 kudos

Agree with @Werner Stinckens. Also, you may try importing dbutils - @ben Hamilton

2 More Replies
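Following up on the import suggestion above, a hedged sketch of obtaining a working dbutils handle outside a Databricks notebook; pyspark.dbutils ships with Databricks runtimes and Databricks Connect, not with open-source PySpark, so this only runs in those environments:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # In Databricks notebooks, dbutils is predefined. Elsewhere (e.g.
    # Databricks Connect), construct it explicitly from the session.
    from pyspark.dbutils import DBUtils
    dbutils = DBUtils(spark)

    print(dbutils.fs.ls("/"))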
jstatic
by New Contributor II
  • 1838 Views
  • 5 replies
  • 1 kudos

Resolved! Quick way to know delta table is zordered

Hello, I created a delta table using SQL, specifying the partitioning and Z-order strategy. I then loaded data into it for the first time by doing a write as delta with mode append and save-as-table. However, I don't know of a way to verify...

Latest Reply
User16763506477
Contributor III
  • 1 kudos

If there is no data, then lines 10 and 11 will not have any impact. I am assuming that lines 1-5 create an empty table, but the actual load happens when you do the df.write operation. Also, delta.autoOptimize.autoCompact will not trigger the z-or...

4 More Replies
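One concrete way to verify Z-ordering, related to the advice above: OPTIMIZE ... ZORDER BY runs are recorded in the Delta table history, with the Z-order columns listed under operationParameters (the table name below is a placeholder):

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.getOrCreate()

    # Z-order runs appear as OPTIMIZE operations in the table history.
    history = spark.sql("DESCRIBE HISTORY my_db.my_table")
    (history
     .filter(col("operation") == "OPTIMIZE")
     .select("version", "timestamp", "operationParameters")
     .show(truncate=False))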
hari
by Contributor
  • 3336 Views
  • 8 replies
  • 4 kudos

Resolved! How to write Change Data from Delta Lake to aws dynamodb

Is there some direct way to write data from Delta Lake to AWS DynamoDB? If there is none, is there any other way to do the same?

Latest Reply
jose_gonzalez
Moderator
  • 4 kudos

Hi @Harikrishnan P H, did @Werner Stinckens's reply help you resolve your issue? If yes, please mark it as best. If not, please let us know.

7 More Replies
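In the absence of a direct connector, one hedged sketch for this thread reads the Delta Change Data Feed and writes the changes to DynamoDB with boto3; it assumes the table was created with delta.enableChangeDataFeed = true, and the id/value columns plus the DynamoDB table name are placeholders:

    import boto3
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Read row-level changes from the Delta Change Data Feed.
    changes = (
        spark.read.format("delta")
        .option("readChangeFeed", "true")
        .option("startingVersion", 0)
        .table("my_db.my_table")
    )

    def write_partition(rows):
        # One DynamoDB handle per partition; table name is a placeholder.
        table = boto3.resource("dynamodb").Table("my_dynamo_table")
        with table.batch_writer() as batch:
            for row in rows:
                if row["_change_type"] in ("insert", "update_postimage"):
                    batch.put_item(Item={"id": row["id"], "value": row["value"]})

    changes.foreachPartition(write_partition)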
Vibhor
by Contributor
  • 2550 Views
  • 6 replies
  • 2 kudos

Resolved! Databricks Data Type Conversion error

In Databricks, while writing data to the curated layer, I see the error: Failed to execute user defined function (Double => decimal(38,18)). Has anyone faced such an issue, and how can it be resolved?

Latest Reply
Kaniz
Community Manager
  • 2 kudos

Hi @Vibhor Sethi, can you tell me your DBR (Databricks Runtime) version?

5 More Replies
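One common workaround for Double-to-decimal failures like the one above (not necessarily the fix adopted in this thread) is to replace the UDF with Spark's built-in cast, which, under the default non-ANSI settings, yields null for values that do not fit rather than raising:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col
    from pyspark.sql.types import DecimalType

    spark = SparkSession.builder.getOrCreate()

    df = spark.createDataFrame([(1.5,), (2.25,)], ["amount"])

    # Built-in cast instead of a Double => decimal(38,18) UDF.
    converted = df.withColumn("amount_dec", col("amount").cast(DecimalType(38, 18)))
    converted.printSchema()
    converted.show()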
Anonymous
by Not applicable
  • 364 Views
  • 1 replies
  • 2 kudos

The Next Databricks Office Hours

Our next Office Hours session is scheduled for March 23, 2022 - 8:00 am PDT. Do you have questions about how to set up or use Databricks? Do you want to get best practices for deploying your use case or tips on data archi...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 2 kudos

Signed in!

bchaubey
by Contributor II
  • 1035 Views
  • 1 replies
  • 0 kudos
Latest Reply
User16764241763
Honored Contributor
  • 0 kudos

@Bhagwan Chaubey, maybe you can give this a try, if it is a Blob Storage account: https://docs.microsoft.com/en-us/azure/storage/blobs/storage-quickstart-blobs-python?tabs=environment-variable-windows. For Data Lake storage, please try the below: https://do...

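Along the lines of the Blob Storage quickstart linked above, a minimal upload sketch with the azure-storage-blob package; the connection string, container, and file names are placeholders:

    from azure.storage.blob import BlobServiceClient

    # Placeholder connection string, container, and file names.
    service = BlobServiceClient.from_connection_string("<connection-string>")
    blob = service.get_blob_client(container="my-container", blob="hello.txt")

    # Upload a local file as a block blob.
    with open("hello.txt", "rb") as data:
        blob.upload_blob(data, overwrite=True)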
Santosh09
by New Contributor II
  • 3404 Views
  • 5 replies
  • 3 kudos

Resolved! Writing a Spark dataframe to ADLS takes a huge amount of time when the dataframe holds text data

With a Spark dataframe of text data whose schema is a Struct type, Spark takes too much time to write/save/push the data to ADLS or a SQL DB, or to download it as CSV.

Latest Reply
User16764241763
Honored Contributor
  • 3 kudos

@shiva Santosh, have you checked the count of the dataframe that you are trying to save to ADLS? As @Joseph Kambourakis mentioned, the explode can result in 1-many rows; better to check the dataframe count and see if Spark OOMs in the workspace.

4 More Replies
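To check the row blow-up mentioned in the reply above, a small sketch comparing counts before and after an explode (the sample schema is an assumption for illustration):

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import explode

    spark = SparkSession.builder.getOrCreate()

    df = spark.createDataFrame([(1, ["a", "b", "c"])], ["id", "tokens"])

    # explode() emits one row per array element, so the write can end up
    # pushing far more rows than the pre-explode count suggests.
    print("before explode:", df.count())
    exploded = df.withColumn("token", explode("tokens"))
    print("after explode:", exploded.count())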