Data Engineering

Forum Posts

ShriS1221
by New Contributor II
  • 3213 Views
  • 5 replies
  • 1 kudos

Resolved! Removing new line character from spark dataframe column

I have to remove the newline character from an entire column of a dataframe. I tried regex_replace but it's not working. Help me with this.

Latest Reply
Kaniz
Community Manager
  • 1 kudos

Hi @Shriram S, please try the following code:

    import pandas as pd

    df = pd.read_csv("xyz.csv")
    col = ["columns_of_the_dataframes_that_needs_modification"]
    new_df = df[col].replace('\\n', ' ', regex=True)
    new_df.to_csv("newFile.csv", index=False)

4 More Replies
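The accepted answer above round-trips through pandas and CSV files; since the question asks about a Spark dataframe, here is a minimal PySpark sketch using the built-in regexp_replace function (the column name text_col and the sample row are assumptions for illustration):

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import regexp_replace

    spark = SparkSession.builder.getOrCreate()

    # Hypothetical sample data; "text_col" is an assumed column name.
    df = spark.createDataFrame([("line one\nline two",)], ["text_col"])

    # Replace newline and carriage-return characters with a space.
    cleaned = df.withColumn("text_col", regexp_replace("text_col", "[\\n\\r]", " "))
    cleaned.show(truncate=False)

Note that the question's "regex_replace" is not a Spark function; the built-in is spelled regexp_replace, which may be the original problem.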
GabrieleMuciacc
by New Contributor III
  • 2431 Views
  • 7 replies
  • 2 kudos

Resolved! Support for kwargs parameter in `/2.1/jobs/create` endpoint for `python_wheel_task`

If I create a job from the web UI and select Python wheel, I can add kwargs parameters. Judging from the generated JSON job description, they appear under a section named `namedParameters`. However, if I use the REST APIs to create a job, it appears...

Latest Reply
Kaniz
Community Manager
  • 2 kudos

Hi @Gabriele Muciaccia, does @Rajeev Kumar's solution answer your question? If yes, would you like to mark his answer as the best?

6 More Replies
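For reference, a hedged sketch of creating such a job through the REST API: the Jobs 2.1 `python_wheel_task` block accepts a `named_parameters` map for kwargs-style arguments. The host, token, package, and cluster settings below are placeholders:

    import requests

    # Placeholder workspace URL and token.
    HOST = "https://<databricks-instance>"
    TOKEN = "<personal-access-token>"

    job_spec = {
        "name": "wheel-job-with-kwargs",
        "tasks": [{
            "task_key": "main",
            "python_wheel_task": {
                "package_name": "my_package",
                "entry_point": "main",
                # kwargs, mirroring what the UI shows as namedParameters
                "named_parameters": {"arg1": "value1", "arg2": "value2"},
            },
            "new_cluster": {
                "spark_version": "10.4.x-scala2.12",
                "node_type_id": "i3.xlarge",
                "num_workers": 1,
            },
        }],
    }

    resp = requests.post(
        f"{HOST}/api/2.1/jobs/create",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json=job_spec,
    )
    print(resp.json())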
gaurav_khanna
by New Contributor II
  • 2390 Views
  • 3 replies
  • 3 kudos
Latest Reply
User16753725182
Contributor III
  • 3 kudos

Hi @gaurav khanna, this may happen if you do not have the appropriate permission level on the cluster, or if you have only 'Can Read' on the notebook, which doesn't allow detach and reattach. https://docs.databricks.com/security/access-control/cluster-acl....

2 More Replies
Anonymous
by Not applicable
  • 3070 Views
  • 9 replies
  • 2 kudos

Resolved! Issue in creating workspace - Custom AWS Configuration

We tried to create a new workspace using "Custom AWS Configuration", supplying our own customer-managed VPC, but the workspace failed to launch. We are getting the below error and can't tell where the issue lies. Workspace...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

@Mitesh Patel​ - As Atanu thinks the issue may be resolved, I wanted to check in with you, also. How goes it?

8 More Replies
lecardozo
by New Contributor II
  • 2886 Views
  • 6 replies
  • 1 kudos

Resolved! Problems with HiveMetastoreClient and internal Databricks Metastore.

I've been trying to use the HiveMetastoreClient class in Scala to extract some metadata from the Databricks internal Metastore, without success. I'm currently using the 7.3 LTS runtime. The error seems to be related to some kind of inconsistency between...

Latest Reply
lecardozo
New Contributor II
  • 1 kudos

Thanks for the reference, @Atanu Sarkar. It seems a little odd that I'd need to alter an internal Databricks Metastore table to add a column expected by the default Scala client. I'm afraid this could cause issues with other users/jobs ...

5 More Replies
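Not the Scala HiveMetastoreClient route discussed above, but for pulling similar metadata without touching the metastore client at all, a PySpark catalog-API sketch works on Databricks runtimes (the databases and tables listed are whatever the workspace contains):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Walk databases, tables, and columns via the Spark catalog API
    # rather than a direct Hive metastore client connection.
    for db in spark.catalog.listDatabases():
        print("database:", db.name)
        for table in spark.catalog.listTables(db.name):
            print("  table:", table.name, "type:", table.tableType)
            for column in spark.catalog.listColumns(table.name, db.name):
                print("    column:", column.name, column.dataType)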
irfanaziz
by Contributor II
  • 3601 Views
  • 4 replies
  • 0 kudos

Resolved! If two Data Factory pipelines run at the same time, or share a window of execution, do they share the Databricks Spark cluster (if both have the same linked service)? (Job clusters are those created on the fly, as defined in the linked service.)

Continuing the above case: does that mean that if I have several (say 5) ADF pipelines scheduled regularly at the same time, it's better to use an existing cluster, since all of the ADF pipelines would share the same cluster and hence the cost would be lower?

Latest Reply
Atanu
Esteemed Contributor
  • 0 kudos

For ADF or job runs we always prefer a job cluster, but for streaming you may consider using an interactive cluster. Either way, you need to monitor the cluster load; if the load is high there is a chance of job slowness as well as failure. Also, data siz...

3 More Replies
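To make the two options in this thread concrete, here is a hedged sketch of how the cluster choice surfaces in an ADF Databricks linked service; the typeProperties names below are assumptions based on ADF's linked-service JSON and should be checked against your ADF version:

    # Job cluster: ADF asks Databricks for a fresh cluster per activity run.
    job_cluster_linked_service = {
        "type": "AzureDatabricks",
        "typeProperties": {
            "domain": "https://<region>.azuredatabricks.net",
            "newClusterVersion": "10.4.x-scala2.12",
            "newClusterNodeType": "Standard_DS3_v2",
            "newClusterNumOfWorker": "2",
        },
    }

    # Existing interactive cluster: concurrent pipelines share this one
    # cluster, which is what makes the shared-cost question above apply.
    existing_cluster_linked_service = {
        "type": "AzureDatabricks",
        "typeProperties": {
            "domain": "https://<region>.azuredatabricks.net",
            "existingClusterId": "<cluster-id>",
        },
    }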
gibbona1
by New Contributor II
  • 2381 Views
  • 5 replies
  • 1 kudos

Resolved! Correct setup and format for calling REST API for image classification

I trained a basic image classification model on MNIST using TensorFlow, logging the experiment run with MLflow. [Keras model summary for "my_sequential" truncated: Layer (type) / Output Shape / ...]

Latest Reply
Atanu
Esteemed Contributor
  • 1 kudos

@Anthony Gibbons, maybe this GitHub issue could work with your use case - https://github.com/mlflow/mlflow/issues/1661

4 More Replies
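For the image-scoring question above, a minimal sketch of calling a served MLflow model over REST; it assumes a tensor-input model and the "inputs" JSON format MLflow's scoring server accepts, and the endpoint URL and token are placeholders:

    import json

    import numpy as np
    import requests

    # Placeholder serving endpoint and token.
    URL = "https://<databricks-instance>/model/my_mnist_model/1/invocations"
    TOKEN = "<personal-access-token>"

    # One 28x28 grayscale image, shaped the way the model expects.
    image = np.zeros((1, 28, 28), dtype=np.float32)

    payload = json.dumps({"inputs": image.tolist()})
    resp = requests.post(
        URL,
        headers={
            "Authorization": f"Bearer {TOKEN}",
            "Content-Type": "application/json",
        },
        data=payload,
    )
    print(resp.json())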
matt_t
by New Contributor
  • 1804 Views
  • 3 replies
  • 1 kudos

Resolved! S3 sync from bucket to a mounted bucket causing a "[Errno 95] Operation not supported" error for some but not all files

Trying to sync one folder from an external S3 bucket to a folder on a mounted S3 bucket, running some simple code on Databricks to accomplish this. The data is a bunch of CSVs and PSVs. The only problem is that some of the files are giving this error that t...

Latest Reply
Atanu
Esteemed Contributor
  • 1 kudos

@Matthew Tribby, does the above suggestion work? Please let us know if you need further help on this. Thanks.

2 More Replies
bonjih
by New Contributor
  • 4647 Views
  • 3 replies
  • 3 kudos

Resolved! AttributeError: module 'dbutils' has no attribute 'fs'

Hi, using db in SageMaker to connect EC2 to S3. Following other examples I get 'AttributeError: module 'dbutils' has no attribute 'fs''... I guess I'm missing an import?

Latest Reply
Atanu
Esteemed Contributor
  • 3 kudos

Agree with @Werner Stinckens. Also, you may try importing dbutils - @ben Hamilton

2 More Replies
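Following up on the import suggestion above, a hedged sketch of obtaining a working dbutils handle outside a Databricks notebook; pyspark.dbutils ships with Databricks runtimes and Databricks Connect, not with open-source PySpark, so this only runs in those environments:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # In Databricks notebooks, dbutils is predefined. Elsewhere (e.g.
    # Databricks Connect), construct it explicitly from the session.
    from pyspark.dbutils import DBUtils
    dbutils = DBUtils(spark)

    print(dbutils.fs.ls("/"))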
jstatic
by New Contributor II
  • 1838 Views
  • 5 replies
  • 1 kudos

Resolved! Quick way to know delta table is zordered

Hello, I created a delta table using SQL, specifying the partitioning and Z-order strategy. I then loaded data into it for the first time by doing a write as delta with mode append and save-as-table. However, I don't know of a way to verify...

Latest Reply
User16763506477
Contributor III
  • 1 kudos

If there is no data, then lines 10 and 11 will not have any impact. I am assuming that lines 1-5 create an empty table, but the actual load happens when you do the df.write operation. Also, delta.autoOptimize.autoCompact will not trigger the z-or...

4 More Replies
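One concrete way to verify Z-ordering, related to the advice above: OPTIMIZE ... ZORDER BY runs are recorded in the Delta table history, with the Z-order columns listed under operationParameters (the table name below is a placeholder):

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.getOrCreate()

    # Z-order runs appear as OPTIMIZE operations in the table history.
    history = spark.sql("DESCRIBE HISTORY my_db.my_table")
    (history
     .filter(col("operation") == "OPTIMIZE")
     .select("version", "timestamp", "operationParameters")
     .show(truncate=False))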
hari
by Contributor
  • 3336 Views
  • 8 replies
  • 4 kudos

Resolved! How to write Change Data from Delta Lake to aws dynamodb

Is there some direct way to write data from Delta Lake to AWS DynamoDB? If there is none, is there any other way to do the same?

Latest Reply
jose_gonzalez
Moderator
  • 4 kudos

Hi @Harikrishnan P H, did @Werner Stinckens's reply help you resolve your issue? If yes, please mark it as best. If not, please let us know.

7 More Replies
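In the absence of a direct connector, one hedged sketch for this thread reads the Delta Change Data Feed and writes the changes to DynamoDB with boto3; it assumes the table was created with delta.enableChangeDataFeed = true, and the id/value columns plus the DynamoDB table name are placeholders:

    import boto3
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Read row-level changes from the Delta Change Data Feed.
    changes = (
        spark.read.format("delta")
        .option("readChangeFeed", "true")
        .option("startingVersion", 0)
        .table("my_db.my_table")
    )

    def write_partition(rows):
        # One DynamoDB handle per partition; table name is a placeholder.
        table = boto3.resource("dynamodb").Table("my_dynamo_table")
        with table.batch_writer() as batch:
            for row in rows:
                if row["_change_type"] in ("insert", "update_postimage"):
                    batch.put_item(Item={"id": row["id"], "value": row["value"]})

    changes.foreachPartition(write_partition)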
Vibhor
by Contributor
  • 2550 Views
  • 6 replies
  • 2 kudos

Resolved! Databricks Data Type Conversion error

In Databricks, while writing data to the curated layer, I see the error: Failed to execute user defined function (Double => decimal(38,18)). Has anyone faced such an issue, and how can it be resolved?

Latest Reply
Kaniz
Community Manager
  • 2 kudos

Hi @Vibhor Sethi, can you tell me your DBR (Databricks Runtime) version?

5 More Replies
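One common workaround for Double-to-decimal failures like the one above (not necessarily the fix adopted in this thread) is to replace the UDF with Spark's built-in cast, which, under the default non-ANSI settings, yields null for values that do not fit rather than raising:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col
    from pyspark.sql.types import DecimalType

    spark = SparkSession.builder.getOrCreate()

    df = spark.createDataFrame([(1.5,), (2.25,)], ["amount"])

    # Built-in cast instead of a Double => decimal(38,18) UDF.
    converted = df.withColumn("amount_dec", col("amount").cast(DecimalType(38, 18)))
    converted.printSchema()
    converted.show()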
Anonymous
by Not applicable
  • 364 Views
  • 1 replies
  • 2 kudos

The Next Databricks Office Hours

Our next Office Hours session is scheduled for March 23, 2022 - 8:00 am PDT. Do you have questions about how to set up or use Databricks? Do you want to get best practices for deploying your use case or tips on data archi...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 2 kudos

Signed in!

bchaubey
by Contributor II
  • 1035 Views
  • 1 replies
  • 0 kudos
Latest Reply
User16764241763
Honored Contributor
  • 0 kudos

@Bhagwan Chaubey, maybe you can give this a try, if it is a Blob Storage account: https://docs.microsoft.com/en-us/azure/storage/blobs/storage-quickstart-blobs-python?tabs=environment-variable-windows. For Data Lake storage, please try the below: https://do...

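Along the lines of the Blob Storage quickstart linked above, a minimal upload sketch with the azure-storage-blob package; the connection string, container, and file names are placeholders:

    from azure.storage.blob import BlobServiceClient

    # Placeholder connection string, container, and file names.
    service = BlobServiceClient.from_connection_string("<connection-string>")
    blob = service.get_blob_client(container="my-container", blob="hello.txt")

    # Upload a local file as a block blob.
    with open("hello.txt", "rb") as data:
        blob.upload_blob(data, overwrite=True)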
Santosh09
by New Contributor II
  • 3404 Views
  • 5 replies
  • 3 kudos

Resolved! Writing a Spark dataframe to ADLS takes a huge amount of time when the dataframe holds text data

With a Spark dataframe of text data whose schema is a Struct type, Spark takes too much time to write/save/push the data to ADLS or a SQL DB, or to download it as CSV.

Latest Reply
User16764241763
Honored Contributor
  • 3 kudos

@shiva Santosh, have you checked the count of the dataframe that you are trying to save to ADLS? As @Joseph Kambourakis mentioned, the explode can result in 1-many rows; better to check the dataframe count and see if Spark OOMs in the workspace.

4 More Replies
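To check the row blow-up mentioned in the reply above, a small sketch comparing counts before and after an explode (the sample schema is an assumption for illustration):

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import explode

    spark = SparkSession.builder.getOrCreate()

    df = spark.createDataFrame([(1, ["a", "b", "c"])], ["id", "tokens"])

    # explode() emits one row per array element, so the write can end up
    # pushing far more rows than the pre-explode count suggests.
    print("before explode:", df.count())
    exploded = df.withColumn("token", explode("tokens"))
    print("after explode:", exploded.count())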