Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

svrdragon
by New Contributor
  • 3071 Views
  • 0 replies
  • 0 kudos

optimizeWrite takes too long

Hi, we have a Spark job that writes data into a Delta table for the last 90 date partitions. We have enabled spark.databricks.delta.autoCompact.enabled and delta.autoOptimize.optimizeWrite. The job takes 50 mins to complete. Of that, the logic takes 12 mins and optimizeWri...
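As context for the question above, a minimal sketch of how the two settings mentioned are typically enabled, either per session or as Delta table properties. The table name is a placeholder, and this assumes a Databricks runtime with an active `spark` session:

```python
# Hypothetical sketch: enable auto compaction and optimized writes.
# Session-level settings (apply to writes in this session):
spark.conf.set("spark.databricks.delta.autoCompact.enabled", "true")
spark.conf.set("spark.databricks.delta.optimizeWrite.enabled", "true")

# Or persist them as properties of the target table (placeholder name):
spark.sql("""
  ALTER TABLE my_db.my_table SET TBLPROPERTIES (
    'delta.autoOptimize.autoCompact'  = 'true',
    'delta.autoOptimize.optimizeWrite' = 'true'
  )
""")
```

Note that optimized writes add a shuffle before the write, so some extra wall-clock time relative to a plain append is expected; whether it is worth it depends on how small the unoptimized files would be.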

erigaud
by Honored Contributor
  • 11513 Views
  • 3 replies
  • 0 kudos

Merge DLT with Delta Table

Is there any way to accomplish this? I have an existing Delta table and a separate Delta Live Tables pipeline, and I would like to merge data from a DLT into my existing Delta table. Is this doable or completely impossible?

Latest Reply
LeifBruen
New Contributor II
  • 0 kudos

Merging data from a Delta Live Table (DLT) into an existing Delta Table is possible with careful planning. Transition data from DLT to Delta Table through batch processing, data transformation, and ETL processes, ensuring schema compatibility. 
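A minimal sketch of what such a merge could look like, reading the table that the DLT pipeline materializes and upserting it into the pre-existing Delta table. The table names and the `id` key column are assumptions, and this assumes a Databricks runtime with `delta` available:

```python
# Hypothetical sketch: upsert rows from a DLT pipeline's output table
# into an existing Delta table. Names and join key are placeholders.
from delta.tables import DeltaTable

target = DeltaTable.forName(spark, "main.default.existing_table")
updates = spark.read.table("main.default.dlt_output_table")  # written by the DLT pipeline

(target.alias("t")
    .merge(updates.alias("s"), "t.id = s.id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute())
```

The point is that a DLT output table is still a Delta table, so an ordinary batch `MERGE` from outside the pipeline works, provided the schemas are compatible.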

2 More Replies
NotARobot
by New Contributor III
  • 2020 Views
  • 0 replies
  • 2 kudos

Force DBR/Spark Version in Delta Live Tables Cluster Policy

Is there a way to use Compute Policies to force Delta Live Tables to use specific Databricks Runtime and PySpark versions? While trying to leverage some of the functions in PySpark 3.5.0, I don't seem to be able to get Delta Live Tables to use Databr...

Attachments: test_cluster_policy.png, dlt_version.png
Data Engineering
Compute Policies
Delta Live Tables
Graphframes
pyspark
JohnJustus
by New Contributor III
  • 14674 Views
  • 1 reply
  • 0 kudos

Accessing Excel file from Databricks

Hi, I am trying to access an Excel file that is stored in Azure Blob Storage via Databricks. In my understanding, it is not possible to access it using PySpark, so accessing it through pandas is the option. Here is my code: %pip install openpyxl import pandas as p...
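For reference, a minimal sketch of the usual pandas approach once the storage is reachable from the driver (e.g. via a DBFS mount). The path is an assumption, and `openpyxl` must be installed first (`%pip install openpyxl`):

```python
# Hypothetical sketch: read an Excel file from a mounted path with pandas.
# "/dbfs/mnt/mycontainer/myfile.xlsx" is a placeholder path.
import pandas as pd

df = pd.read_excel("/dbfs/mnt/mycontainer/myfile.xlsx", engine="openpyxl")
print(df.head())
```

Note that pandas reads through the local filesystem, so the `/dbfs/...` prefix (not `dbfs:/...`) is what matters here.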

databicky
by Contributor II
  • 6722 Views
  • 3 replies
  • 1 kudos

No handler for udf/udaf/udtf for function

I created a function using a JAR file present in the cluster location, but when executing the Hive query it shows the error "no handler for udf/udaf/udtf". The query runs fine in HDInsight clusters, but when running in Databricks...

Attachment: IMG20231015164650.jpg
dbuser1234
by New Contributor
  • 3334 Views
  • 0 replies
  • 0 kudos

How to readstream from multiple sources?

Hi, I am trying to readStream from two sources and join them into a target table. How can I do this in PySpark? E.g., t1 + t2 as my bronze tables: I want to readStream from t1 and t2 and merge the changes into t3 (the silver table).
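One common shape for this is two streaming reads, a join, and a `foreachBatch` sink that merges each micro-batch into the silver table. A minimal sketch, where the table names, the `id` key, and the checkpoint path are all assumptions (and a real stream-stream join may also need watermarks):

```python
# Hypothetical sketch: stream from two bronze Delta tables, join the
# micro-batches, and MERGE the result into a silver table.
from delta.tables import DeltaTable

t1 = spark.readStream.table("bronze.t1")
t2 = spark.readStream.table("bronze.t2")
joined = t1.join(t2, "id")  # stream-stream join; watermarks may be required

def upsert_to_silver(batch_df, batch_id):
    silver = DeltaTable.forName(spark, "silver.t3")
    (silver.alias("t")
        .merge(batch_df.alias("s"), "t.id = s.id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute())

(joined.writeStream
    .foreachBatch(upsert_to_silver)
    .option("checkpointLocation", "/chk/silver_t3")  # placeholder path
    .start())
```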

anmol_hans_de
by Databricks Partner
  • 9088 Views
  • 0 replies
  • 0 kudos

Exam suspended by proctor

Hi Team, I need urgent support: I was about to submit my exam and was just reviewing my responses, but the proctor suspended it because I did not satisfy the proctoring conditions, even though I was sitting in a room with a clear background and well li...

BST
by New Contributor
  • 1707 Views
  • 0 replies
  • 0 kudos

Spark - Cluster Mode - Driver

When running a Spark job in cluster mode, how does Spark decide on which worker node to place the driver?

anirudh_a
by New Contributor II
  • 22871 Views
  • 8 replies
  • 5 kudos

Resolved! 'No file or Directory' error when using pandas.read_excel in Databricks

I am baffled by the behaviour of Databricks: below you can see the contents of the directory using dbutils in Databricks. It shows the `test.xlsx` file clearly in the directory (and I can even open it using `dbutils.fs.head`). But when I go to use panda.re...

Data Engineering
dbfs
panda
spark
spark config
Latest Reply
DamnKush
New Contributor II
  • 5 kudos

Hey, I encountered this recently. I can see you are using a shared cluster; try switching to a single-user cluster and it will fix it. Can someone let me know why it wasn't working with a shared cluster? Thanks.

7 More Replies
Joe1912
by New Contributor III
  • 1708 Views
  • 0 replies
  • 0 kudos

Strategy to add new table base on silver data

I have a merge function for streaming foreachBatch, something like: mergedf(df, i): merge_func_1(df, i); merge_func_2(df, i). Then I want to add a new merge_func_3 into it. Are there any best practices for this case? When the stream is always running, how can I process...
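One pattern that keeps this extensible is holding the per-batch merge steps in a list, so adding `merge_func_3` is a one-line change (the running stream still has to be restarted to pick up the new code). A plain-Python sketch of the pattern, with stub merge functions standing in for the poster's real ones:

```python
# Hypothetical sketch (plain Python, stubs in place of real merge logic):
# keep per-batch merge steps in a list; the foreachBatch function just
# iterates over it, so new steps are appended in one place.
applied = []  # records which stubs ran, for illustration only

def merge_func_1(df, batch_id): applied.append(("f1", batch_id))
def merge_func_2(df, batch_id): applied.append(("f2", batch_id))
def merge_func_3(df, batch_id): applied.append(("f3", batch_id))

merge_funcs = [merge_func_1, merge_func_2, merge_func_3]  # append new steps here

def mergedf(df, batch_id):
    # in Spark you would df.persist() here so every step reuses the micro-batch
    for fn in merge_funcs:
        fn(df, batch_id)

mergedf(None, 0)
```

In the real stream, persisting the micro-batch DataFrame before looping avoids recomputing the source for each merge step.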

UtkarshTrehan
by New Contributor
  • 16531 Views
  • 1 reply
  • 1 kudos

Inconsistent Results When Writing to Oracle DB with Spark's dropDuplicates and foreachPartition

It's more a Spark question than a Databricks question. I'm encountering an issue when writing data to an Oracle database using Apache Spark. My workflow involves removing duplicate rows from a DataFrame and then writing the deduplicated DataFrame to ...

Latest Reply
Sidhant07
Databricks Employee
  • 1 kudos

The difference in behaviour between using foreachPartition and data.write.jdbc(...) after dropDuplicates() could be due to how Spark handles data partitioning and operations on partitions. When you use foreachPartition, you are manually handling the ...
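Following the reply above, a minimal sketch of the built-in JDBC write path after deduplication, which lets Spark manage per-partition connections and inserts instead of hand-rolled `foreachPartition` code. The URL, table, and credentials are placeholders:

```python
# Hypothetical sketch: dedupe, then use Spark's JDBC writer directly.
# Connection details below are placeholders, not real values.
deduped = df.dropDuplicates(["id"])  # key column is an assumption

(deduped.write
    .format("jdbc")
    .option("url", "jdbc:oracle:thin:@//dbhost:1521/service")
    .option("dbtable", "TARGET_TABLE")
    .option("user", "app_user")
    .option("password", "app_password")
    .mode("append")
    .save())
```

If duplicates still appear, it is worth checking whether a stage retry re-executed a partition's inserts; an idempotent target (unique constraint plus MERGE, or a staging table) guards against that.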
