Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Tahseen0354
by Valued Contributor
  • 4799 Views
  • 4 replies
  • 2 kudos

Resolved! A Standard cluster is recommended for a single user - what is meant by that?

Hi, I have seen it written in the documentation that a standard cluster is recommended for a single user. But why? What is meant by that? A colleague and I were testing it on the same notebook. Both of us can use the same standard all purpo...

Latest Reply
Hubert-Dudek
Databricks MVP
  • 2 kudos

A high-concurrency cluster just splits resources between users more evenly. So when 4 people run notebooks at the same time on a cluster with 4 CPUs, you can imagine that each will get 1 CPU. On a standard cluster, 1 person could utilize all worker CPUs as you...

3 More Replies
Raie
by New Contributor III
  • 12932 Views
  • 3 replies
  • 4 kudos

Resolved! How do I specify a column's data type with Spark DataFrames?

What I am doing: spark_df = spark.createDataFrame(dfnew); spark_df.write.saveAsTable("default.test_table", index=False, header=True). This automatically detects the datatypes and is working right now. BUT, what if the datatype cannot be detected or detect...

Latest Reply
Hubert-Dudek
Databricks MVP
  • 4 kudos

Just create the table earlier and set the column types (CREATE TABLE ... LOCATION path). In the dataframe you need to have corresponding data types, which you can produce using cast syntax; your syntax is just incorrect. Here is an example of the correct syntax: from p...

2 More Replies
tomsyouruncle
by New Contributor III
  • 28823 Views
  • 14 replies
  • 3 kudos

How do I enable support for arbitrary files in Databricks Repos? Public Preview feature doesn't appear in admin console.

"Arbitrary files in Databricks Repos", allowing not just notebooks to be added to repos, is in Public Preview. I've tried to activate it following the instructions in the above link but the option doesn't appear in Admin Console. Minimum requirements...

Latest Reply
kahing_cheung
Databricks Employee
  • 3 kudos

What environment is your deployment in?

13 More Replies
Sudeshna
by Databricks Partner
  • 16218 Views
  • 6 replies
  • 7 kudos

Resolved! I am new to Databricks SQL and want to create a variable which can hold calculations either from static values or from select queries similar to SQL Server. Is there a way to do so?

I was trying to create a variable and I got the following error. Command: SET a = 5; Error: "Error running query: Configuration a is not available."

Latest Reply
BilalAslamDbrx
Databricks Employee
  • 7 kudos

@Sudeshna Bhakat​ what @Joseph Kambourakis​ described works on clusters but is restricted on Databricks SQL endpoints, i.e. only a limited number of SET commands are allowed. I suggest you explore curly-brace parameters (e.g. {{ my_variable }}) in Databrick...

5 More Replies
shelms
by New Contributor II
  • 36679 Views
  • 2 replies
  • 7 kudos

Resolved! SQL CONCAT returning null

Has anyone else experienced this problem? I'm attempting to SQL CONCAT two fields, and if the second field is null, the entire string appears as null. The documentation is unclear on the expected outcome, and contrary to how CONCAT_WS operates. SELECT ...

Latest Reply
BilalAslamDbrx
Databricks Employee
  • 7 kudos

CONCAT is a function defined in the SQL standard and available across a wide variety of DBMSs. With the exception of Oracle, which uses VARCHAR2 semantics across the board, the function returns NULL on NULL input. CONCAT_WS() is not standard and is mostl...

1 More Replies
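The NULL-propagation behavior described in the reply is standard SQL, not Databricks-specific. A quick way to see it, and the usual COALESCE workaround, is with sqlite3 from the Python standard library, whose `||` concatenation operator follows the same NULL semantics:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Standard SQL string concatenation returns NULL if any input is NULL.
null_concat = conn.execute("SELECT 'John' || ' ' || NULL").fetchone()[0]

# COALESCE replaces the NULL with an empty string, so the rest survives.
fixed = conn.execute("SELECT 'John' || ' ' || COALESCE(NULL, '')").fetchone()[0]

print(null_concat)  # None
print(fixed)        # John 
```

In Databricks SQL the equivalent fix would be `CONCAT(col1, ' ', COALESCE(col2, ''))`, or switching to CONCAT_WS, which skips NULL arguments.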
cmotla
by New Contributor III
  • 3149 Views
  • 1 reply
  • 7 kudos

Issue with complex json based data frame select

We are getting the below error when trying to select the nested columns (string type in a struct), even though we don't have more than 1,000 records in the data frame. The schema is very complex and has a few columns as struct type and a few as array typ...

Latest Reply
Hubert-Dudek
Databricks MVP
  • 7 kudos

Please share your code and some example of data.

mikep
by New Contributor II
  • 7380 Views
  • 4 replies
  • 0 kudos

Resolved! Kubernetes or ZooKeeper for HA?

Hello. I am trying to understand High Availability in Databricks. I understand that Databricks uses Kubernetes for the cluster manager and to manage Docker containers. And while Databricks runs on top of AWS, Azure, or GCP, is HA automatically provisioned when I st...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

3 More Replies
george2020
by New Contributor II
  • 1690 Views
  • 0 replies
  • 2 kudos

Using the Databricks Repos API to bring Repo in top-level production folder to latest version

I am having an issue with a GitHub Actions workflow using the Databricks Repos API. We want the API call in the Git Action to bring the Repo in our Databricks Repos top-level folder to the latest version on a merge into the main branch. The GitHub Actio...

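For reference, the Repos API call a workflow like this typically makes is PATCH /api/2.0/repos/{repo_id} with the branch to check out. A minimal sketch using only the standard library (the host, repo id, and token below are placeholders):

```python
import json
from urllib.request import Request, urlopen

def repos_update_request(host: str, repo_id: int, branch: str, token: str) -> Request:
    """Build a PATCH request that checks the given Databricks repo out to `branch`."""
    return Request(
        url=f"https://{host}/api/2.0/repos/{repo_id}",
        data=json.dumps({"branch": branch}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="PATCH",
    )

# In the GitHub Action you would then send the request, e.g.:
# with urlopen(repos_update_request(host, repo_id, "main", token)) as resp:
#     print(resp.status)
```

The token would normally come from a repository secret; the call fails with 409 if the repo has uncommitted local changes, which is a common cause of such workflows breaking.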
RicksDB
by Contributor III
  • 7121 Views
  • 3 replies
  • 6 kudos

Resolved! Restricting file upload to DBFS

Hi, is it possible to restrict uploading files to the DBFS root (since everyone has access)? The idea is to force users to use an ADLS2 mount with credential passthrough for security reasons. Also, right now users use Azure blob explorer to interact with ADLS2...

Latest Reply
User16764241763
Databricks Employee
  • 6 kudos

Hello @E H​ You can disable the DBFS file browser in the workspace if users upload directly from there. This will prevent uploads to DBFS: https://docs.databricks.com/administration-guide/workspace/dbfs-browser.html Please let us know if this solution wo...

2 More Replies
wyzer
by Contributor II
  • 5486 Views
  • 2 replies
  • 3 kudos

Resolved! Insert data into an on-premise SQL Server

Hello, is it possible to insert data from Databricks into an on-premise SQL Server? Thanks.

Latest Reply
wyzer
Contributor II
  • 3 kudos

Hello, yes, we found out how to do it by installing a JDBC connector. It works fine. Thanks.

1 More Replies
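The thread confirms the JDBC approach works but shows no code. As a hedged sketch of what a Spark JDBC write to SQL Server usually looks like (host, database, table, and credentials are placeholders; the Microsoft SQL Server JDBC driver must be available on the cluster, and the cluster needs network connectivity to the on-premise server, e.g. via VPN or ExpressRoute):

```python
def sqlserver_jdbc_options(host, port, database, user, password):
    # Connection options for the Microsoft SQL Server JDBC driver.
    return {
        "url": f"jdbc:sqlserver://{host}:{port};databaseName={database}",
        "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
        "user": user,
        "password": password,
    }

def append_rows(df, table, options):
    # df is a Spark DataFrame; mode("append") inserts rows without replacing the table.
    (df.write.format("jdbc")
       .option("dbtable", table)
       .options(**options)
       .mode("append")
       .save())

opts = sqlserver_jdbc_options("onprem-sql.example.local", 1433, "Sales", "etl_user", "***")
# append_rows(spark_df, "dbo.target_table", opts)
```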
Soma
by Valued Contributor
  • 5039 Views
  • 3 replies
  • 5 kudos

Resolved! Enable custom IPython extension

How to enable a custom IPython extension on Databricks notebook start

Latest Reply
Soma
Valued Contributor
  • 5 kudos

I want to load custom extensions which I create, like custom callback events on cell run: https://ipython.readthedocs.io/en/stable/config/callbacks.html

2 More Replies
emanuele_maffeo
by Databricks Partner
  • 6360 Views
  • 5 replies
  • 8 kudos

Resolved! Trigger.AvailableNow on scala - compile issue

Hi everybody, Trigger.AvailableNow was released with the Databricks 10.1 runtime and we would like to use this new feature with Autoloader. We write all our data pipelines in Scala and our projects import Spark as a provided dependency. If we try to sw...

Latest Reply
Anonymous
Not applicable
  • 8 kudos

You can switch to python. Depending on what you're doing and if you're using UDFs, there shouldn't be any difference at all in terms of performance.

4 More Replies
alonisser
by Contributor II
  • 4040 Views
  • 3 replies
  • 4 kudos

Resolved! How to migrate an existing workspace for an external metastore

Currently we're on an Azure Databricks workspace we set up during the POC, a long time ago. In the meantime we have built quite a production workload on top of Databricks. Now we want to split workspaces - one for analysts and one for data engineeri...

Latest Reply
Hubert-Dudek
Databricks MVP
  • 4 kudos

From a Databricks notebook just run mysqldump. The server address and details you can take from the logs or configuration. I am also including a link to an example notebook: https://docs.microsoft.com/en-us/azure/databricks/kb/_static/notebooks/2016-election-tweets.h...

2 More Replies
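The reply says to take the server address from the configuration; for a Hive metastore that usually means the `spark.hadoop.javax.jdo.option.ConnectionURL` setting. A small sketch of pulling the pieces out of such a JDBC URL and building the mysqldump command (the URL value below is invented for illustration):

```python
import re

def parse_metastore_jdbc_url(jdbc_url):
    # Typical Hive metastore URL shape: jdbc:mysql://<host>:<port>/<database>?...
    m = re.match(r"jdbc:mysql://([^:/]+):(\d+)/([^?]+)", jdbc_url)
    if not m:
        raise ValueError("unrecognized JDBC URL")
    return m.group(1), int(m.group(2)), m.group(3)

# Example value (made up) as you might see it in
# spark.hadoop.javax.jdo.option.ConnectionURL:
host, port, db = parse_metastore_jdbc_url(
    "jdbc:mysql://consolidated-metastore.mysql.database.azure.com:3306/organization123?useSSL=true"
)

# The mysqldump invocation you would then run from a notebook shell cell:
cmd = f"mysqldump --host={host} --port={port} --single-transaction {db}"
```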
USHAK
by New Contributor II
  • 1564 Views
  • 1 reply
  • 0 kudos

Hi, I am trying to schedule Exam: Databricks Certified Associate Developer for Apache Spark 3.0 - Python. In the cart I couldn't proceed ...

Hi, I am trying to schedule Exam: Databricks Certified Associate Developer for Apache Spark 3.0 - Python. In the cart I couldn't proceed without entering a voucher. I do not have a voucher. Please help.

Latest Reply
USHAK
New Contributor II
  • 0 kudos

Can someone please respond to my above question? Can I take the certification test without a voucher?

Jeff1
by Contributor II
  • 15961 Views
  • 3 replies
  • 4 kudos

Resolved! How to convert lat/long to geohash in Databricks using the geohashTools R library

I continue to receive a parsing error when attempting to convert lat/long data to a geohash in Databricks. I've tried two coding methods in R and get the same error. library(geohashTools) Method #1: my_tbl$geo_hash <- gh_encode(my_tbl$Latitude, my_tbl...

Latest Reply
Jeff1
Contributor II
  • 4 kudos

The problem was I was trying to run the gh_encode function on a Spark dataframe. I needed to collect the data into an R dataframe, then run the function.

2 More Replies
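The resolution above (collect the Spark dataframe locally first, since gh_encode is not a Spark function) is the key point. For readers curious what gh_encode actually computes, here is a pure-Python sketch of standard geohash encoding (illustrative only, not the geohashTools implementation):

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_encode(lat, lon, precision=11):
    """Encode a latitude/longitude pair into a geohash string."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    bits = []
    lon_turn = True  # geohash interleaves bits, starting with longitude
    while len(bits) < precision * 5:
        if lon_turn:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits.append(1)
                lon_lo = mid
            else:
                bits.append(0)
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits.append(1)
                lat_lo = mid
            else:
                bits.append(0)
                lat_hi = mid
        lon_turn = not lon_turn
    # Pack each group of 5 bits into one base-32 character.
    chars = []
    for i in range(0, len(bits), 5):
        idx = 0
        for b in bits[i:i + 5]:
            idx = (idx << 1) | b
        chars.append(_BASE32[idx])
    return "".join(chars)

print(geohash_encode(57.64911, 10.40744))  # u4pruydqqvj (well-known test vector)
```

In the thread's situation you would collect() the Spark dataframe to the driver first and apply the encoder locally, exactly as the accepted answer describes for R.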