Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Tahseen0354
by Valued Contributor
  • 4799 Views
  • 4 replies
  • 2 kudos

Resolved! A Standard cluster is recommended for a single user - what is meant by that?

Hi, I have seen it written in the documentation that a standard cluster is recommended for a single user. But why? What is meant by that? A colleague and I were testing it on the same notebook. Both of us can use the same standard all purpo...

Latest Reply
Hubert-Dudek
Databricks MVP
  • 2 kudos

A high-concurrency cluster just splits resources between users more evenly. So when 4 people run notebooks at the same time on a cluster with 4 CPUs, you can imagine that each will get 1 CPU. On a standard cluster, 1 person could utilize all worker CPUs as you...

3 More Replies
Raie
by New Contributor III
  • 12932 Views
  • 3 replies
  • 4 kudos

Resolved! How do I specify a column's data type with Spark DataFrames?

What I am doing: spark_df = spark.createDataFrame(dfnew); spark_df.write.saveAsTable("default.test_table", index=False, header=True). This automatically detects the datatypes and is working right now. BUT, what if the datatype cannot be detected or detect...

Latest Reply
Hubert-Dudek
Databricks MVP
  • 4 kudos

Just create the table earlier and set the column types (CREATE TABLE ... LOCATION path). In the dataframe you need to have corresponding data types, which you can produce using cast syntax; your syntax is just incorrect. Here is an example of the correct syntax: from p...

2 More Replies
tomsyouruncle
by New Contributor III
  • 28823 Views
  • 14 replies
  • 3 kudos

How do I enable support for arbitrary files in Databricks Repos? Public Preview feature doesn't appear in admin console.

"Arbitrary files in Databricks Repos", allowing not just notebooks to be added to repos, is in Public Preview. I've tried to activate it following the instructions in the above link but the option doesn't appear in Admin Console. Minimum requirements...

Latest Reply
kahing_cheung
Databricks Employee
  • 3 kudos

What environment is your deployment in?

13 More Replies
Sudeshna
by Databricks Partner
  • 16218 Views
  • 6 replies
  • 7 kudos

Resolved! I am new to Databricks SQL and want to create a variable which can hold calculations either from static values or from select queries similar to SQL Server. Is there a way to do so?

I was trying to create a variable and I got the following error. Command: SET a = 5; Error: "Error running query: Configuration a is not available."

Latest Reply
BilalAslamDbrx
Databricks Employee
  • 7 kudos

@Sudeshna Bhakat​ what @Joseph Kambourakis​ described works on clusters but is restricted on Databricks SQL endpoints, i.e. only a limited number of SET commands are allowed. I suggest you explore curly-brace parameters (e.g. {{ my_variable }}) in Databrick...

5 More Replies
shelms
by New Contributor II
  • 36679 Views
  • 2 replies
  • 7 kudos

Resolved! SQL CONCAT returning null

Has anyone else experienced this problem? I'm attempting to SQL CONCAT two fields, and if the second field is null, the entire string appears as null. The documentation is unclear on the expected outcome, and contrary to how CONCAT_WS operates. SELECT ...

Latest Reply
BilalAslamDbrx
Databricks Employee
  • 7 kudos

CONCAT is a function defined in the SQL standard and available across a wide variety of DBMSs. With the exception of Oracle, which uses VARCHAR2 semantics across the board, the function returns NULL on NULL input. CONCAT_WS() is not standard and is mostl...

1 More Replies
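The NULL-propagation behavior described in the reply is standard SQL, not Databricks-specific. A quick way to see it, and the usual COALESCE workaround, is with sqlite3 from the Python standard library, whose `||` concatenation operator follows the same NULL semantics:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Standard SQL string concatenation returns NULL if any input is NULL.
null_concat = conn.execute("SELECT 'John' || ' ' || NULL").fetchone()[0]

# COALESCE replaces the NULL with an empty string, so the rest survives.
fixed = conn.execute("SELECT 'John' || ' ' || COALESCE(NULL, '')").fetchone()[0]

print(null_concat)  # None
print(fixed)        # John 
```

In Databricks SQL the equivalent fix would be `CONCAT(col1, ' ', COALESCE(col2, ''))`, or switching to CONCAT_WS, which skips NULL arguments.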
cmotla
by New Contributor III
  • 3149 Views
  • 1 reply
  • 7 kudos

Issue with complex json based data frame select

We are getting the below error when trying to select the nested columns (string type in a struct), even though we don't have more than 1,000 records in the data frame. The schema is very complex and has a few columns as struct type and a few as array typ...

Latest Reply
Hubert-Dudek
Databricks MVP
  • 7 kudos

Please share your code and some example of data.

mikep
by New Contributor II
  • 7380 Views
  • 4 replies
  • 0 kudos

Resolved! Kubernetes or ZooKeeper for HA?

Hello. I am trying to understand High Availability in Databricks. I understand that Databricks uses Kubernetes for the cluster manager and to manage Docker containers. And while Databricks runs on top of AWS, Azure, or GCP, is HA automatically provisioned when I st...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

3 More Replies
george2020
by New Contributor II
  • 1690 Views
  • 0 replies
  • 2 kudos

Using the Databricks Repos API to bring Repo in top-level production folder to latest version

I am having an issue with a GitHub Actions workflow using the Databricks Repos API. We want the API call in the Git Action to bring the Repo in our Databricks Repos top-level folder to the latest version on a merge into the main branch. The GitHub Actio...

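For reference, the Repos API call a workflow like this typically makes is PATCH /api/2.0/repos/{repo_id} with the branch to check out. A minimal sketch using only the standard library (the host, repo id, and token below are placeholders):

```python
import json
from urllib.request import Request, urlopen

def repos_update_request(host: str, repo_id: int, branch: str, token: str) -> Request:
    """Build a PATCH request that checks the given Databricks repo out to `branch`."""
    return Request(
        url=f"https://{host}/api/2.0/repos/{repo_id}",
        data=json.dumps({"branch": branch}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="PATCH",
    )

# In the GitHub Action you would then send the request, e.g.:
# with urlopen(repos_update_request(host, repo_id, "main", token)) as resp:
#     print(resp.status)
```

The token would normally come from a repository secret; the call fails with 409 if the repo has uncommitted local changes, which is a common cause of such workflows breaking.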
RicksDB
by Contributor III
  • 7121 Views
  • 3 replies
  • 6 kudos

Resolved! Restricting file upload to DBFS

Hi, is it possible to restrict uploading files to the DBFS root (since everyone has access)? The idea is to force users to use an ADLS2 mount with credential passthrough for security reasons. Also, right now users use Azure blob explorer to interact with ADLS2...

Latest Reply
User16764241763
Databricks Employee
  • 6 kudos

Hello @E H​ You can disable the DBFS file browser in the workspace if users upload directly from there. This will prevent uploads to DBFS: https://docs.databricks.com/administration-guide/workspace/dbfs-browser.html Please let us know if this solution wo...

2 More Replies
wyzer
by Contributor II
  • 5486 Views
  • 2 replies
  • 3 kudos

Resolved! Insert data into an on-premise SQL Server

Hello, is it possible to insert data from Databricks into an on-premise SQL Server? Thanks.

Latest Reply
wyzer
Contributor II
  • 3 kudos

Hello, yes, we found out how to do it by installing a JDBC connector. It works fine. Thanks.

1 More Replies
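The thread confirms the JDBC approach works but shows no code. As a hedged sketch of what a Spark JDBC write to SQL Server usually looks like (host, database, table, and credentials are placeholders; the Microsoft SQL Server JDBC driver must be available on the cluster, and the cluster needs network connectivity to the on-premise server, e.g. via VPN or ExpressRoute):

```python
def sqlserver_jdbc_options(host, port, database, user, password):
    # Connection options for the Microsoft SQL Server JDBC driver.
    return {
        "url": f"jdbc:sqlserver://{host}:{port};databaseName={database}",
        "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
        "user": user,
        "password": password,
    }

def append_rows(df, table, options):
    # df is a Spark DataFrame; mode("append") inserts rows without replacing the table.
    (df.write.format("jdbc")
       .option("dbtable", table)
       .options(**options)
       .mode("append")
       .save())

opts = sqlserver_jdbc_options("onprem-sql.example.local", 1433, "Sales", "etl_user", "***")
# append_rows(spark_df, "dbo.target_table", opts)
```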
Soma
by Valued Contributor
  • 5039 Views
  • 3 replies
  • 5 kudos

Resolved! Enable custom IPython extension

How to enable a custom IPython extension on Databricks notebook start

Latest Reply
Soma
Valued Contributor
  • 5 kudos

I want to load custom extensions which I create, like custom callback events on cell run: https://ipython.readthedocs.io/en/stable/config/callbacks.html

2 More Replies
emanuele_maffeo
by Databricks Partner
  • 6360 Views
  • 5 replies
  • 8 kudos

Resolved! Trigger.AvailableNow on scala - compile issue

Hi everybody, Trigger.AvailableNow was released with the Databricks 10.1 runtime and we would like to use this new feature with Autoloader. We write all our data pipelines in Scala and our projects import Spark as a provided dependency. If we try to sw...

Latest Reply
Anonymous
Not applicable
  • 8 kudos

You can switch to python. Depending on what you're doing and if you're using UDFs, there shouldn't be any difference at all in terms of performance.

4 More Replies
alonisser
by Contributor II
  • 4040 Views
  • 3 replies
  • 4 kudos

Resolved! How to migrate an existing workspace for an external metastore

Currently we're on an Azure Databricks workspace we set up during the POC, a long time ago. In the meantime we have built quite a production workload on top of Databricks. Now we want to split workspaces - one for analysts and one for data engineeri...

Latest Reply
Hubert-Dudek
Databricks MVP
  • 4 kudos

From a Databricks notebook just run mysqldump. The server address and details you can take from the logs or configuration. I am also including a link to an example notebook: https://docs.microsoft.com/en-us/azure/databricks/kb/_static/notebooks/2016-election-tweets.h...

2 More Replies
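The reply says to take the server address from the configuration; for a Hive metastore that usually means the `spark.hadoop.javax.jdo.option.ConnectionURL` setting. A small sketch of pulling the pieces out of such a JDBC URL and building the mysqldump command (the URL value below is invented for illustration):

```python
import re

def parse_metastore_jdbc_url(jdbc_url):
    # Typical Hive metastore URL shape: jdbc:mysql://<host>:<port>/<database>?...
    m = re.match(r"jdbc:mysql://([^:/]+):(\d+)/([^?]+)", jdbc_url)
    if not m:
        raise ValueError("unrecognized JDBC URL")
    return m.group(1), int(m.group(2)), m.group(3)

# Example value (made up) as you might see it in
# spark.hadoop.javax.jdo.option.ConnectionURL:
host, port, db = parse_metastore_jdbc_url(
    "jdbc:mysql://consolidated-metastore.mysql.database.azure.com:3306/organization123?useSSL=true"
)

# The mysqldump invocation you would then run from a notebook shell cell:
cmd = f"mysqldump --host={host} --port={port} --single-transaction {db}"
```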
USHAK
by New Contributor II
  • 1564 Views
  • 1 reply
  • 0 kudos

Hi, I am trying to schedule Exam: Databricks Certified Associate Developer for Apache Spark 3.0 - Python. In the cart I couldn't proceed ...

Hi, I am trying to schedule Exam: Databricks Certified Associate Developer for Apache Spark 3.0 - Python. In the cart I couldn't proceed without entering a voucher. I do not have a voucher. Please help.

Latest Reply
USHAK
New Contributor II
  • 0 kudos

Can someone please respond to my above question? Can I take the certification test without a voucher?

Jeff1
by Contributor II
  • 15961 Views
  • 3 replies
  • 4 kudos

Resolved! How to convert lat/long to geohash in Databricks using the geohashTools R library

I continue to receive a parsing error when attempting to convert lat/long data to a geohash in Databricks. I've tried two coding methods in R and get the same error. library(geohashTools) Method #1: my_tbl$geo_hash <- gh_encode(my_tbl$Latitude, my_tbl...

Latest Reply
Jeff1
Contributor II
  • 4 kudos

The problem was I was trying to run the gh_encode function on a Spark dataframe. I needed to collect the data into an R dataframe, then run the function.

2 More Replies
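The resolution above (collect the Spark dataframe locally first, since gh_encode is not a Spark function) is the key point. For readers curious what gh_encode actually computes, here is a pure-Python sketch of standard geohash encoding (illustrative only, not the geohashTools implementation):

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_encode(lat, lon, precision=11):
    """Encode a latitude/longitude pair into a geohash string."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    bits = []
    lon_turn = True  # geohash interleaves bits, starting with longitude
    while len(bits) < precision * 5:
        if lon_turn:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits.append(1)
                lon_lo = mid
            else:
                bits.append(0)
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits.append(1)
                lat_lo = mid
            else:
                bits.append(0)
                lat_hi = mid
        lon_turn = not lon_turn
    # Pack each group of 5 bits into one base-32 character.
    chars = []
    for i in range(0, len(bits), 5):
        idx = 0
        for b in bits[i:i + 5]:
            idx = (idx << 1) | b
        chars.append(_BASE32[idx])
    return "".join(chars)

print(geohash_encode(57.64911, 10.40744))  # u4pruydqqvj (well-known test vector)
```

In the thread's situation you would collect() the Spark dataframe to the driver first and apply the encoder locally, exactly as the accepted answer describes for R.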