Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Dusko
by Databricks Partner
  • 6406 Views
  • 6 replies
  • 1 kudos

How to access root mountPoint without "Access Denied"?

Hi, I’m trying to read a file from the S3 root bucket. I can ls all the files, but I can’t read them because of "Access Denied". When I mount the same S3 root bucket under another mount point, I can touch and read all the files. I also see that this new mount...

Latest Reply
Dusko
Databricks Partner
  • 1 kudos

Hi @Atanu Sarkar, @Piper Wilson, thanks for the replies. Well, I don't understand the point about ownership. I believe the root bucket is still under my ownership (I created it, and I could upload/delete any files through the browser without any problem...

fsm
by New Contributor II
  • 11831 Views
  • 4 replies
  • 2 kudos

Resolved! Implementation of a stable Spark Structured Streaming Application

Hi folks, I have an issue. It's not critical, but it's annoying. We have implemented a Spark Structured Streaming application. This application is triggered via Azure Data Factory (every 8 minutes). OK, this setup sounds a little bit weird and it's no...

Latest Reply
brickster_2018
Databricks Employee
  • 2 kudos

@Markus Freischlad It looks like the Spark driver was stuck. It would be good to capture a thread dump of the Spark driver to understand which operation is stuck.

admo
by New Contributor III
  • 11962 Views
  • 4 replies
  • 7 kudos

Scaling issue for inference with a spark.mllib model

Hello, I'm writing this because I have tried a lot of different directions to get a simple model inference working, with no success. Here is the outline of the job:
# 1 - Load the base data (~1 billion lines of ~6 columns)
interaction = build_initial_df()...

Latest Reply
Hubert-Dudek
Databricks MVP
  • 7 kudos

It is hard to analyze without the Spark UI and more detailed information, but anyway, a few tips: look for data skew; some partitions can be very big and some small because of incorrect partitioning. You can use the Spark UI to do that, but also debug your code a bit...
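A minimal sketch of the data-skew check mentioned in the reply, assuming you already have one row count per partition. In PySpark such counts can be collected with df.groupBy(spark_partition_id()).count(), where spark_partition_id comes from pyspark.sql.functions; partitions with no rows do not appear in that result. The skew_report helper and its 4x-median threshold are illustrative choices, not a Spark feature.

```python
from statistics import median


def skew_report(partition_counts, threshold=4.0):
    """Flag partitions whose row count exceeds `threshold` times the median.

    partition_counts: list of row counts, one per partition (for example,
    collected per partition from a Spark DataFrame). The default threshold
    of 4.0 is an arbitrary illustrative value, not a Spark setting.
    """
    if not partition_counts:
        return {"median": 0, "max_ratio": 0.0, "skewed": []}
    med = median(partition_counts)
    # Avoid dividing by zero when most partitions are empty.
    baseline = med if med > 0 else 1
    skewed = [
        (index, count)
        for index, count in enumerate(partition_counts)
        if count > threshold * baseline
    ]
    return {
        "median": med,
        "max_ratio": max(partition_counts) / baseline,
        "skewed": skewed,
    }


if __name__ == "__main__":
    # Hypothetical counts: three similar partitions and one very large one.
    print(skew_report([1000, 1100, 950, 20000]))
```

For the hypothetical counts above, the median is 1050 and only partition 3 (20000 rows) is flagged; a max_ratio far above 1 points to a few oversized partitions that dominate the job's runtime.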

Mendes
by New Contributor
  • 4612 Views
  • 2 replies
  • 0 kudos
Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Danilo Mendes, the table schema is stored in the default Azure Databricks internal metastore, and you can also configure and use external metastores. Ingest data into Azure Databricks. Access data in Apache Spark formats and from external data sources....

Tahseen0354
by Valued Contributor
  • 5188 Views
  • 4 replies
  • 2 kudos

Resolved! A Standard cluster is recommended for a single user - what is meant by that?

Hi, I have seen it written in the documentation that a Standard cluster is recommended for a single user. But why? What is meant by that? One of my colleagues and I were testing it on the same notebook. Both of us can use the same standard all purpo...

Latest Reply
Hubert-Dudek
Databricks MVP
  • 2 kudos

A High Concurrency cluster just splits resources between users more evenly. So when 4 people run notebooks at the same time on a cluster with 4 CPUs, you can imagine that each will get 1 CPU. In a Standard cluster, 1 person could utilize all worker CPUs as you...
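The reply's arithmetic (4 users, 4 CPUs, 1 CPU each) can be reproduced with a toy model. This is a simplified illustration only, assuming each user has a fixed CPU demand; it is not how Databricks or Spark's scheduler actually assigns tasks. The hypothetical max_min_fair_share helper (a standard water-filling, max-min fair split) stands in for the "split evenly" behavior described for High Concurrency clusters, and the hypothetical greedy_share helper stands in for one user on a Standard cluster taking every worker CPU first.

```python
def max_min_fair_share(capacity, demands):
    """Split `capacity` among users with max-min fairness (water-filling).

    Users asking for less than an equal share get their full demand; the
    leftover capacity is then split evenly among the remaining users.
    """
    allocation = {user: 0.0 for user in demands}
    remaining = dict(demands)
    left = float(capacity)
    while remaining and left > 1e-12:
        share = left / len(remaining)
        # Users whose remaining demand fits inside the equal share are satisfied.
        satisfied = {u: d for u, d in remaining.items() if d <= share}
        if not satisfied:
            # Everyone wants more than the equal share: give each exactly that.
            for user in remaining:
                allocation[user] += share
            left = 0.0
            break
        for user, demand in satisfied.items():
            allocation[user] += demand
            left -= demand
            del remaining[user]
    return allocation


def greedy_share(capacity, demands):
    """First come, first served: earlier users take all they ask for."""
    allocation = {}
    left = float(capacity)
    for user, demand in demands.items():
        allocation[user] = min(demand, left)
        left -= allocation[user]
    return allocation
```

With four users each asking for 4 CPUs on a 4-CPU cluster, max_min_fair_share gives each user 1.0 CPU, while greedy_share gives the first user all 4 and the others nothing.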

Raie
by New Contributor III
  • 13930 Views
  • 3 replies
  • 4 kudos

Resolved! How do I specify a column's data type with Spark DataFrames?

What I am doing:
spark_df = spark.createDataFrame(dfnew)
spark_df.write.saveAsTable("default.test_table", index=False, header=True)
This automatically detects the datatypes and is working right now. BUT, what if the datatype cannot be detected or detect...

Latest Reply
Hubert-Dudek
Databricks MVP
  • 4 kudos

Just create the table earlier and set the column types (CREATE TABLE ... LOCATION ( path path). In the DataFrame, you need to have corresponding data types, which you can get using cast syntax; your syntax is just incorrect. Here is an example of correct syntax:
from p...
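The reply's own example is truncated and is not reconstructed here. In PySpark, casting is typically written as df.withColumn("amount", col("amount").cast("decimal(10,2)")), with col imported from pyspark.sql.functions. The plain-Python sketch below only models the underlying idea of applying a declared schema instead of relying on type inference; the SCHEMA dict and apply_schema helper are hypothetical. One difference to keep in mind: Spark's cast returns null for values it cannot convert unless ANSI mode is enabled, whereas this sketch raises an error.

```python
from datetime import date
from decimal import Decimal

# Hypothetical target schema: column name -> converter for raw values.
SCHEMA = {
    "id": int,
    "amount": Decimal,
    "order_date": date.fromisoformat,
    "comment": str,
}


def apply_schema(rows, schema):
    """Convert each row (a dict of raw values) to the declared column types.

    None passes through unchanged so nullable columns stay null; any other
    value that cannot be converted raises ValueError naming the column.
    """
    typed_rows = []
    for row in rows:
        typed = {}
        for column, convert in schema.items():
            raw = row.get(column)
            if raw is None:
                typed[column] = None
                continue
            try:
                typed[column] = convert(raw)
            except (ValueError, TypeError, ArithmeticError) as exc:
                # decimal.InvalidOperation is a subclass of ArithmeticError.
                raise ValueError(
                    f"column {column!r}: cannot convert {raw!r}"
                ) from exc
        typed_rows.append(typed)
    return typed_rows
```

For example, a row {"id": "1", "amount": "19.99", "order_date": "2022-01-31", "comment": None} comes back with id as the int 1, amount as Decimal("19.99"), and order_date as date(2022, 1, 31), while a row with "id": "abc" raises ValueError.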

tomsyouruncle
by New Contributor III
  • 31127 Views
  • 14 replies
  • 3 kudos

How do I enable support for arbitrary files in Databricks Repos? The Public Preview feature doesn't appear in the admin console.

"Arbitrary files in Databricks Repos", allowing not just notebooks to be added to repos, is in Public Preview. I've tried to activate it following the instructions in the above link but the option doesn't appear in Admin Console. Minimum requirements...

Latest Reply
kahing_cheung
Databricks Employee
  • 3 kudos

What environment is your deployment in?

Sudeshna
by Databricks Partner
  • 17026 Views
  • 6 replies
  • 7 kudos

Resolved! I am new to Databricks SQL and want to create a variable which can hold calculations either from static values or from select queries similar to SQL Server. Is there a way to do so?

I was trying to create a variable and I got the following error:
Command: SET a = 5;
Error: Error running query: Configuration a is not available.

Latest Reply
BilalAslamDbrx
Databricks Employee
  • 7 kudos

@Sudeshna Bhakat, what @Joseph Kambourakis described works on clusters but is restricted on Databricks SQL endpoints, i.e. only a limited number of SET commands are allowed. I suggest you explore the curly braces (e.g. {{ my_variable }}) in Databrick...
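Databricks SQL substitutes {{ }} query parameters itself when the query runs. The plain-Python sketch below only models that text substitution, to show the kind of query the endpoint ends up executing in place of SET a = 5. The render_query helper is hypothetical, and it inserts values verbatim with no quoting or escaping, so any quotes a string value needs must already be in the query text.

```python
import re

# Matches {{ name }}, allowing optional whitespace inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_query(template, params):
    """Replace each {{ name }} placeholder with str(params[name]).

    Values are inserted verbatim (no quoting or escaping), so this is for
    illustration only and must not be used with untrusted input. A missing
    parameter raises KeyError instead of leaving the placeholder in place.
    """
    def substitute(match):
        name = match.group(1)
        if name not in params:
            raise KeyError(f"no value supplied for parameter {name!r}")
        return str(params[name])

    return PLACEHOLDER.sub(substitute, template)
```

For instance, render_query("SELECT {{ a }} * 2 AS doubled", {"a": 5}) yields "SELECT 5 * 2 AS doubled".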

shelms
by New Contributor II
  • 40999 Views
  • 2 replies
  • 7 kudos

Resolved! SQL CONCAT returning null

Has anyone else experienced this problem? I'm attempting to SQL CONCAT two fields, and if the second field is null, the entire string appears as null. The documentation is unclear on the expected outcome, and this is contrary to how concat_ws operates.
SELECT ...

Latest Reply
BilalAslamDbrx
Databricks Employee
  • 7 kudos

CONCAT is a function defined in the SQL standard and available across a wide variety of DBMS. With the exception of Oracle, which uses VARCHAR2 semantics across the board, the function returns NULL on NULL input. CONCAT_WS() is not standard and is mostl...
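To make the difference concrete, here is a plain-Python model of the NULL handling, with None standing in for SQL NULL. In Spark SQL itself the usual workarounds are to wrap nullable inputs, as in concat(a, coalesce(b, '')), or to use concat_ws, which skips NULL inputs. The helper names are hypothetical, and the functions model the behavior only; they are not Spark's implementation.

```python
def sql_concat(*args):
    """Model of standard SQL CONCAT: NULL (None) if any argument is NULL."""
    if any(arg is None for arg in args):
        return None
    return "".join(args)


def sql_concat_ws(sep, *args):
    """Model of concat_ws: NULL arguments are skipped, not propagated.

    In this model, a NULL separator yields NULL.
    """
    if sep is None:
        return None
    return sep.join(arg for arg in args if arg is not None)


def coalesce(*args):
    """Return the first non-NULL argument, like SQL COALESCE."""
    return next((arg for arg in args if arg is not None), None)
```

Here sql_concat("John", None) returns None, sql_concat("John", coalesce(None, "")) returns "John", and sql_concat_ws(" ", "John", None, "Smith") returns "John Smith".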

cmotla
by New Contributor III
  • 3314 Views
  • 1 replies
  • 7 kudos

Issue with a complex JSON-based DataFrame select

We are getting the error below when trying to select the nested columns (string type in a struct), even though we don't have more than 1000 records in the DataFrame. The schema is very complex and has a few columns of struct type and a few of array typ...

Latest Reply
Hubert-Dudek
Databricks MVP
  • 7 kudos

Please share your code and some example of data.

mikep
by New Contributor II
  • 7757 Views
  • 4 replies
  • 0 kudos

Resolved! Kubernetes or ZooKeeper for HA?

Hello. I am trying to understand High Availability in Databricks. I understand that DB uses Kubernetes for the cluster manager and to manage Docker containers. And while DB runs on top of AWS or Azure or GCP, is HA automatically provisioned when I st...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

george2020
by New Contributor II
  • 1812 Views
  • 0 replies
  • 2 kudos

Using the Databricks Repos API to bring Repo in top-level production folder to latest version

I am having an issue with a GitHub Actions workflow using the Databricks Repos API. We want the API call in the GitHub Action to bring the repo in our Databricks Repos top-level folder to the latest version on a merge into the main branch. The GitHub Actio...
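For reference, a sketch of the request such a workflow step would make, assuming the Repos API update call PATCH /api/2.0/repos/{repo_id} with a JSON body naming the branch, which updates the repo to that branch's latest commit. The host, token, and repo ID are placeholders, the helper names are hypothetical, and the current Repos API reference should be checked before relying on this. Only the Python standard library is used, so it can run in a plain GitHub Actions Python step.

```python
import json
import urllib.request


def build_repo_update_request(host, token, repo_id, branch="main"):
    """Build (but do not send) a request that updates a Databricks repo
    to the latest commit of `branch`.

    host: workspace URL, e.g. "https://<workspace>.cloud.databricks.com"
    token: access token (in GitHub Actions, read it from a secret)
    repo_id: ID of the repo in the workspace
    """
    url = f"{host.rstrip('/')}/api/2.0/repos/{repo_id}"
    body = json.dumps({"branch": branch}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def send(request):
    """Send the request and return the decoded JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

In the workflow, one would call send(build_repo_update_request(host, token, repo_id)) with the three values read from repository secrets; a non-2xx response makes urlopen raise HTTPError, which fails the step.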

RicksDB
by Contributor III
  • 7539 Views
  • 3 replies
  • 6 kudos

Resolved! Restricting file upload to DBFS

Hi, is it possible to restrict file uploads to the DBFS root (since everyone has access)? The idea is to force users to use an ADLS2 mount with credential passthrough for security reasons. Also, right now users use Azure Blob Explorer to interact with ADLS2...

Latest Reply
User16764241763
Databricks Employee
  • 6 kudos

Hello @E H, you can disable the DBFS file browser in the workspace if users upload directly from there. This will prevent uploads to DBFS: https://docs.databricks.com/administration-guide/workspace/dbfs-browser.html Please let us know if this solution wo...

wyzer
by Contributor II
  • 5827 Views
  • 2 replies
  • 3 kudos

Resolved! Insert data into an on-premise SQL Server

Hello, is it possible to insert data from Databricks into an on-premise SQL Server? Thanks.

Latest Reply
wyzer
Contributor II
  • 3 kudos

Hello, yes, we found out how to do it by installing a JDBC connector. It works fine. Thanks.
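The reply does not say which connector was used. Assuming Microsoft's JDBC driver for SQL Server is available on the cluster and the cluster can reach the on-premise server over the network, the write goes through Spark's built-in JDBC data source. The sketch below only assembles that data source's options; the helper name and all host, database, table, and credential values are hypothetical.

```python
def sqlserver_jdbc_options(host, database, table, user, password, port=1433):
    """Return options for Spark's JDBC data source targeting SQL Server.

    Assumes the Microsoft JDBC driver for SQL Server; 1433 is SQL Server's
    default TCP port.
    """
    return {
        "url": f"jdbc:sqlserver://{host}:{port};databaseName={database}",
        "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
        "dbtable": table,
        "user": user,
        "password": password,
    }


# On a cluster, the options would be passed to the DataFrame writer, e.g.:
#   df.write.format("jdbc").options(**opts).mode("append").save()
# Prefer reading the password with dbutils.secrets.get(scope, key)
# over hard-coding it in a notebook.
```

Using mode("append") inserts rows into the existing table; mode("overwrite") would replace its contents instead.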

Soma
by Valued Contributor
  • 5367 Views
  • 3 replies
  • 5 kudos

Resolved! Enable custom IPython Extension

How do I enable a custom IPython extension when a Databricks notebook starts?

Latest Reply
Soma
Valued Contributor
  • 5 kudos

I want to load custom extensions that I create, such as custom callback events on cell run: https://ipython.readthedocs.io/en/stable/config/callbacks.html
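A minimal sketch of such an extension, following the IPython callback events documented at the link in the post (pre_run_cell receives an ExecutionInfo object, post_run_cell an ExecutionResult). The module name my_callbacks is hypothetical; in an IPython session it would be loaded with %load_ext my_callbacks once the module is importable. Whether and how Databricks can load it automatically when a notebook starts is the open question of this thread and is not addressed here.

```python
"""my_callbacks.py: a minimal IPython extension with cell-run callbacks.

IPython calls load_ipython_extension when the extension is loaded
(for example via %load_ext my_callbacks) and unload_ipython_extension
when it is unloaded.
"""


def pre_run_cell(info):
    # `info` is an ExecutionInfo; raw_cell holds the cell's source text.
    print(f"About to run a cell of {len(info.raw_cell)} characters")


def post_run_cell(result):
    # `result` is an ExecutionResult; success is False if the cell raised.
    print(f"Cell finished, success={result.success}")


def load_ipython_extension(ipython):
    # Register both callbacks with the shell's event manager.
    ipython.events.register("pre_run_cell", pre_run_cell)
    ipython.events.register("post_run_cell", post_run_cell)


def unload_ipython_extension(ipython):
    # Remove the callbacks so unloading leaves the shell unchanged.
    ipython.events.unregister("pre_run_cell", pre_run_cell)
    ipython.events.unregister("post_run_cell", post_run_cell)
```

Keeping unload_ipython_extension symmetric with the load function means the extension can be reloaded without registering the callbacks twice.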
