Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

by jerry747847, New Contributor III
  • 7106 Views
  • 6 replies
  • 11 kudos

Resolved! When to increase maximum bound vs when to increase cluster size?

Hello experts, for the question below I am trying to understand why option C was selected instead of B, as B would also have resolved the issue. Question 40: A data analyst has noticed that their Databricks SQL queries are running too slowly. They claim ...

Latest Reply
JRL
New Contributor II
  • 11 kudos

On a SQL server there are wait states. Wait states occur when several processors (vCPUs) are processing and several threads are working through the processors. A longer-running thread that has dependencies can cause the thread that may have begun o...

5 More Replies
by 190809, Contributor
  • 1411 Views
  • 2 replies
  • 1 kudos

Resolved! Loading tables to gold: one loads and the other two fail, despite the same process.

Hi team, I am still fairly new to working with Delta tables. I have created a df by reading in data from existing silver tables in my lakehouse. I read the silver tables using SQL into a workbook, do some manipulation, unnest some fields and then ...

Latest Reply
190809
Contributor
  • 1 kudos

Hi @Pravin Chaubey​, thanks for responding. I discovered the issue: I had to load them as unmanaged tables, but had previously not specified a path when doing .saveAsTable(), and so the two tables that were failing to load were in fact managed tables ...

1 More Replies
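For anyone hitting the same wall, here is a minimal sketch of the fix described in the reply, with hypothetical table and path names: passing an explicit "path" option makes .saveAsTable() create an unmanaged (external) table, whereas omitting it creates a managed table.

```python
# A sketch with hypothetical gold-layer names; the explicit "path" option
# makes saveAsTable() register an unmanaged (external) table instead of a
# managed one.
(
    df.write.format("delta")
    .mode("overwrite")
    .option("path", "dbfs:/mnt/gold/my_table")  # hypothetical storage location
    .saveAsTable("gold.my_table")
)
```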
by weldermartins, Honored Contributor
  • 4891 Views
  • 2 replies
  • 1 kudos

Resolved! How to make spark-submit work on Windows?

I have Jupyter Notebook installed on my machine, working normally. I tested running a Spark application with the spark-submit command, and it returns a message that the file was not found. What do I need to do to make it work? Below is a file ...

[image attachment]
Latest Reply
Debayan
Databricks Employee
  • 1 kudos

Hi, this is not yet tested in my lab, but could you please check and confirm whether this works: https://stackoverflow.com/questions/37861469/how-to-submit-spark-application-on-cmd

1 More Replies
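One way to rule out path problems on Windows is to call the launcher through its full path. A hedged sketch, assuming SPARK_HOME is set and a hypothetical application location; "file not found" errors on Windows often come from relative paths or the missing .cmd extension.

```python
# A sketch assuming SPARK_HOME is set; the application path is hypothetical.
# On Windows the launcher is spark-submit.cmd, not the bare spark-submit.
import os
import subprocess

spark_submit = os.path.join(os.environ["SPARK_HOME"], "bin", "spark-submit.cmd")
app = r"C:\projects\my_app\main.py"  # hypothetical script path

# Passing arguments as a list lets subprocess handle quoting/escaping.
subprocess.run([spark_submit, "--master", "local[*]", app], check=True)
```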
by sasidhar, New Contributor II
  • 8460 Views
  • 4 replies
  • 8 kudos

Custom Python module not found while using dbx on PyCharm

I am new to Databricks and PySpark, and am building a PySpark application using the PyCharm IDE. I have tested the code locally and wanted to run it on a Databricks cluster from the IDE itself. Following the dbx documentation, I am able to run a single Python file succes...

Latest Reply
Meghala
Valued Contributor II
  • 8 kudos

I got this error too.

3 More Replies
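A common cause of "module not found" in this setup is that the custom module only exists on the local machine and is never packaged and shipped to the cluster. A hedged sketch of a minimal setup.py (the package name is hypothetical) that lets the project be built into a wheel the deployment tooling can upload:

```python
# A hedged sketch of a minimal setup.py with a hypothetical package name;
# packaging the custom module lets build tooling ship it to the cluster as a
# wheel instead of relying on the local PYTHONPATH.
from setuptools import find_packages, setup

setup(
    name="my_package",
    version="0.1.0",
    packages=find_packages(exclude=["tests", "tests.*"]),
)
```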
by najmead, Contributor
  • 2371 Views
  • 2 replies
  • 0 kudos

Error Creating Primary Key Constraint

I am trying to add a primary key constraint to an existing table, and I get the following error: Cannot create or update table because the child column(s) `my_primary_key` of primary key `pk` cannot be set to nullable. Either drop the constraint, or c...

Latest Reply
Debayan
Databricks Employee
  • 0 kudos

Hi, could you please confirm whether you are using the latest databricks-sql-connector? (https://pypi.org/project/databricks-sql-connector/)

1 More Replies
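The error above says the key column is still nullable. A sketch of the usual two-step fix on a Unity Catalog Delta table, with hypothetical names: make the column NOT NULL first, then add the constraint.

```python
# A sketch with hypothetical catalog/schema/table names; the column must be
# non-nullable before it can back a PRIMARY KEY constraint.
spark.sql("""
    ALTER TABLE demo_catalog.demo_schema.my_table
    ALTER COLUMN my_primary_key SET NOT NULL
""")

spark.sql("""
    ALTER TABLE demo_catalog.demo_schema.my_table
    ADD CONSTRAINT pk PRIMARY KEY (my_primary_key)
""")
```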
by Bhanu1, New Contributor III
  • 1514 Views
  • 2 replies
  • 0 kudos

The new horizontal view of tasks *****. Can we please have the option for vertical view of a workflow?


Latest Reply
Bhanu1
New Contributor III
  • 0 kudos

Hi Debayan, this is how workflows used to look before. They are now shown from left to right instead of from top to bottom. It is a pain to scroll through a long workflow now, as mice don't have the capability to scroll left and right.

1 More Replies
by data_explorer, New Contributor II
  • 1060 Views
  • 1 reply
  • 0 kudos

Is there any way to execute GRANT and REVOKE statements for a user on an object based on a condition?

SELECT if((select count(*) from information_schema.table_privileges where grantee = 'samo@test.com' and table_schema='demo_schema' and table_catalog='demo_catalog')==1, (select count(*) from demo_catalog.demo_schema.demo_table), (select count(*) from...

Latest Reply
Debayan
Databricks Employee
  • 0 kudos

Hi, GRANT and REVOKE assign or remove privileges on a securable object for a principal, where a principal is a user, service principal, or group known to the metastore. Principals can be granted privileges and may own securable objects. Also, you can use REVOKE ON S...

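Since a GRANT cannot be nested inside a SQL IF expression, one workaround is to evaluate the condition first and then issue the statement from Python. A hedged sketch reusing the (hypothetical) names from the question:

```python
# A hedged sketch: check the condition in SQL, then branch in Python to run
# GRANT or REVOKE. Grantee, catalog, schema, and table names are hypothetical.
n = spark.sql("""
    SELECT count(*) AS n
    FROM information_schema.table_privileges
    WHERE grantee = 'samo@test.com'
      AND table_schema = 'demo_schema'
      AND table_catalog = 'demo_catalog'
""").first()["n"]

if n == 1:
    spark.sql("GRANT SELECT ON TABLE demo_catalog.demo_schema.demo_table TO `samo@test.com`")
else:
    spark.sql("REVOKE SELECT ON TABLE demo_catalog.demo_schema.demo_table FROM `samo@test.com`")
```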
by SaravananPalani, New Contributor II
  • 23592 Views
  • 8 replies
  • 9 kudos

Is there any way to monitor the CPU, disk and memory usage of a cluster while a job is running?

I am looking for something, preferably similar to the Windows Task Manager, which we can use for monitoring CPU, memory, and disk usage on a local desktop.

Latest Reply
hitech88
New Contributor II
  • 9 kudos

Some important info to look at in the Ganglia UI CPU, memory, and server-load charts to spot the problem:
CPU chart: User %, Idle %. A high user % indicates heavy CPU usage in the cluster.
Memory chart: Use %, Free %, Swap %. If you see a purple line ove...

7 More Replies
by najmead, Contributor
  • 18826 Views
  • 6 replies
  • 13 kudos

How to convert string to datetime with correct timezone?

I have a field stored as a string in the format "12/30/2022 10:30:00 AM". If I use the function TO_DATE, I only get the date part... I want the full date and time. If I use the function TO_TIMESTAMP, I get the date and time, but it's assumed to be UTC, ...

Latest Reply
Rajeev_Basu
Contributor III
  • 13 kudos

use from_utc_timestamp(to_timestamp("<string>", <format>), <timezone>)

5 More Replies
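Spelled out as a runnable PySpark sketch; the target timezone is an assumption, so substitute your own. from_utc_timestamp interprets the parsed value as UTC and renders it in the given zone, which matches the reply above.

```python
# A sketch of the reply above; the target timezone is a hypothetical example.
from pyspark.sql import functions as F

df = spark.createDataFrame([("12/30/2022 10:30:00 AM",)], ["ts_str"])

result = df.select(
    F.from_utc_timestamp(
        F.to_timestamp("ts_str", "MM/dd/yyyy hh:mm:ss a"),  # parse the string
        "Australia/Sydney",  # hypothetical target timezone
    ).alias("ts_local")
)
result.show(truncate=False)
```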
by ironising84, New Contributor II
  • 5147 Views
  • 3 replies
  • 6 kudos

Question on Databricks Spark online proctored exam

Some silly questions, folks. I took the online proctored Databricks Spark certification a couple of days back and my unofficial result was a pass. I received a mail that it might take up to one week to receive the certification, if awar...

Latest Reply
Rajeev_Basu
Contributor III
  • 6 kudos

Better would have been to ask for permission before drinking. I can share my experience: my mobile alarm started buzzing during the exam, I requested the moderator, and he then paused the exam and asked me to take my laptop to the mobile and then to switch it off,...

2 More Replies
by lambarc, New Contributor II
  • 12985 Views
  • 7 replies
  • 13 kudos

How to read file in pyspark with “]|[” delimiter

The data looks like this: pageId]|[page]|[Position]|[sysId]|[carId 0005]|[bmw]|[south]|[AD6]|[OP4 There are at least 50 columns and millions of rows. I did try to use the below code to read it: dff = sqlContext.read.format("com.databricks.spark.csv").option...

Latest Reply
rohit199912
New Contributor II
  • 13 kudos

You might also try the below options. 1) Use a different file format: you can try using a different file format that supports multi-character delimiters, such as text or JSON. 2) Use a custom Row class: you can write a custom Row class to parse the multi-...

6 More Replies
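A third route that works on any Spark version is to read the file as plain text and split each line on the literal delimiter. A sketch assuming a hypothetical file path and the five columns shown in the question:

```python
# A sketch with a hypothetical path and the five columns shown above; read as
# plain text, then split each line on the literal "]|[" delimiter. The pattern
# is escaped because split() takes a regular expression.
from pyspark.sql import functions as F

raw = spark.read.text("dbfs:/mnt/raw/cars.txt")  # hypothetical path
cols = F.split(F.col("value"), r"\]\|\[")

df = raw.select(
    cols.getItem(0).alias("pageId"),
    cols.getItem(1).alias("page"),
    cols.getItem(2).alias("Position"),
    cols.getItem(3).alias("sysId"),
    cols.getItem(4).alias("carId"),
)
# If the first line is a header row, filter it out before or after the split.
```

For the 50-column case you would generate the select list programmatically from the header line rather than writing each alias by hand.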
by Marcel, New Contributor III
  • 26862 Views
  • 4 replies
  • 2 kudos

Resolved! Set environment variables in global init scripts

Hi Databricks Community, I want to set environment variables for all clusters in my workspace. The goal is to have environment-specific (dev, prod) environment variable values. Instead of setting the environment variables for each cluster, a global script ...

Latest Reply
brickster
New Contributor II
  • 2 kudos

We have set the env variable in the global init script as below: sudo echo DATAENV=DEV >> /etc/environment. We then try to access the variable in a notebook that runs in "Shared" cluster mode: import os; print(os.getenv("DATAENV")). But the env variable is not a...

3 More Replies
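For reference, a hedged sketch of creating such an init script from a notebook, mirroring the reply's /etc/environment approach; the script path is hypothetical, and the script only takes effect once it is registered as a cluster-scoped or global init script and the cluster is restarted.

```python
# A hedged sketch mirroring the reply above; the DBFS path is hypothetical and
# the script must be registered as a cluster or global init script to run.
script = """#!/bin/bash
echo "DATAENV=DEV" >> /etc/environment
"""
dbutils.fs.put("dbfs:/databricks/scripts/set-dataenv.sh", script, overwrite=True)

# In a notebook on a cluster that ran the script:
import os
print(os.getenv("DATAENV"))
```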
by tecku71, New Contributor III
  • 1880 Views
  • 3 replies
  • 3 kudos

How to publish a notebook dashboard without the possibility to "exit" full screen?

Is there a way to remove the "exit" button from the full-screen view of a Spark notebook dashboard?

Latest Reply
Prabakar
Databricks Employee
  • 3 kudos

Could you please share a screenshot of what you see? I don't see any exit button, or I might be looking in the wrong place.

2 More Replies
by 519776, New Contributor III
  • 19868 Views
  • 15 replies
  • 2 kudos

Resolved! How to create connection between Databricks & BigQuery

Hi, I would like to connect our BigQuery env to Databricks, so I created a service account, but where should I configure the service account in Databricks? I read the Databricks documentation and it's not clear at all. Thanks for your help.

Latest Reply
karthik_p
Esteemed Contributor
  • 2 kudos

@kfiry​, adding to @Werner Stinckens​: did you add the projectId in the Spark read query? The projectId should be the one where the BigQuery instance is running. Also, please follow best practices in terms of egress data cost: spark.read.format("bigquery") \ .option("tabl...

14 More Replies
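A hedged sketch of the read path once the service-account credentials are attached to the cluster; the project and table names are hypothetical placeholders.

```python
# A hedged sketch with hypothetical project/table names; the service-account
# credentials are assumed to already be configured on the cluster.
df = (
    spark.read.format("bigquery")
    .option("parentProject", "my-billing-project")  # project billed for the read
    .option("project", "my-data-project")           # project that owns the dataset
    .option("table", "my_dataset.my_table")
    .load()
)
df.show()
```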
by yousry, New Contributor II
  • 4596 Views
  • 2 replies
  • 2 kudos

Resolved! What is the best way to find deltalake version on OSS and Databricks at runtime?

To identify which Delta Lake features are available on a given installation, it is important to have a robust way to identify the Delta Lake version. For OSS, I found that the below Scala snippet will do the job: import io.delta; println(io.delta.VERSION). Not...

Latest Reply
shan_chandra
Databricks Employee
  • 2 kudos

@Yousry Mohamed​ - could you please check the DBR runtime release notes for the Delta Lake API compatibility matrix section (DBR version vs. compatible Delta Lake version) for the mapping. Reference: https://docs.databricks.com/release-notes/runtime/r...

1 More Replies
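On OSS, a Python-side alternative to the Scala snippet in the question is to ask pip's package metadata for the installed version. A sketch assuming Delta was installed via the delta-spark package; on Databricks, the release-notes compatibility matrix linked above remains the reliable mapping from DBR version to Delta Lake version.

```python
# A sketch for OSS, assuming Delta was installed as the delta-spark package.
from importlib.metadata import version

print(version("delta-spark"))
```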
