Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

aka1 (New Contributor II)
  • 1246 Views
  • 1 reply
  • 3 kudos

dbx - run unit test error (java.lang.NoSuchMethodError)

I am setting up dbx for the first time on Windows 10, strictly following https://dbx.readthedocs.io/en/latest/guides/python/python_quickstart/. OpenJDK is installed (conda install -c conda-forge openjdk=11.0.15), winutils.exe for Hadoop 3 is downloaded, pat...

Latest Reply: Aviral-Bhardwaj (Esteemed Contributor III) • 3 kudos

This seems to be a code issue only.

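A java.lang.NoSuchMethodError in a local dbx/PySpark test run usually points to a JVM or Hadoop-binaries mismatch rather than dbx itself. A minimal sketch for sanity-checking the Windows setup the quickstart assumes (OpenJDK 11 plus winutils.exe; the environment variable names are the conventional ones, not taken from the thread):

    import os, subprocess

    # Confirm which JVM the tests will pick up (OpenJDK 11 per the quickstart).
    print("JAVA_HOME =", os.environ.get("JAVA_HOME"))
    subprocess.run(["java", "-version"])

    # On Windows, winutils.exe must live under %HADOOP_HOME%\bin.
    hadoop_home = os.environ.get("HADOOP_HOME", "")
    print("HADOOP_HOME =", hadoop_home)
    print("winutils present:", os.path.exists(os.path.join(hadoop_home, "bin", "winutils.exe")))
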
MaximS (New Contributor)
  • 1077 Views
  • 1 reply
  • 1 kudos

OPTIMIZE command failed to complete on partitioned dataset

Trying to optimize a Delta table with the following stats: size: 212,848 blobs, 31,162,417,246,985 bytes; command: OPTIMIZE <table> ZORDER BY (X, Y, Z). In the Spark UI I can see all the work divided into batches, and each batch starts with 400 tasks to collect data. But ...

Latest Reply: Aviral-Bhardwaj (Esteemed Contributor III) • 1 kudos

Can you share some sample datasets for this so that we can debug and help you accordingly? Thanks, Aviral

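On a table this large (~31 TB), one common workaround is to scope OPTIMIZE to one partition at a time so each command compacts a manageable slice; Delta allows a WHERE on partition columns together with ZORDER BY. A hedged sketch, assuming a hypothetical table my_table partitioned by a date column:

    # Compact partition by partition instead of the whole table at once.
    partitions = [r.date for r in spark.sql("SELECT DISTINCT date FROM my_table").collect()]
    for p in partitions:
        spark.sql(f"OPTIMIZE my_table WHERE date = '{p}' ZORDER BY (X, Y, Z)")
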
auser85 (New Contributor III)
  • 2774 Views
  • 1 reply
  • 1 kudos

How to incorporate these GC options into my Databricks cluster? (spark.executor.extraJavaOptions)

I want to try incorporating these options into my Databricks cluster: spark.driver.extraJavaOptions -XX:+UseG1GC -XX:+G1SummarizeConcMark and spark.executor.extraJavaOptions -XX:+UseG1GC -XX:+G1SummarizeConcMark. If I put them under Compute -> Cluster -> Co...

Latest Reply: Aviral-Bhardwaj (Esteemed Contributor III) • 1 kudos

Hey @Andrew Fogarty​, I think this is only for the spark-submit command, not for the cluster UI. Please have a look at this doc: http://progexc.blogspot.com/2014/12/spark-configuration-mess-solved.html. spark.executor.extraJavaOptions: a string of extra JVM...

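Whichever way the options are supplied, it is easy to check from a notebook whether they actually reached the running cluster. A small sketch using the standard Spark property keys from the question:

    # Print what the live session actually sees; "not set" means the
    # options were not applied at cluster start.
    for key in ("spark.driver.extraJavaOptions", "spark.executor.extraJavaOptions"):
        print(key, "=", spark.conf.get(key, "not set"))
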
RajibRajib_Mand (New Contributor III)
  • 1482 Views
  • 3 replies
  • 2 kudos

Multiple Databricks cluster in same workspace

Hi all, I have created three clusters (dev, qa, prod) in the same workspace to isolate data for different environments. How do we differentiate environments while running a job? Using dev, it should update data for the dev environment. Regards, Rajib

Latest Reply: Aviral-Bhardwaj (Esteemed Contributor III) • 2 kudos

Hey @Rajib Rajib Mandal​, this is very easy; I have done this multiple times. You can segregate data using the IAM role attached to the cluster, known as an instance profile. You can give dev data access only to the dev role, and the s...

2 More Replies
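
Complementing the instance-profile approach above, jobs themselves can be parameterized by environment so the same notebook only touches its own environment's data. A hypothetical sketch (the widget name, bucket naming scheme, and table paths are illustrative, not from the thread):

    # "env" arrives as a job parameter: dev | qa | prod.
    dbutils.widgets.text("env", "dev")
    env = dbutils.widgets.get("env")

    # Hypothetical environment-specific storage layout.
    base_path = f"s3://company-data-{env}"
    df = spark.read.format("delta").load(f"{base_path}/sales")
    summary = df.groupBy("region").count()   # hypothetical transform
    summary.write.format("delta").mode("overwrite").save(f"{base_path}/sales_summary")
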
SIRIGIRI (Contributor)
  • 559 Views
  • 1 reply
  • 1 kudos

medium.com

Sorting in Spark: how do you sort null values first and last in a Spark DataFrame? Please find the answer here: https://medium.com/@sharikrishna26/sorting-in-spark-a57db245ecd4

Latest Reply: Aviral-Bhardwaj (Esteemed Contributor III) • 1 kudos

Yeah, this is a really good post. Keep it up, man!

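For readers who don't follow the link: Spark exposes null ordering directly on Column via variants of asc/desc. A quick self-contained example:

    from pyspark.sql import functions as F

    df = spark.createDataFrame([(1,), (None,), (3,)], ["x"])
    df.orderBy(F.col("x").asc_nulls_first()).show()   # nulls sort before 1, 3
    df.orderBy(F.col("x").desc_nulls_last()).show()   # nulls sort after 3, 1
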
Aviral-Bhardwaj (Esteemed Contributor III)
  • 919 Views
  • 0 replies
  • 31 kudos

Understanding Cluster Pools

Sometimes we want to run our Databricks code without any delay; reports can be urgent, and the upstream team wants to save as much time as possible on cluster startup. In that case we can use a pool of cluste...

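For reference, attaching a job cluster to a pre-warmed pool is a one-field change in the cluster spec. A hedged sketch of a Jobs API new_cluster payload (the pool id and runtime version are placeholders):

    # Workers come from the pool's idle instances, so startup skips
    # cloud-instance provisioning.
    new_cluster = {
        "spark_version": "11.3.x-scala2.12",        # placeholder runtime
        "num_workers": 2,
        "instance_pool_id": "1234-567890-pool123",  # placeholder pool id
    }
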
Aviral-Bhardwaj (Esteemed Contributor III)
  • 1138 Views
  • 0 replies
  • 31 kudos

Databricks New Runtime Version is Available Now

PySpark memory profiling: memory profiling is now enabled for PySpark user-defined functions. This provides information on memory increment, memory usage, and number of occurrences for each line of code...

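A hedged sketch of trying the profiler. It assumes the cluster was started with spark.python.profile.memory set to true in its Spark config, since profiler flags are read at context creation:

    from pyspark.sql.functions import udf

    @udf("long")
    def plus_one(v):
        return v + 1

    spark.range(10).select(plus_one("id")).collect()
    sc.show_profiles()   # per-line memory stats for the UDF's Python code
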
ahana (New Contributor III)
  • 1636 Views
  • 1 reply
  • 2 kudos

"Too large report" error

Hi, I am trying to pull data from Quick Base, but it is giving me the error "too large report". Below is the code I used (%python): df = quickbasePull('b5zj8k_pbz5_0_cd5h4wbb77n4nvp95b4u','bq2nq8jm7',4). 2) I tried the code below, but it's not displaying in correc...

Latest Reply: Aviral-Bhardwaj (Esteemed Contributor III) • 2 kudos

Hey @ahana ahana​, this code is not working.

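"Too large report" limits are usually worked around by paging. Since quickbasePull is the poster's own helper, here is a hypothetical sketch directly against the Quickbase JSON API using skip/top paging (the realm, token, and page size are placeholders; the table id is the one from the post):

    import requests

    headers = {
        "QB-Realm-Hostname": "myrealm.quickbase.com",   # placeholder realm
        "Authorization": "QB-USER-TOKEN my_token",      # placeholder token
    }
    records, skip, top = [], 0, 1000
    while True:
        body = {"from": "bq2nq8jm7", "options": {"skip": skip, "top": top}}
        resp = requests.post("https://api.quickbase.com/v1/records/query",
                             json=body, headers=headers).json()
        records.extend(resp["data"])
        if len(resp["data"]) < top:   # last page
            break
        skip += top
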
rammy (Contributor III)
  • 5945 Views
  • 6 replies
  • 5 kudos

How can I read the job id, run id, and parameters in a Python cell?

I have tried the following ways to get job parameters, but none of them work: runId='{{run_id}}'; jobId='{{job_id}}'; filepath='{{filepath}}'; print(runId," ",jobId," ",filepath); r1=dbutils.widgets.get('{{run_id}}'); f1=dbutils.widgets.get('{{file...

Latest Reply: rammy (Contributor III) • 5 kudos

Thanks for your response. I found the solution. The code below gives me all the job parameters: all_args = dbutils.notebook.entry_point.getCurrentBindings(); print(all_args). Thanks for your support.

5 More Replies
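
Pulling the thread's answer together with a context-based variant (the tag name "jobId" is runtime-dependent, so treat this as a sketch):

    # All notebook widgets/parameters, as confirmed later in this thread.
    all_args = dbutils.notebook.entry_point.getCurrentBindings()
    print(all_args)

    # Job/run ids come from the notebook context rather than widgets.
    ctx = dbutils.notebook.entry_point.getDbutils().notebook().getContext()
    print(ctx.currentRunId())          # Some(RunId(...)) when run as a job
    print(ctx.tags().apply("jobId"))   # throws when not running as a job
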
joakon (New Contributor III)
  • 6910 Views
  • 7 replies
  • 6 kudos
Latest Reply: huyd (New Contributor III) • 6 kudos

Check the "Delimiter" option in your read cell.

6 More Replies
Deiry (New Contributor III)
  • 995 Views
  • 2 replies
  • 2 kudos

spark.apache.org

Hey fellow co-workers!! I have been doing the Apache Spark programming course in Databricks Academy, and I realized the hyperlinks in it don't work. Spark session: https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql.html#spark-session-apis

Latest Reply: labtech (Valued Contributor II) • 2 kudos

@Deiry Navas​ Could you share the repo link used to execute that notebook? I'll check on my side.

1 More Reply
fury88 (New Contributor II)
  • 718 Views
  • 1 reply
  • 0 kudos

Why are the get..Id() functions returning 'some(123456)' instead of just the id?

Hey fellow users, I've successfully retrieved the notebook context during job runs and there are several getId calls. For some reason when the ids are returned, they are wrapped in a some() instead of just the number. Does anyone know why this is the...

Latest Reply: fury88 (New Contributor II) • 0 kudos

Well, my post is irrelevant for me now!! I just stumbled across this beauty, which avoids me having to do any of this and deal with odd return values: "How to get the Job ID and Run ID and save into a database" (databricks.com). Are the braces {{job_id}} n...

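The some(...) wrapper is Scala's Option type: the context getters return Option values because a notebook may not be running inside a job, and py4j hands the wrapper back to Python unchanged. A sketch of unwrapping it explicitly (isDefined/get are standard Scala Option methods):

    ctx = dbutils.notebook.entry_point.getDbutils().notebook().getContext()
    run_id_opt = ctx.currentRunId()   # a Scala Option, e.g. Some(123456)
    run_id = run_id_opt.get() if run_id_opt.isDefined() else None
    print(run_id)
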
vs_29 (New Contributor II)
  • 1709 Views
  • 2 replies
  • 3 kudos

Custom Log4j logs are not being written to the DBFS storage.

I used a custom Log4j appender to write custom logs through an init script, and I can see the custom log file in the driver logs, but Databricks is not writing those custom logs to DBFS. I have configured the logging destination in the Advanced sec...

Latest Reply: Kaniz_Fatma (Community Manager) • 3 kudos

Hi @VIjeet Sharma​, we haven't heard from you since the last response from @Debayan Mukherjee​, and I was checking back to see if his suggestions helped you. Otherwise, if you have any solution, please share it with the community, as it can be helpful to...

1 More Reply
RohitKulkarni (Contributor)
  • 5802 Views
  • 6 replies
  • 6 kudos

External table format issue in Databricks

I am new to Databricks. I am trying to create an external table in Databricks with the format below: CREATE EXTERNAL TABLE Salesforce.Account (Id string, IsDeleted bigint, Name string, Type string, RecordTypeId string, ParentId string, ShippingSt...

Latest Reply: AmitA1 (Contributor) • 6 kudos

Databricks is awesome if you have SQL knowledge. I just came across one of the problems in my project, and Databricks helped me a lot, like using a low watermark to hold the load success date.

5 More Replies
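
On the original question: in Databricks SQL, the usual way to get an external (unmanaged) table is CREATE TABLE ... USING <format> LOCATION rather than Hive's CREATE EXTERNAL TABLE. A sketch with a shortened column list and a hypothetical storage path:

    location = "/mnt/salesforce/account"   # hypothetical mount path

    spark.sql(f"""
        CREATE TABLE IF NOT EXISTS salesforce.account (
            Id STRING,
            IsDeleted BIGINT,
            Name STRING,
            Type STRING,
            RecordTypeId STRING,
            ParentId STRING
        )
        USING DELTA              -- or PARQUET/CSV to match the files
        LOCATION '{location}'
    """)
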
jt (New Contributor III)
  • 1893 Views
  • 3 replies
  • 2 kudos

Collapse partial code in a large cell?

In a Databricks notebook, we have SQL cells that are over 700 lines long. Is there a way to collapse a portion of the code instead of scrolling? Looking for something similar to what exists in Netezza: "--region" and "--end region", where anything between those...

Latest Reply: Anonymous (Not applicable) • 2 kudos

Hi @james t​, hope all is well! Just wanted to check in to see if you were able to resolve your issue. Would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Thanks!

2 More Replies