Data Engineering

Forum Posts

by jgrgn • New Contributor

08-15-2022 12:56:20 PM

1534 Views
0 replies
0 kudos

define notebook path from a parameter

Is there a way to define the notebook path based on a parameter from the calling notebook using %run? I am aware of dbutils.notebook.run(), but would like to have all the functions defined in the reference notebook to be available in the calling noteboo...

by BradSheridan • Valued Contributor

08-15-2022 10:30:53 AM

2698 Views
0 replies
0 kudos

Workflow parameters

Hey everyone! I'm close but can't seem to figure this out. I'm trying to add 2 notebooks to a Databricks Job. Instead of the first command in both notebooks being a connection to an RDS/Redshift cluster, I'd prefer to make that connection once and ha...

by palzor • New Contributor III

08-14-2022 2:24:43 PM

1419 Views
0 replies
2 kudos

What is the best practice while loading a delta table: do I infer the schema or provide the schema?

I am loading Avro files into Delta tables. I am doing this for multiple tables; some files are big (2-3 GB) and most of them are small (a few MB). I am using Auto Loader to load the data into the Delta tables. My question is: What is the ...

by anisha_93 • New Contributor II

07-22-2021 4:58:39 AM

5703 Views
2 replies
1 kudos

Error in SQL statement: KeyProviderException: Failure to initialize configuration

I have a source delta table on which I have selectively granted access to a particular pool id (it can be thought of as a dummy user). From the pool id interface, whenever I run a select on any of the tables, even though it has access to them, is faili...

Latest Reply

alicewong20
New Contributor II

08-13-2022 9:49:28 PM

1 kudos

Hello all, I have the same problem. Can anyone help?

1 More Replies

by Dicer • Valued Contributor

08-12-2022 2:16:46 AM

5264 Views
4 replies
3 kudos

Resolved! Azure Databricks: Failed to extract data which is between two timestamps within those same dates using Pyspark

Data type:
AAPL_Time: timestamp
AAPL_Close: float

Raw Data:
AAPL_Time AAPL_Close
2015-05-11T08:00:00.000+0000 29.0344
2015-05-11T08:30:00.000+0000 29.0187
2015-05-11T09:00:00.000+0000 29.0346
2015-05-11T09:3...

Latest Reply

Anonymous
Not applicable

08-13-2022 3:50:10 PM

3 kudos

Another thing to try: the hour() and minute() functions return integers, so you can compare on the time of day directly.
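A minimal sketch of that idea, assuming the goal is to keep rows whose time of day falls inside the same window on every date. It uses plain Python datetime objects so it runs without a cluster. The window bounds (08:30 to 09:00) and the helper names are illustrative assumptions, not taken from the thread.

```python
from datetime import datetime


def minutes_since_midnight(ts):
    """Collapse hour and minute (both integers) into one comparable number."""
    return ts.hour * 60 + ts.minute


def in_daily_window(ts, start=(8, 30), end=(9, 0)):
    """True when the time of day of ts lies in [start, end], whatever the date."""
    lo = start[0] * 60 + start[1]
    hi = end[0] * 60 + end[1]
    return lo <= minutes_since_midnight(ts) <= hi


# Sample rows copied from the question (timestamp, AAPL_Close).
rows = [
    (datetime(2015, 5, 11, 8, 0), 29.0344),
    (datetime(2015, 5, 11, 8, 30), 29.0187),
    (datetime(2015, 5, 11, 9, 0), 29.0346),
]

selected = [(ts, close) for ts, close in rows if in_daily_window(ts)]

# Rough PySpark equivalent (not run here; hour() and minute() use the session time zone):
#   from pyspark.sql import functions as F
#   tod = F.hour("AAPL_Time") * 60 + F.minute("AAPL_Time")
#   df.filter(tod.between(8 * 60 + 30, 9 * 60))
```

Turning hour and minute into a single "minutes since midnight" number avoids the awkward case where the hour matches the boundary but the minute does not.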

3 More Replies

by _Orc • New Contributor

02-22-2022 10:02:34 AM

23521 Views
5 replies
3 kudos

Resolved! Precision and scale is getting changed in the dataframe while casting to decimal

When I run the query below in Databricks SQL, the precision and scale of the decimal column change.
Select typeof(COALESCE(Cast(3.45 as decimal(15,6)),0));
o/p: decimal(16,6)
expected o/p: decimal(15,6)
Any reason why the Precision and scale i...

Latest Reply

berserkersap
Contributor

08-13-2022 12:05:19 PM

3 kudos

You can use typeof(COALESCE(Cast(3.45 as decimal(15,6)),0.0)); (instead of 0)
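One likely explanation for why this works, sketched below: when the integer literal 0 is mixed with a decimal it is treated as decimal(10,0), and Spark widens both arguments to a decimal type that can hold either value. The function mirrors my understanding of that widening rule (scale is the larger scale, integer digits is the larger count of integer digits, capped at precision 38). Treat the rule and the literal types as assumptions to verify with typeof() on your own cluster, not as a reference implementation.

```python
MAX_PRECISION = 38  # assumed upper bound on decimal precision


def wider_decimal(p1, s1, p2, s2):
    """Return (precision, scale) of a decimal wide enough for both inputs."""
    scale = max(s1, s2)                  # keep every fractional digit
    int_digits = max(p1 - s1, p2 - s2)   # keep every integer digit
    precision = min(int_digits + scale, MAX_PRECISION)
    return precision, min(scale, MAX_PRECISION)


INT_AS_DECIMAL = (10, 0)   # assumed type of the literal 0 once promoted to decimal
LITERAL_0_0 = (1, 1)       # assumed type of the literal 0.0

with_int_literal = wider_decimal(15, 6, *INT_AS_DECIMAL)
with_decimal_literal = wider_decimal(15, 6, *LITERAL_0_0)

print(with_int_literal)      # (16, 6)
print(with_decimal_literal)  # (15, 6)
```

With 0, the integer side needs 10 integer digits against the 9 available in decimal(15,6), so the result grows to decimal(16,6). With 0.0 nothing needs to grow, so the type stays decimal(15,6).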

4 More Replies

by Stephen678 • New Contributor II

08-13-2022 6:18:46 AM

1747 Views
0 replies
0 kudos

Easy way to debug Databricks code. Are there breakpoints in Databricks or an alternative way to achieve it?

I'm consuming multiple topics from Confluent Kafka and processing each row with business rules using Spark Structured Streaming (.writeStream and .foreach()). While doing that I call another notebook using %run and call the class via foreach while perform...

by Sha_1890 • New Contributor III

08-13-2022 4:46:50 AM

1989 Views
0 replies
3 kudos

Longer execution time to write into the SQL server table from Spark Dataframe

I have 8 GB of XML data loaded into different dataframes. Two of the dataframes, with 24 lakh and 82 lakh records, have to be written to two SQL Server tables, which takes 2 hrs and 5 hrs. I am using the below cluster configura...

by sage5616 • Valued Contributor

07-12-2022 8:40:36 AM

12515 Views
5 replies
7 kudos

Resolved! SQL Error when querying any tables/views on a Databricks cluster via Dbeaver.

I am able to connect to the cluster, browse its hive catalog, see tables/views and columns/datatypes. Running a simple select statement from a view on a parquet file produces this error and no other results: "SQL Error [500540] [HY000]: [Databricks][Dat...

Latest Reply

sage5616
Valued Contributor

07-20-2022 7:37:37 AM

7 kudos

Update. I have tried SQL Workbench/J and encountered exactly the same error(s) as with Dbeaver. I have also tried JetBrains DataGrip and it worked flawlessly. Able to connect, browse the databases and query tables/views. https://docs.microsoft.com/en...

4 More Replies

by BradSheridan • Valued Contributor

08-12-2022 10:37:42 AM

3966 Views
1 replies
0 kudos

Resolved! Drop/Create tables in Redshift with PySpark

Happy Friday afternoon fellow Bricksters! Got another question for you... I have a pyspark notebook that reads from redshift into a DF, does some 'stuff', then writes back to redshift. All good here. What I'm trying to do with no luck yet is first DR...

Latest Reply

BradSheridan
Valued Contributor

08-12-2022 12:07:21 PM

0 kudos

Answered my own question!! Check this out:
dropSQL = ("DROP TABLE IF EXISTS <tablename>;")  # note the semicolon at the end!
createSQL = ("CREATE TABLE IF NOT EXISTS <tablename> (field1 int, field2 date, etc...);")
preActionsSQL = dropSQL + createSQL
th...
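A minimal sketch of how the pre-actions string above could be assembled and handed to the writer, assuming the Databricks Redshift data source and its `preactions` option (a semicolon-separated list of SQL commands run before the load). The table name, column list, and helper name are hypothetical, and the connection options are deliberately left out.

```python
def build_preactions(table, columns_ddl):
    """Join DROP and CREATE into one string; each statement ends with ';'."""
    drop_sql = f"DROP TABLE IF EXISTS {table};"
    create_sql = f"CREATE TABLE IF NOT EXISTS {table} ({columns_ddl});"
    return drop_sql + create_sql


# Hypothetical table and columns, for illustration only.
table_name = "my_schema.my_table"
preactions = build_preactions(table_name, "field1 int, field2 date")

write_options = {
    "dbtable": table_name,
    "preactions": preactions,
    # "url" and "tempdir" are also needed in practice; omitted here on purpose.
}

print(preactions)

# On a cluster the options would be passed along these lines (not run here):
#   df.write.format("com.databricks.spark.redshift") \
#       .options(**write_options).mode("append").save()
```

The semicolon after each statement is what lets the connector split the string into separate commands, which is the detail the reply calls out.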

by KarimSegura • New Contributor III

08-12-2022 8:07:43 AM

3910 Views
2 replies
4 kudos

databricks-connect throws an exception when showing a dataframe with json content

I'm facing an issue when I want to show a dataframe with JSON content. All this happens when the script runs in databricks-connect from VS Code. Basically, I would like any help or guidance to get this running as it should. Thanks in advance. This is how...

Latest Reply

KarimSegura
New Contributor III

08-12-2022 11:41:40 AM

4 kudos

The code works fine on a Databricks cluster, but it is part of a unit test that runs in the local environment and is then submitted to a branch -> PR -> merged into the master branch. Thanks for the advice on using DBX. I will give DBX another try even though I've already tried it. I'l...

1 More Replies

by Cano • New Contributor III

08-12-2022 9:02:13 AM

1179 Views
1 replies
0 kudos

Hi, I'd like to know if it's possible to connect to PostgreSQL RDS from the Databricks SQL Warehouse.

Latest Reply

Cano
New Contributor III

08-12-2022 9:09:12 AM

0 kudos

I should have posted this as a question and not a post. Please forgive me, I'm a newbie.

by nikgoel95 • New Contributor II

06-29-2022 11:22:37 AM

2112 Views
3 replies
1 kudos

What's the best way to define the libraries for a cluster, as it always takes a lot of time for me?

Latest Reply

Sivaprasad1
Databricks Employee

08-12-2022 7:40:32 AM

1 kudos

@Nikunj Goel: Please refer to the doc below; workspace libraries might help with this: https://docs.databricks.com/libraries/workspace-libraries.html#workspace-libraries

2 More Replies

by ahana • New Contributor III

08-11-2022 11:50:18 PM

1579 Views
0 replies
0 kudos

I tried to pull the report from QuickBase but it is giving the error "report too large"

Hi, I tried to pull the report with the query below:
%python
df = quickbasePull('b5zj8k_pbz5_0_cd5h4wbbp77n4nvp56b4u','bqmnP8jm7',24)
but it gives me the error "report too large". Then I tried the following:
%python
import pyqb
from pyspark.sql import *
import pandas as pd
qbc ...

by pshah83 • New Contributor II

08-11-2022 4:20:55 PM

2689 Views
0 replies
2 kudos

Use output of SHOW PARTITION commands in Sub-Query/CTE/Function

I am using SHOW PARTITIONS <<table_name>> to get all the partitions of a table. I want to use max() on the output of this command to get the latest partition for the table. However, I am not able to use SHOW PARTITIONS <<table_name>> in a CTE/sub-quer...
