Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

jgrgn
by New Contributor
  • 1534 Views
  • 0 replies
  • 0 kudos

define notebook path from a parameter

Is there a way to define the notebook path based on a parameter from the calling notebook using %run? I am aware of dbutils.notebook.run(), but would like to have all the functions defined in the reference notebook to be available in the calling noteboo...

BradSheridan
by Valued Contributor
  • 2698 Views
  • 0 replies
  • 0 kudos

Workflow parameters

Hey everyone! I'm close but can't seem to figure this out. I'm trying to add 2 notebooks to a Databricks Job. Instead of the first command in both notebooks being a connection to an RDS/Redshift cluster, I'd prefer to make that connection once and ha...

palzor
by New Contributor III
  • 1419 Views
  • 0 replies
  • 2 kudos

What is the best practice when loading a Delta table: infer the schema or provide it?

I am loading Avro files into Delta tables. I am doing this for multiple tables; some files are big (2-3 GB) and most are small (a few MB). I am using Auto Loader to load the data into the Delta tables. My question is: What is the ...

anisha_93
by New Contributor II
  • 5703 Views
  • 2 replies
  • 1 kudos

Error in SQL statement: KeyProviderException: Failure to initialize configuration

I have a source Delta table on which I have selectively granted access to a particular pool ID (which can be thought of as a dummy user). From the pool ID interface, whenever I run a select on any of the tables, even ones it has access to, it is faili...

Latest Reply
alicewong20
New Contributor II
  • 1 kudos

Hello all, I got the same problem. Can anyone help?

1 More Replies
Dicer
by Valued Contributor
  • 5264 Views
  • 4 replies
  • 3 kudos

Resolved! Azure Databricks: Failed to extract data which is between two timestamps within those same dates using Pyspark

Data type:
AAPL_Time: timestamp
AAPL_Close: float
Raw Data:
AAPL_Time AAPL_Close
2015-05-11T08:00:00.000+0000 29.0344
2015-05-11T08:30:00.000+0000 29.0187
2015-05-11T09:00:00.000+0000 29.0346
2015-05-11T09:3...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Another thing to try is the hour() and minute() functions, which return integers.

3 More Replies
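
To illustrate the hour()/minute() suggestion in the reply above, here is a minimal sketch in plain Python rather than PySpark, so it runs without a cluster. It uses the three complete rows from the question's raw data. The 08:30 to 09:00 window and the names `minutes_of_day`, `START`, `END`, and `selected` are assumptions made for illustration only; the original question's actual window is not visible in the truncated preview.

```python
from datetime import datetime

# The three complete rows quoted in the question (AAPL_Time, AAPL_Close).
rows = [
    ("2015-05-11T08:00:00.000+0000", 29.0344),
    ("2015-05-11T08:30:00.000+0000", 29.0187),
    ("2015-05-11T09:00:00.000+0000", 29.0346),
]


def minutes_of_day(ts: str) -> int:
    """Turn a timestamp into an integer: minutes since midnight."""
    parsed = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%f%z")
    # hour and minute are plain integers, so they combine into one comparable number.
    return parsed.hour * 60 + parsed.minute


# Hypothetical time-of-day window (08:30 to 09:00, inclusive), not taken from the thread.
START = 8 * 60 + 30
END = 9 * 60

# Keep rows whose time of day falls inside the window, whatever the date is.
selected = [(ts, close) for ts, close in rows if START <= minutes_of_day(ts) <= END]

for ts, close in selected:
    print(ts, close)
```

In PySpark the same idea would presumably be expressed with `hour("AAPL_Time")` and `minute("AAPL_Time")` from `pyspark.sql.functions` inside a `filter`, since both return integer columns.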
_Orc
by New Contributor
  • 23521 Views
  • 5 replies
  • 3 kudos

Resolved! Precision and scale is getting changed in the dataframe while casting to decimal

When I run the query below in Databricks SQL, the precision and scale of the decimal column are getting changed.
Select typeof(COALESCE(Cast(3.45 as decimal(15,6)),0));
o/p: decimal(16,6)
expected o/p: decimal(15,6)
Any reason why the Precision and scale i...

Latest Reply
berserkersap
Contributor
  • 3 kudos

You can use typeof(COALESCE(Cast(3.45 as decimal(15,6)),0.0)); (instead of 0)

4 More Replies
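
A possible explanation for why 0.0 helps, sketched as plain Python arithmetic. It assumes COALESCE widens its arguments to a common decimal type that keeps the largest scale and the largest number of integer digits, that the integer literal 0 is treated as decimal(10,0), and that the literal 0.0 is typed as decimal(1,1). These are assumptions about the engine's type coercion, not something stated in the thread, and the sketch ignores the 38-digit precision cap. The helper name `widen` is hypothetical.

```python
from typing import Tuple

DecimalType = Tuple[int, int]  # (precision, scale)


def widen(a: DecimalType, b: DecimalType) -> DecimalType:
    """Smallest decimal type that holds both inputs without losing digits."""
    scale = max(a[1], b[1])                       # digits after the point
    int_digits = max(a[0] - a[1], b[0] - b[1])    # digits before the point
    return (int_digits + scale, scale)


CAST_TYPE = (15, 6)      # Cast(3.45 as decimal(15,6))
INT_LITERAL = (10, 0)    # assumption: integer literal 0 behaves like decimal(10,0)
DEC_LITERAL = (1, 1)     # assumption: literal 0.0 is typed as decimal(1,1)

print(widen(CAST_TYPE, INT_LITERAL))  # (16, 6): matches the reported decimal(16,6)
print(widen(CAST_TYPE, DEC_LITERAL))  # (15, 6): matches the expected decimal(15,6)
```

Under these assumptions the integer 0 needs 10 digits before the point, one more than decimal(15,6) offers, which pushes the precision to 16; 0.0 needs none, so the type stays at decimal(15,6).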
Stephen678
by New Contributor II
  • 1747 Views
  • 0 replies
  • 0 kudos

Easy way to debug Databricks code: are there breakpoints in Databricks, or an alternative way to achieve this?

I'm consuming multiple topics from Confluent Kafka and processing each row with business rules using Spark Structured Streaming (.writeStream and .foreach()). While doing that I call another notebook using %run and call the class via foreach while perform...

sage5616
by Valued Contributor
  • 12515 Views
  • 5 replies
  • 7 kudos

Resolved! SQL Error when querying any tables/views on a Databricks cluster via Dbeaver.

I am able to connect to the cluster, browse its Hive catalog, and see tables/views and columns/datatypes. Running a simple select statement from a view on a parquet file produces this error and no other results: "SQL Error [500540] [HY000]: [Databricks][Dat...

Latest Reply
sage5616
Valued Contributor
  • 7 kudos

Update. I have tried SQL Workbench/J and encountered exactly the same error(s) as with Dbeaver. I have also tried JetBrains DataGrip and it worked flawlessly. Able to connect, browse the databases and query tables/views. https://docs.microsoft.com/en...

4 More Replies
BradSheridan
by Valued Contributor
  • 3966 Views
  • 1 reply
  • 0 kudos

Resolved! Drop/Create tables in Redshift with PySpark

Happy Friday afternoon fellow Bricksters! Got another question for you... I have a pyspark notebook that reads from redshift into a DF, does some 'stuff', then writes back to redshift. All good here. What I'm trying to do with no luck yet is first DR...

Latest Reply
BradSheridan
Valued Contributor
  • 0 kudos

Answered my own question!! Check this out:
dropSQL = ("DROP TABLE IF EXISTS <tablename>;")  -- note the semicolon at the end!
createSQL = ("CREATE TABLE IF NOT EXISTS <tablename> (field1 int, field2 date, etc...);")
preActionsSQL = dropSQL + createSQL
th...
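
A minimal sketch of the string-building step described in the reply above. The rest of the reply is truncated, so everything past the concatenation is an assumption: the sketch supposes the combined string is handed to the Redshift connector's `preactions` write option, which takes a semicolon-separated list of SQL commands. The helper `build_preactions`, the table name `my_schema.my_table`, and the `write_options` dict are hypothetical names for illustration.

```python
def build_preactions(table: str, columns_ddl: str) -> str:
    """Concatenate DROP and CREATE statements, each ending with a semicolon."""
    drop_sql = f"DROP TABLE IF EXISTS {table};"  # the semicolon separates the statements
    create_sql = f"CREATE TABLE IF NOT EXISTS {table} ({columns_ddl});"
    return drop_sql + create_sql


# Hypothetical table and columns, standing in for <tablename> and the field list.
pre_actions_sql = build_preactions("my_schema.my_table", "field1 int, field2 date")

# Assumption: these options would be passed to the DataFrame writer for Redshift.
write_options = {
    "dbtable": "my_schema.my_table",
    "preactions": pre_actions_sql,
}

print(write_options["preactions"])
```

Without the semicolon after the DROP statement, the two commands would run together into a single invalid statement, which is presumably why the reply calls it out.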

KarimSegura
by New Contributor III
  • 3910 Views
  • 2 replies
  • 4 kudos

databricks-connect throws an exception when showing a dataframe with json content

I'm facing an issue when I want to show a dataframe with JSON content. All this happens when the script runs in databricks-connect from VS Code. Basically, I would like any help or guidance to get this to run as it should. Thanks in advance. This is how...

Latest Reply
KarimSegura
New Contributor III
  • 4 kudos

The code works fine on a Databricks cluster, but this code is part of a unit test in a local environment, which is then submitted to a branch -> PR -> merged into the master branch. Thanks for the advice on using DBX. I will give DBX another try even though I've already tried it. I'l...

1 More Replies
Cano
by New Contributor III
  • 1179 Views
  • 1 reply
  • 0 kudos

Hi, I'd like to know if it's possible to connect to PostgreSQL RDS from the Databricks SQL Warehouse.

Hi, I'd like to know if it's possible to connect to PostgreSQL RDS from the Databricks SQL Warehouse.

Latest Reply
Cano
New Contributor III
  • 0 kudos

I should have posted this as a question and not a post. Please forgive me, I'm a newbie.

nikgoel95
by New Contributor II
  • 2112 Views
  • 3 replies
  • 1 kudos

What's the best way to define the libraries for a cluster, as it always takes a lot of time for me?

What's the best way to define the libraries for a cluster, as it always takes a lot of time for me?

Latest Reply
Sivaprasad1
Databricks Employee
  • 1 kudos

@Nikunj Goel: Please refer to the doc below; workspace libraries might help with this: https://docs.databricks.com/libraries/workspace-libraries.html#workspace-libraries

2 More Replies
pshah83
by New Contributor II
  • 2689 Views
  • 0 replies
  • 2 kudos

Use output of SHOW PARTITION commands in Sub-Query/CTE/Function

I am using SHOW PARTITIONS <<table_name>> to get all the partitions of a table. I want to use max() on the output of this command to get the latest partition for the table. However, I am not able to use SHOW PARTITIONS <<table_name>> in a CTE/sub-quer...

