Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Dataengineer_mm
by New Contributor
  • 3099 Views
  • 1 replies
  • 1 kudos

Surrogate key using identity column.

I want to create a surrogate key in a Delta table, and I used an identity column (id, generated by default). Can I insert rows into the Delta table using only spark.sql, such as an INSERT query, or can I also use the Delta write format options? If I use the df.write ...

Latest Reply
NandiniN
Databricks Employee
  • 1 kudos

Hello @Menaka Murugesan, if you are using an identity column, I believe you would have created the table as below (starts with value 1 and step 1): CREATE TABLE my_table ( id BIGINT GENERATED BY DEFAULT AS IDENTITY (START WITH 1 INCREMENT BY 1), value STRING ). You can insert values i...

sanjay
by Valued Contributor II
  • 9752 Views
  • 3 replies
  • 5 kudos

Resolved! PySpark UDF is taking long to process

Hi, I have a UDF that runs for each Spark DataFrame row, does some complex processing, and returns a string output. But it takes very long even when the data is only 15,000 rows. I have configured the cluster with autoscaling, but it's not spinning up more servers. Please suggest h...

Latest Reply
Lakshay
Databricks Employee
  • 5 kudos

Hi @Sanjay Jain, Python UDFs are generally slow because every row is serialized between the JVM and a Python worker process, and the added memory pressure can also lead to OOM errors. To resolve this issue, please consider the below: Use Spark built-in functions to do the same functionalit...

2 More Replies
MShee
by New Contributor II
  • 2180 Views
  • 1 replies
  • 1 kudos
Latest Reply
NandiniN
Databricks Employee
  • 1 kudos

Hello @M Shee, in a dropdown you can select a value from a list of provided values, but not type values in. What you might be interested in is a combobox, which is a combination of text and dropdown. It allows you to select a value from a provided list or ...

Lu_Wang_SA_DBX
by Databricks Employee
  • 7866 Views
  • 1 replies
  • 3 kudos

We will host the first Databricks Bay Area User Group meeting in the Databricks Mountain View office on March 14, 2:45-5:00 pm PT. We'll have Dave M...

We will host the first Databricks Bay Area User Group meeting in the Databricks Mountain View office on March 14, 2:45-5:00 pm PT. We'll have Dave Mariani - CTO & Founder at AtScale, and Riley Phillips - Enterprise Solution Engineer at Matillion to sha...

Dave Mariani - CTO, AtScale; Riley Phillips - Enterprise Solution Engineer, Matillion
Latest Reply
amitabharora
Databricks Employee
  • 3 kudos

Looking forward.

Everton_Costa
by New Contributor II
  • 2247 Views
  • 2 replies
  • 1 kudos
Latest Reply
Cami
Contributor III
  • 1 kudos

I hope it helps: SELECT DATEADD(DAY, rnk - 1, '{{StartDate}}') FROM ( WITH lv0(c) AS ( SELECT 1 AS c UNION ALL SELECT 1 ), lv1 AS ( SELECT t1.c FROM lv0 t1 CROSS JOIN lv0 t2 ), lv2 AS ( SELECT t1....
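
The visible part of the query above adds `rnk - 1` days to a start date for each generated row number. As a rough illustration of that idea only (the rest of the SQL is truncated, so how the row count is bounded is not known), here is a minimal Python sketch. The helper name `date_sequence` and the sample start date are hypothetical.

```python
from datetime import date, timedelta


def date_sequence(start, days):
    """Return `days` consecutive dates beginning at `start`."""
    # The offset i plays the role of (rnk - 1) in DATEADD(DAY, rnk - 1, StartDate).
    return [start + timedelta(days=i) for i in range(days)]


# Hypothetical start date, chosen only for illustration.
dates = date_sequence(date(2023, 2, 27), 4)
print([d.isoformat() for d in dates])
# prints ['2023-02-27', '2023-02-28', '2023-03-01', '2023-03-02']
```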

1 More Replies
JacintoArias
by New Contributor III
  • 8706 Views
  • 5 replies
  • 1 kudos

Spark predicate pushdown on parquet files when using limit

Hi, while developing an ETL for a large dataset, I want to get a sample of the top rows to check that my pipeline "just runs", so I add a limit clause when reading the dataset. I'm surprised to see that instead of creating a single task as in a sho...

Latest Reply
JacekLaskowski
New Contributor III
  • 1 kudos

It's been a while since the question was asked, and in the meantime Delta Lake 2.2.0 hit the shelves with the exact feature the OP asked about, i.e. LIMIT pushdown: LIMIT pushdown into Delta scan. Improve the performance of queries containing LIMIT cl...

4 More Replies
rsamant07
by New Contributor III
  • 2425 Views
  • 3 replies
  • 2 kudos

Serverless SQL cluster giving error with Power BI

Power BI gives this error while accessing a Delta table using a serverless SQL endpoint: Error while using path /mnt/xyz/_delta_log/00000000000000000000.checkpoint for resolving path '/xyz/_delta_log/00000000000000000000.checkpoint' within mount at '/mn...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Rahul Samant, thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs. Please help us select the best solution by clicking on "Select As Best" if it does. Your feedback...

2 More Replies
maaaxx
by New Contributor III
  • 2065 Views
  • 3 replies
  • 4 kudos

A customized Python library in cluster to access ADLS via secret

Hello dear community, in our current project we would like to develop a customized Python library and deploy it to all of the clusters to manage access control. You might ask why not use a conventional way like external storage. Well, we do not ...

Latest Reply
Anonymous
Not applicable
  • 4 kudos

Hi @Yuan Gao, hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so we c...

2 More Replies
andrcami1990
by New Contributor II
  • 7864 Views
  • 2 replies
  • 2 kudos

Resolved! Connect GraphQL to Databricks

Hi, I am new to Databricks. However, I need to expose data in the Delta Lake directly to GraphQL so that it can be queried by several applications. Is there a connector or something similar for GraphQL that works with Databricks?

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Andrew Camilleri, thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs. Please help us select the best solution by clicking on "Select As Best" if it does. Your feed...

1 More Replies
maxutil
by New Contributor II
  • 7564 Views
  • 2 replies
  • 3 kudos

Resolved! SQL select string and turn it into a decimal

select col as original, col::double as val_double, col::float as val_float, col::decimal(10,4) as val_decimal, to_number(col, '99999.99999') as val_tonum from int_fx_conversion_rate; The original value of col is a string such as '1...
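
For readers who want to reason about what a cast to decimal(10,4) does to a numeric string, here is a minimal pure-Python sketch using the standard `decimal` module. It is an approximation for illustration, not Spark's implementation: the helper name `to_decimal`, the half-up rounding, the choice to return `None` on overflow or bad input, and the sample strings are all assumptions, since the original value in the post is truncated.

```python
from decimal import Decimal, InvalidOperation, ROUND_HALF_UP


def to_decimal(text, precision=10, scale=4):
    """Parse a numeric string into a fixed-scale Decimal, or return None."""
    try:
        value = Decimal(text.strip())
    except (InvalidOperation, AttributeError):
        return None  # not a number (or not a string at all)
    if not value.is_finite():
        return None  # reject NaN and Infinity
    quantum = Decimal(1).scaleb(-scale)  # scale=4 gives 0.0001
    try:
        rounded = value.quantize(quantum, rounding=ROUND_HALF_UP)
    except InvalidOperation:
        return None  # value too large for the decimal context
    # Total digits must fit the declared precision (assumed overflow rule).
    if len(rounded.as_tuple().digits) > precision:
        return None
    return rounded


# Hypothetical sample inputs, chosen only for illustration.
print(to_decimal("1.23456789"))  # prints 1.2346
print(to_decimal("1234567.5"))   # prints None (needs 11 digits)
```

Unlike a cast to double or float, the result keeps exactly four decimal places and does not pick up binary floating-point error.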

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Hi @Chris Chung, thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs. Please help us select the best solution by clicking on "Select As Best" if it does. Your feedback ...

1 More Replies
JaiT
by New Contributor II
  • 2328 Views
  • 2 replies
  • 2 kudos

Resolved! Databricks Workspace Environment

Hi, I am new to Databricks and have started learning about it. I wanted to know if I can use the Databricks workspace without the three cloud providers, i.e. AWS, Azure, and GCP. If yes, then how?

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Jai Chitkara, thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs. Please help us select the best solution by clicking on "Select As Best" if it does. Your feedback...

1 More Replies
Alyayman
by Contributor
  • 2608 Views
  • 3 replies
  • 2 kudos
Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Aly Ayman, hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so we ...

2 More Replies
Manuchito
by New Contributor
  • 2124 Views
  • 2 replies
  • 1 kudos

Resolved! Data Engineering with Databricks V2 not available in Partner

I cannot access the course anymore; it shows that it's under maintenance. How long will this last? Is there any way to access its videos for the Data Engineer Associate part?

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Juan Manuel Moviglia, hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tel...

1 More Replies
YSF
by New Contributor III
  • 16175 Views
  • 2 replies
  • 3 kudos

Resolved! How do I use the Python Logging Module in a Repo?

I have a repo that has Python files that use the built-in logging module. Additionally, in some of the notebooks of the repo, I want to use logging.debug()/logging.info() instead of print statements everywhere. However, when I use the root logger or cr...
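
One common reason debug and info messages go missing is that `logging.basicConfig` does nothing when the root logger already has handlers, and the default effective level is WARNING. Whether that is what happens in this poster's notebook is an assumption, since the post is truncated. The sketch below sidesteps the root logger by configuring a named logger with its own handler; the helper name `get_notebook_logger` and the logger name `my_repo` are hypothetical.

```python
import logging
import sys


def get_notebook_logger(name="my_repo", level=logging.DEBUG):
    """Return a named logger that writes to stdout at the given level."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    # Do not hand records to the root logger, whose handlers and level
    # may have been set up by the hosting environment.
    logger.propagate = False
    # Attach the handler only once so re-running a cell does not
    # duplicate every message.
    if not logger.handlers:
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger


log = get_notebook_logger()
log.debug("debug messages are emitted")
log.info("info messages are emitted too")
```

Modules in the repo can call `logging.getLogger("my_repo.some_module")` and leave propagation on, so their records reach the handler configured here.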

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Hi @Yusuf Khan, hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so we...

1 More Replies
uzairm
by New Contributor III
  • 17544 Views
  • 2 replies
  • 2 kudos

Resolved! ThreadPoolExecutor in Databricks

I am using a ThreadPoolExecutor and running notebooks in parallel. However, these parallel notebooks are not using executors at all, and all the load is going to the driver node, resulting in the driver node running out of memory and eventual...
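
For context, threads created by `ThreadPoolExecutor` all live in the one Python process that starts them, so any plain Python work they do stays on that machine. Only Spark operations issued from those threads are distributed. The sketch below shows the general fan-out pattern with a bounded worker count. `run_notebook` is a hypothetical stand-in for whatever call launches a notebook, and the paths are made up.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_notebook(path):
    """Hypothetical stand-in for the real notebook-launching call."""
    return f"finished {path}"


def run_all(paths, max_workers=4):
    """Run every path through run_notebook in parallel, collecting results."""
    results = {}
    # A small max_workers caps how many tasks share this process's memory.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(run_notebook, p): p for p in paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                results[path] = future.result()
            except Exception as exc:
                # Record the failure instead of aborting the other tasks.
                results[path] = f"failed: {exc}"
    return results


# Made-up notebook paths, for illustration only.
results = run_all(["/Repos/demo/nb_a", "/Repos/demo/nb_b", "/Repos/demo/nb_c"],
                  max_workers=2)
```

Keeping `max_workers` low and moving heavy per-row work into Spark operations rather than local Python code are the usual ways to relieve memory pressure on the node that hosts the threads.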

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @uzair mustafa, thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs. Please help us select the best solution by clicking on "Select As Best" if it does. Your feedbac...

1 More Replies
