Data Engineering

Forum Posts

nagini_sitarama
by New Contributor III
  • 1063 Views
  • 3 replies
  • 2 kudos

Error while optimizing the table: failure of InSet.sql for UTF8String collection

The table has 1,125,089 rows for the October data, so I am optimizing it: OPTIMIZE table WHERE batchday >= "2022-10-01" AND batchday <= "2022-10-31". I am getting an error: GC overhead limit exceeded at org.apache.spark.unsafe.types.UTF8St...
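
A minimal sketch of the command from the post, assuming a Delta table named events partitioned by batchday (the table name is hypothetical; batchday is taken from the post). OPTIMIZE ... WHERE predicates must reference partition columns, and a GC overhead error during compaction often points at an undersized driver:

    # Hypothetical table name; batchday is the partition column from the post.
    spark.sql("""
        OPTIMIZE events
        WHERE batchday >= '2022-10-01' AND batchday <= '2022-10-31'
    """)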

Latest Reply
Priyanka_Biswas
Valued Contributor
  • 2 kudos

Hi @Nagini Sitaraman, to understand the issue better I would like to get some more information. Does the error occur on the driver side or the executor side? Can you please share the full error stack trace? You may need to check the Spark UI to find wher...

2 More Replies
Aviral-Bhardwaj
by Esteemed Contributor III
  • 3756 Views
  • 2 replies
  • 13 kudos

Understanding Rename in Databricks

Now there are multiple ways to rename Spark DataFrame columns or expressions. We can rename columns or expressions using alias as part of select. We can add or rename columns or expressions using withColumn on top of t...
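
Since the post's examples are truncated above, here is a minimal sketch of the three approaches it names, on a throwaway DataFrame (column names are illustrative):

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([(1, "a")], ["id", "val"])

    # 1. alias as part of select
    renamed1 = df.select(col("id").alias("user_id"), "val")
    # 2. withColumn adds the renamed column; drop the old one afterwards
    renamed2 = df.withColumn("user_id", col("id")).drop("id")
    # 3. withColumnRenamed renames in place
    renamed3 = df.withColumnRenamed("id", "user_id")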

Latest Reply
Ajay-Pandey
Esteemed Contributor III
  • 13 kudos

Very informative, thanks for sharing.

1 More Replies
AlexDavies
by Contributor
  • 1165 Views
  • 2 replies
  • 2 kudos

Issue connecting to SQL warehouse spark thrift server

We have a library that allows .NET applications to talk to Databricks clusters (https://github.com/clearbank/SparkSqlClient). This communicates with the clusters over the Spark thrift server. Although this works great for clusters in the "data scienc...
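
For anyone hitting the same wall, a hedged note: SQL warehouses are also reachable through the official databricks-sql-connector Python package (pip install databricks-sql-connector), which wraps the same endpoint without a hand-rolled Thrift client. The hostname, HTTP path, and token below are placeholders:

    from databricks import sql

    # Placeholder connection details; copy the real values from the
    # warehouse's "Connection details" tab.
    with sql.connect(
        server_hostname="adb-1234567890123456.7.azuredatabricks.net",
        http_path="/sql/1.0/warehouses/abc123",
        access_token="dapi...",
    ) as conn:
        with conn.cursor() as cursor:
            cursor.execute("SELECT 1")
            print(cursor.fetchall())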

Latest Reply
AlexDavies
Contributor
  • 2 kudos

I have tried those connection details, however they give me 400 errors when trying to connect directly using the Hive thrift server contract (https://github.com/apache/hive/blob/master/service-rpc/if/TCLIService.thrift). I do not get the issues whe...

1 More Replies
cristianc
by Contributor
  • 794 Views
  • 2 replies
  • 1 kudos

Unexpected workspace setup dialog in the account

Greetings, recently we were doing cleanups in AWS and removed some Databricks-related resources that were used only once for setting up our workspace and have not been used since. Since there is no plan to create any other workspaces, the decision was t...

[Screenshot: unexpected workspace create dialog]
Latest Reply
cristianc
Contributor
  • 1 kudos

The resources that were cleaned up were just the ones used for the initial setup of the workspace; everything else important for day-to-day operation is in place and we are actively using the workspace, therefore there is no plan to de...

1 More Replies
ftc
by New Contributor II
  • 543 Views
  • 1 reply
  • 2 kudos

Can Databricks Certified Data Engineer Professional exam questions be short and easy to understand?

Most of the questions on the Databricks Certified Data Engineer Professional exam are too long for those with English as a second language. There is not enough time to read through the questions, and they are sometimes hard to comprehend.

Latest Reply
eimis_pacheco
Contributor
  • 2 kudos

I strongly agree with you. There is not a Spanish version of this exam. These exams are long even for native speakers; just imagine for people with English as a second language. For instance, since Amazon does not have a Spanish version, they took this...

jonathan-dufaul
by Valued Contributor
  • 1367 Views
  • 4 replies
  • 5 kudos

Why is writing to MS SQL Server 12.0 so slow directly from Spark, but nearly instant when I write to a CSV and read it back?

I have a dataframe that inexplicably takes forever to write to an MS SQL Server, even though other dataframes, even much larger ones, write nearly instantly. I'm using this code: my_dataframe.write.format("jdbc").option("url", sqlsUrl).optio...
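
Since the snippet is cut off above, here is a hedged sketch of the same JDBC write with Spark's standard batching options filled in (my_dataframe and sqlsUrl come from the post; dbtable, user, and password values are placeholders). Per-row inserts over a single connection are a common cause of the slowness described:

    # batchsize groups rows per INSERT round trip; numPartitions caps the
    # number of parallel JDBC connections opened by the write.
    (my_dataframe.write
        .format("jdbc")
        .option("url", sqlsUrl)
        .option("dbtable", "dbo.my_table")
        .option("user", user)
        .option("password", password)
        .option("batchsize", 10000)
        .option("numPartitions", 8)
        .mode("append")
        .save())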

Latest Reply
yueyue_tang
New Contributor II
  • 5 kudos

I am facing the same problem and don't know how to write a DataFrame to MS SQL Server quickly.

3 More Replies
BF
by New Contributor II
  • 3065 Views
  • 3 replies
  • 2 kudos

Resolved! PySpark - How do I convert a date/timestamp of format like /Date(1593786688000+0200)/ in PySpark?

Hi all, I have a dataframe whose CreateDate column has values in this format: /Date(1593786688000+0200)/, /Date(1446032157000+0100)/, /Date(1533904635000+0200)/, /Date(1447839805000+0100)/, /Date(1589451249000+0200)/, and I want to convert that format to date/tim...

Latest Reply
Chaitanya_Raju
Honored Contributor
  • 2 kudos

Hi @Bruno Franco, can you please try the code below? Hope it works for you: from pyspark.sql.functions import from_unixtime; from pyspark.sql import functions as F; final_df = df_src.withColumn("Final_Timestamp", from_unixtime((F.regexp_extract(col("Cr...
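
The reply is truncated above, so here is a hedged completion of the same idea: extract the epoch milliseconds from the /Date(...)/ wrapper, divide by 1000, and hand the result to from_unixtime (column names follow the thread; the timezone offset is discarded in this sketch):

    from pyspark.sql import functions as F

    final_df = df_src.withColumn(
        "Final_Timestamp",
        F.from_unixtime(
            # group 1 captures the epoch milliseconds; the +0200/+0100 offset is ignored
            (F.regexp_extract(F.col("CreateDate"), r"/Date\((\d+)[+-]\d+\)/", 1).cast("long") / 1000).cast("long")
        ).cast("timestamp"),
    )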

2 More Replies
whh99
by New Contributor II
  • 946 Views
  • 3 replies
  • 1 kudos

Given user id, what API can we use to find out which cluster the user is connected to?

I want to know which cluster a user is connected to in Databricks. It would be great if we could also get the duration for which the user has been connected.

Latest Reply
Kaniz
Community Manager
  • 1 kudos

Hi @Hui Hui Wong, we haven't heard from you since the last response from @Daniel Sahal, and I was checking back to see if his suggestions helped you. Otherwise, if you have a solution, please share it with the community, as...

2 More Replies
SreedharVengala
by New Contributor III
  • 13199 Views
  • 18 replies
  • 9 kudos

PGP Encryption / Decryption in Databricks

Is there a way to decrypt/encrypt blob files in Databricks using a key stored in Key Vault? What libraries need to be used? Any code snippets? Links?
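
One possible direction, offered as a hedged sketch rather than a recommendation: the third-party python-gnupg wrapper (pip install python-gnupg) can do the PGP work, with the key material pulled from an Azure Key Vault-backed secret scope. The scope, secret names, and paths below are all placeholders:

    import gnupg

    gpg = gnupg.GPG(gnupghome="/tmp/gnupg")

    # Hypothetical Key Vault-backed secret scope and secret names.
    private_key = dbutils.secrets.get(scope="kv-scope", key="pgp-private-key")
    passphrase = dbutils.secrets.get(scope="kv-scope", key="pgp-passphrase")
    gpg.import_keys(private_key)

    # Decrypt a blob file mounted under /dbfs (placeholder paths).
    with open("/dbfs/mnt/blob/data.csv.pgp", "rb") as f:
        result = gpg.decrypt_file(f, passphrase=passphrase, output="/dbfs/mnt/blob/data.csv")
    assert result.ok, result.status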

Latest Reply
Anonymous
Not applicable
  • 9 kudos

I have a similar requirement and am exploring various options to encrypt/decrypt ADLS data using Azure Databricks PySpark. Please share the list of available options.

17 More Replies
190809
by Contributor
  • 412 Views
  • 1 reply
  • 1 kudos

What are the requirements in order for the event log to collect backlog metrics?

I am trying to use the event log to collect metrics on 'flow_progress' under the 'event_type' field. In the docs it suggests that this information may not be collected depending on the data source and runtime used (see screenshot). Can anyone let ...
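
For reference, a hedged sketch of how the DLT event log is typically queried for these events; the storage path is a placeholder for the pipeline's configured storage location:

    # The event log is a Delta table under the pipeline's storage location.
    event_log = spark.read.format("delta").load("dbfs:/pipelines/<pipeline-id>/system/events")

    # flow_progress events carry the progress/backlog metrics, when emitted.
    flow_progress = event_log.filter("event_type = 'flow_progress'")
    flow_progress.select("timestamp", "details").show(truncate=False)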

Latest Reply
User16539034020
Contributor II
  • 1 kudos

Thanks for contacting Databricks Support! I understand that you're looking for information on unsupported data source types and runtimes for the backlog metrics. Unfortunately, we have not documented that information yet. It's possible that som...

Ak3
by New Contributor III
  • 1717 Views
  • 5 replies
  • 6 kudos

Databricks ADLS vs Azure SQL: which is better for data warehousing, and why?

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 6 kudos

Databricks is the data lake / lakehouse and Azure SQL is the database.

4 More Replies
hanish
by New Contributor II
  • 1377 Views
  • 3 replies
  • 2 kudos

Job cluster support in jobs/runs/submit API

We are using the jobs/runs/submit API of Databricks to create and trigger a one-time run with new_cluster and existing_cluster configurations. We would like to check whether there is a provision to pass "job_clusters" in this API to reuse the same cluster across...
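
For context, a hedged sketch of what the endpoint does accept today: each task carries its own new_cluster (or existing_cluster_id) per submitted run. The workspace host, token, and notebook path are placeholders:

    import requests

    payload = {
        "run_name": "one-time-run",
        "tasks": [
            {
                "task_key": "task_a",
                # A cluster defined per task; job_clusters is not accepted
                # here (see the reply below).
                "new_cluster": {
                    "spark_version": "11.3.x-scala2.12",
                    "node_type_id": "i3.xlarge",
                    "num_workers": 2,
                },
                "notebook_task": {"notebook_path": "/Repos/demo/notebook_a"},
            }
        ],
    }
    resp = requests.post(
        "https://<workspace-host>/api/2.1/jobs/runs/submit",
        headers={"Authorization": "Bearer <token>"},
        json=payload,
    )
    print(resp.json())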

Latest Reply
Anonymous
Not applicable
  • 2 kudos

@Hanish Bansal, a shared job cluster for the jobs/runs/submit API is not supported at the moment.

2 More Replies
horatiug
by New Contributor III
  • 1569 Views
  • 5 replies
  • 1 kudos

Databricks workspace with custom VPC using terraform in Google Cloud

I am working on Google Cloud and want to create a Databricks workspace with a custom VPC using Terraform. Is that supported? If yes, is it similar to the AWS approach? Thank you, Horatiu

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @horatiu guja, GCP workspace provisioning using Terraform is in public preview now. Please refer to the doc below for the steps: https://registry.terraform.io/providers/databricks/databricks/latest/docs/guides/gcp-workspace

4 More Replies
johnb1
by New Contributor III
  • 2837 Views
  • 4 replies
  • 0 kudos

SELECT from a table saved under a path

Hi! I saved a dataframe as a Delta table with the following syntax: (test_df.write.format("delta").mode("overwrite").save(output_path)). How can I issue a SELECT statement on the table? What do I need to insert into [table_name] below? SELECT ...

Latest Reply
Ajay-Pandey
Esteemed Contributor III
  • 0 kudos

Hi @John B, there are two ways to access your Delta table. Either query the path directly: SELECT * FROM delta.`your_delta_table_path`, or register it as a table: df.write.format("delta").mode("overwrite").option("path", "your_path").saveAsTable("table_name"). Now you can use your select query: SELECT * FROM [table_...
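
Putting the reply together as a runnable sketch (test_df comes from the question; output_path and the table name are placeholders):

    output_path = "/mnt/demo/test_table"

    # Option 1: query the Delta files at the path directly, no registration needed.
    spark.sql(f"SELECT * FROM delta.`{output_path}`").show()

    # Option 2: register the same path as a named table, then query it by name.
    test_df.write.format("delta").mode("overwrite").option("path", output_path).saveAsTable("test_table")
    spark.sql("SELECT * FROM test_table").show()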

3 More Replies
xiaochong
by New Contributor III
  • 513 Views
  • 1 reply
  • 2 kudos

Is Delta Live Tables planned to be open source in the future?

Latest Reply
Priyanka_Biswas
Valued Contributor
  • 2 kudos

Hello there @G Z, I would say we have a history of open-sourcing our biggest innovations, but there's no concrete timeline for DLT. It's built on the open APIs of Spark and Delta, so the most important parts (your transformation logic and your data) ...
