Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

ftc
by New Contributor II
  • 1741 Views
  • 1 reply
  • 2 kudos

Can Databricks Certified Data Engineer Professional exam questions be short and easy to understand?

Most questions on the Databricks Certified Data Engineer Professional exam are too long for those with English as a second language. There is not enough time to read through the questions, and they are sometimes hard to comprehend.

Latest Reply
eimis_pacheco
Contributor
  • 2 kudos

I strongly agree with you. There is no Spanish version of this exam. These exams are long even for native speakers; just imagine for people with English as a second language. For instance, since Amazon does not have a Spanish version, they took this...

BF
by Databricks Partner
  • 9403 Views
  • 3 replies
  • 2 kudos

Resolved! Pyspark - How do I convert date/timestamp of format like /Date(1593786688000+0200)/ in pyspark?

Hi all, I've a dataframe with a CreateDate column in this format: /Date(1593786688000+0200)/, /Date(1446032157000+0100)/, /Date(1533904635000+0200)/, /Date(1447839805000+0100)/, /Date(1589451249000+0200)/, and I want to convert that format to date/tim...

Latest Reply
Chaitanya_Raju
Honored Contributor
  • 2 kudos

Hi @Bruno Franco​, can you please try the code below, hope it might work for you: from pyspark.sql.functions import from_unixtime from pyspark.sql import functions as F final_df = df_src.withColumn("Final_Timestamp", from_unixtime((F.regexp_extract(col("Cr...

2 More Replies
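The reply's PySpark snippet is cut off above, but the underlying parsing logic is simple: pull the epoch milliseconds and the UTC offset out of the `/Date(…)/` wrapper. A minimal plain-Python sketch of that logic (function name is mine, not from the thread):

```python
import re
from datetime import datetime, timezone, timedelta

def parse_ms_date(s):
    """Parse a "/Date(1593786688000+0200)/" string into an aware datetime."""
    m = re.fullmatch(r"/Date\((\d+)([+-]\d{4})\)/", s)
    if m is None:
        raise ValueError(f"unexpected format: {s!r}")
    ms, off = m.groups()
    hours = int(off[:3])             # "+02" -> 2
    minutes = int(off[0] + off[3:])  # keep the sign on the minutes part
    tz = timezone(timedelta(hours=hours, minutes=minutes))
    return datetime.fromtimestamp(int(ms) / 1000, tz)

print(parse_ms_date("/Date(1593786688000+0200)/"))  # 2020-07-03 16:31:28+02:00
```

In PySpark the equivalent is typically `from_unixtime(regexp_extract(...) / 1000)` as in the reply, since `from_unixtime` expects seconds, not milliseconds.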
whh99
by New Contributor II
  • 2875 Views
  • 2 replies
  • 1 kudos

Given user id, what API can we use to find out which cluster the user is connected to?

I want to know which cluster a user is connected to in Databricks. It would be great if we could also get the duration for which the user has been connected.

Latest Reply
daniel_sahal
Databricks MVP
  • 1 kudos

You can track user activity by enabling audit logs. I'm not sure which cloud provider you're using, but e.g. for Azure you can find a manual here: https://learn.microsoft.com/en-us/azure/databricks/administration-guide/account-settings/audit-logs

1 More Replies
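Besides audit logs, the Clusters API (`GET /api/2.0/clusters/list`) reports each cluster's `creator_user_name`, which partially answers the question. A sketch of filtering that response for one user; the sample payload shape is an assumption for illustration, and the HTTP call itself is omitted. Note that "creator" is not the same as "currently attached user" — audit logs remain the authoritative source for that:

```python
def clusters_for_user(clusters_response, user_name):
    """Return (cluster_id, state) pairs for clusters created by user_name.

    clusters_response: parsed JSON from GET /api/2.0/clusters/list,
    expected to contain a "clusters" list of dicts.
    """
    return [
        (c["cluster_id"], c.get("state", "UNKNOWN"))
        for c in clusters_response.get("clusters", [])
        if c.get("creator_user_name") == user_name
    ]

# Hypothetical sample payload for illustration
sample = {"clusters": [
    {"cluster_id": "0101-abc", "creator_user_name": "whh99@example.com", "state": "RUNNING"},
    {"cluster_id": "0202-def", "creator_user_name": "other@example.com", "state": "TERMINATED"},
]}
print(clusters_for_user(sample, "whh99@example.com"))
```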
190809
by Contributor
  • 1481 Views
  • 1 reply
  • 1 kudos

What are the requirements in order for the event log to collect backlog metrics?

I am trying to use the event log to collect metrics on 'flow_progress' under the 'event_type' field. In the docs it suggests that this information may not be collected depending on the data source and runtime used (see screenshot). Can anyone let ...

[Screenshot attached: 2022-12-07 at 11.30.43]
Latest Reply
User16539034020
Databricks Employee
  • 1 kudos

Thanks for contacting Databricks Support! I understand that you're looking for information on unsupported data source types and runtimes for the backlog metrics. Unfortunately, we currently have not documented that information. It's possible that som...

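For context, once the pipeline's event log is readable as rows, flow_progress entries can be filtered like any other records. A plain-Python sketch of pulling backlog metrics out of such rows; the nested `details.flow_progress.metrics.backlog_bytes` shape follows the documented event schema, but treat the exact keys as an assumption:

```python
def flow_progress_backlog(events):
    """Collect (flow_name, backlog_bytes) from DLT event-log rows.

    events: event-log rows as plain dicts with an "event_type" field and a
    nested "details" dict; rows without backlog metrics are skipped.
    """
    out = []
    for e in events:
        if e.get("event_type") != "flow_progress":
            continue
        metrics = e.get("details", {}).get("flow_progress", {}).get("metrics", {})
        if "backlog_bytes" in metrics:
            out.append((e.get("flow_name"), metrics["backlog_bytes"]))
    return out

# Hypothetical rows for illustration
rows = [
    {"event_type": "flow_progress", "flow_name": "bronze_orders",
     "details": {"flow_progress": {"metrics": {"backlog_bytes": 1048576}}}},
    {"event_type": "update_progress", "flow_name": None, "details": {}},
]
```

In a real pipeline the rows would come from reading the event log Delta table; whether backlog metrics appear at all depends on the source and runtime, which is exactly what the post asks about.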
Ak3
by New Contributor III
  • 6268 Views
  • 5 replies
  • 7 kudos

Databricks ADLS vs Azure SQL: which is better for data warehousing, and why?

Databricks ADLS vs Azure SQL: which is better for data warehousing, and why?

Latest Reply
Hubert-Dudek
Databricks MVP
  • 7 kudos

Databricks is the data lake / lakehouse and Azure SQL is the database.

4 More Replies
horatiug
by New Contributor III
  • 4540 Views
  • 4 replies
  • 1 kudos

Databricks workspace with custom VPC using terraform in Google Cloud

I am working on Google Cloud and want to create a Databricks workspace with a custom VPC using Terraform. Is that supported? If yes, is it similar to the AWS approach? Thank you, Horatiu

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @horatiu guja​, GCP workspace provisioning using Terraform is in public preview now. Please refer to the doc below for the steps: https://registry.terraform.io/providers/databricks/databricks/latest/docs/guides/gcp-workspace

3 More Replies
johnb1
by Contributor
  • 9129 Views
  • 4 replies
  • 0 kudos

SELECT from table saved under path

Hi! I saved a dataframe as a Delta table with the following syntax: (test_df.write.format("delta").mode("overwrite").save(output_path)) How can I issue a SELECT statement on the table? What do I need to insert into [table_name] below? SELECT ...

Latest Reply
Ajay-Pandey
Databricks MVP
  • 0 kudos

Hi @John B​, there are two ways to access your Delta table: SELECT * FROM delta.`your_delta_table_path` or df.write.format("delta").mode("overwrite").option("path", "your_path").saveAsTable("table_name"). Now you can use your select query: SELECT * FROM [table_...

3 More Replies
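The path-based form in the reply is just a SQL string, so the quoting can be sketched with a tiny helper; the paths here are placeholders, and the notebook calls are shown only as comments:

```python
def delta_path_query(path):
    # Query a Delta table directly by storage path (no metastore entry needed):
    # SELECT * FROM delta.`<path>`
    return f"SELECT * FROM delta.`{path}`"

# In a notebook you would then run (sketch, not executed here):
# df = spark.sql(delta_path_query("/mnt/out/test_table"))
# or read the same data as a DataFrame:
# df = spark.read.format("delta").load("/mnt/out/test_table")
print(delta_path_query("/mnt/out/test_table"))
```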
xiaochong
by New Contributor III
  • 1790 Views
  • 1 reply
  • 2 kudos

Is Delta Live Tables planned to be open source in the future?

Is Delta Live Tables planned to be open source in the future?

Latest Reply
Priyanka_Biswas
Databricks Employee
  • 2 kudos

Hello there @G Z​. I would say: "we have a history of open-sourcing our biggest innovations, but there's no concrete timeline for DLT. It's built on the open APIs of Spark and Delta, so the most important parts (your transformation logic and your data) ...

joakon
by New Contributor III
  • 4477 Views
  • 4 replies
  • 4 kudos

Resolved! Databricks - Workflow- Jobs- Script to automate

Hi - I have created a Databricks job under Workflows; it's running fine without any issues. I would like to promote this job to other workspaces using a script. Is there a way to script the job definition and deploy it across multiple workspaces? I ...

Latest Reply
joakon
New Contributor III
  • 4 kudos

thank you @Landan George​ 

3 More Replies
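One common approach to promoting a job between workspaces is to export its definition with the Jobs 2.1 REST API and re-create it in the target: `GET /api/2.1/jobs/get` returns a `settings` object that `POST /api/2.1/jobs/create` accepts. A sketch of the payload handling, testable without a workspace (the round-trip calls are shown only as comments, and hosts/tokens are placeholders):

```python
def job_create_payload(jobs_get_response):
    """Turn a GET /api/2.1/jobs/get response into a jobs/create body.

    The get response wraps the definition as
    {"job_id": ..., "created_time": ..., "settings": {...}};
    the create endpoint wants only the settings object.
    """
    return dict(jobs_get_response["settings"])

# Hypothetical round-trip between workspaces (requests calls omitted):
# src = requests.get(f"{src_host}/api/2.1/jobs/get",
#                    params={"job_id": 123}, headers=auth).json()
# requests.post(f"{dst_host}/api/2.1/jobs/create",
#               json=job_create_payload(src), headers=auth)
```

Workspace-specific references inside the settings (e.g. existing cluster IDs) may still need adjusting per target workspace.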
Dbks_Community
by New Contributor II
  • 2827 Views
  • 2 replies
  • 0 kudos

Cross region Databricks to SQL Connection

We are trying to connect an Azure Databricks cluster to an Azure SQL database, but the firewalls at the SQL level are causing an issue. Whitelisting the Databricks subnet is not an option here, as the two resources are in different Azure regions. Is there a secure way ...

Latest Reply
Cedric
Databricks Employee
  • 0 kudos

Hi @Timir Ranjan​, have you tried looking into private endpoints? This allows you to expose your Azure SQL database over the Azure backbone and is supported cross-region: https://learn.microsoft.com/en-us/azure/private-link/private-endpoint-overview P...

1 More Replies
StevenW
by New Contributor III
  • 11666 Views
  • 10 replies
  • 0 kudos

Resolved! Large MERGE Statements - 500+ lines of code!

I'm new to Databricks (not new to DBs - a 10+ year DB developer). How do you generate a MERGE statement in Databricks? Manually maintaining 500+ or 1000+ lines in a MERGE statement doesn't make much sense. Working with large tables of between...

Latest Reply
Hubert-Dudek
Databricks MVP
  • 0 kudos

In my opinion, when possible, the MERGE statement should be on the primary key. If that's not possible, you can create your own unique key (by concatenating some fields and optionally hashing them) and then use it in the merge logic.

9 More Replies
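Large MERGE statements are usually generated rather than hand-written. A hedged sketch of such a generator (table and column names are placeholders; in practice the column lists could come from `spark.table(target).columns`):

```python
def build_merge(target, source, key_cols, update_cols):
    """Generate a Delta MERGE statement from column lists.

    key_cols drive the ON condition; update_cols are set on match, and all
    columns (keys + updates) are inserted on no-match.
    """
    on = " AND ".join(f"t.{c} = s.{c}" for c in key_cols)
    sets = ", ".join(f"t.{c} = s.{c}" for c in update_cols)
    all_cols = list(key_cols) + list(update_cols)
    cols = ", ".join(all_cols)
    vals = ", ".join(f"s.{c}" for c in all_cols)
    return (
        f"MERGE INTO {target} t USING {source} s\n"
        f"ON {on}\n"
        f"WHEN MATCHED THEN UPDATE SET {sets}\n"
        f"WHEN NOT MATCHED THEN INSERT ({cols}) VALUES ({vals})"
    )

print(build_merge("dim_customer", "stg_customer", ["customer_id"], ["name", "email"]))
```

This keeps the 500+ lines out of version control: only the column lists (or the schema query that produces them) need maintaining.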
stephansmit
by New Contributor III
  • 6347 Views
  • 3 replies
  • 6 kudos

Why is my lineage extraction not showing up in the Unity Catalog

I'm trying to get the lineage graph to work in Unity Catalog; however, nothing appears even though I followed the docs. I did the following steps: 1. Created a Unity metastore and attached the workspace to that metastore. 2. Created a Single user ...

Latest Reply
L_Favre
New Contributor II
  • 6 kudos

@Stephan Smit​ We finally got a solution from level 3 support (Databricks support). You may check your firewall logs. On our side, we had to open communication to the "Event Hub endpoint". The destination depends on your workspace region: Azure Databricks r...

2 More Replies
Anonymous
by Not applicable
  • 2205 Views
  • 1 reply
  • 0 kudos

Monitoring

Are there any event streams that are or could be exposed in AWS (such as CloudWatch EventBridge events or SNS messages)? In particular, I'm interested in events that detail jobs being run. The use case here would be for monitoring jobs from our web app...

Latest Reply
jessykoo32
New Contributor II
  • 0 kudos

Yes, there are several event streams in AWS that can be used to monitor jobs being run. CloudWatch Events: this service allows you to set up rules to automatically trigger actions in response to specific events in other AWS service...

Johan_Van_Noten
by New Contributor III
  • 24714 Views
  • 19 replies
  • 10 kudos

Resolved! Correlated column exception in SQL UDF when using UDF parameters.

Environment: Azure Databricks 10.1, including Spark 3.2.0. Scenario: I want to retrieve the average of a series of values between two timestamps, using a SQL UDF. The average is obviously just an example. In a real scenario, I would like to hide some additi...

Latest Reply
creastysomp
New Contributor II
  • 10 kudos

Thanks for your suggestion. The reason I want to do this in Spark SQL is that there is no underlying SQL Server.

18 More Replies