Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

weldermartins
by Honored Contributor
  • 23273 Views
  • 7 replies
  • 35 kudos

Resolved! pyspark - regexp_extract

Hello everyone, I'm writing a regular expression to extract only the numeric value from a string, but some of the values are negative. I can't work out the rule to capture the negative sign. Can you help me? from pyspark.sql.functions import regexp_extract fro...

Latest Reply
ErinArmistead
New Contributor II
  • 35 kudos

Have you found the answer? If you are a student in college or school searching for free essay examples online, you may want to visit https://writinguniverse.com/free-essay-examples/soccer/ where you will find a vast collection of free essa...

6 More Replies
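A minimal sketch of one way to capture the negative sign with regexp_extract; the sample strings and column names below are hypothetical, and the key point is that -? makes the minus optional inside the capture group:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import regexp_extract

    spark = SparkSession.builder.getOrCreate()

    # Hypothetical sample data: strings that embed a possibly negative number.
    df = spark.createDataFrame([("total: -42.5",), ("total: 17",)], ["raw"])

    # Group 1 returns e.g. "-42.5" and "17"; the optional -? keeps the sign.
    df = df.withColumn("amount", regexp_extract("raw", r"(-?\d+(?:\.\d+)?)", 1))
    df.show()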
Baumeister
by New Contributor II
  • 7469 Views
  • 2 replies
  • 0 kudos

Error when importing .dbc of a complete Workspace

I saved the content of an older Databricks Workspace by clicking the dropdown next to Workspace -> Export -> DBC Archive and saved it on my local machine. In a new Databricks Workspace, I now want to import that .DBC archive to restore the previous...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Sebastian K​: It looks like the error you are facing while importing the DBC archive could be due to a version incompatibility between the Databricks instance where you created the DBC archive and the one where you are trying to import it. Can you...

1 More Replies
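If the DBC import keeps failing, one workaround is to move notebooks as plain source rather than a DBC archive. A rough sketch, assuming the Workspace REST API 2.0 export/import endpoints; the host, token, and notebook path are placeholders:

    import requests

    host = "https://<workspace-host>"               # placeholder
    headers = {"Authorization": "Bearer <token>"}   # placeholder PAT

    # Export one notebook from the old workspace as base64-encoded source.
    exp = requests.get(f"{host}/api/2.0/workspace/export", headers=headers,
                       params={"path": "/Users/me/notebook", "format": "SOURCE"})
    content = exp.json()["content"]

    # Import it into the new workspace; SOURCE format needs an explicit language.
    requests.post(f"{host}/api/2.0/workspace/import", headers=headers,
                  json={"path": "/Users/me/notebook", "format": "SOURCE",
                        "language": "PYTHON", "content": content,
                        "overwrite": True})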
bluesky111
by New Contributor II
  • 2812 Views
  • 1 replies
  • 3 kudos

Resolved! I input the wrong schedule time for the exam; can it be rescheduled?

Hello, I was scheduled to take an exam today at 2:15 PM, but I mistakenly set the time to 2:15 AM. Could it be rescheduled? I already submitted a ticket to https://help.databricks.com/s/contact-us?ReqType=training but no reply yet...

Latest Reply
APadmanabhan
Databricks Employee
  • 3 kudos

Hello @heron halim​, if the exam time and date have already passed, we cannot help in that situation; we can only change the time/date of the exam if we are notified a minimum of 30 hours before the exam date/time. Test-takers must ensure they check t...

Harun
by Honored Contributor
  • 6444 Views
  • 5 replies
  • 6 kudos

How to load structured streaming data into a Delta table whose location is in ADLS Gen2

Hi All, I am working on streaming data processing. As an initial step I have read the data from Azure Event Hubs using readStream. Now I want to writeStream this into a Delta table. My requirement is that the data should be present in an external location (ADLS G...

Latest Reply
Anonymous
Not applicable
  • 6 kudos

There are a couple of ways to connect to ADLS Gen2. Please refer to the doc below. For instance, if you decide to go with the service principal method, you need to add the storage account configuration details below to the cluster or notebook. The same goes for storag...

4 More Replies
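A condensed sketch of the service principal configuration plus the external write; the storage account, container, credentials, and paths are placeholders, and df stands for the streaming DataFrame produced by readStream in the question:

    # Spark configs for ADLS Gen2 access via a service principal (placeholders).
    account = "mystorageaccount"
    spark.conf.set(f"fs.azure.account.auth.type.{account}.dfs.core.windows.net", "OAuth")
    spark.conf.set(f"fs.azure.account.oauth.provider.type.{account}.dfs.core.windows.net",
                   "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
    spark.conf.set(f"fs.azure.account.oauth2.client.id.{account}.dfs.core.windows.net", "<app-id>")
    spark.conf.set(f"fs.azure.account.oauth2.client.secret.{account}.dfs.core.windows.net", "<secret>")
    spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{account}.dfs.core.windows.net",
                   "https://login.microsoftonline.com/<tenant-id>/oauth2/token")

    # Write the Event Hubs stream to a Delta table at an external ADLS path.
    (df.writeStream
       .format("delta")
       .outputMode("append")
       .option("checkpointLocation", f"abfss://container@{account}.dfs.core.windows.net/_checkpoints/events")
       .start(f"abfss://container@{account}.dfs.core.windows.net/tables/events"))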
KVNARK
by Honored Contributor II
  • 4472 Views
  • 1 replies
  • 4 kudos

Resolved! Deploying global parameters from lower to higher env in ADF

How can we deploy global parameters from dev to higher environments in ADF? Could anyone throw some light on this? I'm using Git in DEV and deploying to PROD using an Azure CI/CD pipeline.

Latest Reply
Anonymous
Not applicable
  • 4 kudos

@KVNARK​: To deploy global parameters from dev to higher environments in Azure Data Factory (ADF), you can follow these steps: in your DEV environment, create the global parameters in ADF and save them; commit and push the changes to your Git reposi...

alvaro_databric
by New Contributor III
  • 3476 Views
  • 1 replies
  • 2 kudos

Resolved! Fastest Azure VM for Databricks Big Data workload

Hi All, it is well known that Azure provides a wide variety of VMs for Databricks, some of which provide powerful features such as Photon and Delta caching. I would like to ask the community which you think is the fastest cluster for performing Big...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

@Alvaro Moure​: The performance of a Databricks cluster for big data operations depends on many factors, such as the amount and structure of the data, the nature of the operations being performed, the configuration of the cluster, and the specific re...

Sujitha
by Databricks Employee
  • 7927 Views
  • 0 replies
  • 2 kudos

Weekly Release Notes Recap: here's a quick recap of the latest release notes updates from the past week. Databricks platform release notes, March 13 -...

Weekly Release Notes Recap: here's a quick recap of the latest release notes updates from the past week. Databricks platform release notes, March 13 - 17, 2023. Execute SQL cells in the notebook in parallel: you can now run SQL cells in Databricks noteboo...

Arunsundar
by New Contributor III
  • 4488 Views
  • 4 replies
  • 4 kudos

The possibility of determining the workload dynamically and spinning up a cluster based on the workload

Hi Team, good morning. I would like to understand whether it is possible to determine the workload automatically through code (data load from a file to a table, determining the file size, a kind of benchmark that we can check), based on which we can ...

Latest Reply
pvignesh92
Honored Contributor
  • 4 kudos

Hi @Arunsundar Muthumanickam​, when you say workload, I believe you might be handling various volumes of data between the Dev and Prod environments. If you are using a Databricks cluster and do not have much idea of how the volumes might turn out in differ...

3 More Replies
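A toy sketch of the kind of heuristic hinted at above: measure the input size with dbutils.fs.ls (available in Databricks notebooks), then map it to a worker count. The landing path and thresholds are purely illustrative:

    # Sum the sizes of the files directly under the input path (non-recursive).
    def total_size_gb(path):
        return sum(f.size for f in dbutils.fs.ls(path) if not f.name.endswith("/")) / 1e9

    size_gb = total_size_gb("dbfs:/mnt/raw/incoming/")  # hypothetical landing folder

    # Map the observed volume to a cluster size; feed this to the Jobs/Clusters API
    # or rely on autoscaling with min/max workers instead.
    num_workers = 2 if size_gb < 10 else 8 if size_gb < 100 else 16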
DipakBachhav
by New Contributor III
  • 17252 Views
  • 3 replies
  • 3 kudos

Resolved! Getting error Caused by: com.databricks.NotebookExecutionException: FAILED

I am trying to run the notebook below through Databricks but am getting the error below. I have tried updating the notebook timeout and the retry mechanism, but still no luck. NotebookData("/Users/mynotebook", 9900, retry=3) ] res = parallelNot...

Latest Reply
sujai_sparks
New Contributor III
  • 3 kudos

Hi @Dipak Bachhav​, not sure if you have fixed the issue, but here are a few things you can check: Is the path "/Users/mynotebook" correct? Maybe you are missing the dot at the beginning. Run the notebook using dbutils.notebook.run("/Users/mynotebook") ...

2 More Replies
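For completeness, a small retry wrapper around dbutils.notebook.run, using the path and timeout from the question; the retry count and the blanket exception handling are illustrative only:

    # Retries the child notebook a few times before giving up.
    def run_with_retry(path, timeout_seconds, max_retries=3):
        for attempt in range(max_retries):
            try:
                return dbutils.notebook.run(path, timeout_seconds)
            except Exception:
                if attempt == max_retries - 1:
                    raise

    res = run_with_retry("/Users/mynotebook", 9900)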
ossinova
by Contributor II
  • 2503 Views
  • 1 replies
  • 2 kudos

PIVOT on month and quarter

I want to simplify this query:
SELECT year(EntryDate) Year, AccountNumber,
  sum(CreditBase - DebitBase) FILTER(WHERE month(EntryDate) = 1) AS jan_total,
  sum(CreditBase - DebitBase) FILTER(WHERE month(EntryDate) = 2) AS feb_total,
  sum(CreditBase - Debi...

Latest Reply
Lakshay
Databricks Employee
  • 2 kudos

Hi @Oscar Dyremyhr​, PIVOT doesn't support two FOR clauses. You can PIVOT either on month or on quarter. https://docs.databricks.com/sql/language-manual/sql-ref-syntax-qry-select-pivot.html

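As a sketch of the single-FOR form, here is the monthly pivot expressed via spark.sql; the source table name entries is assumed from the truncated question, and only three months are shown:

    monthly = spark.sql("""
        SELECT * FROM (
            SELECT year(EntryDate) AS Year, AccountNumber,
                   month(EntryDate) AS Month, CreditBase - DebitBase AS Net
            FROM entries
        )
        PIVOT (
            SUM(Net) FOR Month IN (1 AS jan_total, 2 AS feb_total, 3 AS mar_total)
        )
    """)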
Dale_Ware
by New Contributor III
  • 5481 Views
  • 2 replies
  • 3 kudos

Resolved! How to query a table with slashes in the name.

I am trying to query a Snowflake table from a Databricks DataFrame, similar to the following example.
sql_query = "select * from Database.Schema.Table_/Name_/V"
sqlContext.sql(f"{sql_query}")
And I get an error like this. ParseException: [PARSE_SYNTAX_...

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 3 kudos

You can use double quotes around the table name. When quoting, it is important to write the table name in capital letters. SELECT * FROM "/TABLE/NAME"

1 More Replies
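A sketch of pushing the quoted identifier down to Snowflake through the Spark connector rather than parsing it with sqlContext.sql; the connection options dict sf_options is a hypothetical placeholder:

    # Double quotes preserve the special characters; Snowflake stores unquoted
    # identifiers in upper case, hence the capital letters.
    df = (spark.read
          .format("snowflake")
          .options(**sf_options)               # hypothetical connection settings
          .option("query", 'SELECT * FROM "TABLE_/NAME_/V"')
          .load())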
ramankr48
by Contributor II
  • 21351 Views
  • 5 replies
  • 8 kudos

Resolved! How to get all the table names with a specific column or columns in a database?

Let's say there is a database db containing 700 tables, and we need to find the names of all the tables in which the column "project_id" is present. Just an example for understanding the question.

Latest Reply
Anonymous
Not applicable
  • 8 kudos

databaseName = "db"
desiredColumn = "project_id"
database = spark.sql(f"show tables in {databaseName}").collect()
tablenames = []
for row in database:
    # qualify the table name so the lookup works even outside the current database
    cols = spark.table(f"{databaseName}.{row.tableName}").columns
    if desiredColumn in cols:
        tablenames.append(row.tableName)

4 More Replies
William_Scardua
by Valued Contributor
  • 3610 Views
  • 3 replies
  • 1 kudos

Resolved! Upsert when the origin does NOT exist, but you need to change the status in the target

Hi guys, I have a question about upsert/merge ... What do you do when the origin does NOT exist, but you need to change the status in the target? For example: 01/03: source dataset [id = 1 and status = Active]; target table [*not exists*] >> at this time the ...

Latest Reply
NandiniN
Databricks Employee
  • 1 kudos

Hello @William Scardua​, just adding to what @Vigneshraja Palaniraj​ replied. Reference: https://docs.databricks.com/sql/language-manual/delta-merge-into.html Thanks & Regards, Nandini

2 More Replies
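A sketch of the status flip in a single MERGE, assuming placeholder table names target and source and a runtime recent enough to support WHEN NOT MATCHED BY SOURCE (DBR 12.1+):

    spark.sql("""
        MERGE INTO target t
        USING source s
        ON t.id = s.id
        WHEN MATCHED THEN UPDATE SET t.status = s.status
        WHEN NOT MATCHED THEN INSERT (id, status) VALUES (s.id, s.status)
        -- rows present in the target but absent from today's source get closed out
        WHEN NOT MATCHED BY SOURCE THEN UPDATE SET t.status = 'Inactive'
    """)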
Ovi
by New Contributor III
  • 6557 Views
  • 5 replies
  • 3 kudos

Resolved! Filter only Delta tables from an S3 folders list

Hello everyone, from a list of folders on S3, how can I filter which ones are Delta tables without trying to read each one at a time? Thanks, Ovi

Latest Reply
NandiniN
Databricks Employee
  • 3 kudos

Hello @Ovidiu Eremia​, to filter which folders on S3 contain Delta tables, you can look for the specific files that are associated with Delta tables. Delta Lake stores its metadata in a hidden folder named _delta_log, which is located at the root of ...

4 More Replies
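A compact sketch using the DeltaTable helper, which checks for a valid _delta_log at the folder root; the list folder_paths of s3:// URIs is assumed to exist already:

    from delta.tables import DeltaTable

    # Keep only the folders that are the root of a Delta table.
    delta_tables = [p for p in folder_paths if DeltaTable.isDeltaTable(spark, p)]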
Dataengineer_mm
by New Contributor
  • 3902 Views
  • 1 replies
  • 1 kudos

Surrogate key using identity column.

I want to create a surrogate key in a Delta table, and I used the identity column id, generated as default. Can I insert rows into the Delta table using only spark.sql, like an INSERT query? Or can I also use the write Delta format options? If I use df.write ...

Latest Reply
NandiniN
Databricks Employee
  • 1 kudos

Hello @Menaka Murugesan​, if you are using an identity column, I believe you would have created the table as below (starting with value 1 and step 1):
CREATE TABLE my_table (
  id BIGINT GENERATED ALWAYS AS IDENTITY (START WITH 1 INCREMENT BY 1),
  value STRING
)
You can insert values i...

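Following on from the reply above, a sketch of inserting without touching the identity column; the table and column names come from the CREATE TABLE shown there, and df is a hypothetical DataFrame that omits the id column:

    # SQL insert: list only the non-identity columns and let id auto-populate.
    spark.sql("INSERT INTO my_table (value) VALUES ('first row')")

    # A DataFrame append works too, as long as the DataFrame omits the id column,
    # since GENERATED ALWAYS AS IDENTITY rejects explicitly supplied values.
    df.write.format("delta").mode("append").saveAsTable("my_table")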
