Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

180122
by New Contributor II
  • 2569 Views
  • 3 replies
  • 1 kudos

Data Engineering Professional - Practice exam?

Hi, when will we get practice exams for the Data Engineering Professional certification exam? It seems we already have them for a good number of the associate exams, and this Professional exam seems more difficult than the associate ones, so ...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @180122, hope you are well. Just wanted to see if you were able to find an answer to your question, and whether you would like to mark an answer as best. It would be really helpful for the other members too. Cheers!

2 More Replies
bradleyjamrozik
by New Contributor III
  • 4263 Views
  • 3 replies
  • 3 kudos

Resolved! Questions about Lineage and DLT

Hey there!
1. Does column lineage work across multiple catalogs and schemas?
2. Do Delta Live Tables support lineage? If yes, does that work across multiple pipelines or only with a single one?

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Hi @bradleyjamrozik, we haven't heard from you since the last response from @Vinay_M_R and @erigaud, and I was checking back to see if their suggestions helped you. Otherwise, if you have a solution, please share it with the community, as it can be he...

2 More Replies
YS1
by Contributor
  • 2312 Views
  • 3 replies
  • 1 kudos

Updating tables from SQL Server to Databricks

Hi, I have SQL Server tables that are the primary location where all live transactions happen. Currently I read them through PySpark as DataFrames and overwrite them every day to keep the latest copy in Databricks. The problem is it takes lon...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @YS1, hope you are well. Just wanted to see if you were able to find an answer to your question, and whether you would like to mark an answer as best. It would be really helpful for the other members too. Cheers!

2 More Replies
samuraidjakk
by New Contributor II
  • 2046 Views
  • 2 replies
  • 1 kudos

Resolved! Lineage from Unity Catalog on GCP

We are in the process of doing a PoC of our pipelines using DLT. Normally we use another tool, and we have created a custom program to extract lineage. We want to try to get and display lineage using Unity Catalog. But we are on GCP, and it see...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @samuraidjakk  Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best? If not, please tell us so we can help you. Thanks!

1 More Replies
SRK
by Contributor III
  • 10072 Views
  • 6 replies
  • 5 kudos

Resolved! How to deploy Databricks SQL queries and SQL Alerts from lower environment to higher environment?

We are using Databricks SQL Alerts to handle one scenario. We have written the queries and created the SQL Alert. However, I am looking for the best way to deploy them to higher environments such as Pre-Production and Production. I ...

Latest Reply
valeryuaba
New Contributor III
  • 5 kudos

Thanks!

5 More Replies
erigaud
by Honored Contributor
  • 11973 Views
  • 7 replies
  • 5 kudos

Resolved! Autoloader Excel Files

Hello, I looked at the documentation but could not find what I wanted. Is there a way to load Excel files using Auto Loader and, if so, what options should be given to specify the format, sheet name, etc.? Thank you, friends!

Latest Reply
Hemant
Valued Contributor II
  • 5 kudos

Unfortunately, Databricks Auto Loader doesn't support Excel file types for incrementally loading new files. Link: https://docs.databricks.com/ingestion/auto-loader/options.html If your Excel file contains a single sheet, then there is a workaround.

6 More Replies
sumit23
by New Contributor
  • 1791 Views
  • 0 replies
  • 0 kudos

[Error] [SECRET_FUNCTION_INVALID_LOCATION]: While running secret function with create or replace

Hi, we recently upgraded our Databricks warehouse, transitioning from SQL Classic to SQL Pro. However, we started encountering the following error message when attempting to execute a "CREATE OR REPLACE" table query with the secret functio...

BasavarajAngadi
by Contributor
  • 4771 Views
  • 4 replies
  • 1 kudos

Resolved! Question on Transaction logs and versioning in data bricks ?

Hi experts, no doubt Databricks supports ACID properties. But when it comes to versioning, how many versions will Databricks capture? For example: if I do any DML operations on Delta tables, every time I do so it captures the tran...

Latest Reply
stefnhuy
New Contributor III
  • 1 kudos

Hey, as a data enthusiast myself, I find this topic quite intriguing. Databricks indeed does a fantastic job of supporting ACID properties, ensuring data integrity, and allowing for versioning. To address BasavarajAngadi's question, Databricks effici...

3 More Replies
hamzatazib96
by New Contributor III
  • 87726 Views
  • 21 replies
  • 12 kudos

Resolved! Read file from dbfs with pd.read_csv() using databricks-connect

Hello all,
As described in the title, here's my problem:
1. I'm using databricks-connect in order to send jobs to a Databricks cluster
2. The "local" environment is an AWS EC2 instance
3. I want to read a CSV file that is in DBFS (Databricks) with pd.read_cs...

Latest Reply
so16
New Contributor II
  • 12 kudos

Please, guys, I need your help. I still have the same issue after reading all your comments. I am using databricks-connect (version 13.1) in PyCharm and trying to load files that are on DBFS storage. spark = DatabricksSession.builder.remote( host=host...

20 More Replies
dataengineer17
by New Contributor II
  • 24165 Views
  • 5 replies
  • 3 kudos

Databricks execution failed with error state: InternalError, error message: failed to update run

I am receiving this error: Databricks execution failed with error state: InternalError, error message: failed to update run GlobalRunId(xx,RunId(yy)) This appears as an error message in Azure Data Factory when I use it to schedule a Databricks noteb...

Latest Reply
saipujari_spark
Databricks Employee
  • 3 kudos

@dataengineer17 It could be coming from the internal jobs service. If the issue persists, I would recommend creating a support ticket.

4 More Replies
charry
by New Contributor II
  • 12423 Views
  • 5 replies
  • 9 kudos

Creating a Spark DataFrame from a very large dataset

I am trying to create a DataFrame using Spark but am having some issues with the amount of data I'm using. I made a list with over 1 million entries through several API calls. The list was above the threshold for spark.rpc.message.maxSize and it was ...

Latest Reply
saipujari_spark
Databricks Employee
  • 9 kudos

Hey @charry, look at this KB article; it should help address the issue. https://kb.databricks.com/execution/spark-serialized-task-is-too-large

4 More Replies
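
A possible angle on charry's question above, not taken from the thread or the linked KB article: one workaround sometimes used when a driver-side list is too large to ship in one go is to split it into smaller batches and build the DataFrame piece by piece. The sketch below shows only the batching step in plain Python; `iter_batches` and the batch size are hypothetical, chosen for illustration.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(rows: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of `rows`, each with at most `batch_size` items.

    Hypothetical helper for illustration; not part of Spark or Databricks.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(rows), batch_size):
        # The last batch may be shorter than batch_size.
        yield list(rows[start:start + batch_size])


# Small demonstration with stand-in data instead of real API results.
batches = list(iter_batches(list(range(10)), 4))
```

On a cluster, each batch could be passed to `spark.createDataFrame` and the results combined with `DataFrame.union`. Whether that stays under the `spark.rpc.message.maxSize` threshold depends on row size and batch size, so treat it as something to experiment with.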
HariharaSam
by Contributor
  • 129576 Views
  • 6 replies
  • 3 kudos

Resolved! Alter Delta table column datatype

Hi, I have a Delta table that contains data, and I need to alter the datatype of a particular column. For example: consider a table named A with a column named Amount of datatype Decimal(9,4). I need to alter the Amount column datatype from...

Latest Reply
saipujari_spark
Databricks Employee
  • 3 kudos

Hi @HariharaSam, the following page documents how to alter a Delta table schema: https://docs.databricks.com/delta/update-schema.html

5 More Replies
Data_Engineer_3
by New Contributor III
  • 22206 Views
  • 12 replies
  • 4 kudos

FileNotFoundError: [Errno 2] No such file or directory: '/FileStore/tables/flight_data.zip' The data and file exists in location mentioned above

I am new to Spark and working on some practice exercises. I have uploaded a zip file to the DBFS /FileStore/tables directory and am trying to run Python code to unzip the file. The Python code is: from zipfile import * with ZipFile("/FileStore/tables/fli...

Latest Reply
883022
New Contributor II
  • 4 kudos

What if changing the runtime is not an option? I'm experiencing a similar issue using the following: %pip install -r /dbfs/path/to/file.txt This worked for a while, but now I'm getting the Errno 2 mentioned above. I am still able to print the same file...

11 More Replies
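
For the FileNotFoundError thread above: Python's local file APIs such as `zipfile` do not understand DBFS paths directly, and on clusters that expose the DBFS FUSE mount the same file is usually reachable under the `/dbfs` prefix. A minimal sketch of that path mapping, assuming the mount is available on the cluster (it may not be in every runtime or access mode); `dbfs_to_local_path` is a hypothetical helper, not a Databricks API.

```python
def dbfs_to_local_path(path: str) -> str:
    """Map a DBFS path to its location under the /dbfs FUSE mount.

    Hypothetical helper for illustration. Accepts "dbfs:/x", "/x",
    or an already-mapped "/dbfs/x" and returns "/dbfs/x".
    """
    # Drop the URI scheme if the caller passed "dbfs:/...".
    if path.startswith("dbfs:"):
        path = path[len("dbfs:"):]
    # Leave paths alone if they already point at the mount.
    if path == "/dbfs" or path.startswith("/dbfs/"):
        return path
    return "/dbfs/" + path.lstrip("/")


local_zip = dbfs_to_local_path("/FileStore/tables/flight_data.zip")

# On a cluster with the mount available, the archive could then be opened
# with the standard library, for example:
#   from zipfile import ZipFile
#   with ZipFile(local_zip) as archive:
#       archive.extractall("/tmp/flight_data")  # target directory is an assumption
```

The mapping itself is plain string handling, so it can be checked locally; opening the file only works where `/dbfs` is actually mounted.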
Murthy1
by Contributor II
  • 3596 Views
  • 2 replies
  • 0 kudos

Terraform - Install egg file from S3

I am looking to install Python egg files on all my clusters. The egg file is located in an S3 location. I tried using the following code, which didn't work: resource "databricks_dbfs_file" "app" { source = "${S3_Path}/foo.egg" path = "/FileStore...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Murthy1, does @Retired_mod's response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? We'd love to hear from you. Thanks!

1 More Replies
