Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

ashraf1395
by Honored Contributor
  • 314 Views
  • 1 replies
  • 0 kudos

Updating a streaming table in dlt

Can we update a streaming table in DLT where my source and target are the same, i.e. the update should be made on the same table? If yes, can you guide me how? I tried append_flow but it just appends data. CDC: I am not sure whether we can have both tar...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

You can define the CDC flow to update the streaming table. This involves reading from the same table and applying changes. @dlt.table( name="my_streaming_table", comment="This table is updated using CDC", table_properties={"quality": "sil...
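For reference, a minimal sketch of such a CDC flow using dlt.apply_changes, assuming hypothetical names throughout (my_cdc_source, bronze.changes, key column id, ordering column event_ts, delete marker op); it illustrates the pattern rather than the exact code from the reply.

import dlt
from pyspark.sql.functions import col

# Declare the streaming target that the changes will be applied to.
dlt.create_streaming_table(
    name="my_streaming_table",
    comment="Streaming table updated using CDC",
    table_properties={"quality": "silver"},
)

# A view over the change feed; replace with the real CDC source.
@dlt.view(name="my_cdc_source")
def my_cdc_source():
    return spark.readStream.table("bronze.changes")  # hypothetical change-feed table

# Apply inserts, updates, and deletes into the target instead of appending.
dlt.apply_changes(
    target="my_streaming_table",
    source="my_cdc_source",
    keys=["id"],                             # hypothetical primary key column
    sequence_by=col("event_ts"),             # hypothetical ordering column
    apply_as_deletes=col("op") == "DELETE",  # hypothetical delete marker
)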

roshanjoebenny
by New Contributor III
  • 236 Views
  • 1 replies
  • 0 kudos

How to create separate groups like datascience(example) in unity catalog

How do I create separate groups like datascience (for example) in Unity Catalog? And how do I assign different roles like dataanalyst etc. to the login users in Unity Catalog, so that based on their role I can grant privileges?

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

In Databricks there are no predefined roles that contain a set of permissions by default that you can give to your users. You can indeed create groups called datascience and dataanalyst in your workspace via Settings > Identity and Access > Groups, but...
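To round this out, a minimal sketch of granting Unity Catalog privileges to such groups once they exist; the catalog, schema, and table names (main, analytics, sales) are illustrative assumptions.

# Grants go to the group, so members inherit the privileges automatically.
spark.sql("GRANT USE CATALOG ON CATALOG main TO `datascience`")
spark.sql("GRANT USE SCHEMA, SELECT ON SCHEMA main.analytics TO `dataanalyst`")
spark.sql("GRANT SELECT, MODIFY ON TABLE main.analytics.sales TO `datascience`")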

ashraf1395
by Honored Contributor
  • 784 Views
  • 3 replies
  • 0 kudos

Getting error while using Live.target_table in dlt pipeline

I have created a target table in the same DLT pipeline, but when I read that table in a different block of the notebook with Live.table_path, it is not able to read it. Here is my code. Block 1, creating a streaming table: # Define metadata tables catalog = sp...

Latest Reply
ashraf1395
Honored Contributor
  • 0 kudos

Can't we use Live.table_name on a target DLT table with the @dlt.append_flow decorator? If yes, can you share the code, because when I tried I got an error.
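For reference, a minimal sketch of the append_flow pattern in which another dataset of the same pipeline reads the flow's target; the table names, source path, and dedup key are illustrative assumptions, and this is not claimed to reproduce the error from the thread.

import dlt

# Declare the target once; one or more flows append into it.
dlt.create_streaming_table(name="events_raw")

@dlt.append_flow(target="events_raw", name="events_from_files")
def events_from_files():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/Volumes/main/raw/events/")  # hypothetical source path
    )

# A downstream block of the pipeline reading the append_flow target.
@dlt.table(name="events_clean")
def events_clean():
    return dlt.read_stream("events_raw").dropDuplicates(["event_id"])  # hypothetical key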

2 More Replies
h_h_ak
by Contributor
  • 1138 Views
  • 5 replies
  • 2 kudos

Frequent “GetPathStatus” and “GetBlobProperties” PathNotFound Errors on Azure Storage in Databricks

We are encountering frequent GetPathStatus and GetBlobProperties errors when trying to access Azure Data Lake Storage (ADLS) paths through our Databricks environment. The errors consistently return a 404 PathNotFound status for paths that should be a...

Data Engineering
ADLSG2
spark
Latest Reply
h_h_ak
Contributor
  • 2 kudos

Adding the answer from the MSFT Support Team: Why is _delta_log being checked when the function used is parquet? The _delta_log directory is being checked because the system is designed to scan directories and their parent directories to look for a Delta...

4 More Replies
ashraf1395
by Honored Contributor
  • 707 Views
  • 2 replies
  • 2 kudos

Resolved! Old files also getting added in dlt autoloader

So, I am using Auto Loader in a DLT pipeline for my data ingestion. I am using @dlt.append_flow because I have data to load from multiple sources. When I load a new file, say x with 3 rows, my target gets 3 rows. But next, even if I don't load any file...

Latest Reply
Alberto_Umana
Databricks Employee
  • 2 kudos

Hi @ashraf1395, Just a few comments about your question: The cloudFiles source in Databricks is designed for incremental file processing. However, it depends on the checkpoint directory to track which files have been processed. The cloudFiles.include...
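For illustration, a minimal Auto Loader sketch along these lines with an explicit choice about files that already exist in the source path; the target name, file format, and volume path are assumptions, not the poster's code.

import dlt

dlt.create_streaming_table(name="ingest_target")

@dlt.append_flow(target="ingest_target", name="source_x")
def source_x():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "csv")
        # When false, files already present at the first run are skipped;
        # afterwards the checkpoint tracks which files have been ingested.
        .option("cloudFiles.includeExistingFiles", "false")
        .load("/Volumes/main/raw/source_x/")  # hypothetical volume path
    )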

1 More Replies
AlexeyEgorov
by New Contributor II
  • 1161 Views
  • 1 replies
  • 0 kudos

foreach execution faulty with number of partitions >= worker cores

In order to download multiple Wikipedia dumps, I collected the links in a list and wanted to use the foreach method to iterate over those links and apply a UDF that downloads the data into the previously created volume structure. However, I ran into an i...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

It seems like the issue you're encountering with incomplete file downloads when using the foreach method and a UDF in Spark might be related to the number of partitions and how tasks are distributed across them. Here are a few points to consider: Ta...
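As a rough sketch of the partitioning point, one way to spread such downloads so that each task handles exactly one link; the URLs, destination path, and helper function are illustrative assumptions.

import os
import urllib.request

def download_one(row):
    # Runs on an executor: fetch one dump per task.
    dest = os.path.join("/Volumes/main/raw/wiki_dumps", os.path.basename(row.url))
    urllib.request.urlretrieve(row.url, dest)

links = [("https://example.org/dump1.xml.bz2",), ("https://example.org/dump2.xml.bz2",)]
df = spark.createDataFrame(links, ["url"])

# Matching the partition count to the number of links gives each link its own
# task instead of several links sharing one partition.
df.repartition(df.count()).foreach(download_one)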

kirkj
by New Contributor
  • 1539 Views
  • 1 replies
  • 0 kudos

Can Databricks write query results to s3 in another account via the API

I work for a company where we are trying to create a Databricks integration in Node using the @databricks/sql package to query customers' clusters or warehouses. I see documentation about being able to load data via a query from S3 using STS tokens whe...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

Have you been able to get a response on this topic? Based on the information I can see, it might not be supported to write to an S3 bucket outside your account.

jeremy98
by Contributor III
  • 402 Views
  • 1 replies
  • 0 kudos

Resolved! unvalidated the primary and foreign keys constraints?

Hello community, I'm inserting records in overwrite mode into a table defined with a primary key and foreign key set, every time I run a workflow where the task is defined. Why does the DDL schema change after inserting those records? Why do I have my pr...

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hi @jeremy98, When you use the "insert overwrite" mode in Databricks, it can lead to the schema being reset, which includes the removal of primary and foreign key constraints. This happens because the "insert overwrite" operation essentially replaces...
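For illustration, a minimal sketch of two usual remedies, assuming hypothetical table and column names (main.app.orders, order_id, customer_id, main.app.customers) and a hypothetical staging view staging_orders.

# Option 1: overwrite only the rows, keeping the existing table definition
# (and with it the declared PRIMARY KEY / FOREIGN KEY constraints).
spark.sql("INSERT OVERWRITE TABLE main.app.orders SELECT * FROM staging_orders")

# Option 2: if the table was replaced and the constraints were dropped,
# re-declare them afterwards.
spark.sql("ALTER TABLE main.app.orders ADD CONSTRAINT orders_pk PRIMARY KEY (order_id)")
spark.sql("""
    ALTER TABLE main.app.orders
    ADD CONSTRAINT orders_customer_fk FOREIGN KEY (customer_id)
    REFERENCES main.app.customers (customer_id)
""")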

Dhanushn
by New Contributor
  • 11340 Views
  • 1 replies
  • 0 kudos

Concurrent issue on delta lake insert update

Hey team! I need your help on Delta Lake; let me explain my scenario. Scenario: I have a table in Delta Lake and I have 2 Databricks workflows running in parallel which have insert and update tasks to do. My Delta table is partitioned by country code. M...

Latest Reply
Takuya-Omi
Valued Contributor III
  • 0 kudos

Hi, @Dhanushn In response to your question, the community contains the following information: https://community.databricks.com/t5/community-platform-discussions/concurrent-update-to-delta-throws-error/td-p/65599 https://kb.databricks.com/en_US/delta/in...
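One common mitigation for such write conflicts is to scope each concurrent writer to a disjoint partition; a minimal sketch under assumed table and column names (main.app.events, country_code, id).

from delta.tables import DeltaTable

def upsert_for_country(updates_df, country_code):
    target = DeltaTable.forName(spark, "main.app.events")
    (
        target.alias("t")
        .merge(
            updates_df.alias("s"),
            # Pinning the partition column in the condition keeps parallel
            # workflows from touching the same files.
            f"t.country_code = '{country_code}' AND t.id = s.id",
        )
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute()
    )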

aupres
by New Contributor III
  • 419 Views
  • 1 replies
  • 0 kudos

how to generate log files on specific folders

Hello! My environment is as below. OS: Windows 11; Spark: spark-4.0.0-preview2-bin-hadoop3. And the configuration of the Spark files 'spark-defaults.conf' and 'log4j2.properties': spark-defaults.conf spark.eventLog.enabled true spark.event...

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hi @aupres, do you see any failures in the Spark logs? A few things to validate: it appears that the log files are not being generated in the specified directory due to a misconfiguration in your log4j2.properties file. Check the appender configuration: En...

vanshikagupta
by New Contributor II
  • 8246 Views
  • 3 replies
  • 0 kudos

conversion of code from scala to python

Does Databricks Community Edition provide Databricks ML visualization for PySpark, the same as provided in this link for Scala? https://docs.azuredatabricks.net/_static/notebooks/decision-trees.html Also, please help me to convert this lin...

Latest Reply
thelogicplus
Contributor
  • 0 kudos

You may explore the tools and services from Travinto Technologies. They have very good tools. We had explored their tool for our code conversion from Informatica, DataStage, and Ab Initio to Databricks / PySpark. We also used it for SQL queries, stored ...

2 More Replies
LightUp
by New Contributor III
  • 8598 Views
  • 3 replies
  • 4 kudos

Converting SQL Code to SQL Databricks

I am new to Databricks. Please excuse my ignorance. My requirement is to convert the SQL query below into Databricks SQL. The query comes from the EventLog table and the output of the query goes into EventSummary. These queries can be found here: CREATE TABL...

Latest Reply
thelogicplus
Contributor
  • 4 kudos

You may explore the tools and services from Travinto Technologies. They have very good tools. We had explored their tool for our code conversion from Informatica, DataStage, and Ab Initio to Databricks / PySpark. We also used it for SQL queries, stored ...

2 More Replies
MartinIsti
by New Contributor III
  • 3377 Views
  • 2 replies
  • 0 kudos

Python UDF in Unity Catalog - spark.sql error

I'm trying to utilise the option to create UDFs in Unity Catalog. That would be a great way to have functions available in a fairly straightforward manner without e.g. putting the function definitions in an extra notebook that I %run to make them ava...

Data Engineering
function
udf
Latest Reply
Linglin
New Contributor III
  • 0 kudos

I came across the same problem. Inside Unity Catalog UDF creation, spark.sql or spark.table doesn't work. Adding from pyspark.sql import SparkSession; spark = SparkSession.builder.getOrCreate() into the session doesn't work either. Don't know how to sol...
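For illustration, a small sketch of the usual workaround under hypothetical names: a Unity Catalog Python UDF body has no Spark session, so any Spark-side lookup is done outside the function and passed in as a plain value.

# 1) The UDF operates only on its scalar inputs; no spark.sql / spark.table inside.
spark.sql("""
CREATE OR REPLACE FUNCTION main.default.apply_rate(amount DOUBLE, rate DOUBLE)
RETURNS DOUBLE
LANGUAGE PYTHON
AS $$
return amount * rate
$$
""")

# 2) The Spark lookup happens in the notebook and is handed to the UDF as a value.
rate = spark.table("main.ref.fx_rates").filter("ccy = 'EUR'").first()["rate"]  # hypothetical table
spark.sql(f"SELECT main.default.apply_rate(amount, {rate}) AS amount_eur FROM main.sales.orders")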

1 More Replies
Tahseen0354
by Valued Contributor
  • 25467 Views
  • 9 replies
  • 5 kudos

Resolved! Getting "Job aborted due to stage failure" SparkException when trying to download full result

I have generated a result using SQL. But whenever I try to download the full result (1 million rows), it is throwing SparkException. I can download the preview result but not the full result. Why ? What happens under the hood when I try to download ...

Latest Reply
ac567
New Contributor III
  • 5 kudos

Job aborted due to stage failure: Task 6506 in stage 46.0 failed 4 times, most recent failure: Lost task 6506.3 in stage 46.0 (TID 12896) (10.**.***.*** executor 12): java.lang.OutOfMemoryError: Cannot reserve 4194304 bytes of direct buffer memory (a...

8 More Replies
udays22222
by New Contributor II
  • 5363 Views
  • 6 replies
  • 1 kudos

Error writing data to Google Bigquery

Hi, I am able to read data from a BigQuery table, but I am getting an error writing data to a table in BigQuery. I followed the instructions in this document: Connecting Databricks to BigQuery | Google Cloud. %scala import scala.io.Source val contentCred = "/dbfs/FileSt...

Latest Reply
GeoPer
New Contributor III
  • 1 kudos

@udays22222 did you find any solution on this one? I face the same problem when I use Shared (Access mode) cluster. I can read but I cannot write with the error you mentioned.

5 More Replies
