Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Forum Posts

David_Billa
New Contributor III
  • 3083 Views
  • 8 replies
  • 3 kudos

Extract datetime value from the file name

I have a file name like the one below, and I want to extract the datetime value and convert it to a datetime data type: This_is_new_file_2024_12_06T11_00_49_AM.csv. Here I want to extract only '2024_12_06T11_00_49' and convert it to a datetime value in a new field. I tried S...

Latest Reply
Walter_C
Databricks Employee
  • 3 kudos

Unfortunately, I am not able to make it work with SQL functions.

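Since the hour in that name appears to use a 12-hour clock (note the trailing AM), extracting only '2024_12_06T11_00_49' would misread afternoon files. A minimal Python sketch, assuming every name follows the pattern shown in the question, keeps the AM/PM suffix and parses both together. The function name is hypothetical, and this is plain Python rather than a Spark SQL solution.

```python
import re
from datetime import datetime
from typing import Optional

# Assumed pattern: yyyy_MM_ddThh_mm_ss followed by an AM/PM suffix.
_TS_PATTERN = re.compile(r"(\d{4}_\d{2}_\d{2}T\d{2}_\d{2}_\d{2})_(AM|PM)")


def parse_file_timestamp(filename: str) -> Optional[datetime]:
    """Return the timestamp embedded in a file name, or None if there is none."""
    match = _TS_PATTERN.search(filename)
    if match is None:
        return None
    stamp, meridiem = match.groups()
    # %I reads a 12-hour clock hour and %p applies the AM/PM suffix to it.
    return datetime.strptime(f"{stamp}_{meridiem}", "%Y_%m_%dT%I_%M_%S_%p")


print(parse_file_timestamp("This_is_new_file_2024_12_06T11_00_49_AM.csv"))
# 2024-12-06 11:00:49
```

With the same function, a name ending in 2024_12_06T01_15_00_PM.csv parses to 13:15:00 rather than 01:15:00.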
peritus
Databricks Partner
  • 8673 Views
  • 5 replies
  • 5 kudos

Synchronize SQLServer tables to Databricks

I'm new to Databricks, and I'm looking to get data from an external database into Databricks and keep it synchronized when changes occur in the source tables. It seems like I may be able to use some form of change data capture and Delta Live Tables. ...

Latest Reply
john533
New Contributor III
  • 5 kudos

To synchronize data from an external database into Databricks with change data capture (CDC), you can use Delta Live Tables (DLT). Start by configuring a JDBC connection to your source database and use a CDC tool (like Debezium or database-native CDC...

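Conceptually, applying CDC means replaying each change event against the target in commit order. The pure-Python sketch below assumes rows shaped like SQL Server CDC change-table output (`__$start_lsn`, `__$seqval`, `__$operation`, where 1 = delete, 2 = insert, 3 = values before an update, 4 = values after an update) and uses a dict as a stand-in for the Delta table. It only illustrates the semantics that DLT's APPLY CHANGES (with KEYS and SEQUENCE BY) handles for you. It is not how DLT implements them, and `apply_cdc` is a hypothetical name.

```python
from typing import Any, Dict, Iterable

Row = Dict[str, Any]

# SQL Server CDC __$operation codes.
DELETE, INSERT, UPDATE_BEFORE, UPDATE_AFTER = 1, 2, 3, 4


def apply_cdc(target: Dict[Any, Row], changes: Iterable[Row], key: str) -> Dict[Any, Row]:
    """Replay CDC rows onto a copy of the target, keyed by primary key."""
    result = dict(target)
    # LSN orders transactions, __$seqval orders changes inside one transaction,
    # and the operation code breaks ties so a before-image (3) precedes its after-image (4).
    ordered = sorted(
        changes, key=lambda r: (r["__$start_lsn"], r["__$seqval"], r["__$operation"])
    )
    for change in ordered:
        op = change["__$operation"]
        data = {k: v for k, v in change.items() if not k.startswith("__$")}
        if op == DELETE:
            result.pop(data[key], None)
        elif op in (INSERT, UPDATE_AFTER):
            result[data[key]] = data
        # UPDATE_BEFORE rows only hold the old values, so they are skipped.
    return result


# Deliberately out of order to show why the sequencing columns matter.
changes = [
    {"__$start_lsn": 2, "__$seqval": 1, "__$operation": UPDATE_AFTER, "id": 1, "name": "b"},
    {"__$start_lsn": 1, "__$seqval": 1, "__$operation": INSERT, "id": 1, "name": "a"},
    {"__$start_lsn": 2, "__$seqval": 1, "__$operation": UPDATE_BEFORE, "id": 1, "name": "a"},
    {"__$start_lsn": 3, "__$seqval": 1, "__$operation": INSERT, "id": 2, "name": "c"},
    {"__$start_lsn": 4, "__$seqval": 1, "__$operation": DELETE, "id": 2, "name": "c"},
]
print(apply_cdc({}, changes, key="id"))
# {1: {'id': 1, 'name': 'b'}}
```

Without the sort, the older insert of "a" would overwrite the later update to "b", which is the same problem a sequence column solves in a real pipeline.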
GowthamR
New Contributor II
  • 1434 Views
  • 2 replies
  • 0 kudos

Regarding Unity Catalog Self Assume Capabilities

Hi Team, Good day! Recently, in the Credentials section under Catalog, we have to add self-assume capabilities to the IAM role, right? Is this only for the roles associated with Unity Catalog, or for all roles? Thanks, Gowtham

Latest Reply
RiyazAliM
Honored Contributor
  • 0 kudos

Hi @GowthamR, I believe it's only for the roles associated with UC. I was going through this community post on including self-assume capabilities for AWS IAM roles, and it mentions that this change does not affect storage credentials that are not cr...

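A self-assuming role is one whose trust policy also lists the role's own ARN as a principal allowed to call sts:AssumeRole. The Python sketch below only shows that edit on a trust-policy document. Every ARN and the external ID are placeholders, `add_self_assume` is a hypothetical helper, and the exact statement Databricks expects should be copied from its Unity Catalog credential documentation rather than from here.

```python
import copy
import json
from typing import Any, Dict


def add_self_assume(trust_policy: Dict[str, Any], role_arn: str) -> Dict[str, Any]:
    """Return a copy of the policy whose first statement also trusts role_arn."""
    policy = copy.deepcopy(trust_policy)  # leave the caller's document untouched
    principal = policy["Statement"][0]["Principal"]
    aws = principal.get("AWS", [])
    # IAM accepts a single ARN string or a list of ARNs; normalize to a list.
    if isinstance(aws, str):
        aws = [aws]
    if role_arn not in aws:
        aws.append(role_arn)
    principal["AWS"] = aws
    return policy


# Placeholder values only; substitute the ARNs and external ID from your own setup.
base = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": "<DATABRICKS-UC-PRINCIPAL-ARN>"},
        "Action": "sts:AssumeRole",
        "Condition": {"StringEquals": {"sts:ExternalId": "<EXTERNAL-ID>"}},
    }],
}
own_arn = "arn:aws:iam::<ACCOUNT-ID>:role/<THIS-ROLE-NAME>"
print(json.dumps(add_self_assume(base, own_arn), indent=2))
```

The helper is idempotent, so running it against a policy that already trusts the role leaves the principal list unchanged.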
skanapuram
New Contributor II
  • 2940 Views
  • 4 replies
  • 0 kudos

Error com.databricks.common.client.DatabricksServiceHttpClientException 403 Invalid access token

Hi, I got this error "com.databricks.common.client.DatabricksServiceHttpClientException: 403: Invalid access token" during a workflow job run. It had been working for a while without errors, and nothing has changed with regard to the code or cluster. And...

Latest Reply
john533
New Contributor III
  • 0 kudos

The error "com.databricks.common.client.DatabricksServiceHttpClientException: 403: Invalid access token" usually appears when the access token used for authentication is invalid or has expired. Have you looked at the task cluster's driver logs? It m...

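If expiry is the suspected cause, one way to check is to list the workspace's tokens and compare each expiry time with the current time. The sketch below assumes the response shape of the Token API's list endpoint (a `token_infos` array whose `expiry_time` is in epoch milliseconds, with -1 meaning no expiry); verify that shape against the REST API reference before relying on it. It only parses an already-fetched response, `expired_tokens` is a hypothetical helper, and a token that is invalid for other reasons would not show up here.

```python
import time
from typing import Any, Dict, List, Optional


def expired_tokens(list_response: Dict[str, Any], now_ms: Optional[int] = None) -> List[str]:
    """Return the comment (or ID) of every listed token whose expiry has passed."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    expired = []
    for info in list_response.get("token_infos", []):
        expiry = info.get("expiry_time", -1)
        # Assumption: -1 marks a token without an expiry date.
        if expiry != -1 and expiry <= now_ms:
            expired.append(info.get("comment") or info.get("token_id", "<unknown>"))
    return expired


# Hypothetical, already-fetched response used for illustration.
sample = {
    "token_infos": [
        {"token_id": "t1", "comment": "workflow-job", "expiry_time": 1_700_000_000_000},
        {"token_id": "t2", "comment": "", "expiry_time": -1},
    ]
}
print(expired_tokens(sample, now_ms=1_800_000_000_000))
# ['workflow-job']
```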
theron
New Contributor
  • 1394 Views
  • 1 replies
  • 0 kudos

Liquid Clustering - Implementing with Spark Streaming’s foreachBatch Upsert

Hi there! I’d like to use Liquid Clustering in a Spark Streaming process with foreachBatch (upsert). However, I’m not sure of the correct approach. The Databricks documentation suggests using .clusterBy(key) when writing streaming data. In my case, I'm ...

Latest Reply
RiyazAliM
Honored Contributor
  • 0 kudos

Hi @theron, if you enabled LC when creating the table, you must have already specified the clustering column, so I don't see a reason to specify .clusterBy(key) again. Let me know if you have any questions. Cheers! If you want to create a brand-new table with LC...

David_Billa
New Contributor III
  • 2126 Views
  • 1 replies
  • 0 kudos

Create delta table with space in field names per source CSV file

There are a few fields in the source CSV file that have spaces in their names. When I tried to create the table with a few fields defined like `SOURCE 1 NAME` string, I got an 'INVALID_COLUMN_NAME_AS_PATH' error. Runtime version is 10.2...

Latest Reply
RiyazAliM
Honored Contributor
  • 0 kudos

Hi @David_Billa, before ingesting the CSV data into the Delta table, you could create the Delta table with table properties as shown below: CREATE TABLE catalog_name.schema_name.table_name (`a` string, `b c` string) TBLPROPERTIES ( 'delta.minReade...

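The reply's statement is cut off, so it is not reconstructed here. As a separate sketch: Delta Lake's column mapping mode `name` is what allows column names containing spaces, and enabling it requires reader version 2 and writer version 5. The Python helper below (hypothetical, like the table name) builds such a CREATE TABLE statement from a CSV header. It backtick-quotes each name and, for simplicity, types every column as STRING.

```python
from typing import Iterable

# Delta table properties that enable column mapping by name.
_COLUMN_MAPPING_PROPS = {
    "delta.minReaderVersion": "2",
    "delta.minWriterVersion": "5",
    "delta.columnMapping.mode": "name",
}


def quote_ident(name: str) -> str:
    """Backtick-quote a Spark SQL identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def create_table_ddl(table: str, columns: Iterable[str]) -> str:
    """Build a CREATE TABLE statement with every CSV column typed as STRING."""
    cols = ", ".join(f"{quote_ident(c)} STRING" for c in columns)
    props = ", ".join(f"'{k}' = '{v}'" for k, v in _COLUMN_MAPPING_PROPS.items())
    return f"CREATE TABLE {table} ({cols}) TBLPROPERTIES ({props})"


header = ["ID", "SOURCE 1 NAME"]  # e.g. the first line of the CSV file
print(create_table_ddl("catalog_name.schema_name.table_name", header))
```

The generated statement could then be run with `spark.sql(...)` before ingesting the CSV data.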
lauraxyz
Contributor
  • 2651 Views
  • 4 replies
  • 0 kudos

%run command: Pass Notebook path as a parameter

Hi team! I have a notebook (notebook A) in my workspace, and I'd like to execute it with the %run command from another notebook (notebook B). It works perfectly with the command %run /workspace/path/to/notebook/A. Now I want to specify the above path in a variable, an...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

These global temp views are available to all workloads running against a compute resource, but they do not persist beyond the lifecycle of the cluster or the session that created them. You can refer to https://docs.databricks.com/en/views/index.html#t...

jeremy98
Honored Contributor
  • 6442 Views
  • 2 replies
  • 0 kudos

How to migrate the data from Postgres to Databricks?

Hello Community, I have a question about migrating data from PostgreSQL to Databricks. My PostgreSQL database receives new data every hour, and I want to synchronize these hourly inserts with the bronze layer in my Databricks catalog. Currently, I’m us...

Latest Reply
jeremy98
Honored Contributor
  • 0 kudos

Hello Walter, thank you for your help; you're amazing. I wanted to explain my current challenge in more detail: We have a platform that stores data in PostgreSQL, with a pipeline ingesting millions of rows every hour. We're trying to migrate this data...

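For append-only hourly inserts, a common pattern is a high-watermark pull: remember the largest key (or insert timestamp) loaded so far, and fetch only rows beyond it on the next run. The sketch below uses Python's built-in sqlite3 as a stand-in for PostgreSQL so it runs anywhere; the `events` table, its columns, and `fetch_new_rows` are hypothetical. Persisting the watermark between runs (for example in a small control table) is omitted.

```python
import sqlite3
from typing import Any, List, Tuple


def fetch_new_rows(conn: sqlite3.Connection, last_id: int) -> Tuple[List[Tuple[Any, ...]], int]:
    """Return rows with id greater than last_id, plus the new high-watermark."""
    rows = conn.execute(
        "SELECT id, payload FROM events WHERE id > ? ORDER BY id", (last_id,)
    ).fetchall()
    new_mark = rows[-1][0] if rows else last_id  # unchanged when nothing is new
    return rows, new_mark


# sqlite3 stands in for PostgreSQL; table and columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO events (id, payload) VALUES (?, ?)", [(1, "a"), (2, "b")])

batch1, mark = fetch_new_rows(conn, last_id=0)     # first hourly run loads ids 1 and 2
conn.execute("INSERT INTO events (id, payload) VALUES (3, 'c')")
batch2, mark = fetch_new_rows(conn, last_id=mark)  # next run loads only id 3
print(batch1, batch2, mark)
```

One caveat with PostgreSQL specifically: sequence values are assigned before commit, so a transaction that commits late can land below a watermark that has already moved past it. Re-reading a small overlap window and deduplicating on the Databricks side is one possible mitigation.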
lauraxyz
Contributor
  • 2070 Views
  • 6 replies
  • 1 kudos

refresh online table: How to get update_id and check status of a specific update

Hi! I have a workflow job that triggers a refresh of an online table. How can I get the update_id for this specific refresh? Also, is it possible to get the status of this specific update_id? Thanks!

Latest Reply
lauraxyz
Contributor
  • 1 kudos

Another quick question: since online tables have 3 sync modes (Snapshot, Triggered, and Continuous), which sync mode is used by default when refreshing the online table with w.pipelines.start_update(pipeline_id='{pipeline_id}', full_refresh=True)?

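Assuming the Databricks Python SDK behaves as the question suggests, `w.pipelines.start_update(...)` returns a response carrying an `update_id`, and `w.pipelines.get_update(pipeline_id, update_id)` returns that update's details, including its state. The sketch below keeps the polling logic independent of the SDK by taking a `get_state` callable, so it can run without a workspace. The SDK calls appear only in comments, and the set of terminal states is an assumption to check against the SDK's `UpdateInfoState` enum.

```python
import time
from typing import Callable

# Assumed terminal values of a pipeline update's state.
TERMINAL_STATES = {"COMPLETED", "FAILED", "CANCELED"}


def wait_for_update(get_state: Callable[[], str],
                    poll_seconds: float = 30.0,
                    timeout_seconds: float = 3600.0,
                    sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll get_state() until it returns a terminal state; raise on timeout."""
    waited = 0.0
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if waited >= timeout_seconds:
            raise TimeoutError(f"update still {state} after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds


# Hypothetical wiring to the SDK (not executed here):
# from databricks.sdk import WorkspaceClient
# w = WorkspaceClient()
# update_id = w.pipelines.start_update(pipeline_id=pipeline_id, full_refresh=True).update_id
# final = wait_for_update(
#     lambda: w.pipelines.get_update(pipeline_id=pipeline_id, update_id=update_id).update.state.value)

# Local demo with a fake state sequence and a no-op sleep.
states = iter(["QUEUED", "RUNNING", "COMPLETED"])
print(wait_for_update(lambda: next(states), sleep=lambda _: None))
# COMPLETED
```

Injecting `sleep` keeps the demo instant and makes the timeout path easy to exercise without waiting.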
Karthik_2
New Contributor
  • 3846 Views
  • 1 replies
  • 0 kudos

ODBC driver-System.Data.Odbc.OdbcException: 'ERROR [IM002] [Microsoft][ODBC Driver Manager] Data sou

Hi there, I’m working on a POC to connect a C# application to query tables from Unity Catalog using the ODBC connector. Currently, I’m testing this locally using Visual Studio. I followed the steps in the ODBC documentation, but I’m encountering the f...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

The error message you encountered, "ERROR [ODBC Driver Manager] Data source name not found and no default driver specified," typically indicates that the ODBC driver manager cannot find the specified data source name (DSN) or that no default driver i...

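IM002 is reported by the ODBC driver manager when it cannot resolve the DSN or driver, so one way to sidestep the DSN lookup is a DSN-less connection string that names the driver directly. The Python sketch below only assembles such a string from the keys documented for the Simba Spark ODBC driver (Host, Port, HTTPPath, SSL, ThriftTransport, AuthMech, UID, PWD); the result could be passed to `System.Data.Odbc.OdbcConnection` in the C# app. The driver name is an assumption and must match the name registered on the machine for the same 32/64-bit architecture as the app; host, HTTP path, and token are placeholders.

```python
from typing import Dict


def databricks_odbc_conn_str(host: str, http_path: str, token: str,
                             driver: str = "Simba Spark ODBC Driver") -> str:
    """Build a DSN-less ODBC connection string for a Databricks SQL endpoint."""
    params: Dict[str, str] = {
        "Driver": "{" + driver + "}",  # must match the installed driver's registered name
        "Host": host,                  # workspace host name, without https://
        "Port": "443",
        "HTTPPath": http_path,
        "SSL": "1",
        "ThriftTransport": "2",        # HTTP transport
        "AuthMech": "3",               # user/password auth, with a personal access token
        "UID": "token",
        "PWD": token,
    }
    return ";".join(f"{k}={v}" for k, v in params.items())


print(databricks_odbc_conn_str("<workspace-host>", "<http-path>", "<personal-access-token>"))
```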
jeremy98
Honored Contributor
  • 1514 Views
  • 1 replies
  • 0 kudos

Resolved! Move on DLT Pipelines or CDF Delta Tables?

Hello Community, I have a basic question that I’ve been thinking about lately. Is it better to use DLT Pipelines or CDF Delta Tables for handling a medallion architecture? I understand that DLT Pipelines offer some shortcuts, but are they a good choice...

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hi @jeremy98, when deciding between using Delta Live Tables (DLT) Pipelines and Change Data Feed (CDF) Delta Tables for handling a medallion architecture, there are several factors to consider.
DLT Pipelines:
Automation and Management: DLT Pipeli...

minhhung0507
Valued Contributor
  • 4785 Views
  • 5 replies
  • 3 kudos

Resolved! Handling Dropped Records in Delta Live Tables with Watermark - Need Optimization Strategy

Hi Databricks Community, I'm encountering an issue with watermarks in Delta Live Tables that's causing data loss in my streaming pipeline. Let me explain my specific problem: Current Situation: I've implemented watermarks for stateful processing in my De...

Latest Reply
minhhung0507
Valued Contributor
  • 3 kudos

Dear @VZLA, @Walter_C, I wanted to take a moment to express my sincere gratitude for your incredibly detailed explanation and thoughtful suggestions. Your guidance has been immensely valuable and has provided us with a clear path forward in addressi...

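The data loss described comes from the watermark rule itself: once the engine has seen event time T, rows older than T minus the allowed delay may be treated as too late for stateful operators. The Python sketch below is a deliberately simplified model (names hypothetical) in which the watermark advances only between micro-batches and every row behind it is dropped. Real Spark Structured Streaming only guarantees that rows within the delay are kept; rows beyond it may or may not be dropped. The model is meant to show why a larger delay trades later results and more retained state for fewer dropped records.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

Event = Tuple[str, datetime]  # (event id, event time)


def run_batches(batches: List[List[Event]], delay: timedelta) -> Tuple[List[str], List[str]]:
    """Simplified event-time watermark model; returns (kept ids, dropped ids)."""
    watermark: Optional[datetime] = None
    max_seen: Optional[datetime] = None
    kept: List[str] = []
    dropped: List[str] = []
    for batch in batches:
        for event_id, ts in batch:
            # Rows behind the watermark set by earlier batches count as late.
            late = watermark is not None and ts < watermark
            (dropped if late else kept).append(event_id)
            max_seen = ts if max_seen is None else max(max_seen, ts)
        # The watermark only advances once a micro-batch has finished.
        if max_seen is not None:
            watermark = max_seen - delay
    return kept, dropped


t0 = datetime(2024, 1, 1, 12, 0)
batches = [
    [("a", t0), ("b", t0 + timedelta(minutes=20))],
    [("c", t0 + timedelta(minutes=5)), ("d", t0 + timedelta(minutes=15))],
]
print(run_batches(batches, timedelta(minutes=10)))
# (['a', 'b', 'd'], ['c'])
```

After the first batch the watermark sits at 12:10, so "c" (12:05) is dropped while "d" (12:15) survives; with a 20-minute delay, both would be kept.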
Fikrat
Databricks Partner
  • 3637 Views
  • 6 replies
  • 0 kudos

Resolved! Can SQL task pass its outputs to ForEach task?

Hi there, if I understood correctly, Roland said the output of a SQL task can be used as input to a ForEach task in Workflows. I tried that with the expression sqlTaskName.output.rows, but Databricks rejected it. Does anyone know how to do that?

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

Our internal teams have confirmed that this currently does not work on your side because the feature is in Private Preview; we will need to wait until it is fully released.

jorperort
Contributor
  • 20104 Views
  • 8 replies
  • 4 kudos

Resolved! [Databricks Assets Bundles] no deployment state

Good morning, I'm trying to run:
databricks bundle run --debug -t dev integration_tests_job
My bundle looks like this:
bundle:
  name: x
include:
  - ./resources/*.yml
targets:
  dev:
    mode: development
    default: true
    workspace:
      host: x
      r...

Latest Reply
jtberman
Databricks Partner
  • 4 kudos

Hello, I'm reopening this ticket in the hope that one of you had some luck resolving the bug. I am currently facing the same issue: I can deploy an asset bundle via the local CLI without problems (by deploy I mean the bundle code is written to my ...

tinai_long
New Contributor III
  • 16248 Views
  • 12 replies
  • 6 kudos

Resolved! How to refresh a single table in Delta Live Tables?

Suppose I have a Delta Live Tables pipeline with 2 tables: Table 1 ingests from a JSON source, and Table 2 reads from Table 1 and runs some transformations. In other words, the data flow is JSON source -> Table 1 -> Table 2. Now if I find some bugs in the...

Latest Reply
cpayne_vax
New Contributor III
  • 6 kudos

Answering my own question: nowadays (February 2024) this can all be done via the UI. When viewing your DLT pipeline, there is a "Select tables for refresh" button in the header. If you click this, you can select individual tables, and then in the botto...

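Besides the UI button, the Pipelines REST API's start-update call (POST /api/2.0/pipelines/{pipeline_id}/updates) appears to accept table selections through `refresh_selection` and `full_refresh_selection`; treat those field names as assumptions to verify against the API reference. The Python sketch below only builds the request with the standard library and does not send it. The host, token, pipeline ID, and helper name are placeholders. For the scenario in the question, fully refreshing just Table 2 would mean putting it in `full_refresh_selection`.

```python
import json
import urllib.request
from typing import Dict, List, Sequence


def build_selective_refresh(host: str, token: str, pipeline_id: str,
                            refresh: Sequence[str] = (),
                            full_refresh: Sequence[str] = ()) -> urllib.request.Request:
    """Build (without sending) a start-update request limited to selected tables."""
    body: Dict[str, List[str]] = {}
    if refresh:
        body["refresh_selection"] = list(refresh)  # incremental refresh of these tables
    if full_refresh:
        body["full_refresh_selection"] = list(full_refresh)  # recompute these from scratch
    return urllib.request.Request(
        url=f"https://{host}/api/2.0/pipelines/{pipeline_id}/updates",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


req = build_selective_refresh("<workspace-host>", "<token>", "<pipeline-id>",
                              full_refresh=["table_2"])
# To actually send it: urllib.request.urlopen(req)
print(req.get_method(), req.full_url)
```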