Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

AnkithP
by New Contributor
  • 1360 Views
  • 1 replies
  • 1 kudos

Infer schema eliminating leading zeros.

Upon reading a CSV file with schema inference enabled, I've noticed that a column originally designated as string datatype contains numeric values with leading zeros. However, upon reading the data into a PySpark DataFrame, it undergoes automatic conver...

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

If you set .option("inferSchema", "false"), all columns will be read as strings. You will then have to cast all the other columns to their appropriate types, so passing a schema seems easier to me.
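A minimal sketch of the explicit-schema approach suggested in this reply. The column names and the `read_accounts` helper are hypothetical, and `spark` is assumed to be an existing SparkSession, as in a Databricks notebook. The schema is passed as a DDL-formatted string, so the column with leading zeros is declared as a string and nothing is inferred.

```python
# Hypothetical column names; replace them with the real CSV header.
# Declaring account_id as STRING keeps values such as "00123" intact.
SCHEMA_DDL = "account_id STRING, amount DOUBLE, created DATE"


def read_accounts(spark, path):
    """Read a CSV with an explicit schema instead of schema inference."""
    return (
        spark.read
        .option("header", "true")
        .schema(SCHEMA_DDL)  # DDL-formatted schema string; nothing is inferred
        .csv(path)
    )
```

In a notebook this would be called as `read_accounts(spark, path)` with the real file path. Only the columns that need a non-string type have to be declared as such, which avoids the per-column casts mentioned in the reply.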

zmsoft
by New Contributor II
  • 479 Views
  • 1 replies
  • 0 kudos

Why is a DLT pipeline processing streaming data so slow?

Running a single table is fast, but running 80 tables at the same time takes a long time. Are the tables executed serially in a queue rather than concurrently?

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @zmsoft, the processing power of the nodes running your DLT pipeline matters. Using more powerful node types can significantly improve performance. Consider using a more robust node type, such as Standard_E16ds_v4 or Standard_E32ds_v4.

PrebenOlsen
by New Contributor III
  • 1203 Views
  • 2 replies
  • 0 kudos

Job stuck while utilizing all workers

Hi! I started a job yesterday. It was iterating over data, two months at a time, and writing to a table. It did this successfully for 4 out of 6 time periods. The 5th time period, however, got stuck 5 hours in. I can find one Failed Stage that reads ...

Data Engineering
job failed
Job froze
need help
Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

As Spark is lazily evaluated, using only small clusters for reads and large ones for writes is not something that will happen. The data is read when you apply an action (a write, for example). That being said: I have no knowledge of a bug in Databricks on clusters...

1 More Replies
laurenskuiper97
by New Contributor
  • 980 Views
  • 1 replies
  • 0 kudos

JDBC / SSH-tunnel to connect to PostgreSQL not working on multi-node clusters

Hi everybody, I'm trying to set up a connection between Databricks notebooks and an external PostgreSQL database through an SSH tunnel. On a single-node cluster, this works perfectly fine. However, when this is run on a multi-node cluster, this co...

Data Engineering
clusters
JDBC
spark
SSH
Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

I doubt it is possible. The driver runs the program and sends tasks to the executors. But since creating the SSH tunnel is not a Spark task, I don't think it will be established on any executor.

Jotav93
by New Contributor II
  • 1105 Views
  • 2 replies
  • 1 kudos

Move a delta table from a non UC metastore to a UC metastore preserving history

Hi, I am using Azure Databricks and we recently enabled UC in our workspace. We have some tables in our non-UC metastore that we want to move to a UC-enabled metastore. Is there any way we can move these tables without losing the delta table history...

Data Engineering
delta
unity
Latest Reply
ThomazRossito
Contributor
  • 1 kudos

Hello, it is possible to get the expected result with dbutils.fs.cp("Origin location", "Destination location", True) and then creating the table with its LOCATION set to the destination location. Hope this helps.
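A rough sketch of the two steps in this reply: copy the table directory recursively, then register a table that points at the copy. The helper names, the table name, and the paths are hypothetical; `dbutils` and `spark` are assumed to be the objects a Databricks notebook provides. It also assumes the destination path is one the Unity Catalog metastore is allowed to use for external tables.

```python
def create_table_sql(table_name, location):
    """Build the statement that registers an existing Delta directory as a table."""
    return f"CREATE TABLE IF NOT EXISTS {table_name} USING DELTA LOCATION '{location}'"


def copy_and_register(dbutils, spark, source, destination, table_name):
    """Copy a Delta table directory, then register a table at the copy."""
    # Recursive copy (third argument True) brings the _delta_log directory
    # along, which is where the Delta table history is stored.
    dbutils.fs.cp(source, destination, True)
    spark.sql(create_table_sql(table_name, destination))
```

After the copy, running `DESCRIBE HISTORY` on the new table is a quick way to check that the earlier versions came across.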

1 More Replies
MathewDRitch
by New Contributor II
  • 1351 Views
  • 3 replies
  • 1 kudos

Connecting from Databricks to Network Path

Hi all, I would appreciate it if someone could point me to some reference links on connecting from Databricks to an external network path. I have Databricks on AWS and previously connected to files on an external network path using the mount method. Now Databri...

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

I don't think that it is possible at the moment. UC focuses on cloud data. You might want to try MinIO, but apparently UC does not support MinIO yet. Pity, because that would be an awesome solution.

2 More Replies
Dp15
by Contributor
  • 974 Views
  • 2 replies
  • 2 kudos

Using UDF in an insert command

Hi, I am trying to use a UDF to get the last day of the month and use the boolean result of the function in an insert command. Please find herewith the function and my query.
function:
import calendar
from datetime import datetime, date, timedelta
def...
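The poster's function is cut off in this preview, so the following is not their code. It is only an illustrative sketch of a boolean "is this the last day of the month" check of the kind the question describes; the name `is_last_day_of_month` is hypothetical.

```python
import calendar
from datetime import date


def is_last_day_of_month(d: date) -> bool:
    """Return True when d is the final calendar day of its month."""
    # monthrange returns (weekday of the first day, number of days in the month)
    _, days_in_month = calendar.monthrange(d.year, d.month)
    return d.day == days_in_month
```

For example, `is_last_day_of_month(date(2024, 2, 29))` is True and `is_last_day_of_month(date(2024, 2, 28))` is False, because 2024 is a leap year. A function like this must be registered before SQL can call it, for instance with `spark.udf.register`. Spark SQL's built-in `last_day` function can also express the same comparison without a UDF.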

Latest Reply
Dp15
Contributor
  • 2 kudos

Thank you @Kaniz_Fatma for your detailed explanation

1 More Replies
Kroy
by Contributor
  • 6386 Views
  • 8 replies
  • 1 kudos

Resolved! What is the difference between streaming and a streaming live table?

Can anyone explain in layman's terms what the difference is between streaming and a streaming live table?

Latest Reply
CharlesReily
New Contributor III
  • 1 kudos

Streaming, in a broad sense, refers to the continuous flow of data over a network. It allows you to watch or listen to content in real-time without having to download the entire file first.  A "Streaming Live Table" might refer to a specific type of ...

7 More Replies
kiko_roy
by Contributor
  • 9137 Views
  • 5 replies
  • 3 kudos

Resolved! Permission error loading dataframe from Azure Unity Catalog to GCS bucket

I am creating a data frame by reading a table's data residing in an Azure-backed Unity Catalog. I need to write the df or file to a GCS bucket. I have configured the Spark cluster config using the GCP service account JSON values. On running: df1.write.for...

Data Engineering
GCS bucket
permission error
Latest Reply
ruloweb
New Contributor II
  • 3 kudos

Hi, is there a Terraform resource to apply this GRANT, or does this always have to be done manually?

4 More Replies
leireroman
by New Contributor III
  • 614 Views
  • 1 replies
  • 0 kudos

Bootstrap Timeout during job cluster start

My job was not able to start because I got this problem in the job cluster. This job is running on an Azure Databricks workspace that has been deployed for almost a year, and I have not had this error before. It is deployed in North Europe. After getting...

Latest Reply
lukasjh
New Contributor II
  • 0 kudos

We have had the same problem occurring randomly since yesterday in two workspaces. The cluster started fine this morning at 08:00, but failed again from around 09:00 on.

Anske
by New Contributor III
  • 2108 Views
  • 1 replies
  • 0 kudos

One-time backfill for DLT streaming table before apply_changes

Hi, absolute Databricks noob here, but I'm trying to set up a DLT pipeline that processes CDC records from an external SQL Server instance to create a mirrored table in my Databricks Delta lakehouse. For this, I need to do some initial one-time backfi...

Data Engineering
Delta Live Tables
Latest Reply
Anske
New Contributor III
  • 0 kudos

Since nobody responded, I decided to try my own suggestion and hack the snapshot data into the table that gathers the change data capture. After some straying, I ended up with the attached notebook. The notebook first creates 2 DLT tables (lookup...

cubanDataDude
by New Contributor II
  • 646 Views
  • 1 replies
  • 1 kudos

Job Claiming NotebookNotFound Incorrectly (seemingly)

I have the code captured below in the screenshot. When I run this individually it works just fine; when a job runs it, it fails with 'ResourceNotFound'. Not sure what the issue is... - Checked 'main' branch, which is where this job is pulling f...

Latest Reply
cubanDataDude
New Contributor II
  • 1 kudos

Figured it out: ecw_staging_nb_List = ['nb_UPSERT_stg_ecw_insurance', 'nb_UPSERT_stg_ecw_facilitygroups'] works just fine.
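A minimal sketch of how a list like this might be consumed. The `run_notebooks` helper and its timeout default are hypothetical and are not taken from the poster's screenshot; `dbutils` is assumed to be the object a Databricks notebook provides. The bare notebook names are the ones the reply reports as working.

```python
ecw_staging_nb_List = [
    "nb_UPSERT_stg_ecw_insurance",
    "nb_UPSERT_stg_ecw_facilitygroups",
]


def run_notebooks(dbutils, notebook_names, timeout_seconds=600, arguments=None):
    """Run each notebook in order and collect the value each one returns."""
    results = {}
    for name in notebook_names:
        # dbutils.notebook.run(path, timeout_seconds, arguments) returns the
        # string the child notebook passes to dbutils.notebook.exit.
        results[name] = dbutils.notebook.run(name, timeout_seconds, arguments or {})
    return results
```

In a notebook this would be called as `run_notebooks(dbutils, ecw_staging_nb_List)`.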

dannythermadom
by New Contributor III
  • 4084 Views
  • 6 replies
  • 7 kudos

dbutils.notebook.run command not working with /Repos/

I have two GitHub repos configured in the Databricks Repos folder. repo_1 is run using a job, and repo_2 is called from repo_1 using the dbutils.notebook.run command: dbutils.notebook.run("/Repos/repo_2/notebooks/notebook", 0, args). I am getting the follo...

Latest Reply
cubanDataDude
New Contributor II
  • 7 kudos

I am having a similar issue... ecw_staging_nb_List = ['/Workspace/Repos/PRIMARY/UVVC_DATABRICKS_EDW/silver/nb_UPSERT_stg_ecw_insurance', '/Repos/PRIMARY/UVVC_DATABRICKS_EDW/silver/nb_UPSERT_stg_ecw_facilitygroups'] Adding workspace d...

5 More Replies
pragarwal
by New Contributor II
  • 1374 Views
  • 2 replies
  • 0 kudos

Export Users and Groups from Unity Catalog

Hi, I am trying to export the list of users and groups from Unity Catalog through the Databricks workspace, but I am seeing only the users/groups created inside the workspace instead of the groups and users coming through SCIM in Unity Catalog. How can I ge...

Latest Reply
Walter_C
Honored Contributor
  • 0 kudos

Hello, when you refer to the users and groups in Unity Catalog, do you mean the ones created at the account level? If so, you need to run the API call at the account level and not the workspace level. You can see the API doc for account le...
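A hedged sketch of what an account-level call might look like, as opposed to a workspace-level one. The helper names are hypothetical. The account host (for example `accounts.azuredatabricks.net` on Azure or `accounts.cloud.databricks.com` on AWS), the account ID, and the token are assumptions the caller must supply, and the endpoint layout should be checked against the account API documentation the reply points to.

```python
import json
import urllib.request


def scim_url(account_host, account_id, resource):
    """Build an account-level SCIM endpoint URL; resource is 'Users' or 'Groups'."""
    return f"https://{account_host}/api/2.0/accounts/{account_id}/scim/v2/{resource}"


def list_principals(account_host, account_id, resource, token):
    """Fetch one page of users or groups from the account-level SCIM API."""
    request = urllib.request.Request(
        scim_url(account_host, account_id, resource),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as response:
        # SCIM list responses carry their entries under the "Resources" key.
        return json.load(response).get("Resources", [])
```

This returns only a single page of results; a full export would also need to follow the SCIM pagination parameters.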

1 More Replies
