Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

BaburamShrestha (New Contributor II)
  • 5393 Views
  • 7 replies
  • 4 kudos

File Arrival Trigger

We are using Databricks in combination with Azure platforms, specifically working with Azure Blob Storage (Gen2). We frequently mount Azure containers in the Databricks file system and leverage external locations and volumes for Azure containers. Our ...

Latest Reply
Panda (Valued Contributor)
  • 4 kudos

@KrzysztofPrzyso Yes, we are now following this approach as an alternative solution, which combines Autoloader with a Databricks file arrival trigger. In this case, you don't need to keep a cluster running all the time; instead, use the Autoloader config below. Pass ...
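
A minimal sketch of the kind of Autoloader configuration being described, assuming a JSON landing folder; all paths and table names are placeholders, not values from the original thread:

# Autoloader (cloudFiles) reading a landing folder; trigger(availableNow=True)
# processes whatever has arrived and then stops, so no always-on cluster is needed.
df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "json")  # placeholder format
      .option("cloudFiles.schemaLocation", "abfss://<container>@<account>.dfs.core.windows.net/_schemas/")
      .load("abfss://<container>@<account>.dfs.core.windows.net/landing/"))

(df.writeStream
   .option("checkpointLocation", "abfss://<container>@<account>.dfs.core.windows.net/_checkpoints/")
   .trigger(availableNow=True)
   .toTable("catalog.schema.target_table"))

Pairing this with a job-level file arrival trigger means the cluster only spins up when new files land.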

6 More Replies
DmitriyLamzin (New Contributor II)
  • 7631 Views
  • 6 replies
  • 0 kudos

applyInPandas started to hang on runtime 13.3 LTS ML and above

Hello, I recently tried to upgrade my runtime environment to 13.3 LTS ML and found that it breaks my workload during applyInPandas. My job started to hang during applyInPandas execution. A thread dump shows that it hangs on direct memory allocation: ...
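
For context, a minimal sketch of the applyInPandas pattern in question; the table, columns, and aggregation are illustrative, not taken from the thread:

import pandas as pd
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

schema = StructType([
    StructField("group", StringType()),
    StructField("value_mean", DoubleType()),
])

def summarize(pdf: pd.DataFrame) -> pd.DataFrame:
    # runs once per group in a Python worker; very large groups increase
    # off-heap memory pressure, which is where hangs like the one above surface
    return pd.DataFrame({"group": [pdf["group"].iloc[0]],
                         "value_mean": [pdf["value"].mean()]})

result = spark.table("catalog.schema.source").groupBy("group").applyInPandas(summarize, schema)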

Labels: Data Engineering, pandas udf
Latest Reply
stifen (New Contributor II)
  • 0 kudos

It's good.

5 More Replies
ruoyuqian (New Contributor II)
  • 2843 Views
  • 6 replies
  • 1 kudos

Where are materialized views generated by Delta Live Tables stored?

I am trying to compare tables created by dbt in the catalog with the materialized views generated by Delta Live Tables, and I noticed that the dbt-generated table has storage location information pointing to a physical storage location; however, the mater...
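
One way to inspect both objects side by side, sketched with placeholder names; DESCRIBE EXTENDED prints whatever location metadata the catalog exposes:

# Compare the catalog metadata of the dbt table and the DLT materialized view.
# A materialized view managed by Delta Live Tables may not expose a
# user-visible storage location the way an ordinary table does.
spark.sql("DESCRIBE EXTENDED main.dbt_schema.my_dbt_table").show(truncate=False)
spark.sql("DESCRIBE EXTENDED main.dlt_schema.my_materialized_view").show(truncate=False)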

Latest Reply
TamD (Contributor)
  • 1 kudos

It would be good if Databricks would confirm the behaviour, rather than requiring us to make assumptions like this.

5 More Replies
ShresthaBaburam (New Contributor III)
  • 1317 Views
  • 1 reply
  • 1 kudos

Resolved! File Arrival Trigger in Azure Databricks

We are using Databricks in combination with Azure platforms, specifically working with Azure Blob Storage (Gen2). We frequently mount Azure containers in the Databricks file system and leverage external locations and volumes for Azure containers. Our ...

Latest Reply
Panda (Valued Contributor)
  • 1 kudos

@ShresthaBaburam We inquired about this a few days ago and checked with Databricks. They were working on the issue, but no ETA was provided. You can find more details here: Databricks Community Link. However, to address this use case, we followed the ...
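
For reference, a sketch of attaching a file arrival trigger to an existing job through the Jobs 2.1 API; the workspace URL, token, job ID, and storage URL are placeholders:

import requests

workspace_url = "https://<databricks-instance>"
token = "<your-databricks-token>"

# add a file arrival trigger so the job runs only when new files land
requests.post(
    f"{workspace_url}/api/2.1/jobs/update",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "job_id": 123,  # placeholder job ID
        "new_settings": {
            "trigger": {
                "file_arrival": {
                    "url": "abfss://<container>@<account>.dfs.core.windows.net/landing/"
                }
            }
        },
    },
)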

Yarden (New Contributor)
  • 970 Views
  • 1 reply
  • 0 kudos

Sync table A to table B, triggered by any change in table A.

Hey, I'm trying to find a way to sync table A to table B whenever table A is written to, just with a trigger on write. I want to avoid using any continuous runs or schedules. I'm trying to get this to work inside Databricks, without having to use any outsid...

Latest Reply
Panda (Valued Contributor)
  • 0 kudos

@Yarden For this use case, Databricks does not have built-in triggers directly tied to Delta table write operations, as seen in traditional databases. However, you can achieve this functionality using one of the following approaches:

Approach 1: File ...
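
One common way to implement the trigger-like sync (not necessarily the approach the truncated reply goes on to describe) is to enable Change Data Feed on table A and replay its changes into table B on demand; names and paths below are placeholders:

# requires Change Data Feed on the source table
spark.sql("ALTER TABLE catalog.schema.table_a "
          "SET TBLPROPERTIES (delta.enableChangeDataFeed = true)")

changes = (spark.readStream
           .option("readChangeFeed", "true")
           .table("catalog.schema.table_a"))

(changes.filter("_change_type IN ('insert', 'update_postimage')")
        .drop("_change_type", "_commit_version", "_commit_timestamp")
        .writeStream
        .option("checkpointLocation", "/Volumes/catalog/schema/checkpoints/a_to_b")
        .trigger(availableNow=True)  # catch up on new commits, then stop
        .toTable("catalog.schema.table_b"))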

Constantine (Contributor III)
  • 15358 Views
  • 3 replies
  • 7 kudos

Resolved! collect_list by preserving order based on another variable - Spark SQL

I am using a Databricks SQL notebook to run these queries. I have a Python UDF like:

%python

from pyspark.sql.functions import udf
from pyspark.sql.types import StringType, DoubleType, DateType

def get_sell_price(sale_prices):
    return sale_...

Latest Reply
villi77 (New Contributor II)
  • 7 kudos

I had a similar situation where I was trying to order the days of the week from Monday to Sunday. I saw solutions that use Python but wanted to do it all in SQL. My original attempt was to use: CONCAT_WS(',', COLLECT_LIST(DISTINCT t.LOAD_ORIG_...
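
The usual all-SQL trick here is to collect structs of (sort key, value), sort the array, and project the value back out. A sketch with illustrative table and column names:

ordered = spark.sql("""
    SELECT item_id,
           concat_ws(',',
             transform(
               array_sort(collect_list(struct(day_of_week_num, day_name))),
               x -> x.day_name)) AS days_mon_to_sun
    FROM sales
    GROUP BY item_id
""")
ordered.show(truncate=False)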

2 More Replies
Hubert-Dudek (Esteemed Contributor III)
  • 13540 Views
  • 4 replies
  • 4 kudos

Workflow timeout

Always set a timeout for your jobs! It not only safeguards against unforeseen hang-ups but also optimizes resource utilization. Equally essential is to consider having a threshold warning. This can alert you before a potential failure, allowing proac...
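
A sketch of what that looks like on a job through the Jobs 2.1 API: timeout_seconds enforces the hard stop, and a health rule on run duration gives the early warning. IDs and thresholds are placeholders:

import requests

workspace_url = "https://<databricks-instance>"
token = "<your-databricks-token>"

requests.post(
    f"{workspace_url}/api/2.1/jobs/update",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "job_id": 123,  # placeholder job ID
        "new_settings": {
            "timeout_seconds": 7200,  # hard stop after 2 hours
            "health": {
                "rules": [
                    # notify when a run exceeds 1 hour, before the timeout hits
                    {"metric": "RUN_DURATION_SECONDS", "op": "GREATER_THAN", "value": 3600}
                ]
            },
        },
    },
)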

Latest Reply
sparkplug (New Contributor III)
  • 4 kudos

We already have a policy, and users are running their jobs on clusters created with it. Since the policy is based on Power User compute rather than job compute, I am not able to set the job timeout_seconds.

3 More Replies
SS_RATH (New Contributor)
  • 3352 Views
  • 3 replies
  • 0 kudos

I have a notebook in my workspace. How can I find out which jobs reference this particular notebook?

I have a notebook in my workspace. How can I find out which jobs reference this particular notebook?

Latest Reply
Panda (Valued Contributor)
  • 0 kudos

@SS_RATH @TamD There are a couple of ways.

Call the Databricks REST API - Use the /api/2.1/jobs/list API to list and search through all jobs. Example:

import requests

workspace_url = "https://<databricks-instance>"
databricks_token = "<your-databricks-t...
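
A fuller sketch of that approach, paging through /api/2.1/jobs/list with expand_tasks and matching notebook paths; the URL, token, and target path are placeholders:

import requests

workspace_url = "https://<databricks-instance>"
databricks_token = "<your-databricks-token>"
target_path = "/Workspace/Users/you@example.com/my_notebook"

headers = {"Authorization": f"Bearer {databricks_token}"}
params = {"limit": 100, "expand_tasks": "true"}

while True:
    resp = requests.get(f"{workspace_url}/api/2.1/jobs/list",
                        headers=headers, params=params).json()
    for job in resp.get("jobs", []):
        for task in job.get("settings", {}).get("tasks", []):
            if task.get("notebook_task", {}).get("notebook_path") == target_path:
                print(job["job_id"], job["settings"].get("name"))
    if not resp.get("next_page_token"):
        break
    params["page_token"] = resp["next_page_token"]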

2 More Replies
anonymous_567 (New Contributor II)
  • 1638 Views
  • 1 reply
  • 0 kudos

Retrieve file size from Azure in Databricks

Hello, I am running a job that requires reading in files of different sizes, each one representing a different dataset, and loading them into a Delta table. Some files are as big as 100 GiB and others as small as 500 MiB. I want to repartition each fi...

Latest Reply
LindasonUk (New Contributor III)
  • 0 kudos

You could try using the dbutils file system utilities like this:

from pyspark.sql.functions import col, desc, input_file_name, regexp_replace

directory = 'abfss://<container>@<storage-account>.dfs.core.windows.net/path/to/data/root'

files_list = dbutils.fs.l...
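
Completing the idea: dbutils.fs.ls returns the size in bytes for each entry, which can drive the repartition count. The path, format, and 128 MB partition target are placeholder choices, and this assumes each listed entry is a single data file:

directory = 'abfss://<container>@<storage-account>.dfs.core.windows.net/path/to/data/root'
target_bytes = 128 * 1024 * 1024  # aim for ~128 MB partitions

for f in dbutils.fs.ls(directory):
    num_partitions = max(1, f.size // target_bytes)
    (spark.read.format("parquet").load(f.path)
          .repartition(num_partitions)
          .write.mode("append")
          .saveAsTable("catalog.schema.target_table"))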

CE (New Contributor II)
  • 1031 Views
  • 2 replies
  • 0 kudos

Resolved! How to read requirements.txt in Databricks workspace

Dear Databricks team, I want to know if there is a method in Databricks equivalent to pip install -r requirements.txt. The packages I want to install are listed in this path: /Workspace/Users/xxx@domain.com/databrick_requirement.txt. I have referred to the foll...

Latest Reply
Panda (Valued Contributor)
  • 0 kudos

@CE You can't directly access /Workspace paths as a traditional filesystem path. When you specify /Workspace/Users/xxx@domain.com/databrick_requirement.txt, %pip install cannot interpret it because the %pip magic command works with DBFS paths. Foll...
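
One way to act on that advice, sketched for runtimes where workspace files are mounted on the driver filesystem; the paths mirror the ones in the post but are otherwise placeholders:

import shutil

# copy the workspace file somewhere pip can read it
shutil.copy("/Workspace/Users/xxx@domain.com/databrick_requirement.txt",
            "/tmp/requirements.txt")

# then, in a separate notebook cell:
# %pip install -r /tmp/requirements.txt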

1 More Replies
plakshmi (New Contributor II)
  • 1729 Views
  • 4 replies
  • 0 kudos

Resolved! Unable to read data from on-prem SQL Server into Databricks

I am trying to read data from an on-prem SQL Server into a Databricks DataFrame but am facing com.microsoft.sqlserver.jdbc.SQLServerException: The TCP/IP connection to the host HYDNB875, port 1433 has failed. Error: "HYDNB875. Verify the connection prop...

Latest Reply
Panda (Valued Contributor)
  • 0 kudos

@plakshmi Along with what @Rishabh-Pandey mentioned, follow these additional steps:
  • If HYDNB875 can't be resolved, try using the server's IP address.
  • Check for network routing issues between Databricks and the SQL Server using traceroute or ping.
  • Review t...
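
For completeness, a sketch of the JDBC read with the hostname swapped for an IP address, as suggested above; every connection value is a placeholder:

df = (spark.read
      .format("jdbc")
      .option("url", "jdbc:sqlserver://<server-ip>:1433;databaseName=<db>;encrypt=true;trustServerCertificate=true")
      .option("dbtable", "dbo.<table>")
      .option("user", "<user>")
      .option("password", "<password>")
      .load())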

3 More Replies
Brad (Contributor II)
  • 695 Views
  • 1 reply
  • 0 kudos

Why are there many offsets in the checkpoint?

Hi team, I'm using trigger=availableNow to read a Delta table daily. The Delta table itself is loaded by Structured Streaming from Kinesis. I noticed there are many offsets under the checkpoint, and when the job starts to run to get data from the Delta table...

Latest Reply
Rishabh-Pandey (Esteemed Contributor)
  • 0 kudos

@Brad  When you see the batch IDs listed in the logs (e.g., 186, 187, 188,...), these correspond to the batches of data that have been processed. Each batch ID represents a specific point in time in the streaming process, where the data was ingested,...
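
If it helps to see them, the batch IDs are simply the file names under the checkpoint's offsets directory; the path is a placeholder:

# each file under offsets/ is named after a batch ID (186, 187, 188, ...)
for f in dbutils.fs.ls("/path/to/checkpoint/offsets"):
    print(f.name)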

elgeo (Valued Contributor II)
  • 5415 Views
  • 6 replies
  • 8 kudos

Clean up _delta_log files

Hello experts. We are trying to clarify how to clean up the large number of files accumulating in the _delta_log folder (JSON, CRC, and checkpoint files). We went through the related posts in the forum and followed the below: SET spark.da...
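
For reference, the retention-related table properties usually involved here; the 7-day values are illustrative rather than a recommendation from the thread, and older log files are only deleted when a new checkpoint is written, not immediately:

spark.sql("""
    ALTER TABLE catalog.schema.my_table SET TBLPROPERTIES (
      'delta.logRetentionDuration' = 'interval 7 days',
      'delta.checkpointRetentionDuration' = 'interval 7 days'
    )
""")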

Latest Reply
Brad (Contributor II)
  • 8 kudos

Awesome, thanks for the response.

5 More Replies
TinasheChinyati (New Contributor III)
  • 20500 Views
  • 6 replies
  • 4 kudos

Is Databricks capable of housing OLTP and OLAP?

Hi data experts. I currently have an OLTP database (Azure SQL DB) that keeps data only for the past 14 days. We use partition switching to achieve that and have an ETL process (Azure Data Factory) that feeds the data warehouse (Azure Synapse Analytics). My requ...

Latest Reply
Ben_dHont (New Contributor II)
  • 4 kudos

@ChrisCkx and @bsanoop Databricks is currently building OLTP database functionality, which is in private preview. It is a serverless PostgREST database. Documentation can be found here: [EXTERNAL] Online Tables REST - Private Preview Docume...

5 More Replies
DevGeek (New Contributor)
  • 884 Views
  • 1 reply
  • 0 kudos

Better Alternatives to ReadyAPI for API Testing?

I’m currently using ReadyAPI, mainly for API testing and some automation workflows, but I’m considering switching to something else. Has anyone here tried Apidog, Postman, or similar tools? I’m especially interested in how they compare in terms of pe...

Latest Reply
Brahmareddy (Honored Contributor III)
  • 0 kudos

Hi @DevGeek, how are you doing today? Consider trying Postman if you're looking for a robust tool with a wide range of features for API testing and automation. It’s known for its user-friendly interface and handles complex APIs and large datasets well,...

