Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

tonylax6
by New Contributor
  • 1604 Views
  • 0 replies
  • 0 kudos

Azure Databricks to Adobe Experience Platform

I'm using Azure Databricks and am attempting to stream near real-time data from Databricks into the Adobe Experience Platform to ingest into the AEP schema for profile enrichment. We are running into an issue with the API and streaming, so we are curr...

Data Engineering
Adobe
Adobe Experience Platform
CDP integration
andreasmherman
by New Contributor II
  • 4380 Views
  • 4 replies
  • 1 kudos

DLT SCD type 2 in bronze, silver and gold? Is it possible?

I have a question related to when we are using #DLT. Let me try to describe the DLT problem: Objective: Process data end-to-end (bronze, silver, gold) using DLT. Want bronze to hold a complete raw replica of the data, leveraging apply_changes SCD to wr...

Latest Reply
thedatacrew
New Contributor III
  • 1 kudos

Hi, at the moment my process is: I am using ETL (Data Factory) to land parquet files in a raw landing zone. I keep all the source data here, so I can fully rebuild the data if I need to, i.e. source_system/schema/table/loadon_year=2024/loadon_month=08/l...
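For reference, a minimal sketch of the apply_changes SCD type 2 pattern this thread is about, assuming a hypothetical parquet landing path, business key (customer_id) and ordering column (loaded_at); it is an illustration, not the poster's exact pipeline:

import dlt
from pyspark.sql.functions import col

# Stream the raw landing files as the change feed (hypothetical path and format).
@dlt.view
def customer_updates():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "parquet")
        .load("/Volumes/raw/source_system/customers/")
    )

# Bronze table that keeps full SCD type 2 history via apply_changes.
dlt.create_streaming_table("customers_bronze_scd2")

dlt.apply_changes(
    target="customers_bronze_scd2",
    source="customer_updates",
    keys=["customer_id"],           # hypothetical business key
    sequence_by=col("loaded_at"),   # hypothetical ordering column
    stored_as_scd_type=2,           # keeps __START_AT / __END_AT history columns
)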

3 More Replies
Sangeethagk
by New Contributor
  • 3515 Views
  • 1 reply
  • 0 kudos

TypeError: ColSpec.__init__() got an unexpected keyword argument 'required'

Hi Team, one of my customers is facing the issue below. Has anyone faced this issue before? Any help would be appreciated.
import mlflow
mlflow.set_registry_uri("databricks-uc")
catalog_name = "system"
embed = mlflow.pyfunc.spark_udf(spark, f"models:/system...

Latest Reply
viksuper555
New Contributor II
  • 0 kudos

Upgrade the version of the mlflow package. In 2.7.1 there is no such parameter (https://mlflow.org/docs/2.7.1/python_api/mlflow.types.html), while in the latest (2.17.0) there is (https://mlflow.org/docs/2.17.0/python_api/mlflow.types.html). %pip insta...
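A minimal notebook sketch of that fix, assuming a recent mlflow release is acceptable in the environment (run the %pip line and the restart in their own cells):

%pip install --upgrade "mlflow>=2.17.0"

# Restart the Python process so the upgraded package is picked up.
dbutils.library.restartPython()

import mlflow
mlflow.set_registry_uri("databricks-uc")
print(mlflow.__version__)  # should now report the upgraded version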

garf
by New Contributor III
  • 875 Views
  • 2 replies
  • 1 kudos

Resolved! Autoloader error using Unity Catalog

Hello! I'm new to Databricks and I'm exploring some of its features. I've successfully configured a workspace with Unity Catalog, one external storage location (ADLS Gen2) and the associated storage credential. I provided all privileges for all account u...

Latest Reply
Mo
Databricks Employee
  • 1 kudos

Hey @garf, could you please try to create an external volume using your external location and then use the file path in the volume as the input file path?
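A rough sketch of that suggestion, assuming a hypothetical catalog/schema/volume name and reusing the placeholder ADLS path style seen elsewhere on this board:

# One-time setup: expose the external location as a Unity Catalog volume.
spark.sql("""
  CREATE EXTERNAL VOLUME IF NOT EXISTS main.landing.raw_files
  LOCATION 'abfss://<container>@<storage-account>.dfs.core.windows.net/raw'
""")

# Point Auto Loader at the volume path instead of the raw abfss:// URI.
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/Volumes/main/landing/raw_files/_schemas/orders")
    .load("/Volumes/main/landing/raw_files/orders/")
)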

1 More Reply
Sahil0007
by New Contributor II
  • 790 Views
  • 2 replies
  • 1 kudos

Access issue on iPad Air M2

I am trying to run Databricks Community Edition on an iPad Air M2. It shows a blank page while logging in, but it works fine on my Android phone. Is there any compatibility issue with iOS? Please help me.

Latest Reply
JörgMalter_
New Contributor II
  • 1 kudos

Thank you for your response. While I understand that mobile browsers are currently not officially supported by Databricks, I believe this approach does not align with the expectations and needs of modern users. In today's digital landscape, it is ess...

1 More Reply
herbblinebury
by New Contributor II
  • 634 Views
  • 3 replies
  • 0 kudos

Call Azure Cognitive Services API from Notebook using Azure Entra ID that is logged on to Databricks

I would like to call the Azure Cognitive Services API from a notebook using the Azure Entra ID identity that is logged in to Databricks. The Cognitive Services key is not available, as local authentication for Cognitive Services is not enabled.

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

And what if you do not use a token but pass the credential in the Cognitive Services client (like the Text Analytics client)? It could also be a networking/firewall setting that prevents you from calling ACS. Permissions can also be the cause.
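A small sketch of that credential-based approach, assuming the azure-ai-textanalytics and azure-identity packages are installed and using a hypothetical resource endpoint:

from azure.identity import DefaultAzureCredential
from azure.ai.textanalytics import TextAnalyticsClient

# Hypothetical Cognitive Services / Language resource endpoint.
endpoint = "https://<your-resource>.cognitiveservices.azure.com/"

# Pass an Entra ID credential instead of an API key.
client = TextAnalyticsClient(endpoint=endpoint, credential=DefaultAzureCredential())

result = client.detect_language(documents=["Databricks notebooks can call Azure AI services."])
print(result[0].primary_language.name)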

2 More Replies
BaburamShrestha
by New Contributor II
  • 4048 Views
  • 7 replies
  • 4 kudos

File Arrival Trigger

We are using Databricks in combination with Azure platforms, specifically working with Azure Blob Storage (Gen2). We frequently mount Azure containers in the Databricks file system and leverage external locations and volumes for Azure containers. Our ...

Latest Reply
Panda
Valued Contributor
  • 4 kudos

@KrzysztofPrzyso Yes, we are now following this approach as an alternative solution, which involves combining Autoloader with Databricks File Arrival. In this case, you don't need to run a cluster all the time; instead, use the Autoloader config below. Pass ...
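A sketch of that pattern: a job fired by a file arrival trigger runs an Auto Loader stream in availableNow mode, so no cluster stays up between batches (the paths and table name below are hypothetical):

source_path = "/Volumes/main/landing/raw_files/events/"   # hypothetical external volume path

(
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/Volumes/main/landing/raw_files/_schemas/events")
    .load(source_path)
    .writeStream
    .option("checkpointLocation", "/Volumes/main/landing/raw_files/_checkpoints/events")
    .trigger(availableNow=True)    # process whatever has arrived, then shut down
    .toTable("main.bronze.events")
)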

6 More Replies
DmitriyLamzin
by New Contributor II
  • 5996 Views
  • 6 replies
  • 0 kudos

applyInPandas started to hang on the runtime 13.3 LTS ML and above

Hello, recently I tried to upgrade my runtime environment to 13.3 LTS ML and found that it breaks my workload during applyInPandas. My job started to hang during the applyInPandas execution. A thread dump shows that it hangs on direct memory allocation: ...

Data Engineering
pandas udf
Latest Reply
stifen
New Contributor II
  • 0 kudos

It's good.

5 More Replies
ruoyuqian
by New Contributor II
  • 2212 Views
  • 6 replies
  • 1 kudos

Where are materialized views generated by Delta Live Tables stored?

I am trying to compare tables created by dbt in the Catalog vs the materialized views generated by Delta Live Tables, and I noticed that the dbt-generated table has storage location information that points to a physical storage location; however, the mater...

Latest Reply
TamD
Contributor
  • 1 kudos

It would be good if Databricks would confirm the behaviour, rather than requiring us to make assumptions like this.

5 More Replies
ShresthaBaburam
by New Contributor III
  • 1064 Views
  • 1 reply
  • 1 kudos

Resolved! File Arrival Trigger in Azure Databricks

We are using Databricks in combination with Azure platforms, specifically working with Azure Blob Storage (Gen2). We frequently mount Azure containers in the Databricks file system and leverage external locations and volumes for Azure containers. Our ...

Latest Reply
Panda
Valued Contributor
  • 1 kudos

@ShresthaBaburam We inquired about this a few days ago and checked with Databricks. They were working on the issue, but no ETA was provided. You can find more details here: Databricks Community Link. However, to address this use case, we followed the ...

Yarden
by New Contributor
  • 783 Views
  • 1 reply
  • 0 kudos

Sync table A to table B, triggered by any change in table A.

Hey, I'm trying to find a way to sync table A to table B whenever table A is written to, just with a trigger on write. I want to avoid using any continuous runs or schedules. Trying to get this to work inside Databricks, without having to use any outsid...

Latest Reply
Panda
Valued Contributor
  • 0 kudos

@Yarden For this use case, Databricks does not have built-in triggers directly tied to Delta table write operations, as seen in traditional databases. However, you can achieve this functionality using one of the following approaches: Approach 1: File ...
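Purely as an illustration of one possible approach (not necessarily the ones truncated above), a minimal sketch using Delta Change Data Feed in a triggered, availableNow run; the table names, checkpoint path and CDF setup are assumptions:

# One-time setup: enable the change data feed on the source table.
spark.sql("ALTER TABLE main.demo.table_a SET TBLPROPERTIES (delta.enableChangeDataFeed = true)")

# Run whenever table A should be propagated to table B (e.g. from a job trigger).
(
    spark.readStream.format("delta")
    .option("readChangeFeed", "true")
    .table("main.demo.table_a")
    .filter("_change_type IN ('insert', 'update_postimage')")
    .drop("_change_type", "_commit_version", "_commit_timestamp")
    .writeStream
    .option("checkpointLocation", "/Volumes/main/demo/checkpoints/a_to_b")
    .trigger(availableNow=True)
    .toTable("main.demo.table_b")
)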

Constantine
by Contributor III
  • 14080 Views
  • 3 replies
  • 7 kudos

Resolved! collect_list by preserving order based on another variable - Spark SQL

I am using a Databricks SQL notebook to run these queries. I have a Python UDF like:
%python
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType, DoubleType, DateType

def get_sell_price(sale_prices):
    return sale_...

Latest Reply
villi77
New Contributor II
  • 7 kudos

I had a similar situation where I was trying to order the days of the week from Monday to Sunday. I saw solutions that use Python, but I wanted to do it all in SQL. My original attempt was to use: CONCAT_WS(',', COLLECT_LIST(DISTINCT t.LOAD_ORIG_...
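For readers landing on this thread, a small SQL-only sketch of preserving order inside the aggregate by sorting structs (run here via spark.sql; the table and column names are hypothetical):

ordered = spark.sql("""
  SELECT
    customer_id,
    -- sort by sale_date inside the aggregate, then keep only the price values
    transform(
      array_sort(collect_list(struct(sale_date, sale_price))),
      x -> x.sale_price
    ) AS prices_in_date_order
  FROM sales
  GROUP BY customer_id
""")
ordered.show(truncate=False)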

2 More Replies
Hubert-Dudek
by Esteemed Contributor III
  • 13152 Views
  • 4 replies
  • 4 kudos

Workflow timeout

Always set a timeout for your jobs! It not only safeguards against unforeseen hang-ups but also optimizes resource utilization. Equally essential is to consider having a threshold warning. This can alert you before a potential failure, allowing proac...
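A sketch of what that looks like when updating an existing job through the Jobs 2.1 API; the workspace URL, token, job ID and threshold values are placeholders:

import requests

resp = requests.post(
    "https://<databricks-instance>/api/2.1/jobs/update",
    headers={"Authorization": "Bearer <your-databricks-token>"},
    json={
        "job_id": 123,  # hypothetical job ID
        "new_settings": {
            "timeout_seconds": 3600,   # hard stop after one hour
            "health": {                # warning threshold before the hard stop
                "rules": [
                    {"metric": "RUN_DURATION_SECONDS", "op": "GREATER_THAN", "value": 1800}
                ]
            },
        },
    },
)
resp.raise_for_status()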

Latest Reply
sparkplug
New Contributor III
  • 4 kudos

We already have a policy, and users are using clusters created with it to run their jobs. Since the policies are based on Power user compute rather than job compute, I am not able to set the job timeout_seconds.

3 More Replies
SS_RATH
by New Contributor
  • 2841 Views
  • 3 replies
  • 0 kudos

I have a notebook in my workspace; how do I know in which jobs this particular notebook is referenced?

I have a notebook in my workspace; how do I know in which jobs this particular notebook is referenced?

Latest Reply
Panda
Valued Contributor
  • 0 kudos

@SS_RATH @TamD There are a couple of ways. Call the Databricks REST API - use the /api/2.1/jobs/list API to list and search through all jobs. Example:
import requests
workspace_url = "https://<databricks-instance>"
databricks_token = "<your-databricks-t...
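Filling out that idea a bit, a sketch that pages through /api/2.1/jobs/list and matches tasks against a hypothetical notebook path (workspace URL and token remain placeholders):

import requests

workspace_url = "https://<databricks-instance>"
databricks_token = "<your-databricks-token>"
target_notebook = "/Workspace/Users/someone@example.com/my_notebook"  # hypothetical path

headers = {"Authorization": f"Bearer {databricks_token}"}
params = {"expand_tasks": "true", "limit": 25}
matching_jobs = []

while True:
    resp = requests.get(f"{workspace_url}/api/2.1/jobs/list", headers=headers, params=params)
    resp.raise_for_status()
    payload = resp.json()
    for job in payload.get("jobs", []):
        for task in job.get("settings", {}).get("tasks", []):
            if task.get("notebook_task", {}).get("notebook_path") == target_notebook:
                matching_jobs.append(job["settings"]["name"])
    next_token = payload.get("next_page_token")
    if not next_token:
        break
    params["page_token"] = next_token

print(matching_jobs)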

2 More Replies
anonymous_567
by New Contributor II
  • 1244 Views
  • 1 reply
  • 0 kudos

Retrieve file size from Azure in Databricks

Hello, I am running a job that requires reading in files of different sizes, each one representing a different dataset, and loading them into a Delta table. Some files are as big as 100 GiB and others as small as 500 MiB. I want to repartition each fi...

Latest Reply
LindasonUk
New Contributor III
  • 0 kudos

You could try utilising the dbutils file service like this:
from pyspark.sql.functions import col, desc, input_file_name, regexp_replace
directory = 'abfss://<container>@<storage-account>.dfs.core.windows.net/path/to/data/root'
files_list = dbutils.fs.l...
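Continuing that idea, a sketch that uses the sizes returned by dbutils.fs.ls to pick a per-file partition count before loading; the root path, target partition size, file format and target table are assumptions:

directory = "abfss://<container>@<storage-account>.dfs.core.windows.net/path/to/data/root"
target_partition_bytes = 128 * 1024 * 1024   # aim for roughly 128 MiB per partition

for f in dbutils.fs.ls(directory):
    if f.isDir():
        continue
    num_partitions = max(1, f.size // target_partition_bytes)
    df = spark.read.parquet(f.path).repartition(num_partitions)   # assuming parquet inputs
    df.write.mode("append").saveAsTable("main.bronze.datasets")   # hypothetical target table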

