Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

hari
by Contributor
  • 3041 Views
  • 4 replies
  • 3 kudos

Multiple streaming sources to the same delta table

Is it possible to have two streaming sources doing MERGE into the same Delta table, with each source setting a different set of fields? We are trying to create a single table which will be used by the service layer for queries. The table can be populat...

Latest Reply
Kaniz_Fatma
Community Manager
  • 3 kudos

Hi @Zachary Higgins, we haven't heard from you since the last response from @Harikrishnan P H, and I was checking back to see if you have a resolution. If you have a solution, please do share it with the community, as it can be helpful to ot...

3 More Replies
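A minimal sketch of the pattern the question describes, assuming two already-defined streaming DataFrames (`stream_a`, `stream_b`) and a hypothetical target table `service.customer_profile`: each stream runs its own MERGE inside foreachBatch and updates only the columns it owns. Note that concurrent MERGEs into the same Delta table can raise ConcurrentAppendException, so in practice the sources either target disjoint partitions or retry on conflict.

```python
from delta.tables import DeltaTable

def make_upsert(update_cols):
    # Build a foreachBatch handler that merges on `id` and updates
    # only the columns owned by this stream.
    def upsert(batch_df, batch_id):
        target = DeltaTable.forName(spark, "service.customer_profile")
        (target.alias("t")
               .merge(batch_df.alias("s"), "t.id = s.id")
               .whenMatchedUpdate(set={c: f"s.{c}" for c in update_cols})
               .whenNotMatchedInsertAll()
               .execute())
    return upsert

# Stream A owns contact fields; stream B owns order fields.
stream_a.writeStream.foreachBatch(make_upsert(["email", "phone"])) \
    .option("checkpointLocation", "/chk/stream_a").start()
stream_b.writeStream.foreachBatch(make_upsert(["last_order_ts", "order_count"])) \
    .option("checkpointLocation", "/chk/stream_b").start()
```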
MeghashreeM
by New Contributor III
  • 2415 Views
  • 3 replies
  • 5 kudos

org.apache.spark.sql.AnalysisException: Non-time-based windows are not supported on streaming DataFrames/Datasets


Latest Reply
Kaniz_Fatma
Community Manager
  • 5 kudos

Hi @MeghashreeM! My name is Kaniz, and I'm a technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers on the forum have an answer to your question first; otherwise, I will follow up shortly with a response.

2 More Replies
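For context, this error appears when a row-based Window spec (e.g., row_number over partitionBy) is applied to a streaming DataFrame; Structured Streaming only supports time-based windows. A minimal sketch of the usual fix, assuming a streaming DataFrame `events` with an `event_time` timestamp column:

```python
from pyspark.sql.functions import window, count

# Aggregate over a time-based window instead of a row-based Window spec.
agg = (events
       .withWatermark("event_time", "10 minutes")
       .groupBy(window("event_time", "5 minutes"), "device_id")
       .agg(count("*").alias("events_per_window")))
```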
chanansh
by Contributor
  • 3952 Views
  • 9 replies
  • 9 kudos

copy files from azure to s3

I am trying to copy files from Azure to S3. I've created a solution by comparing file lists, copying manually to a temp file, and uploading. However, I just found Auto Loader and I would like to use that: https://docs.databricks.com/ingestion/auto-loader/i...

Latest Reply
Falokun
New Contributor II
  • 9 kudos

Just use tools like Goodsync and Gs Richcopy 360 to copy directly from Blob to S3; I think you will never face problems like that.

8 More Replies
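A hedged sketch of the Auto Loader approach the asker links to, with placeholder paths: Auto Loader picks up files landing in the Azure container as a stream, and the stream is persisted to S3. Note that this produces a Delta table of file contents on S3, not a byte-for-byte file copy; for a literal copy, the sync tools suggested in the reply are simpler.

```python
# Ingest new files from an Azure container as they arrive...
df = (spark.readStream.format("cloudFiles")
      .option("cloudFiles.format", "binaryFile")   # keep bytes opaque; no parsing
      .load("abfss://source@myaccount.dfs.core.windows.net/landing/"))

# ...and continuously persist them to S3 as a Delta table.
(df.writeStream
   .format("delta")
   .option("checkpointLocation", "s3://my-bucket/_chk/landing_copy")
   .start("s3://my-bucket/landing_copy/"))
```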
Nhan_Nguyen
by Valued Contributor
  • 1324 Views
  • 1 reply
  • 6 kudos

Resolved! Is a view's logic executed when we create the view?

Hi all, I have a small question about VIEWs on Databricks. Could anyone please help me clarify this? In a normal database like Postgres or MS SQL, when we define a view, the logic is not executed at that time; it only runs when we query that VIEW. Not sure how VIEW...

Latest Reply
daniel_sahal
Esteemed Contributor
  • 6 kudos

@Nhan Nguyen It works the same. CREATE VIEW constructs a virtual table that has no physical data.

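A quick way to confirm this behaviour in a notebook: CREATE VIEW stores only the query text, and the SELECT runs when the view is read.

```python
spark.sql("CREATE OR REPLACE VIEW demo_v AS SELECT id, id * 2 AS doubled FROM range(5)")
spark.table("demo_v").show()  # the view's SELECT executes here, not at CREATE time
```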
Archana
by New Contributor
  • 3145 Views
  • 3 replies
  • 0 kudos

What are the metrics to be considered for monitoring Databricks?

I am very new to Databricks, just getting set up. I would like to explore various features of Databricks and start playing around with the environment. I am curious to know which metrics should be considered for monitoring the complete ...

Latest Reply
jessykoo32
New Contributor II
  • 0 kudos

Databricks is a powerful platform for data engineering, machine learning, and analytics, and it is important to monitor the performance and health of your Databricks environment to ensure that it is running smoothly. Here are a few key metrics that yo...

2 More Replies
Karthick
by New Contributor III
  • 2573 Views
  • 3 replies
  • 6 kudos

java.lang.NoClassDefFoundError: Could not initialize class org.locationtech.rasterframes.ref.RFRasterSource$

After installing the latest pyrasterframes (v0.10.1) on Azure Databricks 9.1 LTS (includes Apache Spark 3.1.2, Scala 2.12), I can create a Spark session, read the raster data, and print the schema. However, when I try to perform any actions on the dat...

Latest Reply
Nobusuke_Hanaga
New Contributor II
  • 6 kudos

Hi @Karthick Narendran, I got the same error as you. I tried every branch in the locationtech repository but failed. Then, luckily, I found a RasterFrames branch for Databricks here: https://github.com/mjohns-databricks/rasterframes/tree/0.10.2-databricks...

2 More Replies
najmead
by Contributor
  • 1208 Views
  • 2 replies
  • 2 kudos

SQL Warehouse Configuration Tweaking

I'm new to setting up a DB environment and have accumulated a couple of questions around configuring a SQL Warehouse. 1. When creating a SQL warehouse, the smallest size is 2X-Small, which is 4 DBU. The pricing calculator (for Azure) implies you can c...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Docs do show that it uses E8d as you wrote. SQL Warehouses are a different type of compute than All Purpose or Jobs clusters. SQL warehouses always use Photon. All Purpose and Jobs clusters are used for things such as notebooks or Delta Live Tab...

1 More Replies
AmithAdiraju16
by New Contributor II
  • 1600 Views
  • 4 replies
  • 1 kudos

How to read feature table without target_df / online inference based on filter_condition in databricks feature store

I'm using databricks feature store == 0.6.1. After I register my feature table with `create_feature_table` and write data with `write_Table`, I want to read that feature table based on filter conditions (maybe on a timestamp column) without calling ...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 1 kudos

create_training_set is just a simple SELECT from Delta tables. All feature tables are just registered Delta tables. Here is example code that I used to handle that: customer_features_df = spark.sql("SELECT * FROM recommender_system.customer_fea...

3 More Replies
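A minimal sketch of what the reply describes, assuming Feature Store 0.6.x and hypothetical table/column names: `read_table` returns the feature table as an ordinary DataFrame, so normal filters apply without building a training set.

```python
from databricks.feature_store import FeatureStoreClient

fs = FeatureStoreClient()
# Read the whole feature table, then filter on a timestamp column.
features = fs.read_table("recommender_system.customer_features")
recent = features.filter("feature_ts >= '2023-01-01'")
```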
Gim
by Contributor
  • 3439 Views
  • 2 replies
  • 1 kudos

Resolved! How to use SQL UDFs for Delta Live Table pipelines?

I've been searching for a way to use a SQL UDF for our DLT pipeline. In this case it is to convert a time duration string into INT seconds. How exactly do we use/apply UDFs in this case?

Latest Reply
daniel_sahal
Esteemed Contributor
  • 1 kudos

@Gim You can create a Python UDF and then use it in SQL: https://docs.databricks.com/workflows/delta-live-tables/delta-live-tables-cookbook.html#use-python-udfs-in-sql

1 More Replies
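A hedged sketch of the cookbook approach the reply links to: define and register a Python UDF in a notebook attached to the DLT pipeline, then call it from the pipeline's SQL. The "HH:MM:SS" duration format is an assumption based on the question.

```python
from pyspark.sql.types import IntegerType

def duration_to_seconds(d: str) -> int:
    # Convert an "HH:MM:SS" duration string to whole seconds.
    h, m, s = d.split(":")
    return int(h) * 3600 + int(m) * 60 + int(s)

spark.udf.register("duration_to_seconds", duration_to_seconds, IntegerType())
```

The pipeline's SQL can then call it directly, e.g. `SELECT duration_to_seconds(duration) AS duration_s FROM STREAM(live.raw_events)` (table and column names hypothetical).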
Bartek
by Contributor
  • 2669 Views
  • 0 replies
  • 1 kudos

How to pass all dag_run.conf parameters to python_wheel_task

I want to trigger a Databricks job from Airflow using DatabricksSubmitRunDeferrableOperator, and I need to pass configuration params. Here is an excerpt from my code (the definition is not complete, only crucial properties): from airflow.providers.databricks.op...

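Since the thread has no replies, here is a hedged sketch rather than a confirmed answer: the operator's `json` argument is Jinja-templated, so one workaround is to serialize the entire `dag_run.conf` into a single argument that the wheel's entry point parses itself. The cluster spec, package, and entry point names below are placeholders.

```python
from airflow.providers.databricks.operators.databricks import (
    DatabricksSubmitRunDeferrableOperator,
)

run_wheel = DatabricksSubmitRunDeferrableOperator(
    task_id="run_wheel",
    json={
        "new_cluster": {
            "spark_version": "11.3.x-scala2.12",
            "node_type_id": "Standard_DS3_v2",
            "num_workers": 2,
        },
        "python_wheel_task": {
            "package_name": "my_package",
            "entry_point": "main",
            # tojson renders the whole conf dict as one JSON string argument.
            "parameters": ["--conf-json", "{{ dag_run.conf | tojson }}"],
        },
    },
)
```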
mala
by New Contributor III
  • 2193 Views
  • 3 replies
  • 2 kudos

Resolved! Unable to reproduce Kmeans Clustering results even after setting seed and tolerance

Hi, I have been trying to reproduce KMeans results with no luck. Here is my code snippet: from pyspark.ml.clustering import KMeans; KMeans(featuresCol=featuresCol, k=clusters, maxIter=40, seed=1, tol=.00001). Can anyone help?

Latest Reply
mala
New Contributor III
  • 2 kudos

This issue was due to Spark parallelization, which doesn't guarantee the same data is assigned to each partition. I was able to resolve this by making sure the same data is assigned to the same partitions: df.repartition(num_partitions, "ur_col_id")d...

2 More Replies
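A minimal sketch of the accepted fix, with illustrative column names and partition count: pin the data-to-partition assignment before fitting so repeated runs train on identical partitions.

```python
from pyspark.ml.clustering import KMeans

# Deterministic partitioning: rows with the same key always land together.
df_fixed = df.repartition(16, "ur_col_id").cache()

km = KMeans(featuresCol="features", k=8, maxIter=40, seed=1, tol=1e-5)
model = km.fit(df_fixed)
```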
antoooks
by New Contributor III
  • 5345 Views
  • 6 replies
  • 10 kudos

Resolved! Databricks clusters stuck on Pending and Terminating state indefinitely

Hi everyone, our company is using Databricks on GKE. It worked fine until suddenly, when we tried to create and terminate clusters today, they got stuck in the Pending and Terminating states for hours (now more than 6 hours). No conclusion can be drawn ...

(screenshot attached)
Latest Reply
Databricks_Buil
New Contributor III
  • 10 kudos

Hi @Kurnianto Trilaksono Sutjipto: figured out after multiple connects that this is typically a cloud provider issue. You can file a support ticket if the issue persists.

5 More Replies
Anonymous
by Not applicable
  • 10302 Views
  • 3 replies
  • 1 kudos

Cluster in Pending State for long time

Pending for a long time at this stage “Finding instances for new nodes, acquiring more instances if necessary”. How can this be fixed?

Latest Reply
Databricks_Buil
New Contributor III
  • 1 kudos

Figured out after multiple connects that this is typically a cloud provider issue. You can file a support ticket if the issue persists.

2 More Replies
elgeo
by Valued Contributor II
  • 4099 Views
  • 3 replies
  • 3 kudos

Resolved! Trigger on a table

Hello! Is there an equivalent of CREATE TRIGGER on a table in Databricks SQL?

CREATE TRIGGER [schema_name.]trigger_name
ON table_name
AFTER {[INSERT],[UPDATE],[DELETE]}
[NOT FOR REPLICATION]
AS {sql_statements}

Thank you in advance!

Latest Reply
AdrianLobacz
Contributor
  • 3 kudos

You can try Auto Loader. Auto Loader supports two modes for detecting new files: directory listing and file notification. Directory listing: Auto Loader identifies new files by listing the input directory. Directory listing mode allows you to quickly ...

2 More Replies
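Databricks SQL has no CREATE TRIGGER. Beyond Auto Loader (which reacts to new files), a common trigger-like pattern for reacting to table changes is Delta Change Data Feed; a hedged sketch with a hypothetical table name:

```python
# Enable the change feed on the table once.
spark.sql("""
    ALTER TABLE my_schema.orders
    SET TBLPROPERTIES (delta.enableChangeDataFeed = true)
""")

# Stream the row-level changes; each row carries a _change_type column
# (insert / update_preimage / update_postimage / delete) that a foreachBatch
# handler can react to, much like an AFTER trigger body would.
changes = (spark.readStream.format("delta")
           .option("readChangeFeed", "true")
           .table("my_schema.orders"))
```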
829023
by New Contributor
  • 522 Views
  • 1 reply
  • 0 kudos

Fail to load excel data(timeout) in databricks sample notebook

I'm working with the sample notebook named '1_Customer Lifetimes.py' in https://github.com/databricks-industry-solutions/customer-lifetime-value. In the notebook, there is code like this: `%run "./config/Data Extract"`. This loads Excel data; however, it occu...

Latest Reply
daniel_sahal
Esteemed Contributor
  • 0 kudos

@Seungsu Lee It could be a destination host issue, a configuration issue, or a network issue. Hard to guess; first check whether your cluster has access to the public internet by running this command: %sh ping -c 2 google.com
