Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Chris_Konsur
by New Contributor III
  • 8647 Views
  • 1 replies
  • 0 kudos

Resolved! Configuring the Databricks Job APIs and I get Error 403 User not authorized.

I’m configuring the Databricks Job APIs and I get Error 403 User not authorized. I found out the issue is that I need to apply a rule and set API permissions for Azure Databricks: Azure Portal > Azure Databricks > Azure Databricks Service > Access control (IAM...

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 0 kudos

For that particular job, the user who is trying to start it needs view and run (or manage) permission on the job. Please grant the required permission and it will work.
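
For reference, a minimal sketch of granting run permission through the Databricks Permissions API (the workspace URL, token, job ID, and user email below are all placeholders):

import requests

host = "https://adb-1234567890123456.7.azuredatabricks.net"  # placeholder workspace URL
token = "<personal-access-token>"                            # placeholder PAT
job_id = 123                                                 # hypothetical job ID

# Grant CAN_MANAGE_RUN on the job so the user can trigger runs via the Jobs API
resp = requests.patch(
    f"{host}/api/2.0/permissions/jobs/{job_id}",
    headers={"Authorization": f"Bearer {token}"},
    json={"access_control_list": [
        {"user_name": "user@example.com", "permission_level": "CAN_MANAGE_RUN"}
    ]},
)
resp.raise_for_status()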

asif5494
by New Contributor III
  • 2295 Views
  • 3 replies
  • 0 kudos

preActions in Databricks while writing into a Google BigQuery table?

I am writing into a Google BigQuery table using append mode. I need to delete the current day's data before writing the new data. I just want to know if there is any preActions parameter that can be used to first delete data before writing into the table? Below is the s...

Latest Reply
Cami
Contributor III
  • 0 kudos

Can you use overwrite mode instead of append?
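
A minimal sketch of what that could look like with the Spark BigQuery connector, scoping the overwrite to one date partition so only the current day's data is replaced (the table name, staging bucket, and partition value are assumptions; as far as I know the connector has no preActions option):

# Overwrite only today's partition instead of appending after a manual delete
(df.write.format("bigquery")
    .option("table", "my_project.my_dataset.my_table")  # hypothetical target table
    .option("temporaryGcsBucket", "my-temp-bucket")     # staging bucket the connector writes through
    .option("datePartition", "20230101")                # partition to replace, yyyyMMdd
    .mode("overwrite")
    .save())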

2 More Replies
Neli
by New Contributor III
  • 4377 Views
  • 2 replies
  • 0 kudos

How to add the current date as a column in Databricks

I am trying to create a new column "Ingest_date" in a table, which should contain the current date. I am getting the error "Current date cannot be used in a generated column". Can you please review and suggest an alternative to get the current date into the Delta table.

Latest Reply
daniel_sahal
Esteemed Contributor
  • 0 kudos

A generation expression can use any SQL functions in Spark that always return the same result when given the same argument values. Source: https://docs.delta.io/latest/delta-batch.html#use-generated-columns. It means that it's intended not to work. You ca...
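
As a workaround, the ingest date can be stamped at write time instead of in a generated column; a minimal sketch (the table and column names are assumptions):

from pyspark.sql import functions as F

# current_date() is evaluated when the write runs, so no generated column is needed
df_with_date = df.withColumn("Ingest_date", F.current_date())
df_with_date.write.format("delta").mode("append").saveAsTable("my_schema.my_table")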

1 More Replies
hari
by Contributor
  • 4448 Views
  • 3 replies
  • 3 kudos

Multiple streaming sources to the same delta table

Is it possible to have two streaming sources doing Merge into the same delta table, with each source setting a different set of fields? We are trying to create a single table which will be used by the service layer for queries. The table can be populat...

Latest Reply
hari
Contributor
  • 3 kudos

Hi @Zachary Higgins Thanks for the reply. Currently, we are also using Trigger.once so that we can handle the merge stream dependencies properly. But I was wondering whether we can scale our pipeline to be streaming by changing the Trigger duration in t...
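
A minimal sketch of one of the two sources merging its own fields into the shared table via foreachBatch (table, key, and column names are assumptions; note that concurrent merges into the same Delta table can raise ConcurrentAppendException and may need retries or partition isolation):

from delta.tables import DeltaTable

def merge_source_a(batch_df, batch_id):
    # Each source updates only its own set of fields on the shared table
    target = DeltaTable.forName(spark, "service_layer.shared_table")  # hypothetical table
    (target.alias("t")
        .merge(batch_df.alias("s"), "t.id = s.id")
        .whenMatchedUpdate(set={"field_a": "s.field_a"})
        .whenNotMatchedInsertAll()
        .execute())

(spark.readStream.table("source_a")              # hypothetical streaming source
    .writeStream
    .foreachBatch(merge_source_a)
    .option("checkpointLocation", "/chk/source_a")
    .trigger(once=True)                          # swap for processingTime="1 minute" to run continuously
    .start())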

2 More Replies
chanansh
by Contributor
  • 6306 Views
  • 9 replies
  • 9 kudos

Copy files from Azure to S3

I am trying to copy files from Azure to S3. I've created a solution by comparing file lists and copying manually to a temp file and uploading. However, I just found Auto Loader and I would like to use that: https://docs.databricks.com/ingestion/auto-loader/i...

Latest Reply
Falokun
New Contributor II
  • 9 kudos

Just use tools like GoodSync or GS RichCopy 360 to copy directly from blob storage to S3; I think you will never face problems like that.
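
For the Auto Loader route the question mentions, a minimal sketch (the paths, source file format, and credential setup are assumptions; note this lands the data as a Delta table on S3 rather than making a byte-for-byte file copy):

(spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "parquet")                           # assumed source file format
    .load("abfss://container@account.dfs.core.windows.net/input/")    # hypothetical Azure path
    .writeStream
    .format("delta")
    .option("checkpointLocation", "s3://my-bucket/checkpoints/copy_job/")  # hypothetical
    .trigger(availableNow=True)                                       # process what's there, then stop
    .start("s3://my-bucket/output/"))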

8 More Replies
Nhan_Nguyen
by Valued Contributor
  • 1782 Views
  • 1 replies
  • 6 kudos

Resolved! Does the logic execute when we create a VIEW?

Hi all, I am a little curious about VIEWs on Databricks. Could anyone please help me clarify this? In a normal database like Postgres or MS SQL, when we define a view, the logic is not executed at that time; it only runs when we query that VIEW. Not sure how VIEW...

Latest Reply
daniel_sahal
Esteemed Contributor
  • 6 kudos

@Nhan Nguyen It works the same: CREATE VIEW constructs a virtual table that has no physical data.
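
As a quick illustration, a sketch with hypothetical table names (the aggregation only runs when the view is queried):

# Defining the view stores only the query text; nothing is executed yet
spark.sql("""
    CREATE OR REPLACE VIEW sales_summary AS
    SELECT region, SUM(amount) AS total_amount
    FROM sales
    GROUP BY region
""")

# The underlying query actually runs only here, at query time
spark.sql("SELECT * FROM sales_summary").show()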

Archana
by New Contributor
  • 4771 Views
  • 2 replies
  • 0 kudos

What are the metrics to be considered for monitoring Databricks?

I am very new to Databricks, just getting set up. I would like to explore various features of Databricks and start playing around with the environment. I am curious to know which metrics should be considered for monitoring the complete ...

Latest Reply
jessykoo32
New Contributor II
  • 0 kudos

Databricks is a powerful platform for data engineering, machine learning, and analytics, and it is important to monitor the performance and health of your Databricks environment to ensure that it is running smoothly. Here are a few key metrics that yo...

1 More Replies
Karthick
by New Contributor III
  • 3536 Views
  • 3 replies
  • 6 kudos

java.lang.NoClassDefFoundError: Could not initialize class org.locationtech.rasterframes.ref.RFRasterSource$

After installing the latest pyrasterframes (v0.10.1) on Azure Databricks 9.1 LTS (includes Apache Spark 3.1.2, Scala 2.12), I can create a Spark session, read the raster data and print the schema. However, when I try to perform any actions on the dat...

Latest Reply
Nobusuke_Hanaga
New Contributor II
  • 6 kudos

Hi @Karthick Narendran I got the same error as you. I tried every branch in the locationtech repository but failed. Then luckily I found a rasterframes branch for Databricks here: https://github.com/mjohns-databricks/rasterframes/tree/0.10.2-databricks...

2 More Replies
najmead
by Contributor
  • 1886 Views
  • 2 replies
  • 2 kudos

SQL Warehouse Configuration Tweaking

I'm new to setting up a DB environment, and have accumulated a couple of questions around configuring a SQL Warehouse. 1. When creating a SQL warehouse, the smallest size is 2X-Small, which is 4 DBU. The pricing calculator (for Azure) implies you can c...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Docs do show that it uses E8d as you wrote. SQL Warehouses are a different type of compute than All Purpose or Jobs clusters. SQL warehouses always use Photon. All Purpose and Jobs clusters are used for things such as notebooks or Delta Live Tab...

1 More Replies
AmithAdiraju16
by New Contributor II
  • 2390 Views
  • 4 replies
  • 1 kudos

How to read a feature table without target_df / online inference based on filter_condition in Databricks Feature Store

I'm using databricks feature store == 0.6.1. After I register my feature table with `create_feature_table` and write data with `write_Table`, I want to read that feature table based on filter conditions (maybe on a timestamp column) without calling ...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 1 kudos

create_training_set is just a simple SELECT from Delta tables. All feature tables are just registered Delta tables. Here is an example code that I used to handle that: customer_features_df = spark.sql("SELECT * FROM recommender_system.customer_fea...
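
Along the same lines, a minimal sketch of reading a feature table directly and filtering it, using the table name from the reply and a hypothetical timestamp column:

from databricks.feature_store import FeatureStoreClient

fs = FeatureStoreClient()

# read_table returns a plain DataFrame backed by the underlying Delta table,
# so ordinary filters work without create_training_set or a target_df
features_df = fs.read_table(name="recommender_system.customer_features")
recent = features_df.filter(features_df.event_ts >= "2023-01-01")  # hypothetical timestamp column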

3 More Replies
Gim
by Contributor
  • 4664 Views
  • 2 replies
  • 1 kudos

Resolved! How to use SQL UDFs for Delta Live Table pipelines?

I've been searching for a way to use a SQL UDF for our DLT pipeline. In this case it is to convert a time duration string into INT seconds. How exactly do we use/apply UDFs in this case?

Latest Reply
daniel_sahal
Esteemed Contributor
  • 1 kudos

@Gim You can create a Python UDF and then use it in SQL: https://docs.databricks.com/workflows/delta-live-tables/delta-live-tables-cookbook.html#use-python-udfs-in-sql
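
Following that cookbook approach, a minimal sketch: register the UDF in a Python notebook of the pipeline, then call it from the SQL definitions (the "HH:MM:SS" duration format and all names here are assumptions):

from pyspark.sql.types import IntegerType

def duration_to_seconds(duration):
    # hypothetical "HH:MM:SS" duration string -> total seconds
    h, m, s = (int(part) for part in duration.split(":"))
    return h * 3600 + m * 60 + s

# Registering makes the function callable from SQL in the same pipeline
spark.udf.register("duration_to_seconds", duration_to_seconds, IntegerType())

# Then, in a SQL notebook of the same DLT pipeline:
# CREATE OR REFRESH LIVE TABLE cleaned AS
# SELECT duration_to_seconds(duration_str) AS duration_sec FROM LIVE.raw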

1 More Replies
Bartek
by Contributor
  • 3069 Views
  • 0 replies
  • 1 kudos

How to pass all dag_run.conf parameters to python_wheel_task

I want to trigger a Databricks job from Airflow using DatabricksSubmitRunDeferrableOperator and I need to pass configuration params. Here is an excerpt from my code (the definition is not complete, only crucial properties): from airflow.providers.databricks.op...
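
Since this one has no replies yet: one possible approach is to template the task's named_parameters with dag_run.conf. A sketch under the assumptions that the wheel's entry point accepts named parameters and the DAG sets render_template_as_native_obj=True so the conf renders as a real dict (cluster spec, package, and entry point are placeholders):

from airflow import DAG
from airflow.providers.databricks.operators.databricks import DatabricksSubmitRunDeferrableOperator
import pendulum

with DAG(
    dag_id="trigger_databricks_wheel",
    start_date=pendulum.datetime(2023, 1, 1),
    schedule=None,
    render_template_as_native_obj=True,  # render "{{ dag_run.conf }}" as a dict, not a string
) as dag:
    run_wheel = DatabricksSubmitRunDeferrableOperator(
        task_id="run_wheel",
        databricks_conn_id="databricks_default",
        json={
            "run_name": "wheel-run",
            "new_cluster": {
                "spark_version": "11.3.x-scala2.12",  # placeholder
                "node_type_id": "Standard_DS3_v2",    # placeholder
                "num_workers": 1,
            },
            "python_wheel_task": {
                "package_name": "my_package",              # hypothetical
                "entry_point": "main",                     # hypothetical
                "named_parameters": "{{ dag_run.conf }}",  # all trigger-time params, rendered at runtime
            },
        },
    )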

mala
by New Contributor III
  • 3129 Views
  • 3 replies
  • 2 kudos

Resolved! Unable to reproduce Kmeans Clustering results even after setting seed and tolerance

Hi, I have been trying to reproduce KMeans results with no luck. Here is my code snippet:
from pyspark.ml.clustering import KMeans
KMeans(featuresCol=featuresCol, k=clusters, maxIter=40, seed=1, tol=.00001)
Can anyone help?

Latest Reply
mala
New Contributor III
  • 2 kudos

This issue was due to Spark parallelization, which doesn't guarantee the same data is assigned to each partition. I was able to resolve this by making sure the same data is assigned to the same partitions:
df.repartition(num_partitions, "ur_col_id")d...
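
Putting the fix together with the original snippet, as a sketch (the partition count is an arbitrary placeholder; the column name and parameters come from the thread):

from pyspark.ml.clustering import KMeans

# Pin rows to partitions by a stable key so every run clusters identical partitions
df = df.repartition(16, "ur_col_id")  # 16 is a placeholder partition count

kmeans = KMeans(featuresCol=featuresCol, k=clusters, maxIter=40, seed=1, tol=0.00001)
model = kmeans.fit(df)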

2 More Replies
