Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Forum Posts

jonxu
by New Contributor III
  • 3576 Views
  • 2 replies
  • 1 kudos

Resolved! Streaming vs batch, unbounded vs bounded

Can anyone help me understand why we cannot unify streaming with batch, and unbounded with bounded, if we regard streaming/unbounded as a mini-version of batch/bounded? I.e., if I set one second as the frequency for batch processing, will it...

Latest Reply
jonxu
New Contributor III
  • 1 kudos

Many thanks for the clarification!

1 More Replies
PremPrakash
by New Contributor II
  • 1505 Views
  • 2 replies
  • 1 kudos

Resolved! Using instance profile for sns message publish with PassRole

Hi, I want to attach an instance profile to the compute and publish messages to SNS without using credentials. Is that possible? Has anyone used it? Will Boto3 support it?

Latest Reply
PremPrakash
New Contributor II
  • 1 kudos

Yes, I have tried it and it is working.

1 More Replies
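Since the thread confirms this works, here is a minimal sketch of what the publishing code can look like on a cluster whose instance profile grants sns:Publish. The topic ARN, region, and payload shape below are illustrative, not from the thread:

```python
import json

def build_sns_publish_args(topic_arn: str, payload: dict) -> dict:
    # Build the keyword arguments for the SNS Publish call.
    return {"TopicArn": topic_arn, "Message": json.dumps(payload)}

def publish(topic_arn: str, payload: dict) -> str:
    # boto3 resolves credentials through its default chain; on a cluster
    # with an instance profile attached it picks up the role from the
    # instance metadata service, so no keys appear in code.
    import boto3  # imported lazily so the helper above works offline
    client = boto3.client("sns", region_name="us-east-1")  # region is an assumption
    resp = client.publish(**build_sns_publish_args(topic_arn, payload))
    return resp["MessageId"]
```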
stiaangerber
by Databricks Partner
  • 1389 Views
  • 1 replies
  • 0 kudos

Simba ODBC for ARM-based Linux

Hi, is there an ARM build of the Simba ODBC driver available for Linux? I've seen this thread (for Mac): https://community.databricks.com/t5/data-engineering/problems-connecting-simba-odbc-with-a-m1-macbook-pro/td-p/20566 but it seems that there are only ...

Latest Reply
VZLA
Databricks Employee
  • 0 kudos

@stiaangerber unfortunately not. The Linux ODBC driver should run on any distribution as long as the CPU is amd64/x86_64, but we don't have an ARM build.

sandeephenkel23
by New Contributor III
  • 2726 Views
  • 3 replies
  • 0 kudos

QuantileDiscretizer is not whitelisted error

Dear team, we observed that while attempting to use the following imports: from pyspark.sql import functions as F; from pyspark.ml.feature import QuantileDiscretizer, we are encountering the following error: Py4JSecurityException: QuantileDiscretizer is not ...

Latest Reply
VZLA
Databricks Employee
  • 0 kudos

@sandeephenkel23 I've run the same code on DBR 13.3 LTS, and 1) it imports successfully, 2) I can confirm it is among the whitelisted libraries. Hence I'm wondering if there's anything else particular in your use case triggering this. Is your use case a...

2 More Replies
RoelofvS
by New Contributor III
  • 3241 Views
  • 5 replies
  • 0 kudos

Schema evolution in Autoloader not evolving beyond version 0

I am working through the current version of the standard Auto Loader demo, i.e. dbdemos.install('auto-loader'). That is, data gets read into a dataframe but never written to a target table. The notebook is "01-Auto-loader-schema-evolution-Ingestion". Compute i...

Latest Reply
RoelofvS
New Contributor III
  • 0 kudos

Hello @Brahmareddy, I have tried the above without success. Regarding "enable detailed logging to trace schema evolution steps": can you please guide me with the steps or a URL? We are on AWS. Kind regards, Roelof

4 More Replies
Rishabh-Pandey
by Databricks MVP
  • 2466 Views
  • 1 replies
  • 1 kudos

Enhanced Cost Management for Serverless Compute

Budget policies include tags that are applied to serverless compute activity incurred by assigned users. These tags are recorded in your billing records, allowing you to attribute specific serverless usage to designated budgets. For more information...

Latest Reply
Rafael-Sousa
Contributor II
  • 1 kudos

Thanks for sharing.

AIDENEMAN
by New Contributor
  • 1860 Views
  • 1 replies
  • 0 kudos

Passing values

Hello, how can I pass a parameter value between two jobs that are not nested and have separate Terraform configurations and notebooks?

Latest Reply
Rafael-Sousa
Contributor II
  • 0 kudos

Currently, there is no dedicated parameter store or variables section in Databricks for easily sharing values between jobs. One approach is to save the parameter in the Hive metastore or an S3 bucket, then retrieve it in each job. Alternatively, you ...

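To make the suggested hand-off concrete, here is a minimal sketch using a shared file. A local path stands in for the S3 or /dbfs location the reply mentions; the path and parameter names are assumptions for illustration:

```python
import json

def write_param(path: str, name: str, value) -> None:
    # First job: persist the parameter to a location both jobs can reach
    # (an S3 or /dbfs path in practice).
    with open(path, "w") as f:
        json.dump({name: value}, f)

def read_param(path: str, name: str):
    # Second job: read the parameter back at the start of its run.
    with open(path) as f:
        return json.load(f)[name]
```

Each job stays independent; the separate Terraform definitions only need to agree on the shared path.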
User16826994223
by Databricks Employee
  • 8609 Views
  • 5 replies
  • 7 kudos

How to access a final Delta table from a web application or interface.

I have a final gold-layer Delta table that holds the final aggregated data from the silver layer. I want to access this final layer of data through a web interface. I think I need to write a web script that would run Spark SQL behind the scenes to get the d...

Latest Reply
h_h_ak
Contributor
  • 7 kudos

You can also use direct statement execution from databricks: https://docs.databricks.com/api/workspace/statementexecution

4 More Replies
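As a sketch of the Statement Execution approach, a web backend can query the gold table with nothing but an HTTP client and a SQL warehouse. The host, token, warehouse id, and timeout below are placeholders:

```python
import json
import urllib.request

def build_statement_request(warehouse_id: str, sql: str) -> dict:
    # Payload for POST /api/2.0/sql/statements (Statement Execution API).
    return {"warehouse_id": warehouse_id, "statement": sql, "wait_timeout": "30s"}

def execute_statement(host: str, token: str, warehouse_id: str, sql: str) -> dict:
    # Send the statement; the web app needs no Spark cluster of its own,
    # only the workspace host, a token, and a running SQL warehouse.
    req = urllib.request.Request(
        f"https://{host}/api/2.0/sql/statements",
        data=json.dumps(build_statement_request(warehouse_id, sql)).encode(),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```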
olivier-soucy
by Contributor
  • 2950 Views
  • 5 replies
  • 2 kudos

Resolved! Spark Structured Streaming foreachBatch with databricks-connect

Hello! I'm trying to use the foreachBatch method of a Spark Streaming DataFrame with databricks-connect. Given that Spark Connect support was added to `foreachBatch` in 3.5.0, I was expecting this to work. Configuration: - DBR 15.4 (Spark 3.5.0) - dat...

Latest Reply
VZLA
Databricks Employee
  • 2 kudos

Thanks for sharing the solution! Just curious, was the original error message reported in this post in the Driver log as well?

4 More Replies
alvaro_databric
by New Contributor III
  • 3801 Views
  • 2 replies
  • 2 kudos

How to access hard disk attached to cluster?

Hi, I am using the Lasv3 VM family, which incorporates an NVMe SSD. I would like to take advantage of this huge amount of space, but I cannot find where this disk is mounted. Does someone know where this disk is mounted and whether it can be used as a local dri...

Latest Reply
JosiahJohnston
New Contributor III
  • 2 kudos

Great question; I've been trying to hunt that down too. `/local_disk0` looks like a good candidate, but it has restricted access and I can't confirm or use it. Would love to learn a solution someday. This is a big need for hybrid workflows & libraries c...

1 More Replies
Anand4
by New Contributor II
  • 3260 Views
  • 1 replies
  • 2 kudos

Resolved! Delta Table - Partitioning

Created a streaming job with a Delta table as the target. The table did not have a partition when created earlier; however, I would like to add an existing column as a partition column. I am getting the following error: com.databricks.sql.transaction.tahoe...

Latest Reply
Alberto_Umana
Databricks Employee
  • 2 kudos

Hi @Anand4, Delta Lake does not support altering the partitioning of an existing table directly. Therefore, the way forward is to rewrite the entire table with the new partition column.

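A sketch of that rewrite, assuming a PySpark session and a managed Delta table (the table and column names are placeholders; overwriteSchema is needed because the table definition itself changes, and a streaming writer targeting the table should be stopped first):

```python
def rewrite_options() -> dict:
    # Overwriting the schema lets the table definition (including the new
    # partitioning) be replaced along with the data.
    return {"overwriteSchema": "true"}

def rewrite_with_partition(spark, table: str, partition_col: str) -> None:
    # Read the full table and write it back partitioned by the new column.
    df = spark.read.table(table)
    (df.write.format("delta")
       .mode("overwrite")
       .partitionBy(partition_col)
       .options(**rewrite_options())
       .saveAsTable(table))
```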
mvmiller
by New Contributor III
  • 10934 Views
  • 4 replies
  • 2 kudos

Troubleshooting _handle_rpc_error GRPC Error

I am trying to run the following chunk of code in a cell of a Databricks notebook (using Databricks Runtime 14.3 LTS, Apache Spark 3.5.0, Scala 2.12): spark.sql("CREATE OR REPLACE table sample_catalog.sample_schema.sample_table_tmp AS SELECT * FROM...

Latest Reply
kunalmishra9
Contributor
  • 2 kudos

Following. Also having this issue, but within the context of pivoting a DF, then aggregating by *

3 More Replies
ChristianRRL
by Honored Contributor
  • 2806 Views
  • 7 replies
  • 3 kudos

DLT Potential Bug: File Reprocessing Issue with "cloudFiles.allowOverwrites": "true"

Hi there, I ran into a peculiar case and I'm wondering if anyone else has run into this and can offer an explanation. We have a DLT process to pull CSV files from a landing location and insert (append) them into target tables. We have the setting "cl...

Latest Reply
NandiniN
Databricks Employee
  • 3 kudos

Apologies, that could be an internet or networking issue. In DLT you will be able to change the DBR, but you will have to use a custom image, which may be tricky if you have not done it before. By default, Photon is used in serverless. It may be a ...

6 More Replies
FabianGutierrez
by Contributor
  • 4259 Views
  • 3 replies
  • 1 kudos

Issue with DAB (Databricks Asset Bundle) requesting Terraform files

Hi community, since two days ago we have been receiving the following error when validating and deploying our DAB (Databricks Asset Bundle): "Error: error downloading Terraform: Get "https://releases.hashicorp.com/terraform/1.5.5/index.json": ...

Latest Reply
FabianGutierrez
Contributor
  • 1 kudos

An update: we cannot get the firewall cleared in time, so we need to go for the offline option, that is, downloading everything from Terraform and the Databricks templates, but it is not as clear or intuitive as described. Using their container is unfortunately not an option ...

2 More Replies
pjv
by New Contributor III
  • 2320 Views
  • 1 replies
  • 0 kudos

How to ensure pyspark udf execution is distributed across worker nodes

Hi, I have the following Databricks notebook code defined: pyspark_dataframe = create_pyspark_dataframe(some input data); MyUDF = udf(myfunc, StringType()); pyspark_dataframe = pyspark_dataframe.withColumn('UDFOutput', DownloadUDF(input data columns)); outp...

Latest Reply
VZLA
Databricks Employee
  • 0 kudos

@pjv Can you please try the following; you'll basically want to have more than a single partition: from pyspark.sql import SparkSession; from pyspark.sql.functions import udf; from pyspark.sql.types import StringType; # Initialize Spark session (if not...

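The gist of the reply is to repartition before applying the UDF so rows land on every executor instead of a single partition. A sketch of that idea; the sizing heuristic, column name, and wrapper below are illustrative, not the exact code from the reply:

```python
import math

def choose_num_partitions(num_rows: int, rows_per_partition: int = 10_000) -> int:
    # Simple heuristic: enough partitions that each holds a bounded number
    # of rows, so no single executor ends up doing all the per-row work.
    return max(2, math.ceil(num_rows / rows_per_partition))

def apply_udf_distributed(df, input_col: str, fn, num_rows: int):
    # Hypothetical wrapper: repartition first, then apply the UDF, so the
    # per-row work is spread across all worker nodes.
    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType
    my_udf = udf(fn, StringType())
    n = choose_num_partitions(num_rows)
    return df.repartition(n).withColumn("UDFOutput", my_udf(input_col))
```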