Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Databrickguy
by New Contributor II
  • 1370 Views
  • 1 reply
  • 0 kudos

How to use Java MaskFormatter in Spark SQL?

I created a function based on the Java MaskFormatter in Databricks/Scala. But when I call it from Spark SQL, I receive the error message: Error in SQL statement: AnalysisException: Undefined function: formatAccount. This function is neither a built-in/t...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Tim zhang: The issue is that the formatAccount function is defined as a Scala function, but Spark SQL is looking for a SQL function. You need to register the Scala function as a SQL function so that it can be called from Spark SQL. You can register t...
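The reply is cut off, so here is a rough sketch of the registration step it describes. It is shown in PySpark to keep the examples in one language (in Scala the analogous call is also spark.udf.register), with a hypothetical stand-in for the MaskFormatter logic and `spark` assumed predefined as in a Databricks notebook:

from pyspark.sql.types import StringType

# Hypothetical stand-in for the Scala MaskFormatter-based logic
def format_account(raw: str) -> str:
    return f"{raw[:4]}-{raw[4:]}" if raw and len(raw) >= 8 else raw

# Registering under a SQL name is what makes the function callable from Spark SQL
spark.udf.register("formatAccount", format_account, StringType())
spark.sql("SELECT formatAccount('12345678') AS masked").show()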

chanansh
by Contributor
  • 1388 Views
  • 1 reply
  • 0 kudos

Stream from Azure with credentials

I am trying to read a stream from Azure: (spark.readStream .format("cloudFiles") .option('cloudFiles.clientId', CLIENT_ID) .option('cloudFiles.clientSecret', CLIENT_SECRET) .option('cloudFiles.tenantId', TENTANT_ID) .option("header", "true") .opti...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Hanan Shteingart: It looks like you're using the Azure Blob Storage connector for Spark to read data from Azure. The error message suggests that the credentials you provided are not being used by the connector. To specify the credentials, you can se...
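Since the reply is truncated, here is a sketch of one common way to pass the credentials: as I understand it, the cloudFiles.clientId/clientSecret/tenantId options cover notification setup, while data access itself goes through the ABFS OAuth settings. The storage account name is a placeholder; the client/secret/tenant variables are assumed defined as in the question:

STORAGE_ACCOUNT = "mystorageaccount"  # placeholder

configs = {
    f"fs.azure.account.auth.type.{STORAGE_ACCOUNT}.dfs.core.windows.net": "OAuth",
    f"fs.azure.account.oauth.provider.type.{STORAGE_ACCOUNT}.dfs.core.windows.net":
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    f"fs.azure.account.oauth2.client.id.{STORAGE_ACCOUNT}.dfs.core.windows.net": CLIENT_ID,
    f"fs.azure.account.oauth2.client.secret.{STORAGE_ACCOUNT}.dfs.core.windows.net": CLIENT_SECRET,
    f"fs.azure.account.oauth2.client.endpoint.{STORAGE_ACCOUNT}.dfs.core.windows.net":
        f"https://login.microsoftonline.com/{TENANT_ID}/oauth2/token",
}
# Session-level configuration; the same keys can also go in the cluster's Spark config
for key, value in configs.items():
    spark.conf.set(key, value)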

fhmessas
by New Contributor II
  • 3328 Views
  • 1 reply
  • 0 kudos

Resolved! Autoloader stream with EventBridge message

Hi All, I have a few streaming jobs running, but we have been facing an issue related to messaging. We have multiple feeds within the same root folder, i.e. logs/{accountId}/CloudWatch|CloudTrail|vpcflow/yyyy-mm-dd/logs. Hence, the SQS allows to setup o...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Fernando Messas: Yes, you can configure Autoloader to consume messages from an SQS queue using EventBridge. Here are the steps you can follow: Create an EventBridge rule to filter messages from the SQS queue based on specific criteria (such as the...
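The steps are cut off; as a sketch of the Auto Loader side, file notification mode can be pointed at an existing SQS queue that your EventBridge rule feeds (queue URL, format, and path below are placeholders):

df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "json")
      .option("cloudFiles.useNotifications", "true")
      # Existing queue populated by the EventBridge rule
      .option("cloudFiles.queueUrl", "https://sqs.us-east-1.amazonaws.com/123456789012/autoloader-queue")
      .load("s3://my-bucket/logs/"))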

bchaubey
by Contributor II
  • 4286 Views
  • 1 reply
  • 0 kudos

Unable to connect to Azure Storage with Scala

Hi Team, I am unable to connect to a Storage account with Scala in Databricks; I am getting the below error. AbfsRestOperationException: Status code: -1 error code: null error message: Cannot resolve hostname: ptazsg5gfcivcrstrlrs.dfs.core.windows.net Caused by: Un...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Bhagwan Chaubey: The error message suggests that the hostname for your Azure Storage account could not be resolved. This could happen if there is a network issue, or if the hostname is incorrect. Here are some steps you can try to resolve the issue:...
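One quick diagnostic to add to the truncated checklist above (a sketch, run from a notebook on the cluster): resolve the hostname from the driver to separate DNS problems from auth problems.

import socket

# Raises socket.gaierror if the cluster cannot resolve the storage endpoint
print(socket.gethostbyname("ptazsg5gfcivcrstrlrs.dfs.core.windows.net"))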

Data_Sam
by New Contributor II
  • 1045 Views
  • 1 reply
  • 1 kudos

Streaming data: apply changes error with incoming files

Hi all, When I designed a streaming data pipeline with incoming moving files and used the apply changes function on the silver table, comparing changes between bronze and silver to remove duplicates based on key columns, do you know why I got ignore change to tr...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

@Raymond Huang: The error message "ignore changes to true" typically occurs when you are trying to apply changes to a table using Delta Lake's change data capture (CDC) feature, but you have set the option ignoreChanges to true. This option tells De...
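For reference, a minimal sketch of where that option sits in a stream (table name is a placeholder); with ignoreChanges enabled the stream proceeds past rewritten files but re-emits their rows, so downstream logic still has to dedupe on the key columns:

df = (spark.readStream
      .format("delta")
      .option("ignoreChanges", "true")
      .table("bronze"))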

NakedSnake
by New Contributor III
  • 1154 Views
  • 1 reply
  • 0 kudos

Connect to resource in another AWS account using transit gateway, not working

I'm trying to reach a service hosted in another AWS account through a transit gateway. The Databricks environment was created using Terraform, from the template available in the official documentation. Placing a VM in Databricks' private subnets makes us ab...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Thomaz Moreira: It sounds like there might be an issue with the network configuration of your Databricks cluster. Here are a few things you can check: Make sure that your Databricks cluster is in the same VPC as your service in the other AWS account...
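A small sketch that can help localize the problem (host and port are hypothetical): if this times out from the Databricks driver but succeeds from the test VM in the same subnets, the gap likely sits in the route tables or security groups applied to the cluster rather than in the service:

import socket

socket.create_connection(("10.1.2.3", 443), timeout=5).close()
print("service reachable")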

anonturtle
by New Contributor
  • 1707 Views
  • 1 reply
  • 0 kudos

How does AutoML classify whether a feature is numeric or categorical?

When running AutoML from its UI, it classifies the feature "local_convenience_store" as both a numeric and a categorical column. This affects the result: for numeric columns a scaler is used, while a categorical column is one-hot encoded. For contex...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@hr then: The approach taken by AutoML to classify features as numeric or categorical depends on the specific AutoML framework or library being used, as different implementations may use different methods or heuristics to make this determination. In ...
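The reply is truncated; one concrete lever worth knowing is Databricks AutoML's semantic-type annotation via column metadata (a sketch; verify the metadata key against the AutoML docs for your runtime, and assume df is the training DataFrame):

metadata = dict(df.schema["local_convenience_store"].metadata)
# Force AutoML to treat the column as categorical instead of detecting both types
metadata["spark.contentAnnotation.semanticType"] = "categorical"
df = df.withMetadata("local_convenience_store", metadata)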

Llop
by New Contributor II
  • 1756 Views
  • 1 reply
  • 0 kudos

Delta Live Tables CDC doubts

We are trying to migrate to Delta Live Tables an Azure Data Factory pipeline which loads CSV files and outputs Delta Tables in Databricks. The pipeline is triggered on demand via an external application which places the files in a Storage folder and t...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Enric Llop: When using Delta Live Tables to perform a "rip and replace" operation, where you want to replace the existing data in a table with new data, there are a few things to keep in mind. First, the apply_changes function is used to apply chang...
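Since the reply is cut off, here is a minimal sketch of the apply_changes shape it starts to describe (table, key, and sequencing column names are placeholders; this runs as part of a DLT pipeline, not a plain notebook):

import dlt
from pyspark.sql.functions import col

# On older runtimes this API is named create_streaming_live_table
dlt.create_streaming_table("customers_silver")

dlt.apply_changes(
    target="customers_silver",
    source="customers_bronze",   # streaming source defined elsewhere in the pipeline
    keys=["customer_id"],
    sequence_by=col("load_ts"),  # column that orders competing changes
)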

190809
by Contributor
  • 2227 Views
  • 1 reply
  • 0 kudos

Trying to figure out what is causing non-null values in my bronze tables to be returned as NULL in silver tables.

I have a process which loads data from JSON to a bronze table. It then adds a couple of columns and creates a silver table. But the silver table has NULL values where there were values in the bronze tables. The process is as follows: def load_to_silver(sourc...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Rachel Cunningham: One possible reason for this issue could be a data type mismatch between the bronze and silver tables. It is possible that the column in the bronze table has a non-null value, but the data type of that column is different from th...
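A small self-contained repro of that failure mode (column name is hypothetical): casting a string column that holds non-numeric text silently produces NULLs rather than an error.

bronze = spark.createDataFrame([("abc",), ("123",)], ["amount"])
silver = bronze.withColumn("amount", bronze["amount"].cast("int"))
silver.show()  # "abc" becomes NULL, "123" becomes 123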

Harsh_Paliwal
by New Contributor
  • 3447 Views
  • 1 reply
  • 0 kudos

java.lang.Exception: Unable to start python kernel for ReplId-79217-e05fc-0a4ce-2, kernel exited with exit code 1.

I am running a parameterized autoloader notebook in a workflow. This notebook is being called 29 times in parallel, and FYI, UC is also enabled. I am facing this error: java.lang.Exception: Unable to start python kernel for ReplId-79217-e05fc-0a4ce-2, ke...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Harsh Paliwal: The error message suggests that there might be a conflict with the xtables lock. One thing you could try is to add the -w option as suggested by the error message. You can add the following command to the beginning of your notebook t...
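The suggested command is cut off; as a hypothetical sketch of the -w idea (this needs root, e.g. via an init script, and is only relevant if concurrent iptables calls are actually the culprit):

import subprocess

# -w makes iptables wait for the xtables lock instead of exiting with an error
subprocess.run(["iptables", "-w", "10", "-L", "-n"], check=True)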

Chris_Konsur
by New Contributor III
  • 2857 Views
  • 1 reply
  • 0 kudos

Unit test with Nutter

When I run the simple test in a notebook, it works fine, but when I run it from the Azure ADO pipeline, it fails with the error. Code: def __init__(self): NutterFixture.__init__(self); from runtime.nutterfixture import NutterFixture, tag; class uTestsDa...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Chris Konsur: The error message suggests that there is an issue with the standard output buffer when the Python interpreter is shutting down, which could be related to daemon threads. This error is not specific to Databricks or Azure ADO pipelines, ...
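Independent of that shutdown quirk, here is a minimal Nutter fixture laid out statement by statement, in case the flattened snippet in the question lost its structure (class and test names are hypothetical):

from runtime.nutterfixture import NutterFixture, tag

class uTestsDataPipeline(NutterFixture):
    def __init__(self):
        NutterFixture.__init__(self)

    def assertion_simple(self):
        assert 1 + 1 == 2

result = uTestsDataPipeline().execute_tests()
print(result.to_string())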

danniely
by New Contributor II
  • 12341 Views
  • 1 reply
  • 2 kudos

PySpark RDD fails with pytest

When I call RDD APIs during pytest, it seems like the module "serializer.py" cannot find any other modules under pyspark. I've already looked it up on the internet, and it seems like pyspark modules are not properly importing the modules they refer to. I see ot...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

@hyunho lee: It sounds like you are encountering an issue with PySpark's serializer not being able to find the necessary modules during testing with pytest. One solution you could try is to set the PYTHONPATH environment variable to include the pat...
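The PYTHONPATH suggestion is truncated; it would typically point at $SPARK_HOME/python and the bundled py4j zip. An alternative sketch that sidesteps path juggling, assuming pyspark is pip-installed in the test environment (the fixture normally lives in conftest.py):

import pytest
from pyspark.sql import SparkSession

@pytest.fixture(scope="session")
def spark():
    session = SparkSession.builder.master("local[2]").appName("rdd-tests").getOrCreate()
    yield session
    session.stop()

def test_rdd_roundtrip(spark):
    # Exercises the RDD serializer path that was failing under pytest
    assert spark.sparkContext.parallelize([1, 2, 3]).sum() == 6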

quakenbush
by Contributor
  • 6966 Views
  • 1 reply
  • 0 kudos

Is there something like Oracle's VPD feature in Databricks?

Since I am porting some code from Oracle to Databricks, I have another specific question. In Oracle there's something called Virtual Private Database, VPD. It's a simple security feature used to generate a WHERE clause which the system will add to a u...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Roger Bieri: In Databricks, you can use the UserDefinedFunction (UDF) feature to create a custom function that will be applied to a DataFrame. You can use this feature to add a WHERE clause to a DataFrame based on the user context. Here's an exampl...
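The UDF example is cut off; a related, commonly used pattern for VPD-style row filtering on Databricks is a view keyed on current_user() (table and column names below are hypothetical):

spark.sql("""
    CREATE OR REPLACE VIEW sales_vpd AS
    SELECT *
    FROM sales
    WHERE region_owner = current_user()
""")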

Fed
by New Contributor III
  • 8048 Views
  • 1 reply
  • 0 kudos

Setting checkpoint directory for checkpointInterval argument of estimators in pyspark.ml

Tree-based estimators in pyspark.ml have an argument called checkpointInterval: checkpointInterval = Param(parent='undefined', name='checkpointInterval', doc='set checkpoint interval (>= 1) or disable checkpoint (-1). E.g. 10 means that the cache will ...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Federico Trifoglio: If sc.getCheckpointDir() returns None, it means that no checkpoint directory is set in the SparkContext. In this case, the checkpointInterval argument will indeed be ignored. To set a checkpoint directory, you can use the SparkC...
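A minimal sketch of that step (the DBFS path is a placeholder); set the directory before fitting so checkpointInterval actually takes effect:

sc = spark.sparkContext
sc.setCheckpointDir("dbfs:/tmp/ml_checkpoints")
print(sc.getCheckpointDir())  # no longer None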

Phani1
by Valued Contributor II
  • 4215 Views
  • 1 reply
  • 0 kudos

Best practices/steps for Hive metastore backup and restore

Hi Team, Could you share with us the best practices/steps for Hive metastore backup and restore? Regards, Phanindra

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Janga Reddy: Certainly! Here are the steps for Hive metastore backup and restore on Databricks. Backup:
  • Stop all running Hive services and jobs on the Databricks cluster.
  • Create a backup directory in DBFS (Databricks File System) where the metadata fi...
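The steps are truncated; as one crude sketch of the metadata-export idea (database name and backup location are placeholders; dbutils is assumed available as in a notebook):

# Export CREATE TABLE statements for every table in a database to DBFS
for t in spark.catalog.listTables("default"):
    ddl = spark.sql(f"SHOW CREATE TABLE default.{t.name}").first()[0]
    dbutils.fs.put(f"dbfs:/metastore_backup/{t.name}.sql", ddl, overwrite=True)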

