Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

429957
by New Contributor
  • 1155 Views
  • 1 replies
  • 0 kudos

'DeltaColumnMappingUnsupportedException' when performing 'Full refresh all' on DLT pipeline

Trigger: Perform 'Full refresh all' on a DLT pipeline (new or existing). The DLT table already existed beforehand. Issue: Getting the error 'DeltaColumnMappingUnsupportedException' during the "Setting up tables" stage. ```com.databricks.sql.transact...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Raeger Tay: The error message indicates that a schema change has been detected while changing the column mapping mode. It seems like you are trying to change the column mapping mode from the default ('none') to 'name' mode, which maps columns...
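The reply is cut off before any example. As a sketch of what enabling name-based column mapping on an existing Delta table typically looks like (`my_table` is a placeholder; the reader/writer versions below are the documented minimums for column mapping, so verify them against the Delta Lake docs for your runtime):

```sql
-- Upgrade the table protocol and switch to name-based column mapping.
ALTER TABLE my_table SET TBLPROPERTIES (
  'delta.columnMapping.mode' = 'name',
  'delta.minReaderVersion' = '2',
  'delta.minWriterVersion' = '5'
);
```

For DLT tables, the equivalent is usually set through `table_properties` in the table definition rather than via `ALTER TABLE`.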

KVNARK
by Honored Contributor II
  • 2811 Views
  • 1 replies
  • 5 kudos

Accessing a Power BI dataset via MDX query works on Windows, but the same does not work from a Python Linux server

Trying to access the SSAS Power BI dataset using an MDX query from a Python Linux server. We are hitting a roadblock. The existing setup works as expected on Windows thanks to adodb.dll, but we are unable to connect on Linux. Any help would be much appreciated...

Latest Reply
Anonymous
Not applicable
  • 5 kudos

@KVNARK: One potential solution would be to use an open-source MDX library for Python that can connect to SSAS, such as OLAP-XMLA for Python. This library can be used to execute MDX queries against an SSAS server, including Power BI datasets. Here's...

Indra
by New Contributor
  • 1963 Views
  • 1 replies
  • 0 kudos

Performance issue with Simba ODBC driver performing a simple insert command into Delta Lake

Hi, our team is using Simba ODBC to load data into Delta Lake, and for a table with 3 columns it took around 55 seconds to insert 15 records. How can we improve transactional loading into Delta Lake? Is there any option from the Simba ODBC driver t...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Indra Limena: There are several ways to improve transactional loading into Delta Lake. Use Delta Lake's native Delta JDBC/ODBC connector instead of a third-party ODBC driver like Simba; the native connector is optimized for Delta Lake and supports b...
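Independent of driver choice, the biggest lever for this workload is batching: each single-row INSERT over ODBC is its own round trip and its own commit, so 15 separate inserts pay the transaction overhead 15 times. A minimal sketch of the principle, using Python's built-in sqlite3 as a stand-in for the real driver (table and column names are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, name TEXT, ts TEXT)")

rows = [(i, f"name-{i}", "2023-01-01") for i in range(15)]

# Anti-pattern: one statement and one commit per row.
# for row in rows:
#     conn.execute("INSERT INTO events VALUES (?, ?, ?)", row)
#     conn.commit()

# Better: a single batched statement and a single commit -- the
# per-transaction overhead is paid once instead of 15 times.
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(count)  # 15
```

With Delta Lake the effect is even larger, since every commit also creates a new table version and transaction-log entry.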

Istuti
by Contributor
  • 3233 Views
  • 1 replies
  • 2 kudos
Latest Reply
Anonymous
Not applicable
  • 2 kudos

@Istuti Gupta: There are several algorithms you can use to mask a column in Databricks in a way that is compatible with SQL Server. One commonly used approach is pseudonymization, or tokenization. Here's an example of how you can implement pse...
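The example itself is truncated, so here is a minimal sketch of the idea in plain Python: replace each value with a salted SHA-256 digest, so the same input always maps to the same token (joins and group-bys still work) while the original value is not recoverable without brute force. In Databricks this would typically be wrapped in a UDF or done with the built-in `sha2()` SQL function; the salt value below is a placeholder.

```python
import hashlib

SALT = "change-me"  # placeholder; keep the real salt secret per deployment

def pseudonymize(value: str, salt: str = SALT) -> str:
    """Deterministically mask a value with a salted SHA-256 digest."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()

token = pseudonymize("alice@example.com")
# Deterministic: the same input always yields the same token,
# so referential integrity across tables is preserved.
assert token == pseudonymize("alice@example.com")
# A different salt produces a disjoint token space.
assert token != pseudonymize("alice@example.com", salt="other")
```

Note that deterministic hashing is pseudonymization, not anonymization: low-cardinality columns remain vulnerable to dictionary attacks unless the salt is kept secret.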

Databrickguy
by New Contributor II
  • 1505 Views
  • 1 replies
  • 0 kudos

How to use Java MaskFormatter in sparksql?

I created a function based on Java's MaskFormatter in Databricks/Scala. But when I call it from Spark SQL, I receive an error. Error in SQL statement: AnalysisException: Undefined function: formatAccount. This function is neither a built-in/t...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Tim zhang: The issue is that the formatAccount function is defined as a Scala function, but Spark SQL is looking for a SQL function. You need to register the Scala function as a SQL function so that it can be called from Spark SQL. You can register t...
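The registration snippet is truncated. To keep a self-contained sketch, here is a pure-Python stand-in for MaskFormatter-style formatting (mask syntax simplified so that `#` consumes one input character and anything else is a literal); the registration call shown in the comments is the standard `spark.udf.register` API, with the function and table names taken from the question:

```python
def format_account(value: str, mask: str = "####-####") -> str:
    """MaskFormatter-style formatting: '#' consumes one character of the
    input; any other mask character is copied through as a literal."""
    chars = iter(value)
    return "".join(next(chars) if m == "#" else m for m in mask)

print(format_account("12345678"))  # 1234-5678

# In Databricks, registering the function makes it visible to SQL, e.g.:
#   spark.udf.register("formatAccount", format_account)
#   spark.sql("SELECT formatAccount(account_no) FROM accounts")
# (in Scala, spark.udf.register with a Scala lambda works the same way).
```

Registration is per-SparkSession, so it must run in the same session (or cluster-scoped init) as the SQL that calls the function.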

chanansh
by Contributor
  • 1506 Views
  • 1 replies
  • 0 kudos

Stream from Azure: credentials not being picked up

I am trying to read a stream from Azure: (spark.readStream .format("cloudFiles") .option('cloudFiles.clientId', CLIENT_ID) .option('cloudFiles.clientSecret', CLIENT_SECRET) .option('cloudFiles.tenantId', TENTANT_ID) .option("header", "true") .opti...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Hanan Shteingart: It looks like you're using the Azure Blob Storage connector for Spark to read data from Azure. The error message suggests that the credentials you provided are not being used by the connector. To specify the credentials, you can se...
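The reply is cut off at the credential settings. For ADLS Gen2, service-principal credentials are usually also set at the Spark/Hadoop configuration level, roughly as below (placeholders in angle brackets; check the exact keys against the Azure ABFS connector documentation for your runtime):

```
fs.azure.account.auth.type.<storage-account>.dfs.core.windows.net        OAuth
fs.azure.account.oauth.provider.type.<storage-account>.dfs.core.windows.net  org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
fs.azure.account.oauth2.client.id.<storage-account>.dfs.core.windows.net     <client-id>
fs.azure.account.oauth2.client.secret.<storage-account>.dfs.core.windows.net <client-secret>
fs.azure.account.oauth2.client.endpoint.<storage-account>.dfs.core.windows.net https://login.microsoftonline.com/<tenant-id>/oauth2/token
```

The `cloudFiles.clientId`/`clientSecret`/`tenantId` options shown in the question only configure Auto Loader's file-notification setup; reading the files themselves still goes through the ABFS configuration above.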

fhmessas
by New Contributor II
  • 3583 Views
  • 1 replies
  • 0 kudos

Resolved! Autoloader stream with EventBridge message

Hi all, I have a few streaming jobs running, but we have been facing an issue related to messaging. We have multiple feeds within the same root folder, i.e. logs/{accountId}/CloudWatch|CloudTrail|vpcflow/yyyy-mm-dd/logs. Hence, SQS allows to set up o...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Fernando Messas: Yes, you can configure Autoloader to consume messages from an SQS queue using EventBridge. Here are the steps you can follow: Create an EventBridge rule to filter messages from the SQS queue based on specific criteria (such as the...
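The remaining steps are truncated. On the Auto Loader side, the relevant knobs are the file-notification options, roughly (queue URL is a placeholder; option names per the Auto Loader documentation):

```
cloudFiles.format            json
cloudFiles.useNotifications  true
cloudFiles.queueUrl          <sqs-queue-url>   # the queue fed by the EventBridge rule
```

Pointing each stream at its own queue (populated by an EventBridge rule that filters on the S3 key prefix, e.g. CloudWatch vs CloudTrail vs vpcflow) is one way to fan out a single root folder to multiple streams.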

bchaubey
by Contributor II
  • 4507 Views
  • 1 replies
  • 0 kudos

Unable to connect to Azure Storage with Scala

Hi team, I am unable to connect to a Storage account with Scala in Databricks; I am getting the below error. AbfsRestOperationException: Status code: -1 error code: null error message: Cannot resolve hostname: ptazsg5gfcivcrstrlrs.dfs.core.windows.net Caused by: Un...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Bhagwan Chaubey: The error message suggests that the hostname for your Azure Storage account could not be resolved. This could happen if there is a network issue, or if the hostname is incorrect. Here are some steps you can try to resolve the issue:...

Data_Sam
by New Contributor II
  • 1125 Views
  • 1 replies
  • 1 kudos

Streaming data apply_changes error with incoming files

Hi all, when I design a streaming data pipeline with incoming moving files and use the apply_changes function on the silver table, comparing changes between bronze and silver to remove duplicates based on key columns, do you know why I got ignore change to tr...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

@Raymond Huang: The error message "ignore changes to true" typically occurs when you are trying to apply changes to a table using Delta Lake's change data capture (CDC) feature, but you have set the option ignoreChanges to true. This option tells De...

NakedSnake
by New Contributor III
  • 1285 Views
  • 1 replies
  • 0 kudos

Connect to resource in another AWS account using transit gateway, not working

I'm trying to reach a service hosted in another AWS account through a transit gateway. The Databricks environment was created using Terraform, from the template available in the official documentation. Placing a VM in Databricks' private subnets makes us ab...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Thomaz Moreira: It sounds like there might be an issue with the network configuration of your Databricks cluster. Here are a few things you can check: Make sure that your Databricks cluster is in the same VPC as your service in the other AWS account...

anonturtle
by New Contributor
  • 1896 Views
  • 1 replies
  • 0 kudos

How does automl classify which feature is numeric or categorical?

When running AutoML from its UI, it classifies the feature "local_convenience_store" as both a numeric and a categorical column. This affects the result, as numeric columns get a scaler while categorical columns are one-hot encoded. For contex...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@hr then: The approach taken by AutoML to classify features as numeric or categorical depends on the specific AutoML framework or library being used, as different implementations may use different methods or heuristics to make this determination. In ...

Llop
by New Contributor II
  • 1900 Views
  • 1 replies
  • 0 kudos

Delta Live Tables CDC doubts

We are trying to migrate an Azure Data Factory pipeline, which loads CSV files and outputs Delta tables in Databricks, to Delta Live Tables. The pipeline is triggered on demand via an external application which places the files in a Storage folder and t...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Enric Llop: When using Delta Live Tables to perform a "rip and replace" operation, where you want to replace the existing data in a table with new data, there are a few things to keep in mind. First, the apply_changes function is used to apply chang...

190809
by Contributor
  • 2430 Views
  • 1 replies
  • 0 kudos

Trying to figure out what is causing non-null values in my bronze tables to be returned as NULL in silver tables.

I have a process which loads data from JSON to a bronze table. It then adds a couple of columns and creates a silver table. But the silver table has NULL values where there were values in the bronze tables. Process as follows: def load_to_silver(sourc...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Rachel Cunningham: One possible reason for this issue could be a data type mismatch between the bronze and silver tables. It is possible that the column in the bronze table has a non-null value, but the data type of that column is different from th...
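The mechanism behind this is worth spelling out: Spark SQL's CAST (in its default, non-ANSI mode) returns NULL rather than raising an error when a value cannot be converted to the target type, so a non-null string in bronze can silently become NULL in a silver column declared with a narrower type. A tiny Python analogy of that cast semantics (a rough sketch, not Spark's exact coercion rules):

```python
def spark_like_int_cast(value):
    """Mimic Spark SQL's default CAST(... AS INT): return None instead of
    raising when the value cannot be converted."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None

print(spark_like_int_cast("42"))    # 42
print(spark_like_int_cast("42abc")) # None <- how non-null bronze values turn NULL
```

Comparing the bronze and silver schemas (`df.printSchema()`) on the affected columns is usually the quickest way to confirm this.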

Harsh_Paliwal
by New Contributor
  • 3822 Views
  • 1 replies
  • 0 kudos

java.lang.Exception: Unable to start python kernel for ReplId-79217-e05fc-0a4ce-2, kernel exited with exit code 1.

I am running a parameterized autoloader notebook in a workflow. This notebook is being called 29 times in parallel, and FYI, UC is also enabled. I am facing this error: java.lang.Exception: Unable to start python kernel for ReplId-79217-e05fc-0a4ce-2, ke...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Harsh Paliwal: The error message suggests that there might be a conflict with the xtables lock. One thing you could try is to add the -w option as suggested by the error message. You can add the following command to the beginning of your notebook t...

Chris_Konsur
by New Contributor III
  • 3044 Views
  • 1 replies
  • 0 kudos

Unit test with Nutter

When I run the simple test in a notebook it works fine, but when I run it from the Azure ADO pipeline it fails with the error. Code: def __init__(self): NutterFixture.__init__(self) from runtime.nutterfixture import NutterFixture, tag class uTestsDa...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Chris Konsur: The error message suggests that there is an issue with the standard output buffer when the Python interpreter is shutting down, which could be related to daemon threads. This error is not specific to Databricks or the Azure ADO pipeline, ...

