Data Engineering

Forum Posts

FarBo
by New Contributor III
  • 2474 Views
  • 4 replies
  • 5 kudos

Spark issue handling data from json when the schema DataType mismatch occurs

Hi, I have encountered a problem using Spark when creating a DataFrame from a raw JSON source. I have defined a schema for my data, and the problem is that when there is a mismatch between one of the column values and its defined schema, Spark not onl...

Latest Reply
Anonymous
Not applicable
  • 5 kudos

@Farzad Bonabi: Thank you for reporting this issue. It seems to be a known bug in Spark when dealing with malformed decimal values. When a decimal value in the input JSON data is not parseable by Spark, it sets not only that column to null but also ...

3 More Replies
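The behavior discussed above can be illustrated outside Spark. The snippet below is a plain-Python sketch (not the Spark internals) of the per-column handling the thread asks for: when one field fails to parse against its declared decimal type, only that field becomes null, not the whole row. The schema and field names are hypothetical.

```python
import json
from decimal import Decimal, InvalidOperation

# Hypothetical schema: column name -> expected Python type.
SCHEMA = {"id": int, "amount": Decimal}

def coerce_row(raw: str) -> dict:
    """Parse one JSON record, nulling only the columns that fail to coerce."""
    row = json.loads(raw)
    out = {}
    for col, typ in SCHEMA.items():
        try:
            out[col] = typ(str(row[col]))
        except (InvalidOperation, ValueError, KeyError):
            out[col] = None  # only the bad column becomes null, not the row
    return out
```

In Spark itself one would instead inspect the corrupt-record column produced by PERMISSIVE parsing, but the intent is the same: isolate the malformed value rather than discard the record.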
Tam
by New Contributor III
  • 5084 Views
  • 2 replies
  • 2 kudos

Delta Table on AWS Glue Catalog

I have set up a Databricks cluster to work with the AWS Glue Catalog by setting spark.databricks.hive.metastore.glueCatalog.enabled to true. However, when I create a Delta table on the Glue Catalog, the schema reflected in the AWS Glue Catalog is incorrec...

Latest Reply
monometa
New Contributor II
  • 2 kudos

Hi, could you please refer to something or explain in more detail your point about querying Delta Lake files directly instead of through the AWS Glue catalog and why it was highlighted as a best practice?

1 More Replies
NDK_1
by New Contributor II
  • 397 Views
  • 1 reply
  • 0 kudos

I would like to create a schedule in Databricks that runs a job on the 1st working day of every month

I would like to create a schedule in Databricks that runs a job on the first working day of every month (working days referring to Monday through Friday). I tried using Cron syntax but didn't have any luck. Is there any way we can schedule this in Da...

Latest Reply
shan_chandra
Honored Contributor III
  • 0 kudos

@NDK_1 - Cron syntax won't allow that combination of day of month and day of week. You can try creating two different schedules - one for the first day and second day of the month - and then add custom logic to check if it is a working day and then trigg...

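The reply's suggestion can be sketched as follows: schedule the job for the first few days of the month and have the job itself exit unless today is the first weekday (Mon-Fri) of the month. A minimal sketch, assuming the check runs at the start of the notebook:

```python
from datetime import date, timedelta

def first_working_day(d: date) -> date:
    """Return the first Mon-Fri day of d's month."""
    d = d.replace(day=1)
    while d.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        d += timedelta(days=1)
    return d

def should_run(today: date) -> bool:
    """True only on the first working day of the month."""
    return today == first_working_day(today)
```

With a cron schedule covering the 1st-3rd of each month, the job body simply returns early whenever `should_run(date.today())` is false.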
Constantine
by Contributor III
  • 8488 Views
  • 3 replies
  • 6 kudos

Resolved! CREATE TEMP TABLE FROM CTE

I have written a CTE in Spark SQL WITH temp_data AS (   ......   )   CREATE VIEW AS temp_view FROM SELECT * FROM temp_view; I get a cryptic error. Is there a way to create a temp view from CTE using Spark SQL in databricks?

Latest Reply
-werners-
Esteemed Contributor III
  • 6 kudos

In the CTE you can't do a CREATE. It expects an expression in the form of expression_name [ ( column_name [ , ... ] ) ] [ AS ] ( query ), where expression_name specifies a name for the common table expression. If you want to create a view from a CTE, y...

2 More Replies
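The corrected pattern from the reply is to put the CTE inside the view definition rather than wrapping a CREATE in the CTE. The sketch below builds the statement as a string so the shape is visible; the table and column names are hypothetical.

```python
# Hypothetical CTE body; in the original question this was the "temp_data" query.
cte_body = "SELECT id, amount FROM raw_orders WHERE amount > 0"

# CREATE comes first; the WITH clause belongs to the view's query.
create_view_sql = (
    "CREATE OR REPLACE TEMPORARY VIEW temp_view AS "
    f"WITH temp_data AS ({cte_body}) "
    "SELECT * FROM temp_data"
)
# On Databricks you would then run: spark.sql(create_view_sql)
```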
cpayne_vax
by New Contributor III
  • 3809 Views
  • 5 replies
  • 2 kudos

Resolved! Delta Live Tables: dynamic schema

Does anyone know if there's a way to specify an alternate Unity Catalog schema in a DLT workflow using the @dlt.table syntax? In my case, I'm looping through folders in Azure Data Lake Storage to ingest data. I'd like those folders to get created in different...

Latest Reply
data-engineer-d
New Contributor III
  • 2 kudos

@cpayne_vax now that we are at end of Q1-24, do we have the ability to write to any schema dynamically?

4 More Replies
test_123
by New Contributor
  • 283 Views
  • 1 reply
  • 0 kudos

Autoloader not detecting changes/updated values for xml file

If I update a value in the XML, Autoloader does not detect the change; the same happens when I delete or remove a column or property in the XML. Please help me fix this issue.

Latest Reply
Walter_C
Valued Contributor II
  • 0 kudos

It seems that the issue you're experiencing with Autoloader not detecting changes in XML files might be related to how Autoloader handles schema inference and evolution. Autoloader can automatically detect the schema of loaded XML data, allowing you...

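One point worth adding to the reply: Auto Loader tracks files by name and, by default, does not reprocess a file whose contents changed in place; the `cloudFiles.allowOverwrites` option asks it to pick up overwritten files. A minimal sketch of the relevant options - the paths are illustrative:

```python
# Auto Loader options relevant to re-reading modified files.
autoloader_options = {
    "cloudFiles.format": "xml",
    "cloudFiles.allowOverwrites": "true",      # reprocess overwritten files
    "cloudFiles.schemaLocation": "/tmp/schema",  # hypothetical path
}
# On Databricks:
# df = (spark.readStream.format("cloudFiles")
#         .options(**autoloader_options)
#         .load("/data/xml"))
```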
SyedGhouri
by New Contributor III
  • 2633 Views
  • 2 replies
  • 0 kudos

Cannot create jobs with jobs api - Azure databricks - private network

Hi, I'm trying to deploy Databricks jobs from the dev to the prod environment. I have jobs in the dev environment and, using Azure DevOps, I deployed the jobs in code format to the prod environment. Now when I use the POST method to create the job programmatica...

Latest Reply
daniel_sahal
Esteemed Contributor
  • 0 kudos

@SyedGhouri You need to setup self-hosted Azure DevOps Agent inside your VNET.

1 More Replies
pshuk
by New Contributor II
  • 737 Views
  • 2 replies
  • 0 kudos

Copying files from dev environment to prod environment

Hi, is there a quick and easy way to copy files between different environments? I have copied a large number of files on my dev environment (Unity Catalog) and want to copy them over to the production environment. Instead of doing it from scratch, can I j...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 0 kudos

If you want to copy files in Azure, ADF is usually the fastest option (for example, TBs of CSVs or Parquet files). If you want to copy tables, just use CLONE. If it is files with code, just use Repos and branches.

1 More Replies
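The CLONE approach mentioned above can be sketched as a SQL statement built in Python; the catalog, schema, and table names below are hypothetical.

```python
# Copy a table from dev to prod with a deep clone (data and metadata).
source = "dev_catalog.bronze.events"
target = "prod_catalog.bronze.events"
clone_sql = f"CREATE TABLE IF NOT EXISTS {target} DEEP CLONE {source}"
# On Databricks: spark.sql(clone_sql)
```

A SHALLOW CLONE copies only metadata and references the source files, which is cheaper but keeps the prod table dependent on the dev storage.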
MarinD
by New Contributor II
  • 834 Views
  • 2 replies
  • 0 kudos

Asset bundle pipelines - target schema and catalog

Do asset bundles support Unity Catalog as a destination for DLT pipelines? How do I specify the catalog and target schema?

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @MarinD, Delta Live Tables (DLT) pipelines can indeed use Unity Catalog as a destination. Here’s how you can specify the catalog and target schema: Create a DLT Pipeline with Unity Catalog: When creating a DLT pipeline, in the UI, select “Uni...

1 More Replies
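For the asset-bundle side of the question, a pipeline resource in `databricks.yml` can carry the Unity Catalog destination directly via the `catalog` and `target` fields. A minimal sketch - the names and paths are illustrative:

```yaml
resources:
  pipelines:
    my_dlt_pipeline:
      name: my_dlt_pipeline
      catalog: main          # Unity Catalog catalog
      target: analytics      # schema the tables land in
      libraries:
        - notebook:
            path: ./pipelines/my_pipeline.py
```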
William_Scardua
by Valued Contributor
  • 367 Views
  • 1 replies
  • 1 kudos

Resolved! How to format decimals without rounding

Hi guys, I need to format decimal values, but I can't round them. Any ideas? Thank you.

Latest Reply
Kaniz
Community Manager
  • 1 kudos

Hi @William_Scardua, In Databricks, you can format decimal values without rounding them using a couple of approaches. Let’s explore some options: Using substring: You can use the substring function to extract a specific number of decimal places f...

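Alongside the substring approach in the reply, truncation without rounding can be sketched in plain Python with `Decimal.quantize` and `ROUND_DOWN`:

```python
from decimal import ROUND_DOWN, Decimal

def truncate(value, places=2):
    """Cut a value to the given number of decimal places without rounding."""
    q = Decimal(10) ** -places          # e.g. Decimal('0.01') for 2 places
    return Decimal(str(value)).quantize(q, rounding=ROUND_DOWN)

truncate(3.459)  # -> Decimal('3.45'), not 3.46
```

Going through `Decimal(str(value))` avoids the binary-float artifacts you would get from constructing a `Decimal` directly from a float.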
dbx-user7354
by New Contributor III
  • 694 Views
  • 3 replies
  • 1 kudos

PySpark DataFrames orderBy only orders within partitions when having multiple workers

I came across a PySpark issue when sorting a DataFrame by a column. It seems like PySpark only orders the data within partitions when having multiple workers, even though it shouldn't.  from pyspark.sql import functions as F import matplotlib.pyplot...

Latest Reply
MarkusFra
New Contributor II
  • 1 kudos

@Kaniz Sorry if I have to ask again, but I am a bit confused by this. I thought that PySpark's `orderBy()` and `sort()` do a shuffle operation before sorting for exactly this reason. There is another command, `sortWithinPartitions()`, that does not do...

2 More Replies
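The distinction raised in the reply can be illustrated without Spark. Sorting each "partition" locally and concatenating (what `sortWithinPartitions()` does) is not a global order; a merge across partitions (conceptually, what the shuffle in `orderBy()` provides) is:

```python
import heapq

partitions = [[3, 9, 1], [7, 2, 8]]
locally_sorted = [sorted(p) for p in partitions]       # [[1, 3, 9], [2, 7, 8]]

# Concatenation of per-partition sorts: NOT globally ordered.
concatenated = [x for p in locally_sorted for x in p]

# Merging the sorted partitions yields the global order.
merged = list(heapq.merge(*locally_sorted))
```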
ac0
by New Contributor III
  • 322 Views
  • 1 reply
  • 0 kudos

Get size of metastore specifically

Currently my Databricks metastore is in the same location as the data for my production catalog. We are moving the data to a separate storage account. In advance of this, I'm curious if there is a way to determine the size of the metastore itself...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @ac0,  Let’s explore how you can determine the size of your Databricks Metastore and estimate the storage requirements for the Azure Storage Account hosting the metastore. Metastore Size: The metastore in Unity Catalog is the top-level contain...

Dikshant
by New Contributor
  • 476 Views
  • 1 reply
  • 0 kudos

SchemaEvolutionMode exception in Databricks 14.2

I am unable to display the below stream after reading it.

df = spark.readStream.format("cloudFiles") \
    .option("cloudFiles.format", "csv") \
    .option("header", "true") \
    .option("delimiter", "\t") \
    .option("inferSchema", "true") \
    .option("cloudFiles.connectionS...

Labels: Data Engineering, schemaEvolutionMode
Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @Dikshant, unfortunately, stateful streaming queries do not support schema evolution. This means that once a query starts with a particular schema, you cannot change it during query restarts. To resolve this issue, you can set the cloudFiles.schem...

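The setting the reply refers to can be sketched as follows; the mode values listed are the documented `cloudFiles.schemaEvolutionMode` options, while the paths are illustrative:

```python
# Documented Auto Loader schema-evolution modes.
VALID_MODES = {"addNewColumns", "rescue", "failOnNewColumns", "none"}

options = {
    "cloudFiles.format": "csv",
    "cloudFiles.schemaLocation": "/tmp/schema",       # hypothetical path
    "cloudFiles.schemaEvolutionMode": "rescue",       # new columns go to _rescued_data
}
assert options["cloudFiles.schemaEvolutionMode"] in VALID_MODES
```

With "rescue", the stream does not fail or restart on new columns; unexpected fields land in the rescued-data column instead.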
IshaBudhiraja
by New Contributor II
  • 393 Views
  • 1 reply
  • 0 kudos

Installation of external libraries (wheel file) in Databricks through Synapse using a new job cluster

Aim: Installation of external libraries (wheel file) in Databricks through Synapse using a new job cluster. Solution: I have followed the below steps: I have created a pipeline in Synapse that consists of a notebook activity that is using a new job cluster...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @IshaBudhiraja, confirm that the library version matches the one you intended to install. Ensure that the library is installed in the same Python environment where your notebook or script is running.
