Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

vannipart
by New Contributor III
  • 2315 Views
  • 1 replies
  • 1 kudos

Resolved! SparkOutOfMemoryError when merging data into a table that already has data

Hello, there is an issue with merging data from a DataFrame into a table 2024 databricks
Job aborted due to stage failure: Task 17 in stage 1770.0 failed 4 times, most recent failure: Lost task 17.3 in stage 1770.0 (TID 1669) (1x.xx.xx.xx executor 8):...

karthika
by New Contributor II
  • 1628 Views
  • 1 replies
  • 0 kudos

Resolved! Databricks associate certification

I had this experience while attempting my first Databricks certification. Abruptly, the proctor asked me to show my desk, and after I showed it, they asked again multiple times. My test got paused multiple times even when I was looking at my screen. I want to ...

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 0 kudos

@Cert-TeamOPS @Cert-Team Please help this person. For now, @karthika, use this link to file a ticket with our support team. Please allow the support team 24-48 hours for a resolution. In the meantime, you can review the following documentation: Room req...

hari-prasad
by Valued Contributor II
  • 9186 Views
  • 8 replies
  • 2 kudos

Spark reads GZ file as corrupted data when the file extension .GZ is in upper case

If the file is renamed to file_name.sv.gz (lowercase extension), it works fine; if it is named file_name.sv.GZ (uppercase extension), the data is read as corrupted, meaning Spark simply reads the compressed file as is.

Data Engineering
gzip files
spark-csv
spark.read.csv
Latest Reply
hari-prasad
Valued Contributor II
  • 2 kudos

Recently I started looking at a solution for this issue again, and I found that we can add a few exceptions in the Hadoop library to allow "GZ", as GzipCodec is invoked from there.
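The reply points at suffix-based codec selection, and the behaviour in the question suggests that suffix match is case-sensitive. As a plain-Python illustration of the idea only (it does not patch Hadoop or Spark), a reader can match the suffix case-insensitively and confirm the gzip magic bytes before decompressing. The helper names below are hypothetical.

```python
import gzip
from pathlib import Path

# RFC 1952: every gzip member starts with these two ID bytes.
GZIP_MAGIC = b"\x1f\x8b"


def has_gzip_suffix(path: str) -> bool:
    """Case-insensitive suffix check: matches both .gz and .GZ."""
    return path.lower().endswith(".gz")


def looks_gzipped(path: str) -> bool:
    """Confirm the file content, not just the name, is gzip."""
    with open(path, "rb") as fh:
        return fh.read(2) == GZIP_MAGIC


def read_text(path: str) -> str:
    """Decompress when suffix and magic bytes agree; otherwise read as plain text."""
    if has_gzip_suffix(path) and looks_gzipped(path):
        with gzip.open(path, "rt", encoding="utf-8") as fh:
            return fh.read()
    return Path(path).read_text(encoding="utf-8")
```

Checking the magic bytes as well as the name means a mislabelled plain file is still read as text instead of failing in the decompressor.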

vjani
by New Contributor III
  • 3084 Views
  • 4 replies
  • 5 kudos

Resolved! Global init script not running

Hello Databricks Community, I am trying to connect Databricks with Datadog and have added the Datadog agent script to the global init scripts, but it did not work. To check whether the init script runs at all, I added the two lines of code below to the global init...

Latest Reply
vjani
New Contributor III
  • 5 kudos

Thanks Slash for the reply. That seems to be the reason. I was following https://docs.datadoghq.com/integrations/databricks/?tab=driveronly and missed that configuration.

anand_k
by New Contributor II
  • 1106 Views
  • 1 replies
  • 1 kudos

Variant Support in SQL Alchemy

Databricks now supports the VARIANT data type, which works well in the UI and within Spark environments. However, when working with SQLAlchemy, the VARIANT type doesn't seem to be fully implemented in the latest databricks-sql-connector[sqlalchemy]. ...

Latest Reply
Witold
Databricks Partner
  • 1 kudos

This is actually an open source project. By looking at the code, it seems that VARIANT is not yet supported. Depending on your knowledge of the code base, you could create your own PR, or just open an issue there and wait for the developers to add support.

RobCox
by New Contributor II
  • 1447 Views
  • 1 replies
  • 1 kudos

Unable to analyze external Delta tables due to a failure to initialize the filesystem

Hello, I've recently noticed that we've never been using ANALYZE TABLE. I noticed this after investigating Z-ordering / liquid clustering and seeing that the query plans for our Delta tables were not considering these paths. I'm trying to execute the following command...

Latest Reply
Retired_mod
Esteemed Contributor III
  • 1 kudos

Hi @RobCox, This might be due to incorrect configuration settings or insufficient permissions. Ensure that the fs.azure.account.key configuration is accurate and that the service principal or identity running the command has the necessary permissions...

jenshumrich
by Contributor
  • 3299 Views
  • 4 replies
  • 3 kudos

Databricks resets notebook all the time

Whenever I run my script, it resets the notebook state: "The spark driver has stopped unexpectedly and is restarting. Your notebook will be automatically reattached. at com.databricks.spark.chauffeur.Chauffeur.onDriverStateChange(Chauffeur.scala:1467)" T...

Latest Reply
jenshumrich
Contributor
  • 3 kudos

To get closer to the error: there is some mysterious size limit.

reachrishav
by New Contributor II
  • 3804 Views
  • 2 replies
  • 0 kudos

XML to Parquet files

I have a requirement where I need to ingest large XML files and flatten the data before saving it as Parquet files. I have created a Python function to flatten the complex types (array & struct) from the ingested XML DataFrame. I'm using the spark-xm...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @reachrishav, since Databricks Runtime 14.3 there is native support for reading and writing XML files. Maybe check whether it works faster than the library that you've used: Read and write XML files | Databricks on AWS. And you've mentioned that you wrote a Python function to fl...
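The flattening step described in the question (struct fields become columns, array elements become rows) can be sketched in plain Python on nested dicts and lists, as an analogue of what a Spark function would do with struct columns and explode. This is not the poster's function and not spark-xml behaviour; the "<parent>_<child>" naming and the explode_outer-style handling of empty arrays are assumptions made for the sketch.

```python
from typing import Any

Record = dict[str, Any]


def flatten(record: Record, prefix: str = "") -> list[Record]:
    """Flatten one nested record into flat rows.

    dict values (struct analogue) become "<parent>_<child>" columns;
    list values (array analogue) are exploded into one row per element.
    """
    rows: list[Record] = [{}]
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            field_rows = flatten(value, prefix=f"{name}_")
        elif isinstance(value, list):
            field_rows = []
            for item in value:
                if isinstance(item, dict):
                    field_rows.extend(flatten(item, prefix=f"{name}_"))
                else:
                    field_rows.append({name: item})
            if not field_rows:
                # Empty array: keep the parent row with a null, like explode_outer.
                field_rows = [{name: None}]
        else:
            field_rows = [{name: value}]
        # Combine with the rows built so far; two arrays yield their cross product,
        # as exploding two array columns one after another would.
        rows = [{**left, **right} for left in rows for right in field_rows]
    return rows
```

The cross-product step is the part that usually makes flattened XML grow quickly: every extra array multiplies the row count, so exploding only the arrays that are actually needed keeps the output manageable.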

YS1
by Contributor
  • 1868 Views
  • 2 replies
  • 0 kudos

DLT - Importing Python Package

Hello, I'm creating a DLT pipeline where I read a Kafka stream, perform transformations using UDFs, and save the data in multiple tables. When I define the functions directly in the same notebook, the code works fine. However, if I move the code into ...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @YS1, have you added the Python file to the pipeline settings, in the list of source code?
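One commonly used alternative, assuming the UDF module lives in a workspace folder the pipeline can read, is to append that folder to `sys.path` in the pipeline notebook before importing it. A minimal plain-Python sketch of that pattern follows; `add_module_dir` and `import_from` are hypothetical helpers, and the path and module name in the usage comment are placeholders.

```python
import importlib
import os
import sys
from types import ModuleType


def add_module_dir(path: str) -> str:
    """Append a folder to sys.path once so modules inside it can be imported."""
    abs_path = os.path.abspath(path)
    if abs_path not in sys.path:
        sys.path.append(abs_path)
    return abs_path


def import_from(path: str, module_name: str) -> ModuleType:
    """Import `module_name` from the folder `path`."""
    add_module_dir(path)
    importlib.invalidate_caches()  # pick up module files created after start-up
    return importlib.import_module(module_name)


# Hypothetical usage inside the pipeline notebook (path and module are placeholders):
# transforms = import_from("/Workspace/Users/<user>/pipeline_utils", "transforms")
```

Guarding against duplicate `sys.path` entries matters because pipeline notebooks may be re-evaluated, and repeated appends would otherwise pile up.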

skolukmar
by New Contributor
  • 1681 Views
  • 2 replies
  • 0 kudos

Delta Live Tables: control microbatch size

A Delta Live Tables pipeline reads a Delta table on Databricks. Is it possible to limit the size of the microbatch during data transformation? I am thinking about a solution used by Spark Structured Streaming that enables control of batch size using: .optio...

Latest Reply
lprevost
Contributor III
  • 0 kudos

One other thought: if you are considering using the pandas_udf API, there is a way to control batch size there (see the pandas_udf guide). Note the comments there about the Arrow batch size parameters.
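For the pandas_udf route, Spark hands the UDF Arrow record batches whose row count is capped by the `spark.sql.execution.arrow.maxRecordsPerBatch` setting (10,000 rows by default). The plain-Python sketch below models only that cap; `bounded_batches` is a hypothetical name, not a Spark API, and it controls the batch size seen inside the UDF rather than the pipeline's streaming microbatch size.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def bounded_batches(rows: Iterable[T], max_records: int) -> Iterator[list[T]]:
    """Yield consecutive batches holding at most `max_records` rows each."""
    if max_records < 1:
        raise ValueError("max_records must be at least 1")
    it = iter(rows)
    while batch := list(islice(it, max_records)):
        yield batch


print([len(b) for b in bounded_batches(range(25), max_records=10)])  # prints [10, 10, 5]
```

Under this model, lowering the cap trades more per-batch overhead for a smaller memory footprint per call, which is the usual reason to tune it.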

gpierard
by New Contributor III
  • 25333 Views
  • 3 replies
  • 1 kudos

Resolved! How to list all Spark session config variables

In Databricks I can set a config variable at session level, but it is not found in the context variables:
spark.conf.set(f"dataset.bookstore", '123') #dataset_bookstore
spark.conf.get(f"dataset.bookstore") #123
scf = spark.sparkContext.getConf()
allc =...

Latest Reply
RyanHager
Contributor
  • 1 kudos

A while back I think I found a way to get Python to list all the config values, but I was not able to re-create it. Instead, just make one of your notebook cells Scala (first line) and use the second line:
%scala
(spark.conf.getAll).foreach(println)

Twilight
by Contributor
  • 2026 Views
  • 2 replies
  • 3 kudos

web terminal accessing /Workspace/Users under tmux

I found this old post (https://community.databricks.com/t5/data-engineering/databricks-cluster-web-terminal-different-permissions-with-tmux/td-p/26461) that was never really answered. I am having the same problem. If I am in the raw terminal, I can a...

Latest Reply
Retired_mod
Esteemed Contributor III
  • 3 kudos

Hi @Twilight, To resolve this, ensure the `tmux` session runs under the same user context as the raw terminal, verify environment variables are set correctly, initialize `tmux` with the same shell and environment settings, check for any ACLs on the `...

oripsk
by New Contributor
  • 1196 Views
  • 1 replies
  • 0 kudos

Column ordering when querying a clustered table

If I have a table that is clustered by (a, b, c) and I issue a query filtering on (b, c), will the query still benefit from clustering by (a, b, c)?

Latest Reply
Retired_mod
Esteemed Contributor III
  • 0 kudos

Hi @oripsk, When you query a table clustered by columns (a, b, c) and filter on (b, c), the query will not fully benefit from the clustering optimization. Clustering works best when the query filter includes the leading column(s) in the clustering or...
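To see why the leading column matters in the simplest case, here is a toy model of min/max data skipping over files written in plain lexicographic (a, b, c) order. This is an assumption-laden sketch, not how Delta implements clustering: liquid clustering and Z-ordering lay data out along multi-dimensional curves rather than a plain sort, so real pruning for (b, c) filters can be better than this model shows. The function names are hypothetical.

```python
from itertools import product

A, B, C = 0, 1, 2  # column positions in each (a, b, c) row


def build_files(rows, rows_per_file):
    """Sort rows lexicographically by (a, b, c), cut them into fixed-size files,
    and keep per-file (min, max) stats for each column."""
    ordered = sorted(rows)
    files = []
    for start in range(0, len(ordered), rows_per_file):
        chunk = ordered[start:start + rows_per_file]
        files.append([(min(r[i] for r in chunk), max(r[i] for r in chunk)) for i in (A, B, C)])
    return files


def files_to_scan(files, predicates):
    """Count files whose [min, max] range can match every equality predicate."""
    return sum(
        1 for stats in files
        if all(stats[col][0] <= value <= stats[col][1] for col, value in predicates.items())
    )


files = build_files(product(range(10), repeat=3), rows_per_file=50)
print(len(files), files_to_scan(files, {A: 3}), files_to_scan(files, {B: 3, C: 3}))  # prints: 20 2 10
```

In this toy layout, filtering on a = 3 reads 2 of the 20 files, while filtering on b = 3 and c = 3 still reads 10 of them, which is the "partial benefit" the reply describes for a linear sort.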

Anonymous
by Not applicable
  • 22860 Views
  • 2 replies
  • 3 kudos
Latest Reply
zerasmus
Contributor
  • 3 kudos

On newer Databricks Runtime versions, %conda commands are not supported. You can use %pip commands instead:
%pip list
I have tested this on Databricks Runtime 15.4 LTS Beta.

ad_k
by New Contributor
  • 993 Views
  • 1 replies
  • 0 kudos

Create delta files from Unity Catalog Objects

Hello, I have tables created in Unity Catalog that point to the raw area. From these tables I need to create a data model (facts and dimensions) that will aggregate this data and transform certain things. Then I need to store in the Azure Datalake in de...

Latest Reply
Retired_mod
Esteemed Contributor III
  • 0 kudos

Hi @ad_k, To create a data model from Unity Catalog tables and store it in Azure data lake in Delta format, use Databricks Notebooks with PySpark or SQL. The process involves reading raw data from Unity Catalog, transforming it into fact and dimensio...
