Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

jenshumrich
by Contributor
  • 975 Views
  • 4 replies
  • 3 kudos

Databricks resets notebook all the time

Whenever I run my script it resets the notebook state: "The spark driver has stopped unexpectedly and is restarting. Your notebook will be automatically reattached. at com.databricks.spark.chauffeur.Chauffeur.onDriverStateChange(Chauffeur.scala:1467)" T...

Latest Reply
jenshumrich
Contributor
  • 3 kudos

To get closer to the error: there is some mystical size limit.

3 More Replies
alesventus
by Contributor
  • 897 Views
  • 0 replies
  • 0 kudos

Effectively refresh Power BI report based on Delta Lake

Hi, I have several Power BI reports based on Delta Lake tables that are refreshed every 4 hours. The ETL process in Databricks is much cheaper than the refresh of these Power BI reports. My questions are: whether the approach described below is correct and if there i...

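One common way to wire this up (a minimal sketch, not taken from the thread): have the Databricks job trigger the Power BI dataset refresh via the Power BI REST API once the ETL finishes, so the report only refreshes after new data lands. The workspace ID, dataset ID, and token below are placeholders; the token would normally come from an Azure AD service principal.

```python
import requests

# Placeholders -- supply your own workspace/dataset IDs and an Azure AD access
# token (scope: https://analysis.windows.net/powerbi/api/.default).
WORKSPACE_ID = "<power-bi-workspace-id>"
DATASET_ID = "<dataset-id>"
TOKEN = "<aad-access-token>"

def trigger_refresh():
    """Queue a Power BI dataset refresh right after the Databricks ETL completes."""
    url = (f"https://api.powerbi.com/v1.0/myorg/groups/{WORKSPACE_ID}"
           f"/datasets/{DATASET_ID}/refreshes")
    resp = requests.post(
        url,
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={"notifyOption": "NoNotification"},
    )
    resp.raise_for_status()  # 202 Accepted means the refresh was queued

trigger_refresh()
```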
reachrishav
by New Contributor II
  • 1090 Views
  • 2 replies
  • 0 kudos

XML to Parquet files

I have a requirement where I need to ingest large xml files and flatten the data before saving it as parquet files. I have created a python function to flatten the complex types (array & struct) from the ingested xml dataframe. I'm using the spark-xm...

Latest Reply
szymon_dybczak
Contributor III
  • 0 kudos

Hi @reachrishav, since DBR 14.3 there is native support for reading and writing XML files. Maybe check if it works faster than the library that you've used: Read and write XML files | Databricks on AWS. And you've mentioned that you write a python function to fl...

1 More Replies
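For reference, a minimal sketch of the native XML reader the reply points to (Databricks Runtime 14.3+); the file path and rowTag value are assumptions, and it runs in a Databricks notebook where `spark` is predefined:

```python
# Read XML natively (no spark-xml library needed on DBR 14.3+), then write Parquet.
df = (spark.read.format("xml")
      .option("rowTag", "record")  # XML element that maps to one row; adjust to your schema
      .load("/Volumes/main/default/raw/input.xml"))

df.write.mode("overwrite").parquet("/Volumes/main/default/out/input_parquet")
```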
NSJ
by New Contributor II
  • 339 Views
  • 0 replies
  • 1 kudos

Setup learning environment failed: Configuration dbacademy.library.version is not available.

Using 1.3 Getting Started with the Databricks Platform Lab for self-learning. When I run DE 2.1 to set up the environment, I got the following error: "Configuration dbacademy.library.version is not available." Following is the code in the common setup. specified_ve...

YS1
by Contributor
  • 484 Views
  • 2 replies
  • 0 kudos

DLT - Importing Python Package

Hello, I'm creating a DLT pipeline where I read a Kafka stream, perform transformations using UDFs, and save the data in multiple tables. When I define the functions directly in the same notebook, the code works fine. However, if I move the code into ...

Latest Reply
szymon_dybczak
Contributor III
  • 0 kudos

Hi @YS1, have you added the Python file in the Pipeline settings, in the list of source code?

1 More Replies
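A commonly used pattern for this (a hedged sketch; the paths and module name are hypothetical): append the directory holding the .py file to sys.path inside the pipeline notebook, then import the UDFs as a normal module.

```python
import sys

# Assumption: the helper file my_udfs.py sits in this workspace folder; adjust to your layout.
sys.path.append("/Workspace/Users/<you>/dlt_utils")

import dlt
from pyspark.sql.functions import col
from my_udfs import flatten_payload  # hypothetical module holding the UDFs

@dlt.table
def kafka_clean():
    # "kafka_raw" stands in for the upstream streaming table in the pipeline.
    return dlt.read_stream("kafka_raw").withColumn("payload", flatten_payload(col("value")))
```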
skolukmar
by New Contributor
  • 482 Views
  • 1 reply
  • 0 kudos

Delta Live Tables: control microbatch size

A Delta Live Tables pipeline reads a Delta table on Databricks. Is it possible to limit the size of the microbatch during data transformation? I am thinking about a solution used by Spark Structured Streaming that enables control of batch size using: .optio...

Latest Reply
lprevost
Contributor
  • 0 kudos

One other thought -- if you are considering using the pandas_udf API, there is a way to control batch size there: see the pandas_udf guide, and note the comments there about Arrow batch size params.

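For the Delta-source side of the question, a minimal sketch (the source table name is assumed): the standard Delta streaming rate-limit options can be set on the stream that feeds a DLT table, capping how much data each micro-batch pulls in.

```python
import dlt

@dlt.table
def transformed():
    return (spark.readStream
            .option("maxFilesPerTrigger", 100)   # cap on files per micro-batch
            .option("maxBytesPerTrigger", "1g")  # soft cap on bytes per micro-batch
            .table("source_db.source_table"))    # placeholder upstream Delta table
```

For the pandas_udf route from the reply, the Arrow batch size is governed by the session conf spark.sql.execution.arrow.maxRecordsPerBatch.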
gpierard
by New Contributor III
  • 14149 Views
  • 3 replies
  • 0 kudos

Resolved! how to list all spark session config variables

In Databricks I can set a config variable at session level, but it is not found in the context variables: spark.conf.set(f"dataset.bookstore", '123') # dataset_bookstore; spark.conf.get(f"dataset.bookstore") # 123; scf = spark.sparkContext.getConf(); allc =...

Latest Reply
RyanHager
Contributor
  • 0 kudos

A while back I think I found a way to get Python to list all the config values, but I was not able to re-create it. Just make one of your notebook code sections Scala (first line %scala) and use on the second line: (spark.conf.getAll).foreach(println)

2 More Replies
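A pure-Python alternative (a small sketch, not from the thread): the SQL SET command returns the session-level configuration as a DataFrame, so custom keys like dataset.bookstore show up without switching to a Scala cell.

```python
spark.conf.set("dataset.bookstore", "123")

# SET lists session-scoped configs as rows with `key` and `value` columns.
for row in spark.sql("SET").collect():
    if row.key.startswith("dataset."):
        print(row.key, "=", row.value)
```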
Twilight
by New Contributor III
  • 486 Views
  • 1 reply
  • 1 kudos

web terminal accessing /Workspace/Users under tmux

I found this old post (https://community.databricks.com/t5/data-engineering/databricks-cluster-web-terminal-different-permissions-with-tmux/td-p/26461) that was never really answered. I am having the same problem. If I am in the raw terminal, I can a...

Anonymous
by Not applicable
  • 16453 Views
  • 2 replies
  • 3 kudos
Latest Reply
zerasmus
Contributor
  • 3 kudos

On newer Databricks Runtime versions, %conda commands are not supported. You can use %pip commands instead: %pip list. I have tested this on Databricks Runtime 15.4 LTS Beta.

1 More Replies
Mangeysh
by New Contributor
  • 272 Views
  • 0 replies
  • 0 kudos

Azure Databricks API for JSON output, displaying on UI

Hello all, I am new to Azure Databricks and am trying to show Azure Databricks table data on a UI using React JS. Let's say there are 2 tables, Employee and Salary; I need to join these two tables on empid, generate JSON output, and call an API (end ...

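One way to serve this (a hedged sketch, not from the thread): run the join through the Databricks SQL Statement Execution API and hand the JSON rows to the React front end. Host, token, and warehouse ID are placeholders; keep the token on a backend service rather than in the browser.

```python
import requests

HOST = "https://<workspace>.azuredatabricks.net"  # placeholder workspace URL
TOKEN = "<databricks-token>"                       # placeholder; keep server-side

resp = requests.post(
    f"{HOST}/api/2.0/sql/statements",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "warehouse_id": "<sql-warehouse-id>",
        "statement": ("SELECT e.empid, e.name, s.amount "
                      "FROM employee e JOIN salary s ON e.empid = s.empid"),
        "wait_timeout": "30s",
        "format": "JSON_ARRAY",
    },
)
resp.raise_for_status()
rows = resp.json()["result"]["data_array"]  # rows as JSON arrays for the UI
```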
TimB
by New Contributor III
  • 9174 Views
  • 9 replies
  • 3 kudos

Passing multiple paths to .load in autoloader

I am trying to use autoloader to load data from two different blobs from within the same account so that spark will discover the data asynchronously. However, when I try this, it doesn't work and I get the error outlined below. Can anyone point out w...

Latest Reply
TimB
New Contributor III
  • 3 kudos

If we were to upgrade to ADLS Gen2, but retain the same structure, would there be scope for the method above to be improved (besides moving to notification mode)?

8 More Replies
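For context, the usual workaround (a sketch under assumed paths and table names): .load() takes a single path, so run one Auto Loader stream per blob path, each with its own checkpoint, writing into the same target table.

```python
paths = [
    "abfss://container1@account.dfs.core.windows.net/data/",  # placeholder paths
    "abfss://container2@account.dfs.core.windows.net/data/",
]

for i, path in enumerate(paths):
    (spark.readStream.format("cloudFiles")
         .option("cloudFiles.format", "json")
         .load(path)
         .writeStream
         .option("checkpointLocation", f"/tmp/checkpoints/ingest_{i}")  # one per stream
         .toTable("bronze.events"))  # both streams append to the same Delta table
```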
Venky
by New Contributor III
  • 77769 Views
  • 18 replies
  • 19 kudos

Resolved! i am trying to read csv file using databricks, i am getting error like ......FileNotFoundError: [Errno 2] No such file or directory: '/dbfs/FileStore/tables/world_bank.csv'

i am trying to read csv file using databricks, i am getting error like ......FileNotFoundError: [Errno 2] No such file or directory: '/dbfs/FileStore/tables/world_bank.csv'

Latest Reply
Alexis
New Contributor III
  • 19 kudos

Hi, you can try:
my_df = spark.read.format("csv")
    .option("inferSchema","true")  # to get the types from your data
    .option("sep",",")             # if your file is using "," as separator
    .option("header","true")       # if you...

17 More Replies
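The error in the question usually comes down to path style (a short sketch): Spark readers take dbfs:/ URIs, while plain Python libraries such as pandas need the /dbfs FUSE mount.

```python
# Spark API: use the dbfs:/ scheme.
spark_df = (spark.read.option("header", "true")
            .csv("dbfs:/FileStore/tables/world_bank.csv"))

# Local-file APIs (pandas, open()): use the /dbfs mount instead.
import pandas as pd
pandas_df = pd.read_csv("/dbfs/FileStore/tables/world_bank.csv")
```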
mexcram
by New Contributor II
  • 824 Views
  • 1 reply
  • 1 kudos

Glue database and saveAsTable

Hello all, I am saving my data frame as a Delta table to S3 and AWS Glue using pyspark and `saveAsTable`. So far I can do this, but something curious happens when I try to change the `path` (as an option or as an argument of `saveAsTable`). The location...

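For reference, a minimal sketch of registering an external Delta table at an explicit S3 path (bucket and table names are placeholders), which is the combination the question is probing:

```python
(df.write.format("delta")
   .option("path", "s3://my-bucket/warehouse/sales_delta")  # external table location
   .saveAsTable("glue_db.sales"))  # registers the path-backed table in Glue
```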
hpant
by New Contributor III
  • 2252 Views
  • 5 replies
  • 1 kudos

Autoloader error "Failed to infer schema for format json from existing files in input"

I have two json files in one location in Azure Gen2 storage, e.g. '/mnt/abc/Testing/'. When I try to read the files using Auto Loader I am getting this error: "Failed to infer schema for format json from existing files in input path /mnt/abc...

Latest Reply
holly
Databricks Employee
  • 1 kudos

Hi @hpant, would you consider testing the new VARIANT type for your JSON data? I appreciate it will require rewriting the next step in your pipeline, but it should be more robust with respect to errors. Disclaimer: I haven't personally tested VARIANT with Autoloade...

4 More Replies
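A hedged sketch of the VARIANT suggestion, assuming a recent runtime (DBR 15.3+) where Auto Loader can land each JSON document in a single VARIANT column instead of inferring a schema up front:

```python
df = (spark.readStream.format("cloudFiles")
      .option("cloudFiles.format", "json")
      .option("singleVariantColumn", "data")  # whole document -> one VARIANT column
      .load("/mnt/abc/Testing/"))
```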
