Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

NSJ
by New Contributor
  • 198 Views
  • 0 replies
  • 0 kudos

Setup learning environment failed: Configuration dbacademy.library.version is not available.

Using 1.3 Getting Started with the Databricks Platform Lab for self-learning. When I run DE 2.1 to set up the environment, I get the following error: Configuration dbacademy.library.version is not available. The following is the code in the common setup: specified_ve...
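For anyone hitting the same message: this is what PySpark raises when spark.conf.get is called for a key that was never set. A minimal workaround sketch (the version string here is hypothetical; check your courseware for the expected value):

    # Set the missing key before running the DE 2.1 setup cell, so the
    # courseware's spark.conf.get("dbacademy.library.version") succeeds.
    spark.conf.set("dbacademy.library.version", "v4.0.3")  # hypothetical version

    # In your own code, a guarded lookup with a default avoids the error entirely:
    version = spark.conf.get("dbacademy.library.version", "unknown")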

YS1
by Contributor
  • 356 Views
  • 2 replies
  • 0 kudos

DLT - Importing Python Package

Hello, I'm creating a DLT pipeline where I read a Kafka stream, perform transformations using UDFs, and save the data in multiple tables. When I define the functions directly in the same notebook, the code works fine. However, if I move the code into ...

Latest Reply
szymon_dybczak
Contributor III
  • 0 kudos

Hi @YS1, have you added the Python file in the pipeline settings, in the list of source code?
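Alongside adding the file to the pipeline's source-code list, a minimal sketch of the import pattern being discussed (the workspace path, module name, UDF, and table names are all hypothetical):

    import sys

    # Folder that contains udfs.py; adjust to your workspace path (hypothetical).
    sys.path.append("/Workspace/Users/you@example.com/my_pipeline")

    import dlt
    from udfs import parse_payload  # hypothetical UDF defined in udfs.py

    @dlt.table(name="events_clean")
    def events_clean():
        # Hypothetical raw source table, read as a stream.
        return spark.readStream.table("events_raw").withColumn(
            "parsed", parse_payload("payload")
        )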

1 More Replies
skolukmar
by New Contributor
  • 350 Views
  • 2 replies
  • 0 kudos

Delta Live Tables: control microbatch size

A Delta Live Tables pipeline reads a Delta table on Databricks. Is it possible to limit the size of each microbatch during data transformation? I am thinking about a solution used by Spark Structured Streaming that enables control of batch size using: .optio...

Latest Reply
lprevost
Contributor
  • 0 kudos

One other thought: if you are considering using the pandas_udf API, there is a way to control batch size there. See the pandas_udf guide, and note the comments there about the Arrow batch size parameters.
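To make both suggestions concrete, a minimal sketch (the source table name is hypothetical; the options are standard Delta streaming and Spark/Arrow settings):

    # 1) Cap how much data each microbatch pulls from the Delta source.
    df = (
        spark.readStream.format("delta")
        .option("maxFilesPerTrigger", 10)      # cap files per microbatch
        .option("maxBytesPerTrigger", "256m")  # soft cap on bytes per microbatch
        .table("source_table")                 # hypothetical source
    )

    # 2) Cap how many rows each pandas_udf invocation receives via Arrow.
    spark.conf.set("spark.sql.execution.arrow.maxRecordsPerBatch", "5000")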

1 More Replies
gpierard
by New Contributor III
  • 11744 Views
  • 4 replies
  • 0 kudos

Resolved! how to list all spark session config variables

In Databricks I can set a config variable at session level, but it is not found in the context variables: spark.conf.set("dataset.bookstore", "123") # dataset_bookstore; spark.conf.get("dataset.bookstore") # 123; scf = spark.sparkContext.getConf(); allc =...

Latest Reply
RyanHager
Contributor
  • 0 kudos

A while back I think I found a way to get Python to list all the config values, but I was not able to re-create it. Instead, make one of your notebook cells Scala by putting %scala on the first line, then run spark.conf.getAll.foreach(println) on the second line.
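A Python-only alternative, for anyone who would rather not switch languages (a sketch; SET is standard Spark SQL):

    # Session-level configs, including ones set via spark.conf.set, show up here:
    spark.sql("SET").show(truncate=False)

    # Cluster/context-level configs come from the SparkContext:
    for key, value in spark.sparkContext.getConf().getAll():
        print(key, value)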

3 More Replies
Twilight
by New Contributor III
  • 341 Views
  • 2 replies
  • 1 kudos

web terminal accessing /Workspace/Users under tmux

I found this old post (https://community.databricks.com/t5/data-engineering/databricks-cluster-web-terminal-different-permissions-with-tmux/td-p/26461) that was never really answered. I am having the same problem. If I am in the raw terminal, I can a...

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @Twilight, To resolve this, ensure the `tmux` session runs under the same user context as the raw terminal, verify environment variables are set correctly, initialize `tmux` with the same shell and environment settings, check for any ACLs on the `...

1 More Replies
oripsk
by New Contributor
  • 199 Views
  • 1 reply
  • 0 kudos

Column ordering when querying a clustered table

If I have a table which is clustered by (a, b, c) and I issue a query filtering on (b, c), will the query benefit from the clustering on (a, b, c)?

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @oripsk, When you query a table clustered by columns (a, b, c) and filter on (b, c), the query will not fully benefit from the clustering optimization. Clustering works best when the query filter includes the leading column(s) in the clustering or...
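For context, a sketch of the table shape being discussed (catalog and table names are hypothetical; CLUSTER BY is the Databricks liquid clustering syntax):

    # Hypothetical table clustered on (a, b, c).
    spark.sql("""
        CREATE TABLE IF NOT EXISTS demo.events (a INT, b INT, c INT)
        CLUSTER BY (a, b, c)
    """)

    # Per the answer above, a filter that includes the leading clustering
    # column gets the most file-skipping benefit:
    spark.sql("SELECT * FROM demo.events WHERE a = 1 AND b = 2").show()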

Anonymous
by Not applicable
  • 15224 Views
  • 2 replies
  • 3 kudos
Latest Reply
zerasmus
Contributor
  • 3 kudos

On newer Databricks Runtime versions, %conda commands are not supported. You can use %pip commands instead: %pip list. I have tested this on Databricks Runtime 15.4 LTS Beta.

1 More Replies
ad_k
by New Contributor
  • 218 Views
  • 1 reply
  • 0 kudos

Create delta files from Unity Catalog Objects

Hello, I have tables created in Unity Catalog that point to the raw area. From these tables I need to create a data model (facts and dimensions) that will aggregate this data and transform certain things. Then I need to store it in the Azure Data Lake in de...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @ad_k, To create a data model from Unity Catalog tables and store it in Azure data lake in Delta format, use Databricks Notebooks with PySpark or SQL. The process involves reading raw data from Unity Catalog, transforming it into fact and dimensio...
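A minimal sketch of that flow (all table names, columns, and the storage path are hypothetical placeholders for your own model):

    from pyspark.sql import functions as F

    # Read raw data registered in Unity Catalog (hypothetical table).
    orders = spark.table("main.raw.orders")

    # Build a simple dimension and fact table.
    dim_customer = (
        orders.select("customer_id", "customer_name")
        .dropDuplicates(["customer_id"])
    )
    fact_sales = (
        orders.groupBy("customer_id", "order_date")
        .agg(F.sum("amount").alias("total_amount"))
    )

    # Write the model to Azure Data Lake in Delta format (path is hypothetical).
    base = "abfss://gold@mystorageaccount.dfs.core.windows.net/model"
    dim_customer.write.format("delta").mode("overwrite").save(f"{base}/dim_customer")
    fact_sales.write.format("delta").mode("overwrite").save(f"{base}/fact_sales")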

KFries
by New Contributor
  • 374 Views
  • 1 reply
  • 0 kudos

SQL Notebook Tab Spacing

My SQL notebooks in Databricks suffer from having several different counts of spaces between tab stops, which makes it very difficult to maintain pretty code spacing. What sets the tab spacing in SQL language notebooks, and how is it set/adju...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @KFries, It sounds like you're dealing with a common issue with tab spacing in Databricks SQL notebooks. Databricks uses 2 spaces per tab by default, and while you can't change this setting globally, you can manually format your notebook via the "...

Mangeysh
by New Contributor
  • 155 Views
  • 0 replies
  • 0 kudos

Azure Databricks API for JSON output, displaying on UI

Hello all, I am new to Azure Databricks and am trying to show Azure Databricks table data on a UI using React JS. Let's say there are 2 tables, Employee and Salary; I need to join these two tables on empid, generate JSON output, and call an API (end ...
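A sketch of the join-and-serialize part on the Databricks side (table registrations are assumed; the React/API layer is out of scope here):

    # Hypothetical registered tables from the question.
    employee = spark.table("employee")
    salary = spark.table("salary")

    # Join on empid and serialize each row to a JSON string.
    joined = employee.join(salary, on="empid", how="inner")
    json_rows = joined.toJSON().collect()  # collect only small result sets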

TimB
by New Contributor III
  • 8182 Views
  • 9 replies
  • 3 kudos

Passing multiple paths to .load in autoloader

I am trying to use Auto Loader to load data from two different blobs within the same account so that Spark will discover the data asynchronously. However, when I try this, it doesn't work and I get the error outlined below. Can anyone point out w...
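One pattern that works around the single-path restriction is to build one Auto Loader stream per path and union them (the container paths and schema locations below are hypothetical):

    from functools import reduce

    # Hypothetical container paths in the same storage account.
    paths = [
        "abfss://blob1@myaccount.dfs.core.windows.net/events/",
        "abfss://blob2@myaccount.dfs.core.windows.net/events/",
    ]

    def make_stream(path, idx):
        return (
            spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")
            .option("cloudFiles.schemaLocation", f"/tmp/schemas/src{idx}")  # hypothetical
            .load(path)
        )

    # Union the per-path streams into one DataFrame.
    df = reduce(
        lambda a, b: a.unionByName(b, allowMissingColumns=True),
        [make_stream(p, i) for i, p in enumerate(paths)],
    )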

Latest Reply
TimB
New Contributor III
  • 3 kudos

If we were to upgrade to ADLSg2, but retain the same structure, would there be scope for the method above to be improved (besides moving to notification mode)?

8 More Replies
Leigh_Turner
by New Contributor
  • 248 Views
  • 0 replies
  • 0 kudos

dataframe checkpoint when checkpoint location on abfss

I'm trying to switch checkpoint locations from DBFS to ABFSS and I have noticed the following behaviour. spark.sparkContext.setCheckpointDir will fail unless I call dbutils.fs.mkdirs(checkpoint_dir) in the same cell. On top of this, the df = df....
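For anyone else hitting this, the workaround described above looks like the following sketch (the checkpoint path is hypothetical):

    checkpoint_dir = "abfss://checkpoints@myaccount.dfs.core.windows.net/rdd_ckpt"

    # Create the directory first; per the behaviour reported above,
    # setCheckpointDir alone fails on abfss otherwise.
    dbutils.fs.mkdirs(checkpoint_dir)
    spark.sparkContext.setCheckpointDir(checkpoint_dir)

    # df = df.checkpoint()  # then checkpoint as usual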

Venky
by New Contributor III
  • 69832 Views
  • 21 replies
  • 20 kudos

Resolved! I am trying to read a CSV file using Databricks and I am getting an error like: FileNotFoundError: [Errno 2] No such file or directory: '/dbfs/FileStore/tables/world_bank.csv'

I am trying to read a CSV file using Databricks and I am getting an error like: FileNotFoundError: [Errno 2] No such file or directory: '/dbfs/FileStore/tables/world_bank.csv'

Latest Reply
Alexis
New Contributor III
  • 20 kudos

Hi, you can try:

    my_df = (
        spark.read.format("csv")
        .option("inferSchema", "true")  # to get the types from your data
        .option("sep", ",")             # if your file is using "," as separator
        .option("header", "true")       # if you...
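Completing that pattern, a minimal sketch assuming the file was uploaded to FileStore: reading through the Spark API with a dbfs:/ URI, since the local /dbfs/... form is only for non-Spark access (e.g. pandas or open()) and is one common cause of the FileNotFoundError above.

    my_df = (
        spark.read.format("csv")
        .option("inferSchema", "true")
        .option("sep", ",")
        .option("header", "true")
        .load("dbfs:/FileStore/tables/world_bank.csv")
    )
    my_df.show(5)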

20 More Replies
mexcram
by New Contributor II
  • 518 Views
  • 2 replies
  • 2 kudos

Resolved! Glue database and saveAsTable

Hello all, I am saving my DataFrame as a Delta table to S3 and AWS Glue using PySpark and `saveAsTable`. So far I can do this, but something curious happens when I try to change the `path` (as an option or as an argument of `saveAsTable`). The location...

Latest Reply
Kaniz_Fatma
Community Manager
  • 2 kudos

Hi @mexcram, When saving a DataFrame as a Delta Table to S3 and AWS Glue using PySpark's `saveAsTable`, changing the `path` option or argument often results in the Glue table location being set to a placeholder path (e.g., `s3://my-bucket/my_table-__...
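For reference, the write pattern under discussion (bucket, database, and table names are hypothetical; df is the DataFrame from the question):

    (
        df.write.format("delta")
        .mode("overwrite")
        .option("path", "s3://my-bucket/my_table")  # explicit external location
        .saveAsTable("glue_db.my_table")            # hypothetical Glue database.table
    )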

1 More Replies
SeyedA
by New Contributor
  • 216 Views
  • 1 reply
  • 0 kudos

Debug UDFs using VSCode extension

I am trying to debug my Python script using the Databricks VSCode extension. I am using udf and pandas_udf in my script. Everything works fine except when the execution gets to the udf and pandas_udf usages. It then complains that "SparkContext or SparkS...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @SeyedA, To resolve this, first, ensure your SparkSession is properly initialized in your script. Be aware of the limitations of Databricks Connect, which might affect UDFs, and consider running UDFs locally in a simple Spark environment for debug...
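A minimal sketch of the explicit session initialization suggested above, using the Databricks Connect API that the VSCode extension ships with (the UDF itself is a trivial hypothetical example):

    from databricks.connect import DatabricksSession
    from pyspark.sql.functions import pandas_udf
    import pandas as pd

    # Build the remote session explicitly so the udf/pandas_udf code paths
    # find an active session when run from VSCode.
    spark = DatabricksSession.builder.getOrCreate()

    @pandas_udf("double")
    def plus_one(v: pd.Series) -> pd.Series:
        return v + 1

    spark.range(5).select(plus_one("id")).show()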

