cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Forum Posts

RJB
by New Contributor II
  • 15660 Views
  • 6 replies
  • 0 kudos

Resolved! How to pass outputs from a python task to a notebook task

I am trying to create a job which has 2 tasks as follows:A python task which accepts a date and an integer from the user and outputs a list of dates (say, a list of 5 dates in string format).A notebook which runs once for each of the dates from the d...

  • 15660 Views
  • 6 replies
  • 0 kudos
Latest Reply
BilalAslamDbrx
Databricks Employee
  • 0 kudos

Just a note that this feature, Task Values, has been generally available for a while.

  • 0 kudos
5 More Replies
hari
by Contributor
  • 26765 Views
  • 3 replies
  • 7 kudos

How to add the partition for an existing delta table

We didn't need to set partitions for our delta tables as we didn't have many performance concerns and delta lake out-of-the-box optimization worked great for us. But there is now a need to set a specific partition column for some tables to allow conc...

  • 26765 Views
  • 3 replies
  • 7 kudos
Latest Reply
hari
Contributor
  • 7 kudos

Updated the description

  • 7 kudos
2 More Replies
Anonymous
by Not applicable
  • 1240 Views
  • 0 replies
  • 1 kudos

Heads up! November Community Social!  On November 17th we are hosting another Community Social - we're doing these monthly ! We want to make sure ...

Heads up! November Community Social! On November 17th we are hosting another Community Social - we're doing these monthly ! We want to make sure that we all have the chance to connect as a community often. Come network, talk data, and just get social...

  • 1240 Views
  • 0 replies
  • 1 kudos
Taha_Hussain
by Databricks Employee
  • 2105 Views
  • 0 replies
  • 8 kudos

Ask your technical questions at Databricks Office Hours October 26 - 11:00 AM - 12:00 PM PT: Register HereNovember 9 - 8:00 AM - 9:00 AM GMT: Register...

Ask your technical questions at Databricks Office HoursOctober 26 - 11:00 AM - 12:00 PM PT: Register HereNovember 9 - 8:00 AM - 9:00 AM GMT: Register Here (NEW EMEA Office Hours)Databricks Office Hours connects you directly with experts to answer all...

  • 2105 Views
  • 0 replies
  • 8 kudos
pen
by New Contributor II
  • 2976 Views
  • 2 replies
  • 2 kudos

Pyspark will error while I pack source zip package without dir.

If I send the package made by zipfile on spark.submit.pyFiles which zip by this code. import zipfile, os def make_zip(source_dir, output_filename): with zipfile.ZipFile(output_filename, 'w') as zipf: pre_len = len(os.path....

  • 2976 Views
  • 2 replies
  • 2 kudos
Latest Reply
Hubert-Dudek
Databricks MVP
  • 2 kudos

I checked, and your code is ok. If you set source_dir and output_filename please remember to start path with /dbfsIf you work on the community edition you can get problems with access to underlying filesystem.

  • 2 kudos
1 More Replies
mghildiy
by New Contributor
  • 1969 Views
  • 1 replies
  • 1 kudos

Checking spark performance locally

I am experimenting with spark, on my local machine. So, is there some tool/api available to check the performance of the code I write?For eg. I write:val startTime = System.nanoTime() invoicesDF .select( count("*").as("Total Number Of Inv...

  • 1969 Views
  • 1 replies
  • 1 kudos
Latest Reply
Hubert-Dudek
Databricks MVP
  • 1 kudos

Please check the details about your code (task in jobs) in Spark UI.

  • 1 kudos
g96g
by New Contributor III
  • 6821 Views
  • 1 replies
  • 1 kudos

Resolved! how can I pass the df columns as a parameter

Im doing the self study and want pass df column name as a parameter.I have defined the widget column_name= dbutils.widgets.get('column_name')which is executing succefuly ( giving me a column name)then Im reading the df and do some transformation and ...

  • 6821 Views
  • 1 replies
  • 1 kudos
Latest Reply
Hubert-Dudek
Databricks MVP
  • 1 kudos

df2.select([column_name]).writeORdf2.select(column_name).write

  • 1 kudos
Mado
by Valued Contributor II
  • 32656 Views
  • 2 replies
  • 6 kudos

Resolved! Difference between "spark.table" & "spark.read.table"?

Hi,I want to make a PySpark DataFrame from a Table. I would like to ask about the difference of the following commands:spark.read.table(TableName)&spark.table(TableName)Both return PySpark DataFrame and look similar. Thanks.

  • 32656 Views
  • 2 replies
  • 6 kudos
Latest Reply
Mado
Valued Contributor II
  • 6 kudos

Hi @Kaniz Fatma​ I selected answer from @Kedar Deshpande​ as the best answer.

  • 6 kudos
1 More Replies
829023
by Databricks Partner
  • 3966 Views
  • 2 replies
  • 0 kudos

Faced error using Databricks SQL Connector

I installed databricks-sql-connector in Pycharm.Then i run the query below based on docs.I refer this docs.(https://docs.databricks.com/dev-tools/python-sql-connector.html)==========================================from databricks import sqlimport osw...

  • 3966 Views
  • 2 replies
  • 0 kudos
Latest Reply
Hubert-Dudek
Databricks MVP
  • 0 kudos

It seems that one of your environment variables is incorrect. Please print them and compare them with the connection settings from the cluster or SQL warehouse endpoint.

  • 0 kudos
1 More Replies
ramankr48
by Databricks Partner
  • 51899 Views
  • 6 replies
  • 11 kudos

Resolved! how to find the size of a table in python or sql?

let's suppose there is a database db, inside that so many tables are there and , i want to get the size of tables . how to get in either sql, python, pyspark.even if i have to get one by one it's fine.

  • 51899 Views
  • 6 replies
  • 11 kudos
Latest Reply
shan_chandra
Databricks Employee
  • 11 kudos

@Raman Gupta​ - could you please try the below %python spark.sql("describe detail delta-table-name").select("sizeInBytes").collect()

  • 11 kudos
5 More Replies
User16835756816
by Databricks Employee
  • 9668 Views
  • 1 replies
  • 6 kudos

How can I simplify my data ingestion by processing the data as it arrives in cloud storage?

This post will help you simplify your data ingestion by utilizing Auto Loader, Delta Optimized Writes, Delta Write Jobs, and Delta Live Tables. Pre-Req: You are using JSON data and Delta Writes commandsStep 1: Simplify ingestion with Auto Loader Delt...

  • 9668 Views
  • 1 replies
  • 6 kudos
Latest Reply
youssefmrini
Databricks Employee
  • 6 kudos

This post will help you simplify your data ingestion by utilizing Auto Loader, Delta Optimized Writes, Delta Write Jobs, and Delta Live Tables.Pre-Req: You are using JSON data and Delta Writes commandsStep 1: Simplify ingestion with Auto Loader Delta...

  • 6 kudos
ricperelli
by New Contributor II
  • 3390 Views
  • 0 replies
  • 1 kudos

How can i save a parquet file using pandas with a data factory orchestrated notebook?

Hi guys,this is my first question, feel free to correct me if i'm doing something wrong.Anyway, i'm facing a really strange problem, i have a notebook in which i'm performing some pandas analysis, after that i save the resulting dataframe in a parque...

  • 3390 Views
  • 0 replies
  • 1 kudos
venkad
by Contributor
  • 2115 Views
  • 0 replies
  • 4 kudos

Default location for Schema/Database in Unity

Hello Bricksters,We organize the delta lake in multiple storage accounts. One storage account per data domain and one container per database. This helps us to isolate the resources and cost on the business domain level.Earlier, when a schema/database...

  • 2115 Views
  • 0 replies
  • 4 kudos
vizoso
by Databricks Partner
  • 2233 Views
  • 1 replies
  • 3 kudos

Cluster list in Microsoft.Azure.Databricks.Client fails because ClusterSource enum does not include MODELS. When you have a model serving cluster, Clu...

Cluster list in Microsoft.Azure.Databricks.Client fails because ClusterSource enum does not include MODELS.When you have a model serving cluster, ClustersApiClient.List method fails to deserialize the API response because that cluster has MODELS as C...

  • 2233 Views
  • 1 replies
  • 3 kudos
saurabh12521
by Databricks Partner
  • 5156 Views
  • 3 replies
  • 4 kudos

Unity through terraform

I am working on automation of Unity through terraform. I have referred below link link to get started :https://registry.terraform.io/providers/databricks/databricks/latest/docs/guides/unity-catalog-azureI am facing issue when I create metastore using...

image
  • 5156 Views
  • 3 replies
  • 4 kudos
Latest Reply
Pat
Esteemed Contributor
  • 4 kudos

Not sure if you got this working, but I noticed you are using provider: `databrickslabs/databricks`, hence why this is not avaialable. You should be using new provider: `databricks/databricks`: https://registry.terraform.io/providers/databricks/datab...

  • 4 kudos
2 More Replies
Labels