Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

pshah83
by New Contributor II
  • 2056 Views
  • 0 replies
  • 2 kudos

Use output of the SHOW PARTITIONS command in a sub-query/CTE/function

I am using SHOW PARTITIONS <<table_name>> to get all the partitions of a table. I want to use max() on the output of this command to get the latest partition for the table. However, I am not able to use SHOW PARTITIONS <<table_name>> in a CTE/sub-quer...

  • 2056 Views
  • 0 replies
  • 2 kudos
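A minimal sketch of one workaround, assuming spark is the active SparkSession and using a hypothetical table name (my_db.my_table) and partition column (ds): run SHOW PARTITIONS through spark.sql() and aggregate its output as a DataFrame, since the command itself cannot be embedded in a CTE or sub-query.

from pyspark.sql import functions as F

table_name = "my_db.my_table"  # hypothetical table name

# SHOW PARTITIONS returns a DataFrame with a single column named "partition",
# so the latest partition can be taken with max() on that column.
partitions_df = spark.sql(f"SHOW PARTITIONS {table_name}")
latest_partition = partitions_df.agg(F.max("partition")).collect()[0][0]
print(latest_partition)

# Alternatively, if the partition column is known (assumed here to be ds),
# querying max() on it directly avoids SHOW PARTITIONS entirely and can sit in a CTE.
latest_ds = spark.sql(f"SELECT max(ds) AS latest_ds FROM {table_name}").collect()[0][0]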
christys
by Databricks Employee
  • 641 Views
  • 0 replies
  • 2 kudos

Want to influence the Databricks product roadmap and services? We are looking for feedback from you - our Databricks Community members - to give your...

Want to influence the Databricks product roadmap and services? We would like you - our Databricks Community members - to share your feedback and thoughts about your experience with Databricks over the last 6 months in a ~10 minute s...

  • 641 Views
  • 0 replies
  • 2 kudos
FD_MR
by New Contributor II
  • 1356 Views
  • 0 replies
  • 1 kudos

Delta Live Tables executing repeatedly and returning empty DF

Still relatively new to Spark and even more so to Delta Live Tables, so apologies if I've missed something fundamental, but here goes. We are trying to run a notebook via Delta Live Tables, which contains 2 functions decorated by the `dlt.table` decorat...

  • 1356 Views
  • 0 replies
  • 1 kudos
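Since the post is truncated, here is a minimal sketch of what a notebook with two dlt.table-decorated functions typically looks like; the source path and column name are hypothetical. One point worth noting (possibly related to the empty-DataFrame symptom) is that DLT discovers and invokes the decorated functions itself during a pipeline update, so they should return a DataFrame rather than be called manually in the notebook.

import dlt
from pyspark.sql import functions as F

SOURCE_PATH = "/mnt/raw/events"  # hypothetical landing path

@dlt.table(comment="Raw events ingested with Auto Loader")
def raw_events():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load(SOURCE_PATH)
    )

@dlt.table(comment="Events filtered to non-null ids")
def clean_events():
    # dlt.read_stream references the table defined above within the same pipeline.
    return dlt.read_stream("raw_events").where(F.col("id").isNotNull())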
Jack
by New Contributor II
  • 3655 Views
  • 1 replies
  • 0 kudos

Applying a formula to a list of Python DataFrames produces error: object of type 'builtin_function_or_method' has no len(). How to fix?

I have a df where I am calculating values by month. When I run this code on my df it generates the desired results: for i in range(12, len(df.index)): df.iloc[i, 1] = df.iloc[i-12, 1]*(((df.iloc[i, 3]/100)+(df.iloc[i, 6]/100))+1). So far so good. I want...

  • 3655 Views
  • 1 replies
  • 0 kudos
Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hey there @Jack Homareau! Hope all is well! Just wanted to check in to see if you were able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from y...

  • 0 kudos
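The "object of type 'builtin_function_or_method' has no len()" error means len() was applied to a bound method rather than to a DataFrame index, which can happen when the loop variable is not actually a DataFrame (for a plain Python list, .index is a method). A minimal sketch, assuming the monthly data lives in a list of pandas DataFrames with at least 7 columns and 24 rows each:

import numpy as np
import pandas as pd

def apply_growth(df: pd.DataFrame) -> pd.DataFrame:
    # Same recurrence as in the question: each month builds on the value twelve
    # rows earlier, scaled by the two percentage columns (positions 3 and 6).
    out = df.copy()
    for i in range(12, len(out.index)):
        out.iloc[i, 1] = out.iloc[i - 12, 1] * (
            (out.iloc[i, 3] / 100) + (out.iloc[i, 6] / 100) + 1
        )
    return out

# Hypothetical list of monthly DataFrames (random data for illustration only).
dfs = [pd.DataFrame(np.random.rand(24, 7)) for _ in range(3)]

# Iterating over the list yields each DataFrame in turn, so len(df.index) is
# called on a DataFrame rather than on the list itself (list.index is a method,
# and calling len() on it raises the error from the question).
dfs = [apply_growth(df) for df in dfs]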
Sri_H
by New Contributor III
  • 1809 Views
  • 2 replies
  • 1 kudos

Databricks Academy - Access to training recording attended during Data & AI Summit 2022

Hi All, I attended a 2-day ML training during the Data & AI 2022 summit and received an email from the events team (ataaisummit@typeaevents.com) saying that the recordings for the training and related material will be available in my Databricks Academy...

  • 1809 Views
  • 2 replies
  • 1 kudos
Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Sri H! I am checking on this for you - hang tight! I'll try and get an update ASAP from the Academy Team.

  • 1 kudos
1 More Replies
AJ270990
by Contributor II
  • 18077 Views
  • 3 replies
  • 0 kudos

Resolved! I am getting a ParseException error while running the Spark SQL query

I am using the below code to create the Spark session and also to load the CSV file. The Spark session and CSV loading run well; however, the SQL query generates the ParseException. %python from pyspark.sql import SparkSession # Create a SparkSessio...

  • 18077 Views
  • 3 replies
  • 0 kudos
Latest Reply
AJ270990
Contributor II
  • 0 kudos

This is resolved. The below query works fine now: sqldf = spark.sql("select sum(cast(enrollment as float)), sum(cast(growth as float)), `plan type`, `Parent Organization`, state, `Special Needs Plan`, `Plan Name Sec A`, CASE when `Plan ID` between '800' and '89...

  • 0 kudos
2 More Replies
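Based on the accepted answer above, the fix was to wrap column names that contain spaces in backticks. A small self-contained sketch with hypothetical data illustrating the pattern:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical data with column names containing spaces, to illustrate backtick quoting.
df = spark.createDataFrame(
    [("HMO", 100.0, 2.5), ("PPO", 200.0, 3.0)],
    ["plan type", "enrollment", "growth"],
)
df.createOrReplaceTempView("plans")

# Column names with spaces (or other special characters) must be wrapped in
# backticks in Spark SQL, otherwise the parser raises a ParseException.
sqldf = spark.sql(
    "SELECT `plan type`, sum(cast(enrollment AS float)) AS total_enrollment "
    "FROM plans GROUP BY `plan type`"
)
sqldf.show()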
Jhaji
by New Contributor
  • 825 Views
  • 0 replies
  • 0 kudos

The REFRESH TABLE command doesn't seem to invalidate the local cache. Am I missing something?

Hi Team, as part of the "Data Engineering with Databricks" course section "DE 4.2 - Providing Options for External Sources", I can read the total number of records of the sales_csv table as 10510. The append command in Cmd17 is supposed to increase this number 2x,...

[image attachment]
  • 825 Views
  • 0 replies
  • 0 kudos
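For reference, a minimal sketch of the refresh pattern being discussed, assuming an external table named sales_csv backed by CSV files: when new files land in the table's location outside of Spark, the cached file listing can be stale until the table is refreshed.

# REFRESH TABLE tells Spark to re-scan the table's source files so subsequent
# reads in this session see newly appended data.
spark.sql("REFRESH TABLE sales_csv")
print(spark.table("sales_csv").count())

# In a notebook SQL cell the equivalent is:
#   REFRESH TABLE sales_csv;
#   SELECT count(*) FROM sales_csv;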
159312
by New Contributor III
  • 2729 Views
  • 3 replies
  • 0 kudos

When trying to ingest parquet files with autoloader I get an error stating that schema inference is not supported, but the parquet files have schema data. No inference should be necessary. Is this right?

When trying to ingest parquet files with autoloader with the following code: df = (spark.readStream.format("cloudFiles").option("cloudfiles.format", "parquet").load(filePath)) I get the following error: java.lang.UnsupportedOperationException:...

  • 2729 Views
  • 3 replies
  • 0 kudos
Latest Reply
Noopur_Nigam
Databricks Employee
  • 0 kudos

Hi @Ben Bogart, this is supported in DBR 11.1 and above. The below document suggests the same: https://docs.databricks.com/ingestion/auto-loader/schema.html#schema-inference-and-evolution-in-auto-loader Please try in DBR 11.1 and please let us know if y...

  • 0 kudos
2 More Replies
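Following the reply above, a hedged sketch of the two common ways to avoid the schema error, using hypothetical paths and columns: let Auto Loader infer and track the schema on DBR 11.1+, or pass an explicit schema on older runtimes so no inference is needed.

from pyspark.sql.types import StructType, StructField, StringType, LongType

file_path = "/mnt/landing/parquet"         # hypothetical source path
schema_location = "/mnt/landing/_schemas"  # hypothetical schema tracking location

# Option 1 (DBR 11.1+): let Auto Loader infer and track the schema.
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "parquet")
    .option("cloudFiles.schemaLocation", schema_location)
    .load(file_path)
)

# Option 2 (older runtimes): supply the schema explicitly so no inference is needed.
explicit_schema = StructType([
    StructField("id", LongType()),
    StructField("name", StringType()),
])
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "parquet")
    .schema(explicit_schema)
    .load(file_path)
)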
Zair
by New Contributor III
  • 1961 Views
  • 2 replies
  • 2 kudos

How to handle ETL for 100+ tables through Spark Structured Streaming?

I am writing a streaming job that will perform ETL for more than 130 tables. I would like to know whether there is a better way to do this. Another solution I am considering is to write a separate streaming job for each table. The source data is coming...

  • 1961 Views
  • 2 replies
  • 2 kudos
Latest Reply
artsheiko
Databricks Employee
  • 2 kudos

Hi, I guess to answer your question it might be helpful to get more details on what you're trying to achieve and the bottleneck that you encounter now. Indeed, handling the processing of 130 tables in one monolith could be challenging, as the business rul...

  • 2 kudos
1 More Replies
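One commonly suggested pattern for this kind of fan-out (a sketch only, with hypothetical paths, formats, and table names) is to drive the streams from a metadata list and start one query per table in a loop, rather than hand-writing 130 jobs:

tables = [
    {"source": "/mnt/raw/orders", "target": "bronze.orders"},
    {"source": "/mnt/raw/customers", "target": "bronze.customers"},
    # one entry per table, e.g. generated from a config file or a control table
]

def start_stream(source_path: str, target_table: str):
    # Each table gets its own schema location and checkpoint so streams are independent.
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .option("cloudFiles.schemaLocation", f"/mnt/checkpoints/{target_table}/schema")
        .load(source_path)
        .writeStream
        .option("checkpointLocation", f"/mnt/checkpoints/{target_table}/data")
        .trigger(availableNow=True)  # incremental batch-style runs; use processingTime for continuous
        .toTable(target_table)
    )

queries = [start_stream(t["source"], t["target"]) for t in tables]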
AJ270990
by Contributor II
  • 9516 Views
  • 3 replies
  • 4 kudos

Resolved! How to bold text?

I have searched several ways of applying bold to text, however I am unable to achieve it. I have added '\033[1m' before my text, followed by '\033[0m', however I can't see the text as bold. I need to apply bold to the header "Ocean" in the below image, which is i...

[image attachment]
  • 9516 Views
  • 3 replies
  • 4 kudos
Latest Reply
AJ270990
Contributor II
  • 4 kudos

I have used plt.text() to make the text bold.

  • 4 kudos
2 More Replies
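Based on the accepted answer, a minimal matplotlib sketch; the coordinates and the "Ocean" label are illustrative assumptions:

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [4, 5, 6])

# fontweight='bold' (or weight='bold') renders the string in bold; the position
# mirrors placing a header label such as "Ocean" on the chart.
ax.text(1.0, 6.0, "Ocean", fontweight="bold", fontsize=14)
plt.show()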
RaymondLC92
by New Contributor II
  • 2283 Views
  • 2 replies
  • 1 kudos

Resolved! How to obtain run_id without using dbutils in Python?

We would like to be able to get the run_id in a job run, and we have the unfortunate restriction that we cannot use dbutils. Is there a way to get it in Python? I know that for the job ID it's possible to retrieve it from the environment variables.

  • 2283 Views
  • 2 replies
  • 1 kudos
Latest Reply
artsheiko
Databricks Employee
  • 1 kudos

Hi, please refer to the following thread: https://community.databricks.com/s/question/0D58Y00008pbkj9SAA/how-to-get-the-job-id-and-run-id-and-save-into-a-database Hope this helps.

  • 1 kudos
1 More Replies
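One approach that avoids dbutils (a hedged sketch, assuming the task's parameters are configured to include the dynamic value reference {{job.run_id}}, e.g. ["--run-id", "{{job.run_id}}"]) is to let Jobs inject the value and read it as a command-line argument, which works for Python script and wheel tasks:

import argparse

# The value of {{job.run_id}} is substituted by Jobs at run time and arrives
# here as an ordinary command-line argument, so no dbutils call is needed.
parser = argparse.ArgumentParser()
parser.add_argument("--run-id", dest="run_id")
args, _ = parser.parse_known_args()

print(f"Current job run_id: {args.run_id}")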
Rahul_Samant
by Contributor
  • 6926 Views
  • 4 replies
  • 5 kudos

Resolved! High Concurrency Passthrough Cluster: pyarrow optimization not working while converting to pandas DataFrame

I need to convert a Spark DataFrame to a pandas DataFrame with Arrow optimization: spark.conf.set("spark.sql.execution.arrow.enabled", "true"); data_df = df.toPandas() but I am getting one of the below errors randomly while doing so: Exception: arrow is not support...

  • 6926 Views
  • 4 replies
  • 5 kudos
Latest Reply
AlexanderBij
New Contributor II
  • 5 kudos

Can you confirm this is a known issue? I am running into the same issue; here is an example to test in 1 cell. # using Arrow fails on High Concurrency cluster with PassThrough in runtime 10.4 (and 10.5 and 11.0) spark.conf.set("spark.sql.execution.arrow.pyspark.enabled",...

  • 5 kudos
3 More Replies
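For context, a minimal sketch of the Arrow-backed conversion being discussed; the fallback setting is optional and simply avoids a hard failure when Arrow cannot be used (the thread above reports the failure specifically on High Concurrency clusters with passthrough on DBR 10.4-11.0):

# Enable Arrow for Spark-to-pandas conversion (current config name; the legacy
# spark.sql.execution.arrow.enabled alias also still works).
spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")
# Optional: fall back to the non-Arrow path instead of raising if Arrow cannot be used.
spark.conf.set("spark.sql.execution.arrow.pyspark.fallback.enabled", "true")

sdf = spark.range(1000).toDF("id")
pdf = sdf.toPandas()
print(type(pdf), len(pdf))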

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group