Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

159312
by New Contributor III
  • 3790 Views
  • 3 replies
  • 0 kudos

When trying to ingest parquet files with autoloader I get an error stating that schema inference is not supported, but the parquet files have schema data. No inference should be necessary. Is this right?

When trying to ingest parquet files with Auto Loader using the following code:

    df = (spark
        .readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "parquet")
        .load(filePath))

I get the following error: java.lang.UnsupportedOperationException:...

Latest Reply
Noopur_Nigam
Databricks Employee
  • 0 kudos

Hi @Ben Bogart​, this is supported in DBR 11.1 and above. The following document says the same: https://docs.databricks.com/ingestion/auto-loader/schema.html#schema-inference-and-evolution-in-auto-loader Please try DBR 11.1 and please let us know if y...
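
For reference, a minimal sketch of what the working read might look like on DBR 11.1+, with cloudFiles.schemaLocation pointing at a path where Auto Loader can persist the schema; the paths here are hypothetical:

    df = (spark
        .readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "parquet")
        # hypothetical path where Auto Loader tracks the inferred schema
        .option("cloudFiles.schemaLocation", "/tmp/schemas/my_stream")
        .load(filePath))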

2 More Replies
Zair
by New Contributor III
  • 2710 Views
  • 2 replies
  • 2 kudos

How to handle ETL for 100+ tables with Spark Structured Streaming?

I am writing a streaming job that will perform ETL for more than 130 tables. I would like to know whether there is a better way to do this. Another solution I am considering is to write a separate streaming job for each table. Source data is coming...

Latest Reply
artsheiko
Databricks Employee
  • 2 kudos

Hi, to answer your question it might be helpful to get more details on what you're trying to achieve and the bottleneck you're encountering now. Indeed, handling the processing of 130 tables in one monolith could be challenging, as the business rul...
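
If the tables share the same load logic, one common pattern is to drive many streams from one job with a small helper, sketched below; the table names, paths, and JSON source format are hypothetical assumptions:

    def start_stream(table_name, source_path):
        return (spark.readStream
            .format("cloudFiles")
            .option("cloudFiles.format", "json")  # assumed source format
            .load(source_path)
            .writeStream
            # each stream needs its own checkpoint location
            .option("checkpointLocation", f"/checkpoints/{table_name}")
            .toTable(table_name))

    sources = [("bronze.orders", "/raw/orders"), ("bronze.customers", "/raw/customers")]
    queries = [start_stream(name, path) for name, path in sources]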

1 More Replies
AJ270990
by Contributor II
  • 11236 Views
  • 3 replies
  • 4 kudos

Resolved! How to bold text?

I have tried several ways of applying bold to text but have been unable to achieve it. I added '\033[1m' before my text, followed by '\033[0m', but the text does not appear bold. I need to apply bold to the header "Ocean" in the image below, which is i...

[image attachment]
Latest Reply
AJ270990
Contributor II
  • 4 kudos

I have used plt.text() to make text bold
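
A minimal sketch of that approach, assuming the header is drawn on a matplotlib figure; the coordinates, size, and label are hypothetical:

    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    # fontweight="bold" renders the string in bold
    ax.text(0.5, 0.95, "Ocean", fontweight="bold", fontsize=14,
            ha="center", transform=ax.transAxes)
    plt.show()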

2 More Replies
RaymondLC92
by New Contributor II
  • 3717 Views
  • 2 replies
  • 1 kudos

Resolved! How to obtain run_id without using dbutils in python?

We would like to be able to get the run_id in a job run, and we have the unfortunate restriction that we cannot use dbutils. Is there a way to get it in Python? I know that for the job ID it's possible to retrieve it from the environment variables.

Latest Reply
artsheiko
Databricks Employee
  • 1 kudos

Hi, please refer to the following thread: https://community.databricks.com/s/question/0D58Y00008pbkj9SAA/how-to-get-the-job-id-and-run-id-and-save-into-a-database Hope this helps.
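
One dbutils-free pattern is to let Jobs inject the run ID itself via its {{run_id}} parameter placeholder and read it as an ordinary argument; the --run-id flag name below is a hypothetical choice, and this sketch assumes the task is configured with parameters ["--run-id", "{{run_id}}"]:

    import argparse

    # Jobs resolves {{run_id}} to the current run's ID at launch time
    parser = argparse.ArgumentParser()
    parser.add_argument("--run-id", dest="run_id")
    args, _ = parser.parse_known_args()
    print(f"run_id: {args.run_id}")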

1 More Replies
Rahul_Samant
by Contributor
  • 8659 Views
  • 4 replies
  • 5 kudos

Resolved! High Concurrency Passthrough Cluster: pyarrow optimization not working while converting to a pandas DataFrame

I need to convert a Spark DataFrame to a pandas DataFrame with Arrow optimization:

    spark.conf.set("spark.sql.execution.arrow.enabled", "true")
    data_df = df.toPandas()

but I randomly get one of the errors below while doing so: Exception: arrow is not support...

Latest Reply
AlexanderBij
New Contributor II
  • 5 kudos

Can you confirm this is a known issue? I am running into the same issue; here is an example to test in one cell:

    # using Arrow fails on a High Concurrency cluster with passthrough in runtime 10.4 (and 10.5 and 11.0)
    spark.conf.set("spark.sql.execution.arrow.pyspark.enabled",...
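
A small self-contained repro, with Spark's Arrow fallback flag set explicitly so the conversion degrades to the non-Arrow path instead of failing; the toy DataFrame is a hypothetical stand-in:

    spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")
    # if Arrow cannot be used on this cluster type, retry without it rather than raising
    spark.conf.set("spark.sql.execution.arrow.pyspark.fallback.enabled", "true")

    df = spark.range(1000)   # toy data for the test
    pdf = df.toPandas()
    print(len(pdf))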

3 More Replies
NickMendes
by New Contributor III
  • 3416 Views
  • 3 replies
  • 1 kudos

Resolved! Databricks SQL duplicates alert e-mail

Hi everyone, I've been working in Databricks SQL and creating a few e-mail alerts, and I have noticed that when an alert triggers, the e-mail notification is duplicated. I've tried lots of testing in different situations; however, it keeps duplicating in my...

Latest Reply
NickMendes
New Contributor III
  • 1 kudos

After lots of testing, I've finally figured out one solution. I changed the "Notifications" setting to "When triggered, send notification At most every 1 day" and "Refresh" to "Refresh every 1 day". Now it is working perfectly.

2 More Replies
harsha4u
by New Contributor II
  • 1488 Views
  • 1 reply
  • 2 kudos

Any suggestions around automating sizing of clusters and best practices around it? Other than enabling auto scaling, are there any other practices aro...

Any suggestions around automating sizing of clusters and best practices around it? Other than enabling auto scaling, are there any other practices around creating a right size driver/worker nodes?

Latest Reply
User16766737456
Databricks Employee
  • 2 kudos

Autoscaling should help in sizing the clusters according to the workload. You may want to consider the recommendations here: https://docs.databricks.com/clusters/cluster-config-best-practices.html#cluster-sizing-considerations
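
To make the sizing concrete, here is a minimal autoscaling cluster spec as it might be passed to the Clusters API; the name, node type, runtime version, and worker bounds are hypothetical values to adapt to the workload:

    cluster_spec = {
        "cluster_name": "right-sized-etl",        # hypothetical name
        "spark_version": "11.3.x-scala2.12",      # pick a current LTS runtime
        "node_type_id": "i3.xlarge",              # choose for the workload's memory/CPU profile
        "autoscale": {"min_workers": 2, "max_workers": 8},  # bounds the autoscaler works within
    }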

Valentin1
by New Contributor III
  • 2764 Views
  • 3 replies
  • 0 kudos


Is there any example that contains at least one widget and works with the Databricks SQL Create Dashboard API? I tried the following simple dashboard:

    { "name": "Delta View", "dashboard_filters_enabled": false, "widgets": [ { ...

Latest Reply
Debayan
Databricks Employee
  • 0 kudos

@Valentin Rosca​, right now Databricks does not recommend creating new widgets via the Queries and Dashboards API (https://docs.databricks.com/sql/api/queries-dashboards.html#operation/sql-analytics-create-dashboard). Also, copying a dashboard fr...

2 More Replies
Isaac_Low
by New Contributor II
  • 2825 Views
  • 2 replies
  • 3 kudos
Latest Reply
Isaac_Low
New Contributor II
  • 3 kudos

All good. I just imported the training material manually using the dbc link. Didn't need repos for that.

1 More Replies
davidvb
by New Contributor II
  • 3092 Views
  • 2 replies
  • 1 kudos

I have a big problem creating a community account

It is impossible for me to create a community account. I enter my data on the web and, in the next step, when the website shows me the three sign-in options (Google, Amazon, etc.) and I click on "Get started with community account", the web shows me this. I have try...

[image attachment]
Latest Reply
jose_gonzalez
Databricks Employee
  • 1 kudos

Hi @david vazquez​, it seems like the website was down due to maintenance. Next time you can check the status page to see why the website is down: https://status.databricks.com/

1 More Replies
darshan
by New Contributor III
  • 2370 Views
  • 2 replies
  • 1 kudos

job init takes longer than notebook run

I am trying to understand why running a job takes longer than running the notebook manually. And if I try to run jobs concurrently using workflows or threads, is there a way to reduce job init time?

Latest Reply
Vivian_Wilfred
Databricks Employee
  • 1 kudos

Hi @darshan doshi​, Jobs creates a job cluster in the backend before it starts task execution, and this cluster creation may take extra time compared to running a notebook on an existing cluster. 1) If you run a multi-task job, you could selec...
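
One way to pay the cluster-creation cost only once per run is to share a single job cluster across the tasks of a multi-task job; a sketch of a Jobs API 2.1 payload follows, where the job name, notebook paths, and sizing are hypothetical:

    job_spec = {
        "name": "shared-cluster-job",
        "job_clusters": [{
            "job_cluster_key": "shared",
            "new_cluster": {
                "spark_version": "11.3.x-scala2.12",
                "node_type_id": "i3.xlarge",
                "num_workers": 2,
            },
        }],
        "tasks": [
            # both tasks reference the same job_cluster_key, so the cluster is created once
            {"task_key": "ingest", "job_cluster_key": "shared",
             "notebook_task": {"notebook_path": "/Jobs/ingest"}},
            {"task_key": "transform", "job_cluster_key": "shared",
             "depends_on": [{"task_key": "ingest"}],
             "notebook_task": {"notebook_path": "/Jobs/transform"}},
        ],
    }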

1 More Replies
karthik_p
by Esteemed Contributor
  • 4156 Views
  • 3 replies
  • 9 kudos

Resolved! Unable to create Databricks workspace using Terraform on AWS

Hi team, we are using the workspace config scripts below. When we previously created the workspace from an EC2 instance, we were able to create it without any issue, but when we try to run through GitHub Actions, we get the error below: Erro...

Latest Reply
Prabakar
Databricks Employee
  • 9 kudos

@karthik p​, this can be fixed by setting a timeout. Please check this: https://kb.databricks.com/en_US/cloud/failed-credential-validation-checks-error-with-terraform

2 More Replies
MrsBaker
by Databricks Employee
  • 1756 Views
  • 1 reply
  • 1 kudos

display() not updating after 1000 rows

Hello folks! I am calling display() on a streaming query sourced from a Delta table. The output from display() shows the new rows added to the source table, but as soon as the output hits 1000 rows, it stops updating. As a r...

Latest Reply
MrsBaker
Databricks Employee
  • 1 kudos

An aggregate function followed by the timestamp field sorted in descending order did the trick:

    from pyspark.sql.functions import col

    (streaming_df
        .groupBy("field1", "time_field")
        .max("field2")
        .orderBy(col("time_field").desc())
        .display())

KateK
by New Contributor II
  • 2944 Views
  • 2 replies
  • 1 kudos

How do you correctly access the spark context in DLT pipelines?

I have some code that uses RDDs, and the sc.parallelize() and rdd.toDF() methods to get a dataframe back out. The code works in a regular notebook (and if I run the notebook as a job) but fails if I do the same thing in a DLT pipeline. The error mess...

Latest Reply
KateK
New Contributor II
  • 1 kudos

Thanks for your help Alex, I ended up rewriting my code with Spark UDFs. Maybe there is a better solution using only the DataFrame API, but I couldn't find it. To summarize my problem: I was trying to un-nest a large JSON blob (the fake data in my f...
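
For comparison, a DataFrame-only sketch of un-nesting a JSON string column with from_json and explode; the column name and schema are hypothetical stand-ins for the real blob:

    from pyspark.sql.functions import from_json, explode, col
    from pyspark.sql.types import ArrayType, StructType, StructField, StringType

    # hypothetical schema: each blob is an array of {id, value} records
    schema = ArrayType(StructType([
        StructField("id", StringType()),
        StructField("value", StringType()),
    ]))

    unnested = (df
        .withColumn("parsed", from_json(col("json_blob"), schema))  # parse string to array
        .withColumn("record", explode(col("parsed")))               # one row per element
        .select("record.id", "record.value"))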

1 More Replies
palak231
by New Contributor
  • 2163 Views
  • 0 replies
  • 0 kudos

A/B Testing: the process of comparing two variations, or two versions, of the same item and offering the better of t...

A/B Testing: A/B testing is the process of comparing two variations, or two versions, of the same item and offering the better of the two. Before doing A/B testing, you need to focus on the one problem that you want to resolve and ...

