Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

athjain
by New Contributor III
  • 3859 Views
  • 5 replies
  • 7 kudos

Resolved! How to query Delta tables stored in S3 through a Databricks SQL endpoint?

The Delta tables produced by our ETL are stored in S3 in CSV or Parquet format. How can we allow a Databricks SQL endpoint to run queries over the files saved in S3?

Latest Reply
Anonymous
Not applicable
  • 7 kudos

Hey @Athlestan Jain, how are you doing? Thanks for posting your question. Were you able to resolve the issue? We'd love to hear from you.

  • 7 kudos
4 More Replies
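A note for readers of this thread: a Delta table in S3 is a directory of Parquet data files plus a `_delta_log` folder, and a SQL endpoint can query it once the location is registered as an external table. The sketch below is only an illustration under assumptions: the table name and bucket path are hypothetical (neither appears in the thread), and the endpoint is assumed to already have access to the bucket. The helper just builds the SQL text to run in the SQL editor.

```python
def external_table_ddl(table_name: str, s3_path: str, fmt: str = "DELTA") -> str:
    """Build a CREATE TABLE statement that registers the files at s3_path."""
    fmt = fmt.upper()
    if fmt not in {"DELTA", "PARQUET", "CSV"}:
        raise ValueError(f"unsupported format: {fmt}")
    if "'" in s3_path:
        # Keep the sketch simple: refuse paths that would break the SQL literal.
        raise ValueError("path must not contain single quotes")
    return f"CREATE TABLE IF NOT EXISTS {table_name} USING {fmt} LOCATION '{s3_path}'"


# Hypothetical names, for illustration only.
ddl = external_table_ddl("analytics.orders", "s3://example-bucket/etl/orders")
print(ddl)
```

Once the table is registered, an ordinary `SELECT` against `analytics.orders` on the endpoint should read the S3 files directly.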
_Orc
by New Contributor
  • 2070 Views
  • 2 replies
  • 1 kudos

Resolved! Checkpoint is getting created even though the microbatch append has failed

Use case: Read data from the source table using Spark Structured Streaming (round the clock). Apply transformation logic and finally merge the dataframe into the target table. If there is any failure during transformation or merge, the Databricks job should...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Om Singh, hope you are doing well. Just wanted to check in and see if you were able to find a solution to your question? Cheers

  • 1 kudos
1 More Replies
Databricks_7045
by New Contributor III
  • 2790 Views
  • 2 replies
  • 4 kudos

Resolved! Connecting Delta Tables from any Tools

Hi Team, to access SQL tables we use tools like TOAD and SQL Server Management Studio (SSMS). Is there any tool to connect to and access Databricks Delta tables? Please let us know. Thank you

Latest Reply
Anonymous
Not applicable
  • 4 kudos

Hi @Rajesh Vinukonda, hope you are doing well. Thanks for sending in your question. Were you able to find a solution to your query?

  • 4 kudos
1 More Replies
RRO
by Contributor
  • 24998 Views
  • 7 replies
  • 7 kudos

Resolved! Performance for pyspark dataframe is very slow after using a @pandas_udf

Hello, I am currently working on time series forecasting with FBProphet. Since I have data with many time series groups (~3000), I use a @pandas_udf to parallelize the training. @pandas_udf(schema, PandasUDFType.GROUPED_MAP) def forecast_netprofit(pr...

Latest Reply
RRO
Contributor
  • 7 kudos

Thank you for the answers. Unfortunately this did not solve the performance issue. What I did now is save the results into a table: results.write.mode("overwrite").saveAsTable("db.results") This is probably not the best solution but after I do that ...

  • 7 kudos
6 More Replies
sarvesh242
by Contributor
  • 1299 Views
  • 2 replies
  • 2 kudos

Resolved! java.lang.NoSuchMethodError in Databricks

I have created a package. When I call a method from this package in my notebook, it throws java.lang.NoSuchMethodError in Databricks. The method exists in the package. Can you please guide me on this? Thanks!

Latest Reply
sarvesh242
Contributor
  • 2 kudos

Hi! I am sharing the error stack with you. I can't share the code because it is confidential. Can you please guide me? java.lang.NoSuchMethodError: com.iig.utils.common.IIGCommonConstants$.flowProperties()Ljava/lang/String; at com.ii...

  • 2 kudos
1 More Replies
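The error line in the reply above packs the missing method's full JVM signature into one string, which can be hard to read. As a small aid, here is a minimal sketch that decodes such a message into its parts. The function names are hypothetical helpers written for this note, not part of any Databricks or JVM tooling; only the descriptor format itself (`L...;` for objects, `[` for arrays, single letters for primitives) is standard.

```python
PRIMITIVES = {"B": "byte", "C": "char", "D": "double", "F": "float",
              "I": "int", "J": "long", "S": "short", "Z": "boolean", "V": "void"}


def _read_type(desc, i):
    """Decode one JVM type starting at desc[i]; return (name, next_index)."""
    dims = 0
    while desc[i] == "[":          # each '[' adds one array dimension
        dims += 1
        i += 1
    if desc[i] == "L":             # object type: Lpkg/Class;
        end = desc.index(";", i)
        name = desc[i + 1:end].replace("/", ".")
        i = end + 1
    else:                          # single-letter primitive
        name = PRIMITIVES[desc[i]]
        i += 1
    return name + "[]" * dims, i


def decode_missing_method(message):
    """Split 'owner.method(params)return' into readable parts."""
    head, _, tail = message.partition("(")
    owner, _, method = head.rpartition(".")
    params_raw, _, return_raw = tail.partition(")")
    params, i = [], 0
    while i < len(params_raw):
        name, i = _read_type(params_raw, i)
        params.append(name)
    return_type, _ = _read_type(return_raw, 0)
    return {"owner": owner, "method": method,
            "params": params, "returns": return_type}


info = decode_missing_method(
    "com.iig.utils.common.IIGCommonConstants$.flowProperties()Ljava/lang/String;")
print(info)
```

Read this way, the JVM looked for a no-argument `flowProperties` returning `java.lang.String` on `IIGCommonConstants$` (the trailing `$` is how a Scala `object` is compiled) and did not find that exact signature. A common cause, though not confirmed in this thread, is that the jar attached to the cluster was built from a different version than the one the calling code was compiled against.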
DavideCagnoni
by Contributor
  • 3549 Views
  • 8 replies
  • 3 kudos

Resolved! How to force pandas_on_spark plots to use all dataframe data?

When I load a table as a `pandas_on_spark` dataframe, and try to e.g. scatterplot two columns, what I obtain is a subset of the desired points. For example, if I try to plot two columns from a table with 1000000 rows, I only see some of the data - i...

Latest Reply
Kaniz_Fatma
Community Manager
  • 3 kudos

Hi @Davide Cagnoni, the Ideas Portal lets you influence the Databricks product roadmap by providing feedback directly to the product team. Use the Ideas Portal to: Enter feature requests. View, comment on, and vote up other users' requests. Monitor the p...

  • 3 kudos
7 More Replies
BeginnerBob
by New Contributor III
  • 3699 Views
  • 4 replies
  • 4 kudos

Resolved! Bronze silver gold layers

Is there a best practice guide on setting up the Delta Lake for these 3 layers? I'm looking for documents or scripts to run that will assist me.

Latest Reply
Kaniz_Fatma
Community Manager
  • 4 kudos

Hi @Lloyd Vickery, did you find any of the above answers helpful? Would you like to tell us if you solved it?

  • 4 kudos
3 More Replies
michaelh
by New Contributor III
  • 3548 Views
  • 3 replies
  • 2 kudos

Resolved! Databricks runtime from docker hub image

Hello, are the Databricks runtimes from Docker Hub ( https://hub.docker.com/r/databricksruntime/standard ) the same as the actual runtimes inside Databricks? I mean, when we build our own Docker image from databricksruntime/standard, will it have the same dependencies...

Latest Reply
jose_gonzalez
Moderator
  • 2 kudos

Hi @michael henzl, just checking if you still need help with this or not anymore? Please let us know.

  • 2 kudos
2 More Replies
Sandesh87
by New Contributor III
  • 2077 Views
  • 2 replies
  • 2 kudos

Resolved! create a dataframe with all the responses from the api requests within foreachPartition

I am trying to execute an API call to get an object (JSON) from Amazon S3 and I am using foreachPartition to execute multiple calls in parallel: df.rdd.foreachPartition(partition => { //Initialize list buffer var buffer_accounts1 = new ListBuffer[St...

Latest Reply
jose_gonzalez
Moderator
  • 2 kudos

Hi @Sandesh Puligundla, thank you for sharing the solution. We will mark it as the "best" response so that, in the future, if another user has the same question, they will be able to find the solution right away.

  • 2 kudos
1 More Replies
Constantine
by Contributor III
  • 1195 Views
  • 2 replies
  • 3 kudos

Resolved! Can't view files of different types in databricks

I am reading a Kafka input using Spark Streaming on Databricks and trying to deserialize it. The input is in Thrift format. I want to create a .thrift file to provide the schema but am unable to do it. Even if I create the file locally and...

Latest Reply
jose_gonzalez
Moderator
  • 3 kudos

Hi @John Constantine, just checking if you still need help or not anymore. If you do, please share as many details and logs as possible, so we can help better.

  • 3 kudos
1 More Replies
KKo
by Contributor III
  • 1380 Views
  • 3 replies
  • 7 kudos

Resolved! ETL in Databricks

I use Azure Databricks for ETL. I read/write data from and to raw/stage/curate folders. I write a dataframe to a path (e.g. /mnt/datalake/curated/....). In the final step I read data from the path, convert it to a dataframe and write it to the Azure SQL DB/...

Latest Reply
jose_gonzalez
Moderator
  • 7 kudos

Hi @Kris Koirala, just checking if you still have any follow-up questions? Please let us know.

  • 7 kudos
2 More Replies
Jreco
by Contributor
  • 2604 Views
  • 4 replies
  • 1 kudos

Resolved! Method iterableAsScalaIterable does not exist Pydeequ

Hello, I'm using Databricks and pydeequ to build a QA step in structured streaming. One of the analyzers that I need to use is Uniqueness. If I add another one like Completeness, it works properly, but if I add Uniqueness I get an error: py4j....

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

I think it is because you did not attach the libraries to the cluster. When you work with a notebook, the SparkSession is already created. To add libraries, you should install them on the cluster (in the Compute tab) using e.g. PyPI, Maven, etc.

  • 1 kudos
3 More Replies
wgsing
by New Contributor
  • 2496 Views
  • 4 replies
  • 0 kudos

Resolved! Databricks Cluster create fail

I am facing a problem creating a cluster in Databricks. The error is as below: Message: Cluster terminated. Reason: Unexpected launch failure. An unexpected error was encountered while setting up the cluster. Please retry and contact Databricks if the proble...

Latest Reply
jose_gonzalez
Moderator
  • 0 kudos

Hi @Giin Sing Wong, just a friendly follow-up. Is this issue still happening, or were you able to resolve it by increasing your account's quota? Please let us know.

  • 0 kudos
3 More Replies
keunsoop
by New Contributor
  • 46501 Views
  • 8 replies
  • 2 kudos

Resolved! Run stored bash in Databricks with %sh

Hi, I made a bash file in Databricks and I can see that the file is stored, as shown in the following picture. I wanted to run this bash file through a %sh cell, but as you can see in the following picture, I could not find the bash file, which I could find through d...

Latest Reply
jose_gonzalez
Moderator
  • 2 kudos

Hi @keunsoop, are you able to run your code using an init script? I would like to share some docs in case you have questions: https://docs.databricks.com/clusters/init-scripts.html

  • 2 kudos
7 More Replies
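One likely reason a file is visible from DBFS tooling but not from a `%sh` cell is the path prefix. The post excerpt is truncated, so this is an assumption about the cause rather than the confirmed answer. Files stored in DBFS are addressed as `dbfs:/...`, while `%sh` runs on the driver's local filesystem, where DBFS is typically exposed under the `/dbfs` mount. A minimal sketch of the mapping, using a hypothetical helper name and a hypothetical script path:

```python
def dbfs_to_local_path(path: str) -> str:
    """Map a DBFS path to the /dbfs mount that a %sh cell sees on the driver."""
    if path.startswith("/dbfs/"):
        return path                      # already a local mount path
    if path.startswith("dbfs:"):
        path = path[len("dbfs:"):]       # drop the URI scheme
    return "/dbfs/" + path.lstrip("/")


# Hypothetical script location, for illustration only.
local_path = dbfs_to_local_path("dbfs:/scripts/run.sh")
print(local_path)
```

With that mapping, a `%sh` cell would call the script as `bash /dbfs/scripts/run.sh` rather than by its `dbfs:/` address.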