Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Databricks_Venk
by New Contributor
  • 7110 Views
  • 1 reply
  • 0 kudos
Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi there, my name is Piper, and I'm one of the moderators for Databricks. Thank you for coming to us with this. Let's give our members a chance to respond first, then we'll come back to see how things went.

lsoewito
by New Contributor
  • 5443 Views
  • 1 reply
  • 1 kudos

How to configure Databricks Connect to 'Assume Role' when accessing a file from an AWS S3 bucket?

I have a Databricks cluster configured with an instance profile to assume role when accessing an AWS S3 bucket. Accessing the bucket from the notebook using the cluster works properly (the instance profile can assume role to access the bucket). However...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hello, @lsoewito - My name is Piper, and I'm a moderator for the Databricks community. Welcome and thank you for coming to us with your question. I'm sorry to hear that you're having trouble. Let's give your peers a chance to answer your question. W...

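
A minimal sketch of one way to wire this up, using the per-bucket AssumeRole keys from Databricks' cross-account S3 documentation; the role ARN, bucket, and path are placeholders, and depending on the workspace these keys may need to go in the cluster's Spark config rather than being set at runtime:

```python
# Sketch only: per-bucket AssumeRole access via fs.s3a settings, as described
# in Databricks' cross-account S3 docs. ARN, bucket, and path are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # Databricks Connect session

spark.conf.set("fs.s3a.credentialsType", "AssumeRole")
spark.conf.set(
    "fs.s3a.stsAssumeRole.arn",
    "arn:aws:iam::123456789012:role/my-cross-account-role",  # placeholder ARN
)

df = spark.read.parquet("s3a://my-bucket/some/path/")  # placeholder location
df.show()
```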
Scouty
by New Contributor
  • 6206 Views
  • 2 replies
  • 3 kudos

Resolved! How to reset an autoloader?

Hi, I'm using an autoloader with Azure Databricks: df = (spark.readStream.format("cloudFiles") .options(**cloudfile) .load("abfss://dev@std******.dfs.core.windows.net/**/*****")) At my target checkpointLocation folder there are some files and subdirs...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

@Aman Sehgal - My name is Piper, and I'm one of the moderators for Databricks. I wanted to jump in real quick to thank you for being so generous with your knowledge.

1 More Reply
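
For readers landing here: "resetting" Auto Loader generally comes down to deleting the stream's checkpoint directory, after which file discovery starts from scratch. A minimal sketch, with placeholder paths and an assumed source format:

```python
# Sketch only: stop the stream first, then remove its checkpoint. This discards
# all ingestion progress, so every file under the source path is "new" again.
checkpoint_path = "abfss://dev@mystorage.dfs.core.windows.net/checkpoints/my_stream"
source_path = "abfss://dev@mystorage.dfs.core.windows.net/landing/"

dbutils.fs.rm(checkpoint_path, True)  # True = recursive delete

df = (spark.readStream.format("cloudFiles")
      .option("cloudFiles.format", "json")                   # assumed source format
      .option("cloudFiles.schemaLocation", checkpoint_path)  # assumed setup
      .load(source_path))

# The same path is then reused as the sink checkpoint on restart:
# df.writeStream.option("checkpointLocation", checkpoint_path)...
```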
irfanaziz
by Contributor II
  • 7773 Views
  • 3 replies
  • 2 kudos

Resolved! Issue reading a parquet file in PySpark on Databricks.

One of the source systems generates, from time to time, a parquet file which is only 220 KB in size. But reading it fails: "java.io.IOException: Could not read or convert schema for file: 1-2022-00-51-56.parquet Caused by: org.apache.spark.sql.AnalysisExce...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

@nafri A - Howdy! My name is Piper, and I'm a community moderator for Databricks. Would you be happy to mark @Hubert Dudek's answer as best if it solved the problem? That will help other members find the answer more quickly. Thanks

2 More Replies
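
One hedged workaround sketch (not necessarily the accepted answer above): when inference chokes on a single small file, supplying the expected schema explicitly can get past "Could not read or convert schema". The field names and mount path are placeholders:

```python
# Sketch only: read the problem Parquet file with an explicit schema instead
# of relying on inference. Fields and the mount path are placeholders.
from pyspark.sql.types import StructType, StructField, LongType, StringType

expected_schema = StructType([
    StructField("id", LongType(), True),      # placeholder field
    StructField("name", StringType(), True),  # placeholder field
])

df = (spark.read
      .schema(expected_schema)  # bypass schema inference for this file
      .parquet("/mnt/source/1-2022-00-51-56.parquet"))
```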
SailajaB
by Valued Contributor III
  • 4131 Views
  • 3 replies
  • 5 kudos

Ways to validate a final DataFrame schema against a JSON schema config file

Hi Team, we have to validate the transformed DataFrame's output schema against a JSON schema config file. Here is the scenario: our input JSON schema and target JSON schema are different. Using Databricks we are doing the required schema changes. Now, we need to v...

Latest Reply
Anonymous
Not applicable
  • 5 kudos

@Sailaja B - Hi! My name is Piper, and I'm a moderator for the community. Thanks for your question. Please let us know how things go. If @welder martins' response answers your question, would you be happy to come back and mark their answer as best?...

2 More Replies
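
A minimal validation sketch, assuming the config file stores a Spark StructType serialized with schema.json(); the path is a placeholder, and note that strict StructType equality also compares field order and nullability:

```python
# Sketch only: compare a DataFrame's schema to a StructType loaded from a JSON
# config file. Raises with the missing/unexpected field names on mismatch.
import json
from pyspark.sql import DataFrame
from pyspark.sql.types import StructType

def validate_schema(df: DataFrame, config_path: str) -> None:
    with open(config_path) as f:
        expected = StructType.fromJson(json.load(f))
    if df.schema != expected:
        missing = set(expected.fieldNames()) - set(df.schema.fieldNames())
        extra = set(df.schema.fieldNames()) - set(expected.fieldNames())
        raise ValueError(f"Schema mismatch: missing={missing}, unexpected={extra}")

# validate_schema(transformed_df, "/dbfs/config/target_schema.json")  # placeholders
```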
SailajaB
by Valued Contributor III
  • 7521 Views
  • 1 reply
  • 5 kudos

Resolved! Best practices for implementing unit test cases in Databricks and Azure DevOps

Hello, please suggest best practices/ways to implement unit test cases in Databricks Python to pass code coverage in Azure DevOps.

Latest Reply
User16753725182
Contributor III
  • 5 kudos

Hi, the process is similar to traditional software development practices. Docs to refer to: https://docs.microsoft.com/en-us/azure/databricks/dev-tools/ci-cd/ci-cd-azure-devops#unit-tests-in-azure-databricks-notebooks Azure DevOps best practices: https://docs.m...

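
To make the linked guidance concrete, a conventional pytest-style sketch that runs against a local SparkSession in an Azure DevOps pipeline; add_full_name is a hypothetical function under test:

```python
# Sketch only: unit-testing a PySpark transform with pytest. The transform
# under test (add_full_name) is hypothetical.
import pytest
from pyspark.sql import SparkSession, functions as F

@pytest.fixture(scope="session")
def spark():
    return SparkSession.builder.master("local[2]").appName("tests").getOrCreate()

def add_full_name(df):
    # hypothetical transform: concatenate first and last names
    return df.withColumn("full_name", F.concat_ws(" ", "first", "last"))

def test_add_full_name(spark):
    df = spark.createDataFrame([("Ada", "Lovelace")], ["first", "last"])
    row = add_full_name(df).collect()[0]
    assert row["full_name"] == "Ada Lovelace"
```

Running this with a coverage plugin (e.g. pytest --cov) is what feeds the code-coverage gate in the pipeline.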
mayuri18kadam
by New Contributor II
  • 4688 Views
  • 3 replies
  • 0 kudos

Resolved! com.databricks.sql.io.FileReadException Caused by: com.microsoft.azure.storage.StorageException: Blob hash mismatch

Hi, I am getting the following error: com.databricks.sql.io.FileReadException: Error while reading file wasbs:REDACTED_LOCAL_PART@blobStorageName.blob.core.windows.net/cook/processYear=2021/processMonth=12/processDay=30/processHour=18/part-00003-tid-4...

Latest Reply
mayuri18kadam
New Contributor II
  • 0 kudos

Yes, I can read from a notebook with DBR 6.4 when I specify this path: wasbs:REDACTED_LOCAL_PART@blobStorageName.blob.core.windows.net/cook/processYear=2021/processMonth=12/processDay=30/processHour=18. But the same using DBR 6.4 from spark-submit, it f...

2 More Replies
Ian
by New Contributor III
  • 7374 Views
  • 4 replies
  • 0 kudos

Resolved! Databricks-Connect and Change Data Feed query error

I have installed Databricks-Connect (9.1 LTS). I am able to send queries to the cluster. However, when the query includes a call to the 'table_changes' function that is part of Change Data Feed, I get the following error: AnalysisException("could ...

Latest Reply
Ian
New Contributor III
  • 0 kudos

Hi @Kaniz Fatma, the table_changes function is an internal Databricks function used in Change Data Feed (CDF). Please refer to the article below; it discusses the table_changes function. https://docs.databricks.com/delta/delta-change-data-feed.html

3 More Replies
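
For context, a minimal sketch of the table_changes() call the thread is about; the table name and starting version are placeholders, and Change Data Feed must be enabled on the table before the versions you query:

```python
# Sketch only: enable CDF on a Delta table and read its change feed.
# Table name and starting version are placeholders.
spark.sql("""
    ALTER TABLE my_db.events
    SET TBLPROPERTIES (delta.enableChangeDataFeed = true)
""")

changes = spark.sql("SELECT * FROM table_changes('my_db.events', 2)")
changes.select("_change_type", "_commit_version").show()
```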
SailajaB
by Valued Contributor III
  • 3425 Views
  • 4 replies
  • 6 kudos

Resolved! How to create nested (unflattened) JSON from flattened JSON

Hi, is there any function in PySpark which can convert flattened JSON to nested JSON? E.g., if a flattened attribute is a_b_c: 23, then unflattened it should be {"a":{"b":{"c":23}}}. Thank you

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 6 kudos

As @Chuck Connell said, can you share more of your source JSON? That example is not JSON. Additionally, flattening is usually a change from something like {"status": {"A": 1,"B": 2}} to {"status.A": 1, "status.B": 2}, which can be done easily with spark da...

3 More Replies
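
Since the thread is marked resolved without the full code visible, here is a minimal plain-Python sketch of the unflattening itself, matching the a_b_c -> {"a":{"b":{"c":23}}} example from the question (it assumes underscores only ever act as separators):

```python
# Sketch only: turn underscore-delimited keys into nested dictionaries.
def unflatten(flat: dict, sep: str = "_") -> dict:
    nested = {}
    for key, value in flat.items():
        parts = key.split(sep)
        node = nested
        for part in parts[:-1]:          # walk/create intermediate levels
            node = node.setdefault(part, {})
        node[parts[-1]] = value          # set the leaf value
    return nested

print(unflatten({"a_b_c": 23}))  # {'a': {'b': {'c': 23}}}
```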
irfanaziz
by Contributor II
  • 2790 Views
  • 3 replies
  • 1 kudos

Resolved! What is the difference between passing the schema in the options and using the .schema() function in PySpark for a CSV file?

I have observed very strange behavior with some of our integration pipelines. This week one of the CSV files was getting broken when read with the read function given below: def ReadCSV(files, schema_struct, header, delimiter, timestampformat, encode="utf8...

Latest Reply
jose_gonzalez
Moderator
  • 1 kudos

Hi @nafri A, what is the error you are getting? Can you share it, please? As @Hubert Dudek mentioned, both will call the same APIs.

2 More Replies
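
A small sketch of the two equivalent routes being compared; as the reply says, both end up on the same DataFrameReader. The path and fields are placeholders:

```python
# Sketch only: passing a schema via .schema() vs. via the csv() call's
# parameters. Path and fields are placeholders.
from pyspark.sql.types import StructType, StructField, IntegerType, StringType

schema = StructType([
    StructField("id", IntegerType(), True),
    StructField("name", StringType(), True),
])

df1 = spark.read.schema(schema).option("header", "true").csv("/mnt/raw/data.csv")
df2 = spark.read.csv("/mnt/raw/data.csv", schema=schema, header=True)

assert df1.schema == df2.schema  # same reader, same result
```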
boomerangairpla
by New Contributor
  • 396 Views
  • 0 replies
  • 0 kudos

Liftndrift is a paper airplane blog that helps the world learn paper airplanes through easy and simple illustrated instructions; we specialize...

Liftndrift is a paper airplane blog that helps the world learn paper airplanes through easy and simple illustrated instructions. We specialize in teaching how to make a paper airplane: how to make the world record paper airplane.

wyzer
by Contributor II
  • 1920 Views
  • 2 replies
  • 1 kudos

Resolved! Are we taking advantage of "Map & Reduce"?

Hello, we are new to Databricks and we would like to know if our working methods are good. Currently, we are working like this: spark.sql("CREATE TABLE Temp (SELECT avg(***), sum(***) FROM aaa LEFT JOIN bbb WHERE *** >= ***)") With this method, are we us...

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

Spark will handle the map/reduce for you. So as long as you use Spark-provided functions, be it in Scala, Python, or SQL (or even R), you will be using distributed processing. You just care about what you want as a result. And afterwards, when you are more...

1 More Reply
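
To illustrate the reply, a sketch contrasting the spark.sql style from the question with the equivalent DataFrame API; both compile to the same distributed plan. The tables, join key, and columns stand in for the redacted ones above:

```python
# Sketch only: SQL and DataFrame API versions of the same aggregation.
# Tables, join condition, and columns are placeholders.
from pyspark.sql import functions as F

sql_df = spark.sql("""
    SELECT avg(a.amount) AS avg_amount, sum(a.amount) AS sum_amount
    FROM aaa a LEFT JOIN bbb b ON a.id = b.id
    WHERE a.amount >= 100
""")

api_df = (spark.table("aaa").alias("a")
          .join(spark.table("bbb").alias("b"), F.col("a.id") == F.col("b.id"), "left")
          .where(F.col("a.amount") >= 100)
          .agg(F.avg("a.amount").alias("avg_amount"),
               F.sum("a.amount").alias("sum_amount")))
```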
Databricks_7045
by New Contributor III
  • 2419 Views
  • 3 replies
  • 0 kudos

Resolved! Encapsulate Databricks PySpark/Spark SQL code

Hi all, I have custom code (PySpark & Spark SQL notebooks) which I want to deploy at a customer location and encapsulate so that end customers don't see the actual code. Currently we have all the code in notebooks (PySpark/Spark SQL). Could you please l...

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

With notebooks that is not possible. You can write your code in Scala/Java and build a jar, which you then run with spark-submit (example). Or use Python and deploy a wheel (example). This can become quite complex when you have dependencies. Also: a jar et...

2 More Replies
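
A minimal packaging sketch along the lines of the reply (not the replier's exact setup): move the notebook logic into a plain Python module, build it into a wheel/zip, and launch it with spark-submit, so customers receive a package instead of readable notebooks. Module, job, and table names are hypothetical:

```python
# Sketch only: mylib/jobs.py -- notebook logic moved into a module.
from pyspark.sql import SparkSession, functions as F

def run(spark: SparkSession) -> None:
    df = spark.table("source_table")  # placeholder input table
    out = df.groupBy("category").agg(F.count("*").alias("n"))
    out.write.mode("overwrite").saveAsTable("target_table")  # placeholder output

# main.py -- entry point, launched e.g. with:
#   spark-submit --py-files mylib.zip main.py
if __name__ == "__main__":
    run(SparkSession.builder.getOrCreate())
```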
ghiet
by New Contributor II
  • 3284 Views
  • 7 replies
  • 6 kudos

Resolved! Cannot sign up for Databricks Community Edition - CAPTCHA error

Hello. I cannot sign up to get access to the Community Edition. I always get an error message: "CAPTCHA error... contact our sales team". I do not have this issue if I try to create a trial account for Databricks hosted on AWS. However, I do not have...

Latest Reply
joao_hoffmam
New Contributor III
  • 6 kudos

Hi @Guillaume Hiet, I was facing the same issue. Try signing up using your mobile phone; it worked for me!

6 More Replies
