Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Harun
by Honored Contributor
  • 9284 Views
  • 2 replies
  • 0 kudos

Issue with Pyspark GroupBy GroupedData

Hi Guys, I am working on streaming data movement from bronze to silver. My bronze table has an entity_name column, and based on that column I need to create multiple silver tables. I tried the approach below, but it is failing with error "'...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Harun Raseed Basheer, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best ...

  • 0 kudos
1 More Replies
Rsa
by New Contributor II
  • 5093 Views
  • 2 replies
  • 2 kudos

Resolved! Error while using Array_contains function in left join condition

'Item_id' is a column in array format like ["ba1b-5fbe1547ddd5", "88f9-ac3b93334f69", "8bba-4075a47eb814"] in table1, and table2 has a column Id with a single value like ba1b-5fbe1547ddd5. While joining the two tables: select table1.*, table2.* from table1 left join tab...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Rishabh Shanker, hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Th...

  • 2 kudos
1 More Replies
Direo
by Contributor
  • 2786 Views
  • 2 replies
  • 1 kudos

Resolved! How does pyspark work in these two scenarios?

I have two scenarios with different outcomes:
Scenario 1:
from pyspark.sql.functions import *
# create sample dataframes
df1 = spark.createDataFrame([(1, 2, 3), (2, 3, 4)], ["a", "b", "c"])
df2 = spark.createDataFrame([(1, 5, 6, 7), (2, 8, 9, 10)], ["a", ...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Direo Direo, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers y...

  • 1 kudos
1 More Replies
MerelyPerfect
by New Contributor II
  • 3586 Views
  • 3 replies
  • 1 kudos

read base64 json column with Autoloader and inferschema.

I have JSON files landing in our blob storage with two fields: 1. offset (integer), 2. value (base64). The value column is JSON with Unicode, so it is sent as base64. The challenge is that this JSON is very large, with 100+ fields, so we cannot define the schema. We c...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @MerelyPerfect Per, hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you....

  • 1 kudos
2 More Replies
Mado
by Valued Contributor II
  • 9167 Views
  • 1 reply
  • 1 kudos

Resolved! How to get today's date in the local time zone?

I am trying to get today's date in the local time zone:
from pyspark.sql.functions import *
date = to_date(from_utc_timestamp(current_timestamp(), 'Australia/Melbourne'))
What I get using the above code is a column object. How can I get its value in a...

Latest Reply
Hemant
Valued Contributor II
  • 1 kudos

Hi @Mohammad Saber, you can use the pytz and datetime Python packages for your use case. The code snippet is attached in the screenshot below.
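
The screenshot from the reply is not reproduced here. As a rough sketch of the same idea, assuming Python 3.9+ and using the standard-library `zoneinfo` module in place of `pytz` (a substitution, not what the reply used), something like the following returns a plain `datetime.date` rather than a column object. The function names `today_in_zone` and `today_in_named_zone` are hypothetical.

```python
from datetime import date, datetime, timedelta, timezone, tzinfo
from typing import Optional
from zoneinfo import ZoneInfo  # standard library since Python 3.9


def today_in_zone(tz: tzinfo, now_utc: Optional[datetime] = None) -> date:
    """Return the calendar date in `tz` as a plain datetime.date."""
    # Start from an aware UTC instant so the conversion is unambiguous.
    if now_utc is None:
        now_utc = datetime.now(timezone.utc)
    return now_utc.astimezone(tz).date()


def today_in_named_zone(name: str) -> date:
    """Convenience wrapper for IANA zone names such as 'Australia/Melbourne'."""
    # Assumption: the system (or the tzdata package) provides the zone database.
    return today_in_zone(ZoneInfo(name))


# Illustration with a fixed UTC+11 offset (Melbourne's offset in January),
# so the example does not depend on the time zone database being installed.
instant = datetime(2023, 1, 1, 20, 0, tzinfo=timezone.utc)
example = today_in_zone(timezone(timedelta(hours=11)), instant)
# 20:00 UTC on 1 January is already 07:00 on 2 January at UTC+11.
```

In a notebook, `today_in_named_zone("Australia/Melbourne")` would give the current local date as an ordinary Python value that can be formatted or compared directly.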

  • 1 kudos
maymay1993
by New Contributor II
  • 1662 Views
  • 2 replies
  • 2 kudos
Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @may may, your input matters! Help our community thrive by coming back and marking the most helpful and accurate answers. Together, we can make a difference! Regards

  • 2 kudos
1 More Replies
juned
by New Contributor III
  • 2640 Views
  • 2 replies
  • 1 kudos

How to install a library that is under the /Workspace/Shared/ directory using the init.sh script in a cluster?

I would like to install a library that is under the /Workspace/Shared/ directory using the init.sh script in a cluster. How to access the /Workspace/Shared/ folder in shell? This page only shows how to access manually but doesn't show how to access i...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Juned Mala, hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Thanks!

  • 1 kudos
1 More Replies
Mr__D
by New Contributor II
  • 6974 Views
  • 2 replies
  • 3 kudos

Do we really need Autoloader for batch processing?

Hi All, It seems AutoLoader is a good option for event-driven data ingestion, but if my job runs only once, do I still need AutoLoader? I don't want to spend money to keep a cluster running the whole day. I know we have the RunOnce option available while running a job but...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Hi @Deepak Bhatt, help us build a vibrant and resourceful community by recognizing and highlighting insightful contributions. Mark the best answers and show your appreciation! Thanks and regards

  • 3 kudos
1 More Replies
Jerry01
by New Contributor III
  • 6984 Views
  • 2 replies
  • 2 kudos

Is ABAC feature enabled?

Can anyone please share an example of how it works in terms of access controls?

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Naveena G, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers you...

  • 2 kudos
1 More Replies
EDDatabricks
by Contributor
  • 1809 Views
  • 2 replies
  • 3 kudos

DLT pipeline slow streaming (root cause needs to be identified)

Dear support, we have a situation where a set of DLT pipelines are streaming incoming data at a very low rate, and we need to find the root cause of this delay. In order to provide more insight about the setup of the DLT pipelines and some m...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Hi @EDDatabricks EDDatabricks, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that ...

  • 3 kudos
1 More Replies
shiva12494
by New Contributor II
  • 4360 Views
  • 2 replies
  • 2 kudos

Issue with reading exported tables stored in parquet

Hi All, I have exported all tables from a Postgres snapshot into S3 in Parquet format. I am trying to read the table using Databricks and I am unable to do so. I get the following error: "Unable to infer schema for Parquet. It must be specified manually....

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @shiva charan velichala, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that bes...

  • 2 kudos
1 More Replies
mortenhaga
by Contributor
  • 8616 Views
  • 8 replies
  • 10 kudos

Resolved! New strange error on Runtime 12 and above: java.lang.AssertionError: assertion failed

Hi all, I am struggling to find out why this error message suddenly pops up after running a cell in a notebook. The notebook is trying to run a simple "INSERT INTO" command in SQL. When I only run the SELECT clause, the cell runs without error. Also, I only ge...

Latest Reply
entongshen__Dat
New Contributor III
  • 10 kudos

Thanks for reporting! We have identified a defect with an early version of DBR 12 related to INSERT INTO .. SELECT when certain query patterns are involved. The defect has since been fixed. Please let us know if you have any additional questions.

  • 10 kudos
7 More Replies
Chhaya
by New Contributor III
  • 2888 Views
  • 4 replies
  • 2 kudos

DLT PIPELINE RUN STATUS

Hi Everyone, Is there a way to find out the DLT pipeline run status, i.e. whether a pipeline failed or succeeded? I'm looking to have a report that shows pipeline run info and expectation info (I was able to get the latter from the event log).

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Chhaya Vishwakarma, thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs. Please help us select the best solution by clicking on "Select As Best" if it does. Your fe...

  • 2 kudos
3 More Replies
agagrins
by New Contributor III
  • 9140 Views
  • 17 replies
  • 0 kudos

Running `pyspark` with `databricks-connect`

Hiya, I'm trying to run `pyspark` with `databricks-connect==11.30.b0`, but am failing. The trace I see is ``` File "/home/agagrins/databricks9/lib/python3.9/site-packages/py4j/java_gateway.py", line 1321, in __call__  return_value = get_return_value( Fi...

Latest Reply
ryojikn
New Contributor III
  • 0 kudos

How to make it work in a cluster with Unity Catalog enabled?

  • 0 kudos
16 More Replies
Mado
by Valued Contributor II
  • 1211 Views
  • 2 replies
  • 2 kudos

Can I use a cluster created in Data Science & Engineering persona to run SQL commands in the SQL persona?

Hi, I have created a single-node cluster in the Data Science & Engineering persona (Standard_DS3_v2). I don't have enough vCPUs to create a SQL warehouse. Is there any way I can use the cluster to run a query in the SQL persona?

Latest Reply
Rajeev45
Databricks Employee
  • 2 kudos

Hi Mado, Yes, you can use the cluster and run SQL queries in a notebook. Please refer to the following pages for more details: https://docs.databricks.com/getting-started/quick-start.html#tutorial-query-data-with-notebooks https://docs.databricks.com/getting-st...

  • 2 kudos
1 More Replies
