Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

LidorAbo
by New Contributor II
  • 3057 Views
  • 1 reply
  • 1 kudos

bucket ownership of s3 bucket in databricks

We had a Databricks job with strange behavior: when we pass the string 'output_path' to saveAsTextFile instead of the output_path variable, the data is saved to the following path: s3://dev-databricks-hy1-rootbucket/nvirginiaprod/3219117805926709/output_pa...

s3
Latest Reply
User16752239289
Valued Contributor
  • 1 kudos

I suspect you provided a DBFS path to save the data, hence the data was saved under your workspace root bucket. For the workspace root bucket, the Databricks workspace will interact with Databricks credentials to make sure Databricks has access to it and is able t...

qwerty1
by Contributor
  • 1006 Views
  • 1 reply
  • 0 kudos

Unable to create bloom filter index

I am unable to create a bloom filter index on my table:
CREATE BLOOMFILTER INDEX ON TABLE my_namespace.foo FOR COLUMNS (id OPTIONS (fpp = 0.1, numItems = 6000000))
This gives the error:
AnalysisException: Table `spark_catalog`.`my_namespace`.`foo` did not specif...

Latest Reply
Debayan
Esteemed Contributor III
  • 0 kudos

Hi, You can refer to https://issues.apache.org/jira/browse/SPARK-27617 for the above error. Please let us know if this helps, also please tag @Debayan​ with your next response which will notify me, Thank you!
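For reference, the DDL from the question follows the documented Databricks shape for Delta tables; a hedged sketch (table and column names are the poster's, and on a cluster you would submit it via spark.sql against a Delta table, since bloom filter indexes are Delta-only):

```python
# Sketch only: builds the CREATE BLOOMFILTER INDEX statement as a string.
# Whether it succeeds depends on the table being a Delta table and on the
# runtime supporting bloom filter indexes.
table = "my_namespace.foo"
ddl = (
    f"CREATE BLOOMFILTER INDEX ON TABLE {table} "
    "FOR COLUMNS (id OPTIONS (fpp = 0.1, numItems = 6000000))"
)
print(ddl)
# On a cluster: spark.sql(ddl)
```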

gg_047320_gg_94
by New Contributor II
  • 6113 Views
  • 1 reply
  • 1 kudos

DLT Spark readstream fails on the source table which is overwritten

I am reading a source table which gets updated every day. It is usually append/merge with updates, and it is occasionally overwritten for other reasons.
df = spark.readStream.schema(schema).format("delta").option("ignoreChanges", True).option('starting...

Latest Reply
Debayan
Esteemed Contributor III
  • 1 kudos

Hi, Could you please confirm DLT and DBR versions? Also please tag @Debayan​ with your next response which will notify me, Thank you!
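The version question matters because the relevant reader options changed across runtimes. A hedged sketch of the options involved (availability depends on your DBR/DLT version, which is exactly what the reply asks to confirm):

```python
# Sketch only: Delta streaming-reader options for a source table that is
# sometimes overwritten. 'skipChangeCommits' is the newer option; the legacy
# 'ignoreChanges' used in the question predates it. Table name is a placeholder.
reader_options = {
    "skipChangeCommits": "true",   # newer runtimes: skip rewrite commits
    # "ignoreChanges": "true",     # legacy equivalent on older runtimes
}
# On a cluster:
# df = (spark.readStream.format("delta")
#         .options(**reader_options)
#         .table("source_table"))
```

Note that both options skip rewritten data rather than reprocess it; a full overwrite may still require restarting the stream with a fresh checkpoint.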

eyalo
by New Contributor II
  • 3573 Views
  • 6 replies
  • 0 kudos

Why the SFTP ingest doesn't work?

Hi, I ran the following code, but it seems like the cluster runs for a long period of time and then stops without any results. My code is attached. (I used the 'com.springml.spark.sftp' library and installed it as Maven.) Also, I whitelisted my lo...

image
Latest Reply
eyalo
New Contributor II
  • 0 kudos

Hi @Debayan Mukherjee, I don't know if you got my reply, so I am bouncing my message to you again. Thanks.

5 More Replies
Ajay-Pandey
by Esteemed Contributor III
  • 2873 Views
  • 2 replies
  • 3 kudos

Resolved! Column is accessible after dropping the same column

Hi, today I have seen very strange behavior in Databricks. I dropped one column from a DataFrame and assigned the result to a new DataFrame, but I am still able to use the dropped column in the filter command. In the general scenario I should get an error, but ...

image.png
Latest Reply
Sandeep
Contributor III
  • 3 kudos

@Ajay Pandey, this is known behavior. Please refer to this JIRA for details: https://issues.apache.org/jira/browse/SPARK-30421

1 More Replies
KarenBT
by New Contributor III
  • 4346 Views
  • 15 replies
  • 4 kudos

Welcome 2023 Virtual hackathon participants, we're happy to have you! ✋  Please use this space to ask questions, we'll have some folks from Da...

Welcome 2023 Virtual hackathon participants, we're happy to have you! Please use this space to ask questions, we'll have some folks from Databricks and the community join to help out. We're really excited to see what you work on and if you have any ...

Latest Reply
sanggusti
New Contributor II
  • 4 kudos

Hi, I also have another question. Do we get any Databricks platform access for the period of the hackathon? My company doesn't use it and the trial is only 14 days. I'm pretty aware of the capability, and since the hackathon is held by Databricks I think...

14 More Replies
Michelle_-_Devp
by New Contributor III
  • 731 Views
  • 1 reply
  • 1 kudos

Resolved! How is brainstorming going?

Wondering if anyone is willing to share their project ideas here. It would be great to know how things are going and if anyone has a good open-source dataset they are willing to share.

Latest Reply
bayang
New Contributor III
  • 1 kudos

Going well; reading their docs gives a lot of info to sharpen up for this hackathon.

IndihomeTV
by New Contributor
  • 752 Views
  • 1 reply
  • 0 kudos

Databricks to redash

We have a security issue in Redash. If we use Databricks as a connector to Redash, can you support us? https://www.databricks.com/blog/2020/06/24/welcoming-redash-to-databricks.html

Latest Reply
arpit
Valued Contributor
  • 0 kudos

Hi @Probis Useetv, thank you for reaching out to us. Would you please elaborate on your use case regarding the "issued security in redash"?

Ismail1
by New Contributor III
  • 1482 Views
  • 3 replies
  • 3 kudos

Resolved! Generating an Account console PAT token

I can't seem to find any documentation on generating an account console PAT token. Can anyone link me to it or guide me?

Latest Reply
fkseki
New Contributor III
  • 3 kudos

You can't create a Personal Access Token at the account level to use REST APIs. If you want to use SCIM at the account level, you'll find the user provisioning tab in the account console settings. In there you can generate the SCIM token. If you want to acces...
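To illustrate the SCIM-token route, a hedged sketch of how the token would be used against the account-level SCIM endpoint (the account id, host, and token below are placeholders; verify the endpoint shape against current Databricks docs for your cloud):

```python
# Sketch only: builds the URL and headers for an account-level SCIM call
# using the token generated in Settings > User Provisioning. No network I/O.
account_id = "00000000-0000-0000-0000-000000000000"   # placeholder
scim_token = "EXAMPLE-SCIM-TOKEN"                      # placeholder

base = f"https://accounts.cloud.databricks.com/api/2.0/accounts/{account_id}/scim/v2"
headers = {
    "Authorization": f"Bearer {scim_token}",
    "Content-Type": "application/scim+json",
}
users_url = f"{base}/Users"   # e.g. GET to list account users
print(users_url)
```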

2 More Replies
pantelis_mare
by Contributor III
  • 16140 Views
  • 31 replies
  • 15 kudos

Resolved! Repos configuration for Azure Service Principal

Hello community! I would like to update a repo from within my Azure DevOps release pipeline. In the pipeline I generate a token using an AAD Service Principal, as recommended, and I set up the Databricks API using that token. When I pass the databricks re...

Latest Reply
xiangzhu
Contributor II
  • 15 kudos

A traditional PAT may have a long lifespan, but the new SP feature uses an AAD token, which should have a much shorter lifespan, maybe around one hour; this could be a limiting factor. However, I haven't tested this yet, so these are merely hypotheses. Neve...
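The short-lifespan hypothesis matches how AAD client-credentials tokens generally work. A hedged sketch of the token request body (tenant/client values are placeholders; the resource id shown is commonly cited as the AzureDatabricks application id, but verify it against current Azure docs):

```python
import urllib.parse

# Sketch only: builds the AAD client-credentials request body. No network I/O.
tenant_id = "TENANT-ID"        # placeholder
body = urllib.parse.urlencode({
    "grant_type": "client_credentials",
    "client_id": "SP-CLIENT-ID",       # placeholder service principal
    "client_secret": "SP-SECRET",      # placeholder
    # Assumed AzureDatabricks resource application id:
    "scope": "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d/.default",
})
token_url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
# POSTing this body returns JSON whose 'expires_in' is typically ~3600 seconds,
# consistent with the roughly one-hour lifespan discussed above.
```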

30 More Replies
Phani1
by Valued Contributor
  • 1797 Views
  • 2 replies
  • 1 kudos

Integration Dolly with Databricks

Hi Databricks Team, could you please share any links/docs/sample notebooks to integrate Dolly with Databricks? Our aim is to generate SQL queries based on free text and execute them via a Databricks cluster/SQL warehouse.

Latest Reply
sean_owen
Honored Contributor II
  • 1 kudos

https://www.dbdemos.ai/demo.html?demoName=llm-dolly-chatbot is a good demonstration of Dolly (or really any LLM) for question answering. LLMs like this are not for SQL generation, but other LLMs, like starcoderbase, are.

1 More Replies
sanjay
by Valued Contributor II
  • 1130 Views
  • 2 replies
  • 1 kudos

Resolved! How can I prioritize message in autoloader

Hi, I am using Auto Loader; it picks up data from AWS S3 and stores it in a Delta table. When there is a large number of messages, I would like to process messages by priority. Is it possible to prioritize messages in Auto Loader? Regards, Sanjay

Latest Reply
sanjay
Valued Contributor II
  • 1 kudos

Thank you, Sandeep. The other option is that I can keep messages in 2 different folders in S3. Can Auto Loader read messages from multiple folders?
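One common pattern for the two-folder idea (a sketch under assumed bucket/prefix names, not from the thread): run one Auto Loader stream per prefix, giving the priority prefix a faster trigger, since each stream takes a single load path.

```python
# Sketch only: per-prefix stream configuration. Paths and triggers are examples.
streams = [
    {"path": "s3://my-bucket/inbox/priority/", "trigger": "30 seconds"},
    {"path": "s3://my-bucket/inbox/normal/",   "trigger": "10 minutes"},
]
# On a cluster, each entry would become roughly:
# df = (spark.readStream.format("cloudFiles")
#         .option("cloudFiles.format", "json")
#         .load(cfg["path"]))
# ...with the writer's .trigger(processingTime=cfg["trigger"]) and its own
# checkpoint location per stream.
for cfg in streams:
    print(cfg["path"], "->", cfg["trigger"])
```

This gives priority data lower latency without true per-message ordering, which Auto Loader itself does not provide.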

1 More Replies
pauloquantile
by New Contributor III
  • 3254 Views
  • 8 replies
  • 0 kudos

Resolved! Disable scheduling of notebooks

Hi, we are wondering if it is possible to disable scheduling of notebooks. A client wants to allow many analysts access to Databricks, but a concern is the possibility of setting schedules (the fastest is every minute!). Is...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Paulo Rijnberg, thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs. Please help us select the best solution by clicking "Select As Best" if it does. Your feedba...

7 More Replies
deep_thought
by New Contributor III
  • 15186 Views
  • 16 replies
  • 9 kudos

Resolved! Schedule job to run sequentially after another job

Is there a way to schedule a job to run after some other job is complete? E.g. schedule Job A, then upon its completion run Job B.
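This maps onto multi-task jobs with task dependencies. A hedged sketch of a Jobs API 2.1 create payload where task_b waits for task_a (job name and notebook paths are placeholders):

```python
# Sketch only: a Jobs API 2.1 job spec using depends_on for sequencing.
job_spec = {
    "name": "sequential-jobs-example",
    "tasks": [
        {
            "task_key": "task_a",
            "notebook_task": {"notebook_path": "/Jobs/JobA"},  # placeholder
        },
        {
            "task_key": "task_b",
            "depends_on": [{"task_key": "task_a"}],  # runs only after task_a
            "notebook_task": {"notebook_path": "/Jobs/JobB"},  # placeholder
        },
    ],
}
# POST this payload to /api/2.1/jobs/create; the scheduler starts task_b only
# once task_a has completed successfully.
```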

Latest Reply
claytonseverson
New Contributor II
  • 9 kudos

Here is the User Guide for Jobs-as-Tasks - https://docs.google.com/document/d/1OJsc-g7IwAJjYooCp7T01Rxyt_xFkMPjmAAGdDGPkY4/edit#heading=h.oudvb5fyfd0n

15 More Replies