Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

cmilligan
by Contributor II
  • 3262 Views
  • 3 replies
  • 3 kudos

Dropdown for parameters in a job

I want to be able to denote the type of run from a predetermined list of values that a user can choose from when kicking off a run using different parameters. Our team does standardized job runs on a weekly cadence but can have timeframes that change...

Latest Reply
dev56
New Contributor II
  • 3 kudos

Hi @cmilligan, I have a similar requirement and would be really grateful if you could share any information on how to solve this. Thanks a lot!

  • 3 kudos
2 More Replies
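
One possible workaround for the question above, sketched under assumptions: the run is a notebook task, the parameter is called `run_type`, and the allowed values are the three listed below (all of these names are hypothetical and not from the thread). A dropdown widget limits interactive users to the list, and the value is validated again because a job run may still pass free text for the same parameter. `dbutils` is passed in as an argument so the function can be tried outside a notebook with a stand-in object.

```python
# Hypothetical run types; replace with the values your team actually uses.
ALLOWED_RUN_TYPES = ["weekly", "monthly", "adhoc"]


def read_run_type(dbutils, name="run_type", default="weekly"):
    """Create a dropdown widget and return a validated run type.

    `dbutils` is the object Databricks provides in notebooks; it is passed
    in explicitly so the function can be exercised with a stand-in.
    """
    # Interactive users get a dropdown limited to the allowed values.
    dbutils.widgets.dropdown(name, default, ALLOWED_RUN_TYPES)
    # A job run may still supply arbitrary text for the same parameter name,
    # so the value is normalized and checked before it is used.
    value = dbutils.widgets.get(name).strip().lower()
    if value not in ALLOWED_RUN_TYPES:
        raise ValueError(f"{name}={value!r} is not one of {ALLOWED_RUN_TYPES}")
    return value
```

In a notebook this would be called as `run_type = read_run_type(dbutils)`; failing fast on an unexpected value keeps a mistyped parameter from silently running the wrong timeframe.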
mbejarano89
by New Contributor III
  • 2294 Views
  • 2 replies
  • 0 kudos

Running a K-means (.fit) gives error: Params must be either a param map or a list/tuple of param maps but got %s." % type(params)

I am running a k-means algorithm. My features are DoubleType and have no nulls, but I get: raise TypeError("Params must be either a param map or a list/tuple of param maps but got %s." % type(params). Does anyone have any idea how to solve this? File /datab...

Latest Reply
mbejarano89
New Contributor III
  • 0 kudos

I found the answer just by trying several things, although I do not understand exactly what the problem was. All I had to do was cache the input data before fitting the model: assemble=VectorAssembler(inputCols=columns_input, outputCol='features')...

  • 0 kudos
1 More Replies
kll
by New Contributor III
  • 8787 Views
  • 2 replies
  • 3 kudos

Nested struct type not supported pyspark error

I am attempting to apply a function to a pyspark DataFrame, save the API response to a new column, and then parse it using `json_normalize`. This works fine in pandas; however, I run into an exception with `pyspark`. import pyspark.pandas as ps i...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Hi @Keval Shah, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers yo...

  • 3 kudos
1 More Replies
PraveenC
by New Contributor II
  • 3018 Views
  • 4 replies
  • 3 kudos

[Databricks][JDBC](10400) Invalid type for data - column: 10, type: Array

Getting the error below while mapping an Array column to a String[] entity. Please advise whether Databricks JDBC supports entity mapping of Array values. [The same code worked for the config below - H2 DB version - 2.1.214 and org.hibernate.dialect.H2Dialect - ...

Latest Reply
Atanu
Databricks Employee
  • 3 kudos

Hello @Emmanuel Trindade @Praveen C, this does not look like it is coming from the Databricks end. Look at the error thread: javax.persistence.PersistenceException: org.hibernate.exception.DataException: Could not read entity state from ResultSet : EntityKey...

  • 3 kudos
3 More Replies
Philearner
by New Contributor II
  • 2517 Views
  • 3 replies
  • 3 kudos

Unable to find input by typing input in the Multiselect Widget

In the AWS Databricks widgets.multiselect, I'm unable to find an input by typing in the multiselect bar. It was working before. Although I can find the inputs by scrolling down the list, that is annoying if the list is long. Here's my script: measlis...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Hi @Philip Teu, hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Thanks!

  • 3 kudos
2 More Replies
MShee
by New Contributor II
  • 1627 Views
  • 1 replies
  • 1 kudos
Latest Reply
NandiniN
Databricks Employee
  • 1 kudos

Hello @M Shee, in a dropdown you can select a value from a list of provided values, not type the values in. What you might be interested in is a combobox, which is a combination of a text box and a dropdown. It allows you to select a value from a provided list or ...

  • 1 kudos
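
The difference described in the reply can be sketched as follows. This is a minimal illustration, assuming it runs in a Databricks notebook where `dbutils` exists; the widget names `fixed_choice` and `free_choice` and the labels are hypothetical. `dbutils` is passed as an argument so the function can be tried with a stand-in object.

```python
def add_choice_widgets(dbutils, choices):
    """Create one dropdown and one combobox over the same list of choices."""
    default = choices[0]
    # Dropdown: the user can only pick one of the listed values.
    dbutils.widgets.dropdown("fixed_choice", default, choices, "Pick a value")
    # Combobox: the user can pick a listed value or type a different one.
    dbutils.widgets.combobox("free_choice", default, choices, "Pick or type a value")
    # Widget values always come back as strings.
    return (
        dbutils.widgets.get("fixed_choice"),
        dbutils.widgets.get("free_choice"),
    )
```

In a notebook, `add_choice_widgets(dbutils, ["A", "B", "C"])` would render both widgets at the top of the page.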
jonathan-dufaul
by Valued Contributor
  • 2026 Views
  • 2 replies
  • 2 kudos

Resolved! Does anyone have a single example of a graphframe with two+ types of vertices? (e.g. user and post, not user to user)

I have gone through about 75 pages and every single example has only relationships from one type of object to the same type of object. About 90% have the exact same example of "Alice Bob" "friends." Has anyone ever made a graphframe with two types of ...

Latest Reply
-werners-
Esteemed Contributor III
  • 2 kudos

I feel your pain. I once tried to use graphframes to flatten a complex tree and ended up using GraphX (which is even worse to use, but at least it is more flexible). So maybe take a look at GraphX? Beware, it is terrible to use. I wonder what happened to m...

  • 2 kudos
1 More Replies
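
For readers with the same question, here is a minimal sketch of a graph with two vertex types (user and post). It is not taken from the thread: the sample rows, the `type`, `label`, and `relationship` column names, and the helper function names are all illustrative assumptions. The only requirements GraphFrames places on the inputs are an `id` column on vertices and `src` and `dst` columns on edges, so different vertex kinds can share one vertex table and be told apart by an ordinary column.

```python
# Vertices of two kinds share one table; the "type" column tells them apart.
# GraphFrames only requires a unique "id" column on the vertex DataFrame.
VERTEX_COLUMNS = ["id", "type", "label"]
VERTICES = [
    ("u1", "user", "Alice"),
    ("u2", "user", "Bob"),
    ("p1", "post", "Intro to graphs"),
    ("p2", "post", "Spark tips"),
]

# Edges need "src" and "dst"; "relationship" names the kind of edge.
EDGE_COLUMNS = ["src", "dst", "relationship"]
EDGES = [
    ("u1", "p1", "wrote"),
    ("u2", "p2", "wrote"),
    ("u2", "p1", "liked"),
    ("u1", "u2", "follows"),
]


def build_graph(spark):
    """Build the GraphFrame; needs pyspark and the graphframes package."""
    from graphframes import GraphFrame  # imported lazily on purpose

    vertices = spark.createDataFrame(VERTICES, VERTEX_COLUMNS)
    edges = spark.createDataFrame(EDGES, EDGE_COLUMNS)
    return GraphFrame(vertices, edges)


def users_who_liked_posts(graph):
    """Motif query restricted to user -> post edges labelled 'liked'."""
    return (
        graph.find("(u)-[e]->(p)")
        .filter("u.type = 'user' AND p.type = 'post' AND e.relationship = 'liked'")
        .selectExpr("u.label AS user_name", "p.label AS post_title")
    )
```

On a cluster with graphframes installed, `users_who_liked_posts(build_graph(spark)).show()` would run the query; filtering on the `type` column is what keeps user-to-user edges out of the result.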
Panna
by New Contributor II
  • 1773 Views
  • 1 replies
  • 3 kudos

Is there only one element type option for an array?

I'm creating an array which contains both string and double, just wondering if I can have multiple element type options for one array column? Thanks

Latest Reply
Debayan
Databricks Employee
  • 3 kudos

Elements of any type that share a least common type can be used: https://docs.databricks.com/sql/language-manual/functions/array.html#arguments. Please correct me if I misunderstood the requirement.

  • 3 kudos
danny_edm
by New Contributor
  • 694 Views
  • 0 replies
  • 0 kudos

collect_set weird result when Photon enabled

Cluster: DBR 10.4 LTS with Photon

Sample schema:
seq_no (decimal)
type (string)

Sample data:
seq_no  type
1       A
1       A
2       A
2       B
2       B

Command: F.size(F.collect_set(F.col("type")).over(Window.partitionBy("seq_no"))...
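
As a reference point for what the command should return, the same computation can be mirrored in plain Python. This is only a sketch of the expected semantics, not Spark code: it assumes the sample data reads as the five rows listed below and uses integers for `seq_no` where the post's schema says decimal. The function name is made up for illustration.

```python
from collections import defaultdict

# Sample rows as read from the post: (seq_no, type).
ROWS = [(1, "A"), (1, "A"), (2, "A"), (2, "B"), (2, "B")]


def distinct_type_count_per_row(rows):
    """Mirror size(collect_set(type)) over a window partitioned by seq_no."""
    types_by_seq = defaultdict(set)
    for seq_no, type_ in rows:
        types_by_seq[seq_no].add(type_)
    # A window function keeps every input row and attaches the value
    # computed for that row's partition.
    return [(seq_no, type_, len(types_by_seq[seq_no])) for seq_no, type_ in rows]


for row in distinct_type_count_per_row(ROWS):
    print(row)
```

Every row with seq_no 1 should carry a count of 1 and every row with seq_no 2 a count of 2; a different result on the cluster would point at the engine rather than the query.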

cmotla
by New Contributor III
  • 2203 Views
  • 1 replies
  • 7 kudos

Issue with complex json based data frame select

We are getting the error below when trying to select nested columns (string type in a struct), even though we don't have more than 1,000 records in the data frame. The schema is very complex and has a few columns of struct type and a few of array typ...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 7 kudos

Please share your code and some example data.

  • 7 kudos
NAS
by New Contributor III
  • 1794 Views
  • 1 replies
  • 1 kudos

Resolved! "import pandas as pd" => [Errno 5]

When I type import pandas as pd from a notebook in a Repo I get: --------------------------------------------------------------------------- AttributeError Traceback (most recent call last) /usr/lib/python3.8/importlib/_boots...

Latest Reply
NAS
New Contributor III
  • 1 kudos

Thanks to Elliott Hertz, I found out that the ML Experiments cannot be stored in the repo. After I moved them to my Workspace everything seems to work.

  • 1 kudos
frank26364
by New Contributor III
  • 12545 Views
  • 4 replies
  • 0 kudos

Resolved! Command prompt won't let me type the Databricks token

Hi, I am trying to set up the Databricks CLI using the command prompt on my computer. I downloaded the Python 3.9 app and successfully ran the command pip install databricks-cli. When I try to set up the Databricks token, I am able to type my Databricks Ho...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hey there! You're on a roll today! Thanks for letting us know.

  • 0 kudos
3 More Replies
AzureDatabricks
by New Contributor III
  • 8736 Views
  • 7 replies
  • 2 kudos

Resolved! Can we store 300 million records and what is the preferable compute type and config?

How can we persist 300 million records? What is the best option for persisting the data: Databricks Hive metastore, Azure storage, or a Delta table? What limitations do Databricks Delta tables have in terms of data? We have a use case where testers should be...

Latest Reply
-werners-
Esteemed Contributor III
  • 2 kudos

You can certainly store 300 million records without any problem. The best option depends on the use case. If you want to do a lot of online querying on the table, I suggest using Delta Lake, which is optimized (using z-order, bloom filter, par...

  • 2 kudos
6 More Replies
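
The Delta Lake suggestion above could look roughly like this. It is a sketch under assumptions, not the replier's code: the table name and the Z-order column are placeholders chosen by the caller, and `OPTIMIZE ... ZORDER BY` is assumed to be available (it is on Databricks). `spark` and `df` are passed in so the function has no hidden dependencies.

```python
def save_and_optimize(spark, df, table_name, zorder_column):
    """Write `df` as a Delta table, then compact and Z-order it."""
    # Persist the DataFrame as a Delta table registered in the metastore.
    df.write.format("delta").mode("overwrite").saveAsTable(table_name)
    # OPTIMIZE compacts small files; ZORDER BY clusters rows by the given
    # column so that filters on it can skip more files.
    statement = f"OPTIMIZE {table_name} ZORDER BY ({zorder_column})"
    spark.sql(statement)
    return statement
```

A call such as `save_and_optimize(spark, records_df, "testing.records", "record_id")` uses hypothetical names; the Z-order column should be one that queries filter on most often.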
sravan_enukonda
by New Contributor II
  • 2747 Views
  • 1 replies
  • 2 kudos

Resolved! I am looking for best practices for implementing Ranger-type access control in Databricks

We need this for auditing and for tracking the number of users accessing databases and tables created in Databricks.

Latest Reply
garren_staubli
New Contributor III
  • 2 kudos

Hi Sravan, Apache Ranger is commonly used for fine-grained access controls. In your description, it sounds like you might be able to leverage Databricks audit logs, which would allow you to see user-level actions: https://docs.databricks.com/administ...

  • 2 kudos
Anonymous
by Not applicable
  • 1155 Views
  • 1 replies
  • 0 kudos
Latest Reply
aladda
Databricks Employee
  • 0 kudos

For Delta in general, the Delta cache accelerates data reads by creating copies of remote files in the nodes' local storage using a fast intermediate data format. The data is cached automatically whenever a file has to be fetched from a remote locatio...

  • 0 kudos